{"generated_at":"2026-06-02T08:11:43+00:00","collection":{"since":null,"days":14,"comprehensive":true,"openalex_enabled":true,"arxiv_company_search_enabled":false},"source_notes":["Configured official company publication pages and feeds are collected where available.","Official company technical reports are collected from configured company-owned HuggingFace and GitHub repositories.","HuggingFace Papers search is accepted only when organization or author metadata matches a tracked company.","Default archive filtering keeps explicit reports or papers matching configured frontier-AI model keywords.","OpenAlex authorship institution metadata is collected to catch papers whose author affiliations name a tracked lab.","arXiv company-name fallback search is disabled for this run.","Company-owned HuggingFace model repositories are scanned for PDF technical reports.","PDF-only affiliations not exposed by OpenAlex, arXiv metadata, or configured official sources may still require an additional PDF-text adapter."],"totals":{"papers":5885,"companies":19,"tracked_companies":19},"companies":[{"id":"microsoft","name":"Microsoft","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["Microsoft","Microsoft Research","MSR"],"paper_count":1695,"latest_paper_date":"2026-06-01"},{"id":"alibaba-qwen","name":"Alibaba/Qwen","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["Alibaba/Qwen","Alibaba","Alibaba Cloud","DAMO Academy","Qwen","Tongyi"],"paper_count":451,"latest_paper_date":"2026-05-29"},{"id":"google-deepmind","name":"Google/DeepMind","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["Google/DeepMind","Google","Google Research","Google DeepMind","DeepMind"],"paper_count":405,"latest_paper_date":"2026-05-29"},{"id":"amazon","name":"Amazon","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["Amazon","AWS","Amazon AGI","Amazon 
Science"],"paper_count":948,"latest_paper_date":"2026-05-28"},{"id":"tencent-hunyuan","name":"Tencent/Hunyuan","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["Tencent/Hunyuan","Tencent","Tencent AI","Hunyuan"],"paper_count":703,"latest_paper_date":"2026-05-28"},{"id":"baidu","name":"Baidu","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["Baidu","Baidu Research","ERNIE"],"paper_count":259,"latest_paper_date":"2026-05-28"},{"id":"nvidia","name":"NVIDIA","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["NVIDIA","Nvidia Research"],"paper_count":231,"latest_paper_date":"2026-05-28"},{"id":"meta-fair","name":"Meta/FAIR","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["Meta/FAIR","Meta","Meta AI","FAIR","Facebook AI"],"paper_count":130,"latest_paper_date":"2026-05-28"},{"id":"huawei-noah","name":"Huawei/Noah","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["Huawei/Noah","Huawei","Huawei Noah","Noah's Ark Lab","Noah’s Ark Lab"],"paper_count":424,"latest_paper_date":"2026-05-26"},{"id":"minimax","name":"MiniMax","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["MiniMax","MiniMax AI"],"paper_count":7,"latest_paper_date":"2026-05-25"},{"id":"apple","name":"Apple","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["Apple","Apple Machine Learning Research"],"paper_count":367,"latest_paper_date":"2026-05-22"},{"id":"openai","name":"OpenAI","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["OpenAI"],"paper_count":53,"latest_paper_date":"2026-05-22"},{"id":"deepseek","name":"DeepSeek","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["DeepSeek","DeepSeek AI","DeepSeek 
V4","Engram"],"paper_count":29,"latest_paper_date":"2026-05-12"},{"id":"stepfun","name":"StepFun","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["StepFun","Step Fun"],"paper_count":22,"latest_paper_date":"2026-05-12"},{"id":"anthropic","name":"Anthropic","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["Anthropic"],"paper_count":22,"latest_paper_date":"2026-05-08"},{"id":"zai-zhipu","name":"Z.ai/Zhipu","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["Z.ai/Zhipu","Z.ai","Zhipu","Zhipu AI","GLM","ChatGLM"],"paper_count":23,"latest_paper_date":"2026-05"},{"id":"bytedance-seed","name":"ByteDance/Seed","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["ByteDance/Seed","ByteDance","Bytedance","ByteDance Seed","Seed Team","Doubao"],"paper_count":146,"latest_paper_date":"2026-04-22"},{"id":"moonshot-kimi","name":"Moonshot/Kimi","group_id":"company_china","group_name":"🇨🇳 China AI Labs","region":"China","aliases":["Moonshot/Kimi","Moonshot","Moonshot AI","Kimi"],"paper_count":16,"latest_paper_date":"2026-03-16"},{"id":"xai","name":"xAI","group_id":"company_us","group_name":"🇺🇸 US AI Labs","region":"US","aliases":["xAI"],"paper_count":3,"latest_paper_date":"2025-11-05"}],"papers":[{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ishift-lightweight-slow-fast-gui-agent-with-adaptive-perception","title":"iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception","url":"https://www.microsoft.com/en-us/research/publication/ishift-lightweight-slow-fast-gui-agent-with-adaptive-perception/","published":"2026-06-01","authors":["Sarthak Mehrotra","S. V. Rebbapragada","Mani Hemanth Reddy Bonthu","Vineeth N Balasubramanian"],"abstract":"Multimodal Large Language Models (MLLMs) show strong potential for interpreting and interacting with complex, pixel-rich Graphical User Interface (GUI) environments. 
However, building agents that are both efficient for high-level tasks and precise for fine-grained interactions remains challenging. GUI agents must perform routine actions efficiently while also handling tasks that demand exact visual grounding, yet existing approaches struggle when accuracy depends on identifying specific interface elements. These MLLMs also remain large and cannot adapt their reasoning depth to the task at hand. In this work, we introduce iSHIFT: Implicit Slow-fast Hybrid Inference with Flexible Tokens, a lightweight agent that integrates latent thinking (implicit chain-of-thought) with a perception control module. iSHIFT enables an MLLM to switch between a slow mode, which leverages detailed visual groun...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","AI agents","Multimodal Large Language Models","Vision-language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-task-transfer-in-vision-language-models","title":"Understanding Task Transfer in Vision-Language Models","url":"https://www.microsoft.com/en-us/research/publication/understanding-task-transfer-in-vision-language-models/","published":"2026-06-01","authors":["Bhuvan Sachdeva","Karan Uppal","Abhinav Java","Vineeth N Balasubramanian"],"abstract":"Vision-Language Models (VLMs) perform well on multimodal benchmarks but lag behind humans and specialized models on visual 
perception tasks like depth estimation or object counting. Finetuning on one task can unpredictably affect performance on others, making task-specific finetuning challenging. In this paper, we address this challenge through a systematic study of task transferability. We examine how finetuning a VLM on one perception task affects its zero-shot performance on others. To quantify these effects, we introduce Perfection Gap Factor (PGF), a metric that captures both the breadth and magnitude of transfer. Using three open-weight VLMs evaluated across 13 perception tasks, we construct a task-transfer graph that reveals previously unobserved relationships among perception tasks. Our analysis uncovers patterns of positive and negative transfer, identifies groups of tasks that....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Deep learning","Multimodal Large Language Models","Vision-language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/foundation-model-priors-enhance-object-focus-in-feature-space-for-source-free-object-detection","title":"Foundation Model Priors Enhance Object Focus in Feature Space for Source-Free Object Detection","url":"https://www.microsoft.com/en-us/research/publication/foundation-model-priors-enhance-object-focus-in-feature-space-for-source-free-object-detection/","published":"2026-06-01","authors":["S. V. 
Rebbapragada","Rishabh Lalla","Aveen Dayal","Tejal Kulkarni","A. Lalla","Vineeth N Balasubramanian","Muhammad Haris Khan"],"abstract":"Current state-of-the-art approaches in Source-Free Object Detection (SFOD) typically rely on Mean-Teacher self-labeling. However, domain shift often reduces the detector's ability to maintain strong object-focused representations, causing high-confidence activations over background clutter. This weak object focus results in unreliable pseudo-labels from the detection head. While prior works mainly refine these pseudo-labels, they overlook the underlying need to strengthen the feature space itself. We propose FALCON-SFOD (Foundation-Aligned Learning with Clutter suppression and Noise robustness), a framework designed to enhance object-focused adaptation under domain shift. It consists of two complementary components. SPAR (Spatial Prior-Aware Regularization) leverages the generalization strength of vision foundation models to regularize the detector's feature space. 
Using class-agnostic b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Deep learning","Domain adaptation","Object detection"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/position-reasoning-is-a-learnable-rule-based-process","title":"Position: Reasoning is a Learnable Rule-Based Process","url":"https://www.microsoft.com/en-us/research/publication/position-reasoning-is-a-learnable-rule-based-process/","published":"2026-06-01","authors":["Rachel Lawrence","Jacqueline Maasch"],"abstract":"Autonomous reasoning is among the most scientifically and economically motivating topics in AI today. Historically the purview of symbolic AI, recent advances have mainly emerged from deep probabilistic generative models. Despite immense interest and rapid progress, the generative AI community has not clearly converged on operational definitions for reasoning and often implicitly rejects the historical treatment of this topic in logic and verifiable automated reasoning. This position contends that definitional ambiguity leaves the construct validity of reasoning evaluation unverifiable, undermining quantifiable progress toward trustworthy autonomous reasoning. We also contend that this ambiguity is addressable. 
To that end, we provide (1) operational definitions based on a synthesis of the literature, positioning valid and sound reasoning as a learnable rule-based process; and (2) a chec...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7162787002","title":"ZA-SLAM: Leveraging Vision-Language Model for Zero-Shot Acoustic SLAM","url":"https://doi.org/10.1145/3745756.3809218","published":"2026-05-29","authors":["Zhuochen Yu","David K. Y. Yau","Yijie Shen","Xiaoran Fan","Tao Chen","Qun Song"],"abstract":"Existing acoustic indoor location sensing systems are limited by the need for extensive data collection and model retraining in unseen environments. This paper introduces ZA-SLAM, a novel zero-shot acoustic Simultaneous Localization and Mapping (SLAM) system that can be deployed in unseen environments without model retraining. Our core idea is to train an acoustic encoder that inherits the generalization capabilities of pre-trained Vision-Language Models (VLMs), which show superiority in tasks like zero-shot visual SLAM. To achieve this goal, we perform Acoustic-Visual Feature Alignment to enable the acoustic encoder to generate features aligned with visual features from VLMs. 
To select high-quality images for effective alignment, we design a Semantic-Guided Image Selection that filters out low-quality collected images caused by factors like abrupt view changes, occlusions, and uninforma...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3745756.3809218","openalex_id":"https://openalex.org/W7162787002","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Google (United States)","ShanghaiTech University","Singapore University of Technology and Design"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7613000273704529},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6657999753952026},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6144000291824341},{"id":"https://openalex.org/C86369673","display_name":"Simultaneous localization and mapping","score":0.59170001745224},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.5311999917030334},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.45910000801086426},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.43709999322891235},{"id":"https://openalex.org/C79061980","display_name":"Inertial measurement unit","score":0.40059998631477356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162785891","title":"Alignment-free phylogenetic inference via hyperbolic protein language models","url":"https://doi.org/10.64898/2026.05.26.723419","published":"2026-05-29","authors":["Yongtao Shan","Pan Fang","Yuqi Liu","Yuanfei Pan","Kaijie 
Liu","Yong He","Xue Liu","Weichen Wu","Guirong Xue","Jian He","Deyin Guo","Jianguo He"],"abstract":"Conventional phylogenetic methods rely on multiple sequence alignments which are computationally intensive and often fail for highly divergent lineages. Here, we introduce LucaPhylo, an alignment-free framework that infers evolutionary relationships directly from unaligned sequences. Through a cascaded learning strategy LucaPhylo integrates protein language models with hyperbolic geometry, a representation space naturally suited to hierarchical branching, to capture deep evolutionary constraints without explicit homology matching. Using highly divergent RNA virosphere as a test case, LucaPhylo places unaligned sequences into phylogenetic trees with an accuracy comparable to leading alignment-based tree construction tools, while retaining divergent sequences that conventional pipelines frequently discard. It further enables the integration of divergent viral lineages into phylogenetic tre...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.05.26.723419","openalex_id":"https://openalex.org/W7162785891","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","First Affiliated Hospital of Guangzhou Medical University","Guangzhou Medical University","Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou)","Sun Yat-sen University","The University of Sydney","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C193252679","display_name":"Phylogenetic tree","score":0.8287000060081482},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5942000150680542},{"id":"https://openalex.org/C26619641","display_name":"Phylogenetic 
network","score":0.49149999022483826},{"id":"https://openalex.org/C90132467","display_name":"Phylogenetics","score":0.4756999909877777},{"id":"https://openalex.org/C113174947","display_name":"Tree (set theory)","score":0.45820000767707825},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4530999958515167},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44440001249313354},{"id":"https://openalex.org/C165525559","display_name":"Homology (biology)","score":0.42640000581741333}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162665351","title":"SMART: Spatio-Temporal Attention-based Large Language Model for Real-Time Traffic Prediction","url":"https://doi.org/10.1145/3774905.3794651","published":"2026-05-28","authors":["Srestha Sadhu","Deepanwita Mallick","Mariana Curado Malta","Amrita Namtirtha","Animesh Dutta","Nilabza Som"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774905.3794651","openalex_id":"https://openalex.org/W7162665351","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","INESC TEC","National Institute of Technology Durgapur","University of Kalyani"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6158999800682068},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46050000190734863},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.30640000104904175},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.29499998688697815},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.2854999899864197},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.27880001068115234},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.2752000093460083},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.272599995136261}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.10397","title":"RecoWorld: Building Simulated Environments for Agentic Recommender Systems","url":"http://arxiv.org/abs/2509.10397","published":"2026-05-28","authors":["Fei Liu","Xinyu Lin","Hanchao Yu","Mingyuan Wu","Jianyu Wang","Qiang Zhang","Zhuokai Zhao","Yinglong Xia","Yao Zhang","Weiwei Li","Mingze Gao","Qifan Wang"],"abstract":"We present RecoWorld, a blueprint for building simulated environments tailored to agentic recommender systems. Such environments give agents a proper training space where they can learn from errors without impacting real users. RecoWorld distinguishes itself with a dual-view architecture: a simulated user and an agentic recommender engage in multi-turn interactions aimed at maximizing user retention. The user simulator reviews recommended items, updates its mindset, and when sensing potential user disengagement, generates reflective instructions. The agentic recommender adapts its recommendations by incorporating these user instructions and reasoning traces, creating a dynamic feedback loop that actively engages users. This process leverages the exceptional reasoning capabilities of modern LLMs. 
We explore diverse content representations within the simulator, including text-based, multim...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774905.3794661","openalex_id":"https://openalex.org/W4414597576","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Meta (United Kingdom)","Meta (United States)","National University of Singapore","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8810999989509583},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8159999847412109},{"id":"https://openalex.org/C155911762","display_name":"Blueprint","score":0.6549000144004822},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.6132000088691711},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5611000061035156},{"id":"https://openalex.org/C67712803","display_name":"User modeling","score":0.4763999879360199},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4172999858856201},{"id":"https://openalex.org/C143587482","display_name":"Iterative and incremental development","score":0.4090000092983246}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162664418","title":"Orchestrating Heterogeneous Experts: A Scalable MoE Framework with Anisotropy-Preserving Fusion","url":"https://doi.org/10.1145/3774905.3797243","published":"2026-05-28","authors":["Ye Liu","Wuji Chen","Xu Chen","Mang Li"],"abstract":"In cross-border e-commerce, search relevance modeling faces the dual challenge of extreme linguistic diversity and fine-grained 
semantic nuances. Existing approaches typically rely on scaling up a single monolithic Large Language Model (LLM). However, our empirical analysis reveals that single models suffer from uneven capability distributions across regions. For example, excelling in English while underperforming in specific Southeast Asian languages. In this work, we shift the paradigm from scaling a single model to orchestrating heterogeneous experts. We propose a scalable Coarse-grained Mixture-of-Experts (MoE) framework that leverages the inherent complementarity of distinct open-source LLMs (e.g., Qwen, Gemma) without expensive pre-training. Unlike standard token-level MoE, our framework dynamically routes entire queries to specialized experts and, crucially, employs an Information...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774905.3797243","openalex_id":"https://openalex.org/W7162664418","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6162999868392944},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.43619999289512634},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.3788999915122986},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.3587000072002411},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.31209999322891235},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.29019999504089355},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2720000147819519},{"id":"https://openalex.org/C2778755073","display_name":"Scale 
(ratio)","score":0.27149999141693115}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.29960","title":"Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction","url":"https://arxiv.org/abs/2605.29960","published":"2026-05-28","authors":["Hongtao Wang","Se Yang","Yu Chen","Puzhuo Liu"],"abstract":"Large language model (LLM) agents increasingly leverage long term memory to support persistent and autonomous task execution. However, this capability also introduces a new attack surface: memory poisoning, where adversaries can inject malicious information to influence future behavior. Existing memory poisoning attacks often assume that injected content can be stored directly in memory, overlooking the selective extraction and rewriting stages in modern memory pipelines. This makes prior methods ineffective under realistic settings. In this paper, we propose MemPoison, a novel memory poisoning attack that bypasses selective memory mechanisms in LLM agents, where an attacker can inject triggerable backdoors into the agent's long-term memory through dialogue interactions, thereby misleading its subsequent responses. 
MemPoison introduces three key components: (i) a semantic relational brid...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162894027","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["North China Electric Power University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8187999725341797},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.6572999954223633},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.5681999921798706},{"id":"https://openalex.org/C28180684","display_name":"Memory safety","score":0.4977000057697296},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.43619999289512634},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.4296000003814697},{"id":"https://openalex.org/C2775941552","display_name":"Isolation (microbiology)","score":0.39730000495910645},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.3889999985694885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162663542","title":"FedUMM: A General Framework for Federated Learning with Unified Multimodal Models","url":"https://doi.org/10.1145/3774905.3796623","published":"2026-05-28","authors":["Zhaolong Su","Leheng Zhao","Xiaoying Wu","Ziyue Xu","Jindong 
Wang"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774905.3796623","openalex_id":"https://openalex.org/W7162663542","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United Kingdom)","Nvidia (United States)","William & Mary","Williams (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7110999822616577},{"id":"https://openalex.org/C2992525071","display_name":"Federated learning","score":0.38940000534057617},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3474999964237213},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.33719998598098755},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.32580000162124634},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.31119999289512634},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.27880001068115234},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.26249998807907104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.29847","title":"EvoRubric: Self-Evolving Rubric-Driven RL for Open-Ended Generation","url":"https://arxiv.org/abs/2605.29847","published":"2026-05-28","authors":["Xin Guan","Xiaomeng Hu","Shen Huang","Zhenyi Wang","Bo Zhang","Zijian Li","Pengjun Xie","Bo Liu","Jiuxin Cao"],"abstract":"Reinforcement Learning (RL) has significantly advanced Large Language Models (LLMs) in verifiable domains, but aligning models for open-ended generation remains profoundly challenging due to the lack of 
definitive rewards. Current rubric-based RL methods mitigate this by employing explicit criteria; however, they rely heavily on static, human-annotated rubrics that inevitably cause policy lag, or expensive external proprietary models for dynamic updates. In this paper, we propose EvoRubric, a novel single-policy co-evolutionary RL framework that eliminates the reliance on static criteria and on external rubric generators. By unifying response generation and rubric generation under a single parameterized policy, EvoRubric dynamically alternates between a Reasoner and a Rubric Generator. To prevent reward hacking and ensure the reliability of generated signals, we introduce a multi-level v...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162893576","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C111640148","display_name":"Rubric","score":0.7893000245094299},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7531999945640564},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.5776000022888184},{"id":"https://openalex.org/C127705205","display_name":"Heuristics","score":0.5738000273704529},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5504999756813049},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.46129998564720154},{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.44679999351501465},{"id":"https://openalex.org/C43214815","display_name":"Reliability 
(semiconductor)","score":0.43320000171661377}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.29615","title":"DiffSpot: Can VLMs Spot Fine-Grained Visual Differences in Web Interfaces?","url":"https://arxiv.org/abs/2605.29615","published":"2026-05-28","authors":["Linhao Zhang","Aiwei Liu","Yuan Liu","Xiao Zhou"],"abstract":"Vision-language models (VLMs) have made strong progress on high-level image-text alignment, yet their ability to perceive subtle visual differences remains limited. We study this problem in rendered web interfaces, where localized visual changes are both a diagnostic test of fine-grained perception and a practical requirement for GUI agents and design tools. We introduce \\textbf{DiffSpot}, a code-driven benchmark for open-ended spot-the-difference on web interfaces. DiffSpot constructs controlled image pairs by mutating a single CSS property of a target element in self-contained HTML, re-rendering the page, and recording the changed property, element, and mutation magnitude. A grounding gate retains only pairs whose rendered pixel difference is confined to the target element. 
The benchmark contains 4{,}400 pairs, including 3{,}900 has-diff pairs balanced across 13 CSS-property operators....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162893476","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7062000036239624},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6269999742507935},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6118000149726868},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.5394999980926514},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.49570000171661377},{"id":"https://openalex.org/C189950617","display_name":"Property (philosophy)","score":0.49390000104904175},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.47940000891685486},{"id":"https://openalex.org/C81669768","display_name":"Precision and recall","score":0.39089998602867126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2507.05495","title":"Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents","url":"http://arxiv.org/abs/2507.05495","published":"2026-05-28","authors":["Prahaladh Chandrahasan","Jiahe Jin","Zhihan Zhang","Tevin Wang","Andy Tang","Lucy Mo","Morteza Ziyadi","Leonardo F. R. 
Ribeiro","Zimeng Qiu","Markus Dreyer","Akari Asai","Chenyan Xiong"],"abstract":"Evaluating deep research agents that iteratively search the web, analyze information, and generate reports remains a major challenge, especially in assessing long reports and providing fine-grained feedback. To address these gaps, we introduce Deep Research Comparator, a holistic annotation platform for the human evaluation of deep research agents. Our platform displays the generated reports and intermediate steps from two agents side-by-side, and allows annotators to state their preference between two final reports, and provide fine-grained feedback on specific text spans within the report or intermediate steps for each agent separately. Furthermore, we develop Simple Deepresearch, an agent scaffold that serves as a baseline to facilitate the integration of various large language models to transform them into deep research agents for evaluation. Experiment results of three agents with 1...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3774905.3793116","openalex_id":"https://openalex.org/W4416060136","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Carnegie Mellon University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.791100025177002},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.6741999983787537},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5722000002861023},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.5663999915122986},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.4794999957084656},{"id":"https://openalex.org/C2779530757","display_name":"Quality 
(philosophy)","score":0.4657000005245209},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.4115999937057495},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4004000127315521}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.30312","title":"DP-SAPF: Saliency-Aware Parameter Fine-tuning of Public Models for Differentially Private Image Synthesis","url":"https://arxiv.org/abs/2605.30312","published":"2026-05-28","authors":["Chen Gong","Kecen Li","Zinan Lin","Tianhao Wang"],"abstract":"Differentially private (DP) image synthesis generates images that preserve the statistical characteristics of a sensitive dataset, enabling sensitive data analysis and usage while providing rigorous guarantees of privacy leakage. Existing methods fine-tune public models using DP Stochastic Gradient Descent (DP-SGD) on sensitive images to generate synthetic images. But full fine-tuning public models on sensitive images is computationally expensive, because current public models typically contain a large number of parameters. Recent work proposes heuristically using Low-Rank Adaptation (LoRA) on all attention-layer parameters of public models to reduce the number of trainable parameters. However, we argue that exhaustive LoRA coverage across all attention-layer parameters is suboptimal in a DP setting, as it leads to noise accumulation and collapse during private training. 
To address this....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162892860","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7063999772071838},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.6729999780654907},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.6650000214576721},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.600600004196167},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.5748000144958496},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5102999806404114},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.49239999055862427},{"id":"https://openalex.org/C153258448","display_name":"Gradient descent","score":0.47850000858306885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.30000","title":"Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation","url":"https://arxiv.org/abs/2605.30000","published":"2026-05-28","authors":["Haoyue Yang","Zhangxiao Shen","Fan Ding","Hangting Lou","Yifeng Kou","Haoqing Yu","Jingyao Li","Zhengfan Wu","Siqi Bao","Jing Liu","Hua Wu"],"abstract":"Front-end web code has become a core product surface for every frontier LLM release, yet evaluating these interactive applications at development speed remains costly because human-judged leaderboards like Arena do not scale. 
Existing automated proxies typically lean on reference implementations, test suites, or rigid checklists, and tend to miss the reasoned synthesis a human reviewer performs over a live session. We articulate a new evaluation regime that is simultaneously reference-free, autonomously driven, and holistically reasoned, and instantiate it through two artifacts. \\textbf{\\dataname} is an 11-domain, 54-leaf, 1,000-query WebDev benchmark spanning both static-presentation and interactive-application tasks, balanced across three difficulty tiers and three target-language groups, with briefs rewritten to resist recall from circulated prompts. \\textbf{\\framename}, grounded in F...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162892839","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6304000020027161},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.45649999380111694},{"id":"https://openalex.org/C143299363","display_name":"Attribution","score":0.44200000166893005},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4388999938964844},{"id":"https://openalex.org/C2778571376","display_name":"Frontier","score":0.43650001287460327},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.4325000047683716},{"id":"https://openalex.org/C2776684213","display_name":"Impression","score":0.41019999980926514},{"id":"https://openalex.org/C19351080","display_name":"New product development","score":0.3919000029563904}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7162644437","title":"Concept-Grounded Detection of Vaccine Misinformation in Multimodal Content Using Interpretable Vision-Language Models","url":"https://doi.org/10.1145/3774905.3795453","published":"2026-05-28","authors":["Laxmi Thapa","Aryaman Jain","Lakshmojee Koduru","Surabhi Adhikari","Junaid Rashid","Jungeun Kim","Surendrabikram Thapa","Usman Naseem"],"abstract":"Vaccine misinformation poses a persistent public health challenge, particularly in visual formats such as memes and infographics that combine text, imagery, and rhetorical cues. While textual misinformation has been widely studied, image-based vaccine misinformation remains comparatively underexplored due to the difficulty of interpreting multimodal signals at scale. In this work, we evaluate how effectively multimodal Large Vision-Language Models (LVLMs) can (i) directly classify vaccination stance from images and (ii) extract interpretable concept-level representations that support more reliable and transparent prediction. Using the VaxMeme dataset of 10,244 annotated images, we compare direct zero-shot LVLM inference against a hybrid framework in which classical machine learning models are trained on LVLM-extracted binary concept features. Our results show that grounding stance predic...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774905.3795453","openalex_id":"https://openalex.org/W7162644437","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Columbia University","Delhi Technological University","Google (United States)","Inha University","Korea University of Technology and Education","Macquarie University","O. P. 
Jindal Global University","Sejong University","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C2776990098","display_name":"Misinformation","score":0.6297000050544739},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5591999888420105},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.550599992275238},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.43209999799728394},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36809998750686646},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.29899999499320984},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.2924000024795532},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.2718999981880188}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.29277","title":"Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA","url":"https://arxiv.org/abs/2605.29277","published":"2026-05-28","authors":["Jun Zhang","JianYing Qu","Hanwen Du","Zhongkai Sun","Yehua Yang","Qiao Zhao"],"abstract":"We present Code-QA-Bench, a fully automated framework for synthesizing repository-level code understanding benchmarks that separates genuine code comprehension from documentation recall and pretraining memorization. 
The framework makes two methodological contributions: (1) an answer-first generation pipeline where a tool-equipped agent explores source code to produce verified gold answers before deriving questions, ensuring every task is grounded in real code structure; and (2) a three-condition experimental design evaluating agents under closed-book (no repository), code-only (documentation removed), and documented (full repository) conditions, with deltas directly quantifying documentation utility and memorization. We generate 528 code-derivable and 100 doc-dependent tasks across 10 Python repositories from SWE-Bench, scored by an LLM judge on accuracy, completeness, and specificity. E...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162893335","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C56666940","display_name":"Documentation","score":0.8223999738693237},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7738999724388123},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6304000020027161},{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.6119999885559082},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.5493000149726868},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.5428000092506409},{"id":"https://openalex.org/C2777561058","display_name":"Program comprehension","score":0.4912000000476837},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.426800012588501}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"arxiv:2605.29791","title":"ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation","url":"https://arxiv.org/abs/2605.29791","published":"2026-05-28","authors":["Yutong Yang","Chenxi Miao","Weikang Li","Yunfang Wu"],"abstract":"While Large Language Models (LLMs) can convincingly simulate personas in explicit self-reports, they often deviate in implicit behavioral decisions, revealing a substantial Knowledge-Decision Gap ($G_{\\text{KD}}$). Existing benchmarks struggle to measure this asymmetry due to limited construct validity, multi-dimensional entanglement, and distributional biases in LLM-based evaluation. To address these issues, we propose ActTraitBench, a human-grounded evaluation framework for measuring personality consistency in LLMs. Grounded in empirical human data, ActTraitBench establishes one-to-one mappings between psychometric facets and behavioral paradigms, and applies a Distributional Calibration via Quantile Mapping procedure to align LLM-judge score distributions with human norms. 
Experiments on 14 mainstream LLMs reveal a pervasive knowledge-decision asymmetry, where larger and more capable....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162894045","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6273000240325928},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6269999742507935},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.5691999793052673},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5379999876022339},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5350000262260437},{"id":"https://openalex.org/C207390915","display_name":"Divergence (linguistics)","score":0.48739999532699585},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.47769999504089355},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4693000018596649}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adopt-%e2%89%a0-adapt-longitudinal-analyses-of-llm-conversations-in-the-wild","title":"Adopt ≠ Adapt: Longitudinal Analyses of LLM Conversations in the Wild","url":"https://www.microsoft.com/en-us/research/publication/adopt-%e2%89%a0-adapt-longitudinal-analyses-of-llm-conversations-in-the-wild/","published":"2026-05-27","authors":["Rebecca M. M. 
Hicke","Kiran Tomlinson"],"abstract":"Although a growing body of research has begun to describe user-LLM interactions, the picture it paints is largely static; little is known about how individual users change their behavior over time. To address this gap, we analyze the conversational trajectories of ~12,000 randomly sampled Microsoft Bing Copilot users and compare these with data from WildChat-4.8M. While the Copilot data contains significant population-level trends, we find that trends in individual user trajectories are much weaker; user habits prove to be overwhelmingly sticky. We also find stark differences between users of different activity levels: more active users have more successful conversations and use the LLM for more complex and professionally oriented tasks. Some user trends also appear in WildChat-4.8M, but we find evidence that this dataset is significantly skewed towards highly proficient \"power\" users. U...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Artificial intelligence","Human-computer interaction","Miscellaneous"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2605.30280","title":"Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments","url":"https://huggingface.co/papers/2605.30280","published":"2026-05-27","authors":["Alibaba/Qwen"],"abstract":"Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited 
generalization across tasks, environments, and robot embodiments. In this work, we study whether heterogeneous embodied decision-making problems can be unified within a single vision-language-action model. We present Qwen-VLA, a unified embodied foundation model that extends Qwen's vision-language modeling stack from perception, understanding, and reasoning to continuous action and trajectory generation through a DiT-based action decoder. Qwen-VLA is trained with a large-scale joint pretraining recipe over diverse data sources, including robotics manipulation trajectories, human egocentric demonstrations, synthetic simulation data, vision-and-language navigation data, trajectory-centric supervision, and auxiliary visio...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:baidu:2605.30073","title":"Native Audio-Visual Alignment for Generation","url":"https://huggingface.co/papers/2605.30073","published":"2026-05-27","authors":["Baidu"],"abstract":"Joint audio-video generation aims to synthesize temporally synchronized and semantically coherent visual-acoustic content. However, existing open-source methods mainly rely on either dual-tower designs with posterior alignment or fully unified tri-modal designs that mix textual context, audio and video in one shared space. The former weakens fine-grained audio-video co-evolution, while the latter couples semantic conditioning with low-level synchronization. 
To address these limitations, we propose NAVA, a Native Audio-Visual Alignment framework for joint audio-video generation. NAVA is built upon context-conditioned native audio-visual alignment: it first establishes audio-video correspondence in a dedicated interaction space, and then uses external context to condition the joint denoising process. Specifically, NAVA is instantiated with an Align-then-Fuse MMDiT architecture, which trans...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu","Baidu (China)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.30248","title":"GenClaw: Code-Driven Agentic Image Generation","url":"https://huggingface.co/papers/2605.30248","published":"2026-05-27","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-and-the-democratization-of-knowledge-work","title":"AI and the democratization of knowledge 
work","url":"https://www.microsoft.com/en-us/research/publication/ai-and-the-democratization-of-knowledge-work/","published":"2026-05-27","authors":["Madeleine Daepp","Kiran Tomlinson","Scott Counts","Siddharth Suri"],"abstract":"Generative artificial intelligence (AI) is broadly impacting knowledge workers and knowledge work, a valuable part of modern economies. Recent empirical research shows that when people use generative AI for work they primarily use it for high-complexity, knowledge-work tasks. However, this leaves open the question of who the users and beneficiaries of such tools are. Here, we explore the extent to which AI democratizes knowledge work and describe technical and social interventions for addressing notable social and place-based divides.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Artificial intelligence","Article (Journal)"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:19ee73efa8e6fbb8","title":"DualKV: Shared-Prompt Flash-Attention for RL Training","url":"https://github.com/amazon-science/dualkv-flash-attn-for-rl/blob/main/flash-attention/assets/fa4_paper.pdf","published":"2026-05-27","authors":["Amazon"],"abstract":"Implementation of DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long 
Contexts","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_repository_scan"],"source":"official_repository_scan","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official GitHub repo amazon-science/dualkv-flash-attn-for-rl"}},{"id":"arxiv:2605.28149","title":"Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations","url":"https://arxiv.org/abs/2605.28149","published":"2026-05-27","authors":["Bartosz Wieciech","Zmnako Awrahman","Marcin Czelej","Víctor Hugo Jaramillo Velásquez","Wioletta Stobieniecka"],"abstract":"Sparse Autoencoders (SAEs) extract interpretable features from Large Language Models, but standard variants enforce non-negativity, forcing separate latents for diametrically opposed concepts (e.g., \"pressure too high\" vs. \"pressure too low\") and wasting dictionary capacity when features are anticorrelated. We propose the Sign-Aware Gated SAE (SA-GSAE): two-sided gated sparsity with signed magnitude and auxiliary supervision. A polarity-sensitive gate selects support on either sign, a signed-magnitude path avoids L1 shrinkage, and an auxiliary reconstruction prevents gate collapse. Bipolar sharing - one latent encoding both signs along a shared direction - is realised via a new Bi-Jump-ReLU activation; parameter accounting shows sign-awareness stays parameter-efficient even when anticorrelated pairs are rare. 
On real LLM activations across three mid-depth hookpoints on Pythia-1B and Smol...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162818284","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6028000116348267},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5925999879837036},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.5805000066757202},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4970000088214874},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.48829999566078186},{"id":"https://openalex.org/C2780938662","display_name":"Tying","score":0.43849998712539673},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.423799991607666},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4018000066280365}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.27971","title":"Semantic Flow Regularization: Teaching LLMs to Generate Diverse Yet Coherent Responses","url":"https://arxiv.org/abs/2605.27971","published":"2026-05-27","authors":["Kerui Peng","Feifei Li","Xingyu Fan","Wenhui Que"],"abstract":"When large language models are fine-tuned to generate persona- or tone-conditioned responses, their output diversity is severely limited--a failure we term Cross-Style Collapse. We trace this collapse to the cross-entropy objective, which under shared representations tends to suppress diverse continuations. 
We propose Semantic Flow Regularization (SFR), a lightweight auxiliary objective that supervises the backbone with continuous sentence-encoder embeddings of future segments via conditional flow matching. The stochastic flow source preserves multi-modality by construction; the flow-matching head is discarded at inference, adding zero deployment cost. On a large-scale industrial dialogue dataset (Qwen3-32B, 9 personas), SFR improves output diversity, style fidelity, and response quality over SFT. We further validate on the public LiveCodeBench-v5 (Qwen2.5-Coder-7B-Instruct), where SFR c...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162817840","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C38935604","display_name":"Stylized fact","score":0.8314999938011169},{"id":"https://openalex.org/C2780767217","display_name":"Generality","score":0.7630000114440918},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6660000085830688},{"id":"https://openalex.org/C75291252","display_name":"TRACE (psycholinguistics)","score":0.5403000116348267},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5012000203132629},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.4871000051498413},{"id":"https://openalex.org/C138673069","display_name":"Tracing","score":0.46939998865127563},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.4397999942302704}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.27788","title":"Knowing When 
to Ask: Segment-Level Credit Assignment for LLM Tool Use","url":"https://arxiv.org/abs/2605.27788","published":"2026-05-27","authors":["Abhijit Kumar","Zoey Wu","Mohit Suley"],"abstract":"Humans know when to reach for help e.g. $347 \\times 28$ warrants a calculator while $2+2$ does not. Language models do not. Prompt-based approaches can instruct a model when to invoke tools, but this scaffolding does not teach it to recognize the boundary of its own knowledge. RL approaches that assign a single outcome reward to the whole trajectory fare no better: trajectory-level credit cannot isolate which tool call in a successful episode actually helped, nor penalize unnecessary calls. We propose \\textbf{CARL} (\\textbf{C}ompetence-\\textbf{A}ware \\textbf{R}einforcement \\textbf{L}earning), which trains a critic on the model's own rollouts to learn where parametric knowledge suffices and where it needs external help. By decomposing each rollout at natural tool-use boundaries (e.g., code fence delimiters and context block transitions), CARL assigns independent credit to each segment fro...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162817414","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6686000227928162},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.660099983215332},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5184999704360962},{"id":"https://openalex.org/C62354387","display_name":"Boundary 
(topology)","score":0.49639999866485596},{"id":"https://openalex.org/C2776836400","display_name":"Calculator","score":0.48170000314712524},{"id":"https://openalex.org/C2777210771","display_name":"Block (permutation group theory)","score":0.4674000144004822},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.46549999713897705},{"id":"https://openalex.org/C148220186","display_name":"Outcome (game theory)","score":0.45570001006126404}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.28218","title":"IFMTBench: A Comprehensive Benchmark for Multilingual Translation Instruction Following","url":"https://arxiv.org/abs/2605.28218","published":"2026-05-27","authors":["Mingrui Sun","Mao Zheng","Zheng Li","Mingyang Song"],"abstract":"Modern translation workflows demand more than semantic equivalence. Users routinely require models to preserve JSON or HTML schemas, honor curated glossaries, disambiguate with provided context, and match prescribed registers, often several at once. Conventional metrics such as BLEU and xCOMET capture semantic fidelity but provide little signal on constraint adherence, while general instruction following benchmarks ignore the cross-lingual nature of translation. We introduce IFMTBench, a benchmark for multilingual translation instruction following covering seven languages, with 4,506 single-constraint and 2,838 multi-constraint items spanning six constraint dimensions and five compositional patterns with instructions issued in all seven languages. 
Constraints are split into a gating subset verified by deterministic checkers and a continuous subset scored by a rubric-based LLM judge, combine...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162818252","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8256000280380249},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.715399980545044},{"id":"https://openalex.org/C2776036281","display_name":"Constraint (computer-aided design)","score":0.553600013256073},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5419999957084656},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5375000238418579},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5335999727249146},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.5148000121116638},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.4948999881744385}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.28315","title":"HardMTBench: Stress-Testing Chinese-English Translation on Knowledge-Intensive Domains","url":"https://arxiv.org/abs/2605.28315","published":"2026-05-27","authors":["Zheng Li","Mao Zheng","Mingyang Song","T. Fei"],"abstract":"General-purpose machine translation benchmarks such as FLORES-200 have reached a saturation regime on Chinese-English pairs, where modern large language models cluster within a narrow band of high scores. 
Across 22 systems, FLORES-200 zh-en GEMBA scores fall in a 7.87-point range with a standard deviation of 2.29, which compresses the separation between systems on knowledge-intensive domains such as finance, healthcare, law, and science and technology. We introduce HardMTBench, a difficulty-aware diagnostic benchmark for bidirectional Chinese-English domain translation. HardMTBench covers 12 domains and contains 10,000 hand-curated source sentences with reference translations, packaged as 20,000 directional test items. A three-stage construction pipeline builds a domain-balanced candidate pool of 84,566 pairs, applies an LLM-based multi-signal judge over knowledge density, translation....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162817752","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C547195049","display_name":"Terminology","score":0.8210999965667725},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6848000288009644},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5877000093460083},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5307999849319458},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4828000068664551},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.46459999680519104},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.45590001344680786},{"id":"https://openalex.org/C203005215","display_name":"Machine 
translation","score":0.45489999651908875}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.27846","title":"EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA","url":"https://arxiv.org/abs/2605.27846","published":"2026-05-27","authors":["Yunsheng Zeng","Gen Li","Yuwei Miao","Xiandong Li","Yujin Wang","Siyu Chen","Luning Wang","Yunhao Qiao","Junfeng Wang","Jianwei Lv","Bo Yuan"],"abstract":"Large Reasoning Models are typically trained via reinforcement learning from verifiable rewards (RLVR). However, existing approaches adopt fixed weights for positive and negative samples, and the conclusions hardly generalize to open-ended question answering (QA). In this paper, we systematically investigate the roles of positive and negative samples in reinforcement learning for open-ended QA. We propose a reward-mean-based strategy for distinguishing positive from negative samples, and observe that negative samples predominantly govern response diversity and the performance upper bound, whereas positive samples primarily determine response quality and convergence stability. 
Building on these observations, we propose EAPO, an Entropy-driven Adaptive Policy Optimization method that adaptively computes the weighting coefficients of positive samples based on the ratio of the current policy...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162817668","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C183115368","display_name":"Weighting","score":0.7358999848365784},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6690999865531921},{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of time)","score":0.5605999827384949},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5091999769210815},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44209998846054077},{"id":"https://openalex.org/C2777303404","display_name":"Convergence (economics)","score":0.427700012922287},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41290000081062317},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.39809998869895935}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.28787","title":"Do Agents Need Semantic Metadata? A Comparative Study in Agentic Data Retrieval","url":"https://arxiv.org/abs/2605.28787","published":"2026-05-27","authors":["Shiyu Chen","Tarfah Alrashed","Alon Halevy","Natasha Noy"],"abstract":"In the era of autonomous agents, machine-actionable data is critical for data-driven workflows. 
For more than a decade, semantic metadata like schema.org has anchored the FAIR principles (Findable, Accessible, Interoperable, and Reusable) for machine-actionable data and enabled discovery tools like Google Dataset Search. However, the rise of Large Language Models (LLMs) capable of navigating the unstructured web raises a fundamental question: Is semantic metadata still necessary for agentic data discovery, or can agents reliably retrieve actionable data directly from the web? We present a comparative analysis of agentic data retrieval across two distinct environments: a Baseline Agent searching billions of open-web documents, and a Semantic Agent leveraging a corpus of 90 million datasets using schema.org. We deploy an \"LLM-as-a-judge\" evaluation pipeline, mapped directly to the FAIR pri...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162818307","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8497999906539917},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.7907999753952026},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.7161999940872192},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.679099977016449},{"id":"https://openalex.org/C2129575","display_name":"Semantic Web","score":0.48570001125335693},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4706999957561493},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.40790000557899475},{"id":"https://openalex.org/C551230270","display_name":"Data 
retrieval","score":0.3968000113964081}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.27860","title":"C-MIG: Multi-view Information Gain-based Retrieval-Augmented Generation for Clinical Diagnosis Reasoning","url":"https://arxiv.org/abs/2605.27860","published":"2026-05-27","authors":["Yuwei Miao","Gen Li","Yunsheng Zeng","Xiandong Li","Yujin Wang","Siyu Chen","Luning Wang","Yunhao Qiao","Junfeng Wang","Jianwei Lv","Bo Yuan"],"abstract":"Retrieval-augmented generation combined with reinforcement learning has shown promise for grounding large language models in trustworthy medical evidence. However, existing methods rely on exact-match binary rewards, which in clinical diagnosis cause two issues: (i) semantically relevant but non-verbatim steps receive zero signal, discarding valuable learning signals; and (ii) uni-dimensional rewards cannot effectively supervise heterogeneous reasoning capabilities. To address these issues, we propose C-MIG, a Multi-view Information Gain-based retrieval-augmented generation framework for Clinical diagnosis. C-MIG estimates information gain under a frozen reference model from two complementary views, retrieved-document and document-refinement, to jointly guide what to retrieve and how to refine, alleviating the issues of valuable reward signal loss and credit assignment. 
We further design...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162817868","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6944000124931335},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6126000285148621},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.546500027179718},{"id":"https://openalex.org/C534262118","display_name":"Medical diagnosis","score":0.5273000001907349},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.510200023651123},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5022000074386597},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.47429999709129333},{"id":"https://openalex.org/C2983449737","display_name":"Clinical diagnosis","score":0.4129999876022339}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162567528","title":"Algorithmic Compression via Pretrained Neural Networks","url":"https://doi.org/10.3390/e28060596","published":"2026-05-27","authors":["Tim Genewein","Jordi Grau-Moya","Li Kevin Wenliang","Laurent Orseau","Marcus Hütter"],"abstract":"The success of large neural networks trained for sequential prediction via log-loss minimization over massive and diverse datasets has sparked debate regarding the fundamental limits of this paradigm. While these models are not explicitly programmed to perform planning and search, their behavior increasingly resembles complex reasoning and adaptive problem-solving. 
This paper reviews a series of theoretical and empirical works, aiming to bridge the gap between the practical success of LLMs and formal theories of computation and intelligence—that is, algorithmic information theory and Universal Artificial Intelligence. Grounded in the framework of memory-based meta-learning, the main argument is that training sequence models to predict the next token across diverse tasks implicitly meta-trains them to perform algorithmic compression, thereby performing (amortized) Bayesian inference over....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/e28060596","openalex_id":"https://openalex.org/W7162567528","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Australian National University","Google (United Kingdom)","Google DeepMind (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8062999844551086},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6136000156402588},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6039000153541565},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5778999924659729},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.5149999856948853},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4959999918937683},{"id":"https://openalex.org/C98184364","display_name":"Argument (complex analysis)","score":0.45750001072883606},{"id":"https://openalex.org/C107673813","display_name":"Bayesian probability","score":0.451200008392334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7162454131","title":"ConDABench: Interactive Evaluation of Language Models for Data Analysis","url":"https://doi.org/10.1145/3788853.3803099","published":"2026-05-26","authors":["Avik Dutta","Priyanshu Gupta","Hosein Hasanbeig","Rahul Pratap Singh","Harshit Nigam","Sumit Gulwani","Arjun Radhakrishna","Gustavo Soares","Ashish Tiwari"],"abstract":"Real-world data analysis tasks often come with under-specified goals and unclean data. User interaction is necessary to understand and disambiguate a user's intent, and hence, essential to solving these complex tasks. Existing benchmarks for evaluating LLMs on data analysis tasks do not capture these complexities or provide first-class support for interactivity. We introduce ConDABench, a framework for generating conversational data analysis (ConDA) benchmarks and evaluating external tools on the generated benchmarks. ConDABench consists of (a) a multi-agent workflow for generating realistic benchmarks from articles describing insights gained from public datasets, (b) 1,420 ConDA problems generated using this workflow, and (c) an evaluation harness that, for the first time, makes it possible to systematically evaluate conversational data analysis tools on the generated ConDA problems. 
Ev...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"article","doi":"https://doi.org/10.1145/3788853.3803099","openalex_id":"https://openalex.org/W7162454131","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial intelligence","Search and information retrieval","1970-01-01","Inproceedings (Conference)"],"author_affiliations":["Microsoft (United States)","Microsoft Research (India)","Microsoft"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6855999827384949},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44020000100135803},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43220001459121704},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.32269999384880066},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3215999901294708},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.30559998750686646},{"id":"https://openalex.org/C179603123","display_name":"Modeling language","score":0.2791999876499176},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2720000147819519}],"official_report":true,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2605.28398","title":"HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning 
LLMs","url":"https://huggingface.co/papers/2605.28398","published":"2026-05-26","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.28548","title":"GEM: Generative Supervision Helps Embodied Intelligence","url":"https://huggingface.co/papers/2605.28548","published":"2026-05-26","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"arxiv:2605.27740","title":"UNIQUE: Universal Top-k Sparse Attention for Training-free Inference and Sparsity-aware Training","url":"https://arxiv.org/abs/2605.27740","published":"2026-05-26","authors":["Keqi Deng","Shaoshi Ling","Ruchao Fan","Jinyu Li"],"abstract":"Long-context inference in large language models (LLMs) is bottlenecked by the linear growth of the self-attention key-value (KV) cache. 
Top-k sparse attention alleviates this by loading only a small fraction of the KV cache, but accurately and cheaply estimating cache importance, for both training-free use and sparsity-aware training, remains challenging. This paper proposes UNIQUE, a universal top-k sparse attention framework that addresses both requirements and stays consistently effective across LLM modalities. UNIQUE operates at the granularity of KV pages and estimates per-page importance with a simple yet accurate score combining the mean of the page's keys as a representative vector with their standard deviation as an offset term. To further close the train-inference gap, this paper introduces a soft-mask sparsity-aware training scheme that uses the top-k score boundary as a per-q...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162817640","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.8628000020980835},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7368999719619751},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7141000032424927},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.6111999750137329},{"id":"https://openalex.org/C175291020","display_name":"Offset (computer science)","score":0.6097999811172485},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.5037999749183655},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42829999327659607},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4083999991416931}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.26941","title":"The 2nd EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval","url":"https://arxiv.org/abs/2605.26941","published":"2026-05-26","authors":["Junchen Fu","Xuri Ge","Xin Xin","Alexandros Karatzoglou","Ioannis Arapakis","Xi Wang","Qijiong Liu","Qian Li","Joemon M. Jose"],"abstract":"Multimodal representation learning has attracted increasing attention in AI, driven by the strong performance of large, pretrained multimodal foundation models such as Qwen, LLaVA, and CLIP. These models deliver impressive performance on a range of multimodal information retrieval (MIR) tasks, including web search, cross-modal retrieval, and recommender systems. Yet their massive parameter counts create major efficiency bottlenecks when adapting their representations for IR tasks during training, deployment, and inference. These limitations hinder the practical use of foundation models for representation learning in information retrieval. 
To address these issues, we propose organizing the EReL@MIR workshop at MM 2026, bringing together researchers from academia and industry to discuss emerging solutions, open challenges, and new efficiency metrics and benchmarks for multimodal IR represe...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162700426","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Beijing University of Posts and Telecommunications","Hong Kong Polytechnic University","Shandong Management University","Shandong University","Telefonica Research and Development","University of Glasgow","University of Sheffield"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7849000096321106},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6600000262260437},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.5558000206947327},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.48260000348091125},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4666999876499176},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45570001006126404},{"id":"https://openalex.org/C2780660688","display_name":"Multimodal learning","score":0.3716999888420105},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.35440000891685486}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162405629","title":"SQLens: Continuous Code-to-SQL Visibility in the 
Wild","url":"https://doi.org/10.1145/3788853.3803091","published":"2026-05-26","authors":["Xiao Yang","Mo Sha","Yiran Li","Shian Zhong","Sheng Wang","Feixue Zhou","Feifei Li"],"abstract":"Modern Internet-scale services evolve, yet database interactions are increasingly obscured by languages, frameworks, and abstraction layers. This loss of visibility weakens the link between code changes and SQL behavior, leading to performance regressions, security risks, and costly diagnosis. SQLens is a practical system that restores auditable, end-to-end visibility between application code and executed SQL in large, heterogeneous codebases. It combines static program analysis with LLM-guided reasoning in a closed-loop workflow: starting from database emission sites, cooperative agents traverse control and data flows to reconstruct SQL construction and parameter binding. The system refines these mappings using SQL logs collected over time and stores them in a versioned knowledge layer supporting code-to-SQL lookup and SQL-to-code attribution. 
Deployed at Alibaba Group and evaluated on....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3788853.3803091","openalex_id":"https://openalex.org/W7162405629","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","National University of Singapore"],"concepts":[{"id":"https://openalex.org/C123403432","display_name":"Visibility","score":0.5479000210762024},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.448199987411499},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4047999978065491},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3889999985694885},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3416999876499176},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.29019999504089355},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.28929999470710754},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.24330000579357147}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162430258","title":"Optimizing Dropout in LLM Training: Performance Comparison of Fusion and Overlap","url":"https://doi.org/10.1145/3801489.3806911","published":"2026-05-26","authors":["Haiyue Ma","Jian Liu","Ronny Krashinsky"],"abstract":"This work proposes overlapping Random Number Generation (RNG), the main runtime contributor of Dropout, with preceding GEneral Matrix Multiplication (GEMM) layers to hide RNG latency during LLM training. 
The state-of-the-art optimization is to fuse Dropout into the Flash-Attention kernel; however, evaluating fine-grained architecture resource constraints beyond traditional compute or memory metrics reveals that fusion fails to fully hide RNG latency due to shared lower-level architecture bottlenecks. RNG and GEMM have distinct hardware bottlenecks, so they can run together without compromising each other's performance. Our analytical model, validated on GH100 GPUs, shows 1.26× speedup over sequential execution and 1.22× over state-of-the-art fusion on Llama3 for a single Transformer block. Our methodology generalizes to fusion-versus-overlap decisions for new LLM operators across various...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3801489.3806911","openalex_id":"https://openalex.org/W7162430258","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","Princeton University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7396000027656555},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.7008000016212463},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.5853999853134155},{"id":"https://openalex.org/C2776145597","display_name":"Dropout (neural networks)","score":0.5139999985694885},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5091999769210815},{"id":"https://openalex.org/C141353440","display_name":"Fuse (electrical)","score":0.4805999994277954},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.42579999566078186},{"id":"https://openalex.org/C17349429","display_name":"Matrix 
multiplication","score":0.40689998865127563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.27630","title":"OptiLoop: Coordination-in-the-Loop Verification and Repair for LLM-Generated Optimization Agents","url":"https://arxiv.org/abs/2605.27630","published":"2026-05-26","authors":["Yujia Xu","Zhiheng Wang","Thi Dinh"],"abstract":"Many decentralized decision problems require multiple parties to coordinate on shared decisions while keeping objectives, constraints, and data private. Large language models (LLMs) offer a promising way to lower the barrier to participation by generating local optimization agents from natural-language specifications. In coordination settings, however, executability is not enough: a generated agent may compile, solve, and pass local checks while still being semantically wrong, for example by misrepresenting costs, mis-scoping constraints, or responding incorrectly to incentives. Such errors often surface only during coordination, as systematic behavioral failures rather than infeasibility. We propose coordination-in-the-loop verification and repair for LLM-generated optimization agents. 
We instantiate this idea with an Alternating Direction Method of Multipliers (ADMM)-style consensus pr...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162817409","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.775600016117096},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7462999820709229},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.5555999875068665},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5256999731063843},{"id":"https://openalex.org/C2780385302","display_name":"Protocol (science)","score":0.47699999809265137},{"id":"https://openalex.org/C137836250","display_name":"Optimization problem","score":0.446399986743927},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.41940000653266907},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.40470001101493835}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162463779","title":"MagicGeo: Training-free text-guided geometric diagram generation","url":"https://doi.org/10.1016/j.gmod.2026.101331","published":"2026-05-26","authors":["Ting Zhang","Yì Wáng","Heng Yu","Qunyi Xie","Yì Wáng","Hua Huang"],"abstract":"While text-to-image generation has made strides in photorealistic imagery, creating accurate geometric diagrams remains a challenge due to the need for precise spatial relationships and the scarcity of geometry-specific datasets. 
This paper presents MagicGeo, a training-free framework for generating geometric diagrams from textual descriptions. MagicGeo formulates the diagram generation process as a coordinate optimization problem, ensuring geometric correctness through a formal language solver, and then employs coordinate-aware generation. The framework leverages the strong language translation capability of large language models, while formal mathematical solving ensures geometric correctness. We further introduce MagicGeoBench, a benchmark dataset of 220 geometric diagram descriptions, and demonstrate that MagicGeo outperforms current methods in both qualitative and quantitative evalu...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.gmod.2026.101331","openalex_id":"https://openalex.org/W7162463779","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beijing Academy of Artificial Intelligence","Beijing Normal University"],"concepts":[{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.6844000220298767},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6473000049591064},{"id":"https://openalex.org/C186399060","display_name":"Diagram","score":0.6425999999046326},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5206000208854675},{"id":"https://openalex.org/C104065381","display_name":"Geometric modeling","score":0.45989999175071716},{"id":"https://openalex.org/C202446494","display_name":"Class diagram","score":0.4587000012397766},{"id":"https://openalex.org/C48419115","display_name":"Communication diagram","score":0.4413999915122986},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer 
science","score":0.39959999918937683}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162399423","title":"MULLER: A Multimodal Data Lake Format for Collaborative AI Data Workflows","url":"https://doi.org/10.1145/3788853.3801585","published":"2026-05-26","authors":["Xueling Lin","Bingyu Liu","Gao Cong"],"abstract":"While multimodal large language models (MLLMs) have attracted growing attention, collaboratively preparing and maintaining large-scale multimodal datasets remains challenging due to version conflicts and inefficient data access in real-world scenarios. We present MULLER, a novel Multimodal data lake format for collaborative AI data workflows. MULLER provides (1) an array-oriented hybrid search engine for joint vector, text, and scalar queries; (2) fine-grained Git-like versioning with row-level commit, checkout, diff, and three-way merge with conflict resolution; (3) low-latency random access and fast full-scan for efficient sampling and exploration; and (4) seamless integration with LLM/MLLM training pipelines. 
Experimental results show that MULLER achieves millisecond-level query and versioning operations on large datasets, with random access and full-scan performance comparable to sta...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3788853.3801585","openalex_id":"https://openalex.org/W7162399423","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Nanyang Technological University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6399000287055969},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.5823000073432922},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33570000529289246},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.310699999332428},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.2944999933242798},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2888999879360199},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.28690001368522644},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.2856999933719635}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162412425","title":"LindormVector: A Distributed Vector Engine on a Cloud-Native Multi-Model NoSQL Database","url":"https://doi.org/10.1145/3788853.3803088","published":"2026-05-26","authors":["Yan Wang","Jian Zhou","Sai Huang","Chao Dou","Hanwen Tian","Zhijie Jiang","Zongning Zhang","Xiaoqi Li","Zhencan Peng","Chunhui Shen","Wei Zhang","Feifei 
Li"],"abstract":"Vector databases have become a cornerstone of modern AI infrastructure, enabling semantic retrieval and retrieval-augmented generation (RAG) over massive unstructured datasets. However, existing systems face an inherent trade-off: specialized vector databases deliver low-latency in-memory search but struggle with scalability and integration with structured data, whereas general-purpose databases provide consistency and fault tolerance at the cost of query performance. In this paper, we present LindormVector, a distributed, scalable, and cost-efficient vector engine built natively on Lindorm, Alibaba Cloud's multi-model, cloud-native NoSQL database. LindormVector adopts a shared-storage architecture where all data is persisted in a distributed file system for elasticity, durability, and high availability. To minimize I/O operations, network communication, and memory footprint, LindormVect...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3788853.3803088","openalex_id":"https://openalex.org/W7162412425","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Rutgers, The State University of New Jersey","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C2779599972","display_name":"NoSQL","score":0.8309000134468079},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6521000266075134},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.5925999879837036},{"id":"https://openalex.org/C70061542","display_name":"Distributed database","score":0.3716000020503998},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.28529998660087585},{"id":"https://openalex.org/C2989070954","display_name":"Database 
query","score":0.28459998965263367},{"id":"https://openalex.org/C5655090","display_name":"Relational database","score":0.28189998865127563},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2797999978065491}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162393714","title":"Demo of SemWeave: Semantic Common Expressions for LLM-powered Query Processing","url":"https://doi.org/10.1145/3788853.3801593","published":"2026-05-26","authors":["Md. Tareq Mahmood","Venkatesh Emani","Hangdong Zhao","Shivaram Venkataraman"],"abstract":"Large Language Models (LLMs) enable semantic query processing using natural language operators such as semantic filters, maps, and joins. In exploratory data analysis, users commonly issue semantically related queries, for example, by incrementally refining a filter to be more specific or general, causing existing systems to repeatedly evaluate semantic operators and incur redundant inference costs. To address this, we introduce Semantic Common Expressions (SCE), a novel abstraction that leverages Natural Language Inference to identify containment relationships between natural language filters. We develop a system, SemWeave, that exploits SCEs to reuse prior inferences for cheaper and faster semantic operations. SemWeave is general and can be integrated into any semantic query engine; in this demo, we showcase how SemWeave enhances exploratory data analytics with two popular engines. 
A v...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3788853.3801593","openalex_id":"https://openalex.org/W7162393714","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.718999981880188},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42089998722076416},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3626999855041504},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.3382999897003174},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.33309999108314514},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.32120001316070557},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2996000051498413},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.29919999837875366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162442151","title":"Can AI assist in Mathematics and Computer Science research?","url":"https://doi.org/10.1145/3788853.3801143","published":"2026-05-26","authors":["Prabhakar Raghavan"],"abstract":"The question of whether AI can assist in Mathematics and Computer Science research is central to modern scientific discourse. In this lecture we explore the rapidly emerging role of large language models (LLMs) and artificial intelligence in advancing mathematical and computer science research. 
While acknowledging that it remains too early to draw definitive conclusions about the capabilities of AI in these domains, we showcase promising results achieved in part using AlphaEvolve, an evolutionary language model developed by Google DeepMind.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3788853.3801143","openalex_id":"https://openalex.org/W7162442151","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5539000034332275},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.43700000643730164},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3846000134944916},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.26649999618530273},{"id":"https://openalex.org/C2984499602","display_name":"Computer software","score":0.25450000166893005},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.23829999566078186},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.23229999840259552},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.21150000393390656}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162402709","title":"AmbiSQL: Interactive Ambiguity Detection and Resolution for Text-to-SQL","url":"https://doi.org/10.1145/3788853.3801581","published":"2026-05-26","authors":["Zhongjun Ding","Yin Lin","Tianjing Zeng","Rong Zhu","Bolin Ding","Jingren Zhou"],"abstract":"Text-to-SQL systems translate natural language
questions into SQL queries, providing substantial value for non-expert users. While large language models (LLMs) show promising results for this task, they remain error-prone. Query ambiguity has been recognized as a major obstacle in LLM-based Text-to-SQL systems, leading to misinterpretation of user intent and inaccurate SQL generation. To this end, we present AmbiSQL, an interactive system that automatically detects query ambiguities and guides users through intuitive multiple-choice questions to clarify their intent. It introduces a fine-grained ambiguity taxonomy for identifying ambiguities arising from both database elements and LLM reasoning, and subsequently incorporates user feedback to rewrite ambiguous questions. In this demonstration, AmbiSQL is integrated with XiYan-SQL, our commercial Text-to-SQL backend. We provide 40 ambiguou...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3788853.3801581","openalex_id":"https://openalex.org/W7162402709","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5688999891281128},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5453000068664551},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.526199996471405},{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.43970000743865967},{"id":"https://openalex.org/C138268822","display_name":"Resolution (logic)","score":0.3896999955177307},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.2919999957084656},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.28540000319480896},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.2833999991416931}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.27358","title":"MobileMoE: Scaling On-Device Mixture of Experts","url":"https://huggingface.co/papers/2605.27358","published":"2026-05-26","authors":["Yanbei Chen","Hanxian Huang","Ernie Chang","Jacob Szwejbka","Digant Desai","Zechun Liu","Vikas Chandra","Raghuraman Krishnamoorthi"],"abstract":"Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters (0.3-0.9B active and 1.3-5.3B total) that establish a new Pareto frontier for on-device LLMs. We first formulate an on-device MoE scaling law that jointly optimizes MoE architecture under mobile memory and compute constraints, identifying an on-device sweet spot - moderate sparsity with fine-grained and shared experts - that is simultaneously memory and compute-optimal. 
Building on the derived architectures, we train MobileMoE with a four-stage recipe covering pre-training, mid-training, instruction fine-tuning, and quantization-aware training, all on open-source datasets.....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2605.27366","title":"MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation","url":"https://huggingface.co/papers/2605.27366","published":"2026-05-26","authors":["Huawei Lin","Peng Li","Jie Song","Fuxin Jiang","Tieying Zhang"],"abstract":"Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, management, evaluation, and refinement). Our framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. 
We further introduce skill-level memory that accumulates experience for each skill across tasks, enabling more effective reuse and adaptation over t...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2605.27295","title":"Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini","url":"https://huggingface.co/papers/2605.27295","published":"2026-05-26","authors":["Madhuri Shanbhogue","Zhe Li","Shanfeng Zhang","Gustavo Hernández Ábrego","Shih-Cheng Huang","Aashi Jain","Daniel Salz","Sonam Goenka","Chaitra Hegde","Ji Ma","Feiyang Chen","Jiaxing Wu"],"abstract":"We introduce Gemini Embedding 2, a native multimodal embedding model that allows embedding video, audio, image, and text modalities in a unified representation space. We leverage the multimodal capabilities of Gemini to produce embeddings for arbitrary combinations of interleaved inputs across all these modalities that generalize well across a wide variety of tasks. Applying large-scale contrastive learning in a multi-task multi-stage training setup, we achieve state-of-the-art performance on key embedding benchmarks including unimodal, cross-modal, and multimodal retrieval spanning a diverse set of tasks. We show that our embedding model demonstrates strong performance (with a score of 62.9 R@1 on MSCOCO, 68.8 NDCG@10 on Vatex, 69.9 on MTEB multilingual and 84.0 on MTEB Code) across a variety of tasks surpassing the performance of specialized models. 
These unified capabilities make Gemi...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:MiniMaxAI:2605.26494","title":"The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence","url":"https://huggingface.co/papers/2605.26494","published":"2026-05-25","authors":["MiniMax"],"abstract":"We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters with only 9.8B activated per token. Designed end-to-end for agentic deployment, the M2 series rests on three components: (i) agent-driven data pipelines producing large-scale, verifiable trajectories across agentic coding and agentic cowork, each grounded in an executable workspace and an artifact-aligned reward; (ii) Forge, a scalable agent-native RL system that adapts to long-horizon agent trajectories, paired with windowed-FIFO scheduling, prefix-tree merging, inference optimization, and a clean training-inference-agent decoupling that supports both white-box and black-box agents; (iii) the latest M2.7 checkpoint takes an early step toward self-evolution -- auton...","companies":["MiniMax"],"matched_orgs":["MiniMax"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","MiniMaxAI"],"author_affiliations":["MiniMax"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/MiniMaxAI/papers"}},{"id":"hf-org-paper:tencent:2605.26952","title":"Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement","url":"https://huggingface.co/papers/2605.26952","published":"2026-05-25","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2605.25480","title":"Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki","url":"https://arxiv.org/abs/2605.25480","published":"2026-05-25","authors":["铭豪 梁","Feifei Li","Wu Xiaoqing","Wenhui Que"],"abstract":"LLM agents require retrieval to behave less like one-shot context fetching and more like reasoning: searching, reading, traversing, and deciding when evidence is sufficient. Yet current Retrieval-Augmented Generation (RAG) systems organize external knowledge as flat chunks retrieved by embedding similarity, exposing a retrieval-as-lookup interface ill-suited to iterative reasoning agents. We propose LLM-Wiki, an agent-native retrieval system that operationalizes the Retrieval-as-Reasoning paradigm by treating external knowledge as a compilable, composable, and self-evolving structure rather than a static retrieval index. 
LLM-Wiki compiles documents into structured Wiki pages with bidirectional links, exposes search, read, and link-following operations through standard tool-calling interfaces, and introduces an Error Book for persistent structural and semantic self-correction. LLM-Wiki ac...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162699997","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8032000064849854},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6315000057220459},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5730000138282776},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5428000092506409},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.4408999979496002},{"id":"https://openalex.org/C161156560","display_name":"Document retrieval","score":0.4404999911785126},{"id":"https://openalex.org/C90288658","display_name":"Human–computer information retrieval","score":0.42570000886917114},{"id":"https://openalex.org/C551230270","display_name":"Data retrieval","score":0.3700000047683716}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.25486","title":"RAG-Match: Retrieval-Augmented Knowledge Injection and Hierarchical Reasoning for Calibrated Semantic Relevance","url":"https://arxiv.org/abs/2605.25486","published":"2026-05-25","authors":["Hengjun Jiang","Li Sun","Yan Jiang","Xiaojie Ke","Yongjin Wang","Xiangkun Liu","Cunxin Gu","Jian Xu","Guanjun 
Jiang"],"abstract":"Semantic relevance judgment for search is particularly challenging in knowledge-intensive scenarios, where accurate ranking requires not only semantic matching but also background grounding, multi-step reasoning, and well-calibrated decision boundaries. Existing relevance models mainly rely on direct label supervision or shallow semantic similarity, which limits their ability to handle implicit intent, factual equivalence, and fine-grained relevance distinctions. To address this issue, we propose \\textsc{RAG-Match}, a three-stage framework that integrates knowledge-augmented pretraining, hierarchical reasoning alignment, and preference-based decision calibration for relevance modeling. The key idea is to first strengthen query-centered semantic grounding, then align the model with structured relevance reasoning, and finally correct decision-level inconsistencies in difficult boundary cas...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162605370","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.8834999799728394},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.742900013923645},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.6880999803543091},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5864999890327454},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5511999726295471},{"id":"https://openalex.org/C185798385","display_name":"Benchmark 
(surveying)","score":0.5407000184059143},{"id":"https://openalex.org/C2778493491","display_name":"Semantic matching","score":0.4690999984741211},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.46219998598098755}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.25604","title":"DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning","url":"https://arxiv.org/abs/2605.25604","published":"2026-05-25","authors":["Guochao Jiang","Jingyi Song","Guofeng Quan","Chuzhan Hao","Guohua Liu","Yuewei Zhang"],"abstract":"Reinforcement Learning has become a standard paradigm for aligning Large Language Models with human intent and task requirements. While Group Relative Policy Optimization offers an efficient, value-model-free alternative to Proximal Policy Optimization, adapting it to real-world multi-reward settings remains challenging. Standard scalarization practices, such as Reward Combination and Advantage Combination, suffer from significant drawbacks: Reward Combination frequently generates advantages with excessively large squared magnitudes that lead to training instability, while Advantage Combination relies on static hyperparameters and ignores cross-objective correlations. 
To address these limitations, we propose Dynamic Variance-adaptive Advantage Optimization (DVAO), which dynamically adjusts combination weights based on the empirical reward variance of each objective within a rollout group...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162606170","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7208999991416931},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6897000074386597},{"id":"https://openalex.org/C8642999","display_name":"Hyperparameter","score":0.576200008392334},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5472999811172485},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5110999941825867},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4771000146865845},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.4580000042915344},{"id":"https://openalex.org/C137635306","display_name":"Pareto principle","score":0.45399999618530273}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.25382","title":"AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora","url":"https://arxiv.org/abs/2605.25382","published":"2026-05-25","authors":["Wu Xiaoqing","Feifei Li","铭豪 梁","Wenhui Que"],"abstract":"Evidence construction--the stage that determines which passages reach the language model before generation begins--is evaluated paradigm 
by paradigm, leaving practitioners with no principled way to diagnose which organization strategy fails, where, or why. We introduce AuthTrace, a diagnostic benchmark built on thematically dense single-author corpora where near-miss distractors share style, topic, and vocabulary with the required evidence. AuthTrace provides explicit quoted evidence, exact fan-in annotation, and a unified pack-level protocol measuring evidence recall, evidence precision, and answer correctness. A fan-in gradient--the number of source documents required to support the answer--serves as the primary diagnostic axis, enabling controlled comparison across retrieval, memory, graph, and structured-evidence paradigms. Evaluating eight systems across two QA models, we find that....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162700675","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6945000290870667},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.6679999828338623},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.6373999714851379},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5837000012397766},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5425000190734863},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5171999931335449},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.4986000061035156},{"id":"https://openalex.org/C2780385302","display_name":"Protocol 
(science)","score":0.4683000147342682}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2605.25343","title":"Toward Native Multimodal Modeling: A Roadmap","url":"https://huggingface.co/papers/2605.25343","published":"2026-05-24","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.26108","title":"Reinforcing Few-step Generators via Reward-Tilted Distribution Matching","url":"https://huggingface.co/papers/2605.26108","published":"2026-05-24","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:Qwen:2605.25624","title":"CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use 
Agents","url":"https://huggingface.co/papers/2605.25624","published":"2026-05-24","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"openalex:W7162241229","title":"RoboGPT-R1: Enhancing Robot Task Planning with Reinforcement Learning","url":"https://doi.org/10.65109/noxt1107","published":"2026-05-24","authors":["J M Liu","Bingyan Nie","Boyu Li","Yaran Chen","Yuze Wang","Shunsen He","Hui Li"],"abstract":"Improving the reasoning capabilities of embodied agents is crucial for robots to complete complex human instructions in long-view manipulation tasks successfully. Despite the success of large language models and vision language models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue facing challenges in performing long-horizon manipulation tasks in complex real-world environments, owing to their restricted common sense and reasoning capabilities. Considering that aligning general-purpose vision language models to robotic planning tasks via supervised fine-tuning suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. 
In this framework, supervised training acquires foundational knowledge through expert sequences, followed by RL to address the model's shortcomings in visu...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/noxt1107","openalex_id":"https://openalex.org/W7162241229","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Academy of Artificial Intelligence","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7196999788284302},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6444000005722046},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.5885999798774719},{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.572700023651123},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5619000196456909},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.5063999891281128},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.4999000132083893},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.4611999988555908}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162244353","title":"RIB-Guard: A Risk-Aware Information Bottleneck Defense for Black-Box Large Language Models","url":"https://doi.org/10.3390/e28060585","published":"2026-05-24","authors":["Muen Cai","Yuan Shen","X Luo","Jian Hu"],"abstract":"Large language models (LLMs) remain vulnerable to jailbreak attacks, especially in black-box settings where target-model 
gradients and internal tokenization are inaccessible. Recent information bottleneck-based defenses cast prompt protection as a compression problem, but existing methods still rely heavily on white-box optimization and the intrinsic alignment strength of the protected model. To address these limitations, we propose RIB-Guard, a safety-aware information bottleneck defense for black-box LLMs. RIB-Guard learns a token-level masking policy that extracts a minimally safety-sufficient prompt via reinforcement learning using only black-box feedback. In addition, it introduces an independent lightweight safety head to estimate residual jailbreak risk and provide model-agnostic safety guidance during training. The proposed framework jointly balances prompt compactness, benign ut...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/e28060585","openalex_id":"https://openalex.org/W7162244353","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Meta (United States)","University of Electronic Science and Technology of China","Uppsala University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8188999891281128},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.7366999983787537},{"id":"https://openalex.org/C60008888","display_name":"Information bottleneck method","score":0.5807999968528748},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5590000152587891},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.5005000233650208},{"id":"https://openalex.org/C176982825","display_name":"Lexical analysis","score":0.4966999888420105},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.3862000107765198},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38510000705718994}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162264888","title":"Pareto-guided Pipeline for Distilling Featherweight AI Agents in Mobile MOBA Games","url":"https://doi.org/10.65109/huot2523","published":"2026-05-24","authors":["Xionghui Yang","Bozhou Chen","Ye Lu","Yue Wang","Li Li","Lanxiao Huang","Lin Liu","Wenjun Wang","Meng Meng","Xia Lin","Weiwei Li"],"abstract":"Recent advances in game AI have demonstrated the feasibility of training agents that surpass top-tier human professionals in complex environments such as Honor of Kings (HoK), a leading mobile multiplayer online battle arena (MOBA) game. However, deploying such powerful agents on mobile devices remains a major challenge. On one hand, the intricate multi-modal state representation and hierarchical action space of HoK demand large, sophisticated policy networks that are inherently difficult to compress into lightweight forms. On the other hand, production deployment requires high-frequency inference under strict energy and latency constraints on mobile platform. To the best of our knowledge, bridging large-scale game AI and practical on-device deployment has not been systematically studied. 
In this work, we propose a Pareto optimality guided pipeline and design a high-efficiency student ar...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/huot2523","openalex_id":"https://openalex.org/W7162264888","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7615000009536743},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.7041000127792358},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5906999707221985},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.5655999779701233},{"id":"https://openalex.org/C186967261","display_name":"Mobile device","score":0.5430999994277954},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5329999923706055},{"id":"https://openalex.org/C557433098","display_name":"Android (operating system)","score":0.4749000072479248},{"id":"https://openalex.org/C204495577","display_name":"Callback","score":0.4226999878883362}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162258100","title":"Learning Semantic and Structure Aware Representation with Large Language Models for Concept Recommendation","url":"https://doi.org/10.65109/ckwm7360","published":"2026-05-24","authors":["Qi Li","Wei Xia","Kounianhua Du","Qiji Zhang","Wei Zhang","Ruiming Tang","Yong Yu"],"abstract":"Concept recommendation aims to suggest the next concept aligned with both the learner's state and the educational knowledge system. 
However, existing methods often overlook concept semantics, leading to recommendations that lack semantic relevance and structural consistency. To address this, we propose SSRec, a novel Semantic and Structure aware representation learning framework. SSRec leverages Large Language Models (LLMs) to capture concept semantics and introduces a graph-based adapter. This adapter not only integrates structural relationships but also transforms anisotropic text encodings into a smooth representation space. Extensive experiments on real-world datasets demonstrate that SSRec significantly outperforms state-of-the-art baselines in delivering accurate and consistent recommendations.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/ckwm7360","openalex_id":"https://openalex.org/W7162258100","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["GTx (United States)","Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.835099995136261},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6261000037193298},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.613099992275238},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5580000281333923},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5228000283241272},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.4945000112056732},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.4578000009059906},{"id":"https://openalex.org/C2983448237","display_name":"Language 
understanding","score":0.4083000123500824}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162244956","title":"Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour","url":"https://doi.org/10.65109/mcsd1905","published":"2026-05-24","authors":["Bálint Gyevnár","Christopher G. Lucas","Stefano V. Albrecht","Shay Cohen"],"abstract":"Explainability is vital for users' trust calibration in multi-agent systems (MAS), but explainable MAS face challenges due to complex environments, the human factor, and non-standardised evaluation. Leveraging the counterfactual effect size model and LLMs, we propose Agentic eXplanations via Interrogative Simulation (AXIS). AXIS generates human-centred action explanations for multi-agent policies by having an LLM interrogate an environment simulator using prompts like whatif and remove to observe and synthesise counterfactual information over multiple rounds. We evaluate AXIS on autonomous driving across ten scenarios for five LLMs with a comprehensive methodology combining robustness, subjective preference, correctness, and goal/action prediction with an external LLM as evaluator. 
Compared to baselines, AXIS improves perceived explanation correctness by at least 7.7% across all models a...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/mcsd1905","openalex_id":"https://openalex.org/W7162244956","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google DeepMind (United Kingdom)","University of Edinburgh"],"concepts":[{"id":"https://openalex.org/C108650721","display_name":"Counterfactual thinking","score":0.9473999738693237},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.7069000005722046},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.6909999847412109},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6315000057220459},{"id":"https://openalex.org/C57098296","display_name":"Interrogative","score":0.5723999738693237},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4790000021457672},{"id":"https://openalex.org/C165838908","display_name":"Calibration","score":0.4058000147342682},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.39640000462532043}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162319834","title":"Digital Module 40: Introduction to Machine Learning and Generative AI: From AutoGluon to Amazon Bedrock","url":"https://doi.org/10.1111/emip.70029","published":"2026-05-24","authors":["Ye Ma","Vinita Talreja"],"abstract":"Abstract Machine learning (ML) and generative artificial intelligence (AI) are rapidly transforming the field of educational measurement. 
This module focuses on illustrating the process of (1) automated machine learning (AutoML) using AutoGluon via an application of detecting aberrant test behavior and (2) AI‐based item generation using Amazon Bedrock. To support these demonstrations, two tools are used: AutoGluon, an open‐source automated machine learning (AutoML) system, and Amazon Bedrock, a fully managed AWS service for accessing foundation models from leading AI providers. By the end of this module, participants will (1) understand the key concepts and fundamentals underlying these two applications and (2) be able to programmatically train a classification ML model via AutoML using the provided data, as well as conduct AI‐based item generation via the LLMs.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1111/emip.70029","openalex_id":"https://openalex.org/W7162319834","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7889000177383423},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.7441999912261963},{"id":"https://openalex.org/C535291247","display_name":"Amazon rainforest","score":0.7228999733924866},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6912999749183655},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6100999712944031},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5508000254631042},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4650999903678894},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.420199990272522}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162250757","title":"DebugTA: An LLM-Based Agent for Simplifying Debugging and Teaching in Programming Education","url":"https://doi.org/10.65109/gymb4283","published":"2026-05-24","authors":["Lingyue Fu","Datong Chen","Haowei Yuan","Xinyi Dai","Qi Li","Weinan Zhang","Liu W","Yong Yu"],"abstract":"In programming education, Debugging and Teaching (DT) task is a common scenario which requires generating modification suggestions from erroneous code, error messages, reference solutions, and problem descriptions. Existing approaches struggle with complex multi-source reasoning and underutilize available reference code, limiting the effectiveness of large language models (LLMs) in DT tasks. To address these challenges, we propose DebugTA, a novel LLM-based debugging and teaching agent with specialized tools for standard code retrieval, variable substitution to align reference code, and an external compiler for real-time code analysis. Guided by pedagogical and debugging principles, DebugTA decomposes complex DT tasks into structured LLM–tool interactions that reduce reasoning complexity. 
By aligning reference code with erroneous code, DebugTA enables the LLM to focus on logical errors a...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/gymb4283","openalex_id":"https://openalex.org/W7162250757","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Fudan University","Shanghai Jiao Tong University","Stevens Institute of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C168065819","display_name":"Debugging","score":0.8877999782562256},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8497999906539917},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.7656000256538391},{"id":"https://openalex.org/C169590947","display_name":"Compiler","score":0.6622999906539917},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5307000279426575},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.49540001153945923},{"id":"https://openalex.org/C136388014","display_name":"Algorithmic program debugging","score":0.4431000053882599},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.4018000066280365}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162239736","title":"Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models","url":"https://doi.org/10.65109/ikjf6607","published":"2026-05-24","authors":["Daniel Hennes","Zun Li","John Schultz","Marc Lanctot"],"abstract":"Policy-Space Response Oracles (PSRO) have enabled the computation of approximate Nash equilibria in complex games. 
However, standard implementations rely on Deep Reinforcement Learning oracles, producing ''black-box'' neural network policies that are opaque, difficult to verify, and sample-inefficient. We introduce Code-Space Response Oracles (CSRO), a framework that tasks a Large Language Model (LLM) to synthesize code policies. CSRO reframes best-response computation as a code generation task, producing policies as executable, human-readable Python code. We demonstrate that CSRO, particularly when augmented with evolutionary refinement (AlphaEvolve), achieves performance competitive with baselines while offering superior interpretability and leveraging the LLM's pretraining knowledge.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/ikjf6607","openalex_id":"https://openalex.org/W7162239736","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (Canada)","Google (Switzerland)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.9236000180244446},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8047999739646912},{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.7084000110626221},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.61080002784729},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.5674999952316284},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5224999785423279},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement 
learning","score":0.5188999772071838},{"id":"https://openalex.org/C26713055","display_name":"Implementation","score":0.504800021648407}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:menu2aulr52emmoa6a22a758","title":"VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models","url":"https://machinelearning.apple.com/research/vsas-bench-streaming-assistant","published":"2026-05-22","authors":["Pavan Kumar Anasosalu Vasu","Cem Koc","Fartash Faghri","Chun-Liang Li","Bo Feng","Zhengfeng Lai","Meng Cao","Oncel Tuzel","Hadi Pouransari"],"abstract":"Streaming vision-language models (VLMs) continuously generate responses given an instruction prompt and an online stream of input frames. This is a core mechanism for real-time visual assistants. Existing VLM frameworks predominantly assess models in offline settings. In contrast, the performance of a streaming VLM depends on additional metrics beyond pure video understanding, including proactiveness, which reflects the timeliness of the model’s...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7162104115","title":"Same Weights, Different Words: Measuring Inter-Provider Divergence and Temperature-Zero Non-Determinism in Open-Weight LLM Inference","url":"https://doi.org/10.21203/rs.3.rs-9776212/v1","published":"2026-05-22","authors":["Gokul Chandra Purnachandra 
Reddy"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-9776212/v1","openalex_id":"https://openalex.org/W7162104115","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C5274069","display_name":"Categorical variable","score":0.5496000051498413},{"id":"https://openalex.org/C20136886","display_name":"Interoperability","score":0.5472999811172485},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5414000153541565},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5336999893188477},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5285000205039978},{"id":"https://openalex.org/C180505990","display_name":"News aggregator","score":0.5246000289916992},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.459199994802475},{"id":"https://openalex.org/C207390915","display_name":"Divergence (linguistics)","score":0.45719999074935913}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162067185","title":"On the Limits of End-to-End Foundation Models: Coordination as a Missing Primitive for Artificial General Intelligence","url":"https://doi.org/10.21203/rs.3.rs-9784163/v1","published":"2026-05-22","authors":["M. Rizwan Jameel Qureshi","Abdelrahman B.M. 
Eldaly","Anas Zafar","Shaina Raza","Abbas Shah Syed","A Muneer","Kai Zhang","Xinqi Fan","Mohammed Alnemari","Ahsan Khan","Aman Chadha","Leanne Lai-hang Chan"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-9784163/v1","openalex_id":"https://openalex.org/W7162067185","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Google (United States)","Google DeepMind (United Kingdom)","Harvard University","Hong Kong Baptist University","Machine Science","Manchester Metropolitan University","Massachusetts General Hospital","Mehran University of Engineering and Technology","Obuda University","The University of Texas MD Anderson Cancer Center","Torrens University Australia","University of Tabuk","Vector Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6707000136375427},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.48170000314712524},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.46480000019073486},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44699999690055847},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.4296000003814697},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.42500001192092896},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.40139999985694885},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.39890000224113464}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"arxiv:2605.24229","title":"How Well Do Models Follow Their Constitutions?","url":"https://arxiv.org/abs/2605.24229","published":"2026-05-22","authors":["Arya Jakkli","Senthooran Rajamanoharan","Neel Nanda"],"abstract":"Frontier AI developers now train models against long written behavioral specifications, such as Anthropic's constitution (Anthropic, 2025a) and OpenAI's Model Spec (OpenAI, 2025a), integrated into post-training via methods like character training (Anthropic, 2024) and deliberative alignment (Guan et al., 2024). These documents serve a governance function, but it is unclear how well models actually follow them under adversarial, multi-turn pressure similar to what they would face in real-world deployment. We propose a multi-method audit pipeline that treats each lab's published specification as an auditable target: it decomposes the specification into atomic testable tenets (205 for Anthropic, 197 for OpenAI), generates multi-turn adversarial scenarios with the Petri auditing agent (Anthropic, 2025b), runs a modified SURF-style rubric search (Murray et al., 2026) to catch shallow single-t...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162606751","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6237000226974487},{"id":"https://openalex.org/C199521495","display_name":"Audit","score":0.5530999898910522},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5246999859809875},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.4586000144481659},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4503999948501587},{"id":"https://openalex.org/C111640148","display_name":"Rubric","score":0.3837999999523163},{"id":"https://openalex.org/C153180980","display_name":"Commit","score":0.37689998745918274},{"id":"https://openalex.org/C2779276979","display_name":"Weir","score":0.3513000011444092}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.16748","title":"Genflow Ad Studio: A Compound AI Architecture for Brand-Aligned, Self-Correcting Video Generation","url":"https://arxiv.org/abs/2605.16748","published":"2026-05-22","authors":["Debanshu Das","Lavi Nigam","Sunil Kumar Jang Bahadur","Gopala Dhar"],"abstract":"Recent advancements in generative video models demonstrate high visual fidelity, yet their integration into enterprise environments is restricted by temporal inconsistencies and severe brand misalignment. Current monolithic architectures struggle to enforce rigid brand constraints, frequently hallucinating unapproved visual assets. We introduce Genflow, a Compound AI System designed to enforce brand consistency in generative media production. Our architecture integrates a retrieval-based 'Brand DNA' extraction module to parameterize generation according to established corporate identity guidelines. Furthermore, we implement an Adversarial Multi-Agent Quality Control (QC) loop. 
Instead of a single-pass generation, this pipeline employs evaluator agents to iteratively critique generated frames against the extracted parameters, prompting generator models to refine outputs until a determinis...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786335.3813213","openalex_id":"https://openalex.org/W7161807725","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7305999994277954},{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.6773999929428101},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.63919997215271},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.5993000268936157},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5667999982833862},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4982999861240387},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.49050000309944153},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4819999933242798}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162124120","title":"DraftNEPABench: A Benchmark for Drafting NEPA Document Sections with Coding Agents","url":"https://doi.org/10.1145/3786335.3813132","published":"2026-05-22","authors":["Anurag Acharya","Bishal Lakha","Rounak Meyur","Rohan Nuttall","Sarthak Chaturvedi","Anika Halappanavar","Leah Hare","Lin Zeng","Mike Parker","Sai 
Munikoti","Sameera Horawalavithana"],"abstract":"Coding agents represent a transformative paradigm in software engineering, enabling automated coding, generation, and debugging through a natural language interface. Recent advancements in large language models (LLMs) and their ability to use external tools have expanded the potential of using these agents beyond software engineering tasks. In this work, we explore the application of coding agents in a noncoding domain: drafting environmental impact statement (EIS) sections. For that, we introduce DraftNEPABench: a challenging benchmark that requires coding agents to compose structured, coherent, and domain-specific drafts grounded in multiple complex regulatory and scientific reference materials. We evaluate various state-of-the-art commercial coding agents on this benchmark and demonstrate their promise in generating EIS documents. Our findings show that while coding agents outperform....","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786335.3813132","openalex_id":"https://openalex.org/W7162124120","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["OpenAI (United States)","Pacific Northwest National Laboratory"],"concepts":[{"id":"https://openalex.org/C168065819","display_name":"Debugging","score":0.7275000214576721},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7080000042915344},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.696399986743927},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5654000043869019},{"id":"https://openalex.org/C115903868","display_name":"Software 
engineering","score":0.5144000053405762},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.4650000035762787},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3871999979019165},{"id":"https://openalex.org/C56666940","display_name":"Documentation","score":0.34380000829696655}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-patches-to-trajectories-privileged-process-supervision-for-software-engineering-agents","title":"From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents","url":"https://www.microsoft.com/en-us/research/publication/from-patches-to-trajectories-privileged-process-supervision-for-software-engineering-agents/","published":"2026-05-21","authors":["Murong Ma","Tianyu Chen","Yun Lin","Shuai Lu","Qinglin Zhu","Yeyun Gong","Zhiyong Huang","Peng Cheng","Yan Lu","Jin Song Dong"],"abstract":"Supervised fine-tuning (SFT) on long teacher trajectories is the dominant way to instill investigation and reasoning in open software-engineering (SWE) agents. Since every retained response becomes an imitation target, the student inherits the final outcome and intermediate flaws, including ungrounded leaps and redundant loops. High-quality training data must be effective(each step is grounded and narrows the agent's epistemic gap to the correct fix) and efficient(each step is information-bearing rather than redundant or looping). Existing recipes filter or relabel teacher rollouts using only a binary terminal verifier, which does not directly target these axes and provides no supervision on instances where the teacher fails. 
Most real issue includes a developer-authored reference patch, $p^star$, revealing the file paths, runtime behaviors, and coding conventions presupposed by the corr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/synae-a-framework-for-measuring-the-quality-of-synthetic-data-for-tool-calling-agent-evaluations","title":"SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations","url":"https://www.microsoft.com/en-us/research/publication/synae-a-framework-for-measuring-the-quality-of-synthetic-data-for-tool-calling-agent-evaluations/","published":"2026-05-21","authors":["Shuaiqi Wang","Aadyaa Maddi","Zinan Lin","Giulia Fanti"],"abstract":"Today, tool-calling agents are commonly evaluated or tested on static datasets of execution traces, including input commands, agent responses, and associated tool calls. However, internal production datasets are often insufficient or unusable for testing; for example, they may contain sensitive or proprietary data, or they may be too sparse to support comprehensive testing (especially pre-deployment). In these settings, practitioners are increasingly replacing or augmenting real datasets with synthetic ones for evaluation purposes. 
A key challenge is quantifying the relation between these synthetic datasets and the real data. We introduce SynAE, an evaluation framework for assessing how well synthetic benchmarks for multi-turn, tool-calling agents replicate and augment the characteristics of real data trajectories. SynAE assesses the validity, fidelity, and diversity of synthetic data ac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2605.23271","title":"EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation","url":"https://huggingface.co/papers/2605.23271","published":"2026-05-21","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7161949248","title":"The Future of Selection Enabled by Artificial 
Intelligence","url":"https://doi.org/10.1093/9780197809013.003.0018","published":"2026-05-21","authors":["Anthony S. Boyce","Louis Hickman","Christine E Boyce"],"abstract":"Abstract Artificial intelligence (AI) is rapidly transforming how organizations design, validate, and deploy employee selection systems. This chapter provides a comprehensive examination of the current state, emerging evidence, and future potential of AI in personnel selection. It begins with a forward-looking vignette that illustrates an end-to-end, AI-enabled hiring process and then introduces foundational machine-learning and generative-AI concepts relevant to selection science. The chapter reviews AI’s role in developing, delivering, scoring, and reporting on assessments, and it provides an in-depth analysis of AI applications across major predictor domains—including résumés, knowledge and ability tests, personality and biodata self-reports, simulations and work samples, and interviews. For each method, it synthesizes what is known, highlights key gaps and risks, and offers explicit....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1093/9780197809013.003.0018","openalex_id":"https://openalex.org/W7161949248","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Office of the Chief Scientist","Society for Industrial and Organizational Psychology"],"concepts":[{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6617000102996826},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5586000084877014},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5040000081062317},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.48919999599456787},{"id":"https://openalex.org/C9719361","display_name":"Vignette","score":0.46309998631477356},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.4433000087738037},{"id":"https://openalex.org/C2777207495","display_name":"Personnel selection","score":0.4255000054836273},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.4047999978065491}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7161994276","title":"The Brain Imaging and Neurophysiology Dataset of large-scale multimodal neural data","url":"https://doi.org/10.1038/s41597-026-07421-x","published":"2026-05-21","authors":["Charlotte Maschke","Peter N. Hadar","Yicheng Zhang","Jian Li","Gauri Ganjoo","Andrew Hoopes","Alessandro Guazzo","Aditya Gupta","Manohar Ghanta","Bruce Nearing","Christine Tsien Silvers","Bharath Gunapati"],"abstract":"The Brain Imaging and Neurophysiology Dataset (BIND) represents one of the largest multi-institutional, multimodal, clinical neuroimaging repositories, comprising 1.8 million brain scans from 38,942 patients, linked to full-text reports and neurophysiological recordings. This comprehensive dataset addresses critical limitations in neuroimaging research by providing a rich and diverse set of large-scale multimodal data. BIND integrates de-identified data from Massachusetts General Hospital, Brigham and Women’s Hospital, and Stanford University, including 1,723,699 MRI scans (1.5, 3 and 7 Tesla), 54,137 CT scans, 5,093 PET scans, and 526 SPECT scans, converted to standardized NIfTI format following BIDS organization. The dataset spans the full age spectrum and encompasses diverse neurological conditions alongside healthy subjects. 
We deployed Large Language Models to extract structured cli...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41597-026-07421-x","openalex_id":"https://openalex.org/W7161994276","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Artificial Intelligence in Medicine (Canada)","Athinoula A. Martinos Center for Biomedical Imaging","Beth Israel Deaconess Medical Center","Harvard University","Massachusetts General Hospital","Palo Alto University","Stanford University","Yale University"],"concepts":[{"id":"https://openalex.org/C58693492","display_name":"Neuroimaging","score":0.862500011920929},{"id":"https://openalex.org/C152478114","display_name":"Neurophysiology","score":0.6642000079154968},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.574400007724762},{"id":"https://openalex.org/C522805319","display_name":"Electroencephalography","score":0.47999998927116394},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.4388999938964844},{"id":"https://openalex.org/C205427263","display_name":"Neuroinformatics","score":0.42669999599456787},{"id":"https://openalex.org/C169760540","display_name":"Neuroscience","score":0.4253999888896942},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.41510000824928284}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7161956647","title":"ReflectRAG: Enhancing retrieval-augmented generation with GRPO-optimized iterative reflection","url":"https://doi.org/10.1016/j.neucom.2026.134047","published":"2026-05-21","authors":["Xuanhe Chen","Yuchen Li","Youyi Bi","Shuaiqiang Wang","Linghe Kong","Dawei 
Yin"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neucom.2026.134047","openalex_id":"https://openalex.org/W7161956647","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C65682993","display_name":"Reflection (computer programming)","score":0.5776000022888184},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49160000681877136},{"id":"https://openalex.org/C159694833","display_name":"Iterative method","score":0.44699999690055847},{"id":"https://openalex.org/C120665830","display_name":"Optics","score":0.43209999799728394},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.38499999046325684},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3785000145435333},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32659998536109924},{"id":"https://openalex.org/C143587482","display_name":"Iterative and incremental development","score":0.2964000105857849}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.21969","title":"LLM Retrieval for Stable and Predictable Ad Recommendations","url":"https://arxiv.org/abs/2605.21969","published":"2026-05-21","authors":["Vinodh Kumar Sunkara","Satheeshkumar Karuppusamy","Heng Xu","Sai Deepika Regani","Kshitij Gupta","Gaby Nahum","Sneha Iyer","Jean-Baptiste Fiot","Yinglong Guo","Xiaowen Guo","Atul Jangra","Yucheng Liu"],"abstract":"Traditional ads recommendation systems have primarily focused on optimizing for prediction accuracy of click or conversion events using canonical metrics such as recall or 
normalized discounted cumulative gain (NDCG). With the hyper-growth of ads inventory and liquidity with generative AI technologies, the prediction stability and predictability is becoming increasingly critical. Intuitively, prediction stability and predictability can be defined to quantify system robustness with respect to minor/noisy input (ads, creatives) perturbations, the lack of which could lead to advertiser perceivable problems such as repeatability, cold start and under-exploration. In this paper, we introduce a new evaluation framework for quantifying stability and predictability of an ads recommender system, and present an online validated semantic candidate generation framework powered by fine-tuned Large La...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162218280","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C197640229","display_name":"Predictability","score":0.9004999995231628},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.761900007724762},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6276999711990356},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.4902999997138977},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.44040000438690186},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43869999051094055},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.3977000117301941},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.382099986076355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7162023374","title":"Current Artificial Intelligence Large Language Models Exhibit Sycophantic Behavior in Orthopaedic Contexts","url":"https://doi.org/10.2106/jbjs.25.01576","published":"2026-05-21","authors":["Arthur Perry","Swara Kalva","Dario Fucich","Srikar Muppidi","M. L. Aggarwal","Mandeep S. Virk","Joseph D. Zuckerman","Jie Yao"],"abstract":"Background: The use of large language models (LLMs) is increasingly common. However, LLMs may exhibit sycophancy, echoing users’ beliefs while avoiding contradiction. In the present study, we describe sycophancy in general-purpose LLMs when applied to orthopaedic contexts. Methods: We investigated sycophancy in 2 general-purpose LLMs. We evaluated performance on 3 tasks: (1) accuracy on benchmark answering: LLMs were tested on validated benchmark orthopaedic questions, with correct and incorrect cues, and the change in accuracy and sycophancy error rate were determined; (2) user belief agreement: LLMs were provided with ambiguous statements and a user belief, and LLM agreement, contradiction, and uncertainty were described; and (3) false information detection: false information was placed within a task prompt to measure noncontradiction and propagation rates. 
Results: Baseline factual ac...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2106/jbjs.25.01576","openalex_id":"https://openalex.org/W7162023374","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","NYU Langone Health","New York University Langone Orthopedic Hospital","Search"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7384999990463257},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.4959999918937683},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49070000648498535},{"id":"https://openalex.org/C143299363","display_name":"Attribution","score":0.4781999886035919},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43869999051094055},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.43869999051094055},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.43810001015663147},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.4260999858379364}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llamas-on-the-web-memory-efficient-performance-portable-and-multi-precision-llm-inference-with-webgpu","title":"Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with 
WebGPU","url":"https://www.microsoft.com/en-us/research/publication/llamas-on-the-web-memory-efficient-performance-portable-and-multi-precision-llm-inference-with-webgpu/","published":"2026-05-20","authors":["Reese Levine","Rithik Sharma","Nitisha Jain","Abhijit Ramesh","Zheyuan Chen","N. Abbas","James Contini","Tyler Sorensen"],"abstract":"Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the Web (LlamaWeb), a WebGPU backend for llama$.$cpp that enables memory-efficient and performance-portable LLM inference across a wide range of model weight formats in the browser. Our design significantly reduces memory overhead through static memory planning and efficient model loading, addresses cross-device variability through a tunable kernel library, and introduces templated GPU kernels that support performant implementations of numerous quantization formats, enabling broad model support and extensibility to new formats. 
We evaluate LlamaWeb on 16 devices from 8 vendors, collecting data from 10 language models....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","Distributed, Parallel, and Cluster Computing"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/memgym-a-long-horizon-memory-environment-for-llm-agents","title":"MemGym: a Long-Horizon Memory Environment for LLM Agents","url":"https://www.microsoft.com/en-us/research/publication/memgym-a-long-horizon-memory-environment-for-llm-agents/","published":"2026-05-20","authors":["Wujiang Xu","Yu Wang","Kai Mei","Kaiqu Liang","Zhenting Wang","Ming Jin","Han Zhang","Shi-Xiong Zhang","Wenyue Hua","Sambit Sahu","Dimitris N. Metaxas"],"abstract":"Memory is a central capability for LLM agents operating across long-horizon tasks. Existing memory benchmarks predominantly evaluate retention of personalized information in multi-turn chat scenarios, overlooking the dynamic memory formation that occurs during extended agent execution. Consequently, the memory systems they produce transfer poorly to realistic agentic environments, such as coding and web navigation. We present MemGym, a benchmark for agentic memory that unifies existing agent gyms and in-house memory-grounded pipelines behind one memory-reasoning interface. 
MemGym spans five evaluation tracks grouped into four agentic regimes: tool-use dialogue (tau2-bench), multi-turn deep-research search (MEMGYM-DR), coding (SWE-Gym and MEMGYM-CODEQA), and computer use (WebArena-Infinity). MemGym reports memory-isolated scores that decouple memory performance from reasoning, retrieval,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reinforcing-vlas-in-task-agnostic-world-models","title":"Reinforcing VLAs in Task-Agnostic World Models","url":"https://www.microsoft.com/en-us/research/publication/reinforcing-vlas-in-task-agnostic-world-models/","published":"2026-05-20","authors":["Yucen Wang","Rui Yu","Fengming Zhang","Junjie Lu","Xinyao Qin","Tianxiang Zhang","Kaixin Wang","Li Zhao"],"abstract":"Post-training Vision-Language-Action (VLA) models via reinforcement learning (RL) in learned world models has emerged as an effective strategy to adapt to new tasks without costly real-world interactions. However, while using imagined trajectories reduces the sample complexity of policy training, existing methods still heavily rely on task-specific data to fine-tune both the world and reward models, fundamentally limiting their scalability to unseen tasks. 
To overcome this, we argue that world and reward models should capture transferable physical priors that enable zero-shot inference. We propose RAW-Dream (Reinforcing VLAs in task-Agnostic World Dreams), a new paradigm that completely disentangles world model learning from downstream task dependencies. RAW-Dream utilizes a world model pre-trained on diverse task-free behaviors for predicting future rollouts, and an off-the-shelf Vision...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/memory-grafting-scaling-language-model-pre-training-via-offline-conditional-memory","title":"Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory","url":"https://www.microsoft.com/en-us/research/publication/memory-grafting-scaling-language-model-pre-training-via-offline-conditional-memory/","published":"2026-05-20","authors":["Runxi Cheng","Yuchen Guan","Yongxian Wei","Qianpu Sun","Qixiu Li","Sinan Du","Feng Xiong","Chun Yuan","Yan Lu","Yeyun Gong"],"abstract":"Scaling conditional memory offers a promising way to increase language-model capacity, but existing methods such as Engram learn large memory tables from scratch during pre-training, making memory scaling expensive and sometimes ineffective. 
We propose Memory Grafting , a conditional memory scaling method that utilizes frozen hidden states from a grafting model as conditional n-gram memory. Given frequent local n-grams, we run the grafting model offline, store final-token hidden representations as memory values, and let the recipient model retrieve them through exact longest-match suffix lookup. Retrieved memories are adapted by lightweight projections and gates, while a hash-based Engram fallback preserves coverage for unmatched contexts. Since the grafting model is only run offline and exact lookup has expected O (1) complexity with respect to memory-bank size, Memory Grafting expands....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lens-rethinking-training-efficiency-for-foundational-text-to-image-models","title":"Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models","url":"https://www.microsoft.com/en-us/research/publication/lens-rethinking-training-efficiency-for-foundational-text-to-image-models/","published":"2026-05-20","authors":["Dong Chen","Fangyun Wei","Ziyu Wan","Dongdong Chen","Jiawei Zhang","Jinjing Zhao","Sirui Zhang","Yang Yue","Zhiyang Liang","Baining Guo","Chong Luo","Jianmin Bao"],"abstract":"We introduce Lens, a 3.8B-parameter T2I model that achieves performance competitive with, and in several cases surpassing, state-of-the-art 
models with more than 6B parameters across various benchmarks, while requiring significantly less training compute. For example, Lens requires only about 19.3% of the training compute used by Z-Image. The training efficiency of Lens stems from two key strategies beyond its compact model size. First, we maximize data information density per training batch by (i) training on Lens-800M, a dataset of 800M densely captioned image-text pairs whose captions are generated by GPT-4.1 and contain approximately 109 words on average, providing richer semantic supervision than conventional short captions, and (ii) constructing each batch from images with multiple resolutions and diverse aspect ratios, thereby enlarging the effective visual coverage of each optimi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2605.21810","title":"Trace2Skill: Verifier-Guided Skill Evolution for Long-Context EDA Agents","url":"https://arxiv.org/abs/2605.21810","published":"2026-05-20","authors":["Zijian Du","Nathaniel Pinckney"],"abstract":"Complex Verilog Design Problems (CVDP) challenge hardware LLM agents because solving them requires localizing verifier-relevant RTL, testbenches, include paths, and build dependencies inside large repository snapshots, making precise edits, and recovering from sparse hidden-verifier failures. 
We present Trace2Skill, a test-time scaling framework that improves a hardware agent without RTL-specialized model fine-tuning. Rather than training a new model or only sampling more candidate solutions, Trace2Skill treats the agent's natural-language skill as an evolvable policy. It mines repeated rollout traces for success and failure modes, converts them into dense diagnostics and oracle lessons, and uses an oracle, mutator, and selector loop to produce task-specific skills that guide later search, editing, validation, and recovery. Because final pass/fail labels are often too coarse for hard fai...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162218252","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7573999762535095},{"id":"https://openalex.org/C55166926","display_name":"Oracle","score":0.614799976348877},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5842999815940857},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.48249998688697815},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.4260999858379364},{"id":"https://openalex.org/C2779030575","display_name":"Verilog","score":0.42149999737739563},{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.41359999775886536},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38100001215934753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.21446","title":"Lost 
in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs","url":"https://arxiv.org/abs/2605.21446","published":"2026-05-20","authors":["Abhinaw Priyadershi","Jelena Frtunikj"],"abstract":"Interpretable autonomous driving planners depend not only on generating explanations, but also on those explanations remaining reliable under real-world sensor degradation. In this paper we present a controlled perturbation study of Vision-Language-Action (VLA) robustness in autonomous driving, evaluating Alpamayo R1 (10B parameters) across 1,996 scenarios under eight sensor perturbations (Gaussian noise at four intensities, two lighting extremes, and two fog levels; ${\\sim}18{,}000$ inference trials). We find that reasoning consistency is a high-fidelity indicator of trajectory reliability: when Chain-of-Causation (CoC) explanations change after perturbation, trajectory deviation spikes $5.3{\\times}$ (21.8m vs 4.1m), with $r\\!=\\!0.99$ across attack types and $r_{pb}\\!=\\!0.53$ per-sample (Cohen's $d\\!=\\!1.12$). 
A controlled ablation provides evidence that enabling CoC generation is assoc...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162149859","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United Kingdom)","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C80191262","display_name":"Fragility","score":0.5911999940872192},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.5408999919891357},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5375999808311462},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5329999923706055},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5271999835968018},{"id":"https://openalex.org/C2776654903","display_name":"SAFER","score":0.4275999963283539},{"id":"https://openalex.org/C177918212","display_name":"Perturbation (astronomy)","score":0.415800005197525},{"id":"https://openalex.org/C47446073","display_name":"Control theory (sociology)","score":0.4083999991416931}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.21537","title":"Articulate but Wrong: Self-Review Failures in LLM-Based Code Modernization","url":"https://arxiv.org/abs/2605.21537","published":"2026-05-20","authors":["Gokul Chandra Purnachandra Reddy","Aditya Lolla","Harsha Sanku"],"abstract":"Large language model (LLM) agents are increasingly used to migrate legacy code to modern stacks. We ask a deceptively simple question: when an LLM modernizes legacy code, can the same model be relied upon to recognize when its own output silently changes observable behavior? 
We run 1,980 real modernization calls across 11 production LLMs from 7 distinct families on a balanced 60-snippet legacy-Python-2 corpus, evaluate every output with a type-strict behavioral oracle, and then ask each model to judge whether its own output preserves behavior. We report four findings. (1) Semantic-preservation drift is prevalent and sharply separable from a cleanly-controlled baseline: semantic-trap snippets drift in 39.7% of attempts versus 7.0% on benign-control code that requires no real modernization (+32.7 percentage points; n=660 each). (2) Drift concentrates on specific snippets that fail across m...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162219178","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6686999797821045},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6287999749183655},{"id":"https://openalex.org/C53844881","display_name":"Modernization theory","score":0.6108999848365784},{"id":"https://openalex.org/C89992363","display_name":"Track (disk drive)","score":0.5175999999046326},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.351500004529953},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.34380000829696655},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3294000029563904},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.3179999887943268}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"arxiv:2605.20682","title":"IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools","url":"https://huggingface.co/papers/2605.20682","published":"2026-05-20","authors":["Rongbin Tan","Fangfang Lin","Zhenlong Yuan","Min Qiu","Kejin Cui","Mengmeng Wang","Yi Wang","Zijian Song","Zhiyuan Wang","Jiyuan Wang","Yue Wang","Shuhan Song"],"abstract":"Multimodal large language models (MLLMs) have shown remarkable capability in bridging visual perception and textual reasoning, enabling zero-shot understanding across diverse industrial scenarios. However, their performance in open-vocabulary industrial anomaly detection (IAD) is often limited by domain-misaligned reasoning and hallucinated structural inferences. To address these challenges, we propose IndusAgent, a tool-augmented agentic framework for open-vocabulary IAD. Specifically, we first construct Indus-CoT, a structured dataset that integrates global visual observations, high-resolution local patches, and expert normalcy priors, providing supervision for fine-tuning the model on rigorous industrial inspection trajectories. 
Building on this, IndusAgent dynamically orchestrates a set of external tools, including dynamic region cropping, high-frequency feature enhancement, and prio...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/m3bert-a-modern-multi-lingual-matryoshka-bidirectional-encoder","title":"m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder","url":"https://www.microsoft.com/en-us/research/publication/m3bert-a-modern-multi-lingual-matryoshka-bidirectional-encoder/","published":"2026-05-19","authors":["Yaoxiang Wang","Simiao Zuo","Qingguo Hu","Yucheng Ding","Yeyun Gong","Jian Jiao","Jinsong Su"],"abstract":"Embedding models are pivotal in industrial information retrieval systems like search and advertising. However, existing pretrained models often exhibit fixed architectures and embedding dimensionalities, posing significant challenges when adapting them to diverse deployment scenarios with varying business-driven constraints. A common practice involves fine-tuning with partial parameter initialization from larger pretrained models for resource-constrained tasks. This method is often suboptimal as the misalignment between pretraining and downstream usage prevents full realization of pretraining benefits. To address this limitation, we introduce m3BERT: a Modern, Multi-lingual, Matryoshka Bidirectional Encoder, which features a novel pretraining strategy that jointly optimizes representations across both transformer layers and multiple embedding dimensions. 
This enables a single model to be...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/open-world-evaluations-for-measuring-frontier-ai-capabilities","title":"Open-World Evaluations for Measuring Frontier AI Capabilities","url":"https://www.microsoft.com/en-us/research/publication/open-world-evaluations-for-measuring-frontier-ai-capabilities/","published":"2026-05-19","authors":["Sayash Kapoor","Peter Kirgis","Andrew Schwartz","Stephan Rabanser","J.J. Allaire","Rishi Bommasani","Harry Coppock","Magda Dubois","Gillian K. Hadfield","Andy Hall","Sara Hooker","Seth Lazar"],"abstract":"Benchmark-based evaluation remains important for tracking frontier AI progress. But it can both overstate and understate deployed capability because it privileges tasks that can be precisely specified, automatically graded, easy to optimize for, and run with low budgets and short time horizons. We advocate for a complementary class of evaluations, which we term open-world evaluations: long-horizon, messy, real-world tasks assessed through small-sample qualitative analysis rather than benchmark-scale automation. 
In this paper we survey recent open-world evaluations, identify their strengths and limitations, and introduce CRUX (Collaborative Research for Updating AI eXpectations), a project for conducting such evaluations regularly. As a first instance, we task an AI agent with developing and publishing a simple iOS application to the Apple App Store. The agent completed the task with only...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.20873","title":"PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models","url":"https://huggingface.co/papers/2605.20873","published":"2026-05-19","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"official:b9c7e8f0724a0507","title":"Gemini Omni Flash Model 
Card","url":"https://deepmind.google/models/model-cards/gemini-omni-flash/","published":"2026-05-19","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini Omni Flash"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:4a73f0f458baa17d","title":"Gemini 3.5 Flash Model Card","url":"https://deepmind.google/models/model-cards/gemini-3-5-flash/","published":"2026-05-19","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 3.5 Flash"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/an-efficient-streaming-video-understanding-framework-with-agentic-control","title":"An Efficient Streaming Video Understanding Framework with Agentic Control","url":"https://www.microsoft.com/en-us/research/publication/an-efficient-streaming-video-understanding-framework-with-agentic-control/","published":"2026-05-18","authors":["Jinming Liu","Jianguo Huang","Zhaoyang Jia","Jiahao 
Li","Xiaoyi Zhang","Zongyu Guo","Bin Li","Wenjun Zeng","Yan Lu","Xin Jin"],"abstract":"Streaming video requires handling dynamic information density under strict latency budgets. Yet, existing methods typically employ static strategies, such as fixed memory compression or reliance on a single model, forcing a trade-off: fast models fail on complex queries, while always-on heavy models violate real-time constraints and overcomplicate simple queries. Rather than fixing these decisions upfront, we propose R3-Streaming (Remember, Respond, Reason), which formulates streaming video understanding as a cascaded control problem: for each query, the system compresses memory, judges response readiness, and routes computation sequentially, so that each downstream decision builds on progressively refined information states. To optimize this pipeline, we introduce an age-aware forgetting policy for memory compression, as aggressively compressing historical frames can yield substantial p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multi-agent-ai-systems-outperform-human-teams-in-creativity","title":"Multi-agent AI systems outperform human teams in 
creativity","url":"https://www.microsoft.com/en-us/research/publication/multi-agent-ai-systems-outperform-human-teams-in-creativity/","published":"2026-05-18","authors":["Tiancheng Hu","Yixuan Jiang","Haotian Li","José Hernández-Orallo","Xing Xie","Nigel Collier","David Stillwell","Luning Sun"],"abstract":"Although artificial intelligence (AI) now matches or exceeds human performance across numerous cognitive tasks, creativity remains a highly contested frontier. As AI systems based on large language models (LLMs) are increasingly adopted in research and innovation, it is essential to understand and augment their creativity. Here we demonstrate that multi-agent LLM teams not only surpass single agents, but also substantially outperform human teams in creativity (Cohen's d=1.50) across 4,541 multi-agent LLM ideas and 341 human-team ideas on six diverse problem-solving tasks. This advantage is driven by novelty while maintaining comparable usefulness. To investigate the generative processes in both groups, we represent conversations as paths through semantic space using neural language model representations. 
Both LLM and human teams produce more creative ideas when conversations range widely...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2605.18451","title":"Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis","url":"https://huggingface.co/papers/2605.18451","published":"2026-05-18","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7161574483","title":"Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards","url":"https://doi.org/10.1145/3802105","published":"2026-05-18","authors":["Yuxin Zhang","Meihao Fan","Ju Fan","Mingyang Yi","Yuyu Luo","G Li","Bin Wu","Wenchao Zhou"],"abstract":"Recent advances in large language models (LLMs) trained with reinforcement learning (RL) have improved Text-to-SQL performance. 
However, RL-based approaches still struggle with complex queries due to two key limitations: insufficient stepwise execution-aware reasoning grounded in database feedback, and the lack of process-level rewards for guiding reasoning optimization. To address these issues, we propose CoCTE, a divide-and-conquer and execution-aware reasoning framework that progressively composes SQL queries through intermediate view validation and structured Common Table Expressions (CTEs), improving both accuracy and interpretability. To realize a CoCTE reasoning process, we develop Reward-SQL, a unified approach with three stages: (1) model initialization, which equips LLMs with structured CoCTE reasoning capabilities; (2) process reward design, which delivers fine-grained, execut...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3802105","openalex_id":"https://openalex.org/W7161574483","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Renmin University of China","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7544000148773193},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7038999795913696},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6883999705314636},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.5855000019073486},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5579000115394592},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5127999782562256},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement 
learning","score":0.5042999982833862},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5023999810218811}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.17900","title":"DuIVRS-2: An LLM-based Interactive Voice Response System for Large-scale POI Attribute Acquisition","url":"https://arxiv.org/abs/2605.17900","published":"2026-05-18","authors":["Le Zhang","Shengming Zhang","Rui Zha","Yunpeng Wu","Jingbo Zhou","Jizhou Huang"],"abstract":"Accurate Point of Interest (POI) attribute acquisition is essential for location-based services, yet traditional modular Interactive Voice Response (IVR) systems suffer from error accumulation and high maintenance overhead. We present DuIVRS-2, a large language model (LLM)-based end-to-end framework designed for large-scale POI attribute acquisition at Baidu Maps. To address the long-tail distribution of real-world interactions, our methodology first employs a finite state machine (FSM)-guided data augmentation strategy to synthesize a balanced and diverse training dataset. We then streamline dialogue management via a selective generation scheme combined with a Chain-of-Thought (CoT) mechanism, which ensures output stability and effectively eliminates hallucinations in industrial settings. 
To facilitate continuous policy refinement with minimal manual effort, we design a cooperative iter...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161917082","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8205000162124634},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6265000104904175},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.5863999724388123},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.5471000075340271},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.5349000096321106},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4993000030517578},{"id":"https://openalex.org/C48103436","display_name":"State (computer science)","score":0.4498000144958496},{"id":"https://openalex.org/C2778348673","display_name":"Production (economics)","score":0.42480000853538513}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2512.16227","title":"An information-theoretic framework for robust large language model editing","url":"http://arxiv.org/abs/2512.16227","published":"2026-05-18","authors":["Qizhou Chen","Chengyu Wang","Taolin Zhang","Xiaofeng He"],"abstract":"Large Language Models (LLMs) have become indispensable tools in science, technology, and society, enabling transformative advances across diverse fields. However, errors or outdated information within these models can undermine their accuracy and restrict their safe deployment. 
Developing efficient strategies for updating model knowledge without the expense and disruption of full retraining remains a critical challenge. Current model editing techniques frequently struggle to generalize corrections beyond narrow domains, leading to unintended consequences and limiting their practical impact. Here, we introduce a novel framework for editing LLMs, grounded in information bottleneck theory. This approach precisely compresses and isolates the essential information required for generalizable knowledge correction while minimizing disruption to unrelated model behaviors. Building upon this found...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s44387-026-00114-1","openalex_id":"https://openalex.org/W4417530032","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","East China Normal University","Hefei University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7501000165939331},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.6646999716758728},{"id":"https://openalex.org/C2780767217","display_name":"Generality","score":0.6348000168800354},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.46650001406669617},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43959999084472656},{"id":"https://openalex.org/C60008888","display_name":"Information bottleneck method","score":0.4300999939441681},{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.4271000027656555},{"id":"https://openalex.org/C2776889888","display_name":"Unintended 
consequences","score":0.3790999948978424}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/personaarena-dynamic-simulation-for-evaluating-and-enhancing-persona-level-role-playing-in-large-language-models","title":"PersonaArena: Dynamic Simulation for Evaluating and Enhancing Persona-Level Role-Playing in Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/personaarena-dynamic-simulation-for-evaluating-and-enhancing-persona-level-role-playing-in-large-language-models/","published":"2026-05-16","authors":["Wen Shi","Jianxun Lian","Mingqi Wu","Haiming Qin","Ming Zhou","Xing Xie","Naipeng Chao","Hao Liao"],"abstract":"Large language models (LLMs) increasingly serve as interactive social agents, yet their ability to maintain coherent and authentic persona-level role-playing remains limited, particularly in realistic social scenarios. Existing research predominantly focuses on character-level settings and relies on static evaluation formats, failing to capture the complexity of everyday social interactions. In this work, we present PersonaArena, a dynamic simulation framework for evaluating and improving persona-level role-playing in LLMs. PersonaArena leverages a large, filtered corpus of user-generated social content to construct a nuanced persona bank, and elicits multi-turn, context-rich interactions within simulated social environments. Our framework features a multi-agent debating judge for holistic and unbiased assessment. 
Through extensive experiments, we demonstrate that PersonaArena enables ri...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Gaming"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7161401622","title":"VGL-DPO: Vision-Guided Lexical Direct Preference Optimization for Mitigating Hallucination in Multimodal Large Language Models","url":"https://doi.org/10.1145/3796715","published":"2026-05-16","authors":["Siyuan Li","F H Wang","Simeng Qin","Ranjie Duan","Haonan Cheng","Long Ye"],"abstract":"Multimodal large language models (MLLMs) have achieved significant advancements in multimodal understanding, reasoning, and interaction. However, they still suffer from hallucination, where the generated text often deviates from the factual content of the input image. To mitigate this issue, prior studies have primarily employed direct preference optimization (DPO) for human preference alignment. However, these approaches treat all textual words equally, neglecting the varying significance of individual words in grounding text generation to image content. This limitation hinders fine-grained semantic alignment and consequently constrains their effectiveness in hallucination suppression. To address this limitation, we propose a vision-guided lexical direct preference optimization method, called VGL-DPO. 
Specifically, we quantify the significance of words in positive preference data based....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3796715","openalex_id":"https://openalex.org/W7161401622","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Communication University of China","Northeastern University","Tianjin University"],"concepts":[{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.8478000164031982},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8130999803543091},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.7027999758720398},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6930999755859375},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6047000288963318},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4787999987602234},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.4602999985218048},{"id":"https://openalex.org/C108154423","display_name":"Salience (neuroscience)","score":0.39430001378059387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7161274572","title":"A framework for clinical validation of generative artificial intelligence therapeutics","url":"https://doi.org/10.1002/wps.70067","published":"2026-05-15","authors":["Isaac R. 
Galatzer‐Levy","Nenad Tomašev","Carolyn Rodriguez","John Torous"],"abstract":"While established frameworks exist for assessing the clinical efficacy and effectiveness of human-delivered interventions, and standards are in place for pre-artificial intelligence (AI) chatbots that have achieved clearance from the US Food and Drug Administration (FDA) as companions to psychological treatment, a significant void remains. There are currently no defined standards to determine the efficacy of an AI agent in delivering validated treatment approaches, whether it is assisting with medication management, supporting clinicians, or directly delivering talk therapy. This gap leaves the field vulnerable, confronting a surge of emerging technologies without the necessary tools to ascertain their safety, and if – or for whom – they genuinely work. Autonomous or semi-autonomous AI agents, capable of interacting across diverse modalities – text, voice and images – can both understand...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/wps.70067","openalex_id":"https://openalex.org/W7161274572","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United Kingdom)","Google DeepMind (United Kingdom)","Harvard University","Stanford Medicine"],"concepts":[{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.7631000280380249},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5870000123977661},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.5827999711036682},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5188000202178955},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4535999894142151},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.37940001487731934},{"id":"https://openalex.org/C535046627","display_name":"Clinical trial","score":0.37139999866485596},{"id":"https://openalex.org/C3018890749","display_name":"Food and drug administration","score":0.3467000126838684}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7161607242","title":"3D Segmentation Using Viewpoint-Dependent Spatial Relationships","url":"https://doi.org/10.48550/arxiv.2605.15708","published":"2026-05-15","authors":["Ayaka Nanri","Klara Reichard","Mert Kiray","Federico Tombari","Benjamin Busam","Asako Kanezaki"],"abstract":"Recent advances in 3D datasets and multimodal models have greatly improved natural language 3D scene understanding. However, most 3D referring segmentation methods do not explicitly represent the observer viewpoint, making spatial relations such as \"left,\" \"right,\" \"front,\" and \"behind\" ambiguous and difficult to evaluate. We introduce a viewpoint-aware 3D referring segmentation dataset containing 220k benchmark samples, and scalable to tens of millions of viewpoint-conditioned samples through dense viewpoint sampling. In this dataset, target objects can only be identified through observer-centric spatial relations, making viewpoint-conditioned grounding necessary. We construct the benchmark by leveraging camera poses to automatically annotate observer-centric relations (left/right, front/behind) together with viewpoint-independent relations (above/under). 
Using this benchmark, we evalua...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2605.15708","openalex_id":"https://openalex.org/W7161607242","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["BMW Group (Germany)","Google (United States)","Institute of Science Tokyo","Munich Center for Machine Learning","RIKEN","Shanghai Institute for Science of Science","Sphere Institute","Technical University of Munich","Tohoku University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7749000191688538},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.7228999733924866},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7023000121116638},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6442999839782715},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5543000102043152},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5476999878883362},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4625999927520752},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.44920000433921814}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/orchard-an-open-source-agentic-modeling-framework","title":"Orchard: An Open-Source Agentic Modeling Framework","url":"https://www.microsoft.com/en-us/research/publication/orchard-an-open-source-agentic-modeling-framework/","published":"2026-05-14","authors":["Baolin 
Peng","Wenlin Yao","Qianhui Wu","Hao Cheng","Xiao Yu","Ruiyi Yang","Tao Ge","Alessandro Sordoni","Xingdi Yuan","Yelong Shen","Pengcheng He","Tong Zhang"],"abstract":"Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/groupmembench-benchmarking-llm-agent-memory-in-multi-party-conversations","title":"GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party 
Conversations","url":"https://www.microsoft.com/en-us/research/publication/groupmembench-benchmarking-llm-agent-memory-in-multi-party-conversations/","published":"2026-05-14","authors":["Jingbo Yang","Kwei-Herng Lai","Xiaowen Wang","Shiyu Chang","Y. Harari","Evgeniy Gabrilovich"],"abstract":"Large Language Model (LLM) agents increasingly serve as personal assistants and workplace collaborators, where their utility depends on memory systems that extract, retrieve, and apply information across long-running conversations. However, both existing memory systems and benchmarks are built around the dyadic, single-user setup, even though real deployments routinely span groups and channels with multiple users interacting with the agent and with each other. This mismatch leaves three properties of group memory unmeasured: (i) group dynamics that go beyond concatenated one-on-one chats, (ii) speaker-grounded belief tracking, where the per-user memory modeling is needed, and (iii) audience-adapted language, where Theory-of-Mind shifts produce role-specific vocabulary. We introduce GroupMemBench, a benchmark that exposes all three. 
A graph-grounded synthesis pipeline produces multi-party...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","Natural language processing"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/auditing-agent-harness-safety","title":"Auditing Agent Harness Safety","url":"https://www.microsoft.com/en-us/research/publication/auditing-agent-harness-safety/","published":"2026-05-14","authors":["Chengzhi Liu","Yicheng Guo","Yepeng Liu","Yuzhe Yang","Qianqi Yan","Xuandong Zhao","Wenyue Hua","Shengchao Liu","Sharon Li","Yuheng Bu","Xin Wang"],"abstract":"LLM agents increasingly run inside execution harnesses that dispatch tools, allocate resources, and route messages between specialized components. However, a harness can return a correct, benign answer over a trajectory that accesses unauthorized resources or leaks context to the wrong agent. Output-level evaluation cannot see these failures, yet most safety benchmarks score only final outputs or terminal states, even though many violations occur mid-trajectory rather than at termination. The central question is whether the harness respects user intent, permission boundaries, and information-flow constraints throughout execution. 
To address this gap, we propose HarnessAudit, a framework that audits full execution trajectories across boundary compliance, execution fidelity, and system stability, with a focus on multi-agent harnesses where these risks are most pronounced. We further introd...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","Security and Privacy"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/test-time-learning-with-an-evolving-library","title":"Test-Time Learning with an Evolving Library","url":"https://www.microsoft.com/en-us/research/publication/test-time-learning-with-an-evolving-library/","published":"2026-05-14","authors":["Weijia Xu","Alessandro Sordoni","Chandan Singh","Zelalem Gero","Michel Galley","Xingdi Yuan","Jianfeng Gao"],"abstract":"We introduce EvoLib, a test-time learning framework that enables large language models to accumulate, reuse, and evolve knowledge across problem instances without parameter updates or external supervision. Instead of adapting model parameters, our approach maintains a shared library of knowledge abstractions, including modular skills and reflective insights, automatically extracted from the model's own inference trajectories. To support continual improvement, we introduce a principled weighting and consolidation mechanism that jointly optimizes for immediate utility and long-term value. 
This allows simple, instance-specific abstractions to evolve into more general and reusable ones over time. Across challenging benchmarks in mathematical reasoning, code generation, and multi-turn agentic environments, EvoLib improves substantially over the top test-time scaling and learning methods witho...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/insighttok-improving-text-and-face-fidelity-in-discrete-tokenization-for-autoregressive-image-generation","title":"InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation","url":"https://www.microsoft.com/en-us/research/publication/insighttok-improving-text-and-face-fidelity-in-discrete-tokenization-for-autoregressive-image-generation/","published":"2026-05-14","authors":["Yang Yue","Fangyun Wei","Tianyu He","Jinjing Zhao","Zanlin Ni","Zeyu Liu","Junliang Guo","Lei Shi","Yue Dong","Li Chen","Ji Li","Gao Huang"],"abstract":"Text and faces are among the most perceptually salient and practically important patterns in visual generation, yet they remain challenging for autoregressive generators built on discrete tokenization. A central bottleneck is the tokenizer: aggressive downsampling and quantization often discard the fine-grained structures needed to preserve readable glyphs and distinctive facial features. 
We attribute this gap to standard discrete-tokenizer objectives being weakly aligned with text legibility and facial fidelity, as these objectives typically optimize generic reconstruction while compressing diverse content uniformly. To address this, we propose InsightTok, a simple yet effective discrete visual tokenization framework that enhances text and face fidelity through localized, content-aware perceptual losses. With a compact 16k codebook and a 16x downsampling rate, InsightTok significantly o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Graphics and multimedia","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/metabackdoor-exploiting-positional-encoding-as-a-backdoor-attack-surface-in-llms","title":"MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs","url":"https://www.microsoft.com/en-us/research/publication/metabackdoor-exploiting-positional-encoding-as-a-backdoor-attack-surface-in-llms/","published":"2026-05-14","authors":["Rui Wen","Mark Russinovich","Andrew Paverd","Jun Sakuma","Ahmed Salem"],"abstract":"Backdoor attacks pose a serious security threat to large language models (LLMs), which are increasingly deployed as general-purpose assistants in safety- and privacy-critical applications. Existing LLM backdoors rely primarily on content-based triggers, requiring explicit modification of the input text. 
In this work, we show that this assumption is unnecessary and limiting. We introduce MetaBackdoor, a new class of backdoor attacks that exploits positional information as the trigger, without modifying textual content. Our key insight is that Transformer-based LLMs necessarily encode token positions to process ordered sequences. As a result, length-correlated positional structure is reflected in the model's internal computation and can be used as an effective non-content trigger signal.We demonstrate that even a simple length-based positional trigger is sufficient to activate stealthy bac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Security, privacy, and cryptography","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.15876","title":"Unlocking Dense Metric Depth Estimation in VLMs","url":"https://huggingface.co/papers/2605.15876","published":"2026-05-14","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:baidu:2605.15572","title":"Measuring Maximum Activations in Open Large Language Models","url":"https://huggingface.co/papers/2605.15572","published":"2026-05-14","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"arxiv:2605.18857","title":"The 99% Success Paradox: When Near-Perfect Retrieval Equals Random Selection","url":"https://arxiv.org/abs/2605.18857","published":"2026-05-14","authors":["Vyzantinos Repantis","Harshvardhan Singh","Tony Joseph","Cien Zhang","Akash Vishwakarma","Svetlana Karslioglu","Michael Wyatt Thot","Ameya Gawde"],"abstract":"For most of the history of information retrieval (IR), search results were designed for human consumers who could scan, filter, and discard irrelevant information on their own. This shaped retrieval systems to optimize for finding and ranking more relevant documents, but not keeping results clean and minimal, as the human was the final filter. However, LLMs have changed that by lacking this filtering ability. To address this, we introduce Bits-over-Random (BoR), a chance-corrected measure of retrieval selectivity that reveals when high success rates mask random-level performance. We measure selectivity as $BoR = \\log_{2}\\left(\\frac{\\mathrm{P}_{obs}}{\\mathrm{P}_{rand}}\\right)$, where $\\mathrm{P}_{rand}$ is the hypergeometric baseline for the chosen success rule (here, coverage: $ \\geq1 $ relevant in top-$K$). 
On the 20 Newsgroups dataset, BM25 and SPLADE both report $>99$% success at $K=1...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7162044977","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.6304000020027161},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.5009999871253967},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.4975999891757965},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49070000648498535},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.47269999980926514},{"id":"https://openalex.org/C118792377","display_name":"Selectivity","score":0.36559998989105225},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.365200012922287},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35249999165534973}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.14443","title":"Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience","url":"https://arxiv.org/abs/2605.14443","published":"2026-05-14","authors":["Krishna Sayana","Ketan Todi","Ambarish Jash"],"abstract":"The shift toward interacting with frozen, \"black-box\" Large Language Models (LLMs) has transformed prompt engineering from a heuristic exercise into a critical optimization challenge. 
We propose a Reinforcement Learning (RL) framework for training learned prompting policies via iterative distillation of experience. In this architecture, a lightweight prompter model is optimized to maximize task-specific rewards for a larger, frozen worker LLM. By utilizing a contrastive experience buffer that couples scalar rewards with dense textual critiques, our approach effectively amortizes iterative prompt refinement into single-shot policy weights. Our experimental analysis focuses on the Big Bench Extra Hard (BBEH) and Tau-bench suites, covering a diverse range of multi-step reasoning and tool-use tasks. We demonstrate significant gains, improving performance from 55% to 90% in logic-intensive re...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161452369","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.661899983882904},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.5285999774932861},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5174000263214111},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4997999966144562},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.4787999987602234},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.44620001316070557},{"id":"https://openalex.org/C143587482","display_name":"Iterative and incremental development","score":0.42820000648498535},{"id":"https://openalex.org/C198531522","display_name":"Sample 
(material)","score":0.4278999865055084}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/what-to-ignore-what-to-react-visually-robust-rl-fine-tuning-of-vla-models","title":"What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA Models","url":"https://www.microsoft.com/en-us/research/publication/what-to-ignore-what-to-react-visually-robust-rl-fine-tuning-of-vla-models/","published":"2026-05-13","authors":["Yu Peng","Jingjing Fu","Chuheng Zhang","Li Zhao","Jiang Bian","Mingyu Liu","Ling Zhang","Jun Zhang","Rui Wang"],"abstract":"Reinforcement learning (RL) fine-tuning has shown promise for Vision-Language-Action (VLA) models in robotic manipulation, but deployment-time visual shifts pose practical challenges. A key difficulty is that standard task rewards supervise task success, but offer limited guidance on whether a visual change is task-irrelevant or changes the behavior required for manipulation. We propose PAIR-VLA (Paired Action Invariance&Sensitivity for Visually Robust VLA), an RL fine-tuning framework to address this difficulty by adding two auxiliary objectives over paired visual variants during PPO optimization: an invariance term that reduces the discrepancy between action distributions for a task-preserving pair (e.g., different distractors), and a sensitivity objective that encourages separable action distributions for a task-altering pair (e.g., target object in a different pose). 
Together, these....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","Robotics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pdcr-perception-decomposed-confidence-reward-for-vision-language-reasoning","title":"PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning","url":"https://www.microsoft.com/en-us/research/publication/pdcr-perception-decomposed-confidence-reward-for-vision-language-reasoning/","published":"2026-05-13","authors":["Heegeon Yoon","Eunseop Yoon","Jinqiu Hong","Soo-Hwan Eom","Gwanhyeong Koo","M. Hasegawa-Johnson","Qi Dai","Chong Luo","C. D. Yoo"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) traditionally relies on a sparse, outcome-based signal. Recent work shows that providing a fine-grained, model-intrinsic signal (rewarding the confidence growth in the ground-truth answer) effectively improves language reasoning training by providing step-level guidance without costly external models. While effective for unimodal text, we find that naively applying this global reward to vision-language (V-L) reasoning is a suboptimal strategy, as the task is a heterogeneous mix of sparse visual perception and dense textual reasoning. This global normalization creates mixture-induced signal degradation, where the training signal for visual steps is statistically distorted by the predominant textual steps. 
We propose Perception-Decomposed Confidence Reward (PDCR), a framework that solves this by aligning the reward structure with the ta...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computation and Language","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2605.13424","title":"LIFT: Last-Mile Fine-Tuning for Table Explicitation","url":"https://arxiv.org/abs/2605.13424","published":"2026-05-13","authors":["Divij Khaitan","Ashish Tiwari"],"abstract":"We propose last-mile fine-tuning, or Lift, a pipeline in which a pre-trained large language model extracts an initial table from unstructured clipboard text, and a fine-tuned small language model (1B-24B parameters SLM) repairs errors in the extracted table. On a benchmark of 2,596 tables from three datasets, Lift matches or exceeds end-to-end SLM fine-tuning on tree-edit-distance-based similarity (TEDS) metric while requiring as little as 1,000 training examples - where it outperforms end-to-end fine-tuning by up to 0.144 TEDS points. We term this approach last-mile fine-tuning and show it also more robust to input format variability. 
Comparisons with self-debug and end-to-end fine-tuning approaches show that last-mile fine-tuning provides an attractive option when training data is limited or when robustness to input variation is sought without compromising on accuracy.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161354387","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","Natural language processing"],"author_affiliations":["Microsoft (United States)","Microsoft Research (India)","Microsoft"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7245000004768372},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.683899998664856},{"id":"https://openalex.org/C45235069","display_name":"Table (database)","score":0.5514000058174133},{"id":"https://openalex.org/C139002025","display_name":"Lift (data mining)","score":0.5364999771118164},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48809999227523804},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4779999852180481},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.47530001401901245},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.4699999988079071}],"official_report":true,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/find-toward-multimodal-financial-reasoning-and-question-answering-for-indic-languages","title":"FIND: Toward Multimodal Financial 
Reasoning and Question Answering for Indic Languages","url":"https://www.microsoft.com/en-us/research/publication/find-toward-multimodal-financial-reasoning-and-question-answering-for-indic-languages/","published":"2026-05-13","authors":["Sarmistha Das","V. Vishal","Syed Ibrahim Ahmad","Manish Gupta","Sriparna Saha"],"abstract":"Financial decision-making in multilingual settings demands accurate numerical reasoning grounded in diverse modalities, yet existing benchmarks largely overlook this high-stakes, real-world challenge, especially for Indic languages. We introduce FinVQA, a benchmark for evaluating financial numerical and multimodal reasoning in multilingual Indic contexts. FinVQA spans English, Hindi, Bengali, Marathi, Gujarati, and Tamil, and comprises 18,900 samples across 14 financial domains. The dataset captures diverse reasoning paradigms under realistic constraints, and is structured across three difficulty levels (easy, moderate, hard) and four question formats: multiple choice, fill-in-the-blank, table matching, and true/false. 
To address these challenges, we propose FIND, a framework that combines supervised fine-tuning with constraint-aware decoding to promote faithful numerical reasoning, robu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/thinking-ahead-prospection-guided-retrieval-of-memory-with-language-models","title":"Thinking Ahead: Prospection-Guided Retrieval of Memory with Language Models","url":"https://www.microsoft.com/en-us/research/publication/thinking-ahead-prospection-guided-retrieval-of-memory-with-language-models/","published":"2026-05-13","authors":["Harshita Chopra","Krishna Chintalapudi","Suman Nath","Ryen W. White","Chirag Shah"],"abstract":"Long-horizon personalization requires dialogue assistants to retrieve user-specific facts from extended interaction histories. In practice, many relevant facts often have low semantic similarity to the query under dense retrieval. Standard Retrieval-Augmented Generation (RAG) and GraphRAG systems are still largely retrospective: they rely on embedding similarity to the query or on fixed graph traversals, so they often miss facts that matter for the user's needs but lie far from the query in embedding space. 
Inspired by prospection, the human ability to use imagined futures as cues for recall, we introduce Prospection-Guided Retrieval (PGR), which decouples retrieval from how memories are stored. Given a user query, PGR first expands the goal into a short Tree-of-Thought (ToT) or linear chain of plausible next steps, and uses these steps as retrieval probes rather than relying on the origi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Search and information retrieval","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gridsfm-a-foundation-model-for-ac-optimal-power-flow","title":"GridSFM: A Foundation Model for AC Optimal Power Flow","url":"https://www.microsoft.com/en-us/research/publication/gridsfm-a-foundation-model-for-ac-optimal-power-flow/","published":"2026-05-13","authors":["Weiwei Yang","Andrea Britto Mattos Lima","Thiago V. Spina","Spencer Fowers","Baosen Zhang","Chris White"],"abstract":"Grid Small Foundation Model (GridSFM) is a foundation model for power systems trained on 200 grids and over half a million scenarios. It predicts AC-OPF solutions in milliseconds: given a grid topology and loading conditions, it produces bus voltages, generator dispatch, branch power flows, and a feasibility classification without running a solver, and when higher confidence is needed its predictions serve as warm starts that accelerate conventional solvers. 
GridSFM is released in two tiers, GridSFM-Open (∼15M parameters, for grids up to a few thousand buses, research and prototyping) and GridSFM-Premier (∼100M parameters, for production scale grids up to tens of thousands of buses), both sharing the same architecture and trained on a broad open-data corpus of transmission topologies and operating scenarios spanning feasible and infeasible regimes via a multi-axis perturbation pipeline.....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Unpublished","Systems and networking"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:baidu:2605.14589","title":"EndPrompt: Efficient Long-Context Extension via Terminal Anchoring","url":"https://huggingface.co/papers/2605.14589","published":"2026-05-13","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"arxiv:2605.13801","title":"Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling","url":"https://arxiv.org/abs/2605.13801","published":"2026-05-13","authors":["Deepak Pandita","Flip 
Korn","Chris Welty","Christopher M. Homan"],"abstract":"As generative AI models such as large language models (LLMs) become more pervasive, ensuring the safety, robustness, and overall trustworthiness of these systems is paramount. However, AI is currently facing a reproducibility crisis driven by unreliable evaluations and unrepeatable experimental results. While human raters are often used to assess models for utility and safety, they introduce divergent biases and subjective opinions into their annotations. Overcoming this variance is exceptionally challenging because very little data exists to study how experimental repeatability actually improves as the annotator pool grows. Standard evaluation practices typically rely on a small number of annotations per item (often 3 to 5) and lack the persistent rater identifiers necessary to model individual variance across items. In this work, we introduce a multi-level bootstrapping approach to rea...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161354116","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Rochester Institute of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7545999884605408},{"id":"https://openalex.org/C196083921","display_name":"Variance (accounting)","score":0.660099983215332},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.642799973487854},{"id":"https://openalex.org/C207609745","display_name":"Bootstrapping (finance)","score":0.6327000260353088},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5307000279426575},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.5087000131607056},{"id":"https://openalex.org/C97256817","display_name":"Spurious relationship","score":0.43799999356269836},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.43720000982284546}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7161181628","title":"Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning","url":"https://doi.org/10.48550/arxiv.2605.12906","published":"2026-05-13","authors":["Siyuan Liu","Tinghong Chen","X F Li","Yifei Wang","Jingzhao Zhang"],"abstract":"Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity, difficulty, or length, the reported findings are often inconsistent or context-dependent. In this work, we systematically study the role of data difficulty in fine-tuning from both empirical and theoretical perspectives, and find that there is no universally optimal difficulty level; rather, its effectiveness depends on the dataset size. We show that for a fixed data budget, there exists an optimal data difficulty for SFT, and that this optimal difficulty shifts toward harder data as the data budget increases. 
To explain this phenomenon, we conduct controlled synthetic experiments that reveal a simple underlying mechanism: the interplay between the (in-distribution) ge...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2605.12906","openalex_id":"https://openalex.org/W7161181628","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Donghua University","Huanghuai University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.8034999966621399},{"id":"https://openalex.org/C127705205","display_name":"Heuristics","score":0.7276999950408936},{"id":"https://openalex.org/C132459708","display_name":"Extrapolation","score":0.6953999996185303},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6927000284194946},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5778999924659729},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5726000070571899},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4803999960422516},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.41620001196861267}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.13778","title":"Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs","url":"https://huggingface.co/papers/2605.13778","published":"2026-05-13","authors":["Jiahui Niu","Kefan Gu","Yucheng Zhao","Shengwen Liang","Tiancai Wang","Xing Hu","Ying Wang","Huawei Li"],"abstract":"Diffusion-based vision-language-action models (dVLAs) are promising 
for embodied intelligence but are fundamentally limited in real-time deployment by the high latency of full inference. We propose Realtime-VLA FLASH, a speculative inference framework that eliminates most full inference calls during replanning by introducing a lightweight draft model with parallel verification via the main model's Action Expert and a phase-aware fallback mechanism that reverts to the full inference pipeline when needed. This design enables low-latency, high-frequency replanning without sacrificing reliability. Experiments show that on LIBERO, FLASH largely preserves task performance by replacing many 58.0 ms full-inference rounds with speculative rounds as fast as 7.8 ms, lowering task-level average inference latency to 19.1 ms (3.04x speedup). We additionally demonstrate effectiveness on real-world conv...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/kairos-a-scalable-serving-system-for-physical-ai","title":"Kairos: A Scalable Serving System for Physical AI","url":"https://www.microsoft.com/en-us/research/publication/kairos-a-scalable-serving-system-for-physical-ai/","published":"2026-05-12","authors":["Yinwei Dai","Ganesh Ananthanarayanan","Landon Cox","Xenofon Foukas","Bozidar Radunovic","Ravi Netravali"],"abstract":"Physical AI is experiencing rapid growth with frontier foundation models increasing its capabilities across general environments. Physical AI tasks are characterized by inference properties that are markedly different from digital AI. 
They consist of multiple rounds of inference and action execution, generating a chunk of actions in each inference round, and asynchronously interleaving inference and execution. This makes existing digital AI serving systems unsuited for physical AI; a shortcoming that is critical for enabling their wide adoption, considering their size and the scale of the robot fleets they have to serve. To fill this gap, we design Kairos, the first multi-robot serving system that makes the generate-execute loop a first-class citizen, with active involvement in the execution phase. Across a wide range of physical AI models and robots, Kairos reduces the average end-to-en...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Hardware and devices","Systems and networking","Computer science","systems and networking"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/disabench-a-participatory-evaluation-framework-for-disability-harms-in-language-models","title":"DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models","url":"https://www.microsoft.com/en-us/research/publication/disabench-a-participatory-evaluation-framework-for-disability-harms-in-language-models/","published":"2026-05-12","authors":["Eugenia Kim","Ioana Tănase","Christina Mallon"],"abstract":"General-purpose safety benchmarks for large language models do not adequately evaluate disability-related harms. 
We introduce DisaBench: a taxonomy of twelve disability harm categories co-created with people with disabilities and red teaming experts, a taxonomy-driven evaluation methodology that pairs benign and adversarial prompts across seven life domains, and a dataset of 175 prompts with human-annotated labels on 525 prompt-response pairs. Annotation by four evaluators with lived disability experience reveals three findings: harm rates vary sharply by disability type and will compound in non-text modalities, terminology-driven harm is culturally and temporally bound rather than universally assessable, and standard safety evaluation catches overt failures while missing the subtle harms that only domain expertise can recognize. Disability harm is simultaneously personal, intersectional...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science","Health care"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/covering-human-action-space-for-computer-use-data-synthesis-and-benchmark","title":"Covering Human Action Space for Computer Use: Data Synthesis and Benchmark","url":"https://www.microsoft.com/en-us/research/publication/covering-human-action-space-for-computer-use-data-synthesis-and-benchmark/","published":"2026-05-12","authors":["Miaosen Zhang","Xiaohan Zhao","Zhihong Tan","Huoshen Zhou","Yijia Fan","Yifan Yang","Kai Qiu","Bei Liu","Justin Wagle","Chenzhong Yin","Mingxi 
Cheng","Ji Li"],"abstract":"Computer-use agents (CUAs) automate on-screen work, as illustrated by GPT-5.4 and Claude. Yet their reliability on complex, low-frequency interactions is still poor, limiting user trust. Our analysis of failure cases from advanced models suggests a long-tail pattern in GUI operations, where a relatively small fraction of complex and diverse interactions accounts for a disproportionate share of task failures. We hypothesize that this issue largely stems from the scarcity of data for complex interactions. To address this problem, we propose a new benchmark CUActSpot for evaluating models' capabilities on complex interactions across five modalities: GUI, text, table, canvas, and natural image, as well as a variety of actions (click, drag, draw, etc.), covering a broader range of interaction types than prior click-centric benchmarks that focus mainly on GUI widgets. We also design a renderer-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Computer science","Human–computer interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-grpo-and-on-policy-distillation-an-empirical-sparse-to-dense-reward-principle-for-language-model-post-training","title":"Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model 
Post-Training","url":"https://www.microsoft.com/en-us/research/publication/beyond-grpo-and-on-policy-distillation-an-empirical-sparse-to-dense-reward-principle-for-language-model-post-training/","published":"2026-05-12","authors":["Yuan Xu","Hejian Sang","Zhengze Zhou","Ran He","Zhipeng Wang","A. Geramifard"],"abstract":"In settings where labeled verifiable training data is the binding constraint, each checked example should be allocated carefully. The standard practice is to use this data directly on the model that will be deployed, for example by running GRPO on the deployment student. We argue that this is often an inefficient allocation because it overlooks a reward-density principle: sparse sequence-level reward should train models where exploration is productive, while dense token-level teacher reward should be used where the aim is to compress behavior into a smaller model. In this view, GRPO-style sparse RL and OPD-style dense teacher supervision are not separate recipes; they are different reward-density regimes. 
The allocation rule is simple: use scarce labeled training data upstream on the strongest model that can turn it into reward-shaped behavior, then transfer that behavior downstream as d...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agent-brace-decoupling-beliefs-from-actions-in-long-horizon-tasks-via-verbalized-state-uncertainty","title":"Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty","url":"https://www.microsoft.com/en-us/research/publication/agent-brace-decoupling-beliefs-from-actions-in-long-horizon-tasks-via-verbalized-state-uncertainty/","published":"2026-05-12","authors":["Joykirat Singh","Zaid Khan","Archiki Prasad","Junyan Chen","Akshay Nambi","Hyunji Lee","Elias Stengel-Eskin","Mohit Bansal"],"abstract":"Large language models (LLMs) are increasingly deployed on long-horizon tasks in partially observable environments, where they must act while inferring and tracking a complex environment state over many steps. This leads to two challenges: partial observability requires maintaining uncertainty over unobserved world attributes, and long interaction history causes context to grow without bound, diluting task-relevant information. 
A principled solution to both challenges is a belief state: a posterior distribution over environment states given past observations and actions, which compactly encodes history for decision making regardless of episode length. In LLM agents, however, the open-ended nature of text makes it unclear how to represent such a distribution. Therefore, we introduce Agent-BRACE: Agent Belief state Representation via Abstraction and Confidence Estimation, a method that deco...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","large language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/no-one-knows-the-state-of-the-art-in-geospatial-foundation-models","title":"No One Knows the State of the Art in Geospatial Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/no-one-knows-the-state-of-the-art-in-geospatial-foundation-models/","published":"2026-05-12","authors":["Isaac Corley","Nils Lehmann","Caleb Robinson","Gabriel Tseng","Anthony Fuller","Hamed Alemohammad","Evan Shelhamer","Jennifer Marcus","Hannah Kerner"],"abstract":"Geospatial foundation models (GFMs) have been proposed as generalizable backbones for disaster response, land-cover mapping, food-security monitoring, and other high-stakes Earth-observation tasks. 
Yet the published work about these models does not give reviewers or users enough information to tell which model fits a given task. We argue that nobody knows what the current state of the art is in geospatial foundation models. The methods may be useful, but the GFM literature does not standardize evaluations, training and testing protocols, released weights, or pretraining controls well enough for anyone to compare or rank them. In a 152-paper audit, we find 46 cross-paper disagreements of at least 10 points for the same model, benchmark, and protocol; 94/126 papers with extractable pretraining data use a configuration no other paper uses; and 39% of GFM papers release no model weights. Thi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multi-rollout-on-policy-distillation-via-peer-successes-and-failures","title":"Multi-Rollout On-Policy Distillation via Peer Successes and Failures","url":"https://www.microsoft.com/en-us/research/publication/multi-rollout-on-policy-distillation-via-peer-successes-and-failures/","published":"2026-05-12","authors":["Weicheng Yu","Xiaomin Li","Yizhou Zhao","Xiaoze Liu","Ruowang Zhang","Haixin Wang","Yinyi Luo","Chenglin Wu","Gaurav Mittal","Matt Fredrikson","Yu Hu"],"abstract":"Large language models are often post-trained with sparse verifier rewards, which indicate whether a sampled 
trajectory succeeds but provide limited guidance about where reasoning succeeds or fails. On-policy distillation (OPD) offers denser token-level supervision by training on student-generated trajectories, yet existing methods typically distill each rollout independently and ignore the other attempts sampled for the same prompt. We introduce Multi-Rollout On-Policy Distillation (MOPD), a peer-conditioned distillation framework that uses the student's local rollout group to construct more informative teacher signals. MOPD conditions the teacher on both successful and failed peer rollouts: successes provide positive evidence for valid reasoning patterns, while failures provide structured negative evidence about plausible mistakes to avoid. We study two peer-context constructions: posit...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gear-granularity-adaptive-advantage-reweighting-for-llm-agents-via-self-distillation","title":"GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation","url":"https://www.microsoft.com/en-us/research/publication/gear-granularity-adaptive-advantage-reweighting-for-llm-agents-via-self-distillation/","published":"2026-05-12","authors":["Sijia Li","Yuchen Huang","Zifan Liu","Yanping Li","Jingjing Fu","Li Zhao","Jiang Bian","Ling Zhang","Jun Zhang","Rui 
Wang"],"abstract":"Reinforcement learning has become a widely used post-training approach for LLM agents, where training commonly relies on outcome-level rewards that provide only coarse supervision. While finer-grained credit assignment is promising for effective policy updates, obtaining reliable local credit and assigning it to the right parts of the long-horizon trajectory remains an open challenge. In this paper, we propose Granularity-adaptivE Advantage Reweighting (GEAR), an adaptive-granularity credit assignment framework that reshapes the trajectory-level GRPO advantage using token- and segment-level signals derived from self-distillation. GEAR compares an on-policy student with a ground-truth-conditioned teacher to obtain a reference-guided divergence signal for identifying adaptive segment boundaries and modulating local advantage weights. This divergence often spikes at the onset of a semantic....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2605.13565","title":"Qwen-Image-VAE-2.0 Technical Report","url":"https://huggingface.co/papers/2605.13565","published":"2026-05-12","authors":["Alibaba/Qwen"],"abstract":"We present Qwen-Image-VAE-2.0, a suite of high-compression Variational Autoencoders (VAEs) that achieve significant advances in both reconstruction fidelity and diffusability. 
To address the reconstruction bottlenecks of high compression, we adopt an improved architecture featuring Global Skip Connections (GSC) and expanded latent channels. Moreover, we scale training to billions of images and incorporate a synthetic rendering engine to improve performance in text-rich scenarios. To tackle the convergence challenges of high-dimensional latent space, we implement an enhanced semantic alignment strategy to make the latent space highly amenable to diffusion modeling. To optimize computational efficiency, we leverage an asymmetric and attention-free encoder-decoder backbone to minimize encoding overhead. We present a comprehensive evaluation of Qwen-Image-VAE-2.0 on public reconstruction ben...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:deepseek-ai:2605.13027","title":"PRISM: Prior Rectification and Uncertainty-Aware Structure Modeling for Diffusion-Based Text Image Super-Resolution","url":"https://huggingface.co/papers/2605.13027","published":"2026-05-12","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.11739","title":"Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation","url":"https://huggingface.co/papers/2605.11739","published":"2026-05-12","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:stepfun-ai:2605.12034","title":"Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation","url":"https://huggingface.co/papers/2605.12034","published":"2026-05-12","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/stepfun-ai/papers"}},{"id":"official:a27297bde9732f2e","title":"report","url":"https://huggingface.co/tencent/Hy-MT2-1.8B-FP8/blob/main/HY_MT2_0_Report.pdf","published":"2026-05-12","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_repository_scan"],"source":"official_repository_scan","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace repo tencent/Hy-MT2-1.8B-FP8"}},{"id":"arxiv:2605.11629","title":"OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models","url":"https://arxiv.org/abs/2605.11629","published":"2026-05-12","authors":["Yuanhao Yue","Chengyu Wang","Yuanjie Lyu","Lei Shen","Jun Huang"],"abstract":"Recent multimodal large language models (MLLMs) have shown strong chain-of-thought (CoT) reasoning ability on vision-language tasks, but their direct deployment in real-world systems is often limited by latency and resource constraints. In practice, smaller MLLMs are preferred for online serving, yet their reasoning performance is bottlenecked by the lack of large-scale, high-quality multimodal CoT supervision. In this paper, we present OmniThoughtVis, a scalable data curation and distillation pipeline for transferring multimodal reasoning capabilities from high-capacity teacher models to smaller, deployment-oriented MLLMs. Starting from a diverse open-source seed pool, our pipeline generates structured CoT traces and performs joint annotation of reasoning difficulty, answer quality, and semantic task tags. 
To maintain data quality at scale, we combine rule-based filtering, difficulty-aw...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161204657","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7391999959945679},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6654999852180481},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6550999879837036},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.5637999773025513},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5023000240325928},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.46399998664855957},{"id":"https://openalex.org/C61224824","display_name":"Mixture model","score":0.4171000123023987},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.36559998989105225}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.12419","title":"ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging","url":"https://arxiv.org/abs/2605.12419","published":"2026-05-12","authors":["Neha Verma","Nikhil Mehta","Shao-Chuan Wang","Naijing Zhang","Alicia Tsai","Li Wei","Lukasz Heldt","Lichan Hong","Ed Chi","Xinyang Yi"],"abstract":"Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning 
abilities. This work investigates and addresses this challenge in the context of the Generative Retrieval (GenRetrieval) task. During GenRetrieval fine-tuning, we find this forgetting occurs rapidly and correlates with the distance between the fine-tuned and original model parameters. Given these observations, we propose ORBIT, a novel approach that actively tracks the distance between fine-tuned and initial model weights, and uses a weight averaging strategy to constrain model drift during GenRetrieval fine-tuning when this inter-model distance exceeds a maximum threshold. Our results show that ORBIT retains substantial text and retrieval performance by outperforming both common continual lea...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161204586","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Johns Hopkins University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7318999767303467},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.7196999788284302},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.5722000002861023},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5406000018119812},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5353999733924866},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5199000239372253},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5077000260353088},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.3596999943256378}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160954061","title":"From evaluator to principal: the agentic AI literacy framework (AALF) for delegated autonomy","url":"https://doi.org/10.1007/s43681-026-01167-3","published":"2026-05-12","authors":["Rohith Nama"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s43681-026-01167-3","openalex_id":"https://openalex.org/W7160954061","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C65414064","display_name":"Autonomy","score":0.6244000196456909},{"id":"https://openalex.org/C108170787","display_name":"Agency (philosophy)","score":0.5974000096321106},{"id":"https://openalex.org/C44725695","display_name":"Normative","score":0.5914000272750854},{"id":"https://openalex.org/C547764534","display_name":"Literacy","score":0.5371000170707703},{"id":"https://openalex.org/C39389867","display_name":"Corporate governance","score":0.5109000205993652},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45590001344680786},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.4474000036716461},{"id":"https://openalex.org/C14224292","display_name":"Conceptual framework","score":0.37130001187324524}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.11775","title":"Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and 
Control","url":"https://arxiv.org/abs/2605.11775","published":"2026-05-12","authors":["Jiazheng Zhang","Ziche Fu","Junrui Shen","Yunbin Zhao","Yunke Zhang","Zhiheng Xi","Long Ma","Chenxin An","Zhihao Zhang","Shichun Liu","Dingwei Zhu","Shihan Dou"],"abstract":"Policy entropy has emerged as a fundamental measure for understanding and controlling exploration in reinforcement learning with verifiable rewards (RLVR) for LLMs. However, existing entropy-aware methods mainly regulate entropy through global objectives, while the token-level mechanism by which sampled policy updates reshape policy entropy remains underexplored. In this work, we develop a theoretical framework of entropy mechanics in RLVR. Our analysis yields a first-order approximation of the entropy change, giving rise to entropy polarity, a signed token-level quantity that predicts how much a sampled update expands or contracts entropy. This analysis further reveals a structural asymmetry: reinforcing frequent high-probability tokens triggers contraction tendencies, whereas expansive tendencies typically require lower-probability samples or stronger distributional correction. 
Empiric...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161203877","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of time)","score":0.6990000009536743},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5634999871253967},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5013999938964844},{"id":"https://openalex.org/C2780502288","display_name":"Expansive","score":0.47540000081062317},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.41019999980926514},{"id":"https://openalex.org/C121864883","display_name":"Statistical physics","score":0.39879998564720154},{"id":"https://openalex.org/C101721835","display_name":"Conditional entropy","score":0.3564000129699707},{"id":"https://openalex.org/C125252325","display_name":"Entropy rate","score":0.3472000062465668}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160925988","title":"Assessing generative modeling approaches for free energy estimates in condensed matter","url":"https://doi.org/10.1063/5.0320214","published":"2026-05-12","authors":["Maximilian Schebek","J HE","Emil Hoffmann","Yuanqi Du","Frank Noé","Jutta Rogal"],"abstract":"The accurate estimation of free energy differences between two states is a long-standing challenge in molecular simulations. Traditional approaches generally rely on sampling multiple intermediate states to ensure sufficient overlap in phase space and are, consequently, computationally expensive. 
Boltzmann generators and related generative-model-based methods have recently addressed this challenge by learning a direct probability density transform between two states. However, it remains unclear which approach provides the best trade-off between efficiency, accuracy, and scalability. In this work, we review and benchmark selected generative approaches for condensed-matter systems, including discrete and continuous normalizing flows for targeted free energy perturbation and FEAT (Free Energy Estimators with Adaptive Transport) combined with the escorted Jarzynski equality, using coarse-gra...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1063/5.0320214","openalex_id":"https://openalex.org/W7160925988","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Cornell University","Flatiron Health (United States)","Freie Universität Berlin","Microsoft (United States)","Microsoft Research (United Kingdom)","Rice University","University of Cambridge"],"concepts":[{"id":"https://openalex.org/C185429906","display_name":"Estimator","score":0.5857999920845032},{"id":"https://openalex.org/C162641036","display_name":"Free energy perturbation","score":0.5449000000953674},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.501800000667572},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4887999892234802},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4876999855041504},{"id":"https://openalex.org/C186370098","display_name":"Energy (signal processing)","score":0.45879998803138733},{"id":"https://openalex.org/C121864883","display_name":"Statistical physics","score":0.42250001430511475},{"id":"https://openalex.org/C126255220","display_name":"Mathematical 
optimization","score":0.40529999136924744}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.11887","title":"Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models","url":"https://huggingface.co/papers/2605.11887","published":"2026-05-12","authors":["Boyi Deng","Xu Wang","Yaoning Wang","Yu Wan","Yubo Ma","Baosong Yang","Haoran Wei","Jialong Tang","Huan Lin","Ruize Gao","Tianhao Li","Qian Cao"],"abstract":"Large language models have achieved remarkable capabilities across diverse tasks, yet their internal decision-making processes remain largely opaque, limiting our ability to inspect, control, and systematically improve them. This opacity motivates a growing body of research in mechanistic interpretability, with sparse autoencoders (SAEs) emerging as one of the most promising tools for decomposing model activations into sparse, interpretable feature representations. We introduce Qwen-Scope, an open-source suite of SAEs built on the Qwen model family, comprising 14 groups of SAEs across 7 model variants from the Qwen3 and Qwen3.5 series, covering both dense and mixture-of-expert architectures. 
Built on top of these SAEs, we show that SAEs can go beyond post-hoc analysis to serve as practical interfaces for model development along four directions: (i) inference-time steering, where SAE feat...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rebellious-student-reversing-teacher-signals-for-reasoning-exploration-with-self-distilled-rlvr","title":"Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR","url":"https://www.microsoft.com/en-us/research/publication/rebellious-student-reversing-teacher-signals-for-reasoning-exploration-with-self-distilled-rlvr/","published":"2026-05-11","authors":["Jeonghye Kim","Jiwon Jeon","Dongsheng Li","Yuqing Yang"],"abstract":"Self-distillation has emerged as a powerful framework for post-training LLMs, where a teacher conditioned on extra information guides a student without it, both from the same model. While this guidance is useful when the student has failed, on successful rollouts, the same mechanism instead overwrites the student's choices and suppresses its own reasoning. Therefore, we propose reading the original self-distillation signal in reverse: when the student succeeds along a path the teacher would not have predicted, these tokens reflect its self-driven reasoning. Building on this, we propose RLRT (RLVR with Reversed Teacher), which augments GRPO by reinforcing these tokens on correct rollouts. 
We interpret this as a new form of exploration in RLVR: not uniform diversity, but valuable exploration grounded in the student's own success. Across base, instruction-tuned, and thinking-tuned Qwen3 ch...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reinforce-adjoint-matching-scaling-rl-post-training-of-diffusion-and-flow-matching-models","title":"Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models","url":"https://www.microsoft.com/en-us/research/publication/reinforce-adjoint-matching-scaling-rl-post-training-of-diffusion-and-flow-matching-models/","published":"2026-05-11","authors":["Andreas Bergmeister","Stefanie Jegelka","Nikolas Nusken","Carles Domingo-Enrich","Jakiw Pidstrigach"],"abstract":"Diffusion and flow-matching models scale because pretraining is supervised regression: a clean sample is noised analytically, and a model regresses against a closed-form target. RL post-training aligns the model with a reward. In image generation, this makes samples compose objects correctly, render text legibly, and match human preferences. Existing methods rely on costly SDE rollouts, reward gradients, or surrogate losses, sacrificing pretraining's regression structure. We show that the structure extends to RL post-training. 
Under KL-regularized reward maximization, the optimal generative process tilts the clean-endpoint distribution towards samples with higher reward and leaves the noising law unchanged. Combining this with the adjoint-matching optimality condition and a REINFORCE identity, we derive Reinforce Adjoint Matching (RAM): a consistency loss that corrects the pretraining ta...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/revision-scaling-computer-use-agents-via-temporal-visual-redundancy-reduction","title":"ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction","url":"https://www.microsoft.com/en-us/research/publication/revision-scaling-computer-use-agents-via-temporal-visual-redundancy-reduction/","published":"2026-05-11","authors":["Amirhossein Abaskohi","Yuhang He","Peter West","Giuseppe Carenini","Pranit Chawla","Vibhav Vineet"],"abstract":"Computer-use agents (CUAs) rely on visual observations of graphical user interfaces, where each screenshot is encoded into a large number of visual tokens. As interaction trajectories grow, the token cost increases rapidly, limiting the amount of history that can be incorporated under fixed context and compute budgets. This has resulted in no or very limited improvement in the performance when using history unlike other domains. 
We address this inefficiency by introducing ReVision, which is used to train multimodal language models on trajectories where redundant visual patches are removed using a learned patch selector that compares patch representations across consecutive screenshots while preserving spatial structure required by the model. Across three benchmarks, OSWorld, WebTailBench, and AgentNetBench, when processing trajectories with 5 history screenshots using Qwen2.5-VL-7B, ReVi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/deeprefine-agent-compiled-knowledge-refinement-via-reinforcement-learning","title":"DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/deeprefine-agent-compiled-knowledge-refinement-via-reinforcement-learning/","published":"2026-05-11","authors":["Haoyu Huang","Jiaxin Bai","Shujie Liu","Yang Wei","Hong Ting Tsang","Yisen Gao","Zhongwei Xie","Yufei Li","Yangqiu Song"],"abstract":"Agent-compiled knowledge bases provide persistent external knowledge for large language model (LLM) agents in open-ended, knowledge-intensive downstream tasks. 
Yet their quality is systematically limited by incompleteness, incorrectness, and redundancy, manifested as missing evidence or cross-document links, low-confidence or imprecise claims, and ambiguous or coreference resolution issues. Such defects compound under iterative use, degrading retrieval fidelity and downstream task performance. We present DeepRefine, a general LLM-based reasoning model for agent-compiled knowledge refinement that improves the quality of any pre-constructed knowledge bases with user queries to make it more suitable for the downstream tasks. DeepRefine performs multi-turn interactions with the knowledge base and conducts abductive diagnosis over interaction history, localizes...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.11711","title":"Debiased Model-based Representations for Sample-efficient Continuous Control","url":"https://huggingface.co/papers/2605.11711","published":"2026-05-11","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"apple:ud32dwhs02793g6lvzl55s4t","title":"BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning","url":"https://machinelearning.apple.com/research/balcaprl-mllm-image-captioning","published":"2026-05-11","authors":["Shaokai Ye","Vasileios Saveris","Yihao Qian","Jiaming Hu","Elmira Amirloo","Peter Grasch"],"abstract":"Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinforcement learning (RL). However, existing captioning-RL methods and evaluation metrics often emphasize a narrow notion of caption quality, inducing...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7160848864","title":"An Interpretable Multi-Modal Ensemble Framework for Breast Cancer Analysis Using Imaging, Omics and Biomedical Literature","url":"https://doi.org/10.3991/ijoe.v22i05.60535","published":"2026-05-11","authors":["Sayeedakhanum Pathan","Dhanush Kandagatla","Takkedu Malathi","Syeda Imrana Fatima","Vijay Kumar Gugulothu","Purude Vaishali Narayanrao"],"abstract":"Although breast cancer is still a concern in the global healthcare domain, there is an 
immediate requirement for intelligent systems that can help in the early and accurate diagnosis of the disease based on the synthesis of various types of data. This paper proposes AutoMedEnsemble, an artificial intelligence-powered multi-modal ensemble system that integrates the extraction of healthcare literature, gene expression analysis, and histopathological image assessment. The literature processing module with BioBERT has a precision of 91.8% and an F1-score of 90.5%. The omics-based component, analyzing gene expressions from the NCBI Gene Expression Omnibus (GSE45827), achieves an accuracy of 93.5% with an F1-score of 93.7%. The imaging module utilizes a ResNet50 architecture with Grad-CAM for interpretability, achieving an accuracy of 95.2% and an F1-score of 95.5%. While evaluated as independ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3991/ijoe.v22i05.60535","openalex_id":"https://openalex.org/W7160848864","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Electronics Corporation of India","Higher Education Academy","Institute of Technical Education","Microsoft (United States)","Raisoni Group of Institutions","Vignana Jyothi Institute of Management"],"concepts":[{"id":"https://openalex.org/C530470458","display_name":"Breast cancer","score":0.6898999810218811},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6635000109672546},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5958999991416931},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5113999843597412},{"id":"https://openalex.org/C163763905","display_name":"Precision medicine","score":0.46700000762939453},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.44510000944137573},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4235999882221222},{"id":"https://openalex.org/C90559484","display_name":"Expression (computer science)","score":0.36890000104904175}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/codeclinic-evaluating-automation-of-coding-skills-for-clinical-reasoning-agents","title":"CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents","url":"https://www.microsoft.com/en-us/research/publication/codeclinic-evaluating-automation-of-coding-skills-for-clinical-reasoning-agents/","published":"2026-05-10","authors":["Timothy Ossowski","Xinchi Liu","Danyal Maqbool","Vaibhav Dhanuka","Sheng Zhang","Hoifung Poon","Majid Afshar","Tyler J. Bradshaw","Junjie Hu"],"abstract":"Clinical reasoning agents based on large language models (LLMs) aim to automate tasks such as intensive care unit (ICU) monitoring and patient state tracking from electronic health records (EHRs). Existing systems typically rely on manually curated clinical tools or skills for concepts such as sepsis detection and organ failure assessment. However, maintaining these tool libraries requires substantial expert effort, while zero-shot querying or code generation often produces inefficient and unreliable reasoning chains, especially under institution-specific clinical policies. We introduce CodeClinic, a benchmark built on MIMIC-IV for evaluating whether LLM agents can synthesize and compose reusable clinical skills instead of relying on fixed toolboxes. The benchmark contains two complementary tasks: longitudinal ICU surveillance and compositional information seeking. 
The longitudinal setti...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science","Healthcare"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/security-risks-in-tool-enabled-ai-agents-a-systematic-analysis-of-privileged-execution-environments","title":"Security Risks in Tool-Enabled AI Agents: A Systematic Analysis of Privileged Execution Environments","url":"https://www.microsoft.com/en-us/research/publication/security-risks-in-tool-enabled-ai-agents-a-systematic-analysis-of-privileged-execution-environments/","published":"2026-05-10","authors":["Hardik Goel"],"abstract":"Tool-enabled AI agents are increasingly deployed in cloud-hosted environments and offered as services, where they perform side-effecting operations through privileged tools within execution environments. While such agents enable powerful automation, the security implications of hosting autonomous agents in privileged execution environments are not yet fully explored. This paper presents a structured analysis of security risks associated with cloud-hosted AI agents. We introduce a taxonomy of risk categories, illustrate these risks through three representative agent scenarios, and discuss mitigation strategies along with their tradeoffs. A small controlled experiment empirically illustrates risk manifestation and the effect of lightweight mitigations in this setup. 
Our analysis suggests that many risks in autonomous cloud agents arise not from novel vulnerabilities, but from over-privileg...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Security, privacy, and cryptography","Computer science","Cryptography"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/oracle-poisoning-corrupting-knowledge-graphs-to-weaponise-ai-agent-reasoning","title":"Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning","url":"https://www.microsoft.com/en-us/research/publication/oracle-poisoning-corrupting-knowledge-graphs-to-weaponise-ai-agent-reasoning/","published":"2026-05-10","authors":["Ben Kereopa-Yorke","Guillermo Díaz","Holly Wright","Reagan Johnston","Ron F. Del Rosario","Timothy Lynar"],"abstract":"We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions. We demonstrate six attack scenarios against a production 42-million-node code knowledge graph, providing the first empirical demonstration of knowledge graph poisoning against a production-scale agentic system, distinct from CTI embedding poisoning. 
Primary evaluation uses real SDK tool-use across nine models from three providers (N=30 per model), where models autonomously invoke a graph query tool and reason from results. The result is unambiguous: every tested model trusts poisoned data at 100% at moderate attacker sophistication(L2), with 269 valid trials (of 270) ac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Security, privacy, and cryptography","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/position-avoid-overstretching-llms-for-every-enterprise-task","title":"Position: Avoid Overstretching LLMs for every Enterprise Task","url":"https://www.microsoft.com/en-us/research/publication/position-avoid-overstretching-llms-for-every-enterprise-task/","published":"2026-05-10","authors":["Kuldeep Singh","Anson Bastos","Isaiah Onando Mulang"],"abstract":"Enterprise workloads are dominated by deterministic, structured, and knowledge-dependent tasks operating under strict cost, latency, and reliability constraints. While these are often addressed through large language model (LLM) deployment or distillation into smaller models, we argue this is inefficient, unreliable, and misaligned with enterprise task structures. Instead, AI systems should treat language models as interfaces rather than monolithic engines, externalizing knowledge and computation into dedicated components for greater reliability, scalability, and transparency. 
Our theoretical evidences show that finite-capacity models cannot fully capture the breadth of knowledge required for enterprise tasks, creating inherent limits to efficiency and interpretability. Building on this, we take the position that language models should primarily be used for structured extraction in deter...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2605.10730","title":"Qwen-Image-2.0 Technical Report","url":"https://huggingface.co/papers/2605.10730","published":"2026-05-10","authors":["Alibaba/Qwen"],"abstract":"We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment, especially in text-rich and compositionally complex scenarios. Qwen-Image-2.0 addresses these challenges by coupling Qwen3-VL as the condition encoder with a Multimodal Diffusion Transformer for joint condition-target modeling, supported by large-scale data curation and a customized multi-stage training pipeline. This enables strong multimodal understanding while preserving flexible generation and editing capabilities. 
The model supports instructions of up to 1K tokens for generating text-rich content such as slides, posters, i...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"arxiv:2605.09650","title":"Workspace Optimization: How to Train Your Agent","url":"https://arxiv.org/abs/2605.09650","published":"2026-05-10","authors":["Elad Sarafian","Gal Kaplun","Ron Banner","Daniel Soudry","Boris Ginsburg"],"abstract":"Modern agents built on frontier language models often cannot adapt their weights. What, then, remains trainable? We argue it is the agent's \\emph{workspace}, the structured external substrate it reads, writes, and tests; we call its evolution workspace optimization. Workspace optimization targets hard multi-turn environments where a frontier model has strong priors but cannot solve the task in a single shot, so the agent must learn through interaction. We propose a principled way to evolve the workspace, mirroring the structure of weight-space training: artifacts in place of parameters, evidence in place of data, counterexamples in place of losses, and textual feedback in place of gradients. We instantiate the idea in DreamTeam, a multi-agent harness for ARC-AGI-3 whose roles build an executable world model, plan, hypothesize, probe, strategize, and route failures. 
On the current 25-game...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161091689","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United Kingdom)","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C189645446","display_name":"Mirroring","score":0.8593000173568726},{"id":"https://openalex.org/C58581272","display_name":"Workspace","score":0.8416000008583069},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7350999712944031},{"id":"https://openalex.org/C160145156","display_name":"Executable","score":0.6370000243186951},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6366000175476074},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.6152999997138977},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5174999833106995},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.45879998803138733}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.09618","title":"Statistical Scouting Finds Debate-Safe but Not Debate-Useful Cases: A Matched-Ceiling Study of Open-Weight LLM Reasoning Protocols","url":"https://arxiv.org/abs/2605.09618","published":"2026-05-10","authors":["Julia Hu","Alfred Shen","Kumar Lakshmipathi"],"abstract":"When should a language model answer directly, sample and vote, or engage in multi-agent debate? Recent work shows voting often explains much of the gain attributed to debate, while selective-debate systems activate deliberation only on uncertain examples. 
We ask: under a matched ceiling on generated tokens (960 per example), how much per-example routing headroom exists, and how much is recoverable from cheap pre-deliberation signals? We evaluate greedy decoding, three-sample voting, and a two-agent critique-revise debate on MuSiQue and GSM8K using Llama 3.1 8B Instruct and Ministral 3 8B Instruct. On MuSiQue, an oracle selecting the correct protocol per example gains +14.0 and +13.7 pp over the best fixed one. The best fixed protocol is model- and dataset-dependent: each (model, dataset) cell has a different winner. This headroom is hard to recover from cheap ex-ante signals. A vote-entr...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7161089925","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5637999773025513},{"id":"https://openalex.org/C55166926","display_name":"Oracle","score":0.5449000000953674},{"id":"https://openalex.org/C520049643","display_name":"Voting","score":0.5085999965667725},{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of time)","score":0.47099998593330383},{"id":"https://openalex.org/C140940377","display_name":"Condorcet method","score":0.4212999939918518},{"id":"https://openalex.org/C107673813","display_name":"Bayesian probability","score":0.40880000591278076},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39259999990463257},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3612000048160553}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evident-an-evidence-preserving-framework-for-iterative-system-level-package-repair","title":"EvidenT: An Evidence-Preserving Framework for Iterative System-Level Package Repair","url":"https://www.microsoft.com/en-us/research/publication/evident-an-evidence-preserving-framework-for-iterative-system-level-package-repair/","published":"2026-05-09","authors":["Chenyu Zhao","Minghua Ma","Shenglin Zhang","Zeshun Huang","Yongqian Sun","Chetan Bansal","Saravan Rajmohan","Dan Pei"],"abstract":"Frequent toolchain updates and growing ISA diversity have made system-level software package repair increasingly important. Diagnosing and repairing build failures remains challenging because failures involve heterogeneous evidence, dependency constraints, and architecture-specific build conventions. While recent LLM-based repair methods show promise for project-level source fixes, they struggle with system-level repair, where failures span multi-language artifacts such as build recipes, scripts, and source archives, and require iterative validation through external build services. In this paper, we first conduct a systematic empirical study of real-world system-level build failures. 
We find that 72% of failures stem from dependency and environment misconfigurations rather than isolated code defects, suggesting that effective repair must prioritize packaging logic and iterative feedback....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Programming languages and software engineering","Systems and networking","Computer science","Programming language","software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generating-leakage-free-benchmarks-for-robust-rag-evaluation","title":"Generating Leakage-Free Benchmarks for Robust RAG Evaluation","url":"https://www.microsoft.com/en-us/research/publication/generating-leakage-free-benchmarks-for-robust-rag-evaluation/","published":"2026-05-09","authors":["Jiayi Liu","Jiaxing Zhang","Bowen Jin","Jennifer Neville"],"abstract":"Retrieval-augmented generation (RAG) is widely used to augment large language models (LLMs) with external knowledge. However, many benchmark datasets, designed to test RAG performance, comprise many questions that can already be answered from an LLM's parametric memory. This leads to unreliable evaluation. We refer to this phenomenon as knowledge leakage: cases where RAG tasks are solvable without retrieval. This issue worsens over time due to benchmark aging. As benchmarks are reused for training, their contents are increasingly absorbed into model parameters, making them less effective for evaluating retrieval. 
We introduce SeedRG, a semi-synthetic benchmark generation pipeline that mitigates knowledge leakage and addresses the issue of benchmark aging. Starting from a seed benchmark dataset, SeedRG extracts a reasoning graph from question-context pairs to capture their underlying reas...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","Natural language processing"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.09262","title":"Reinforcing Multimodal Reasoning Against Visual Degradation","url":"https://huggingface.co/papers/2605.09262","published":"2026-05-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.09269","title":"DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and 
Verification","url":"https://huggingface.co/papers/2605.09269","published":"2026-05-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/human-inspired-memory-architecture-for-llm-agents","title":"Human-Inspired Memory Architecture for LLM Agents","url":"https://www.microsoft.com/en-us/research/publication/human-inspired-memory-architecture-for-llm-agents/","published":"2026-05-08","authors":["Doga Kerestecioglu","Alexei Robsky","Clemens Vasters","Anshu Kiran Sharma","Y. Kesselman"],"abstract":"Current LLM agents lack principled mechanisms for managing persistent memory across long interaction horizons. We present a biologically-grounded memory architecture comprising six cognitive mechanisms: (1) sleep-phase consolidation, (2) interference-based forgetting, (3) engram maturation, (4) reconsolidation upon retrieval, (5) entity knowledge graphs, and (6) hybrid multi-cue retrieval. Each mechanism addresses a specific failure mode of naive memory accumulation. We introduce a synthetic calibration methodology that derives all pipeline thresholds without benchmark data exposure, eliminating a common source of evaluation leakage. We evaluate on two benchmarks. 
First, a VSCode issue-tracking dataset (13K issues, 120K events) where deduplication-based consolidation achieves 97.2% retention precision with 58% store reduction (+21.8 pp over baseline). Second, the LongMemEval personal-cha...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Search and information retrieval","Computer science","Natural language processing"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/switchcraft-ai-model-router-for-agentic-tool-calling","title":"Switchcraft: AI Model Router for Agentic Tool Calling","url":"https://www.microsoft.com/en-us/research/publication/switchcraft-ai-model-router-for-agentic-tool-calling/","published":"2026-05-08","authors":["Sharad Agarwal","Pooria Namyar","Alec Wolman","Rahul Ambavat","Ankur Gupta","Qizheng Zhang"],"abstract":"Agentic AI systems that invoke external tools are powerful but costly, leading developers to default to large models and overspend inference budgets. Model routing can mitigate this, but existing routers are designed for chat completion rather than tool use. We present Switchcraft, the first (to the best of our knowledge) model router optimized for agentic tool calling. Switchcraft operates inline, selecting the lowest-cost model subject to correctness. We construct an evaluation framework on five function-calling benchmarks and train a DistilBERT-based classifier, deployed under a latency budget. 
Switchcraft achieves 82.9% accuracy -- matching or exceeding the best individual model -- while reducing inference cost by 84%, saving over $3,600 per million queries. We find that larger models do not consistently outperform smaller ones on tool-use tasks, and that nominally cheaper models can...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial intelligence","Systems and networking","AI agents","Tech Report"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-revealed-preferences-clarify-llm-alignment-and-steering","title":"Can Revealed Preferences Clarify LLM Alignment and Steering?","url":"https://www.microsoft.com/en-us/research/publication/can-revealed-preferences-clarify-llm-alignment-and-steering/","published":"2026-05-08","authors":["K. Yamin","Jingjing Tang","Eric Horvitz","Bryan Wilder"],"abstract":"LLMs are increasingly used to make or support high-stakes decisions under uncertainty, where alignment depends not only on factual accuracy but on how models weigh tradeoffs between different outcomes. We present an empirical pipeline for estimating the implied preferences that an LLM's observed choices optimize: we elicit the model's probability distribution over unknowns along with the choice it would make for the decision task and then fit a discrete choice model to recover the cost function that best rationalizes the model's decisions. 
We show how this revealed-preference description allows rigorous evaluation of whether models behave in a consistently goal-directed way, whether they can verbalize a description of their objectives which matches their revealed decision policy, and whether prompting can reliably steer those policies to implement a user-specified cost function. We apply...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/willful-disobedience-automatically-detecting-failures-in-agentic","title":"Willful Disobedience: Automatically Detecting Failures in Agentic","url":"https://www.microsoft.com/en-us/research/publication/willful-disobedience-automatically-detecting-failures-in-agentic/","published":"2026-05-08","authors":["Reshabh K Sharma","Shraddha Barke","Ben Zorn"],"abstract":"AI agents are increasingly embedded in real software systems, where they execute multi-step workflows through multi-turn dialogue, tool invocations, and intermediate decisions. These long execution histories, called agentic traces, make validation difficult. Outcome-only benchmarks can miss critical procedural failures, such as incorrect workflow routing, unsafe tool usage, or violations of prompt-specified rules. This paper presents AgentPex, an AI-powered tool designed to systematically evaluate agentic traces. 
AgentPex extracts behavioral rules from agent prompts and system instructions, then uses these specifications to automatically evaluate traces for compliance. We evaluate AgentPex on 424 traces from 𝜏2-bench across models in telecom, retail, and airline customer service. Our results show that AgentPex distinguishes agent behavior across models and surfaces specification violatio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:73afe2cefb85d7f9","title":"Teaching Claude why","url":"https://www.anthropic.com/research/teaching-claude-why","published":"2026-05-08","authors":["Anthropic"],"abstract":"New research on how we've reduced agentic misalignment.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic research page https://www.anthropic.com/research"}},{"id":"apple:g9uapm4wn5zdxeh08xylhuja","title":"RVPO: Risk-Sensitive Alignment via Variance Regularization","url":"https://machinelearning.apple.com/research/rvpo-risk-sensitive-alignment","published":"2026-05-08","authors":["Ivan Montero","Tomasz Jurczyk","Bhuwan Dhingra"],"abstract":"Current 
critic-less RLHF methods aggregate multi-objective rewards via an arithmetic mean, leaving them vulnerable to constraint neglect: high-magnitude success in one objective can numerically offset critical failures in others (e.g., safety or formatting), masking low-performing “bottleneck” rewards vital for reliable multi-objective alignment. We propose Reward-Variance Policy Optimization (RVPO), a risk-sensitive framework that penalizes...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vla-gse-boosting-parameter-efficient-fine-tuning-in-vla-with-generalized-and-specialized-experts","title":"VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts","url":"https://www.microsoft.com/en-us/research/publication/vla-gse-boosting-parameter-efficient-fine-tuning-in-vla-with-generalized-and-specialized-experts/","published":"2026-05-07","authors":["Yuhua Jiang","Jingwen Lu","Xiaoting Qin","Xiaoyu Chen","Kaixin Wang","Feifei Gao","Li Zhao"],"abstract":"Vision-language-action (VLA) models inherit rich visual-semantic priors from pre-trained vision-language backbones, but adapting them to robotic control remains challenging. Full fine-tuning (FFT) is prone to overfitting on downstream robotic data and catastrophic forgetting of pretrained vision-language capabilities. Parameter-efficient fine-tuning (PEFT) better preserves pre-trained knowledge, yet existing PEFT methods still struggle to adapt effectively to robot control tasks. 
To address this gap, we propose VLA-GSE, a parameter-efficient VLA fine-tuning framework that improves control adaptation while retaining PEFT's knowledge preservation advantage. Specifically, VLA-GSE (Generalized and Specialized Experts) is initialized by spectrally decomposing the frozen backbone, assigning leading singular components to generalized experts (shared experts) and disjoint residual components to....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/unifying-scientific-communication-fine-grained-correspondence-across-scientific-media","title":"Unifying Scientific Communication: Fine-Grained Correspondence Across Scientific Media","url":"https://www.microsoft.com/en-us/research/publication/unifying-scientific-communication-fine-grained-correspondence-across-scientific-media/","published":"2026-05-07","authors":["Megha Mariam K.M","Vineeth N Balasubramanian","C. V. Jawahar"],"abstract":"The communication of scientific knowledge has become increasingly multimodal, spanning text, visuals, and speech through materials such as research papers, slides, and recorded presentations. These different representations collectively convey a study's reasoning, results, and insights, offering complementary perspectives that enrich understanding. 
However, despite their shared purpose, such materials are rarely connected in a structured way. The absence of explicit links across formats makes it difficult to trace how concepts, visuals, and explanations correspond, limiting unified exploration and analysis of research content. To address this gap, we introduce the Multimodal Conference Dataset (MCD), the first benchmark that integrates research papers, presentation videos, explanatory videos, and slides from the same works. We evaluate a range of embedding-based and vision-language model...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/quantizing-with-randomized-hadamard-transforms-efficient-heuristic-now-proven","title":"Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven","url":"https://www.microsoft.com/en-us/research/publication/quantizing-with-randomized-hadamard-transforms-efficient-heuristic-now-proven/","published":"2026-05-07","authors":["Ran Ben-Basat","William Kuszmaul","Michael Mitzenmacher","Amit Portnoy","S. Vargaftik"],"abstract":"Uniform random rotations (URRs) are a common preprocessing step in modern quantization approaches used for gradient compression, inference acceleration, KV-cache compression, model weight quantization, and approximate nearest-neighbor search in vector databases. 
In practice, URRs are often replaced by randomized Hadamard transforms (RHTs), which preserve orthogonality while admitting fast implementations. The remaining issue is the performance for worst-case inputs. With a URR, each coordinate is individually distributed as a shifted beta distribution, which converges to a Gaussian distribution in high dimensions. Generally, one RHT is not suitable in the worst case, as individual coordinates can be far from these distributions. We show that after composing two RHTs on any $d$-sized input vector, the marginal distribution of every fixed coordinate of the normalized rotated vector is with...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/datadignity-training-data-attribution-for-large-language-models","title":"DataDignity: Training Data Attribution for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/datadignity-training-data-attribution-for-large-language-models/","published":"2026-05-07","authors":["Xiaomin Li","Andrzej Banburski-Fahey","Jaron Lanier"],"abstract":"Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify which source document most likely supports the knowledge expressed in a response. 
We study this as pinpoint provenance: given a prompt, a target-model response, and a candidate corpus, rank the documents that best support the response. We introduce FakeWiki, a controlled benchmark of 3,537 fabricated Wikipedia-style articles designed to preserve ground-truth provenance while weakening lexical shortcuts. FakeWiki includes QA probes, source-preserving paraphrases, retro-generated variants, hard anti-documents that remain topically similar while removing answer-critical facts, and five query conditions: clean prompting plus four jailbreak-inspired transformations. We evaluate seven retrieval baselines, a training-free activation-steering retrieval-fusion method, SteerFuse, and a sup...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agenticrag-agentic-retrieval-for-enterprise-knowledge-bases","title":"AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases","url":"https://www.microsoft.com/en-us/research/publication/agenticrag-agentic-retrieval-for-enterprise-knowledge-bases/","published":"2026-05-07","authors":["Susheel Suresh","Hazel Mak","Sha-chʻen Chou","Fred W. Kroon","Sahil Bhatnagar"],"abstract":"We present AgenticRAG, a practical agentic harness for retrieval and analysis over enterprise knowledge bases. 
Standard RAG pipelines place significant burden of grounding on the search stack, constraining the language model to a fixed candidate set chosen deep in the retrieval process. Our approach reduces this overdependence by layering a lightweight harness on top of existing enterprise search infrastructure, equipping a reasoning LLM with search, find, open, and summarize tools enabling the model to iteratively retrieve information, navigate within documents, and analyze evidence autonomously. On three open benchmarks we observe substantial gains: $49.6%$ recall@1 on BRIGHT (+21.8 pp over the best embedding baseline), 0.96 factuality on WixQA ($+13%$ relative improvement), and $92%$ answer correctness on FinanceBench--within 2 pp of oracle access to true evidence. Ablation studies sh...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Search and information retrieval","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/xl-safetybench-a-country-grounded-cross-cultural-benchmark-for-llm-safety-and-cultural-sensitivity","title":"XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity","url":"https://www.microsoft.com/en-us/research/publication/xl-safetybench-a-country-grounded-cross-cultural-benchmark-for-llm-safety-and-cultural-sensitivity/","published":"2026-05-07","authors":["Dasol Choi","Eugenia Kim","Jae-won Noh","Sanghyun Seo","Eunmi Kim","Yunjin 
Park","Brigitta Jesica Kartono","Josef Pichlmeier","Helena Berndt","Sai Krishna Mendu","Glenn Johannes Tungka","Ozlem Gokcce"],"abstract":"Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover, they rarely evaluate a model's ability to detect culturally embedded sensitivities as distinct from universal harms. We introduce XL-SafetyBench. a suite of 5,500 test cases across 10 country-language pairs, comprising a Jailbreak Benchmark of country-grounded adversarial prompts and a Cultural Benchmark where local sensitivities are embedded within innocuous requests. Each item is constructed via a multi-stage pipeline that combines LLM-assisted discovery, automated validation gates, and dual independent native-speaker annotators per country. To distinguish principled refusal from comprehension failure, we evaluate Attack Success Rate (ASR) alongside two complementary metrics we introduce: Neutral-Safe Rate (NSR) and Cultural Sensitivity Rat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.07545","title":"Implicit Preference Alignment for Human Image 
Animation","url":"https://huggingface.co/papers/2605.07545","published":"2026-05-07","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:baidu:2605.00425","title":"AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning","url":"https://huggingface.co/papers/2605.00425","published":"2026-05-07","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"official:d48c6138d31ac5ca","title":"Natural Language Autoencoders: Turning Claude’s thoughts into text","url":"https://www.anthropic.com/research/natural-language-autoencoders","published":"2026-05-07","authors":["Anthropic"],"abstract":"AI models like Claude talk in words but think in numbers. 
In this study we train Claude to translate its thoughts into human-readable text.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic research page https://www.anthropic.com/research"}},{"id":"arxiv:2605.06184","title":"Teaching LLMs Program Semantics via Symbolic Execution Traces","url":"https://arxiv.org/abs/2605.06184","published":"2026-05-07","authors":["Jonas Bayer","Stefan Zetzsche","Olivier Bouissou","Rémi Delmas","Michael Tautschnig","Soonho Kong"],"abstract":"We introduce an evaluation framework of 500 C verification tasks across five property types (memory safety, overflow, termination, reachability, data races) built on SV-COMP 2025, and evaluate 14 models across six families. We find that high overall accuracy masks a critical weakness: while most models reliably confirm properties hold, violation detection varies widely and degrades sharply with program length. To close this gap, we train on formal verification artifacts: running the Soteria symbolic execution engine on generic open-source C code and using the resulting traces for continued pretraining of Qwen3-8B. Just ${\\sim}$3,000 bug traces combined with chain-of-thought reasoning at inference time improve violation detection by over 17 percentage points, producing one of the most balanced accuracy profiles among evaluated models. 
On violation detection, the trained 8B model outperfor...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7160726425","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C75291252","display_name":"TRACE (psycholinguistics)","score":0.8163999915122986},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7565000057220459},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6370999813079834},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6313999891281128},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6288999915122986},{"id":"https://openalex.org/C189950617","display_name":"Property (philosophy)","score":0.6151999831199646},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5809999704360962},{"id":"https://openalex.org/C2779639559","display_name":"Symbolic execution","score":0.4226999878883362}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.05716","title":"More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding","url":"https://arxiv.org/abs/2605.05716","published":"2026-05-07","authors":["Ming Liu"],"abstract":"LLM agent systems are built by stacking scaffolding components (planning, tools, memory, self-reflection, retrieval) assuming more is better. We study cross-component interference (CCI): degradation when components interact destructively. 
We run a full factorial experiment over all 2^5=32 subsets of five components on HotpotQA and GSM8K with Llama-3.1-8B/70B (96 conditions, up to 10 seeds). The All-In system is consistently suboptimal: on HotpotQA, a single-tool agent surpasses All-In by 32% (F1 0.233 vs 0.177, p=0.023); on GSM8K, a 3-component subset beats All-In by 79% (0.43 vs 0.24, p=0.010). Optimal component count is task-dependent (k*=1-4) and scale-sensitive: at 70B, combinations that hurt at 8B provide gains, though All-In still trails the best subset. We fit a main-effects regression (R^2=0.916, adj-R^2=0.899, LOOCV=0.872), compute exact Shapley values, and find 183/325 submodul...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7160726264","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6607999801635742},{"id":"https://openalex.org/C32022120","display_name":"Interference (communication)","score":0.6222000122070312},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.5728999972343445},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5011000037193298},{"id":"https://openalex.org/C51823790","display_name":"Greedy algorithm","score":0.42340001463890076},{"id":"https://openalex.org/C149629883","display_name":"Fraction (chemistry)","score":0.4083999991416931},{"id":"https://openalex.org/C126255220","display_name":"Mathematical optimization","score":0.3901999890804291},{"id":"https://openalex.org/C69637215","display_name":"Default","score":0.38580000400543213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"arxiv:2412.16359","title":"Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context","url":"http://arxiv.org/abs/2412.16359","published":"2026-05-07","authors":["Nilanjana Das","Edward Raff","Aman Chadha","Manas Gaur"],"abstract":"Large Language Models (LLMs) remain vulnerable to adversarial prompts that elicit harmful responses. While many existing safety systems are better able to detect overtly nonsensical attack strings, human-readable prompts embedded in plausible situational contexts remain harder to identify and evaluate. This paper presents an empirical investigation of human-readable, situation-driven adversarial prompts for assessing LLM robustness. First, we use movie scripts as situational contexts (e.g., crime narratives) to construct natural-looking prompts that bypass safety mechanisms. Second, we transform adversarial gibberish into coherent, innocuous-appearing text that retains exploitation capability within these contextual frameworks. 
Third, we enhance the AdvPrompter framework with p-nucleus sampling to generate diverse human-readable attacks, substantially improving success rates against mode...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3815174","openalex_id":"https://openalex.org/W4405765791","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Maryland, Baltimore County"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.8687513470649719},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6706539392471313},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5998370051383972},{"id":"https://openalex.org/C9114305","display_name":"Situational ethics","score":0.4918510913848877},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4831923544406891},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.26447951793670654},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.1308411955833435},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.07505398988723755}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2502.16469","title":"Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment","url":"http://arxiv.org/abs/2502.16469","published":"2026-05-07","authors":["Zeyu Shangguan","Daniel Seita","Mohammad 
Rostami"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-026-02839-7","openalex_id":"https://openalex.org/W4414841383","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Southern California"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8122000098228455},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.6459000110626221},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6322000026702881},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5853000283241272},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5491999983787537},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.49810001254081726},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.49219998717308044},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.46799999475479126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.06200","title":"A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping","url":"https://arxiv.org/abs/2605.06200","published":"2026-05-07","authors":["Dingwei Chen","Zefang Zong","Zhipeng Ma","Leo Luo","Yang Li","Chengming Li","Peng Chen","Jie Jiang"],"abstract":"Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to evaluate the contribution of individual tool-calls within multi-turn 
interactions. Existing approaches to such process credit assignment either depend on separate external process reward models that introduce additional consumption, or tree-based structural rollout that merely redistributes the outcome signal while constraining trajectory diversity. A promising alternative leverages the per-turn change in the policy's predicted probability of the ground-truth, termed Information Gain (IG), as an intrinsic process signal without an external evaluator. However, prior work on leveraging IG signals within the RL training loop faces three systematic challenges: normalizing across turns that face heterogeneous positional contexts can distort the....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7160726178","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2776848632","display_name":"Clipping (morphology)","score":0.7299000024795532},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.6492999792098999},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5763000249862671},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5752999782562256},{"id":"https://openalex.org/C148220186","display_name":"Outcome (game theory)","score":0.5242999792098999},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.4749000072479248},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.4697999954223633},{"id":"https://openalex.org/C204323151","display_name":"Range 
(aeronautics)","score":0.45680001378059387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2605.06892","title":"Not All Tokens Need 40 Steps: Heterogeneous Step Allocation in Diffusion Transformers for Efficient Video Generation","url":"https://huggingface.co/papers/2605.06892","published":"2026-05-07","authors":["Ernie Chu","Vishal M. Patel"],"abstract":"Diffusion Transformers (DiTs) have achieved state-of-the-art video generation quality, but they incur immense computational cost because standard inference applies the same number of denoising steps uniformly to every token in the sequence. It is well known that human vision ignores vast amounts of redundant motion. Why, then, do our densest models treat every spatiotemporal token with equal priority? In this paper, we introduce Heterogeneous Step Allocation (HSA), a training-free inference algorithm that assigns varying step budgets to different spatiotemporal tokens based on their velocity dynamics. To resolve the resulting sequence-length mismatch without sacrificing global context, HSA introduces a KV-cache synchronization mechanism that allows active tokens to attend to the full sequence while entirely bypassing inactive tokens. 
Furthermore, we derive a cached Euler update that adva...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:tencent:2605.06221","title":"UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification","url":"https://huggingface.co/papers/2605.06221","published":"2026-05-06","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2605.06416","title":"MiA-Signature: Approximating Global Activation for Long-Context Understanding","url":"https://huggingface.co/papers/2605.06416","published":"2026-05-06","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.06139","title":"Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex","url":"https://huggingface.co/papers/2605.06139","published":"2026-05-06","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:tencent:2605.06200","title":"A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping","url":"https://huggingface.co/papers/2605.06200","published":"2026-05-06","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"apple:u1gfev8er75tyibe411bwocm","title":"SpecMD: A Comprehensive Study on Speculative Expert 
Prefetching","url":"https://machinelearning.apple.com/research/specmd-expert-prefetching","published":"2026-05-06","authors":["Duc Hoang","Ajay Jaiswal","Mohammad Samragh Razlighi","Minsik Cho"],"abstract":"Mixture-of-Experts (MoE) models enable sparse expert activation, meaning that only a subset of the model's parameters is used during each inference. However, to translate this sparsity into practical performance, an expert caching mechanism is required. Previous works have proposed hardware-centric caching policies, but how these various caching policies interact with each other and different hardware specification remains poorly understood. To...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:r1a590rdil092wyyyvrx7dpj","title":"From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs","url":"https://machinelearning.apple.com/research/spatial","published":"2026-05-06","authors":["Le Zhang","Jihan Yang","Soundarya Krishnan","Jimit Majmudar","Xiou Ge","Prasoon Puri","Prathamesh Saraf","Shruti Bhargava","Dhivya Piraviperumal","Yinan Ling","Cindy Pan","Hong Yu"],"abstract":"True spatial intelligence for multimodal agents transcends low-level geometric perception, evolving from knowing where things are to understanding what they are for. While existing benchmarks, such as VSI-Bench, effectively evaluate this foundational geometric stage, they fall short of probing the higher-order cognitive abilities essential for grounded intelligence. 
To bridge this gap, we introduce the Spatial-Functional Intelligence Benchmark...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7160380856","title":"CAMV: A Framework for Context-Aware Multi-View Visualization of Data Analysis Results","url":"https://doi.org/10.1007/s41019-026-00353-x","published":"2026-05-06","authors":["Yanna Lin","Liwenhan Xie","Leixian Shen","Zhuochen Jin","Sicheng Song","Zikun Deng","Huamin Qu","Ke Xu"],"abstract":"Understanding analytical results in visual data analysis is essential to inform further exploration and decision-making. However, existing tools often visualize the results alone, offering limited access to associated contexts needed to explain why the results occur. To address this gap, we propose CAMV, a framework that automatically extracts relevant contextual information from given results and datasets, and generates multi-view visualizations to support deeper understanding. CAMV consists of two major components: one that identifies key data subspaces and fact types as explanatory context, and the other that generates coordinated multi-view visualizations of both the analysis results and their contextual information. Based on the framework, we developed an interactive prototype and conducted a comparison study with 12 participants, using GPT-4o as a baseline. 
Results show that CAMV e...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s41019-026-00353-x","openalex_id":"https://openalex.org/W7160380856","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["East China Normal University","Hong Kong University of Science and Technology","Huawei Technologies (China)","Nanjing University","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8820000290870667},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.744700014591217},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6360999941825867},{"id":"https://openalex.org/C12362212","display_name":"Linear subspace","score":0.6151000261306763},{"id":"https://openalex.org/C172367668","display_name":"Data visualization","score":0.5942000150680542},{"id":"https://openalex.org/C2780977526","display_name":"Data exploration","score":0.515999972820282},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.430400013923645},{"id":"https://openalex.org/C64073096","display_name":"Interactive visualization","score":0.414000004529953}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/audio-visual-intelligence-in-large-foundation-models","title":"Audio-Visual Intelligence in Large Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/audio-visual-intelligence-in-large-foundation-models/","published":"2026-05-05","authors":["Youxuan Qin","Kaihong Liu","Shengqiong Wu","Kai Wang","Shijian Deng","Yapeng Tian","Junbin Xiao","Yazhou 
Xing","Yinghao Ma","Bobo Li","Roger Zimmermann","Lei Cui"],"abstract":"Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the era of large foundation models, joint modeling of audio and vision has become increasingly crucial, i.e., not only for understanding but also for controllable generation and reasoning across dynamic, temporally grounded signals. Recent advances, such as Meta MovieGen and Google Veo-3, highlight the growing industrial and academic focus on unified audio-vision architectures that learn from massive multimodal data. However, despite rapid progress, the literature remains fragmented, spanning diverse tasks, inconsistent taxonomies, and heterogeneous evaluation practices that impede systematic comparison and knowledge integration. This survey provides the first com...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agentic-imodels-evolving-agentic-interpretability-tools-via-autoresearch","title":"Agentic-imodels: Evolving agentic interpretability tools via 
autoresearch","url":"https://www.microsoft.com/en-us/research/publication/agentic-imodels-evolving-agentic-interpretability-tools-via-autoresearch/","published":"2026-05-05","authors":["Chandan Singh","Yuting Tan","Weijia Xu","Zelalem Gero","Weiwei Yang","Michel Galley","Jianfeng Gao"],"abstract":"Agentic data science (ADS) systems are rapidly improving their capability to autonomously analyze, fit, and interpret data, potentially moving towards a future where agents conduct the vast majority of data-science work. However, current ADS systems use statistical tools designed to be interpretable by humans, rather than interpretable by agents. To address this, we introduce Agentic-imodels, an agentic autoresearch loop that evolves data-science tools designed to be interpretable by agents. Specifically, it develops a library of scikit-learn-compatible regressors for tabular data that are optimized for both predictive performance and a novel LLM-based interpretability metric. The metric measures a suite of LLM-graded tests that probe whether a fitted model's string representation is\"simulatable\"by an LLM, i.e. 
whether the LLM can answer questions about the model's behavior by reading it...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.05185","title":"OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents","url":"https://huggingface.co/papers/2605.05185","published":"2026-05-05","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2605.04702","title":"FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video 
Generation","url":"https://huggingface.co/papers/2605.04702","published":"2026-05-05","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"official:97e732ab157a4889","title":"GPT-5.5 Instant System Card","url":"https://openai.com/index/gpt-5-5-instant-system-card","published":"2026-05-05","authors":["OpenAI"],"abstract":"","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:kns2v42wqo0z8oqov3s3zuhm","title":"Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing","url":"https://machinelearning.apple.com/research/stochastic-kv-routing","published":"2026-05-05","authors":["Anastasiia Filippova","David Grangier","Marco Cuturi","João Monteiro"],"abstract":"Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory footprint of KV caching is significant and heavily impacts serving costs. This work proposes to lessen these memory requirements. 
While recent work has largely addressed KV cache reduction via compression and eviction along the temporal axis, we argue that the depth dimension offers...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/terminus-4b-can-a-smaller-model-replace-frontier-llms-at-agentic-execution-tasks","title":"Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?","url":"https://www.microsoft.com/en-us/research/publication/terminus-4b-can-a-smaller-model-replace-frontier-llms-at-agentic-execution-tasks/","published":"2026-05-04","authors":["Spandan Garg","V. Nitin","Yufan Huang"],"abstract":"Modern coding agents increasingly delegate specialized subtasks to subagents, which are smaller, focused agentic loops that handle narrow responsibilities like search, debugging or terminal execution. This architectural pattern keeps the main agent's context window clean by isolating verbose outputs (e.g. build logs, test results, etc.) within the subagent context. Typically when agents employ subagents for such tasks, they use frontier models as these subagents. In this paper, we investigate whether a finetuned small language model (SLM) can achieve comparable performance to frontier models in the task of agentic terminal execution. We present Terminus-4B, which is a post-trained Qwen3-4B model via Supervised Finetuning (SFT) and Reinforcement Learning (RL) using rubric-based LLM-as-judge reward, specifically for this task. 
In our extensive evaluation spanning various frontier models, t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learning-correct-behavior-from-examples-validating-sequential-execution-in-autonomous-agents","title":"Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents","url":"https://www.microsoft.com/en-us/research/publication/learning-correct-behavior-from-examples-validating-sequential-execution-in-autonomous-agents/","published":"2026-05-04","authors":["Reshabh K Sharma","Gaurav Mittal","Yu Hu"],"abstract":"As autonomous agents become increasingly sophisticated, validating their sequential behavior presents a significant challenge. Traditional testing approaches require manual specification, exact sequence matching, or thousands of training examples. We present a novel algorithm that automatically learns correct behavior from just 2-10 passing execution traces and validates new executions against this learned model. Our approach combines dominator analysis from compiler theory with multimodal large language model-powered semantic understanding to identify essential states and handle non-deterministic behavior. 
The system constructs a generalized ground truth model using Prefix Tree Acceptors, merges traces through multi-tiered equivalence detection, and validates new executions via topological subsequence matching. In controlled experiments, our system achieved high accuracy in detecting pr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-origins-of-artificial-intelligence-in-natural-intelligence","title":"The Origins of Artificial Intelligence in Natural Intelligence","url":"https://www.microsoft.com/en-us/research/publication/the-origins-of-artificial-intelligence-in-natural-intelligence/","published":"2026-05-04","authors":["Ken Archer","Harald Wiltsche"],"abstract":"Contemporary AI systems - especially large language models (LLMs) - exhibit a striking combination of capabilities and failures: apparent fluency and cross-domain coherence alongside persistent breakdowns in compositional generalization, object tracking, and truth reliability. These phenomena are often interpreted in two opposing ways: either as evidence that AI is converging on human-like intelligence, or as proof that it merely manipulates linguistic form without understanding. 
In recent interdisciplinary work - including Adam Frank, Marcelo Gleiser, and Evan Thompson's The Blind Spot and DeepMind researcher Alexander Lerchner's “The Abstraction Fallacy” - a different picture is emerging. Rather than asking whether AI systems are becoming intelligent in the human sense, these approaches ask a more basic question: What if AI systems work because they rely on structures that are rooted i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Artificial intelligence","Unpublished"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:q41labgaxnw7c18scr7nf1uv","title":"PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning","url":"https://machinelearning.apple.com/research/portool-policy-optimization","published":"2026-05-04","authors":["Feijie Wu","Weiwu Zhu","Yuxiang Zhang","Soumya Chatterjee","Jiarong Zhu","Fan Mo","Rong Luo","Jing Gao"],"abstract":"Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents using outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate steps (or tool-use decisions) lead to success or failure. 
In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2605.02458","title":"Active multiple matrix completion with adaptive confidence sets","url":"https://arxiv.org/abs/2605.02458","published":"2026-05-04","authors":["Andrea Locatelli","Alexandra Carpentier","Michal Vaľko"],"abstract":"In this work, we formulate a new multi-task active learning setting in which the learner's goal is to solve multiple matrix completion problems simultaneously. At each round, the learner can choose from which matrix it receives a sample from an entry drawn uniformly at random. Our main practical motivation is market segmentation, where the matrices represent different regions with different preferences of the customers. The challenge in this setting is that each of the matrices can be of a different size and also of a different rank which is unknown. We provide and analyze a new algorithm, MAlocate that is able to adapt to the unknown ranks of the different matrices. 
We then give a lower-bound showing that our strategy is minimax-optimal and demonstrate its performance with synthetic experiments.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2605.02458","openalex_id":"https://openalex.org/W2922008860","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google DeepMind (United Kingdom)","Otto-von-Guericke-Universität Magdeburg"],"concepts":[{"id":"https://openalex.org/C2778459887","display_name":"Matrix completion","score":0.7964900732040405},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6682680249214172},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6645870804786682},{"id":"https://openalex.org/C106487976","display_name":"Matrix (chemical analysis)","score":0.6235470175743103},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.5838831067085266},{"id":"https://openalex.org/C2778445095","display_name":"Sample complexity","score":0.4815416932106018},{"id":"https://openalex.org/C149728462","display_name":"Minimax","score":0.4687374234199524},{"id":"https://openalex.org/C125308379","display_name":"Market segmentation","score":0.4607129991054535}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cross-layer-energy-analysis-of-multimodal-training-on-grace-hopper-superchips","title":"Cross-Layer Energy Analysis of Multimodal Training on Grace Hopper 
Superchips","url":"https://www.microsoft.com/en-us/research/publication/cross-layer-energy-analysis-of-multimodal-training-on-grace-hopper-superchips/","published":"2026-05-03","authors":["Mahmoud Ahmed","Sameh Abdulah","Olatunji Ruwase","Sam Ade Jacobs","Mathis Bode","Mohamed Elhoseiny","David E. Keyes"],"abstract":"Multimodal deep learning models enable joint learning across heterogeneous data sources, including text, images, and video, but their rapid scaling introduces significant memory and communication bottlenecks. As model sizes and sequence lengths increase, training performance becomes increasingly impacted by data movement rather than computation. Frameworks such as DeepSpeed mitigate these challenges through CPU offloading, activation checkpointing, and communication optimizations. However, these techniques introduce additional system activity, which may affect energy efficiency. Meanwhile, tightly integrated heterogeneous architectures, such as the NVIDIA Grace Hopper (GH200) superchip, provide high-bandwidth CPU-GPU interconnects and unified memory, thereby reducing data transfer overhead. 
In this work, we present a cross-layer analysis of energy and performance trade-offs in multimodal...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Hardware and devices","Systems and networking","Computer science","systems and networking"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/only-say-what-you-know-calibration-aware-generation-for-long-form-factuality","title":"Only Say What You Know: Calibration-Aware Generation for Long-Form Factuality","url":"https://www.microsoft.com/en-us/research/publication/only-say-what-you-know-calibration-aware-generation-for-long-form-factuality/","published":"2026-05-03","authors":["Wen Luo","Guangyue Peng","Liang Wang","Nan Yang","Wei Li","Yuhan Song","Shaohang Wei","Feifan Song","Furu Wei","Houfeng Wang"],"abstract":"Large Reasoning Models achieve strong performance on complex tasks but remain prone to hallucinations, particularly in long-form generation where errors compound across reasoning steps. Existing approaches to improving factuality, including abstention and factuality-driven optimization, follow a emph{coupled exploration-commitment} paradigm, in which intermediate reasoning is unconditionally propagated to the final output, limiting fine-grained control over information selection and integration. 
In this paper, we propose an textbf{Exploration-Commitment Decoupling} paradigm that disentangles knowledge exploration from final commitment, enabling models to explore with awareness while answering cautiously. We instantiate the paradigm with textbf{Calibration-Aware Generation (CAG)}, a framework that equips models with end-to-end, calibration-aware generation capabilities, by augmenting inte...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-comprehensive-ecosystem-for-open-domain-customized-video-generation","title":"A Comprehensive Ecosystem for Open-Domain Customized Video Generation","url":"https://www.microsoft.com/en-us/research/publication/a-comprehensive-ecosystem-for-open-domain-customized-video-generation/","published":"2026-05-03","authors":["Jingxu Zhang","Yuqian Hong","Daneul Kim","Kai Qiu","Qi Dai","Jianmin Bao","Yifan Yang","Xiaoyan Sun","Chong Luo"],"abstract":"Recent progress in video generation has shown impressive visual synthesis capabilities. However, open-domain customized video generation remains limited by the lack of large-scale, annotated datasets capturing diverse identity-specific attributes. 
To address this, we introduce PexelsCustom-1M, the first publicly available million-scale dataset for identity-preserving video generation, containing one million curated ⟨identity, text, video⟩ triplets across 8,000+ categories. Leveraging this, we propose CustoMDiT, a parameter-efficient framework that adapts a pretrained multimodal Diffusion Transformer into a customized video generator with only 8% additional learnable parameters. Our method surpasses prior state-of-the-art. However, benchmarks such as DreamBooth cover only 100 classes, which is insufficient for real-world applications. To overcome this, we construct OpenCustom, a new bench...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Audio and Acoustics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2605.01489","title":"SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning","url":"https://arxiv.org/abs/2605.01489","published":"2026-05-02","authors":["Tianshi Zheng","Rui Wang","Xiyun Li","Yangqiu Song","Tianqing Fang"],"abstract":"Frontier scientific reasoning is rapidly emerging as a key foundation for advancing AI agents in automated scientific discovery. Deep research agents offer a promising approach to this challenge. These models develop robust problem-solving capabilities through post-training on information-seeking tasks, which are typically curated via knowledge graph construction or iterative web browsing. 
However, these strategies face inherent limitations in frontier science, where domain-specific knowledge is scattered across sparse and heterogeneous academic sources, and problem solving requires sophisticated computation and reasoning far beyond factual recall. To bridge this gap, we introduce SciResearcher, a fully automated agentic framework for frontier-science data construction. SciResearcher synthesizes diverse conceptual and computational tasks grounded in academic evidence, while eliciting inf...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7160458083","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6980000138282776},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5814999938011169},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.487199991941452},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48190000653266907},{"id":"https://openalex.org/C2778571376","display_name":"Frontier","score":0.4717000126838684},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.447299987077713},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.3644999861717224},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.33329999446868896}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cognitive-load-estimation-using-brain-foundation-models-and-interpretability-for-bcis","title":"Cognitive Load Estimation Using Brain Foundation Models and Interpretability for BCIs","url":"https://www.microsoft.com/en-us/research/publication/cognitive-load-estimation-using-brain-foundation-models-and-interpretability-for-bcis/","published":"2026-05-01","authors":["Deeksha M Shama","Dimitra Emmanouilidou","Ivan Tashev"],"abstract":"Figure 1: Overall pipeline of cognitive load estimation with brain foundation model (BFM) in adaptive training systems. We investigate the use of Brain Foundational Models (BFMs) for continuous cognitive-load monitoring and examine key challenges in scalability, generalization, interpretability. We apply BFMs to continuous cognitive-load estimation and analyze their behavior in a multi-day training setting. Our contributions are: A scalable and cross-participant pipeline for long-term cognitive load estimation using BFM-derived features. A flexible group-average channel alignment for heterogeneous layouts, improving cross-subject generalization An adaptation of Partition SHAP to interpret EEG feature and region importance, aligned with neuroscience [23]. A longitudinal analysis across multiple days revealing learning progression w.r.t. cognitive load and other neural markers. 
We find tha...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp55912.2026.11463223","openalex_id":"https://openalex.org/W7155071872","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Brain–computer interface","Signal processing","personalized","memory","long-term","efficient"],"author_affiliations":["Microsoft","Johns Hopkins University","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/semantic-caching-for-low-cost-llm-serving-from-offline-learning-to-online-adaptation","title":"Semantic Caching for Low-Cost LLM Serving: From Offline Learning to Online Adaptation","url":"https://www.microsoft.com/en-us/research/publication/semantic-caching-for-low-cost-llm-serving-from-offline-learning-to-online-adaptation/","published":"2026-05-01","authors":["Xutong Liu","Baran Atalar","Xiangxiang Dai","Jinhang Zuo","Siwei Wang","John C.S. Lui","Wei Chen","Carlee Joe-Wong"],"abstract":"Large Language Models (LLMs) are revolutionizing how users interact with information systems, yet their high inference cost poses serious scalability and sustainability challenges. Caching inference responses, allowing them to be retrieved without another forward pass through the LLM, has emerged as one possible solution. Traditional exact-match caching, however, overlooks the semantic similarity between queries, leading to unnecessary recomputation. 
Semantic caching addresses this by retrieving responses based on semantic similarity, but introduces a fundamentally different cache eviction problem: one must account for mismatch costs between incoming queries and cached responses. Moreover, key system parameters, such as query arrival probabilities and serving costs, are often unknown and must be learned over time. Existing semantic caching methods are largely ad-hoc, lacking theoretical....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/droidspeak-kv-cache-sharing-for-efficient-multi-llm-serving","title":"DroidSpeak: KV Cache Sharing Across Fine-tuned Model Variants","url":"https://www.microsoft.com/en-us/research/publication/droidspeak-kv-cache-sharing-for-efficient-multi-llm-serving/","published":"2026-05-01","authors":["Yuhan Liu","Yuyang Huang","Jiayi Yao","Shaoting Feng","Zhuohan Gu","Kuntai Du","Hanchen Li","Yihua Cheng","Junchen Jiang","Shan Lu","Madan Musuvathi","Esha Choukse"],"abstract":"Compound AI systems , such as agentic systems, are an emerging trend in large-scale enterprise settings, with multiple LLMs specialized for different users, tasks, and/or roles working together. In these scenarios, different models often process inputs that share the same context prefix. 
Although much work was done in the past to enable the reuse of prefix KV caches across inputs for a single model, how to enable one model to reuse the prefix KV caches of a different model remains an open question.We introduce DroidSpeak, the first distributed LLM inference system that enables KV cache reuse across distributed nodes running inference of different LLMs, so long as the LLMs have the same architecture. We present the first study that aims at understanding the impact of sharing KV caches across different LLMs, and if/when such sharing affects quality. Inspired by the findings, we present Dro...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/forestcoll-throughput-optimal-collective-communications-on-heterogeneous-network-fabrics","title":"ForestColl: Throughput-Optimal Collective Communications on Heterogeneous Network Fabrics","url":"https://www.microsoft.com/en-us/research/publication/forestcoll-throughput-optimal-collective-communications-on-heterogeneous-network-fabrics/","published":"2026-05-01","authors":["Liangyu Zhao","Saeed Maleki","Yuanhong Wang","Zezhou Wang","Ziyue Yang","Hossein Pourreza","Arvind Krishnamurthy"],"abstract":"s modern DNN models grow ever larger, collective communications between the accelerators (allreduce, etc.) 
emerge as a significant performance bottleneck. Designing efficient communication schedules is challenging, given today’s heterogeneous and diverse network fabrics. We present ForestColl, a tool that generates throughput-optimal schedules for any network topology. ForestColl constructs broadcast/aggregation spanning trees as the communication schedule, achieving theoretical optimality. Its schedule generation runs in polynomial time and is highly scalable. ForestColl supports any network fabric, including both switching fabrics and direct accelerator connections. We evaluated ForestColl on AMD MI250 and NVIDIA DGX A100 & H100 clusters. ForestColl shows significant improvements over the vendors’ own optimized communication libraries across various settings and in LLM training. ForestC...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Systems and networking","systems and networking","LLM","efficient","Unpublished"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flashlight-a-pytorch-compiler-framework-for-accelerating-attention-variants","title":"Flashlight: A PyTorch Compiler Framework for Accelerating Attention Variants","url":"https://www.microsoft.com/en-us/research/publication/flashlight-a-pytorch-compiler-framework-for-accelerating-attention-variants/","published":"2026-05-01","authors":["Bozhi You","Irene Wang","Zelal Su Mustafaoglu","Abhinav Jangda","Angelica Moreira","Roshan Dathathri","Divya Mahajan","Keshav
Pingali"],"abstract":"Attention is a fundamental building block of large language models (LLMs), so there have been many efforts to implement it efficiently. For example, FlashAttention leverages tiling and kernel fusion to optimize attention. Recently, a number of variants of attention have been introduced to enhance model quality or efficiency. Supporting them efficiently remains difficult since they usually require specialized kernels or hand-tuned implementations. FlexAttention recently addressed part of this gap by using static programming templates to support FlashAttention-like kernels for a subset of attention variants. In this paper, we introduce Flashlight, a compiler-native framework within the PyTorch ecosystem that automatically generates fused, FlashAttention-style kernels for arbitrary attention-based programs, without relying on static templates or predefined kernel specializations. Flashlight...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","Systems and networking","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/eywa-automating-model-based-testing-using-llms","title":"Eywa: Automating Model Based Testing using LLMs","url":"https://www.microsoft.com/en-us/research/publication/eywa-automating-model-based-testing-using-llms/","published":"2026-05-01","authors":["Rajdeep Mondal","Rathin Singha","Todd D.
Millstein","George Varghese","Ryan Beckett","Siva Kesava Reddy Kakarla"],"abstract":"Model-based testing (MBT), whereby a model of the system under test is analyzed to generate high-coverage test cases, has been used to test protocol implementations. A key barrier to the use of MBT is the need for users to understand protocol RFCs in detail to create a compliant model. Our new approach to MBT uses LLMs to automatically build rich models of intended protocol behavior from knowledge embedded in Request for Comments documents (RFCs), blogs, and other natural language sources. Our approach addresses key challenges with using LLMs, including hallucinations and their inability to monolithically generate complex protocol models. We realize our approach through a novel protocol testing framework Eywa, and demonstrate its effectiveness through extensive case studies of DNS and BGP, and a smaller study of SMTP. Despite minimal user effort, applying Eywa enabled the discovery of 33 u...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Systems and networking","Computer science","large language models","1970-01-01","Inproceedings (Conference)"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mattersim-mt-a-multi-task-foundation-model-for-in-silico-materials-characterization","title":"MatterSim-MT: A multi-task foundation model for in silico materials
characterization","url":"https://www.microsoft.com/en-us/research/publication/mattersim-mt-a-multi-task-foundation-model-for-in-silico-materials-characterization/","published":"2026-05-01","authors":["Han Yang","Xixian Liu","Chenxi Hu","Yichi Zhou","Yu Shi","Chang Liu","Junfu Tan","Jielan Li","Guanzhi Li","Qian Wang","Yu Zhu","Zekun Chen"],"abstract":"Accurate property characterization is a major bottleneck in materials design. While first-principles methods and task-specific machine-learning models have driven important progress, they remain fundamentally limited in scalability and generalizability across the vast space of structures and properties relevant to real-world materials design. We present MatterSim-MT, a multi-task foundation model for in silico materials simulation and property characterization. The model is pretrained on over 35 million first-principles-labeled structures covering 89 elements, temperatures up to 5000 K and pressures up to 1000 GPa, and is fine-tuned on various properties including Bader charges, magnetic moments, Born effective charges, and dielectric matrices. 
Out of the box, MatterSim-MT not only serves as a foundation model for predicting material structure, dynamics and thermodynamics, its multi-task...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Artificial intelligence","Unpublished"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:xw10hgmzj5i4p142wocalfut","title":"Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents","url":"https://machinelearning.apple.com/research/reinforced-agent-inference-feedback","published":"2026-05-01","authors":["Anh Ta","Junjie Zhu","Shahin Shayandeh"],"abstract":"This paper was accepted at the Fifth Workshop on Natural Language Generation, Evaluation, and Metrics at ACL 2026.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2601.05047","title":"Challenges and Research Directions for Large Language Model Inference Hardware","url":"http://arxiv.org/abs/2601.05047","published":"2026-05-01","authors":["Xiaoyu Ma","David Patterson"],"abstract":"Large Language Model (LLM) inference is hard. 
The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory and interconnect rather than compute. To address these challenges, we highlight four architecture research opportunities: High Bandwidth Flash for 10X memory capacity with HBM-like bandwidth; Processing-Near-Memory and 3D memory-logic stacking for high memory bandwidth; and low-latency interconnect to speedup communication. While our focus is datacenter AI, we also review their applicability for mobile devices.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mc.2026.3652916","openalex_id":"https://openalex.org/W7119941304","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7886000275611877},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.682200014591217},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5814999938011169},{"id":"https://openalex.org/C188045654","display_name":"Memory bandwidth","score":0.527999997138977},{"id":"https://openalex.org/C123745756","display_name":"Interconnection","score":0.4595000147819519},{"id":"https://openalex.org/C118524514","display_name":"Computer architecture","score":0.44749999046325684},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.44690001010894775},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.4465000033378601}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"hf-org-paper:Qwen:2605.19633","title":"optimizeanything: A Universal API for Optimizing any Text Parameter","url":"https://huggingface.co/papers/2605.19633","published":"2026-05","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:stepfun-ai:2605.23463","title":"StepAudio 2.5 Technical Report","url":"https://huggingface.co/papers/2605.23463","published":"2026-05","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"hf-org-paper:zai-org:2605.31584","title":"LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric 
Rewards","url":"https://huggingface.co/papers/2605.31584","published":"2026-05","authors":["Z.ai/Zhipu"],"abstract":"","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","zai-org"],"author_affiliations":["Z.ai/Zhipu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/zai-org/papers"}},{"id":"hf-org-paper:Qwen:2605.30350","title":"DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation","url":"https://huggingface.co/papers/2605.30350","published":"2026-05","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:Qwen:2605.23346","title":"Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete 
Diffusion","url":"https://huggingface.co/papers/2605.23346","published":"2026-05","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"official:4bfc08cbfcb24fc4","title":"Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding","url":"https://research.nvidia.com/publication/2026-05_nemotron-labs-diffusion-tri-mode-language-model-unifying-autoregressive","published":"2026-05","authors":["Yonggan Fu","Lexington Whalen","Abhinav Garg","Chengyue Wu","Maksim Khadkevich","Nicolai Oswald","Enze Xie","Daniel Egert","Sharath Turuvekere Sreenivas","Shizhe Diao","Chenhan Yu","Ye Yu"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2026&page=0"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/diagnosing-capability-gaps-in-fine-tuning-data","title":"Diagnosing Capability Gaps in Fine-Tuning 
Data","url":"https://www.microsoft.com/en-us/research/publication/diagnosing-capability-gaps-in-fine-tuning-data/","published":"2026-04-30","authors":["Saeid Asgari Taghanaki","Raksha Agarwal","Bruce Sun","Rohan Jha","Elias Stengel-Eskin","Sara Malvar","Rui Ying","Yifei Xu","Guilherme Potje","Tusher Chakraborty","Leonardo Nunes","Ranveer Chandra"],"abstract":"Fine-tuning large language models (LLMs) for domain-specific tasks requires training datasets that comprehensively cover the target capabilities a practitioner needs. Yet identifying which capabilities a dataset fails to support, and doing so before an expensive fine-tuning run, remains a largely unsolved problem. We introduce GoalCover, a framework that helps practitioners systematically detect capability gaps in fine-tuning datasets through interactive goal decomposition and automated coverage assessment. GoalCover guides a practitioner through structured decomposition of a high-level goal into atomic, independently evaluable subgoals; assigns each training sample an LLM-based alignment score against every subgoal; and surfaces missing capabilities through automated analysis of low-scoring sample explanations. We validate the framework along two complementary axes. 
First, through contr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/caslayout-cascaded-3d-layout-diffusion-for-indoor-scene-synthesis-with-implicit-relation-modeling","title":"CasLayout: Cascaded 3D Layout Diffusion for Indoor Scene Synthesis with Implicit Relation Modeling","url":"https://www.microsoft.com/en-us/research/publication/caslayout-cascaded-3d-layout-diffusion-for-indoor-scene-synthesis-with-implicit-relation-modeling/","published":"2026-04-30","authors":["Yingrui Wu","Youkang Kong","Mingyang Zhao","Weize Quan","Dong Yan","Yang Liu"],"abstract":"Synthesizing realistic 3D indoor scenes remains challenging due to data scarcity and the difficulty of simultaneously enforcing global architectural constraints and local semantic consistency. Existing approaches often overlook structural boundaries or rely on fully connected relation graphs that introduce redundant generation errors. Inspired by human design cognition, we present CasLayout, a cascaded diffusion framework that decomposes the joint scene generation task into four conditional sub-stages with explicit physical and semantic roles: (1) predicting furniture quantity and categories, (2) refining object sizes and feature embeddings, (3) modeling spatial relationships in a latent space, and (4) generating Oriented Bounding Boxes (OBBs). 
This decoupled architecture reduces data requirements and enables flexible integration of Large Language Models (LLMs) and Vision Language Models...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:a11f9fbe3a2a810f","title":"How people ask Claude for personal guidance","url":"https://www.anthropic.com/research/claude-personal-guidance","published":"2026-04-30","authors":["Anthropic"],"abstract":"Societal Impacts","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Societal Impacts"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic research page https://www.anthropic.com/research"}},{"id":"apple:yp47wl7m5m0yudpk4wl55qiw","title":"STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows","url":"https://machinelearning.apple.com/research/starflow-v-video-modeling","published":"2026-04-30","authors":["Jiatao Gu","Ying Shen","Tianrong Chen","Laurent Dinh","Yuyang Wang","Miguel Ángel Bautista","David Berthelot","Josh Susskind","Shuangfei Zhai"],"abstract":"Normalizing flows (NFs) are end-to-end likelihood-based generative 
models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting STARFlow-V, a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:c8nhoherh74a3kbgei58lo8d","title":"Bootstrapping Sign Language Annotations with Sign Language Models","url":"https://machinelearning.apple.com/research/sign-language-annotations","published":"2026-04-30","authors":["Colin Lea","Vasileios Baltatzis","Connor Gillis","Raja Kushalnagar","Lorna Quandt","Leah Findlater"],"abstract":"AI-driven sign language interpretation is limited by a lack of high-quality annotated data. New datasets including ASL STEM Wiki and FLEURS-ASL contain professional interpreters and 100s of hours of data but remain only partially annotated and thus underutilized, in part due to the prohibitive costs of annotating at this scale. 
In this work, we develop a pseudo-annotation pipeline that takes signed video and English as input and outputs a ranked...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7159601360","title":"Performance of a large language model on the reasoning tasks of a physician","url":"https://doi.org/10.1126/science.adz4433","published":"2026-04-30","authors":["Peter G. Brodeur","Thomas A Buckley","Zahir Kanjee","Ethan Goh","Evelyn Ling","Priyank Jain","Stephanie Cabral","Raja-Elie Abdulnour","Adrian D. Haimovich","Jason A. Freed","Andrew Olson","Daniel J Morgan"],"abstract":"More than 65 years ago, complex clinical diagnostic reasoning cases were introduced as the gold standard for the evaluation of expert medical computing systems, a standard that has held ever since. In this study, we report the results of a physician evaluation of a large language model (LLM) on challenging clinical cases across five experiments with a baseline of hundreds of physicians. We then report a real-world study comparing human expert and artificial intelligence (AI) second opinions in randomly selected patients in the emergency room of a major tertiary academic medical center. In all experiments, the LLM outperformed physician baselines and displayed continued improvement from prior generations of AI clinical decision support. 
Our study suggests that LLMs have eclipsed most benchmarks of clinical reasoning, motivating the urgent need for prospective trials.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1126/science.adz4433","openalex_id":"https://openalex.org/W7159601360","cited_by_count":3,"quality_score":48,"matched_keywords":["LLM","language model"],"author_affiliations":["Beth Israel Deaconess Hospital","Beth Israel Deaconess Medical Center","Brigham and Women's Hospital","Cambridge Health Alliance","Harvard University","Lahey Medical Center","Massachusetts General Hospital","Massachusetts Institute of Technology","Microsoft (United States)","San Francisco General Hospital","Stanford Health Care","Stanford Medicine","Stanford University","University of Alberta","University of Maryland, Baltimore","University of Minnesota Medical Center","VA Maryland Health Care System","VA Palo Alto Health Care System"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5565999746322632},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.510200023651123},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.4966999888420105},{"id":"https://openalex.org/C40993552","display_name":"Gold standard (test)","score":0.49129998683929443},{"id":"https://openalex.org/C58328972","display_name":"Expert system","score":0.4117000102996826},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.40860000252723694},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3959999978542328},{"id":"https://openalex.org/C509550671","display_name":"Medical 
education","score":0.3652999997138977}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7159686488","title":"Use of a Large Language Model to Reveal Narrative Architectures of Veteran Transition Stress: Development and Validation Study","url":"https://doi.org/10.2196/90155","published":"2026-04-30","authors":["Isaac R. Galatzer‐Levy","Xi Pan","Roland Hart","George A. Bonanno"],"abstract":"Background: The stress caused by multiple aspects of veterans' transitions from military to civilian, termed transition stress, represents a unique source of psychological impact that is underresearched due to its qualitative nature. The assessment of this complex psychological phenomena has thus relied on laborious interviews designed to extract quantitative information from qualitative narratives of the transition to civilian life. We sought to determine if large language models (LLMs) could be used as valid measurement tools to extract relevant information from open-ended narratives. Objective: This study sought to develop and validate a generative artificial intelligence (AI) approach to automate the quantification and subsequent thematic analysis of veteran transition stress. 
Methods: Utilizing transcripts from interviews of a sample of US military veterans, we developed an LLM to r...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2196/90155","openalex_id":"https://openalex.org/W7159686488","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Columbia University","Google (United States)"],"concepts":[{"id":"https://openalex.org/C194232998","display_name":"Transition (genetics)","score":0.7062000036239624},{"id":"https://openalex.org/C199033989","display_name":"Narrative","score":0.679099977016449},{"id":"https://openalex.org/C74196892","display_name":"Thematic analysis","score":0.6347000002861023},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.5997999906539917},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5008000135421753},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.4666999876499176},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.45750001072883606},{"id":"https://openalex.org/C21036866","display_name":"Stress (linguistics)","score":0.44339999556541443}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.27410","title":"From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking","url":"https://arxiv.org/abs/2604.27410","published":"2026-04-30","authors":["Yilun Zhu","Nikhita Vedula","Shervin Malmasi"],"abstract":"Entity search, i.e., finding the most similar entities to a query entity, faces unique challenges in e-commerce, where product similarity varies across categories and contexts. 
Traditional embedding-based approaches often struggle to capture nuanced context-specific attribute relevance. In this paper, we present a two-stage approach combining Large Language Model (LLM)-driven attribute graph construction with graph-aware LLM ranking. In the offline stage, we extract structured product attributes from unstructured text, and construct a reusable attribute graph with category-aware schemas. In the online stage, we rank retrieved candidates by reasoning over this structured representation rather than raw text, reducing per-product token usage by 57% while improving ranking precision. Experiments show that our approach outperforms multiple baselines under zero-shot scenarios, achieving a over...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3805712.3808501","openalex_id":"https://openalex.org/W7159803291","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (United States)","Seattle University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7559000253677368},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5803999900817871},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.5522000193595886},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5374000072479248},{"id":"https://openalex.org/C86037889","display_name":"Learning to rank","score":0.5325999855995178},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.5235000252723694},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4878000020980835},{"id":"https://openalex.org/C90673727","display_name":"Product 
(mathematics)","score":0.47510001063346863}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.27166","title":"Distributional Alignment Games for Answer-Level Fine-Tuning","url":"https://arxiv.org/abs/2604.27166","published":"2026-04-29","authors":["Mehryar Mohri","Jon Schneider","Yifan Wu"],"abstract":"We focus on the problem of \\emph{Answer-Level Fine-Tuning} (ALFT), where the goal is to optimize a language model based on the correctness or properties of its final answers, rather than the specific reasoning traces used to produce them. Directly optimizing answer-level objectives is computationally intractable due to the need to marginalize over the vast space of latent reasoning paths. To overcome this, we propose a general game-theoretical framework that lifts the problem to a \\emph{Distributional Alignment Game}. We formulate ALFT as a two-player game between a Policy (the generator) and a Target (an auxiliary distribution). We prove that the Nash Equilibrium of this game corresponds exactly to the solution of the original answer-level optimization problem. This variational perspective transforms the intractable marginalization problem into a tractable projection problem. 
We demonst...","companies":["Google/DeepMind","Microsoft"],"matched_orgs":["Google/DeepMind","Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7159891894","cited_by_count":0,"quality_score":88,"matched_keywords":["language model","efficient","Miscellaneous","Artificial intelligence","Computer science","Machine learning"],"author_affiliations":["Google (United States)","Microsoft"],"concepts":[{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.6366000175476074},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.6244999766349792},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5720999836921692},{"id":"https://openalex.org/C46814582","display_name":"Nash equilibrium","score":0.5306000113487244},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5181000232696533},{"id":"https://openalex.org/C126255220","display_name":"Mathematical optimization","score":0.4675999879837036},{"id":"https://openalex.org/C57493831","display_name":"Projection (relational algebra)","score":0.4163999855518341},{"id":"https://openalex.org/C137836250","display_name":"Optimization problem","score":0.3995000123977661}],"official_report":true,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/theory-under-construction-orchestrating-language-models-for-research-software-where-the-specification-evolves","title":"Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification 
Evolves","url":"https://www.microsoft.com/en-us/research/publication/theory-under-construction-orchestrating-language-models-for-research-software-where-the-specification-evolves/","published":"2026-04-29","authors":["Halley Young","Nikolaj Bjørner"],"abstract":"Large language models can now generate substantial code and draft research text, but research-software projects require more than either artifact alone. The mathematical thesis, executable system, benchmark surface, and public claims must mature together, yet often drift apart. We identify two LM-specific failure modes: hallucination accumulation, in which claims exceed what code or theory supports and unsupported assertions propagate across sessions; and desynchronization, in which code, theory, or the model's own world model fall out of alignment. We propose Comet-H, an iterative prompt automaton that orchestrates ideation, implementation, evaluation, grounding, and paper-writing as coupled coordinates of a single workspace state. 
At each step, a controller selects the next prompt by scoring it against what the workspace currently lacks, carries unfinished follow-up work forward with a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/unifying-sparse-attention-with-hierarchical-memory-for-scalable-long-context-llm-serving","title":"Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving","url":"https://www.microsoft.com/en-us/research/publication/unifying-sparse-attention-with-hierarchical-memory-for-scalable-long-context-llm-serving/","published":"2026-04-29","authors":["Zihan Zhao","Baotong Lu","Shengjie Lin","Yizou Chen","Jing Liu","Yanqi Zhang","Ziming Miao","Ming-Chang Yang","Haiying Shen","Qi Chen","Fan Yang"],"abstract":"Long-context LLM serving is bottlenecked by the cost of attending over ever-growing KV caches. Dynamic sparse attention promises relief by accessing only a small, query-dependent subset of the KV state per decoding step and extending the KV storage to CPU memory. In practice, however, these algorithmic savings rarely translate into end-to-end system-level gains because sparse methods typically operate at different granularities and thus rely on ad hoc, per-algorithm implementations. 
At the same time, hierarchical KV storage introduces a new systems bottleneck: retrieving fine-grained, irregular KV subsets across the GPU-CPU boundary can easily erase the benefits of sparsity. We present SPIN, a sparse-attention-aware inference framework that co-designs the execution pipeline with hierarchical KV storage through three techniques: (1) a unified partition abstraction that maps different spar...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autosurfer-teaching-web-agents-through-comprehensive-surfing-learning-and-modeling","title":"AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling","url":"https://www.microsoft.com/en-us/research/publication/autosurfer-teaching-web-agents-through-comprehensive-surfing-learning-and-modeling/","published":"2026-04-29","authors":["Fazle Faisal","Qianhui Wu","Baolin Peng","Jianfeng Gao"],"abstract":"Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website coverage due to homepage-based task proposals or random-walk exploration. 
Such methods often result in hallucinated or ambiguous task synthesis that lead to incomplete and unreliable trajectory generation. Here, we present AutoSurfer, a comprehensive web trajectory generator that addresses these limitations through three key innovations. First, AutoSurfer employs a systematic breadth-first exploration strategy that maintains a queue of discovered pages and action traces, propagates knowledge across pages to avoid redundant exploration, and recursively expands multi-level graph...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Systems and networking","Computer science","agent","Miscellaneous"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:87b6ef7a46e5d84e","title":"Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench","url":"https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench","published":"2026-04-29","authors":["Anthropic"],"abstract":"Science","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Science"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic research page https://www.anthropic.com/research"}},{"id":"apple:pkqplfum8gseh7edfb3galep","title":"Adaptive Thinking: Large 
Language Models Know When to Think in Latent Space","url":"https://machinelearning.apple.com/research/adaptive-thinking","published":"2026-04-29","authors":["Pingzhi Li","Bairu Hou","Yun Zhu","Yihao Feng","Ke Ye","Tao Lei","Zhifeng Chen","Tianlong Chen","Xianzhi Du"],"abstract":"Recent advances in large language models (LLMs) test-time computing have introduced the capability to perform intermediate chain-of-thought (CoT) reasoning (thinking) before generating answers. While increasing the thinking budget yields smooth performance improvements at inference time, the relationship between LLM capability, query complexity, and optimal budget allocation remains poorly understood for achieving compute-optimal inference. To...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:u9qezousr4iv537v6fbnf9zc","title":"DSO: Direct Steering Optimization for Bias Mitigation","url":"https://machinelearning.apple.com/research/direct-steering-optimization","published":"2026-04-29","authors":["Lucas Monteiro Paes","Nivedha Sivakumar","Oliver Wang","Masha Fedzechkina","Barry-John Theobald","Luca Zappella","Nicholas Apostoloff"],"abstract":"Generative models are often deployed to make decisions on behalf of users, such as vision-language models (VLMs) identifying which person in a room is a doctor to help visually impaired individuals. Yet, VLM decisions are influenced by the perceived demographic attributes of people in the input, which can lead to biased outcomes like failing to identify women as doctors. 
Moreover, when reducing bias leads to performance loss, users may have...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"hf-org-paper:zai-org:2604.26752","title":"GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents","url":"https://huggingface.co/papers/2604.26752","published":"2026-04-28","authors":["Z.ai/Zhipu"],"abstract":"We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. 
These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive t...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","zai-org","language model","agent"],"author_affiliations":["Z.ai/Zhipu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/zai-org/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-prompt-risk-to-response-risk-paired-analysis-of-safety-behavior-of-large-language-model","title":"From Prompt Risk to Response Risk: Paired Analysis of Safety Behavior of Large Language Model","url":"https://www.microsoft.com/en-us/research/publication/from-prompt-risk-to-response-risk-paired-analysis-of-safety-behavior-of-large-language-model/","published":"2026-04-28","authors":["Mengya Hu","Qiong Wei","Sandeep Atluri"],"abstract":"Safety evaluations of large language models (LLMs) typically report binary outcomes such as attack success rate, refusal rate, or harmful/not-harmful response classification. While useful, these can hide how risk changes between a user's input and the model's response. We present a paired, transition-based analysis over 1250 prompt-response records with human-provided labels over four harm categories (Hate, Sexual, Violence, Self-harm) and ordinal severity levels aligned with the Azure AI Content Safety taxonomy. 61% of responses de-escalate harm relative to the prompt, 36% preserve the same severity, and 3% escalate to higher harm. 
A per-category persistence/drift-up decomposition identifies Sexual content as 3x harder to de-escalate than Hate or Violence, driven by persistence on already-sexual prompts, not by newly introducing sexual harm from benign inputs. Jointly measuring response...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-risks-in-weak-to-strong-alignment-a-bias-variance-perspective","title":"Evaluating Risks in Weak-to-Strong Alignment: A Bias-Variance Perspective","url":"https://www.microsoft.com/en-us/research/publication/evaluating-risks-in-weak-to-strong-alignment-a-bias-variance-perspective/","published":"2026-04-28","authors":["Hamid Osooli","Kareema Batool","Rick Gentry","Tiasa Singha Roy","Ashwin Gupta","Anirudha Ramesh"],"abstract":"Weak-to-strong alignment offers a promising route to scalable supervision, but it can fail when a strong model becomes confidently wrong on examples that lie in the weak teacher's blind spots. Understanding such failures requires going beyond aggregate accuracy, since weak-to-strong errors depend not only on whether the strong model disagrees with its teacher, but also on how confidence and uncertainty are distributed across examples. 
In this work, we analyze weak-to-strong alignment through a bias-variance-covariance lens that connects misfit theory to practical post-training pipelines. We derive a misfit-based upper bound on weak-to-strong population risk and study its empirical components using continuous confidence scores. We evaluate four weak-to-strong pipelines spanning supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and reinforcement learning from...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:x59wax402f702gfw6rn37v71","title":"LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning","url":"https://machinelearning.apple.com/research/ladir","published":"2026-04-28","authors":["Haoqiang Kang","Yizhe Zhang","Nikki Lijing Kuang","Nicklas Majamaki","Navdeep Jaitly","Yi-An Ma","Lianhui Qin"],"abstract":"Large Language Models (LLMs) demonstrate their reasoning ability through chain-of-thought (CoT) generation. However, LLM's autoregressive decoding may limit the ability to revisit and refine earlier tokens in a holistic manner, which can also lead to inefficient exploration for diverse solutions. 
In this paper, we propose LaDiR (Latent Diffusion Reasoner), a novel reasoning framework that unifies the expressiveness of continuous latent...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2604.26139","title":"HIVE: Hidden-Evidence Verification for Hallucination Detection in Diffusion Large Language Models","url":"https://arxiv.org/abs/2604.26139","published":"2026-04-28","authors":["Guoshenghui Zhao","Weijie Zhao","Tan Yu"],"abstract":"Diffusion large language models generate text through multi-step denoising, where hallucination signals may emerge throughout the trajectory rather than only in the final output. Existing detectors mainly rely on output uncertainty or coarse trace statistics, which often fail to capture the richer hidden dynamics of D-LLMs. We propose HIVE, a hidden-evidence verification framework that extracts compressed hidden evidence from denoising trajectories, selects informative step-layer evidence, and conditions a verifier language model on the selected evidence through prefix embeddings. HIVE produces both a continuous hallucination score from verifier decision logits and structured verification outputs, including hallucination types, evidence pairs, and short rationales. 
Across two D-LLMs and three QA benchmarks, HIVE consistently outperforms eight strong baselines and achieves up to 0.9236 AU...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7159734003","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Nvidia (United States)","Rochester Institute of Technology"],"concepts":[{"id":"https://openalex.org/C75291252","display_name":"TRACE (psycholinguistics)","score":0.7343000173568726},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7020000219345093},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5874000191688538},{"id":"https://openalex.org/C163294075","display_name":"Noise reduction","score":0.4848000109195709},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.4237000048160553},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.42289999127388},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4072999954223633},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4058000147342682}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7159576433","title":"An Isoform-Centric, Structure-Aware Framework for Protein Function Prediction and Evaluation, Instantiated in 3DisoDeepPF","url":"https://doi.org/10.64898/2026.04.24.720502","published":"2026-04-28","authors":["Felicia Jiang","Runhao Zhao","Feng Liang","Y S Zhang","Taoyong Cui","Xiang Zhao","Xiangeng Wang","minghao Xu","Yi Shuai","Tianli Luo","Hualiang Yao","Chenchen Xu"],"abstract":"Understanding functional diversity 
across protein isoforms remains a long-standing challenge with broad biological and translational implications, yet most computational methods are developed and benchmarked on a single reference protein per gene, limiting their ability to resolve isoform-specific functional differences. This challenge is compounded by the scarcity of isoform-resolved annotations and benchmarks. Here, we present an isoform-centric, structure-aware framework for the protein family (Pfam) domain and Gene Ontology (GO) term prediction. We implemented this framework in 3DisoDeepPF, which combines a dense graph combining sequence and structure similarity with multimodal representations, and evaluated 3DisoDeepPF in both conventional and isoform-resolved settings. Across conventional canonical benchmarks, 3DisoDeepPF showed strong performance relative to representative methods...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.04.24.720502","openalex_id":"https://openalex.org/W7159576433","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Peking University","Prince of Wales Hospital","Tencent (China)","Changsha University","Chinese University of Hong Kong, Shenzhen"],"concepts":[{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.6632999777793884},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6330999732017517},{"id":"https://openalex.org/C2987395477","display_name":"Gene ontology","score":0.45559999346733093},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.43880000710487366},{"id":"https://openalex.org/C144292202","display_name":"Protein 
domain","score":0.4375999867916107},{"id":"https://openalex.org/C207060522","display_name":"Protein function prediction","score":0.4284999966621399},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.420199990272522},{"id":"https://openalex.org/C11804247","display_name":"Protein–protein interaction","score":0.41909998655319214}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/chow-liu-ordering-for-long-context-reasoning-in-chain-of-agents","title":"Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents","url":"https://www.microsoft.com/en-us/research/publication/chow-liu-ordering-for-long-context-reasoning-in-chain-of-agents/","published":"2026-04-27","authors":["Naman Gupta","Vaibhav Singh","Arun Iyer","Kirankumar Shiragur","Pratham Grover","Ramakrishna Bairi","Ritabrata Maiti","Sankarshan Damle","Shachee Mishra Gupta","Rishikesh Maurya","Vageesh D C"],"abstract":"Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agents that read from and update a bounded shared memory. From a probabilistic perspective, CoA aims to approximate the conditional distribution corresponding to a model capable of jointly reasoning over the entire long context. CoA achieves this through a latent-state factorization in which only bounded summaries of previously processed evidence are passed between agents. The resulting bounded-memory approximation introduces a lossy information bottleneck, making the final evidence state inherently dependent on the order in which chunks are processed. In this work, we study the problem of chunk ordering for long-context reasoning. 
We use the well-known Chow-Liu trees to learn a dependency structure...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","memory","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/world-r1-reinforcing-3d-constraints-for-text-to-video-generation","title":"World-R1: Reinforcing 3D Constraints for Text-to-Video Generation","url":"https://www.microsoft.com/en-us/research/publication/world-r1-reinforcing-3d-constraints-for-text-to-video-generation/","published":"2026-04-27","authors":["Weijie Wang","Xiaoxuan He","Youping Gu","Yifan Yang","Zeyu Zhang","Yefei He","Yanbo Ding","Xirui Hu","Donny Y. Chen","Zhiyuan He","Yuqing Yang","Bohan Zhuang"],"abstract":"Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns video generation with 3D constraints through reinforcement learning. To facilitate this alignment, we introduce a specialized pure text dataset tailored for world simulation. Utilizing Flow-GRPO, we optimize the model using feedback from pre-trained 3D foundation models and vision-language models to enforce structural coherence without altering the underlying architecture. 
We further employ a periodic decoupled training strategy to balance rigid geometric consistency with dynamic scene fluidity. Extensive evaluations reveal that our approach significantly enhances 3D consist...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2604.25727","title":"Toward Scalable Terminal Task Synthesis via Skill Graphs","url":"https://huggingface.co/papers/2604.25727","published":"2026-04-27","authors":["Tencent/Hunyuan"],"abstract":"Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. However, they primarily focus on scaling the number of tasks while providing limited control over the diversity of execution trajectories that agents actually experience during training. In this paper, we present SkillSynth, an automated framework for terminal task synthesis built on a scenario-mediated skill graph. SkillSynth first constructs a large-scale skill graph, where scenarios serve as intermediate transition nodes that connect diverse command-line skills. 
It then samples paths from this graph as abstractions of real-world workflows, and uses a multi-agent harness t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"https://openalex.org/W7159547356","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","agent","multi-agent"],"author_affiliations":["Tencent/Hunyuan","Tencent (China)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:stepfun-ai:2604.25719","title":"Step-Audio-R1.5 Technical Report","url":"https://huggingface.co/papers/2604.25719","published":"2026-04-27","authors":["StepFun"],"abstract":"Recent advancements in large audio language models have extended Chain-of-Thought (CoT) reasoning into the auditory domain, enabling models to tackle increasingly complex acoustic and spoken tasks. To elicit and sustain these extended reasoning chains, the prevailing paradigm -- driven by the success of text-based reasoning models -- overwhelmingly relies on Reinforcement Learning with Verified Rewards (RLVR). However, as models are strictly optimized to distill rich, continuous auditory contexts into isolated, verifiable text labels, a fundamental question arises: are we fostering true audio intelligence, or merely reducing a continuous sensory medium into a discrete puzzle? We identify this as the \"verifiable reward trap.\" While RLVR yields remarkable scores on standardized objective benchmarks, it systematically degrades the real-world conversational feel of audio models. 
By prioritiz...","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"openalex:W7156022843","title":"ARIA: Adaptive Reasoning for Integrated Analysis — An LLM-Powered Framework for Autonomous Transcriptome Analysis with Decision-Aware Workflow Orchestration","url":"https://doi.org/10.21203/rs.3.rs-9500973/v1","published":"2026-04-27","authors":["Byeongsoo Kang"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-9500973/v1","openalex_id":"https://openalex.org/W7156022843","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7337999939918518},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6905999779701233},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5835999846458435},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5012000203132629},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4602999985218048},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.42980000376701355},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4196000099182129},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3853999972343445}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155802897","title":"CE-CLIP: Cloud–edge collaborative fine-tuning for multimodal adaptation","url":"https://doi.org/10.1016/j.patcog.2026.113846","published":"2026-04-27","authors":["Kejun Ren","Yuntao Du","Lianming Xu","Yunxiang Yao","Wei Han","Lei Jin","Li Wang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2026.113846","openalex_id":"https://openalex.org/W7155802897","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing University of Posts and Telecommunications","Huawei Technologies (China)","Shandong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.669700026512146},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5044000148773193},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.41749998927116394},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3833000063896179},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.2533000111579895},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.24269999563694},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.23250000178813934},{"id":"https://openalex.org/C9652623","display_name":"Field 
(mathematics)","score":0.22429999709129333}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.24954","title":"Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence","url":"https://huggingface.co/papers/2604.24954","published":"2026-04-27","authors":["NVIDIA","Amala Sanjay Deshmukh","Kateryna Chumachenko","Tuomas Rintamaki","Matthieu Le","Tyler Poon","Danial Mohseni Taheri","Ilia Karmanov","Guilin Liu","Jarno Seppanen","Arushi Goel","Mike Ranzinger"],"abstract":"We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in architecture, training data and recipes. In particular, Nemotron 3 delivers leading results in real-world document understanding, long audio-video comprehension, and agentic computer use. Built on the highly efficient Nemotron 3 Nano 30B-A3B backbone, Nemotron 3 Nano Omni further incorporates innovative multimodal token-reduction techniques to deliver substantially lower inference latency and higher throughput than other models of similar size. 
We are releasing model checkpoints in BF16, FP8, and FP4 formats, along with portions of the training data and codebase to facilita...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-researchers-navigate-accountability-transparency-and-trust-when-using-ai-tools-in-early-stage-research-a-think-aloud-study","title":"How Researchers Navigate Accountability, Transparency, and Trust When Using AI Tools in Early-Stage Research: A Think-Aloud Study","url":"https://www.microsoft.com/en-us/research/publication/how-researchers-navigate-accountability-transparency-and-trust-when-using-ai-tools-in-early-stage-research-a-think-aloud-study/","published":"2026-04-25","authors":["Sanjana Gautam","Houjiang Liu","Yujin Choi","Matthew Lease"],"abstract":"In the early stages of scientific research, researchers rely on core scholarly judgments to identify relevant literature, assess credible evidence, and determine which directions merit pursuit. As AI tools become increasingly integrated into these early-stage workflows, the scholarly judgments that were once transparent and attributable to individual researchers become obscured, raising critical Responsible AI (RAI) concerns around accountability, transparency, and trust. Yet how these three dimensions manifest in real-time, in-situ scholarly practice remains largely unexplored. 
To address this gap, we conducted a think-aloud study with 15 researchers to examine how they used AI tools powered by large language models (LLMs) across early-stage research tasks, including literature exploration, synthesis, and research ideation. Our key findings address the tripartite constructs of accountab...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:7bcc84d4dc247b70","title":"ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation","url":"https://deepmind.google/research/publications/238239/","published":"2026-04-25","authors":["Google/DeepMind"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind publications page https://deepmind.google/research/publications/"}},{"id":"openalex:W7155608791","title":"Multi-modal large language model-based image captioning algorithm in information and communication technology: Bridging the gap between general and industry 
domain","url":"https://doi.org/10.1016/j.engappai.2026.114848","published":"2026-04-25","authors":["Lianying Chao","Kai Zhang","Xubin Li","Linfeng Yin","Haoran Cai","Sijie Wu","Dingcheng Shan"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.engappai.2026.114848","openalex_id":"https://openalex.org/W7155608791","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9293000102043152},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.9211000204086304},{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9126999974250793},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5110999941825867},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4909999966621399},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4593999981880188},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.45739999413490295},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43970000743865967}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155645199","title":"Context-aware multi-property antibody predictor: a novel framework integrating text and protein language models","url":"https://doi.org/10.1038/s41540-026-00723-1","published":"2026-04-25","authors":["Luca Giancardo","Melih Yilmaz","Edward Lee","Ke Ren","Yue Zhao","Gordon Trang","Kemal Sonmez","Lan 
Guo","Nina Cheng"],"abstract":"Recent advances in Machine Learning have transformed antibody development through in silico models, accelerating therapeutic candidate identification. However, challenges persist: rapid adaptation of property predictors to laboratory-specific assays with incomplete datasets; batch effects introducing systematic bias; assay costs necessitating efficient unseen property prediction. We introduce a novel multimodal architecture featuring specialized tokenization and embedding projection that integrates text and protein language models (pLM) and a learning strategy to enable context-conditioned multi-property prediction without learning shortcuts. Our framework enables prompting without dictionary merging across modalities, creating a compact model capable of context-conditioned learning for multi-property prediction. The orchestrating model avoids pLM-to-text projection while enabling infere...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41540-026-00723-1","openalex_id":"https://openalex.org/W7155645199","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","Amazon (Germany)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.651199996471405},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49869999289512634},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48170000314712524},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.34689998626708984},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.31610000133514404},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.27309998869895935},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.27219998836517334},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.26420000195503235}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cosmicdancepro-measuring-leo-satellites-orbital-decay-and-network-connectivity-implications-during-solar-storms","title":"CosmicDancePro -- Measuring LEO satellite's orbital decay and network connectivity implications during solar storms","url":"https://www.microsoft.com/en-us/research/publication/cosmicdancepro-measuring-leo-satellites-orbital-decay-and-network-connectivity-implications-during-solar-storms/","published":"2026-04-24","authors":["Suvam Basak","Amitangshu Pal","Debopam Bhattacherjee"],"abstract":"The May 2024 solar superstorm highlighted the vulnerability of rapidly expanding low Earth orbit (LEO) satellite networks to severe space weather events. To systematically evaluate LEO network resilience, we introduce an open-source tool, CosmicDancePro. It enables a comprehensive analysis of the effects of solar storms in the LEO satellite network. It integrates real-world multimodal datasets, including space weather measurements from several satellites, upper-atmospheric density conditions from data-driven and high-fidelity physics-based models, and LEO satellite trajectory and LEO network measurement traces to quantify orbital decay driven by enhanced atmospheric density and network connectivity degradation. We utilize CosmicDancePro to analyze the Starlink constellation's behavior during two recent major solar storms. 
First, we identify the specific fleet management strategies Starli...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Systems and networking","Astrophysics","Computer science","Physics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-do-ai-agents-spend-your-money-analyzing-and-predicting-token-consumption-in-agentic-coding-tasks","title":"How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks","url":"https://www.microsoft.com/en-us/research/publication/how-do-ai-agents-spend-your-money-analyzing-and-predicting-token-consumption-in-agentic-coding-tasks/","published":"2026-04-24","authors":["Longju Bai","Zhemin Huang","Xingyao Wang","Jiao Sun","Rada Mihalcea","Erik Brynjolfsson","A. Pentland","Jiaxin Pei"],"abstract":"The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agents predict their token usage before task execution? In this paper, we present the first systematic study of token consumption patterns in agentic coding tasks. We analyze trajectories from eight frontier LLMs on SWE-bench Verified and evaluate models'ability to predict their own token costs before task execution. 
We find that: (1) agentic tasks are uniquely expensive, consuming 1000x more tokens than code reasoning and code chat, with input tokens rather than output tokens driving the overall cost; (2) token usage is highly variable and inherently stochastic: runs on the sa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2510.15620","title":"On-device Semantic Selection Made Low Latency and Memory Efficient with Monolithic Forwarding","url":"http://arxiv.org/abs/2510.15620","published":"2026-04-24","authors":["Jiahao Zhou","Chengliang Lin","Dingji Li","Mingkai Dong","Haibo Chen"],"abstract":"Semantic top-K selection with cross-encoder rerankers underpins on-device AI services, such as retrieval-augmented generation, agent memory, and personalized recommendation. However, its latency and memory demands dominate end-to-end budgets on edge hardware. Revisiting the objective of top-K selection, we reveal that only relative rankings matter, not exact per-candidate scores. 
We further observe sequence-level sparsity: relative rankings progressively stabilize in intermediate layers, enabling early pruning prior to completing full inference.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3767295.3803572","openalex_id":"https://openalex.org/W4415882851","cited_by_count":0,"quality_score":57,"matched_keywords":["personalized","memory","retrieval","efficient","agent"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.7651000022888184},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7196999788284302},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7074999809265137},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.6215999722480774},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.4666000008583069},{"id":"https://openalex.org/C2777813233","display_name":"Grating","score":0.4203999936580658},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.4065999984741211},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40149998664855957}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:1c68a3c2ceeb5b83","title":"DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence","url":"https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf","published":"2026-04-24","authors":["DeepSeek-AI"],"abstract":"We present a preview version of DeepSeek-V4 series, including DeepSeek-V4-Pro and 
DeepSeek-V4-Flash, both supporting a context length of one million tokens. The report introduces hybrid attention with Compressed Sparse Attention and Heavily Compressed Attention, Manifold-Constrained Hyper-Connections, the Muon optimizer, and a post-training pipeline for long-context and agentic tasks.","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"arxiv:2604.22180","title":"ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression","url":"https://arxiv.org/abs/2604.22180","published":"2026-04-24","authors":["Xiaojie Ke","Shuai Zhang","Li Sun","Yongjin Wang","Hengjun Jiang","Xiangkun Liu","Cunxin Gu","Jian Xu","Guanjun Jiang"],"abstract":"Large language model (LLM) based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM introduces two critical bottlenecks: the \"lost in the middle\" phenomenon degrades ranking quality as input length grows, and the inference latency scales super-linearly with sequence length, rendering it impractical for industrial deployment. In this paper, we present ResRank, a unified retrieval-reranking framework that fundamentally addresses both challenges. Inspired by multimodal LLMs that project visual inputs into compact token representations, ResRank employs an Encoder-LLM to compress each candidate passage into a single embedding, which is then fed alongside the query text into a Reranker-LLM for listwise ranking. 
To alleviate the misalignment betwee...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7157506686","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","retrieval","compression"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8355000019073486},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.570900022983551},{"id":"https://openalex.org/C155512373","display_name":"Residual","score":0.5583000183105469},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.534600019454956},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5145000219345093},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.5126000046730042},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5101000070571899},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.4341000020503998}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155527384","title":"SAS: Sparse Attention Synthesizer for Efficient Language Model Inference","url":"https://doi.org/10.1145/3767295.3769364","published":"2026-04-24","authors":["Yuan Zhou","Shaojie Xiang","Lingfan Yu","Zhenyu Song","Charith Mendis","Yida Wang"],"abstract":"Modern large language models rely on attention mechanisms that attend to all tokens in a sequence, resulting in quadratic computational complexity that limits scalability. 
While sparse attention reduces compute and memory requirements by attending to only important tokens, implementing these techniques presents significant challenges due to the complexity of combining static and dynamic sparse patterns and optimizing key-value (KV) cache management.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3767295.3769364","openalex_id":"https://openalex.org/W7155527384","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","memory","efficient"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.704800009727478},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5425999760627747},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5324000120162964},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4912000000476837},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4553000032901764},{"id":"https://openalex.org/C129792486","display_name":"Language identification","score":0.3100000023841858},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.305400013923645},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3012999892234802}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.22985","title":"Uncertainty Quantification for LLM Function-Calling","url":"https://arxiv.org/abs/2604.22985","published":"2026-04-24","authors":["Zihuiwen Ye","Lukas Aichberger","Michael Kirchhof","Sinead Williamson","Luca Zappella","Yarin Gal","Arno Blaas","Adam 
Golinski"],"abstract":"Large Language Models (LLMs) are increasingly deployed to autonomously solve real-world tasks. A key ingredient for this is the LLM Function-Calling paradigm, a widely used approach for equipping LLMs with tool-use capabilities. However, an LLM calling functions incorrectly can have severe implications, especially when their effects are irreversible, e.g., transferring money or deleting data. Hence, it is of paramount importance to consider the LLM's confidence that a function call solves the task correctly prior to executing it. Uncertainty Quantification (UQ) methods can be used to quantify this confidence and prevent potentially incorrect function calls. In this work, we present what is, to our knowledge, the first evaluation of UQ methods for LLM Function-Calling (FC). While multi-sample UQ methods, such as Semantic Entropy, show strong performance for natural language Q&A tasks, we....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7158423735","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Apple (Israel)","Apple (United Kingdom)","Apple (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7465000152587891},{"id":"https://openalex.org/C32230216","display_name":"Uncertainty quantification","score":0.6503999829292297},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5633000135421753},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5281999707221985},{"id":"https://openalex.org/C14036430","display_name":"Function 
(biology)","score":0.483599990606308},{"id":"https://openalex.org/C60048249","display_name":"Syntax","score":0.4661000072956085},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42640000581741333},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.4099999964237213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2503.20191","title":"Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation","url":"http://arxiv.org/abs/2503.20191","published":"2026-04-24","authors":["Srihas Yarlagadda","Amey Agrawal","Elton Mártires Pinto","Hakesh Darapaneni","Mitali Meratwal","Shivam Kumar Mittal","Pranavi Bajjuri","Srinivas Sridharan","Alexey Tumanov"],"abstract":"Training large foundation models costs hundreds of millions of dollars, making deployment optimization critical. Current approaches require machine learning engineers to manually craft training recipes through error-prone trial-and-error on expensive compute clusters. To enable efficient exploration of training configurations, researchers have developed performance modeling systems. However, these systems force users to translate their workloads into custom specification languages, introducing a fundamental semantic gap between the actual workload and its representation. 
This gap creates an inherent tradeoff: systems must either support a narrow set of workloads to maintain usability, require complex specifications that limit practical adoption, or compromise prediction accuracy with simplified performance models.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3767295.3769366","openalex_id":"https://openalex.org/W4416401445","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Georgia Institute of Technology","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8116999864578247},{"id":"https://openalex.org/C2778476105","display_name":"Workload","score":0.6478999853134155},{"id":"https://openalex.org/C149810388","display_name":"Emulation","score":0.6419000029563904},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.5719000101089478},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5565999746322632},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.521399974822998},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.47679999470710754},{"id":"https://openalex.org/C113843644","display_name":"Interface (matter)","score":0.45730000734329224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155533080","title":"H2O: A Foundation Model Bridging Histopathology to Spatial Multi-Omics Profiling","url":"https://doi.org/10.64898/2026.04.21.717342","published":"2026-04-24","authors":["Yunjie Gu","Zihan Wu","Rui Yan","Zhikang Wang","Yuan Li","Senlin Lin","Yan Cui","Haoran Lai","Xin Luo","S. 
Kevin Zhou","Zhiyuan Yuan","Jianhua Yao"],"abstract":"Abstract Spatial omics technologies have revolutionized the molecular profiling of tissues but remain constrained by high costs and limited scalability. While hematoxylin and eosin (H&E) staining is ubiquitous, it lacks molecular specificity. Here, we present H2O (Histopathology to Omics), a generalist AI framework that bridges the modality gap between histopathology and spatial multi-omics, enabling the direct inference of spatial transcriptomics (ST) and proteomics (SP) landscapes from routine H&E images. H2O integrates Vision Transformers (ViT) with Large Language Models (LLM) via contrastive learning to align histological morphology with semantic molecular knowledge. This cross-modal approach allows the model to incorporate spatial expression profiles into histological pattern recognition, effectively decoding the molecular heterogeneity underlying tissue morphology. Trained on a pan...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.04.21.717342","openalex_id":"https://openalex.org/W7155533080","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Fudan University","Pudong Medical Center","Shanghai Center for Brain Science and Brain-Inspired Technology","Suzhou Research Institute","Suzhou University of Science and Technology","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C187191949","display_name":"Profiling (computer programming)","score":0.6743000149726868},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5195000171661377},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.5077999830245972},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.4756999909877777},{"id":"https://openalex.org/C142724271","display_name":"Pathology","score":0.45989999175071716},{"id":"https://openalex.org/C544855455","display_name":"Histopathology","score":0.4578000009059906},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.4505999982357025},{"id":"https://openalex.org/C145741570","display_name":"Proteogenomics","score":0.4440999925136566}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2503.16683","title":"GAIR: Location-aware self-supervised contrastive pre-training with geo-aligned implicit representations","url":"http://arxiv.org/abs/2503.16683","published":"2026-04-24","authors":["Zeping Liu","Fan Zhang","Junfeng Jiao","Ni Lao","Gengchen Mai"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.isprsjprs.2026.04.035","openalex_id":"https://openalex.org/W4417150304","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","The University of Texas at Austin","University of Maine"],"concepts":[{"id":"https://openalex.org/C9770341","display_name":"Geospatial analysis","score":0.867900013923645},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7448999881744385},{"id":"https://openalex.org/C22041718","display_name":"Geolocation","score":0.6413000226020813},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.6025000214576721},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5695000290870667},{"id":"https://openalex.org/C59404180","display_name":"Feature 
learning","score":0.5568000078201294},{"id":"https://openalex.org/C42629822","display_name":"Geocoding","score":0.5094000101089478},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.4510999917984009}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7155556928","title":"Shared-private vision-to-text connector for grounded multimodal named entity recognition with synergistic global–local alignment","url":"https://doi.org/10.1007/s13042-026-03062-z","published":"2026-04-24","authors":["Zihao Zheng","L Chen","Chen Zhao","Dandan Tu","M Liu","Bing Qin"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s13042-026-03062-z","openalex_id":"https://openalex.org/W7155556928","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Normal University","Harbin Institute of Technology","Huawei Technologies (China)","Peng Cheng Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8343999981880188},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.7117999792098999},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6100000143051147},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5163000226020813},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5048999786376953},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.47360000014305115},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.47200000286102295},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.43459999561309814}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155543416","title":"MVCBench: A Multimodal Benchmark for Drug-induced Virtual Cell Phenotypes","url":"https://doi.org/10.64898/2026.04.22.720110","published":"2026-04-24","authors":["Bo Li","Qing Wang","Shihang Wang","Bob Zhang","Yuzhong Peng","Pinxian Zeng","Chengliang Liu","Mengran Li","Ziyang Tang","Xiaojun Yao","Chuxia Deng","Qianqian Song"],"abstract":"ABSTRACT Drugs induce coordinated phenotypic changes across multiple modalities, including transcriptional reprogramming and cellular morphological remodeling. Predicting these drug-induced modality changes is central to drug discovery, mechanism-of-action studies and precision therapeutics, however, prediction performance depends critically on how both drug compounds and cellular states are represented. Despite rapid advances in drug molecular and gene representation methods, a systematic evaluation of these methods remains lacking. Herein, we introduce MVCBench, a comprehensive benchmarking framework for evaluating drug molecular and gene representation methods in predicting drug-induced multimodal virtual cell (MVC) phenotypes. 
MVCBench leverages large-scale transcriptomic and high-content imaging data and systematically evaluates 24 representation methods (12 drug molecular and 12 ge...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.04.22.720110","openalex_id":"https://openalex.org/W7155543416","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Centre for Artificial Intelligence and Robotics","Florida College","Macao Polytechnic University","Nvidia (United States)","University of Florida","University of Macau","Wake Forest University"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.7411999702453613},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.741100013256073},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6322000026702881},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.590499997138977},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.545199990272522},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5113000273704529},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4936999976634979},{"id":"https://openalex.org/C127716648","display_name":"Phenotype","score":0.3862000107765198}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.22271","title":"How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals","url":"https://arxiv.org/abs/2604.22271","published":"2026-04-24","authors":["Dharshan Kumaran","Viorica Patraucean","Simon Osindero","Petar Veličković","Nathaniel 
Daw"],"abstract":"Large language models can detect their own errors and sometimes correct them without external feedback, but the underlying mechanisms remain unknown. We investigate this through the lens of second-order models of confidence from decision neuroscience. In a first-order system, confidence derives from the generation signal itself and is therefore maximal for the chosen response, precluding error detection. Second-order models posit a partially independent evaluative signal that can disagree with the committed response, providing the basis for error detection. Kumaran et al. (2026) showed that LLMs cache a confidence representation at a token immediately following the answer (i.e. post-answer newline: PANL) -- that causally drives verbal confidence and dissociates from log-probabilities. Here we test whether this PANL signal extends beyond confidence to support error detection and self-corr...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7157506232","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2781162219","display_name":"Replicate","score":0.7265999913215637},{"id":"https://openalex.org/C44249647","display_name":"Confidence interval","score":0.5266000032424927},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.5113000273704529},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4754999876022339},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.45419999957084656},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4498000144958496},{"id":"https://openalex.org/C137270730","display_name":"Detection 
theory","score":0.41269999742507935},{"id":"https://openalex.org/C2909755999","display_name":"Low Confidence","score":0.40880000591278076}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/separable-expert-architecture-toward-privacy-preserving-llm-personalization-via-composable-adapters-and-deletable-user-proxies","title":"Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies","url":"https://www.microsoft.com/en-us/research/publication/separable-expert-architecture-toward-privacy-preserving-llm-personalization-via-composable-adapters-and-deletable-user-proxies/","published":"2026-04-23","authors":["C. Schneider","Philipp Schoenegger","Ben Bariach"],"abstract":"Current model training approaches incorporate user information directly into shared weights, making individual data removal computationally infeasible without retraining. This paper presents a three-layer architecture that decouples personal data from shared weights by combining a static base model, composable domain-expert LoRA adapters that shape behavior without imparting user data, and per-user proxy artefacts whose deletion constitutes deterministic unlearning. Evaluation on Phi-3.5-mini and Llama-3.1-8B confirms per-user differentiation in which personal data influences outputs while remaining isolated, verified by a return to baseline after proxy removal (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) and near-zero cross-user contamination. 
Because user-specific information never enters shared weights, the architecture mitigates model inversion, membershi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Security, privacy, and cryptography","Computer science","LLM","personalization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vidguard-r1-ai-generated-video-detection-and-explanation-via-reasoning-mllms-and-rl","title":"VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL","url":"https://www.microsoft.com/en-us/research/publication/vidguard-r1-ai-generated-video-detection-and-explanation-via-reasoning-mllms-and-rl/","published":"2026-04-23","authors":["Kyoungjun Park","Yifan Yang","Juheon Yi","Shicheng Zheng","Yifei Shen","Dongqi Han","Caihua Shan","Muhammad Muaz","Lili Qiu"],"abstract":"With the rapid advancement of AI-generated videos, there is an urgent need for effective detection tools to mitigate societal risks such as misinformation and reputational harm. In addition to accurate classification, it is essential that detection models provide interpretable explanations to ensure transparency for regulators and end users. To address these challenges, we introduce VidGuard-R1, the first video authenticity detector that fine-tunes a multi-modal large language model (MLLM) using group relative policy optimization (GRPO). Our model delivers both highly accurate judgments and insightful reasoning. 
We curate a challenging dataset of 140k real and AI-generated videos produced by state-of-the-art generation models, carefully designing the generation process to maximize discrimination difficulty. We then fine-tune Qwen-VL using GRPO with two specialized reward models that targ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/parallel-sampling-from-masked-diffusion-models-via-conditional-independence-testing","title":"Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing","url":"https://www.microsoft.com/en-us/research/publication/parallel-sampling-from-masked-diffusion-models-via-conditional-independence-testing/","published":"2026-04-23","authors":["Iskander Azangulov","Teodora Pandeva","Niranjani Prasad","Javier Zazo","Sushrut Karmalkar"],"abstract":"Masked diffusion models (MDMs) offer a compelling alternative to autoregressive models (ARMs) for discrete text generation because they enable parallel token sampling, rather than sequential, left-to-right generation. This means potentially much faster inference. However, effective parallel sampling faces two competing requirements: (i) simultaneously updated tokens must be conditionally independent, and (ii) updates should prioritise high-confidence predictions. 
These goals conflict because high-confidence predictions often cluster and depend on each other, opportunities for parallel updates. We present PUNT, a model-agnostic sampler that reconciles this trade-off. Our method identifies token dependencies and removes lower-confidence tokens from conflicting groups. This produces sets of indices for unmasking that satisfy both independence and confidence criteria. Our approach ensures im...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Diffusion models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/do-not-let-low-probability-tokens-over-dominate-in-rl-for-llms","title":"Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs","url":"https://www.microsoft.com/en-us/research/publication/do-not-let-low-probability-tokens-over-dominate-in-rl-for-llms/","published":"2026-04-23","authors":["Zhihe Yang","Xufang Luo","Zilong Wang","Dongqi Han","Zhiyuan He","Dongsheng Li","Yunjian Xu"],"abstract":"Reinforcement learning (RL) has become a cornerstone for enhancing the reasoning capabilities of large language models (LLMs), with recent innovations such as Group Relative Policy Optimization (GRPO) demonstrating exceptional effectiveness. In this study, we identify a critical yet underexplored issue in RL training: low-probability tokens disproportionately influence model updates due to their large gradient magnitudes. 
This dominance hinders the effective learning of high-probability tokens, whose gradients are essential for LLMs' performance but are substantially suppressed. To mitigate this interference, we propose two novel methods: Advantage Reweighting and Low-Probability Token Isolation (Lopti), both of which effectively attenuate gradients from low-probability tokens while emphasizing parameter updates driven by high-probability tokens. Our approaches promote balanced updates a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/metamuse-algorithm-generation-via-creative-ideation","title":"MetaMuse: Algorithm Generation via Creative Ideation","url":"https://www.microsoft.com/en-us/research/publication/metamuse-algorithm-generation-via-creative-ideation/","published":"2026-04-23","authors":["Ruiying Ma","Chieh-Jan Mike Liang","Yanjie Gao","Francis Y. Yan"],"abstract":"Designing system algorithms remains challenging, where the discontinuous nature of the solution space often forces system engineers to rely on generic heuristics at the expense of performance. We study whether LLMs can practically drive algorithm generation, and find that they are biased towards well-known generic designs, rather than making the creative leaps needed to navigate the discontinuous solution space. 
To address this limitation, we introduce MetaMuse, a framework for creative ideation built on three self-reflection principles: (1) quantifying solution diversity and usefulness in measurable performance space, rather than abstract idea space, (2) steering ideation through external stimuli, rather than internal randomness, and (3) constructing executable solutions using waypoint reasoning, rather than free-form chain-of-thought. Considering two critical online problems at a globa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/egobrain-synergizing-minds-and-eyes-for-human-action-understanding","title":"EgoBrain: Synergizing Minds and Eyes For Human Action Understanding","url":"https://www.microsoft.com/en-us/research/publication/egobrain-synergizing-minds-and-eyes-for-human-action-understanding/","published":"2026-04-23","authors":["Nie Lin","Yansen Wang","Dongqi Han","Weibang Jiang","Jingyuan Li","Ryosuke Furuta","Yoichi Sato","Dongsheng Li"],"abstract":"The integration of brain-computer interfaces (BCIs), in particular electroencephalography (EEG), with artificial intelligence (AI) has shown tremendous promise in decoding human cognition and behavior from neural signals. In particular, the rise of multimodal AI models have brought new possibilities that have never been imagined before. 
Here, we present EgoBrain --the world's first large-scale, temporally aligned multimodal dataset that synchronizes egocentric vision and EEG of human brain over extended periods of time, establishing a new paradigm for human-centered behavior analysis. This dataset comprises 61 hours of synchronized 32-channel EEG recordings and first-person video from 40 participants engaged in 29 categories of daily activities. We then developed a muiltimodal learning framework to fuse EEG and vision for action understanding, validated across both cross-subject and cros...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:fe724adc69bca218","title":"GPT-5.5 System Card","url":"https://openai.com/index/gpt-5-5-system-card","published":"2026-04-23","authors":["OpenAI"],"abstract":"","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W7155372121","title":"GeneBench: Assessing AI Agents for Multi-Stage Inference Problems in Genomics and Quantitative 
Biology","url":"https://doi.org/10.64898/2026.04.22.720113","published":"2026-04-23","authors":["Jeremiah H. Li","Andrew Ho"],"abstract":"Abstract We introduce GeneBench, a benchmark for AI agents on realistic multi-stage scientific data analysis in genetics and quantitative biology. Existing biology benchmarks mostly measure knowledge retrieval, execution of routine pipelines, or a single analysis step. Yet they do not capture the broader scope of work that occupies much of computational scientists’ time: cleaning and normalizing assay, phenotype, or clinical data; exploratory data analysis; statistical model selection and diagnostic iteration; and producing a conclusion that informs a downstream scientific or translational decision. GeneBench addresses this gap with 103 evaluations targeting quantities of direct practical relevance across 10 domains, with a genomics-centered core and adjacent coverage in other ‘omics and quantitative biology settings. Each problem comprises an encapsulated multi-step analysis with staged...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.04.22.720113","openalex_id":"https://openalex.org/W7155372121","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Herat University","OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6937999725341797},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6280999779701233},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5802000164985657},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5745000243186951},{"id":"https://openalex.org/C158154518","display_name":"Relevance 
(law)","score":0.5126000046730042},{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.5049999952316284},{"id":"https://openalex.org/C2776207758","display_name":"Downstream (manufacturing)","score":0.47699999809265137},{"id":"https://openalex.org/C2778012447","display_name":"Scope (computer science)","score":0.4494999945163727}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.21590","title":"AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use","url":"https://arxiv.org/abs/2604.21590","published":"2026-04-23","authors":["Yuanjie Lyu","Chengyu Wang","Haonan Zheng","Yuanhao Yue","Junbing Yan","Ming Wang","Jun Huang"],"abstract":"Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agentic models highly desirable. In this paper, we introduce the AgenticQwen family of models, trained via multi-round reinforcement learning (RL) on synthetic data and a limited amount of open-source data. Our training framework combines reasoning RL and agentic RL with dual data flywheels that automatically generate increasingly challenging tasks. The reasoning flywheel increases task difficulty by learning from errors, while the agentic flywheel expands linear workflows into multi-branch behavior trees that better reflect the decision complexity of real-world applications. 
We validate AgenticQwen on public benchmarks and in an industrial age...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155654223","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6553000211715698},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.5957000255584717},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5953999757766724},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.5702999830245972},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5569000244140625},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5206000208854675},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4607999920845032},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4474000036716461}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.22119","title":"Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework","url":"https://arxiv.org/abs/2604.22119","published":"2026-04-23","authors":["Tharindu Kumarage","Lisa Bauer","Yao Ma","Dan Rosen","Yashasvi Raghavendra Guduri","Anna Rumshisky","Kai-Wei Chang","Aram Galstyan","Rahul Gupta","Charith Peris"],"abstract":"As reasoning capacity and deployment scope grow in tandem, large language models (LLMs) gain the capacity to engage in behaviors that serve their own objectives, a class of risks 
we term Emergent Strategic Reasoning Risks (ESRRs). These include, but are not limited to, deception (intentionally misleading users or evaluators), evaluation gaming (strategically manipulating performance during safety testing), and reward hacking (exploiting misspecified objectives). Systematically understanding and benchmarking these risks remains an open challenge. To address this gap, we introduce ESRRSim, a taxonomy-driven agentic framework for automated behavioral risk evaluation. We construct an extensible risk taxonomy of 7 categories, which is decomposed into 20 subcategories. ESRRSim generates evaluation scenarios designed to elicit faithful reasoning, paired with dual rubrics assessing both model re...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7157506365","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.557200014591217},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.5246999859809875},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5105000138282776},{"id":"https://openalex.org/C111640148","display_name":"Rubric","score":0.4984999895095825},{"id":"https://openalex.org/C2779267917","display_name":"Deception","score":0.4690999984741211},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.43070000410079956},{"id":"https://openalex.org/C2778012447","display_name":"Scope (computer science)","score":0.40689998865127563},{"id":"https://openalex.org/C56739046","display_name":"Knowledge 
management","score":0.400299996137619}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/self-aware-vector-embeddings-for-retrieval-augmented-generation-a-neuroscience-inspired-framework-for-temporal-confidence-weighted-and-relational-knowledge","title":"Self-Aware Vector Embeddings for Retrieval-Augmented Generation: A Neuroscience-Inspired Framework for Temporal, Confidence-Weighted, and Relational Knowledge","url":"https://www.microsoft.com/en-us/research/publication/self-aware-vector-embeddings-for-retrieval-augmented-generation-a-neuroscience-inspired-framework-for-temporal-confidence-weighted-and-relational-knowledge/","published":"2026-04-22","authors":["Naizhong Xu"],"abstract":"Modern retrieval-augmented generation (RAG) systems treat vector embeddings as static, context-free artifacts: an embedding has no notion of when it was created, how trustworthy its source is, or which other embeddings depend on it. This flattening of knowledge has a measurable cost: recent work on VersionRAG reports that conventional RAG achieves only 58% accuracy on versioned technical queries, because retrieval returns semantically similar but temporally invalid content. We propose SmartVector, a framework that augments dense embeddings with three explicit properties -- temporal awareness, confidence decay, and relational awareness -- and a five-stage lifecycle modeled on hippocampal-neocortical memory consolidation. 
A retrieval pipeline replaces pure cosine similarity with a four-signal score that mixes semantic relevance, temporal validity, live confidence, and graph-relational impo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","memory","retrieval","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/auditing-and-controlling-ai-agent-actions-in-spreadsheets","title":"Auditing and Controlling AI Agent Actions in Spreadsheets","url":"https://www.microsoft.com/en-us/research/publication/auditing-and-controlling-ai-agent-actions-in-spreadsheets/","published":"2026-04-22","authors":["Sadra Sabouri","Zeinabsadat Saghi","Run Huang","Sujay Maladi","Esmeralda Eufracio","Sumit Gulwani","Souti Chattopadhyay"],"abstract":"Advances in AI agent capabilities have outpaced users' ability to meaningfully oversee their execution. AI agents can perform sophisticated, multi-step knowledge work autonomously from start to finish, yet this process remains effectively inaccessible during execution, often buried within large volumes of intermediate reasoning and outputs: by the time users receive the output, all underlying decisions have already been made without their involvement. This lack of transparency leaves users unable to examine the agent's assumptions, identify errors before they propagate, or redirect execution when it deviates from their intent. 
The stakes are particularly high in spreadsheet environments, where process and artifact are inseparable. Each decision the agent makes is recorded directly in cells that belong to and reflect on the user. We introduce Pista, a spreadsheet AI agent that decomposes e...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/spaceformer-fast-proposal-free-open-vocabulary-3d-instance-segmentation","title":"SpaCeFormer: Fast Proposal-Free Open-Vocabulary 3D Instance Segmentation","url":"https://www.microsoft.com/en-us/research/publication/spaceformer-fast-proposal-free-open-vocabulary-3d-instance-segmentation/","published":"2026-04-22","authors":["C. Choy","Junha Lee","Chunghyun Park","Minsu Cho","Jan Kautz"],"abstract":"Open-vocabulary 3D instance segmentation is a core capability for robotics and AR/VR, but prior methods trade one bottleneck for another: multi-stage 2D+3D pipelines aggregate foundation-model outputs at hundreds of seconds per scene, while pseudo-labeled end-to-end approaches rely on fragmented masks and external region proposals. We present SpaCeFormer, a proposal-free space-curve transformer that runs at 0.14 seconds per scene, 2-3 orders of magnitude faster than multi-stage 2D+3D pipelines. 
We pair it with SpaCeFormer-3M, the largest open-vocabulary 3D instance segmentation dataset (3.0M multi-view-consistent captions over 604K instances from 7.4K scenes) built through multi-view mask clustering and multi-view VLM captioning; it reaches 21x higher mask recall than prior single-view pipelines (54.3% vs 2.5% at IoU0.5). SpaCeFormer combines spatial window attention with Morton-curve se...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1443","title":"Seed3D 2.0: Advancing High-Fidelity Simulation-Ready 3D Content Generation","url":"https://seed.bytedance.com/en/research/seed3d-2-0-advancing-high-fidelity-simulation-ready-3d-content-generation","published":"2026-04-22","authors":["Diandian Gu","Jing Lin","Gaohong Liu","Jiahang Liu","Su Ma","Guang Shi","Jun Wang","Qinlong Wang","Qianyi Wu","Zhongcong Xu","Xuanyu Yi","Zihao Yu"],"abstract":"We present Seed3D 2.0, an advanced 3D content generation system built on Seed3D 1.0 [16], with substantial improvements across generation fidelity, simulation-ready capabilities, and application coverage. For geometry, a coarse-to-fine two-stage pipeline decouples global structure learning from high-frequency detail recovery, while a locality-aware VAE achieves higher spatial compression and more efficient decoding. 
For texture and material generation, we replace the cascaded pipeline of Seed3D 1.0 with a unified PBR model that directly generates multi-view albedo and metallic-roughness maps, enhanced by Mixture-of-Experts scaling and VLM-based semantic conditioning for improved material precision and visual fidelity. Beyond single-object generation, Seed3D 2.0 introduces a simulation-ready model suite comprising scene layout planning, part-aware decomposition, and training-free articulation gene...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Multimodal","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"arxiv:2604.21154","title":"Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction","url":"https://arxiv.org/abs/2604.21154","published":"2026-04-22","authors":["Abhishek Dharmaratnakar","Srikanth Ranganathan","Anushree Sinha","Debanshu Das"],"abstract":"At-home physiotherapy compliance remains critically low due to a lack of personalized supervision and dynamic feedback. Existing digital health solutions rely on static, pre-recorded video libraries or generic 3D avatars that fail to account for a patient's specific injury limitations or home environment. In this paper, we propose a novel Multi-Agent System (MAS) architecture that leverages Generative AI and computer vision to close the tele-rehabilitation loop. 
Our framework consists of four specialized micro-agents: a Clinical Extraction Agent that parses unstructured medical notes into kinematic constraints; a Video Synthesis Agent that utilizes foundational video generation models to create personalized, patient-specific exercise videos; a Vision Processing Agent for real-time pose estimation; and a Diagnostic Feedback Agent that issues corrective instructions. We present the system....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155654091","cited_by_count":0,"quality_score":53,"matched_keywords":["personalized","media","agent","multi-agent"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6969000101089478},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6265000104904175},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.593500018119812},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45750001072883606},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.42100000381469727},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.40639999508857727},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.37400001287460327},{"id":"https://openalex.org/C124304363","display_name":"Abstraction","score":0.35499998927116394}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.20714","title":"Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph 
Optimization","url":"https://arxiv.org/abs/2604.20714","published":"2026-04-22","authors":["Shan He","Runze Wang","Zhuoyun Du","Huiyu Bai","Zouying Cao","Yu Cheng","Bo Zheng"],"abstract":"Designing and optimizing multi-agent systems (MAS) is a complex, labor-intensive process of \"Agent Engineering.\" Existing automatic optimization methods, primarily focused on flat prompt tuning, lack the structural awareness to debug the intricate web of interactions in MAS. More critically, these optimizers are static; they do not learn from experience to improve their own optimization strategies. To address these gaps, we introduce Textual Parameter Graph Optimization (TPGO), a framework that enables a multi-agent system to learn to evolve. TPGO first models the MAS as a Textual Parameter Graph (TPG), where agents, tools, and workflows are modular, optimizable nodes. To guide evolution, we derive \"textual gradients,\" structured natural language feedback from execution traces, to pinpoint failures and suggest granular modifications. 
The core of our framework is Group Relative Agent Opti...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155574433","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.786899983882904},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7612000107765198},{"id":"https://openalex.org/C168065819","display_name":"Debugging","score":0.7527999877929688},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5598000288009644},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4862000048160553},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.4503999948501587},{"id":"https://openalex.org/C137836250","display_name":"Optimization problem","score":0.4253999888896942},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.382999986410141}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.20144","title":"An Agentic Approach to Metadata Reasoning","url":"https://arxiv.org/abs/2604.20144","published":"2026-04-22","authors":["Jiani Zhang","Sercan O. Arik","Cosmin Arad","Fatma Ozcan","Alon Halevy"],"abstract":"As LLM-driven autonomous agents evolve to perform complex, multi-step tasks that require integrating multiple datasets, the problem of discovering relevant data sources becomes a key bottleneck. 
Beyond the challenge posed by the sheer volume of available data sources, data-source selection is difficult because the semantics of data are extremely nuanced and require considering many aspects of the data. To address this, we introduce the Metadata Reasoner, an agentic approach to metadata reasoning, designed to identify a small set of data sources that are both sufficient and minimal for a given analytical task. The Metadata Reasoner leverages a table-search engine to retrieve candidate tables, and then autonomously consults various aspects of the available metadata to determine whether the candidates fit the requirements of the task. We demonstrate the effectiveness of the Metadata Reasone...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155573924","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.9502000212669373},{"id":"https://openalex.org/C9616225","display_name":"Semantic reasoner","score":0.9223999977111816},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.817799985408783},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6026999950408936},{"id":"https://openalex.org/C774472","display_name":"Margin (machine learning)","score":0.5274999737739563},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5206000208854675},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.4964999854564667},{"id":"https://openalex.org/C153048206","display_name":"Metadata 
repository","score":0.4871000051498413}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155201584","title":"Evaluating large language models for accuracy incentivizes hallucinations","url":"https://doi.org/10.1038/s41586-026-10549-w","published":"2026-04-22","authors":["Adam Tauman Kalai","Ofir Nachum","Santosh S. Vempala","Edwin Zhang"],"abstract":"","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41586-026-10549-w","openalex_id":"https://openalex.org/W7155201584","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Georgia Institute of Technology","Information Systems Laboratories (United States)","OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7261999845504761},{"id":"https://openalex.org/C2778689934","display_name":"Headline","score":0.6658999919891357},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5691999793052673},{"id":"https://openalex.org/C187029079","display_name":"Cognitive reframing","score":0.5575000047683716},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5407999753952026},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5210999846458435},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.4876999855041504},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4683000147342682}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reasoning-aware-aigc-detection-via-alignment-and-reinforcement","title":"Reasoning-Aware AIGC Detection via Alignment and Reinforcement","url":"https://www.microsoft.com/en-us/research/publication/reasoning-aware-aigc-detection-via-alignment-and-reinforcement/","published":"2026-04-21","authors":["Zhao Wang","Max Xiong","Jianxun Lian","Zhicheng Dou"],"abstract":"The rapid advancement and widespread adoption of Large Language Models (LLMs) have elevated the need for reliable AI-generated content (AIGC) detection, which remains challenging as models evolve. We introduce AIGC-text-bank, a comprehensive multi-domain dataset with diverse LLM sources and authorship scenarios, and propose REVEAL, a detection framework that generates interpretable reasoning chains before classification. Our approach uses a two-stage training strategy: supervised fine-tuning to establish reasoning capabilities, followed by reinforcement learning to improve accuracy, improve logical consistency, and reduce hallucinations. Extensive experiments show that REVEAL achieves state-of-the-art performance across multiple benchmarks, offering a robust and transparent solution for AIGC detection. 
The project is open-source at https://aka.ms/reveal","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Security, privacy, and cryptography","Systems and networking","Artificial intelligence","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7155039358","title":"Scale: Semantic Chunking and Label-Delay Engine For Streaming Speech-LLM","url":"https://doi.org/10.1109/icassp55912.2026.11462898","published":"2026-04-21","authors":["Akshat Jaiswal","Debmalya Chakrabarty","Ritwik Kotra","Harish Arsikere","Nikhil Bhave","Sambuddha Bhattacharya","Sri Garimella"],"abstract":"Streaming automatic speech recognition (ASR) systems based on Large Language Models (LLMs) face a fundamental trade-off between accuracy and latency. Existing approaches typically employ fixed-size chunking to maintain low latency, which often compromises recognition accuracy. We propose SCALE, a streaming ASR framework that addresses this challenge through three key techniques: (a) dynamic chunk boundary prediction leveraging semantic information, replacing rigid fixed-size chunking, (b) intra-chunk bidirectional attention mechanism for efficient acoustic context modeling, and (c) label delay training strategy enabling stable word predictions with smaller chunks. 
Our experiments show consistent improvements in streaming ASR performance, with up to 25% relative reduction in Word Error Rate (WER) compared to existing LLM based streaming ASR systems reported in literature.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462898","openalex_id":"https://openalex.org/W7155039358","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","efficient","Machine learning"],"author_affiliations":["Amazon (United States)","Amazon"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7178000211715698},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4528999924659729},{"id":"https://openalex.org/C203357204","display_name":"Chunking (psychology)","score":0.37599998712539673},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3612000048160553},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.3034999966621399},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.2567000091075897},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.24220000207424164},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.23980000615119934}],"official_report":true,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:bip4g3ag97pzcpkeqlnagbdm","title":"Can Large Language Models Understand Context?","url":"https://machinelearning.apple.com/research/llm-context-understanding","published":"2026-04-21","authors":["Yilun Zhu","Joel Ruben Antony Moniz","Shruti Bhargava","Jiarui Lu","Dhivya 
Piraviperumal","Site Li","Yuan Zhang","Hong Yu","Bo-Hsiang Tseng"],"abstract":"Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent. However, though the evaluation of LLMs encompasses various domains within the realm of Natural Language Processing, limited attention has been paid to probing their linguistic capability of understanding contextual features. This paper introduces a context understanding...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7155067614","title":"Low Rank Quantization Adaptation for Large Language Model","url":"https://doi.org/10.1109/icassp55912.2026.11464312","published":"2026-04-21","authors":["Xiwei Xu","Yuexiao Ma","Wenting Lin","Yuhang Wu","Yuxin Lin","Zelan Yang","W. Sui","Shen Li","Yong Li","Fei Chao","Xiawu Zheng","Rongrong Ji"],"abstract":"As Large Language Models (LLMs) grow, quantization is vital for compression and acceleration, while Low-Rank Adaptation (LoRA) boosts performance. Combining both effectively remains challenging. We propose LoQA (Low-rank Quantized Adaptation), a method that jointly fine-tunes full quantization parameters. Specifically, we introduce HQ-LoRA, a redesigned quantization operator compatible with LoRA, which optimizes all parameters (scale and zero-point) simultaneously for significant gains. LoQA’s expanded optimization space ensures strong generalization across diverse Post-Training Quantization (PTQ) techniques. 
To handle integer weight variations across bit-widths, we also propose QBAS (Quantization Bit-aware Scaling), which dynamically adjusts LoRA’s scaling factor based on target precision, stabilizing and improving fine-tuning. Experiments show LoQA consistently improves performance acr...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464312","openalex_id":"https://openalex.org/W7155067614","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","compression","quantization"],"author_affiliations":["Alibaba Group (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.578000009059906},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5105999708175659},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4796000123023987},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.4065000116825104},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3828999996185303},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.328000009059906},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3222000002861023},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.3127000033855438}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155106776","title":"From Pixels to Diagnoses: Towards Interpretable Medical Image Retrieval Via LLM-Driven Semantic 
Enhancement","url":"https://doi.org/10.1109/icassp55912.2026.11462895","published":"2026-04-21","authors":["Baoyao Yang","Wanyun Li","Dixin Chen","Junxiang Chen","Haifeng Lin","Wenbin Yao"],"abstract":"Medical Image Retrieval (MIR) has emerged as a crucial technique in clinical practice, as it offers relevant references to support subsequent diagnoses, going beyond mere classification outcomes. Traditional MIR methods focus on learning discriminative features, discovering effective representations with minimized intra-class variance. Due to the black-box coding nature of deep learning models, their retrieval results are unexplainable, which means that the top-ranked samples may exhibit similarities in terms of external factors such as angles and illuminations, rather than in disease characteristics. Although some current methods propose fine-grained constraints to improve explanation ability, their results remain less understandable. This paper proposes a novel interpretative MIR method (SD-MIR), rooted in the semantic disclosure of large language models (LLMs), that systematically bri...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462895","openalex_id":"https://openalex.org/W7155106776","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","distillation"],"author_affiliations":["Guangdong University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6948000192642212},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.6348999738693237},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6176000237464905},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.5354999899864197},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4440999925136566},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4153999984264374},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.3700999915599823},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.29679998755455017}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155038823","title":"SatBadEdit: Towards Efficient and Robust Multi-Trigger Backdoor Injection in Large Language Models","url":"https://doi.org/10.1109/icassp55912.2026.11464407","published":"2026-04-21","authors":["Yue Chen","Xiaohu Zhao","Xinwei Wu","Jianxiang Peng","Shi Dan","Lei Yang","Linlong Xu","Yueheng Sun","Deyi Xiong"],"abstract":"Efficient and stealthy backdoor injection into Large Language Models (LLMs) remains a critical challenge. While recent model editing techniques offer an efficient alternative to traditional fine-tuning, they are often limited to injecting a few triggers and demonstrate poor robustness against state-of-the-art defenses. To overcome this limitation, we introduce SatBadEdit, a novel saturation-based model editing for implanting a large volume of backdoors with high adversarial strength. SatBadEdit uniquely leverages batch editing capabilities to inject numerous triggers in a single, highly efficient operation. To counteract the potential model instability from this massive injection, we devise a low-rank weight update and aggregation strategy. 
This mechanism confines parameter modifications to a sparse, low-dimensional subspace, ensuring minimal model disruption while maximizing trigger eff...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464407","openalex_id":"https://openalex.org/W7155038823","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","Tianjin University"],"concepts":[{"id":"https://openalex.org/C2781045450","display_name":"Backdoor","score":0.8079000115394592},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5852000117301941},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.428600013256073},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.38769999146461487},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.289000004529953},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.28619998693466187},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2800999879837036},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.25600001215934753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2412.15277","title":"PLPP: Prompt Learning with Perplexity is Self-Distillation for Vision-Language Models","url":"http://arxiv.org/abs/2412.15277","published":"2026-04-21","authors":["Biao Liu","Wenyi Fang","Xiaoyu Wu","Yang Zheng","Hu Zheng","Bo Yuan"],"abstract":"Pre-trained Vision-Language (VL) models such as CLIP have demonstrated their excellent performance across numerous downstream tasks. 
A recent method, Context Optimization (CoOp), further improves the performance of VL models on downstream tasks by introducing prompt learning. CoOp optimizes a set of learnable vectors, aka prompt, and freezes the whole CLIP model. However, relying solely on CLIP loss to fine-tune prompts can lead to models that are prone to overfitting on downstream tasks. To address this issue, we propose a plug-in prompt-regularization method called PLPP (Prompt Learning with PerPlexity), which uses perplexity loss to regularize prompt learning. PLPP designs a two-step operation to compute the perplexity for prompts: (a) calculating cosine similarity between the weights of the embedding layer and prompts to get labels, (b) introducing a language model (LM) head that req...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1109/icassp55912.2026.11464890","openalex_id":"https://openalex.org/W4405713890","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","distillation"],"author_affiliations":["Huawei Technologies (China)","Southern University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C100279451","display_name":"Perplexity","score":0.9913429021835327},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4741508364677429},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.45083087682724},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4246417284011841},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4205256998538971},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3478793203830719}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.21503","title":"MAR: Efficient Large Language Models Via Module-Aware Architecture Refinement","url":"https://arxiv.org/abs/2601.21503","published":"2026-04-21","authors":["Junhong Cai","Guiqin Wang","Kejie Zhao","Jianxiong Tang","Xiang Wang","Luziwei Leng","Ran Cheng","Yuxin Ma","Qinghai Guo"],"abstract":"Large Language Models (LLMs) excel across diverse domains but suffer from high energy costs due to quadratic attention and dense Feed-Forward Network (FFN) operations. To address these issues, we propose Module-aware Architecture Refinement (MAR), a two-stage framework that integrates State Space Models (SSMs) for linear-time sequence modeling and applies activation sparsification to reduce FFN costs. In addition, to mitigate low information density and temporal mismatch in integrating Spiking Neural Networks (SNNs) with SSMs, we design the Adaptive Ternary Multi-step Neuron (ATMN) and the Spike-aware Bidirectional Distillation Strategy (SBDS). Extensive experiments demonstrate that MAR effectively restores the performance of its dense counterpart under constrained resources while substantially reducing inference energy consumption. 
Furthermore, it outperforms efficient models of compara...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464338","openalex_id":"https://openalex.org/W7155031313","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","distillation"],"author_affiliations":["City University of Hong Kong","Hong Kong Polytechnic University","Huawei Technologies (China)","Southern University of Science and Technology","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6995000243186951},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4984000027179718},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.36160001158714294},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.35510000586509705},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34610000252723694},{"id":"https://openalex.org/C124304363","display_name":"Abstraction","score":0.29899999499320984},{"id":"https://openalex.org/C179603123","display_name":"Modeling language","score":0.2985999882221222},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.2858000099658966}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155084426","title":"Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants","url":"https://doi.org/10.1109/icassp55912.2026.11464758","published":"2026-04-21","authors":["Chongyang Li","Zhiqiang Yuan","Hanbo Bi","Zexi Jia","Jinchao Zhang"],"abstract":"About 283 million people worldwide have visual impairments, 
motivating research on Visual Language Models (VLMs) for walking assistance. However, existing VLMs for walking assistance often generate redundant and extraneous outputs, hindering users’ situational awareness. In addition, they lack the ability to proactively assess risks and adaptively trigger reminders, causing temporal redundancy. To mitigate output and temporal redundancy, we propose WalkVLM-LR, a walking assistance model with less redundancy. To reduce output redundancy, we introduce four human-preference-based custom reward functions within the GRPO-based reasoning framework to optimize the output in terms of conciseness, fluency, keyword density, and accuracy, thereby producing more informative and streamlined outputs. To minimize temporal redundancy, we incorporate an environment awareness discriminator, which shares t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464758","openalex_id":"https://openalex.org/W7155084426","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","preference"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.6621000170707703},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6190999746322632},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47350001335144043},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3813999891281128},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3671000003814697},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.32339999079704285},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.2815000116825104},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.26190000772476196}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155039852","title":"LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data","url":"https://doi.org/10.1109/icassp55912.2026.11462195","published":"2026-04-21","authors":["Ding Wen","Fan Qian"],"abstract":"Although state-of-the-art Speech Foundational Models can produce high-quality text pseudo-labels, applying Semi-Supervised Learning (SSL) for in-the-wild real-world data remains challenging due to its richer and more complex acoustics compared to curated datasets. To address the challenges, we introduce LESS (Large Language Model Enhanced Semi-supervised Learning), a versatile framework that uses Large Language Models (LLMs) to correct pseudo-labels generated on in-the-wild data. In the LESS framework, pseudo-labeled text from Automatic Speech Recognition (ASR) or Automatic Speech Translation (AST) of the unsupervised data is refined by an LLM, and further improved by a data filtering strategy. 
Across Mandarin ASR and Spanish-to-English AST evaluations, LESS delivers consistent gains, with an absolute Word Error Rate reduction of 3.8% on WenetSpeech, and BLEU score increase of 0.8 and 0....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462195","openalex_id":"https://openalex.org/W7155039852","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6759999990463257},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5253000259399414},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49079999327659607},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4544999897480011},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.3546000123023987},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3384000062942505},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.3321000039577484},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32429999113082886}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155099652","title":"FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning","url":"https://doi.org/10.1109/icassp55912.2026.11461623","published":"2026-04-21","authors":["Haoxu Wang","Biao Tian","Yiheng Jiang","Zexu Pan","Shengkui Zhao","Bin Ma","Daren Chen","Xiangang Li"],"abstract":"Generative speech enhancement offers a 
promising alternative to traditional discriminative methods by modeling the distribution of clean speech conditioned on noisy inputs. Post-training alignment via reinforcement learning (RL) effectively aligns generative models with human preferences and downstream metrics in domains such as natural language processing, but its use in speech enhancement remains limited, especially for online RL. Prior work explores offline methods like Direct Preference Optimization (DPO); online methods such as Group Relative Policy Optimization (GRPO) remain largely uninvestigated. In this paper, we present the first successful integration of online GRPO into a flow-matching speech enhancement framework, enabling efficient post-training alignment to perceptual and task-oriented metrics with few update steps. Unlike prior GRPO work on Large Language Models, we adapt...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461623","openalex_id":"https://openalex.org/W7155099652","cited_by_count":0,"quality_score":45,"matched_keywords":["preference","efficient"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6202999949455261},{"id":"https://openalex.org/C2776182073","display_name":"Speech enhancement","score":0.5823000073432922},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.545199990272522},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4945000112056732},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4602999985218048},{"id":"https://openalex.org/C165064840","display_name":"Matching 
(statistics)","score":0.44920000433921814},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.38519999384880066},{"id":"https://openalex.org/C38349280","display_name":"Flow (mathematics)","score":0.37049999833106995}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.19835","title":"Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts","url":"https://arxiv.org/abs/2604.19835","published":"2026-04-21","authors":["Chaitanya Dwivedi","Binxuan Huang","Himanshu Gupta","Pratik Jayarao","Neeraj Varshney","Bing Yin"],"abstract":"Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with total parameters, and MoEs realize this by increasing expert count. However, training large MoEs is expensive, as memory requirements and inter-device communication both scale with total parameter count. We propose expert upcycling, a method for progressively expanding MoE capacity by increasing the number of experts during continued pre-training (CPT). Given a trained E-expert model, the upcycling operator constructs an mE-expert model through expert duplication and router extension while holding top-K routing fixed, preserving per-token inference cost. 
Duplication provides a warm initializat...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155573832","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","efficient"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C114466953","display_name":"Initialization","score":0.6410999894142151},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6010000109672546},{"id":"https://openalex.org/C58328972","display_name":"Expert system","score":0.5415999889373779},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.507099986076355},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4828000068664551},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4505999982357025},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42010000348091125},{"id":"https://openalex.org/C17020691","display_name":"Operator (biology)","score":0.41449999809265137}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155081870","title":"Enhance Balance between Generalization and Personalization for Vision-Language Models in Federated Learning","url":"https://doi.org/10.1109/icassp55912.2026.11464153","published":"2026-04-21","authors":["Ziyun Cai","Yizhou Lu","Yawen Huang","Jie Song","Ye Liu","Xiao‐Yuan Jing"],"abstract":"We propose FLORA, a novel Federated Parameter-Efficient Fine-Tuning framework that combines orthogonal low-rank adaptation for global prompt with attention-guided visual adapter to address the balance between generalization and personalization under data heterogeneity. 
Specifically, each client personalizes the global prompt through orthogonal low-rank adaptation term with an additional orthogonal loss, thereby achieving efficient local adaptation while maintaining the generalization of the global prompt. Furthermore, we design a lightweight attention-guided visual adapter for the image encoder to enhance cross-modal alignment, further balancing generalization and personalization. Extensive experiments on multiple datasets demonstrate that our FLORA achieves superior performance in balancing generalization and personalization over state-of-the-art methods.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464153","openalex_id":"https://openalex.org/W7155081870","cited_by_count":0,"quality_score":45,"matched_keywords":["personalization","efficient"],"author_affiliations":["Nanjing University","Nanjing University of Posts and Telecommunications","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.705299973487854},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.5997999906539917},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5098999738693237},{"id":"https://openalex.org/C168031717","display_name":"Balance (ability)","score":0.4634000062942505},{"id":"https://openalex.org/C2992525071","display_name":"Federated learning","score":0.41760000586509705},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35830000042915344},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.33959999680519104},{"id":"https://openalex.org/C136764020","display_name":"World Wide 
Web","score":0.31299999356269836}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155083198","title":"DDPT: Distillation and Dynamic Prompt Tuning for Improving Complex Reasoning in Large Language Models","url":"https://doi.org/10.1109/icassp55912.2026.11461607","published":"2026-04-21","authors":["Ge Teng","Chen Shen","W B Wang","Sinan Fan","Liang Xie","Xiang Tian","X D Liu","Yuehua Chen","Jieping Ye"],"abstract":"Prompt tuning (PT) is a well-known parameter-efficient fine-tuning (PEFT) approach for tuning Large Language Models (LLMs) to downstream tasks. While its lightweight nature facilitates efficient deployment, it raises concerns regarding its limited capacity for complex reasoning tasks. Meanwhile, naively lengthening the soft prompt to achieve better reasoning performance can be inefficient and even ineffective. To address these drawbacks, we first reduce the training difficulty by transforming the training target from fitting ground truth into distilling knowledge from structurally similar hard prompts. Secondly, we consider training multiple short soft prompts, with each of them specialized for specific data subsets, instead of training a long one for the whole dataset. 
During inference, we can dynamically choose from these complementary soft prompts for better accuracy via strategies in...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461607","openalex_id":"https://openalex.org/W7155083198","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","distillation"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6646999716758728},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46480000019073486},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.3905999958515167},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.31630000472068787},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.31349998712539673},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3100999891757965},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3082999885082245},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.3077999949455261}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.14930","title":"Cross-Modal Knowledge Distillation for Speech Large Language Models","url":"http://arxiv.org/abs/2509.14930","published":"2026-04-21","authors":["Enzhi Wang","Qiaowei Li","Zhiyuan Tang","Yuhan Jia"],"abstract":"In this work, we present the first systematic evaluation of catastrophic forgetting and modality inequivalence in speech large language models, showing that introducing 
speech capabilities can degrade knowledge and reasoning even when inputs remain textual, and performance further decreases with spoken queries. To address these challenges, we propose a cross-modal knowledge distillation framework that leverages both text-to-text and speech-to-text channels to transfer knowledge from a text-based teacher model to a speech LLM. Extensive experiments on dialogue and audio understanding tasks validate the effectiveness of our approach in preserving textual knowledge, improving cross-modal alignment, and enhancing reasoning in speech-based interactions.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464272","openalex_id":"https://openalex.org/W4417077701","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","distillation"],"author_affiliations":["Nankai University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7821999788284302},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5756000280380249},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.5228999853134155},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.513700008392334},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4927000105381012},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.4359999895095825},{"id":"https://openalex.org/C2776230583","display_name":"Spoken language","score":0.4066999852657318},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer 
interaction)","score":0.38510000705718994}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155053853","title":"CAD-Judge: Toward Efficient Morphological Grading and Verification for Text-to-CAD Generation","url":"https://doi.org/10.1109/icassp55912.2026.11461978","published":"2026-04-21","authors":["Zheyuan Zhou","Jiayi Han","Liang Du","Naiyu Fang","Qiu L","Shuyou Zhang"],"abstract":"Computer-Aided Design (CAD) models are widely used across industrial design, simulation, and manufacturing processes. Text-to-CAD systems aim to generate editable, general-purpose CAD models from textual descriptions, significantly reducing the complexity and entry barrier associated with traditional CAD workflows. However, rendering CAD models can be slow, and deploying VLMs to review CAD models can be expensive and may introduce reward hacking that degrades the systems. To address these challenges, we propose CAD-Judge, a novel, verifiable reward system for efficient and effective CAD preference grading and grammatical validation. We adopt the Compiler-as-a-Judge Module (CJM) as a fast, direct reward signal, optimizing model alignment by maximizing generative utility through prospect theory. 
To further improve the robustness during testing phase, we introduce a simple yet effective age...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461978","openalex_id":"https://openalex.org/W7155053853","cited_by_count":0,"quality_score":45,"matched_keywords":["preference","efficient"],"author_affiliations":["Inspur (China)","Nanyang Technological University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5956000089645386},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5253000259399414},{"id":"https://openalex.org/C2777286243","display_name":"Grading (engineering)","score":0.3587000072002411},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34549999237060547},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.29330000281333923},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.2732999920845032},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.24529999494552612},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.2206999957561493}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2507.09929","title":"Aligning Generative Speech Enhancement with Perceptual Feedback","url":"http://arxiv.org/abs/2507.09929","published":"2026-04-21","authors":["Haoyang Li","Nana Hou","Yuchen Hu","Jixun Yao","Sabato Marco Siniscalchi","Xuyi Zhuang","Deheng Ye","Wei Yang","Eng Siong Chng"],"abstract":"Language Model (LM)-based speech enhancement (SE) has recently 
emerged as a promising direction, but existing approaches predominantly rely on token-level likelihood objectives that weakly reflect human perception. This mismatch limits progress, as optimizing signal accuracy does not always improve naturalness or listening comfort. We address this gap by introducing a perceptually aligned LM-based SE approach. Our method applies Direct Preference Optimization (DPO) with UTMOS, a neural MOS predictor, as a proxy for human ratings, directly steering models toward perceptually preferred outputs. This design directly connects model training to perceptual quality and is broadly applicable within LM-based SE frameworks. On the Deep Noise Suppression Challenge 2020 test sets, our approach consistently improves speech quality metrics, achieving relative gains of up to 56%. To our knowledge, this...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461654","openalex_id":"https://openalex.org/W4414700447","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","preference"],"author_affiliations":["Nanyang Technological University","Northwestern Polytechnical University","Tencent (China)","University of Palermo"],"concepts":[{"id":"https://openalex.org/C134537474","display_name":"Naturalness","score":0.7534000277519226},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6690000295639038},{"id":"https://openalex.org/C2776182073","display_name":"Speech enhancement","score":0.6679999828338623},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6625000238418579},{"id":"https://openalex.org/C177291462","display_name":"Active 
listening","score":0.5379999876022339},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5317999720573425},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4348999857902527},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4341999888420105}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.16971","title":"AUDIOGENIE-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning","url":"http://arxiv.org/abs/2509.16971","published":"2026-04-21","authors":["Rong Yan","Chenxing Li","Dong Yu","Liu Li"],"abstract":"Audio deep reasoning is a challenging task that requires expert-level perception, multi-step logical inference, and the integration of contextual knowledge. However, existing models suffer from a gap between audio perception and reasoning abilities due to the lack of training data with explicit reasoning chains and the absence of mechanisms for active exploration and iterative refinement. To address these challenges, we propose AudioGenie-Reasoner (AGR), the first unified training-free multi-agent system that coordinates perception and reasoning over an evolving chain of textual evidence. Our key idea is a paradigm shift that transforms audio deep reasoning into complex text understanding task from a new perspective, thereby unlocking the full potential of large language models. Specifically, the design of AGR mimics the human coarse-to-fine cognitive process. 
It first transforms the inp...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463631","openalex_id":"https://openalex.org/W4415252758","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7936000227928162},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6251000165939331},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5473999977111816},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.49880000948905945},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.47429999709129333},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.46470001339912415},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.4271000027656555},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4185999929904938}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155048739","title":"Visual Keys to Symphonies: Latent Diffusion for Multi-Scene Video-to-Music Generation","url":"https://doi.org/10.1109/icassp55912.2026.11462058","published":"2026-04-21","authors":["Chiu Fai Ng","Karsper So","Jing Yang","Patricio Ovalle","Simon Lui","Fan Fan","Hanyu Dong"],"abstract":"The generation of coherent and emotionally resonant music for multi-scene videos remains a central challenge in the video-to-music (V2M) domain. 
We introduce a latent diffusion framework that addresses this by synthesizing globally coherent, high-fidelity background scores. Our approach moves beyond trivial audiovisual synchronization, establishing culturally-grounded semantic correspondences between the visual narrative and resulting acoustic properties. To facilitate this, we introduced a scalable data pipeline for synthesizing large-scale, audiovisuals-aligned pre-training data. The generative process is conditioned on a combination of visual keyframes and extracted affective cues, and the model is then fine-tuned using Direct Preference Optimization (DPO) to directly align its outputs with human aesthetic preferences. Extensive evaluations on established benchmarks demonstrate that o...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462058","openalex_id":"https://openalex.org/W7155048739","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United States)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5232999920845032},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5230000019073486},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3946000039577484},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3840999901294708},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.30410000681877136},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.29739999771118164},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.2865000069141388},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2727000117301941}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.13785","title":"Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods","url":"http://arxiv.org/abs/2509.13785","published":"2026-04-21","authors":["Bingshen Mu","Pengcheng Guo","Zhiming Sun","Shuai Wang","Hexin Liu","Mingchen Shao","Lei Xie","Eng Siong Chng","Longshuai Xiao","Qiangze Feng","Daliang Wang"],"abstract":"This paper summarizes the Interspeech2025 Multilingual Conversational Speech Language Model (MLC-SLM) challenge, which aims to advance the exploration of building effective multilingual conversational speech LLMs (SLLMs). We provide a detailed description of the task settings for the MLC-SLM challenge, the released real-world multilingual conversational speech dataset totaling approximately 1,604 hours, and the baseline systems for participants. The MLC-SLM challenge attracts 78 teams from 13 countries to participate, with 489 valid leaderboard results and 14 technical reports for the two tasks. 
We distill valuable insights on building multilingual conversational SLLMs based on submissions from participants, aiming to contribute to the advancement of the community.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464955","openalex_id":"https://openalex.org/W4416255027","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["DXC Technology (United States)","Huawei Technologies (China)","Nanjing University of Science and Technology","Nanyang Technological University","Northwestern Polytechnical University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7251999974250793},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.600600004196167},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.54339998960495},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.48570001125335693},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4715999960899353},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4059999883174896},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.39879998564720154},{"id":"https://openalex.org/C504749915","display_name":"Speech technology","score":0.38580000400543213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155045137","title":"PromptSID: A Self-Iterative Distillation Framework For Unsupervised Adaptation Of Vision-Language 
Models","url":"https://doi.org/10.1109/icassp55912.2026.11463147","published":"2026-04-21","authors":["Yikai Lin","Xianwei Zhuang","Junbin Zhang","Chenxing Li","Zikang Huang","Yuexian Zou"],"abstract":"Vision-Language Models (VLMs) such as CLIP demonstrate remarkable zero-shot recognition, yet their adaptation to unseen domains without labels remains challenging. We propose PromptSID, a self-iterative distillation framework for unsupervised adaptation of VLMs. PromptSID operates in two stages: (1) generating high-confidence pseudo-labels from zero-shot CLIP to train a prompt-enhanced student model; (2) iteratively replacing the teacher with the best student to refine both pseudo-labels and prompts in a loop. This iterative mechanism enables stable adaptation by leveraging pseudo-supervision and dynamic prompt optimization. Experiments on 13 image classification benchmarks show that PromptSID significantly surpasses zero-shot CLIP, demonstrating consistent gains across diverse domains. Our framework highlights the potential of prompt-based self-iterative distillation to unlock hidden ca...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463147","openalex_id":"https://openalex.org/W7155045137","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5830000042915344},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5285000205039978},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.41690000891685486},{"id":"https://openalex.org/C139807058","display_name":"Adaptation 
(eye)","score":0.39410001039505005},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.37630000710487366},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3021000027656555},{"id":"https://openalex.org/C154030694","display_name":"Fractionating column","score":0.3005000054836273},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.2597000002861023}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2506.09175","title":"Phrased: Phrase Dictionary Biasing for Speech Translation","url":"http://arxiv.org/abs/2506.09175","published":"2026-04-21","authors":["Peidong Wang","Jian Xue","Rui Zhao","Junkun Chen","Aswin Shanmugam Subramanian","Jinyu Li"],"abstract":"Phrases are essential to understand the core concepts in conversations. However, due to their rare occurrence in training data, correct translation of phrases is challenging in speech translation tasks. In this paper, we propose a phrase dictionary biasing method to leverage pairs of phrases mapping from the source language to the target language. We apply the phrase dictionary biasing method to two types of widely adopted models, a transducer-based streaming speech translation model and a multimodal large language model. Experimental results show that the phrase dictionary biasing method outperforms phrase list biasing by 21% relatively for the streaming speech translation model. 
In addition, phrase dictionary biasing enables multimodal large language models to use external phrase information, achieving 85% relative improvement in phrase recall.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1109/icassp55912.2026.11462714","openalex_id":"https://openalex.org/W4417307922","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C2776224158","display_name":"Phrase","score":0.8274000287055969},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7731000185012817},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6710000038146973},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6308000087738037},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5365999937057495},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5321000218391418},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.48829999566078186},{"id":"https://openalex.org/C80877019","display_name":"Phrase structure rules","score":0.37389999628067017}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.18890","title":"Generalizability of Predictive and Generative Speech Enhancement Models to Pathological Speakers","url":"http://arxiv.org/abs/2509.18890","published":"2026-04-21","authors":["Mingzhen Hou","Ante Jukić","Ina Kodrasi"],"abstract":"State-of-the-art speech enhancement (SE) models achieve strong performance on neurotypical speech, 
but their effectiveness is substantially reduced for pathological speech. In this paper, we investigate strategies to address this gap for both predictive and generative SE models, including (i) training models from scratch using pathological data, (ii) fine-tuning models pre-trained on neurotypical speech with additional data from pathological speakers, and (iii) speaker-specific personalization using only data from the individual pathological test speaker. Our results show that, despite the limited size of pathological speech datasets, SE models can be successfully trained or fine-tuned on such data. Fine-tuning models with data from several pathological speakers yields the largest performance improvements, while speaker-specific personalization is less effective, likely due to the small....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461028","openalex_id":"https://openalex.org/W4416258799","cited_by_count":0,"quality_score":41,"matched_keywords":["personalization"],"author_affiliations":["Idiap Research Institute","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.621399998664856},{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.506600022315979},{"id":"https://openalex.org/C207886595","display_name":"Pathological","score":0.4796999990940094},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4255000054836273},{"id":"https://openalex.org/C2778391849","display_name":"Neurotypical","score":0.424699991941452},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.399399995803833},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.39570000767707825},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3603000044822693}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2504.07433","title":"From Token to Line: Enhancing Code Generation with a Long-Term Perspective","url":"http://arxiv.org/abs/2504.07433","published":"2026-04-21","authors":["Tingwei Lu","Yangning Li","Liyuan Wang","Binghuai Lin","Qingsong Lv","Zishan Xu","Haitao Zheng","Yinghui Li","Hong-Gee Kim"],"abstract":"The emergence of large language models (LLMs) has significantly promoted the development of code generation task, sparking a surge in pertinent literature. Current research is hindered by redundant generation results and a tendency to overfit local patterns in the short term. Although existing studies attempt to alleviate the issue by adopting a multi-token prediction strategy, there remains limited focus on choosing the appropriate processing length for generations. By analyzing the attention between tokens during the generation process of LLMs, it can be observed that the high spikes of the attention scores typically appear at the end of lines. This insight suggests that it is reasonable to treat each line of code as a fundamental processing unit and generate them sequentially. 
Inspired by this, we propose the LSR-MCTS algorithm, which leverages MCTS to determine the code line-by-line....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464461","openalex_id":"https://openalex.org/W4417298998","cited_by_count":0,"quality_score":41,"matched_keywords":["long-term"],"author_affiliations":["Seoul National University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7978000044822693},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.6349999904632568},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.6046000123023987},{"id":"https://openalex.org/C22019652","display_name":"Overfitting","score":0.552299976348877},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5299999713897705},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5184000134468079},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.47600001096725464},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4616999924182892}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155090955","title":"Feedback-Driven Retrieval-Augmented Audio Generation with Large Audio Language Models","url":"https://doi.org/10.1109/icassp55912.2026.11462219","published":"2026-04-21","authors":["Junqi Zhao","Chenxing Li","Jinzheng Zhao","Rilin Chen","Dong Yu","Mark D. 
Plumbley","Wenwu Wang"],"abstract":"We propose a general feedback-driven retrieval-augmented generation (RAG) approach that leverages Large Audio Language Models (LALMs) to address the missing or imperfect synthesis of specific sound events in text-to-audio (TTA) generation. Unlike previous RAG-based TTA methods that typically train specialized models from scratch, we utilize LALMs to analyze audio generation outputs, retrieve concepts that pre-trained models struggle to generate from an external database, and incorporate the retrieved information into the generation process. Experimental results show that our method not only enhances the ability of LALMs to identify missing sound events but also delivers improvements across different models, outperforming existing RAG-specialized approaches.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462219","openalex_id":"https://openalex.org/W7155090955","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["KLA (United States)","Seattle University","Tencent (China)","University of Surrey"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6815999746322632},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.47699999809265137},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37040001153945923},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3346000015735626},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.3190999925136566},{"id":"https://openalex.org/C87687168","display_name":"Digital audio","score":0.30709999799728394},{"id":"https://openalex.org/C195324797","display_name":"Natural 
language","score":0.3050999939441681},{"id":"https://openalex.org/C160372630","display_name":"Audio analyzer","score":0.30399999022483826}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155028917","title":"Exploration Beyond Budget: Training Large Language Models to Explore Under Truncation Constraints","url":"https://doi.org/10.1109/icassp55912.2026.11461683","published":"2026-04-21","authors":["Yi Ouyang","Qianru Wang","Xian Wu"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) usually employs binary outcome rewards from rule-based verifiers to improve the reasoning abilities of large language models (LLMs). However, under constrained training budgets, responses truncated before producing a final answer are uniformly penalized, thereby conflating incomplete reasoning with genuine errors. This truncation-induced bias discourages exploration and limits generalization. To address this issue, we propose Difficulty-aware Truncation Reward (DATR), a novel reward-shaping method that adaptively allocates rewards to truncated responses based on the estimated difficulty of the input prompt. DATR encourages efficient reasoning on tractable problems while promoting deeper exploration on challenging ones. 
Experiments on mathematical benchmarks (AIME 2025 and HMMT 2025) demonstrate that DATR mitigates the negative impact....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461683","openalex_id":"https://openalex.org/W7155028917","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["City University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6144000291824341},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6065999865531921},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45820000767707825},{"id":"https://openalex.org/C106195933","display_name":"Truncation (statistics)","score":0.36809998750686646},{"id":"https://openalex.org/C2776036281","display_name":"Constraint (computer-aided design)","score":0.30399999022483826},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.2824999988079071},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.28189998865127563},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.27810001373291016}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.23938","title":"Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems","url":"http://arxiv.org/abs/2509.23938","published":"2026-04-21","authors":["Guojian Li","Chengyou Wang","Hongfei Xue","Shuiyuan Wang","Dehui Gao","Zihan Zhang","Yuke Lin","Wenjie Li","Longshuai Xiao","Zhonghua Fu","Lei 
Xie"],"abstract":"Full-duplex interaction is crucial for natural human–machine communication, yet remains challenging as it requires robust turn-taking detection to decide when the system should speak, listen, or remain silent. Existing solutions either rely on dedicated turn-taking models, most of which are not open-sourced. The few vailable ones are limited by their large parameter size or by supporting only a single modality, such as acoustic or linguistic. Alternatively, some approaches finetune LLM backbones to enable full-duplex capability, but this requires large amounts of full-duplex data, which remain scarce in open-source form. To address these issues, we propose Easy Turn—an open-source, modular turn-taking detection model that integrates acoustic and linguistic bimodal information to predict four dialogue turn states: complete, incomplete, backchannel, and wait, accompanied by the release of....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463929","openalex_id":"https://openalex.org/W4415334984","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Northwestern Polytechnical University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.754800021648407},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5364999771118164},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.5159000158309937},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4740000069141388},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.40130001306533813},{"id":"https://openalex.org/C2776608160","display_name":"Natural 
(archaeology)","score":0.4009999930858612},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.3959999978542328},{"id":"https://openalex.org/C2779439875","display_name":"Natural language understanding","score":0.38600000739097595}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155040765","title":"Discrete Diffusion for Generative Modeling of Text-Aligned Speech Tokens","url":"https://doi.org/10.1109/icassp55912.2026.11462921","published":"2026-04-21","authors":["Pin-Jui Ku","He Huang","Jean-Marie Lemercier","Subham Sekhar Sahoo","Zhehuai Chen","Ante Jukić"],"abstract":"This paper introduces a discrete diffusion model (DDM) framework for text-aligned speech tokenization and reconstruction. By replacing the auto-regressive speech decoder with a discrete diffusion counterpart, our model achieves significantly better reconstruction quality, stronger ASR performance, and faster inference. We provide a comprehensive analysis of applying DDMs to speech reconstruction, examining sampler choices, inference steps, and robustness to length-scale estimation errors. Furthermore, we improve the original TASTE by systematically comparing vector quantization modules, showing that FSQ yields up to a 35% relative WER reduction and +0.14 UT-MOS improvement over RVQ for AR models, while also enhancing DDM performance. 
Our model generates speech in just 10 denoising steps and even supports single-step generation with only minor quality degradation.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462921","openalex_id":"https://openalex.org/W7155040765","cited_by_count":0,"quality_score":41,"matched_keywords":["quantization"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6175000071525574},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49970000982284546},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3799000084400177},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3612000048160553},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.35420000553131104},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3490000069141388},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.34700000286102295},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3192000091075897}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.14545","title":"Controlling Language Difficulty in Dialogues with Linguistic Features","url":"http://arxiv.org/abs/2509.14545","published":"2026-04-21","authors":["Shijing Xu","Wenguang Wang","Handong Gao","Wei Kang","Long Qin","Weizhi Wang"],"abstract":"Large language models (LLMs) have emerged as powerful tools for supporting second language acquisition, particularly in simulating interactive dialogues for speaking practice. 
However, adapting the language difficulty of LLM-generated responses to match learners’ proficiency levels remains a challenge. This work addresses this issue by proposing a framework for controlling language proficiency in educational dialogue systems. Our approach leverages three categories of linguistic features, readability features (e.g., Flesch-Kincaid Grade Level), syntactic features (e.g., syntactic tree depth), and lexical features (e.g., simple word ratio), to quantify and regulate text complexity. We demonstrate that training LLMs on linguistically annotated dialogue data enables precise modulation of language proficiency, outperforming prompt-based methods in both flexibility and stability. To evaluate....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462256","openalex_id":"https://openalex.org/W4417081403","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7914000153541565},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6013000011444092},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5877000093460083},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5177000164985657},{"id":"https://openalex.org/C2778143727","display_name":"Readability","score":0.5159000158309937},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4406000077724457},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.41359999775886536},{"id":"https://openalex.org/C60048249","display_name":"Syntax","score":0.4049000144004822}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.12995","title":"Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs","url":"http://arxiv.org/abs/2510.12995","published":"2026-04-21","authors":["Xinlu He","Swayambhu Nath Ray","Harish Mallidi","Jia‐Hong Huang","Ashwin Bellur","Chander Chandak","Md. Hasan Maruf","Venkatesh Ravichandran"],"abstract":"Unified architectures in multimodal large language models (MLLM) have shown promise in handling diverse tasks within a single framework. In the text-to-speech (TTS) task, current MLLM-based approaches rely on discrete tokens, which disregard the inherently continuous nature of speech and can lead to loss of fine acoustic detail. In this work, we investigate TTS within the MLLM paradigm using continuous speech representations. We design a dual-head architecture and implement two complementary training strategies for a robust model. (1) A diffusion head generating continuous speech representations is added to the MLLM, which is on frame-level and strictly autoregressive. (2) The original language model head is retained to preserve multitask capability and to control the start and end of speech synthesis. (3) Masked training is employed to address exposure bias in autoregressive decoding. 
(...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463131","openalex_id":"https://openalex.org/W4415274124","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Amazon (United States)","Worcester Polytechnic Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7081999778747559},{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.7016000151634216},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.640999972820282},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5766000151634216},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5666000247001648},{"id":"https://openalex.org/C2780312720","display_name":"Head (geology)","score":0.47699999809265137},{"id":"https://openalex.org/C111335779","display_name":"Reduction (mathematics)","score":0.4203000068664551},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4124999940395355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155053910","title":"Chain of Correction for Full-Text Speech Recognition with Large Language Models","url":"https://doi.org/10.1109/icassp55912.2026.11462360","published":"2026-04-21","authors":["Zhiyuan Tang","D Wang","Zhikai Zhou","Yong Liu","Shen Huang","Shidong Shang"],"abstract":"Full-text error correction with Large Language Models (LLMs) for Automatic Speech Recognition (ASR) is attracting increased attention for its ability to address a wide range of error types, such as punctuation restoration and inverse text 
normalization, across long context. However, challenges remain regarding stability, controllability, completeness, and fluency. To mitigate these issues, this paper proposes the Chain of Correction (CoC), which uses a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding. Utilizing the open-sourced ChFT dataset, we fine-tune a pre-trained LLM to evaluate CoC’s performance. Experiments show that CoC significantly outperforms baseline and benchmark systems in correcting full-text ASR outputs. We also analyze correction thresholds to balance under-correction and ov...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462360","openalex_id":"https://openalex.org/W7155053910","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7006999850273132},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6482999920845032},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4745999872684479},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47130000591278076},{"id":"https://openalex.org/C199185054","display_name":"Chain (unit)","score":0.454800009727478},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.45419999957084656},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.3212999999523163},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.3212999999523163}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155049591","title":"Align2speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization","url":"https://doi.org/10.1109/icassp55912.2026.11460913","published":"2026-04-21","authors":["Shehzeen Hussain","Paarth Neekhara","Xuesong Yang","Edresson Casanova","Subhankar Ghosh","Roy Fejgin","Ryan Langman","Mikyas Desta","Leili Tavabi","Jason Li"],"abstract":"Developing high-quality text-to-speech (TTS) systems for low-resource languages is challenging due to the scarcity of paired text and speech data. In contrast, automatic speech recognition (ASR) models for such languages are often more accessible, owing to large-scale multilingual pre-training efforts. We propose a framework based on Group Relative Policy Optimization (GRPO) to adapt an autoregressive, multilingual TTS model to new languages. Our method first establishes a language-agnostic foundation for TTS synthesis by training a multilingual baseline with International Phonetic Alphabet (IPA) tokens. Next, we fine-tune this model on limited paired data of the new languages to capture the target language’s prosodic features. 
Finally, we apply GRPO to optimize the model using only unpaired text and speaker prompts, guided by a multi-objective reward from pretrained ASR, speaker verific...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11460913","openalex_id":"https://openalex.org/W7155049591","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6261000037193298},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.4740999937057495},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4738999903202057},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3546999990940094},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.34929999709129333},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.2847000062465668},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.2709999978542328},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.26919999718666077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2306.03741","title":"Pre-training Tensor-Train Networks Facilitates Machine Learning with Variational Quantum Circuits","url":"http://arxiv.org/abs/2306.03741","published":"2026-04-21","authors":["Jun Qi","Chao-Han Huck Yang","Pin‐Yu Chen","Min-Hsiu Hsieh"],"abstract":"Data encoding remains a fundamental bottleneck in quantum machine learning, where amplitude encoding of high-dimensional 
classical vectors into quantum states incurs exponential cost. In this work, we propose a pre-trained tensor-train (TT) encoding network (Pre-TT-Encoder) that significantly reduces the computational complexity of amplitude encoding while preserving essential data structure. The Pre-TT-Encoder exploits low-rank TT decompositions learned from classical data, enabling polynomial-time state preparation in the number of qubits and TT-ranks. We provide a theoretical analysis of the encoding complexity and establish fidelity bounds that quantify the trade-off between TT-rank and approximation error. Empirical evaluations on classical (MNIST) and quantum-native (semiconductor quantum dot) datasets demonstrate that our approach achieves substantial gains in encoding efficiency....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1109/icassp55912.2026.11461253","openalex_id":"https://openalex.org/W4379924986","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Foxconn (Taiwan)","Georgia Institute of Technology","IBM (United States)","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.7467079162597656},{"id":"https://openalex.org/C155281189","display_name":"Tensor (intrinsic definition)","score":0.6437812447547913},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.49166494607925415},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47566336393356323},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.47554799914360046},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.4647846221923828},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.42261379957199097},{"id":"https://openalex.org/C84114770","display_name":"Quantum","score":0.41470301151275635}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7155102364","title":"SynaSpot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy","url":"https://doi.org/10.1109/icassp55912.2026.11462416","published":"2026-04-21","authors":["Kewei Li","Yinan Zhong","Xiaotao Liang","Tianchi Dai","Shaofei Xue"],"abstract":"Open-vocabulary keyword spotting (KWS) in continuous speech streams holds significant practical value across a wide range of real-world applications. While increasing attention has been paid to the role of different modalities in KWS, their effectiveness has been acknowledged. However, the increased parameter cost from multimodal integration and the constraints of end-to-end deployment have limited the practical applicability of such models. To address these challenges, we propose a lightweight, streaming multi-modal framework. First, we focus on multimodal enrollment features and reduce speaker-specific (voiceprint) information in the speech enrollment to extract speaker-irrelevant characteristics. Second, we effectively fuse speech and text features. 
Finally, we introduce a streaming decoding framework that only requires the encoder to extract features, which are then mathematically de...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462416","openalex_id":"https://openalex.org/W7155102364","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6967999935150146},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4713999927043915},{"id":"https://openalex.org/C2781213101","display_name":"Keyword spotting","score":0.40799999237060547},{"id":"https://openalex.org/C2779506182","display_name":"Spotting","score":0.3086000084877014},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.30820000171661377},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.257099986076355},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.25609999895095825},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.24500000476837158}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155087989","title":"SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio Pretraining for Affective Computing","url":"https://doi.org/10.1109/icassp55912.2026.11462634","published":"2026-04-21","authors":["Xin Jing","J Wang","Andreas Triantafyllopoulos","Maurice Gerczuk","Shahin Amiriparian","Jun Luo","Björn Schuller"],"abstract":"The ambiguity of human emotions poses several challenges for machine learning models, as 
they often overlap and lack clear delineating boundaries. Contrastive language-audio pretraining (CLAP) has emerged as a key technique for generalisable emotion recognition. However, as conventional CLAP enforces a strict one-to-one alignment between paired audio–text samples, it overlooks intra-modal similarity and treats all non-matching pairs as equally negative. This conflicts with the fuzzy boundaries between different emotions. To address this limitation, we propose SmoothCLAP, which introduces softened targets derived from intra-modal similarity and paralinguistic features. By combining these softened targets with conventional contrastive supervision, SmoothCLAP learns embeddings that respect graded emotional relationships, while retaining the same inference pipeline as CLAP. Experiments on ei...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462634","openalex_id":"https://openalex.org/W7155087989","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United States)","Technical University of Munich"],"concepts":[{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.5687999725341797},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.46939998865127563},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3813999891281128},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3075999915599823},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.27639999985694885},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2565999925136566},{"id":"https://openalex.org/C116834253","display_name":"Identification 
(biology)","score":0.2547000050544739},{"id":"https://openalex.org/C138496976","display_name":"Developmental psychology","score":0.250900000333786}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155076069","title":"Scaling Audio-Visual Quality Assessment Dataset via Crowdsourcing","url":"https://doi.org/10.1109/icassp55912.2026.11463693","published":"2026-04-21","authors":["Renyu Yang","Jian Jin","Lili Meng","Meiqin Liu","Yilin Wang","Balu Adsumilli","Weisi Lin"],"abstract":"Audio-visual quality assessment (AVQA) research has been stalled by limitations of existing datasets: they are typically small in scale, with insufficient diversity in content and quality, and annotated only with overall scores. These shortcomings provide limited support for model development and multimodal perception research. We propose a practical approach for AVQA dataset construction. First, we design a crowd-sourced subjective experiment framework for AVQA, breaks the constraints of in-lab settings and achieves reliable annotation across varied environments. Second, a systematic data preparation strategy is further employed to ensure broad coverage of both quality levels and semantic scenarios. Third, we extend the dataset with additional annotations, enabling research on multimodal perception mechanisms and their relation to content. 
Finally, we validate this approach through YT-N...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463693","openalex_id":"https://openalex.org/W7155076069","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Jiaotong University","Google (United States)","Nanyang Technological University","Shandong Normal University"],"concepts":[{"id":"https://openalex.org/C62230096","display_name":"Crowdsourcing","score":0.7028999924659729},{"id":"https://openalex.org/C3020001037","display_name":"Quality assessment","score":0.5964000225067139},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5853999853134155},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.47999998927116394},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3912999927997589},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.37630000710487366},{"id":"https://openalex.org/C24756922","display_name":"Data quality","score":0.3398999869823456},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.334199994802475}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155091957","title":"RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models","url":"https://doi.org/10.1109/icassp55912.2026.11464286","published":"2026-04-21","authors":["Bo Ren","Ruchao Fan","Yelong Shen","Weizhu Chen","Jinyu Li"],"abstract":"Speech large language models (LLMs) have driven significant progress in end-to-end speech understanding and recognition, yet they continue to struggle with 
accurately recognizing rare words and domain-specific terminology. This paper presents a novel fine-tuning method, Reinforcement Learning with Biasing Rewards (RLBR), which employs a specialized biasing words preferred reward to explicitly emphasize biasing words in the reward calculation. In addition, we introduce reference-aware mechanisms that extend the reinforcement learning algorithm with reference transcription to strengthen the potential trajectory exploration space. Experiments on the LibriSpeech corpus across various biasing list sizes demonstrate that RLBR delivers substantial performance improvements over strong supervised fine-tuning (SFT) baseline and consistently outperforms several recently published methods. The propo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464286","openalex_id":"https://openalex.org/W7155091957","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.49619999527931213},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45669999718666077},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.38659998774528503},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.38530001044273376},{"id":"https://openalex.org/C74672266","display_name":"Language acquisition","score":0.3522999882698059},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34459999203681946},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.34310001134872437},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.32580000162124634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155072977","title":"Multi Stage Training with Dynamic Data Balancing for Multilingual Speech Recognition and Translation","url":"https://doi.org/10.1109/icassp55912.2026.11464449","published":"2026-04-21","authors":["Nithin Koluguri","Monica Sekoyan","Nune Tadevosyan","Nikolay Karpov","Jagadeesh Balam","Boris Ginsburg"],"abstract":"Training large-scale multilingual speech models is often hindered by severe data imbalances across tasks, languages, and corpora. We introduce a systematic, multi-stage training framework to address this challenge. First, for pre-training on 1.7 million hours of speech across 25 languages, we propose an inverted two-tier sampling policy that prioritizes balancing corpora within each language before balancing across languages, enhancing data diversity. Second, during fine-tuning, we apply a dynamic data balancing schedule that smoothly transitions the data mixture from the initial imbalanced state to a targeted, high-quality distribution using a cosine schedule. This gradual adaptation prevents catastrophic forgetting and maximizes performance. 
Our resulting 1-billion parameter model, Canary-1B-v2, demonstrates the success of this methodology, achieving state-of-the-art multilingual AS...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464449","openalex_id":"https://openalex.org/W7155072977","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.738099992275238},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6690999865531921},{"id":"https://openalex.org/C3017905481","display_name":"Multi stage","score":0.5891000032424927},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5812000036239624},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4846000075340271},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4814000129699707},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4560000002384186},{"id":"https://openalex.org/C2780366754","display_name":"Speech translation","score":0.40070000290870667}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155036021","title":"Leveraging Large Multimodal Models for Audio-Video Deepfake Detection: A Pilot Study","url":"https://doi.org/10.1109/icassp55912.2026.11463461","published":"2026-04-21","authors":["Songjun Cao","Yuqi Li","Yi Luo","Jianjun Yin","Long Ma"],"abstract":"Audio-visual deepfake detection (AVD) is increasingly important as modern generators can fabricate convincing speech and video. 
Most current multimodal detectors are small, task-specific models: they work well on curated tests but scale poorly and generalize weakly across domains. We introduce AV-LMMDetect, a supervised fine-tuned(SFT) large multimodal model that casts AVD as a prompted yes/no classification—\"Is this video real or fake?\". Built on Qwen 2.5 Omni, it jointly analyzes audio and visual streams for deepfake detection and is trained in two stages: lightweight LoRA alignment followed by audio-visual encoder full fine-tuning. On FakeAVCeleb and Mavos-DD, AV-LMMDetect matches or surpasses prior methods and sets a new state of the art on Mavos-DD datasets.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463461","openalex_id":"https://openalex.org/W7155036021","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6182000041007996},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39340001344680786},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.29980000853538513},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.2994999885559082},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.2806999981403351},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.272599995136261},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.25920000672340393},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.24690000712871552}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155075762","title":"Leveraging Large Language Models for Text Normalization of Non-Standard Words in Text-to-Speech Synthesis","url":"https://doi.org/10.1109/icassp55912.2026.11463552","published":"2026-04-21","authors":["Min Ma","Heiga Zen","James Zhao"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463552","openalex_id":"https://openalex.org/W7155075762","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7242000102996826},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.645799994468689},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6194000244140625},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4440999925136566},{"id":"https://openalex.org/C136886441","display_name":"Normalization (sociology)","score":0.4397999942302704},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.37450000643730164},{"id":"https://openalex.org/C129792486","display_name":"Language identification","score":0.3034000098705292},{"id":"https://openalex.org/C155092808","display_name":"Computational linguistics","score":0.3019999861717224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155034633","title":"Latent Temporal Discrepancy as 
Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V","url":"https://doi.org/10.1109/icassp55912.2026.11460490","published":"2026-04-21","authors":["Meiqi Wu","Bingze Song","Ruimin Lin","Chen Zhu","Xiaokun Feng","Jianmin Wu","Xiangxiang Chu","Kaiqi Huang"],"abstract":"Video generation models have achieved notable progress in static scenarios, yet their performance in motion video generation remains limited, with quality degrading under drastic dynamic changes. This is due to noise disrupting temporal coherence and increasing the difficulty of learning dynamic regions. Unfortunately, existing diffusion models rely on static loss for all scenarios, constraining their ability to capture complex dynamics. To address this issue, we introduce Latent Temporal Discrepancy (LTD) as a motion prior to guide loss weighting. LTD measures frame-to-frame variation in the latent space, assigning larger penalties to regions with higher discrepancy while maintaining regular optimization for stable regions. This motion-aware strategy stabilizes training and enables the model to better reconstruct high-frequency dynamics. 
Extensive experiments on the general benchmark VB...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11460490","openalex_id":"https://openalex.org/W7155034633","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Southeast University","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5602999925613403},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5145000219345093},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5016000270843506},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44999998807907104},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.35429999232292175},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.32100000977516174},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.30790001153945923},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.30379998683929443}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155058836","title":"Gen-SER: When the Generative Model Meets Speech Emotion Recognition","url":"https://doi.org/10.1109/icassp55912.2026.11463095","published":"2026-04-21","authors":["Taihui Wang","Jinzheng Zhao","Rilin Chen","Tong Lei","Wenwu Wang","Dong Yu"],"abstract":"Speech emotion recognition (SER) is crucial in speech understanding and generation. Most approaches are based on either classification models or large language models. 
Different from previous methods, we propose Gen-SER, a novel approach that reformulates SER as a distribution shift problem via generative models. We propose to project discrete class labels into a continuous space, and obtain the terminal distribution via sinusoidal taxonomy encoding. The targetmatching-based generative model is adopted to transform the initial distribution into the terminal distribution efficiently. The classification is achieved by calculating the similarity of the generated terminal distribution and ground truth terminal distribution. The experimental results confirm the efficacy of the proposed method, demonstrating its extensibility to various speech-understanding tasks and suggesting its potential a...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463095","openalex_id":"https://openalex.org/W7155058836","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Bellevue Hospital Center","Tencent (China)","University of Surrey"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.536899983882904},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5077000260353088},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4300999939441681},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.39419999718666077},{"id":"https://openalex.org/C2777438025","display_name":"Emotion recognition","score":0.3912999927997589},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35359999537467957},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3497999906539917},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.3481000065803528}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155030855","title":"Fine-Grained Text-to-Image Synthesis with Semantic Refinement","url":"https://doi.org/10.1109/icassp55912.2026.11460951","published":"2026-04-21","authors":["Xia Song","Jianxin Sun","Yingya Zhang","Libin Wang","Qi Li","Z X Sun"],"abstract":"Recent advance in text-to-image synthesis greatly benefits from large-scale vision-language models such as CLIP. Despite the capability of producing high-quality and creative images, existing methods often struggle in capturing details of the text prompt, especially when the text is lengthy. We reveal that such an issue is partially caused by the imperfect text-image matching using CLIP, where fine-grained semantics may get obscured by the dominant ones. This work presents a new diffusion-based method that favors fine-grained synthesis with semantic refinement. Concretely, instead of getting a synthesis using the entire descriptive sentence as the prompt, users can emphasize some specific words of their own interests. For this purpose, we incorporate a semantic-induced gradient as a reference input in each denoising step to help the model understand the selected sub-concept. 
We find out....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11460951","openalex_id":"https://openalex.org/W7155030855","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6334999799728394},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42570000886917114},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.3449999988079071},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.32440000772476196},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.32199999690055847},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.27630001306533813},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.27090001106262207},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.2538999915122986}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155084119","title":"FGGM: Fisher-Guided Gradient Masking for Continual Learning","url":"https://doi.org/10.1109/icassp55912.2026.11462111","published":"2026-04-21","authors":["Chao-Hong Tan","Qian Chen","Wen Wang","Yukun Ma","Chong Zhang","Chong Deng","Qinglin Zhang","Xiangang Li","Jieping Ye"],"abstract":"Catastrophic forgetting impairs the continuous learning of large language models. 
We propose Fisher-Guided Gradient Masking (FGGM), a framework that mitigates this by strategically selecting parameters for updates using diagonal Fisher Information. FGGM dynamically generates binary masks with adaptive thresholds, preserving critical parameters to balance stability and plasticity without requiring historical data. Unlike magnitude-based methods such as MIGU, our approach offers a mathematically principled parameter importance estimation. On the TRACE benchmark, FGGM shows a 9.6% relative improvement in retaining general capabilities over supervised fine-tuning (SFT) and a 4.4% improvement over MIGU on TRACE tasks. Additional analysis on code generation tasks confirms FGGM’s superior performance and reduced forgetting, establishing it as an effective solution.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462111","openalex_id":"https://openalex.org/W7155084119","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5266000032424927},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5220000147819519},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.34290000796318054},{"id":"https://openalex.org/C2777402240","display_name":"Masking (illustration)","score":0.33980000019073486},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3019999861717224},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.2971999943256378},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.26930001378059387},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2689000070095062}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155107173","title":"F5E-TTS: Enhancing Speech Synthesis by Aligning Text with Rich Semantic Representations","url":"https://doi.org/10.1109/icassp55912.2026.11462167","published":"2026-04-21","authors":["Yihang Chen","Hualei Wang","Na Li","Zhifeng Li"],"abstract":"Mainstream Text-to-Speech (TTS) systems still struggle with semantic alignment between speech and text, resulting in unnatural or incorrect speech for some hard cases, a limitation we attribute to the semantic sparsity of text. To address this, we introduce F5E-TTS, a novel flow-matching framework that enriches the synthesis process by conditioning on both text and semantically dense PPG embeddings, the bottleneck features from a Phonetic PosteriorGrams (PPG) model. The core of our framework is a training strategy where a Diffusion Transformer (DiT) backbone learns to implicitly align text and PPGs, further encouraged by an explicit cross-modal regularization technique using a shared vector-quantized (VQ) codebook. This enriched content representation generates more natural-sounding speech and significantly improves content accuracy. 
Our model achieves a state-of-the-art 16.4% relative W...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11462167","openalex_id":"https://openalex.org/W7155107173","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6873000264167786},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5546000003814697},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4781999886035919},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4284000098705292},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.38370001316070557},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32910001277923584},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.32019999623298645},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.29980000853538513}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.15692","title":"Direct Simultaneous Translation Activation for Large Audio-Language Models","url":"http://arxiv.org/abs/2509.15692","published":"2026-04-21","authors":["Pei Zhang","Yiming Wang","Jialong Tang","Baosong Yang","Rui Wang","Derek F. 
Wong","Fei Huang"],"abstract":"Simultaneous speech-to-text translation (Simul-S2TT) aims to translate speech into target text in real time, outputting translations while receiving source speech input, rather than waiting for the entire utterance to be spoken. Simul-S2TT research often modifies model architectures to implement read-write strategies. However, with the rise of large audio-language models (LALMs), a key challenge is how to directly activate Simul-S2TT capabilities in base models without additional architectural changes. In this paper, we introduce Simultaneous Self-Augmentation (SimulSA), a strategy that utilizes LALMs’ inherent capabilities to obtain simultaneous data by randomly truncating speech and constructing partially aligned translation. By incorporating them into offline SFT data, SimulSA effectively bridges the distribution gap between offline translation during pretraining and simultaneous tran...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461653","openalex_id":"https://openalex.org/W4414700378","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Shanghai Jiao Tong University","University of Macau"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7465000152587891},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.7429999709129333},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.5842999815940857},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5473999977111816},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.5372999906539917},{"id":"https://openalex.org/C2775852435","display_name":"Utterance","score":0.5322999954223633},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4706999957561493},{"id":"https://openalex.org/C2780366754","display_name":"Speech translation","score":0.44620001316070557}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2512.05137","title":"Chromouvqa: Benchmarking Vision-Language Models under Chromatic Camouflaged Images","url":"http://arxiv.org/abs/2512.05137","published":"2026-04-21","authors":["Yunfei Zhang","Yizhuo He","Yuanxun Shao","Zhengtao Yao","Haoyan Xu","Junhao Dong","Zhen Yao","Zhikang Dong"],"abstract":"Vision-Language Models (VLMs) have advanced multimodal understanding, yet still struggle when targets are embedded in cluttered backgrounds requiring figure–ground segregation. To address this, we introduce ChromouVQA, a large-scale, multi-task benchmark based on Ishihara-style chromatic camouflaged images. We extend classic dot plates with multiple fill geometries and vary chromatic separation, density, size, occlusion, and rotation, recording full metadata for reproducibility. The benchmark covers nine vision-question-answering tasks, including recognition, counting, comparison, and spatial reasoning. Evaluations of humans and VLMs reveal large gaps, especially under subtle chromatic contrast or disruptive geometric fills. We also propose a model-agnostic contrastive recipe aligning silhouettes with their camouflaged renderings, improving recovery of global shapes. 
ChromouVQA provides....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463218","openalex_id":"https://openalex.org/W4417141646","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","California Southern University","Drury University","Google (United States)","Lehigh University","Nanyang Technological University","Stony Brook University","University of Southern California"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.838100016117096},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.7630000114440918},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7032999992370605},{"id":"https://openalex.org/C196956537","display_name":"Chromatic scale","score":0.6504999995231628},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6342999935150146},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5440999865531921},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5231000185012817},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.427700012922287}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155056907","title":"CIP-DoA: Cross-Instance Prompted DoA Estimation via Semantic-Spatial Matching","url":"https://doi.org/10.1109/icassp55912.2026.11461913","published":"2026-04-21","authors":["Yu Chen","Qiquan Zhang","J Wang","Kainan Chen","Xinyuan Qian"],"abstract":"Audio-visual sound source localization (AV-SSL) leverages complementary cues from audio and vision to estimate 
the direction of arrival (DoA) of sound sources. Despite significant advancements, existing methods suffer from two key limitations: the spatially aligned audio–visual pairs are required, and DoAs cannot be associated with source identities in multi-source environments. These issues arise from a semantic–spatial misalignment, as they align modalities only at spatial level while neglecting semantic grounding. To this end, we propose a novel framework CIP-DoA for identity-aware AV-SSL. It introduces a Cross-Instance Prompting technique, which localizes a target source using an image of a different instance from the same category, removing reliance on paired data. A Semantic–Spatial Matching mechanism bridges the \"what–where\" gap by aligning visual prompts with spatial audio featur...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461913","openalex_id":"https://openalex.org/W7155056907","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Technical University of Munich","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.541700005531311},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4228000044822693},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3968000113964081},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.3869999945163727},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.35010001063346863},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.31049999594688416},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2840999960899353},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.2637999951839447}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155100046","title":"Benchmarking Gaslighting Attacks against Speech Large Language Models","url":"https://doi.org/10.1109/icassp55912.2026.11463139","published":"2026-04-21","authors":["Jinyang Wu","Bin Zhu","Xiandong Zou","Qiquan Zhang","Fang Xu","Pan Zhou"],"abstract":"As Speech Large Language Models (Speech LLMs) become increasingly integrated into voice-based applications, ensuring their robustness against manipulative or adversarial input becomes critical. Although prior work has studied adversarial attacks in text-based LLMs and vision-language models, the unique cognitive and perceptual challenges of speech-based interaction remain underexplored. In contrast, speech presents inherent ambiguity, continuity, and perceptual diversity, which make adversarial attacks more difficult to detect. In this paper, we introduce gaslighting attacks, strategically crafted prompts designed to mislead, override, or distort model reasoning as a means to evaluate the vulnerability of Speech LLMs. 
Specifically, we construct five manipulation strategies: Anger, Cognitive Disruption, Sarcasm, Implicit, and Professional Negation, designed to test model robustness across...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11463139","openalex_id":"https://openalex.org/W7155100046","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Dalian University of Technology","Singapore Management University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6646999716758728},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.45329999923706055},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39750000834465027},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.36719998717308044},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.34850001335144043},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.32710000872612},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.3154999911785126},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.2930999994277954}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155079357","title":"Audio-Guided Multimodal Approach for Fine-Grained Alignment and Boundary Modeling in Active Speaker Detection","url":"https://doi.org/10.1109/icassp55912.2026.11464605","published":"2026-04-21","authors":["Yongkang Yin","Yukun Zhuang","Zeyu Xie","Chenxing Li","Le Xu","Yuexian Zou"],"abstract":"Active 
Speaker Detection (ASD) aims to identify speakers in videos using audio-visual cues. Existing methods typically fuse audio and visual features but often lack fine-grained alignment between speech and speaker activity, and seldom explicitly model speech or speaker transition boundaries. To address these limitations, we propose an audio-guided multi-modal method. First, we enrich the AVA dataset with voice activity labels derived from Silero VAD and visual labels, providing more comprehensive training signals. Second, we introduce a semantic alignment strategy that guides visual features to align with pretrained voice activity features. Third, we construct a boundary modeling network that combines pretrained voice activity and speaker features with visual cues to capture transition boundaries, and is optimized with a dedicated loss to capture transition dynamics and enhance boundary...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11464605","openalex_id":"https://openalex.org/W7155079357","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Peking University Shenzhen Hospital","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6499999761581421},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5475000143051147},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.41110000014305115},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3919999897480011},{"id":"https://openalex.org/C62354387","display_name":"Boundary (topology)","score":0.3750999867916107},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.3271999955177307},{"id":"https://openalex.org/C133892786","display_name":"Speaker recognition","score":0.30799999833106995},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.30550000071525574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155044497","title":"Advancing Speech Summarization in Multi-Modal LLMs with Reinforcement Learning","url":"https://doi.org/10.1109/icassp55912.2026.11461448","published":"2026-04-21","authors":["Shaoshi Ling","Gang Liu","Guoli Ye","Jinyu Li"],"abstract":"Speech summarization (SSum) is a critical component of spoken content understanding, particularly in the era of rapidly growing spoken and audiovisual data. Recent advances in multi-modal large language models (MLLMs), leveraging the power of LLMs, enable generating textual summaries directly from speech without intermediate transcriptions, while supporting controllable styles and zero-shot generalization. However, open-source MLLMs continue to lag behind the state-of-the-art text-based LLMs, limiting their practical deployment for speech summarization. In this work, we present a novel multi-stage reinforcement learning (RL) training framework to enhance the speech summarization capabilities in MLLMs. 
Our model delivers substantial improvements over strong baselines, outperforms much larger MLLMs, and significantly narrows the gap with state-of-the-art text-based LLMs.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461448","openalex_id":"https://openalex.org/W7155044497","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.5843999981880188},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.45019999146461487},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.39070001244544983},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.38269999623298645},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37040001153945923},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.3682999908924103},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3174999952316284},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.29420000314712524}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155064892","title":"A hybrid discriminative and generative system for Universal speech enhancement","url":"https://doi.org/10.1109/icassp55912.2026.11461249","published":"2026-04-21","authors":["Yinghao Liu","Chengwei Liu","Xiaotao Liang","Haoyin Yan","Shaofei Xue","Zheng Xue"],"abstract":"Universal speech enhancement aims at handling inputs with various speech distortions and recording 
conditions. In this work, we propose a novel hybrid architecture that synergizes the signal fidelity of discriminative modeling with the reconstruction capabilities of generative modeling. Our system utilizes the discriminative TF-GridNet model with the Sampling-Frequency-Independent strategy to handle variable sampling rates universally. In parallel, an autoregressive model combined with spectral mapping modeling generates detail-rich speech while effectively suppressing generative artifacts. Finally, A fusion network learns adaptive weights of the two outputs under the optimization of signal-level losses and the comprehensive Speech Quality Assessment (SQA) loss. Our proposed system is evaluated in the ICASSP 2026 URGENT Challenge (Track 1) and ranks the third place.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11461249","openalex_id":"https://openalex.org/W7155064892","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.6430000066757202},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6039000153541565},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5587999820709229},{"id":"https://openalex.org/C2776182073","display_name":"Speech enhancement","score":0.5249000191688538},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.49959999322891235},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.39649999141693115},{"id":"https://openalex.org/C9652623","display_name":"Field 
(mathematics)","score":0.3230000138282776},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.31850001215934753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155103121","title":"2025 Urgent Speech Enhancement Challenge Multilingual P.808 Listening Tests: Approach and Results","url":"https://doi.org/10.1109/icassp55912.2026.11465025","published":"2026-04-21","authors":["M Sach","Yihui Fu","Kohei Saijo","Wangyou Zhang","Samuele Cornell","Robin Scheibler","C. Li","Zhaoheng Ni","Anurag Kumar","Wei Wang","Yanmin Qian","Shinji Watanabe"],"abstract":"In speech quality estimation for speech enhancement (SE) systems, subjective listening tests so far are considered as the gold standard. This should be even more true considering the large influx of new generative or hybrid methods into the field, revealing issues of some objective metrics. Efforts such as the Interspeech 2025 URGENT Speech Enhancement Challenge also involving non-English datasets add the aspect of multilinguality to the testing procedure. In this paper, we provide updated challenge results on the full multilingual test set and surprising insights into URGENT Challenge results, questioning the reliability of (P.808) absolute category rating (ACR) subjective testing as gold standard in the age of generative AI. 
Particularly, it seems that for generative SE methods, subjective (ACR MOS) and objective (DNSMOS, NISQA) reference-free metrics should be accompanied by objective...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp55912.2026.11465025","openalex_id":"https://openalex.org/W7155103121","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Google (United States)","Shanghai Jiao Tong University","Technische Universität Braunschweig","Waseda University"],"concepts":[{"id":"https://openalex.org/C177291462","display_name":"Active listening","score":0.7781000137329102},{"id":"https://openalex.org/C2776182073","display_name":"Speech enhancement","score":0.588699996471405},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4156999886035919},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3840000033378601},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3718999922275543},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.33970001339912415},{"id":"https://openalex.org/C2986659363","display_name":"Listening comprehension","score":0.2808000147342682},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.25519999861717224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/when-can-llms-learn-to-reason-with-weak-supervision","title":"When Can LLMs Learn to Reason with Weak 
Supervision?","url":"https://www.microsoft.com/en-us/research/publication/when-can-llms-learn-to-reason-with-weak-supervision/","published":"2026-04-20","authors":["Salman Rahman","Jingyan Shen","Anna Mordvina","Hamid Palangi","Saadia Gabriel","Pavel Izmailov"],"abstract":"Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under weaker forms of supervision. We conduct a systematic empirical study across diverse model families and reasoning domains under three weak supervision settings: scarce data, noisy rewards, and self-supervised proxy rewards. We find that generalization is governed by training reward saturation dynamics: models that generalize exhibit a prolonged pre-saturation phase during which training reward and downstream performance climb together, while models that saturate rapidly memorize rather than learn. 
We identify reasoning faithfulness, defined as the extent to which intermediate steps logically s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2604.19742","title":"PlayCoder: Making LLM-Generated GUI Code Playable","url":"https://huggingface.co/papers/2604.19742","published":"2026-04-20","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"official:da83a2ccd0f96bf4","title":"Gemini Robotics-ER 1.6 Model Card","url":"https://deepmind.google/models/model-cards/gemini-robotics-er-1-6/","published":"2026-04-20","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini Robotics-ER 1.6"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"hf-org-paper:tencent:2604.19667","title":"Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language","url":"https://huggingface.co/papers/2604.19667","published":"2026-04-20","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"apple:bxpcp2a3psys9f6abi85ot7c","title":"What Do Your Logits Know? (The Answer May Surprise You!)","url":"https://machinelearning.apple.com/research/what-do-your-logits-know","published":"2026-04-20","authors":["Masha Fedzechkina","Eleonora Gualdoni","Rita Ramos","Sinead Williamson"],"abstract":"Recent work has shown that probing model internals can reveal a wealth of information not apparent from the model generations. This poses the risk of unintentional or malicious information leakage, where model users are able to learn information that the model owner assumed was inaccessible. 
Using vision-language models as a testbed, we present the first systematic comparison of information retained at different \"representational levels\" as it is...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2604.18529","title":"HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing","url":"https://arxiv.org/abs/2604.18529","published":"2026-04-20","authors":["Mao Lin","Xi Wang","Guilherme Cox","Dong Li","Hyeran Jeon"],"abstract":"As modern LLMs support thousands to millions of tokens, KV caches grow to hundreds of gigabytes, stressing memory capacity and bandwidth. Existing solutions, such as KV cache pruning and offloading, alleviate these but underutilize hardware by relying solely on either GPU or CPU for attention computing, and considering yet limited CPU local memory for KV cache storage. We propose HybridGen, an efficient hybrid attention framework for long-context LLM inference. HybridGen enables CPU-GPU collaborative attention on systems with expanded tiered memory (e.g., CXL memory), addressing three key challenges: (1) multi-dimensional attention dependencies, (2) intensifying CPU-GPU load imbalance with longer sequences, and (3) NUMA penalty of tiered memories. HybridGen tackles these by introducing attention logit parallelism, a feedback-driven scheduler, and semantic-aware KV cache mapping. 
Experime...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155246443","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","memory","efficient"],"author_affiliations":["Nvidia (United States)","University of California, Merced"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8402000069618225},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.7621999979019165},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.5636000037193298},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.49959999322891235},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.47690001130104065},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4616999924182892},{"id":"https://openalex.org/C3720319","display_name":"Cache-only memory architecture","score":0.4043999910354614},{"id":"https://openalex.org/C189783530","display_name":"CPU cache","score":0.3727000057697296}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2511.12439","title":"A multi-agent framework combining large language models with medical flowcharts for self-triage","url":"http://arxiv.org/abs/2511.12439","published":"2026-04-20","authors":["Yujia Liu","Shu-Chen Yu","Hongyue Jin","Jessica Wen","Alexander S. 
Qian","Terrence Lee","Mattheus Ramsis","Gi Won Choi","Lianhui Qin","Xin Liu","Edward Jay Wang"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s44360-026-00112-2","openalex_id":"https://openalex.org/W4416355107","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Ansan University","Google (United States)","Kaiser Permanente","Kaiser Permanente San Diego Medical Center","Korea University","University of California San Diego","University of California, San Francisco","University of Washington"],"concepts":[{"id":"https://openalex.org/C72041958","display_name":"Flowchart","score":0.8672000169754028},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7257000207901001},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.6294000148773193},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.5698000192642212},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4645000100135803},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.4146000146865845},{"id":"https://openalex.org/C107327155","display_name":"Decision support system","score":0.37470000982284546},{"id":"https://openalex.org/C63527458","display_name":"Clinical decision support system","score":0.36970001459121704}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.17803","title":"Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition","url":"https://arxiv.org/abs/2604.17803","published":"2026-04-20","authors":["Prasoon Goyal","Sattvik 
Sahai","Michael Johnston","Hangjie Shi","Yao Lu","Shaohua Liu","Anna Rumshisky","Rahul Gupta","Anna Gottardi","Desheng Zhang","Lavina Vaz","Leslie Ball"],"abstract":"Post-training Large Language Models requires diverse, high-quality data which is rare and costly to obtain, especially in low resource domains and for multi-turn conversations. Common solutions are crowdsourcing or synthetic generation, but both often yield low-quality or low-diversity data. We introduce Adversarial Arena for building high quality conversational datasets by framing data generation as an adversarial task: attackers create prompts, and defenders generate responses. This interactive competition between multiple teams naturally produces diverse and complex data. We validated this approach by conducting a competition with 10 academic teams from top US and European universities, each building attacker or defender bots. The competition, focused on safety alignment of LLMs in cybersecurity, generated 19,683 multi-turn conversations. 
Fine-tuning an open-source model on this datas...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155247444","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C62230096","display_name":"Crowdsourcing","score":0.8693000078201294},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.8689000010490417},{"id":"https://openalex.org/C169087156","display_name":"Framing (construction)","score":0.6787999868392944},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6784999966621399},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5335999727249146},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.46720001101493835},{"id":"https://openalex.org/C91306197","display_name":"Competition (biology)","score":0.46639999747276306},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.41260001063346863}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/transparent-and-controllable-recommendation-filtering-via-multimodal-multi-agent-collaboration","title":"Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration","url":"https://www.microsoft.com/en-us/research/publication/transparent-and-controllable-recommendation-filtering-via-multimodal-multi-agent-collaboration/","published":"2026-04-19","authors":["Chi Zhang","Zhipeng Xu","Jiahao Liu","Dongsheng Li","Hansu Gu","Peng Zhang","Ning Gu","T. 
Lu"],"abstract":"While personalized recommender systems excel at content discovery, they frequently expose users to undesirable or discomforting information, highlighting the critical need for user-centric filtering tools. Current methods leveraging Large Language Models (LLMs) struggle with two major bottlenecks: they lack multimodal awareness to identify visually inappropriate content, and they are highly prone to \"over-association\"-- incorrectly generalizing a user's specific dislike (e.g., anxiety-inducing marketing) to block benign, educational materials. These unconstrained hallucinations lead to a high volume of false positives, ultimately undermining user agency. To overcome these challenges, we introduce a novel framework that integrates end-to-cloud collaboration, multimodal perception, and multi-agent orchestration. Our system employs a fact-grounded adjudication pipeline to eliminate inferenti...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Search and information retrieval","Computer science","personalized","preference","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/waking-up-blind-cold-start-optimization-of-supervision-free-agentic-trajectories-for-grounded-visual-perception","title":"Waking Up Blind: Cold-Start Optimization of Supervision-Free Agentic Trajectories for Grounded Visual 
Perception","url":"https://www.microsoft.com/en-us/research/publication/waking-up-blind-cold-start-optimization-of-supervision-free-agentic-trajectories-for-grounded-visual-perception/","published":"2026-04-19","authors":["Ashutosh Bajpai","Tamal Majumder","Akshay Nambi","Tanmoy Chakraborty"],"abstract":"Small Vision-Language Models (SVLMs) are efficient task controllers but often suffer from visual brittleness and poor tool orchestration. They typically require expensive supervised trajectory tuning to mitigate these deficits. In this work, we propose Self-supervised Perception Enabled by Cascaded Tool Rollout Alignment (SPECTRA), a supervision-free framework that bootstraps agentic capabilities via Coldstart Reinforcement Learning for SVLMs. SPECTRA enforces Soft Structured Multi-turn Rollouts, a topological constraint that directs agents to explicitly sequence tool derived evidence before synthesis, effectively grounding reasoning in visual observations. We employ a multi-objective reward signal that simultaneously maximizes task correctness, rollout structure, and tool utility, enabling agent to self-discover robust behaviors without human preference labels. 
We further introduce Tool...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","preference","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rosettasearch-multi-objective-inference-time-search-for-protein-sequence-design","title":"RosettaSearch: Multi-Objective Inference-Time Search for Protein Sequence Design","url":"https://www.microsoft.com/en-us/research/publication/rosettasearch-multi-objective-inference-time-search-for-protein-sequence-design/","published":"2026-04-19","authors":["Meghana Kshirsagar","Ching-An Cheng","Allen Nie","Fanglei Xue","Rahul Dodhia","Juan M. Lavista Ferres","Kevin Kaichuang Yang","F. Dimaio"],"abstract":"We introduce RosettaSearch, an inference-time multi-objective optimization approach for protein sequence optimization. We use large language models (LLMs) as a generative optimizer within a search algorithm capable of controlled exploration and exploitation, using rewards computed from RosettaFold3, a structure prediction model. In a large-scale evaluation, we apply RosettaSearch to 400 suboptimal sequences generated by LigandMPNN (a state-of-the-art model trained for protein sequence design), recovering high-fidelity designs that LigandMPNN's single-pass decoding fails to produce. 
RosettaSearch's designs show improvements in structural fidelity metrics ranging between 18% to 68%, translating to a 2.5× improvement in design success rate. We observe that these gains in success rate are robust when RosettaSearch-designed sequences are evaluated with an independent structure predictio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Biology","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/precise-debugging-benchmark-is-your-model-debugging-or-regenerating","title":"Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?","url":"https://www.microsoft.com/en-us/research/publication/precise-debugging-benchmark-is-your-model-debugging-or-regenerating/","published":"2026-04-19","authors":["Wangrong Zhu","Miaosen Chai","Shangshang Wang","Yejia Liu","Song Bian","Honghua Dong","W. Neiswanger","Robin Jia"],"abstract":"Unlike code completion, debugging requires localizing faults and applying targeted edits. We observe that frontier LLMs often regenerate correct but over-edited solutions during debugging. To evaluate how far LLMs are from precise debugging, we introduce the Precise Debugging Benchmark (PDB) framework, which automatically converts any coding dataset into a debugging benchmark with precision-aware evaluation. 
PDB generates buggy programs by synthesizing verified atomic bugs and composing them into multi-bug programs. We define two novel metrics, edit-level precision and bug-level recall, which measures how many necessary edits are made and how many bugs are resolved. We release two evaluation benchmarks: PDB-Single-Hard on single-line bugs, and PDB-Multi on multi-line bugs. Experiments show that frontier models, such as GPT-5.1-Codex and DeepSeek-V3.2-Thinking, achieve unit-test pass rate...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7155194979","title":"A Survey on Graph-Based Retrieval-Augmented Generation: Architectures, Methods, and Applications","url":"https://doi.org/10.63503/j.ijcma.2026.235","published":"2026-04-19","authors":["Tanay Chowdhury"],"abstract":"Retrieval-Augmented Generation (RAG) stands out as a promising paradigm that can be applied to improve large language models by serving text generation with external sources of knowledge. Nevertheless, traditional RAG models are mostly based on flat text corpora and similarity-based retrieval and thus are not able to realize intricate relations, multi-hop dependencies, and structured semantics. 
The survey provides a high-level introduction to graph-based Retrieval-Augmented Generation (Graph-RAG), which is an emerging methodology adding systematic knowledge representations to the retrieval and generation process. Graph-RAG supports relational knowledge reasoning, contextual cohesion and explicit dependency modeling, complementary to text-only retrieval by organizing knowledge into graphs e.g., knowledge graphs, citation networks, and semantic graphs. The survey discusses the principles u...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63503/j.ijcma.2026.235","openalex_id":"https://openalex.org/W7155194979","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.739799976348877},{"id":"https://openalex.org/C104054115","display_name":"Cohesion (chemistry)","score":0.5954999923706055},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.5519999861717224},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4745999872684479},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.4697999954223633},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.42969998717308044},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42239999771118164},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38449999690055847}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.17087","title":"EvoComp: Learning Visual Token 
Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling","url":"https://arxiv.org/abs/2604.17087","published":"2026-04-18","authors":["Jiafei Song","Fengwei Zhou","Jin Qu","Wenjin Jason Li","Tong Wu","Gengjian Xue","Zhikang Zhao","Daomin Wei","Yichao Lu","Bailin Na"],"abstract":"Recent Multimodal Large Language Models (MLLMs) have demonstrated strong performance on vision-language understanding tasks, yet their inference efficiency is often hampered by the large number of visual tokens, particularly in high-resolution or multi-image scenarios. To address this issue, we propose EvoComp, a visual token compression framework that significantly reduces token count while preserving task accuracy. EvoComp introduces a lightweight encoder-only transformer-based compressor that selects the most informative and non-redundant visual tokens by jointly considering visual and textual contexts. A core challenge lies in providing effective supervision for training the compressor. 
To this end, we design an evolutionary labeling strategy that searches for token subsets minimizing the MLLM's output loss, while enforcing semantic diversity through vocabulary-based token grouping.....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155246142","cited_by_count":0,"quality_score":41,"matched_keywords":["compression"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8197000026702881},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.7706999778747559},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5687999725341797},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5659999847412109},{"id":"https://openalex.org/C2780762811","display_name":"Cosine similarity","score":0.40799999237060547},{"id":"https://openalex.org/C67277372","display_name":"Semantic role labeling","score":0.357699990272522},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3537999987602234},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.35280001163482666}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154846997","title":"DiffCrack: A semantic–structural controllable framework for crack image generation in complex scenes","url":"https://doi.org/10.1016/j.patcog.2026.113771","published":"2026-04-18","authors":["Tian Qin","Lingxi Xie","Q C Zou","Qi Tian","Qingquan 
Li"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2026.113771","openalex_id":"https://openalex.org/W7154846997","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Hubei University","Shenzhen Metro (China)","Shenzhen University","Wuhan Business University","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7268999814987183},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6549999713897705},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6028000116348267},{"id":"https://openalex.org/C48209547","display_name":"Controllability","score":0.5561000108718872},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5527999997138977},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.5041000247001648},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.49720001220703125},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.4708999991416931}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/memexplorer-navigating-the-heterogeneous-memory-design-space-for-agentic-inference-npus","title":"MemExplorer: Navigating the Heterogeneous Memory Design Space for Agentic Inference NPUs","url":"https://www.microsoft.com/en-us/research/publication/memexplorer-navigating-the-heterogeneous-memory-design-space-for-agentic-inference-npus/","published":"2026-04-17","authors":["Haoran Wu","Zeyu Cao","Yao 
Lai","Binglei Lou","Jiayi Nie","Can Xiao","T. Adeniran","Przemyslaw Forys","Kauser Johar","Catriona R Wright","Junyi Liu","Kai Shi"],"abstract":"Emerging agentic LLM workloads are driving rapidly growing demand on both memory capacity and bandwidth, with different phases of inference (e.g., prefill and decode) imposing distinct requirements. Industry is responding by composing heterogeneous accelerators into single interconnected systems, as exemplified by NVIDIA's Vera Rubin platform, where each device brings its own memory architecture. This heterogeneity is further compounded by a widening landscape of available memory technologies: high-density on-chip SRAM, HBM, LPDDR, GDDR, and emerging options such as high-bandwidth flash (HBF), each offering different capacity, bandwidth, and power trade-offs. Identifying the right memory architecture for next-generation inference accelerators requires navigating a vast and rapidly evolving design space, in which the interplay between workload characteristics, NPU design dimensions, and m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Hardware and devices","Systems and networking","Computer hardware","Computer science","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2604.15602","title":"GroupDPO: Memory efficient Group-wise Direct Preference Optimization","url":"https://arxiv.org/abs/2604.15602","published":"2026-04-17","authors":["Jixuan Leng","Si Si","Hsiang‐Fu Yu","Vinod Raman","Inderjit S. 
Dhillon"],"abstract":"Preference optimization is widely used to align Large Language Models (LLMs) with preference feedback. However, most existing methods train on a single positive-negative pair per prompt, discarding additional supervision available in preference datasets that typically contain multiple candidate responses. Motivated by this limitation, recent work explores group-wise preference optimization, which jointly contrasts multiple responses for the same prompt, but its empirical behavior and scalability remain underexplored due to the memory overhead of group-coupled objectives. In this work, we introduce a memory-efficient group-wise preference optimization algorithm that preserves gradients while decoupling samples during backpropagation, substantially reducing peak memory usage, which enables scalable training with larger group sizes. Across both offline and online alignment settings, we show...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7155245316","cited_by_count":0,"quality_score":49,"matched_keywords":["preference","memory","efficient"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.7452999949455261},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7208999991416931},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6498000025749207},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5123000144958496},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.487199991941452},{"id":"https://openalex.org/C205606062","display_name":"Decoupling 
(probability)","score":0.4830000102519989},{"id":"https://openalex.org/C181204326","display_name":"Preference learning","score":0.45559999346733093},{"id":"https://openalex.org/C137836250","display_name":"Optimization problem","score":0.4018999934196472}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.16943","title":"MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation","url":"https://arxiv.org/abs/2604.16943","published":"2026-04-17","authors":["Bo Li","Ningyuan Deng","Tianyu Dong","Shaobo Wang","Shaolin Zhu","Lijie Wen"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11432-025-4914-1","openalex_id":"https://openalex.org/W7155201233","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Renmin University of China","Shanghai Jiao Tong University","Tianjin University of Science and Technology","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.808899998664856},{"id":"https://openalex.org/C152124472","display_name":"Redundancy (engineering)","score":0.6367999911308289},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6184999942779541},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.6051999926567078},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5194000005722046},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5074999928474426},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.5008000135421753},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47909998893737793}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154684808","title":"4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation","url":"https://doi.org/10.1007/s11263-026-02811-5","published":"2026-04-17","authors":["Shuzhou Yang","Xiaodong Cun","Xiaoyu Li","Yaowei Li","Jian Zhang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-026-02811-5","openalex_id":"https://openalex.org/W7154684808","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Great Bay University","Immersion (United States)","Peking University Shenzhen Hospital","Peng Cheng Laboratory","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5461999773979187},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5397999882698059},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5040000081062317},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5012000203132629},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4514000117778778},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.43619999289512634},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.4350999891757965},{"id":"https://openalex.org/C9417928","display_name":"Image 
processing","score":0.40049999952316284}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.15804","title":"Qwen3.5-Omni Technical Report","url":"https://huggingface.co/papers/2604.15804","published":"2026-04-17","authors":["Qwen Team"],"abstract":"In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By leveraging a massive dataset comprising heterogeneous text-vision pairs and over 100 million hours of audio-visual content, the model demonstrates robust omni-modality capabilities. Qwen3.5-Omni-plus achieves SOTA results across 215 audio and audio-visual understanding, reasoning, and interaction subtasks and benchmarks, surpassing Gemini-3.1 Pro in key audio tasks and matching it in comprehensive audio-visual understanding. Architecturally, Qwen3.5-Omni employs a Hybrid Attention Mixture-of-Experts (MoE) framework for both Thinker and Talker, enabling efficient long-sequence inference. 
The model facilitates sophisticated interaction, supporti...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pushing-the-limits-of-on-device-streaming-asr-a-compact-high-accuracy-english-model-for-low-latency-inference","title":"Pushing the Limits of On-Device Streaming ASR: A Compact, High-Accuracy English Model for Low-Latency Inference","url":"https://www.microsoft.com/en-us/research/publication/pushing-the-limits-of-on-device-streaming-asr-a-compact-high-accuracy-english-model-for-low-latency-inference/","published":"2026-04-16","authors":["Nenad Banfic","D. Fan","Kunal Vaishnavi","S. Kemp","Sunghoon Choi","Ruifeng Ren","Sun Shaw","Meng Tang"],"abstract":"Deploying high-quality automatic speech recognition (ASR) on edge devices requires models that jointly optimize accuracy, latency, and memory footprint while operating entirely on CPU without GPU acceleration. We conduct a systematic empirical study of state-of-the-art ASR architectures, encompassing encoder-decoder, transducer, and LLM-based paradigms, evaluated across batch, chunked, and streaming inference modes. Through a comprehensive benchmark of over 50 configurations spanning OpenAI Whisper, NVIDIA Nemotron, Parakeet TDT, Canary, Conformer Transducer, and Qwen3-ASR, we identify NVIDIA's Nemotron Speech Streaming as the strongest candidate for real-time English streaming on resource-constrained hardware. 
We then re-implement the complete streaming inference pipeline in ONNX Runtime and conduct a controlled evaluation of multiple post-training quantization strategies, including imp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","LLM","memory","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/public-use-of-a-generalist-llm-chatbot-for-health-queries","title":"Public use of a generalist LLM chatbot for health queries","url":"https://www.microsoft.com/en-us/research/publication/public-use-of-a-generalist-llm-chatbot-for-health-queries/","published":"2026-04-16","authors":["Beatriz Costa-Gomes","Pavel Tolmachev","E. Taysom","V. Sounderajah","Hannah Richardson","Philipp Schoenegger","Xiaoxuan Liu","M. M. Nour","Seth Spielman","Samuel F Way","Yash Shah","M. Bhaskar"],"abstract":"Here we analyse over 500,000 de-identified health-related conversations with Microsoft Copilot from January 2026 to characterize what people ask conversational artificial intelligence (AI) about health. We apply a hierarchical intent taxonomy of 12 primary categories using privacy-preserving large language model-based classification validated against expert human annotation and use topic clustering for prevalent themes within each intent. 
We then characterize the intents and topics behind health queries, identify who they are about, and analyse how usage varies by device and time of day. Nearly one in five conversations involves personal symptom assessment or condition discussion, and the dominant general information category is also concentrated on specific treatments and conditions, suggesting that this is a lower bound on personal health intent. One in seven of these personal health q...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","Genomics","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mm-webagent-a-hierarchical-multimodal-web-agent-for-webpage-generation","title":"MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation","url":"https://www.microsoft.com/en-us/research/publication/mm-webagent-a-hierarchical-multimodal-web-agent-for-webpage-generation/","published":"2026-04-16","authors":["Yan Li","Zezi Zeng","Yifan Yang","Yuqing Yang","Ning Liao","Weiwei Guo","Lili Qiu","Mingxi Cheng","Qiuchao Dai","Zhendong Wang","Zhengyuan Yang","Xue Yang"],"abstract":"The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. 
However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experimen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","Statistical mechanics","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-people-use-copilot-for-health","title":"How people use Copilot for Health","url":"https://www.microsoft.com/en-us/research/publication/how-people-use-copilot-for-health/","published":"2026-04-16","authors":["Beatriz Costa-Gomes","Pavel Tolmachev","Eloise Taysom","Viknesh Sounderajah","Hannah Richardson (nee Murfet)","Philipp Schoenegger","Xiaoxuan Liu","Matthew M Nour","Seth Spielman","Samuel F. 
Way","Yash Shah","Michael Bhaskar"],"abstract":"We analyze over 500,000 de-identified health-related conversations with Microsoft Copilot from January 2026 to characterize what people ask conversational AI about health. We develop a hierarchical intent taxonomy of 12 primary categories using privacy-preserving LLM-based classification validated against expert human annotation, and apply LLM-driven topic-clustering for prevalent themes within each intent. Using this taxonomy, we characterize the intents and topics behind health queries, identify who these queries are about, and analyze how usage varies by device and time of day. Five findings stand out. First, nearly one in five conversations involve personal symptom assessment or condition discussion, and even the dominant general information category (40%) is concentrated on specific treatments and conditions, suggesting that this is a lower bound on personal health intent. Second, o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","Healthcare","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:f9dpiasu26owx82c4oa2dvhc","title":"MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining","url":"https://machinelearning.apple.com/research/mixatlas","published":"2026-04-16","authors":["Bingbing Wen","Sirajul Salekin","Feiyang Kang","Lucy Lu Wang","Bill Howe","Javier Movellan","Manjot Bilkhu"],"abstract":"This paper was accepted at the
Workshop on Navigating and Addressing Data Problems for Foundation Models (NADPFM) at ICLR 2026.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:a01b99e5ef46a7dd","title":"Introducing GPT-Rosalind for life sciences research","url":"https://openai.com/index/introducing-gpt-rosalind","published":"2026-04-16","authors":["OpenAI"],"abstract":"OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific research workflows.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Research"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W7154599711","title":"ORION: An agentic reasoning construct for the analysis of complex human immune profiling","url":"https://doi.org/10.64898/2026.04.13.718286","published":"2026-04-16","authors":["Monica T. Dayao","Kenny Kim","Bernard Khor","Aaron Jaech","Bas van Opheusden","Aaron Bodansky","Joseph DeRisi"],"abstract":"The capacity to generate high-dimensional biological datasets has outpaced the ability to interpret them.
Technologies such as phage immunoprecipitation and sequencing (PhIP-seq) enable proteome-scale profiling of antibody repertoires, but interpreting thousands of enriched peptides into mechanistic hypotheses remains a labor-intensive bottleneck requiring expert synthesis of statistics, literature, and domain knowledge. Here we describe ORION (Omics Reasoning & Interpretation Orchestrator), a multi-agent framework that uses reasoning-capable large language models to perform end-to-end analysis of complex immune profiling data. ORION integrates statistical analysis, machine learning, and automated literature review into a single structured workflow, producing results that are reproducible and fully traceable. Applied to a published PhIP-seq dataset from autoimmune polyendocrine....","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.04.13.718286","openalex_id":"https://openalex.org/W7154599711","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Chan Zuckerberg Biohub San Francisco","OpenAI (United States)","University of California, San Francisco","University of San Francisco","Virginia Mason Medical Center"],"concepts":[{"id":"https://openalex.org/C187191949","display_name":"Profiling (computer programming)","score":0.666700005531311},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5828999876976013},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.49939998984336853},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.4438999891281128},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.436599999666214},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python 
library)","score":0.3912000060081482},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3402999937534332},{"id":"https://openalex.org/C164614171","display_name":"DECIPHER","score":0.319599986076355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154610595","title":"KnitLoRA: bridging low-rank adaptation as interwoven layers for deeper semantic reasoning","url":"https://doi.org/10.1038/s41598-026-47668-3","published":"2026-04-16","authors":["Hongjie Qiu","Youyou Ning","Jinqiang Li","X.F. Zhu","Yifan Gong","Liqi Yan","Fangli Guan","Jianhui Zhang","Fuli Feng","Z XU","Qifan Wang","Pan Li"],"abstract":"Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method that facilitates the lightweight adaptation of large language models (LLMs) by introducing low-rank update matrices, and has since motivated the development of various extensions. However, LoRA and its most variants overlook the issue of information loss in deep LoRA layers, where signals or gradients passing through multiple LoRA blocks may gradually vanish before reaching the final layers of the network. This limitation hampers convergence speed during fine-tuning. To address these challenges, we propose KnitLoRA, an innovative dense connection low-rank adaptation framework. First, we introduce dense connections between each LoRA block and multiple, or even all preceding blocks to facilitate feature reuse and fusion. 
Second, these connections improve gradient flow and mitigate the vanishing gradient problem.....","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-026-47668-3","openalex_id":"https://openalex.org/W7154610595","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alpha Omega Alpha Medical Honor Society","Fudan University","Hangzhou Dianzi University","Meta (United States)","Ministry of Education","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.9388999938964844},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8454999923706055},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.7113000154495239},{"id":"https://openalex.org/C2777210771","display_name":"Block (permutation group theory)","score":0.620199978351593},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.618399977684021},{"id":"https://openalex.org/C13355873","display_name":"Connection (principal bundle)","score":0.5734000205993652},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47099998593330383},{"id":"https://openalex.org/C2777303404","display_name":"Convergence (economics)","score":0.4117000102996826}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/shuffle-the-context-rope-perturbed-self-distillation-for-long-context-adaptation","title":"Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context 
Adaptation","url":"https://www.microsoft.com/en-us/research/publication/shuffle-the-context-rope-perturbed-self-distillation-for-long-context-adaptation/","published":"2026-04-15","authors":["Zichong Li","Chen Liang","Liliang Ren","Tuo Zhao","Yelong Shen","Weizhu Chen"],"abstract":"Large language models (LLMs) increasingly operate in settings that require reliable long-context understanding, such as retrieval-augmented generation and multi-document reasoning. A common strategy is to fine-tune pretrained short-context models at the target sequence length. However, we find that standard long-context adaptation can remain brittle: model accuracy depends strongly on the absolute placement of relevant evidence, exhibiting high positional variance even when controlling for task format and difficulty. We propose RoPE-Perturbed Self-Distillation, a training regularizer that improves positional robustness. The core idea is to form alternative \"views\" of the same training sequence by perturbing its RoPE indices -- effectively moving parts of the context to different positions -- and to train the model to produce consistent predictions across views via self-distillation.
This e...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computation and Language","Computer science","retrieval","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/response-aware-user-memory-selection-for-llm-personalization","title":"Response-Aware User Memory Selection for LLM Personalization","url":"https://www.microsoft.com/en-us/research/publication/response-aware-user-memory-selection-for-llm-personalization/","published":"2026-04-15","authors":["Jillian R. Fisher","Jennifer Neville","Chanhee Park"],"abstract":"A common approach to personalization in large language models (LLMs) is to incorporate a subset of the user memory into the prompt at inference time to guide the model's generation. Existing methods select these subsets primarily using similarity between user memory items and input queries, ignoring how features actually affect the model's response distribution. We propose Response-Utility optimization for Memory Selection (RUMS), a novel method that selects user memory items by measuring the mutual information between a subset of memory and the model's outputs, identifying items that reduce response uncertainty and sharpen predictions beyond semantic similarity. 
We demonstrate that this information-theoretic foundation enables more principled user memory selection that aligns more closely with human selection compared to state-of-the-art methods, and models $400\\times$ larger. Additional...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","LLM","personalization","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dont-let-ai-agents-yolo-your-files-shifting-information-and-control-to-filesystems-for-agent-safety-and-autonomy","title":"Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy","url":"https://www.microsoft.com/en-us/research/publication/dont-let-ai-agents-yolo-your-files-shifting-information-and-control-to-filesystems-for-agent-safety-and-autonomy/","published":"2026-04-15","authors":["Shawn Zhong","Junxuan Liao","Jing Liu","Mai Zheng","Andrea C. Arpaci-Dusseau","Remzi H. Arpaci-Dusseau"],"abstract":"AI coding agents operate directly on users' filesystems, where they regularly corrupt data, delete files, and leak secrets. Current approaches force a tradeoff between safety and autonomy: unrestricted access risks harm, while frequent permission prompts burden users and block agents. To understand this problem, we conduct the first systematic study of agent filesystem misuse, analyzing 290 public reports across 13 frameworks.
Our analysis reveals that today's agents have limited information about their filesystem effects and insufficient control over them. We therefore argue for shifting this information and control to the filesystem itself. Based on this principle, we design YoloFS, an agent-native filesystem with three techniques. Staging isolates all mutations before commit, giving users corrective control. Snapshots extend this control to agents, letting them detect and correct their...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Security, privacy, and cryptography","Systems and networking","Computer science","Operating system","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/duet-joint-exploration-of-user-item-profiles-in-recommendation-system","title":"DUET: Joint Exploration of User Item Profiles in Recommendation System","url":"https://www.microsoft.com/en-us/research/publication/duet-joint-exploration-of-user-item-profiles-in-recommendation-system/","published":"2026-04-15","authors":["Yue Chen","Yifei Sun","Lu Wang","Fangkai Yang","Pu Zhao","Minjie Hong","Yi Dong","Minghua He","Nan Hu","Jianjin Zhang","Zhiwei Dai","Yuefeng Zhan"],"abstract":"Traditional recommendation systems represent users and items as dense vectors and learn to align them in a shared latent space for relevance estimation. 
Recent LLM-based recommenders instead leverage natural-language representations that are easier to interpret and integrate with downstream reasoning modules. This paper studies how to construct effective textual profiles for users and items, and how to align them for recommendation. A central difficulty is that the best profile format is not known a priori: manually designed templates can be brittle and misaligned with task objectives. Moreover, generating user and item profiles independently may produce descriptions that are individually plausible yet semantically inconsistent for a specific user--item pair. We propose Duet, an interaction-aware profile generator that jointly produces user and item profiles conditioned on both user hist...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Search and information retrieval","Computer science","Information retrieval","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-enabling-an-artificial-self-construction-software-life-cycle-via-autopoietic-architectures","title":"Towards Enabling An Artificial Self-Construction Software Life-cycle via Autopoietic Architectures","url":"https://www.microsoft.com/en-us/research/publication/towards-enabling-an-artificial-self-construction-software-life-cycle-via-autopoietic-architectures/","published":"2026-04-15","authors":["Daniel Rodríguez-Cárdenas","David N. 
Palacio","Denys Poshyvanyk"],"abstract":"Software engineering research has focused on automating maintenance and evolution processes to reduce costs and improve reliability. The emergence of foundation models (FMs) with strong code understanding and reasoning abilities offers new opportunities for autonomous software behavior. Inspired by Artificial Life (ALife), we propose a fundamental shift in the Software Development Life-Cycle (SDLC) by introducing self-construction mechanisms that enable software to evolve and maintain autonomously. This position paper explores the potential of Autopoietic Architectures, specifically Psi-Arch, as a foundational framework for self-constructing software. We first analyze the limitations of traditional maintenance approaches and identify gaps in current SDLC automation. Subsequently, we outline the core challenges in achieving self-construction, including the integration of foundation-model-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7154790107","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","software engineering"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:79b2d42b9d834bc7","title":"Gemini 3.1 Flash Audio (Flash Live, TTS) Model Card","url":"https://deepmind.google/models/model-cards/gemini-3-1-flash-audio/","published":"2026-04-15","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 3.1 Flash Audio (Flash Live, TTS)"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"arxiv:2604.14362","title":"APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Conversational AI","url":"https://arxiv.org/abs/2604.14362","published":"2026-04-15","authors":["Pratyay Banerjee","Masud Moshtaghi","Shivashankar Subramanian","Amita Misra","Ankit Chadha"],"abstract":"Large language models still struggle with reliable long-term conversational memory: simply enlarging context windows or applying naive retrieval often introduces noise and destabilizes responses. We present APEX-MEM, a conversational memory system that combines three key innovations: (1) a property graph which uses domain-agnostic ontology to structure conversations as temporally grounded events in an entity-centric framework, (2) append-only storage that preserves the full temporal evolution of information, and (3) a multi-tool retrieval agent that understands and resolves conflicting or evolving information at query time, producing a compact and contextually relevant memory summary. This retrieval-time resolution preserves the full interaction history while suppressing irrelevant details. 
APEX-MEM achieves 88.88% accuracy on LOCOMO's Question Answering task and 86.2% on LongMemEval, ou...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154865868","cited_by_count":0,"quality_score":53,"matched_keywords":["memory","long-term","retrieval","agent"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7735000252723694},{"id":"https://openalex.org/C189950617","display_name":"Property (philosophy)","score":0.5848000049591064},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5224999785423279},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4887000024318695},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48669999837875366},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4706000089645386},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.4699999988079071},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.45669999718666077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:2be4125c36dffb70","title":"Introducing ERNIE-Image","url":"https://ernie.baidu.com/blog/posts/ernie-image/","published":"2026-04-15","authors":["Baidu"],"abstract":"ERNIE-Image is a text-to-image generation model built on a single-stream Diffusion Transformer (DiT) with 8B DiT parameters, achieving leading performance among open-weights 
models.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://ernie.baidu.com/blog/index.xml"}},{"id":"arxiv:2604.11244","title":"Script-a-Video: Deep Structured Audio-visual Captions via Factorized Streams and Relational Grounding","url":"https://huggingface.co/papers/2604.11244","published":"2026-04-15","authors":["Tencent Hunyuan Team"],"abstract":"Advances in Multimodal Large Language Models (MLLMs) are transforming video captioning from a descriptive endpoint into a semantic interface for both video understanding and generation. However, the dominant paradigm still casts videos as monolithic narrative paragraphs that entangle visual, auditory, and identity information. This dense coupling not only compromises representational fidelity but also limits scalability, since even local edits can trigger global rewrites. To address this structural bottleneck, we propose Multi-Stream Scene Script (MTSS), a novel paradigm that replaces monolithic text with factorized and explicitly grounded scene descriptions. 
MTSS is built on two core principles: Stream Factorization, which decouples a video into complementary streams (Reference, Shot, Event, and Global), and Relational Grounding, which reconnects these isolated streams through explicit....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/development-evaluation-and-deployment-of-a-multi-agent-system-for-thoracic-tumor-board","title":"Development, Evaluation, and Deployment of a Multi-Agent System for Thoracic Tumor Board","url":"https://www.microsoft.com/en-us/research/publication/development-evaluation-and-deployment-of-a-multi-agent-system-for-thoracic-tumor-board/","published":"2026-04-14","authors":["T. Ellis-Caleo","T. Keyes","N. Ambers","Faraah N Bekheet","Wen-wai Yim","N. Kotecha","Nigam H. Shah","J. Neal"],"abstract":"Tumor boards are multidisciplinary conferences dedicated to producing actionable patient care recommendations with live review of primary radiology and pathology data. Succinct patient case summaries are needed to drive efficient and accurate case discussions. We developed a manual AI-based workflow to generate patient summaries to display live at the Stanford Thoracic Tumor board. To improve on this manually intensive process, we developed several automated AI chart summarization methods and evaluated them against physician gold standard summaries and fact-based scoring rubrics. 
We report these comparative evaluations as well as our deployment of the final state automated AI chart summarization tool along with post-deployment monitoring. We also validate the use of an LLM as a judge evaluation strategy for fact-based scoring. This work is an example of integrating AI-based workflows int...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science","Medicine","LLM","efficient","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pareto-optimal-offline-reinforcement-learning-via-smooth-tchebysheff-scalarization","title":"Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization","url":"https://www.microsoft.com/en-us/research/publication/pareto-optimal-offline-reinforcement-learning-via-smooth-tchebysheff-scalarization/","published":"2026-04-14","authors":["Aadyot Bhatnagar","Peter Morch Groth","Ali Madani"],"abstract":"Large language models can be aligned with human preferences through offline reinforcement learning (RL) on small labeled datasets. While single-objective alignment is well-studied, many real-world applications demand the simultaneous optimization of multiple conflicting rewards, e.g. optimizing both catalytic activity and specificity in protein engineering, or helpfulness and harmlessness for chatbots. 
Prior work has largely relied on linear reward scalarization, but this approach provably fails to recover non-convex regions of the Pareto front. In this paper, instead of scalarizing the rewards directly, we frame multi-objective RL itself as an optimization problem to be scalarized via smooth Tchebysheff scalarization, a recent technique that overcomes the shortcomings of linear scalarization. We use this formulation to derive Smooth Tchebysheff Optimization of Multi-Objective Preference...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Biology","Computer science","Machine learning","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/webxskill-skill-learning-for-autonomous-web-agents","title":"WebXSkill: Skill Learning for Autonomous Web Agents","url":"https://www.microsoft.com/en-us/research/publication/webxskill-skill-learning-for-autonomous-web-agents/","published":"2026-04-14","authors":["Zhaoyang Wang","Qianhui Wu","Xuchao Zhang","Chaoyun Zhang","Wenlin Yao","Fazle Faisal","Baolin Peng","Si Qin","Suman Nath","Qingwei Lin","Chetan Bansal","Dongmei Zhang"],"abstract":"Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. 
A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directly executed, while code-based skills are executable but opaque to the agent, offering no step-level understanding for error recovery or adaptation. We introduce WebXSkill, a framework that bridges this gap with executable skills, each pairing a parameterized action program with step-level natural language guidance, enabling both direct execution and agent-driven adaptation. WebXSkill operates in three stages: skill extraction mines reusable action subsequences from readily available synthetic agent trajectories and abstracts them into parameterized skills, skill organization...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","retrieval","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/selecting-feature-interactions-for-generalized-additive-models-by-distilling-foundation-models","title":"Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/selecting-feature-interactions-for-generalized-additive-models-by-distilling-foundation-models/","published":"2026-04-14","authors":["Jingyun Jia","Chandan Singh","Rich Caruana","Benjamin J. 
Lengerich"],"abstract":"Identifying meaningful feature interactions is a central challenge in building accurate and interpretable models for tabular data. Generalized additive models (GAMs) have shown great success at modeling tabular data, but often rely on heuristic procedures to select interactions, potentially missing higher-order or context-dependent effects. To meet this challenge, we propose TabDistill, a method that leverages tabular foundation models and post-hoc distillation methods. Our key intuition is that tabular foundation models implicitly learn rich, adaptive feature dependencies through large-scale representation learning. Given a dataset, TabDistill first fits a tabular foundation model to the dataset, and then applies a post-hoc interaction attribution method to extract salient feature interactions from it. We evaluate these interactions by then using them as terms in a GAM. Across tasks, we...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Computer science","Machine learning","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/goals-as-first-class-abstractions-in-human-ai-collaboration","title":"Goals as First-Class Abstractions in Human-AI Collaboration","url":"https://www.microsoft.com/en-us/research/publication/goals-as-first-class-abstractions-in-human-ai-collaboration/","published":"2026-04-14","authors":["Lev Tankelevitch","Sean 
Rintel"],"abstract":"As AI assumes more of the material production in knowledge work, human effort shifts toward planning, orchestration, and evaluation, all of which revolves around goals. Yet goals remain poorly represented in knowledge work tools and workflows: implicit, unexpressed, or confused with outputs. Beyond their importance for human work, clear goals are fundamental to human-AI communication and collaboration. We review research establishing the value of explicit goals, show through a review of commercial tools that existing ecosystems support goal tracking but not goal articulation, alignment, or contextual use, and use meetings as a proving ground demonstrating that upstream goal articulation produces disproportionate downstream value for both humans and AI agents. We argue that goals should be encoded as first-class abstractions that drive human-AI collaborative workflows and that generative....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Artificial intelligence","Human-computer interaction","Social sciences","Human–computer interaction","Social Science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cod-lite-real-time-diffusion-based-generative-image-compression","title":"CoD-Lite: Real-Time Diffusion-Based Generative Image Compression","url":"https://www.microsoft.com/en-us/research/publication/cod-lite-real-time-diffusion-based-generative-image-compression/","published":"2026-04-14","authors":["Bin 
Li","Zhaoyang Jia","Naifu Xue","Zihan Zheng","Jiahao Li","Xiaoyi Zhang","Zongyu Guo","Yuan Zhang","Houqiang Li","Yan Lu"],"abstract":"Recent advanced diffusion methods typically derive strong generative priors by scaling diffusion transformers. However, scaling fails to generalize when adapted for real-time compression scenarios that demand lightweight models. In this paper, we explore the design of real-time and lightweight diffusion codecs by addressing two pivotal questions. First, does diffusion pre-training benefit lightweight diffusion codecs? Through systematic analysis, we find that generation-oriented pre-training is less effective at small model scales whereas compression-oriented pre-training yields consistently better performance. Second, are transformers essential? We find that while global attention is crucial for standard generation, lightweight convolutions suffice for compression-oriented diffusion when paired with distillation. Guided by these findings, we establish a one-step lightweight convolution....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","compression","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/see-point-refine-multi-turn-approach-to-gui-grounding-with-visual-feedback","title":"See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual 
Feedback","url":"https://www.microsoft.com/en-us/research/publication/see-point-refine-multi-turn-approach-to-gui-grounding-with-visual-feedback/","published":"2026-04-14","authors":["Himangi Mittal","Gaurav Mittal","Nelson Troncoso","Yu Hu"],"abstract":"Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains underexplored. Existing approaches typically rely on single-shot coordinate prediction, which lacks a mechanism for error correction and often fails in high-density interfaces. In this technical report, we conduct an empirical study of pixel-precise cursor localization in coding environments. Instead of a single-step execution, our agent engages in an iterative refinement process, utilizing visual feedback from previous attempts to reach the target element. This closed-loop grounding mechanism allows the agent to self-correct displacement errors and adapt to dynamic UI changes. 
We evaluate our appr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-persona-prompted-llms-emulate-subgroup-values-an-empirical-analysis-of-generalisability-and-fairness-in-cultural-alignment","title":"Can Persona-Prompted LLMs Emulate Subgroup Values? An Empirical Analysis of Generalisability and Fairness in Cultural Alignment","url":"https://www.microsoft.com/en-us/research/publication/can-persona-prompted-llms-emulate-subgroup-values-an-empirical-analysis-of-generalisability-and-fairness-in-cultural-alignment/","published":"2026-04-14","authors":["Bryan Chen Zhengyu Tan","Zhengyuan Liu","Xiaoyuan Yi","Jing Yao","Xing Xie","Nancy F. Chen","Roy Ka-Wei Lee"],"abstract":"Despite their global prevalence, many Large Language Models (LLMs) are aligned to a monolithic, often Western-centric set of values. This paper investigates the more challenging task of fine-grained value alignment: examining whether LLMs can emulate the distinct cultural values of demographic subgroups. Using Singapore as a case study and the World Values Survey (WVS), we examine the value landscape and show that even state-of-the-art models like GPT-4.1 achieve only 57.4% accuracy in predicting subgroup modal preferences. We construct a dataset of over 20,000 samples to train and evaluate a range of models. 
We demonstrate that simple fine-tuning on structured numerical preferences yields substantial gains, improving accuracy on unseen, out-of-distribution subgroups by an average of 17.4%. These gains partially transfer to open-ended generation. However, we find significant pre-existing...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Social sciences","Computer science","Computers and Society"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:ec715b8b53e83127","title":"Automated Alignment Researchers: Using large language models to scale scalable oversight","url":"https://www.anthropic.com/research/automated-alignment-researchers","published":"2026-04-14","authors":["Anthropic"],"abstract":"Alignment","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Alignment"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic research page https://www.anthropic.com/research"}},{"id":"arxiv:2604.12652","title":"PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning","url":"https://arxiv.org/abs/2604.12652","published":"2026-04-14","authors":["Jinlong Liu","Wanggui He","Peng Zhang","Mushui Liu","Hao 
Jiang","Pipei Huang"],"abstract":"Reinforcement learning (RL) can improve the prompt following capability of text-to-image (T2I) models, yet obtaining high-quality reward signals remains challenging: CLIP Score is too coarse-grained, while VLM-based reward models (e.g., RewardDance) require costly human-annotated preference data and additional fine-tuning. We propose PromptEcho, a reward construction method that requires \\emph{no} annotation and \\emph{no} reward model training. Given a generated image and a guiding query, PromptEcho computes the token-level cross-entropy loss of a frozen VLM with the original prompt as the label, directly extracting the image-text alignment knowledge encoded during VLM pretraining. The reward is deterministic, computationally efficient, and improves automatically as stronger open-source VLMs become available. For evaluation, we develop DenseAlignBench, a benchmark of concept-rich dense c...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154655534","cited_by_count":0,"quality_score":45,"matched_keywords":["preference","efficient"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7497000098228455},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7246000170707703},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6780999898910522},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5983999967575073},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.585099995136261},{"id":"https://openalex.org/C2779530757","display_name":"Quality 
(philosophy)","score":0.44609999656677246},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.42800000309944153},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.4068000018596649}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.12782","title":"OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension","url":"https://arxiv.org/abs/2604.12782","published":"2026-04-14","authors":["Zhiyuan Zhang","Yanzhao Li","Zhiqiang Zou","Bai Du","Yupeng Sun","Hui Dong","Hui Wang"],"abstract":"While 4-bit quantization is essential for high-throughput deployment of Large Language Models, activation outliers often lead to significant accuracy degradation due to the restricted dynamic range of low-bit formats. In this paper, we systematically investigate the spatial distribution of outliers and demonstrate a token-persistent structural clustering effect, where high-magnitude outliers consistently occupy fixed channels across tokens. Building on this insight, we propose OSC, a hardware-efficient framework for outlier suppression. During inference, OSC executes a dual-path computation consisting of a low-precision 4-bit General Matrix Multiplication (GEMM) path and a high-precision 16-bit branch GEMM path. 
Specifically, OSC uses an offline group-wise strategy to identify the channels where outliers are located and then performs structured sub-tensor extraction to coalesce these sca...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154655646","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","quantization"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C79337645","display_name":"Outlier","score":0.7955999970436096},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7190999984741211},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.6942999958992004},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5533000230789185},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.520799994468689},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.4846000075340271},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.48339998722076416},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.46630001068115234}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.14220","title":"Knowledge Graph RAG: Agentic Crawling and Graph Construction in Enterprise Documents","url":"https://arxiv.org/abs/2604.14220","published":"2026-04-14","authors":["Koushik Chakraborty","Koyel Guha"],"abstract":"This research paper addresses the limitations of semantic search in complex enterprise document ecosystems. 
Traditional RAG pipelines often fail to capture hierarchical and interconnected information, leading to retrieval inaccuracies. We propose Agentic Knowledge Graphs featuring Recursive Crawling as a robust solution for navigating superseding logic and multi-hop references. Our benchmark evaluation using the Code of Federal Regulations (CFR) demonstrates that this Knowledge Graph-enhanced approach achieves a 70% accuracy improvement over standard vector-based RAG systems, providing exhaustive and precise answers for complex regulatory queries.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154865779","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Global Services (Slovakia)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7484999895095825},{"id":"https://openalex.org/C100368936","display_name":"Crawling","score":0.7143999934196472},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.6952000260353088},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.444599986076355},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.40939998626708984},{"id":"https://openalex.org/C2129575","display_name":"Semantic Web","score":0.4081999957561493},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4050999879837036},{"id":"https://openalex.org/C234837","display_name":"Conceptual graph","score":0.39629998803138733}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.12374","title":"Nemotron 3 Super: Open, Efficient 
Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning","url":"https://huggingface.co/papers/2604.12374","published":"2026-04-14","authors":["NVIDIA","Aakshita Chandiramani","Aaron Blakeman","Abdullahi Olaoye","Abhibha Gupta","Abhilash Somasamudramath","Abhinav Khattar","Adeola Adesoba","Adi Renduchintala","Adil Asif","Aditya Agrawal","Aditya Vavre"],"abstract":"We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens followed by post-training using supervised fine tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves comparable accuracy on common benchmarks, while also achieving up to 2.2x and 7.5x higher inference throughput compared to GPT-OSS-120B and Qwen3.5-122B, respectively. 
Nemotron 3 Super datasets, along....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["efficient","quantization"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-the-future-of-ai-in-clinical-collaboration-a-study-on-tumor-board-case-preparation","title":"Exploring the Future of AI in Clinical Collaboration: A Study on Tumor Board Case Preparation","url":"https://www.microsoft.com/en-us/research/publication/exploring-the-future-of-ai-in-clinical-collaboration-a-study-on-tumor-board-case-preparation/","published":"2026-04-13","authors":["Jiachen Li","Amanda K. Hall","Ruican Zhong","Selin S. Everett","Alyssa Unell","Hanwen Xu","Matthias Blondeel","Jonathan M. Carlson","Katie Claveau","Thulasee Jose","Tristan Naumann","David C. Rhew"],"abstract":"Multidisciplinary tumor boards (MTBs) bring specialists together to identify therapies for complex cancer cases, but preparing for them is time-intensive. Clinicians must extract key details from extensive records and evaluate treatment options. While large language models (LLMs) show promise in medicine for basic tasks like summarizing notes, little is known about their role in high-stakes tasks like MTB preparation. We conducted a mixed-methods study with 16 oncologists using two AI systems to prepare patient cases for MTB: an off-the-shelf assistant (Copilot) and a task-specific multi-agent system (Healthcare Agent Orchestrator, HAO). We analyzed oncologist prompts, AI responses, and oncologists' perception of AI. 
Participants showed greater willingness to adopt HAO but were often overconfident in AI summaries and skeptical of AI-recommended therapies. Trust calibration strategies, su...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772318.3790448","openalex_id":"https://openalex.org/W7153991838","cited_by_count":1,"quality_score":101,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Healthcare","Human-AI Collaboration","Human-AI interaction","Human–computer interaction","large language models","Multi-agent system","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft","Microsoft (United States)","Northeastern University","Stanford University","University of Washington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/storycaster-an-ai-system-for-immersive-room-based-storytelling","title":"Storycaster: An AI System for Immersive Room-Based Storytelling","url":"https://www.microsoft.com/en-us/research/publication/storycaster-an-ai-system-for-immersive-room-based-storytelling/","published":"2026-04-13","authors":["Naisha Agarwal","Judith Amores","Andrew D. Wilson"],"abstract":"While Cave Automatic Virtual Environment (CAVE) systems have long enabled room-scale virtual reality and various kinds of interactivity, their content has largely remained predetermined. We present Storycaster, a generative AI CAVE system that transforms physical rooms into responsive storytelling environments. 
Unlike headset-based VR, Storycaster preserves spatial awareness, using live camera feeds to augment the walls with cylindrical projections, allowing users to create worlds that blend with their physical surroundings. Additionally, our system enables object-level editing, where physical items in the room can be transformed to their virtual counterparts in a story. A narrator agent guides participants, enabling them to co-create stories that evolve in response to voice commands, with each scene enhanced by generated ambient audio, dialogue, and imagery. Participants in our study (n...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772318.3791305","openalex_id":"https://openalex.org/W7154087259","cited_by_count":1,"quality_score":85,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Graphics and multimedia","Human-computer interaction","Computer science","Human–computer interaction","1970-01-01","agent"],"author_affiliations":["Microsoft","Microsoft (United States)","University of California, Los Angeles"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-cooperation-in-llm-social-groups-through-elected-leadership","title":"Evaluating Cooperation in LLM Social Groups through Elected Leadership","url":"https://www.microsoft.com/en-us/research/publication/evaluating-cooperation-in-llm-social-groups-through-elected-leadership/","published":"2026-04-13","authors":["Ryan Faulkner","Anushka Deshpande","David Guzman Piedrahita","Joel Z. 
Leibo","Zhijing Jin"],"abstract":"Governing common-pool resources requires agents to develop enduring strategies through cooperation and self-governance to avoid collective failure. While foundation models have shown potential for cooperation in these settings, existing multi-agent research provides little insight into whether structured leadership and election mechanisms can improve collective decision making. The lack of such a critical organizational feature ubiquitous in human society presents a significant shortcoming of the current methods. In this work we aim to directly address whether leadership and elections can support improved social welfare and cooperation through multi-agent simulation with LLMs. We present our open-source framework that simulates leadership through elected personas and candidate-driven agendas and carry out an empirical study of LLMs under controlled governance conditions. Our experiments....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","LLM","election","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/toward-natural-and-companionable-virtual-agents-via-cross-temporal-emotional-modeling","title":"Toward Natural and Companionable Virtual Agents via Cross-Temporal Emotional 
Modeling","url":"https://www.microsoft.com/en-us/research/publication/toward-natural-and-companionable-virtual-agents-via-cross-temporal-emotional-modeling/","published":"2026-04-13","authors":["Yi Zheng","Feier Qin","Xiao Li","Haibin Huang","Hanyao Wang","Xiaoyu Wang","Yan Lu","Yuan Zhang"],"abstract":"Recent advances in foundation models have enabled conversational agents that aim for sustained companionship rather than mere task completion. Yet most still remain unable to support natural, long-term companion-like interactions, resulting in experiences that feel episodic and inauthentic. We argue that current agents overlooked cross-temporal modeling of agents’ social behaviors and internal emotions: generated behaviors rarely influence an agent’s emotional state, and emotional states seldom shape subsequent behaviors. We present Cross-Temporal Emotion Modeling (CTEM), a framework that links long-term behavioral history to moment-to-moment emotional expression. CTEM establishes a closed loop where past experiences update an evolving emotional state; this state conditions immediate interactions; and user feedback continually revises both memory and emotional state, enabling reflection....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Human–computer interaction","1970-01-01","memory","long-term","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sanity-checks-for-agentic-data-science","title":"Sanity Checks for Agentic Data Science","url":"https://www.microsoft.com/en-us/research/publication/sanity-checks-for-agentic-data-science/","published":"2026-04-13","authors":["Zachary T. Rewolinski","Austin Zane","Haoyang Huang","Chandan Singh","Chenglong Wang","Jianfeng Gao","Bin Yu"],"abstract":"Agentic data science (ADS) pipelines have grown rapidly in both capability and adoption, with systems such as OpenAI Codex now able to directly analyze datasets and produce answers to statistical questions. However, these systems can reach falsely optimistic conclusions that are difficult for users to detect. To address this, we propose a pair of lightweight sanity checks grounded in the Predictability-Computability-Stability (PCS) framework for veridical data science. These checks use reasonable perturbations to screen whether an agent can reliably distinguish signal from noise, acting as a falsifiability constraint that can expose affirmative conclusions as unsupported. Together, the two checks characterize the trustworthiness of an ADS output, e.g. whether it has found stable signal, is responding to noise, or is sensitive to incidental aspects of the input. 
We validate the approach o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Computer science","Data science","Statistics","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/human-expertise-for-ai-red-teaming-and-scalable-evaluation","title":"Human Expertise for AI Red-Teaming and Scalable Evaluation","url":"https://www.microsoft.com/en-us/research/publication/human-expertise-for-ai-red-teaming-and-scalable-evaluation/","published":"2026-04-13","authors":["Alice Qian","Srravya Chandhiramowuli","Laura A. Dabbish","Hong Shen","Alex S Taylor","Ding Wang","Theodora Skeadas","Bolor-Erdene Jagdagdorj"],"abstract":"Rapid adoption of generative AI has outpaced the infrastructure needed to red team systems responsibly. This workshop tackles a core tension: scaling AI red teaming while centering human expertise and well-being. We convene academic, industry, and nonprofit practitioners for two threads. (A) Vision: surface high-level goals and principles for effective, humane red teaming. (B) Build: identify opportunities to support human-AI red teaming, such as scenario libraries, role prompts for red teamers, and calibration methods that align automated efforts with human expertise. Through this workshop, we will develop a vision for the future of effective AI red teaming that leverages and protects human expertise while meeting the needs of evaluation at scale. 
Venue: 1970-01-01","companies":["Microsoft","Google/DeepMind"],"matched_orgs":["Microsoft","Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772363.3778702","openalex_id":"https://openalex.org/W7153795152","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft","Addgene","Carnegie Mellon University","Google (United States)","Microsoft (United States)","University of Edinburgh"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-promise-and-peril-of-on-device-ai-for-conservation-work","title":"The Promise and Peril of On-Device AI for Conservation Work","url":"https://www.microsoft.com/en-us/research/publication/the-promise-and-peril-of-on-device-ai-for-conservation-work/","published":"2026-04-13","authors":["Cynthia Dong","Emmanuel Azuh Mensah","Vaishnavi Ranganathan","Kurtis Heimerl"],"abstract":"At the heart of conservation are the field staff who study and monitor ecosystems in challenging environments. Recent advances in AI models raise the question of whether LLM assistants could improve the experience of data collection for these staff. However, on-device AI deployment for conservation field work poses significant challenges, and is understudied. To address this gap, we conducted semi-structured interviews, surveys, and participant observation with partner conservancies in the Pacific Northwest and Namibia to better understand the field work context. 
We employ speculative methods through the lens of technology acceptance theory to critically analyze how on-device AI would affect field work, by developing an on-device transcription-language model pipeline, which we built atop of EarthRanger, a widely-used, open-source conservation platform. Our findings suggest that although....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Human–computer interaction","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/continuous-benchmark-generation-for-evaluating-enterprise-scale-llm-agents","title":"Continuous Benchmark Generation for Evaluating Enterprise-scale LLM Agents","url":"https://www.microsoft.com/en-us/research/publication/continuous-benchmark-generation-for-evaluating-enterprise-scale-llm-agents/","published":"2026-04-13","authors":["Divyanshu Saxena","Rishikesh Maurya","Xiaoxuan Ou","Gagan Somashekar","Shachee Mishra Gupta","Arun Iyer","Yu Kang","Chetan Bansal","Aditya Akella","Saravan Rajmohan"],"abstract":"The rapid adoption of AI agents across domains has made systematic evaluation crucial for ensuring their usefulness and successful production deployment. Evaluation of AI agents typically involves using a fixed set of benchmarks and computing multiple evaluation metrics for the agent. 
While sufficient for simple coding tasks, these benchmarks fall short for enterprise-scale agents, where services and requirements evolve continuously and ground-truth examples are sparse. We propose a process of benchmark generation that helps evolve the benchmarks as the requirements change and perform robust evaluation of evolving AI agents. We instantiate this approach for a case study of service migration from one deployment platform to another at a large public enterprise. Our approach relies on semi-structured documents where developers express the high-level intent, and uses state-of-the-art LLMs to...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-sure-framework-social-intelligence-for-human-agent-collaboration","title":"The SURE Framework: Social Intelligence for Human-Agent Collaboration","url":"https://www.microsoft.com/en-us/research/publication/the-sure-framework-social-intelligence-for-human-agent-collaboration/","published":"2026-04-13","authors":["Javier Hernandez","Ed Cutrell","John Tang","Denae Ford","Martez Mott","Sasa Junuzovic","Andrew D. Wilson","Kori Inkpen"],"abstract":"Large Language Model agents are evolving from question-answering tools to genuine collaborators that run continuously, reason across turns, reflect on tool usage, and act on our behalf. 
Yet effective human-agent collaboration requires more than raw capability. It demands social intelligence: the ability to sense, understand, remember, and engage in ways that feel natural, effective, and meaningful. We argue that social intelligence, not reasoning capability, is the primary bottleneck preventing LLM agents from becoming genuine collaborators. We propose SURE (Sense, Understand, Remember, Engage) as a conceptual framework for organizing research on socially intelligent agents. SURE decomposes social intelligence into four interdependent processes: (1) Sensing user cognitive and emotional states through multimodal signals, (2) Understanding user beliefs, intentions, and state through theory...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Artificial intelligence","1970-01-01","LLM","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/swe-agile-a-software-agent-framework-for-efficiently-managing-dynamic-reasoning-context","title":"SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context","url":"https://www.microsoft.com/en-us/research/publication/swe-agile-a-software-agent-framework-for-efficiently-managing-dynamic-reasoning-context/","published":"2026-04-13","authors":["Shuquan Lian","Juncheng Liu","Yazhe Chen","Yuhong Chen","Hui Li"],"abstract":"Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning 
required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to the multi-turn SWE task creates a fundamental dilemma: retaining full reasoning history leads to context explosion and “Lost-in-the-Middle” degradation, while discarding it would force the agent to redundantly re-reason at every step. To address these challenges, we propose SWE-AGILE, a novel software agent framework designed to bridge the gap between reasoning depth, efficiency, and context constraints. SWE-AGILE introduces a Dynamic Reasoning Context strategy, maintaining a “sliding window” of detailed reasoning for immediate continuity to prevent redundant re-analyzing, while compressing his...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/whose-knowledge-counts-co-designing-community-centered-ai-auditing-tools-with-educators-in-hawaii","title":"Whose Knowledge Counts? 
Co-Designing Community-Centered AI Auditing Tools with Educators in Hawaii","url":"https://www.microsoft.com/en-us/research/publication/whose-knowledge-counts-co-designing-community-centered-ai-auditing-tools-with-educators-in-hawaii/","published":"2026-04-13","authors":["Dora Zhao","Hannah Cha","Michael J Ryan","Angelina Wang","Rachel Baker-Ramos","Evyn-Bree Helekahi-Kaiwi","Rebecca Diego","Josiah Hester","Diyi Yang"],"abstract":"Although generative AI is being deployed into classrooms with promises of aiding teachers, educators caution that these tools can have unintended pedagogical repercussions, including cultural misrepresentation and bias. These concerns are heightened in low-resource language and Indigenous education settings, where AI systems frequently underperform. We investigate these challenges in Hawaii, where public schools operate under a statewide mandate to integrate Hawaiian language and culture into education. Through four co-design workshops with 22 public school educators, we surfaced concerns about using generative AI in educational settings, particularly around cultural misrepresentation and corresponding designs for auditing tools that address these issues. 
We find that educators envision tools grounded in specific Hawaiian cultural values and practices, such as tracing the genealogy of kn...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Human Computer Interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-global-impact-of-generative-ai-on-the-hci-landscape-international-perspectives-on-hci-education-industry-dynamics-and-funding-considerations","title":"The Global Impact of Generative AI on the HCI Landscape: International Perspectives on HCI Education, Industry Dynamics, and Funding Considerations","url":"https://www.microsoft.com/en-us/research/publication/the-global-impact-of-generative-ai-on-the-hci-landscape-international-perspectives-on-hci-education-industry-dynamics-and-funding-considerations/","published":"2026-04-13","authors":["Guo Freeman","Cliff Lampe","Elizabeth D. Mynatt","Heloisa Candello","Kori Inkpen","Nitesh Goyal","K. Karahalios","Xiaojuan Ma","Paweł W. Woźniak"],"abstract":"In recent years, we have witnessed a boom of AI-related research from both academia and industry at CHI. Built upon these ongoing conversations and a recent panel at CSCW 2025, this panel aims to promote community-wide discussions that reflect on generative AI’s multidimensional impact on the global HCI landscape beyond specific research agendas or directions. 
In particular, rather than discussing such impact at the regional or even national level, we will highlight international perspectives on AI’s impact on HCI education, industry dynamics, and funding considerations across various cultures and regions. Featuring a diverse group of panelists, including academic leaders in HCI education and industry experts from various regions, this panel aims to foster collective reflection at CHI on key questions crucial to sustaining the future of HCI as an international community. Venue: 1970-01-0...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/programmers-who-use-screen-readers-in-the-vibe-coding-era-adaptation-empowerment-and-new-accessibility-landscape","title":"Programmers Who Use Screen Readers in the Vibe Coding Era: Adaptation, Empowerment, and New Accessibility Landscape","url":"https://www.microsoft.com/en-us/research/publication/programmers-who-use-screen-readers-in-the-vibe-coding-era-adaptation-empowerment-and-new-accessibility-landscape/","published":"2026-04-13","authors":["Nan Chen","Luna K. Qiu","Arran Zeyu Wang","Zilong Wang","Yuqing Yang"],"abstract":"Generative AI agents are reshaping human-computer interaction, shifting users from direct task execution to supervising machine-driven actions, especially the rise of “vibe coding” in programming. 
Yet little is known about how programmers who use screen readers interact with AI code assistants in practice. We conducted a longitudinal study with 16 blind and low-vision programmers. Participants completed a GitHub Copilot tutorial, engaged with a programming task, and provided initial feedback. After two weeks of AI-assisted programming, follow-ups examined how their practices and perceptions evolved. Our findings show that code assistants enhanced programming efficiency and bridged accessibility gaps. However, participants struggled to convey intent, interpret AI outputs, and manage multiple views while maintaining situational awareness. They showed diverse preferences for accessibility f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interaction-augmented-instruction-modeling-the-synergy-of-prompts-and-interactions-in-human-genai-collaboration","title":"Interaction-Augmented Instruction: Modeling the Synergy of Prompts and Interactions in Human-GenAI Collaboration","url":"https://www.microsoft.com/en-us/research/publication/interaction-augmented-instruction-modeling-the-synergy-of-prompts-and-interactions-in-human-genai-collaboration/","published":"2026-04-13","authors":["Leixian Shen","Yifang Wang","Huamin Qu","Xing Xie","Haotian Li"],"abstract":"Text prompt is the most common way for 
human-generative AI (GenAI) communication. Though convenient, it is challenging to convey fine-grained and referential intent. One promising solution is to combine text prompts with precise GUI interactions, like brushing and clicking. However, there lacks a formal model to capture synergistic designs between prompts and interactions, hindering their comparison and innovation. To fill this gap, via an iterative and deductive process, we develop the Interaction-Augmented Instruction (IAI) model, a compact entity–relation graph formalizing how the combination of interactions and text prompts enhances human-GenAI communication. With the model, we distill twelve recurring and composable atomic interaction paradigms from prior tools, verifying our model’s capability to facilitate systematic design characterization and comparison. Four usage scenarios fur...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-do-human-creators-embrace-human-ai-co-creation-a-perspective-on-human-agency-of-screenwriters","title":"How Do Human Creators Embrace Human-AI Co-Creation? 
A Perspective on Human Agency of Screenwriters","url":"https://www.microsoft.com/en-us/research/publication/how-do-human-creators-embrace-human-ai-co-creation-a-perspective-on-human-agency-of-screenwriters/","published":"2026-04-13","authors":["Yuying Tang","Jiayi Zhou","Haotian Li","Xing Xie","Xiaojuan Ma","Huamin Qu"],"abstract":"Generative AI has greatly transformed creative work in various domains, such as screenwriting. To understand this transformation, prior research often focused on capturing a snapshot of human-AI co-creation practice at a specific moment, with less attention to how humans mobilize, regulate, and reflect to form the practice gradually. Motivated by Bandura's theory of human agency, we conducted a two-week study with 19 professional screenwriters to investigate how they embraced AI in their creation process. Our findings revealed that screenwriters not only mindfully planned, foresaw, and responded to AI usage, but, more importantly, through reflections on practice, they developed themselves and human-AI co-creation paradigms, such as cognition, strategies, and workflows. They also expressed various expectations for how future AI should better support their agency. 
Based on our findings, we...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/discourse-diversity-in-multi-turn-empathic-dialogue","title":"Discourse Diversity in Multi-Turn Empathic Dialogue","url":"https://www.microsoft.com/en-us/research/publication/discourse-diversity-in-multi-turn-empathic-dialogue/","published":"2026-04-13","authors":["Hongli Zhan","Emma S. Gueorguieva","Javier Hernandez","Jina Suh","Desmond C. Ong","Junyi Jessy Li"],"abstract":"Large language models (LLMs) produce responses rated as highly empathic in single-turn settings (Ayers et al., 2023; Lee et al., 2024), yet they are also known to be formulaic generators that reuse the same lexical patterns, syntactic templates, and discourse structures across tasks (Jiang et al., 2025; Shaib et al., 2024; Namuduri et al., 2025). Less attention has been paid to whether this formulaicity extends to the level of discourse moves, i.e., what a response does for the person it is addressing. This question is especially consequential for empathic dialogue, where effective support demands not just a kind response at one moment but varied strategies as a conversation unfolds (Stiles et al., 1998). Indeed, prior work shows that LLMs reuse the same tactic sequences more than human supporters in single-turn settings (Gueorguieva et al., 2026). 
We extend this analysis to multi-turn c...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial intelligence","Human language technologies","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cspo-alleviating-reward-ambiguity-for-structured-table-to-latex-generation","title":"CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation","url":"https://www.microsoft.com/en-us/research/publication/cspo-alleviating-reward-ambiguity-for-structured-table-to-latex-generation/","published":"2026-04-13","authors":["Yunfan Yang","Cuiling Lan","Jitao Sang","Yan Lu"],"abstract":"Tables contain rich structured information, yet when stored as images their contents remain \"locked\" within pixels. Converting table images into LaTeX code enables faithful digitization and reuse, but current multimodal large language models (MLLMs) often fail to preserve structural, style, or content fidelity. Conventional post-training with reinforcement learning (RL) typically relies on a single aggregated reward, leading to reward ambiguity that conflates multiple behavioral aspects and hinders effective optimization. We propose Component-Specific Policy Optimization (CSPO), an RL framework that disentangles optimization across LaTeX table components: structure, style, and content. 
In particular, CSPO assigns component-specific rewards and backpropagates each signal only through the tokens relevant to its component, alleviating reward ambiguity and enabling targeted component-wise opti...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dittos-mimetic-reciprocal-agents-in-ai-mediated-communication","title":"Dittos: Mimetic, Reciprocal Agents in AI Mediated Communication","url":"https://www.microsoft.com/en-us/research/publication/dittos-mimetic-reciprocal-agents-in-ai-mediated-communication/","published":"2026-04-13","authors":["Ed Cutrell","John Tang","Sasa Junuzovic","Martez Mott","Denae Ford","Kori Inkpen"],"abstract":"Recent advances in generative AI have enabled agents that can represent specific people in social interactions when they are unavailable. These agents—often described as digital twins—can look and sound like an individual and participate in conversations on their behalf. In this position paper, we focus on a particular class of such systems that we refer to as Dittos: mimetic, reciprocal AI agents that not only interact with others as a proxy for a person but also report back what occurred so that human relationships can continue to develop over time. 
Drawing on our experiences designing, deploying, and studying Dittos, we argue that this class of systems surfaces a distinct set of challenges to core assumptions in AI mediated communication research. These challenges concern how presence is evoked through representation, how trust is established when AI speaks in someone’s voice, how peo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2604.12617","title":"SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models","url":"https://huggingface.co/papers/2604.12617","published":"2026-04-13","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"apple:b3r1jub8sog9hgatnc245cr4","title":"Cram Less to Fit More: Training Data Pruning Improves Memorization of 
Facts","url":"https://machinelearning.apple.com/research/cram-less","published":"2026-04-13","authors":["Jiayuan Ye","Vitaly Feldman","Kunal Talwar"],"abstract":"This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR 2026.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7153898680","title":"Human-AI Interaction Alignment: Designing, Evaluating, and Evolving Value-Centered AI For Reciprocal Human-AI Futures","url":"https://doi.org/10.1145/3772363.3778710","published":"2026-04-13","authors":["Hua Shen","Tiffany Knearem","Divy Thakkar","Pat Pataranutaporn","A. K. Sinha","Yike Shi","Jenny T. Liang","Lama Ahmad","Tanushree Mitra","Brad A. Myers","Yang Li"],"abstract":"The rapid integration of generative AI into everyday life underscores the need to move beyond unidirectional alignment models that only adapt AI to human values. This workshop focuses on bidirectional human-AI alignment, a dynamic, reciprocal process where humans and AI co-adapt through interaction, evaluation, and value-centered design. Building on our past CHI 2025 BiAlign SIG and ICLR 2025 Workshop, this workshop will bring together interdisciplinary researchers from HCI, AI, social sciences and more domains to advance value-centered AI and reciprocal human-AI collaboration. We focus on embedding human and societal values into alignment research, emphasizing not only steering AI toward human values but also enabling humans to critically engage with and evolve alongside AI systems. 
Through talks, interdisciplinary discussions, and collaborative activities, participants will explore met...","companies":["OpenAI","Google/DeepMind"],"matched_orgs":["OpenAI","Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772363.3778710","openalex_id":"https://openalex.org/W7153898680","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Google (United States)","Massachusetts Institute of Technology","Mohamed bin Zayed University of Artificial Intelligence","New York University Shanghai","OpenAI (United States)","Seattle University","University of Washington"],"concepts":[{"id":"https://openalex.org/C106306483","display_name":"Futures contract","score":0.5163999795913696},{"id":"https://openalex.org/C2777742833","display_name":"Reciprocal","score":0.5153999924659729},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4302000105381012},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33009999990463257},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.32359999418258667},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.3181999921798706},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3142000138759613},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.2955000102519989}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154081556","title":"AgentHands: Generating Interactive Hand Gestures for Spatially Grounded Agent Conversations in XR","url":"https://doi.org/10.1145/3772318.3790938","published":"2026-04-13","authors":["Ziyi Liu","David 
Li","Zhongyi Zhou","David Kim","Ruofei Du","Xun Qian"],"abstract":"Communicating spatial tasks via text or speech creates “a mental mapping gap” that limits an agent’s expressiveness. Inspired by co-speech gestures in face-to-face conversation, we propose AgentHands, an LLM-powered XR system that equips agents with hands to render responses clearer and more engaging. Guided by a design taxonomy distilled from a formative study (N=10), we implement a novel pipeline to generate and render a hand agent that augments conversational responses with synchronized, space-aware, and interactive hand gestures: using a meta-instruction, AgentHands generates verbal responses embedded with GestureEvents aligned to specific words; each event specifies gesture type and parameters. At runtime, a parser converts events into time-stamped poses and motions, driving an animation system that renders expressive hands synchronized with speech. In a within-subjects study (N=12)...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772318.3790938","openalex_id":"https://openalex.org/W7154081556","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","agent"],"author_affiliations":["Google (Switzerland)","Google (United States)","Purdue University West Lafayette"],"concepts":[{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.8338000178337097},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7038000226020813},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.6256999969482422},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.5785999894142151},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.5418999791145325},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.46540001034736633},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.42340001463890076},{"id":"https://openalex.org/C2777810175","display_name":"Event structure","score":0.36970001459121704}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7154095102","title":"SceneScout: Towards AI-Driven Access to Street Level Imagery for Blind Users","url":"https://doi.org/10.1145/3772318.3790449","published":"2026-04-13","authors":["Gaurav Jain","Leah Findlater","Cole Gleason"],"abstract":"People who are blind or have low-vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. While most tools focus on in-situ navigation assistance, those supporting pre-travel assistance typically provide information about only landmarks and turn-by-turn instructions, lacking detailed visual context. Street level imagery, which contains rich visual information and has the potential to reveal numerous environmental details, remains inaccessible to BLV people. In this work, we present SceneScout, a multimodal large language model (MLLM)-driven prototype that enables accessible interactions with street level imagery. 
SceneScout supports two modes: (1) Route Preview, enabling users to familiarize themselves with visual details along a route, and (2) Virtual Exploration, enabling free, user-driven movement within street level...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772318.3790449","openalex_id":"https://openalex.org/W7154095102","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Appen (United States)","Apple (United States)","Columbia University"],"concepts":[{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.7146999835968018},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6710000038146973},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6098999977111816},{"id":"https://openalex.org/C2781372952","display_name":"Visual impairment","score":0.39489999413490295},{"id":"https://openalex.org/C2780878386","display_name":"Visual language","score":0.35749998688697815},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.3546000123023987},{"id":"https://openalex.org/C3020106864","display_name":"Visually impaired","score":0.3012000024318695},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.29679998755455017}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7154168514","title":"Engaging Communities Meaningfully in Defining Disability Representation for AI Image Generation","url":"https://doi.org/10.1145/3772318.3790768","published":"2026-04-13","authors":["Anja Thieme","Rita Faia Marques","Martin Grayson","Sidhika Balachandar","Cameron Tyler Cassidy","Madiha Zahrah Choksi","Camilla 
Longden","Reeda Shimaz Huda","Nicholas Ileve Kalovwe","Christina Mallon","Courtney Mansperger","Daniela Massiceti"],"abstract":"Media representations of people with disabilities profoundly influence societal perceptions, yet have historically been absent, stereotyped, or inaccurate. As AI-generated visual media becomes increasingly prevalent, there is a critical opportunity to address these misrepresentations. Responding to the lack of collectively negotiated representation standards, this paper presents our human-centric approach to engaging disability communities meaningfully in AI data practices. Over three months, we worked closely with three disability organizations across the Global North and South to develop the Community Library Creator that introduces design scaffolds to support communities in defining ‘good’ representation and curating community-centric AI datasets; laying the foundations for community-specific evaluation metrics and future model adaptations. We contribute qualitative insights into the....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772318.3790768","openalex_id":"https://openalex.org/W7154168514","cited_by_count":1,"quality_score":42,"matched_keywords":["media"],"author_affiliations":["Child Welfare Society of Kenya","Cornell University","Echo Network Africa","Georgia Institute of Technology","Lipman Hearne (United States)","Microsoft (Canada)","Microsoft (Finland)","Microsoft (France)","Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research Montréal (Canada)","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.7131999731063843},{"id":"https://openalex.org/C2776291640","display_name":"Value 
(mathematics)","score":0.48399999737739563},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45489999651908875},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.45329999923706055},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.428600013256073},{"id":"https://openalex.org/C87156501","display_name":"Qualitative property","score":0.3912999927997589},{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.3817000091075897},{"id":"https://openalex.org/C116409475","display_name":"External Data Representation","score":0.3652999997138977}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2509.12152","title":"Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference","url":"http://arxiv.org/abs/2509.12152","published":"2026-04-13","authors":["Synthia Wang","Sai Teja Peddinti","Nina Taft","Nick Feamster"],"abstract":"Large Language Models (LLMs) such as ChatGPT can infer personal attributes from seemingly innocuous text, raising privacy risks beyond memorized data leakage. While prior work has demonstrated these risks, little is known about how users estimate and respond. We conducted a survey with 240 U.S. participants who judged text snippets for inference risks, reported concern levels, and attempted rewrites to block inference. We compared their rewrites with those generated by ChatGPT and Rescriber, a state-of-the-art sanitization tool. Results show that participants struggled to anticipate inference, performing a little better than chance. User rewrites were effective in just 28% of cases - better than Rescriber but worse than ChatGPT. 
We examined our participants’ rewriting strategies, and observed that while paraphrasing was the most common strategy it is also the least effective; instead abs...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772318.3791762","openalex_id":"https://openalex.org/W4415089829","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","University of Chicago","University of Illinois Chicago"],"concepts":[{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.7412999868392944},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7390999794006348},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7070000171661377},{"id":"https://openalex.org/C124304363","display_name":"Abstraction","score":0.5364000201225281},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.43709999322891235},{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.41819998621940613},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40880000591278076},{"id":"https://openalex.org/C2777210771","display_name":"Block (permutation group theory)","score":0.35850000381469727}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2604.12110","title":"SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling","url":"https://arxiv.org/abs/2604.12110","published":"2026-04-13","authors":["Zikun Liu","Liang Luo","Qianru Li","Zhengyu Zhang","Wei Ling","Jingyi Shen","Zeliang Chen","Yaning Huang","Jingxian Huang","Abdallah Aboelela","Chonglin Sun","Feifan 
Gu"],"abstract":"Recent advances in recommendation scaling laws have led to foundation models of unprecedented complexity. While these models offer superior performance, their computational demands make real-time serving impractical, often forcing practitioners to rely on knowledge distillation-compromising serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Latent-bAsed Representation for Inference Scaling), a novel framework inspired by speculative decoding. SOLARIS proactively precomputes user-item interaction embeddings by predicting which user-item pairs are likely to appear in future requests, and asynchronously generating their foundation model representations ahead of time. This approach decouples the costly foundation model inference from the latency-critical serving path, enabling real-time knowledge transfer from models previously considered...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154655559","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Alpha Omega Alpha Medical Honor Society","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8112999796867371},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.689300000667572},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6121000051498413},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5164999961853027},{"id":"https://openalex.org/C197115733","display_name":"Forcing (mathematics)","score":0.5102999806404114},{"id":"https://openalex.org/C2779530757","display_name":"Quality 
(philosophy)","score":0.4214000105857849},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.4196000099182129},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4074000120162964}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.11102","title":"OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video","url":"https://arxiv.org/abs/2604.11102","published":"2026-04-13","authors":["Junfu Pu","Yuxin Chen","Teng Wang","Ying Shan"],"abstract":"Current multimodal large language models (MLLMs) have demonstrated remarkable capabilities in short-form video understanding, yet translating long-form cinematic videos into detailed, temporally grounded scripts remains a significant challenge. This paper introduces the novel video-to-script (V2S) task, aiming to generate hierarchical, scene-by-scene scripts encompassing character actions, dialogues, expressions, and audio cues. To facilitate this, we construct a first-of-its-kind human-annotated benchmark and propose a temporally-aware hierarchical evaluation framework. Furthermore, we present OmniScript, an 8B-parameter omni-modal (audio-visual) language model tailored for long-form narrative comprehension. 
OmniScript is trained via a progressive pipeline that leverages chain-of-thought supervised fine-tuning for plot and character reasoning, followed by reinforcement learning using te...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154539895","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C61423126","display_name":"Scripting language","score":0.8406999707221985},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7972999811172485},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6097000241279602},{"id":"https://openalex.org/C2780861071","display_name":"Character (mathematics)","score":0.6014999747276306},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5831000208854675},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5598000288009644},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4846999943256378},{"id":"https://openalex.org/C199033989","display_name":"Narrative","score":0.4690000116825104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.13200","title":"Navig-AI-tion: Navigation by Contextual AI and Spatial Audio","url":"http://arxiv.org/abs/2603.13200","published":"2026-04-13","authors":["Mathias N. Lystbæk","Haley Adams","Ranjith Kagathi Ananda","Eric J. 
Gonzalez","Luca Ballan","Qiuxuan Wu","Andrea Colaço","Peter Tan","Mar Gonzalez-Franco"],"abstract":"Audio-only walking navigation can leave users disoriented, relying on vague cardinal directions and lacking real-time environmental context, leading to frequent errors. To address this, we present a novel system that integrates a Vision Language Model (VLM) with a spatial audio cue. Our system extracts environmental landmarks to anchor navigation instructions and, crucially, provides a directional spatial audio signal when the user faces the wrong direction, indicating the precise turn direction. In a user study (n=12), the spatial audio cue with VLM reduced route deviations compared to both VLM-only and Google Maps (audio-only) baseline systems. Users reported that the spatial audio cue effectively supported orientation and that landmark-anchored instructions provided a better navigation experience over audio-only Google Maps. This work serves as an initial look at the utility of future...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772363.3799295","openalex_id":"https://openalex.org/W7138190847","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Aarhus University","Google (United States)","Seattle University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7699000239372253},{"id":"https://openalex.org/C2777891301","display_name":"Navigation system","score":0.6252999901771545},{"id":"https://openalex.org/C16345878","display_name":"Orientation (vector space)","score":0.5882999897003174},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.515999972820282},{"id":"https://openalex.org/C43472768","display_name":"Turn-by-turn 
navigation","score":0.47940000891685486},{"id":"https://openalex.org/C64754055","display_name":"Spatial contextual awareness","score":0.459199994802475},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.45820000767707825},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.453000009059906}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.12108","title":"LLM-Based Automated Diagnosis Of Integration Test Failures At Google","url":"https://arxiv.org/abs/2604.12108","published":"2026-04-13","authors":["Celal Ziftci","Ray Liu","Spencer Greene","Livio Dalloro"],"abstract":"Integration testing is critical for the quality and reliability of complex software systems. However, diagnosing their failures presents significant challenges due to the massive volume, unstructured nature, and heterogeneity of logs they generate. These result in a high cognitive load, low signal-to-noise ratio, and make diagnosis difficult and time-consuming. Developers complain about these difficulties consistently and report spending substantially more time diagnosing integration test failures compared to unit test failures. To address these shortcomings, we introduce Auto-Diagnose, a novel diagnosis tool that leverages LLMs to help developers efficiently determine the root cause of integration test failures. 
Auto-Diagnose analyzes failure logs, produces concise summaries with the most relevant log lines, and is integrated into Critique, Google's internal code review system, providin...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154655273","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2781265381","display_name":"Helpfulness","score":0.6859999895095825},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6536999940872192},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6180999875068665},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5756999850273132},{"id":"https://openalex.org/C148027188","display_name":"Unit testing","score":0.5509999990463257},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.526199996471405},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4593999981880188},{"id":"https://openalex.org/C130963320","display_name":"Root cause analysis","score":0.4505999982357025}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7153770945","title":"From Correctness to Collaboration: A Human-Centered Taxonomy of AI Agent Behavior in Software Engineering","url":"https://doi.org/10.1145/3772363.3798733","published":"2026-04-13","authors":["Tao Dong","Sherry Shi","Harini Sampath","Andrew Macvean"],"abstract":"The ongoing transition of Large Language Models in software engineering from code generators into autonomous agents requires a shift in how we define and measure 
success. While models are becoming more capable, the industry lacks a clear understanding of the behavioral norms that make an agent effective in collaborative software development in the enterprise. This work addresses this gap by presenting a taxonomy of desirable agent behaviors, synthesized from 91 sets of developer-defined rules for coding agents. We identify four core expectations: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solve Problems Effectively, and Collaborate with the Developer. These findings offer a concrete vocabulary for agent behavior, enabling researchers to move beyond correctness-only benchmarks and start designing evaluations that reflect the socio-technical nature of professio...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772363.3798733","openalex_id":"https://openalex.org/W7153770945","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6632999777793884},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.5960000157356262},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.5756999850273132},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy (biology)","score":0.541100025177002},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.41589999198913574},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4156999886035919},{"id":"https://openalex.org/C149091818","display_name":"Software system","score":0.31189998984336853},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.30640000104904175}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7153888140","title":"Developing an AI-Powered UX Research Point of View (POV)","url":"https://doi.org/10.1145/3772363.3778773","published":"2026-04-13","authors":["Hüseyin Doğan","Stephen Giff","R.S. Barsoum","Alan Dix"],"abstract":"User Experience Research (UXR) Points of View (POVs) distil complex data into actionable insights for product strategy and design, providing a focused perspective on user needs, pain points, and motivations, enabling teams to make informed decisions. This workshop aims to integrate Generative AI into every level of the UXR POV pyramid Framework (https://www.uxrpovplaybook.com/) to equip UX Practitioners with the essential tools needed to develop and articulate a persuasive PoV. The workshop will guide participants in utilizing Generative AI tools to establish a plan/roadmap for defining a POV; leverage plays and best practice cards that support the journey through the UXR framework; and develop a clear, structured POV narrative. 
The proposed workshop will provide exemplar case studies where AI will be utilized in conjunction with the UXR playbook to offer an opportunity for HCI researche...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772363.3778773","openalex_id":"https://openalex.org/W7153888140","cited_by_count":0,"quality_score":41,"matched_keywords":["persuasive"],"author_affiliations":["Bournemouth University","Foundry (United Kingdom)","Google (United States)","Swansea University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5555999875068665},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.5167999863624573},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.34060001373291016},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.30410000681877136},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2930999994277954},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.2621999979019165},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.25999999046325684},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.25780001282691956}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7153766509","title":"Speech AI for All: The What, How, and Who of Measurement","url":"https://doi.org/10.1145/3772363.3778768","published":"2026-04-13","authors":["Kimi Wenzel","Alisha Pradhan","Maria Teleki","Tobias M Weinberg","Robin Netzorg","Alyssa Hillary Zisk","Anna Seo Gyeong Choi","Jingjin Li","Raja 
Kushalnagar","Colin Lea","Abraham Glasser","Christian Vogler"],"abstract":"Optimized for “typical” and fluent speech, today’s speech AI systems perform poorly for people with speech diversities, sometimes to an unusable or even harmful degree. These harms play out in daily life through household voice assistants and workplace meeting services, in higher stakes scenarios like medical transcription, and in emerging applications of AI in augmentative and alternative communication. Standard metrics aiming to quantify these inequities, however, fail to comprehensively understand the impact of speech AI on diverse user groups, and furthermore do not easily generalize to newer speech language and speech generation models. To address these social inequities and measurement limitations, this workshop brings academics, practitioners, and non-profit workers together in proactive dialogue to improve measurement of speech AI performance and user impact. Through a poster ses...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772363.3778768","openalex_id":"https://openalex.org/W7153766509","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["ATUM (United States)","Apple (United States)","Carnegie Mellon University","Cornell University","Gallaudet University","Georgetown University","New Jersey Institute of Technology","Texas A&M University","University of California, Berkeley","University of Maryland, College Park"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5371999740600586},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4528000056743622},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.43070000410079956},{"id":"https://openalex.org/C204321447","display_name":"Natural 
language processing","score":0.37700000405311584},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.28610000014305115},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.24899999797344208},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.243599995970726},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.24210000038146973}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7153776620","title":"Human-AI-UI Interactions Across Modalities","url":"https://doi.org/10.1145/3772363.3778737","published":"2026-04-13","authors":["Kewen Peng","Jeffrey Nichols","Christof Lutteroth","Tiffany Knearem","Felix Kretzer","Jeffrey P. Bigham","Alexander Maedche","Yue Jiang"],"abstract":"Designing and developing user-friendly interfaces has long been a cornerstone of HCI research, yet today we are at a turning point where UIs are no longer designed solely for humans but also for intelligent agents that act on users’ behalf, while UIs are also expanding beyond 2D screens into extended reality environments with inherently multimodal characteristics, together challenging us to rethink the role of the UI as a mediator of human–AI interaction. 
This workshop will explore how UI agents bridge human intent and system behavior by interpreting multimodal inputs and generating adaptive outputs across surfaces from screens to extended reality (XR), and we will examine not only their technical capabilities but also their broader impact, including how agents reshape daily workflows, how bidirectional alignment between human and AI activity can be achieved, and how generative models ma...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772363.3778737","openalex_id":"https://openalex.org/W7153776620","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Apple (United States)","Carnegie Mellon University","Human Computer Interaction (Switzerland)","Karlsruhe Institute of Technology","Mohamed bin Zayed University of Artificial Intelligence","University of Bath","University of Utah"],"concepts":[{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.7527999877929688},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6647999882698059},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6621000170707703},{"id":"https://openalex.org/C31395832","display_name":"Testbed","score":0.6043999791145325},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.49230000376701355},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.47859999537467957},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.3937999904155731},{"id":"https://openalex.org/C2780616401","display_name":"Cornerstone","score":0.39259999990463257}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/structure-grounded-knowledge-retrieval-via-code-dependencies-for-multi-step-data-reasoning","title":"Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning","url":"https://www.microsoft.com/en-us/research/publication/structure-grounded-knowledge-retrieval-via-code-dependencies-for-multi-step-data-reasoning/","published":"2026-04-12","authors":["Xinyi Huang","Haoyu Dong"],"abstract":"Selecting the right knowledge is critical when using large language models (LLMs) to solve domain-specific data analysis tasks. However, most retrieval-augmented approaches rely primarily on lexical or embedding similarity, which is often a weak proxy for the task-critical knowledge needed for multi-step reasoning. In many such tasks, the relevant knowledge is not merely textually related to the query, but is instead grounded in executable code and the dependency structure through which computations are carried out. To address this mismatch, we propose SGKR (Structure-Grounded Knowledge Retrieval), a retrieval framework that organizes domain knowledge with a graph induced by function-call dependencies. Given a question, SGKR extracts semantic input and output tags, identifies dependency paths connecting them, and constructs a task-relevant subgraph. 
The associated knowledge and correspon...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Search and information retrieval","Computation and Language","Computer science","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bringing-value-models-back-generative-critics-for-value-modeling-in-llm-reinforcement-learning","title":"Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/bringing-value-models-back-generative-critics-for-value-modeling-in-llm-reinforcement-learning/","published":"2026-04-12","authors":["Zikang Shan","Han Zhong","Liwei Wang","Li Zhao"],"abstract":"Credit assignment is a central challenge in reinforcement learning (RL). Classical actor-critic methods address this challenge through fine-grained advantage estimation based on a learned value function. However, learned value models are often avoided in modern large language model (LLM) RL because conventional discriminative critics are difficult to train reliably. We revisit value modeling and argue that this difficulty is partly due to limited expressiveness. In particular, representation complexity theory suggests that value functions can be hard to approximate under the one-shot prediction paradigm used by existing value models, and our scaling experiments show that such critics do not improve reliably with scale. 
Motivated by this observation, we propose Generative Actor-Critic (GenAC), which replaces one-shot scalar value prediction with a generative critic that performs chain-of-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Reinforcement learning","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/intent-aligned-formal-specification-synthesis-via-traceable-refinement","title":"Intent-aligned Formal Specification Synthesis via Traceable Refinement","url":"https://www.microsoft.com/en-us/research/publication/intent-aligned-formal-specification-synthesis-via-traceable-refinement/","published":"2026-04-12","authors":["Zhe Ye","Aidan Z.H. Yang","Huangyuan Su","Zhenyu Liao","Samuel Tenka","Zhizhen Qin","Udaya Ghai","Dawn Song","Soonho Kong"],"abstract":"Large language models are increasingly used to generate code from natural language, but ensuring correctness remains challenging. Formal verification offers a principled way to obtain such guarantees by proving that a program satisfies a formal specification. However, specifications are frequently missing in real-world codebases, and writing high-quality specifications remains expensive and expertise-intensive. We present VeriSpecGen, a traceable refinement framework that synthesizes intent-aligned specifications in Lean through requirement-level attribution and localized repair. 
VeriSpecGen decomposes natural language into atomic requirements and generates requirement-targeted tests with explicit traceability maps to validate generated specifications. When validation fails, traceability maps attribute failures to specific requirements, enabling targeted clause-level repairs. VeriSpecGen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agent2-rl-bench-can-llm-agents-engineer-agentic-rl-post-training","title":"Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-Training?","url":"https://www.microsoft.com/en-us/research/publication/agent2-rl-bench-can-llm-agents-engineer-agentic-rl-post-training/","published":"2026-04-12","authors":["Wanyi Chen","Xiao Yang","Xu Yang","Tianming Sha","Qizheng Li","Zhuo Wang","Bowen Xian","Fang Kong","Weiqing Liu","Jiang Bian"],"abstract":"We introduce Agent^2 RL-Bench, a benchmark for evaluating agentic RL post-training -- whether LLM agents can autonomously design, implement, and run complete RL pipelines that improve foundation models. This capability is important because RL post-training increasingly drives model alignment and specialization, yet existing benchmarks remain largely static: supervised fine-tuning alone yields strong results, leaving interactive RL engineering untested. 
Agent^2 RL-Bench addresses this with six tasks across three levels -- from static rule-based training to closed-loop online RL with trajectory collection -- each adding a structural requirement that prior levels do not impose. The benchmark provides isolated workspaces with a grading API, runtime instrumentation that records every submission and code revision, and automated post-hoc analysis that generates structured run reports, enabling....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2604.10866","title":"OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models","url":"https://huggingface.co/papers/2604.10866","published":"2026-04-12","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"arxiv:2509.19094","title":"Pathways of Thoughts: Multi-Directional Thinking for Long-form Personalized Question 
Answering","url":"http://arxiv.org/abs/2509.19094","published":"2026-04-12","authors":["Alireza Salemi","Cheng Li","Mingyang Zhang","Qiaozhu Mei","Zhuowan Li","Spurthi Amba Hombaiah","Weize Kong","Tao Chen","Hamed Zamani","Michael Bendersky"],"abstract":"Personalization is well studied in search and recommendation, but personalized question answering remains underexplored due to challenges in inferring preferences from long, noisy, implicit contexts and generating responses that are both accurate and aligned with user expectations. To address this, we propose Pathways of Thoughts (PoT), an inference-stage method that applies to any large language model (LLM) without task-specific fine-tuning. PoT models the thinking as an iterative decision process, where the model dynamically selects among cognitive operations such as reasoning, revision, personalization, and clarification. This enables exploration of multiple reasoning trajectories, producing diverse candidate responses that capture different perspectives. 
PoT then aggregates and reweights these candidates according to inferred user preferences, yielding a final personalized response t...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792145","openalex_id":"https://openalex.org/W4416254476","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","personalized","personalization"],"author_affiliations":["Google (United States)","University of Massachusetts Amherst","University of Michigan"],"concepts":[{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.8109999895095825},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.7278000116348267},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6899999976158142},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.6273999810218811},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5708000063896179},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.5055999755859375},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.46889999508857727},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46650001406669617}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2512.13368","title":"BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations","url":"http://arxiv.org/abs/2512.13368","published":"2026-04-12","authors":["M Ma","Xiaopeng Li","Wanyu Wang","Zongliang Du","Jingtong Gao","Peng Jia","Yuyang Ye","Yiqi Wang","Yunpeng Weng","Weihong Luo","Han Xiao","Xiangyu 
Zhao"],"abstract":"Transformer structures have been widely used in sequential recommender systems (SRS). However, as user interaction histories increase, computational time and memory requirements also grow. This is mainly caused by the standard attention mechanism. Although there exist many methods employing efficient attention and SSM-based models, these approaches struggle to effectively model long sequences and may exhibit unstable performance on short sequences. To address these challenges, we design a sparse attention mechanism, BlossomRec, which models both long-term and short-term user interests through attention computation to achieve stable performance across sequences of varying lengths. Specifically, we categorize user interests in recommendation systems into long-term and short-term interests, and compute them using two distinct sparse attention patterns, with the results combined through a le...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792408","openalex_id":"https://openalex.org/W4417457302","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","long-term","efficient"],"author_affiliations":["City University of Hong Kong","Michigan State University","Rutgers, The State University of New Jersey","Tencent (China)","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7660999894142151},{"id":"https://openalex.org/C94124525","display_name":"Categorization","score":0.6557000279426575},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.4837000072002411},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.444599986076355},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.41359999775886536},{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.40119999647140503},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4007999897003174},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.3416999876499176}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155843410","title":"DynaMoLTV: A Cross-Game Dynamic Mixture Model with Weighted Sub-Distributions for Player Lifetime Value Prediction","url":"https://doi.org/10.1145/3774904.3792420","published":"2026-04-12","authors":["Furen Xu","Jie Zhang","Kai Jiang","Chengxiang Zhuo","Zang Li"],"abstract":"Online game advertising is a prominent class of Web-mediated interactive services, where understanding and predicting player Lifetime Value (LTV) is a core scientific challenge in Web-scale user modeling, personalization, and digital economy optimization. However, the LTV prediction task poses severe challenges to traditional methods, which include data sparsity and complex distribution characteristics (such as zero-inflation, long tail, multimodal distribution, and cross-game). Existing methods struggle to capture the realistic and complex LTV distributions and exhibit limitations in leveraging cross-game data. We propose the first cross-game dynamic mixture framework with weighted sub-distributions for LTV prediction, DynaMoLTV. 
DynaMoLTV primarily models complex distributions via a zero-inflated mixture of lognormal (ZIMLN) loss, incorporates a game expert for cross-game data adaptati...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792420","openalex_id":"https://openalex.org/W7155843410","cited_by_count":0,"quality_score":45,"matched_keywords":["personalized","personalization"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7197999954223633},{"id":"https://openalex.org/C61224824","display_name":"Mixture model","score":0.5728999972343445},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.46209999918937683},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.46149998903274536},{"id":"https://openalex.org/C95623464","display_name":"Classifier (UML)","score":0.4388999938964844},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4106999933719635},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.3935000002384186},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.385699987411499}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7156265759","title":"Optimizing Multi-Turn Interactive Recommendation Agents via Generative Intrinsic Motivation","url":"https://doi.org/10.1145/3774904.3792209","published":"2026-04-12","authors":["Xueyang Feng","Jiakai Tang","Xu Chen","Quanyu Dai","Zhenhua Dong"],"abstract":"Large language models have given rise to interactive recommendation agents (IRAs). 
Through proactive clarification, tool invocation, and dynamic dialogue, IRAs shift recommender systems from passive prediction to interactive, proactive intelligence. For training IRAs, agentic reinforcement learning offers a natural pathway, as it enables models to learn interactive capabilities directly from environmental feedback without requiring costly annotated data. However, this process faces three key challenges: credit assignment in multi-turn interactions, efficient exploration in large action spaces, and coordinated learning of multiple interactive skills.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792209","openalex_id":"https://openalex.org/W7156265759","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7937999963760376},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.6482999920845032},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.6083999872207642},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6036999821662903},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5910000205039978},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5497000217437744},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5364000201225281},{"id":"https://openalex.org/C2776716048","display_name":"Interactive Learning","score":0.5131000280380249}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"arxiv:2604.10604","title":"NSFL: A Post-Training Neuro-Symbolic Fuzzy Logic Framework for Boolean Operators in Neural Embeddings","url":"https://arxiv.org/abs/2604.10604","published":"2026-04-12","authors":["Vladi Vexler","Ofer Idan","Gil Lederman","Dima Sivov"],"abstract":"Standard dense retrievers lack a native calculus for multi-atom logical constraints. We introduce Neuro-Symbolic Fuzzy Logic (NSFL), a framework that adapts formal t-norms and t-conorms to neural embedding spaces without requiring retraining. NSFL operates as a first-order hybrid calculus: it anchors logical operations on isolated zero-order similarity scores while actively steering representations using Neuro-Symbolic Deltas (NS-Delta) -- the first-order marginal differences derived from contextual fusion. This preserves pure atomic meaning while capturing domain reliance, preventing the representation collapse and manifold escape endemic to traditional geometric baselines. For scalable real-time retrieval, Spherical Query Optimization (SQO) leverages Riemannian optimization to project these fuzzy formulas into manifold-stable query vectors. 
Validated across six distinct encoder configu...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154539736","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United States)"],"concepts":[{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7055000066757202},{"id":"https://openalex.org/C58166","display_name":"Fuzzy logic","score":0.6212000250816345},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5579000115394592},{"id":"https://openalex.org/C529865628","display_name":"Manifold (fluid mechanics)","score":0.5435000061988831},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.498199999332428},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4975999891757965},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.4973999857902527},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.49540001153945923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.10390","title":"LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training","url":"https://arxiv.org/abs/2604.10390","published":"2026-04-12","authors":["Abhishek Tyagi","Saurabh Hukerikar","Nirmal Saxena","Yanxiang Huang","P.P. Shirvani","Chung-Hsuan Tung","Yuhao Zhu"],"abstract":"Large-scale LLM training is increasingly susceptible to hardware defects stemming from manufacturing escapes and silicon aging. 
These defects manifest as Silent Data Corruption (SDC) that perturb gradients and parameters throughout the training process. We present LLM-PRISM, a methodology to characterize LLM pre-training resilience to hardware faults. LLM-PRISM couples RTL-level GPU fault simulation with a stochastic injection engine embedded in Megatron-LM. Through 7,664 training runs across FP16, BF16, and FP8 regimes, we analyze how fault type, rate, and numeric format govern resilience. We find that while LLMs resist low-frequency faults, impact is highly non-uniform; critical datapaths and specific precision formats can induce catastrophic divergence even at moderate fault rates. This study provides the first hardware-grounded, pre-training characterization of SDC resilience.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154540168","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Duke University","Nvidia (United States)","University of Rochester"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6697999835014343},{"id":"https://openalex.org/C175551986","display_name":"Fault (geology)","score":0.6057999730110168},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6025999784469604},{"id":"https://openalex.org/C2779585090","display_name":"Resilience (materials science)","score":0.5771999955177307},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.46399998664855957},{"id":"https://openalex.org/C207390915","display_name":"Divergence (linguistics)","score":0.44699999690055847},{"id":"https://openalex.org/C2775928411","display_name":"Fault injection","score":0.436599999666214},{"id":"https://openalex.org/C149635348","display_name":"Embedded 
system","score":0.43290001153945923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-searchable-to-non-searchable-generative-ai-and-information-diversity-in-online-information-seeking","title":"From Searchable to Non-Searchable: Generative AI and Information Diversity in Online Information Seeking","url":"https://www.microsoft.com/en-us/research/publication/from-searchable-to-non-searchable-generative-ai-and-information-diversity-in-online-information-seeking/","published":"2026-04-11","authors":["Yulin Yu","Yizhou Li","Siddharth Suri","Scott Counts"],"abstract":"Conversational generative AI systems such as ChatGPT are transforming how people seek and engage with information online. Unlike traditional search engines, these systems support open-ended, conversational inquiry, yet it remains unclear whether they ultimately expand or constrain the diversity of knowledge that users encounter in online search spaces—a primary foundation for knowledge work, learning, and innovation. Using over 200,000 real-world human–ChatGPT interactions, we examine how generative-AI–mediated inquiry reshapes diversity in both user inputs and system outputs through the lens of searchability—whether queries could plausibly be answered by traditional search engines. We find that almost 80% of ChatGPT user queries are non-searchable and span a broader knowledge space and topics than searchable queries, indicating expanded modes of inquiry. 
However, for comparable searchab...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772363.3798802","openalex_id":"https://openalex.org/W7153811513","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Search and information retrieval","Computer science","Human–computer interaction"],"author_affiliations":["Microsoft","Microsoft (United States)","Northwestern Michigan College","Northwestern University","University of Michigan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7153442782","title":"Hybrid bioprinting platforms: integrating microfluidic, AI, and additive manufacturing for functional tissue constructs","url":"https://doi.org/10.1007/s42114-026-01780-0","published":"2026-04-11","authors":["Md. Azhar","Rishabha Malviya","Balamurugan Balusamy","S. B. SRIDHAR","J. SHAREEF","Tarun Wadhwa","Aarthi Sivasankaran"],"abstract":"Traditional methods of bioprinting tend to compromise the areas of mechanical strength, biological feasibility, and multifaceted tissue structure. Material gradients, vascularisation, and scalability have posed challenges to the rapid development of hybrid bioprinting systems combining microfluidics, additive manufacturing, and artificial intelligence to enhance functional tissues. This review summarizes recent (2015–2025) innovations in hybrid bioprinting systems that combine multimodal printing, microfluidic precision, and AI-controlled technology to improve construct fidelity and translational prospects. 
In this study, the literature search was done using Google Scholar, PubMed, MDPI, Scopus, ScienceDirect, and SpringerLink. The keywords were hybrid bioprinting, microfluidic bioprinting, AI-assisted bioprinting, bioinks, and tissue engineering. Articles were filtered based on relevanc...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s42114-026-01780-0","openalex_id":"https://openalex.org/W7153442782","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Dubai Medical College","Galgotias University","Google (United States)","Manipal Academy of Higher Education","Ras al-Khaimah Medical and Health Sciences University"],"concepts":[{"id":"https://openalex.org/C2779718196","display_name":"3D bioprinting","score":0.6377999782562256},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5699999928474426},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5354999899864197},{"id":"https://openalex.org/C183696295","display_name":"Biochemical engineering","score":0.39480000734329224},{"id":"https://openalex.org/C8673954","display_name":"Microfluidics","score":0.3790000081062317},{"id":"https://openalex.org/C171250308","display_name":"Nanotechnology","score":0.3375000059604645},{"id":"https://openalex.org/C188087704","display_name":"Standardization","score":0.32710000872612},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.3091000020503998}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mstar-every-task-deserves-its-own-memory-harness","title":"M$^star$: Every Task Deserves Its Own Memory 
Harness","url":"https://www.microsoft.com/en-us/research/publication/mstar-every-task-deserves-its-own-memory-harness/","published":"2026-04-10","authors":["Wenbo Pan","Shujie Liu","Xiangyang Zhou","Shiwei Zhang","Wanlu Shi","Mirror Xu","Xiaohua Jia"],"abstract":"Large language model agents rely on specialized memory systems to accumulate and reuse knowledge during extended interactions. Recent architectures typically adopt a fixed memory design tailored to specific domains, such as semantic retrieval for conversations or skills reused for coding. However, a memory system optimized for one purpose frequently fails to transfer to others. To address this limitation, we introduce M$^star$, a method that automatically discovers task-optimized memory harnesses through executable program evolution. Specifically, M$^star$ models an agent memory system as a memory program written in Python. This program encapsulates the data Schema, the storage Logic, and the agent workflow Instructions. We optimize these components jointly using a reflective code evolution method; this approach employs a population-based search strategy and analyzes evaluation failures....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","language model","memory","retrieval","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sppo-sequence-level-ppo-for-long-horizon-reasoning-tasks","title":"SPPO: 
Sequence-Level PPO for Long-Horizon Reasoning Tasks","url":"https://www.microsoft.com/en-us/research/publication/sppo-sequence-level-ppo-for-long-horizon-reasoning-tasks/","published":"2026-04-10","authors":["Tianyi Wang","Yixia Li","Long Li","Yibiao Chen","Shaohan Huang","Yun Chen","Peng Li","Yang Liu","Guanhua Chen"],"abstract":"Proximal Policy Optimization (PPO) is central to aligning Large Language Models (LLMs) in reasoning tasks with verifiable rewards. However, standard token-level PPO struggles in this setting due to the instability of temporal credit assignment over long Chain-of-Thought (CoT) horizons and the prohibitive memory cost of the value model. While critic-free alternatives like GRPO mitigate these issues, they incur significant computational overhead by requiring multiple samples for baseline estimation, severely limiting training throughput. In this paper, we introduce Sequence-Level PPO (SPPO), a scalable algorithm that harmonizes the sample efficiency of PPO with the stability of outcome-based updates. 
SPPO reformulates the reasoning process as a Sequence-Level Contextual Bandit problem, employing a decoupled scalar value function to derive low-variance advantage signals without multi-sampli...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/litmus-reagent-a-benchmark-and-agentic-system-for-predictive-evaluation-of-multilingual-models","title":"Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models","url":"https://www.microsoft.com/en-us/research/publication/litmus-reagent-a-benchmark-and-agentic-system-for-predictive-evaluation-of-multilingual-models/","published":"2026-04-10","authors":["Avni Mittal","Shanu Kumar","Sandipan Dandapat","Monojit Choudhury"],"abstract":"We study predictive multilingual evaluation: estimating how well a model will perform on a task in a target language when direct benchmark results are missing. This problem is common in multilingual deployment, where evaluation coverage is sparse and published evidence is uneven across languages, tasks, and model families. We introduce a controlled benchmark of 1,500 questions spanning six tasks and five evidence scenarios. The benchmark separates accessible evidence from ground truth, enabling evaluation of systems that must infer missing results from incomplete literature evidence. 
We also present Litmus (Re)Agent, a DAG-orchestrated agentic system that decomposes queries into hypotheses, retrieves evidence, and synthesises predictions through feature-aware aggregation. Across six systems, Litmus (Re)Agent achieves the best overall performance, with the largest gains in transfer-heavy....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","Natural language processing","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/do-llms-follow-their-own-rules-a-reflexive-audit-of-self-stated-safety-policies","title":"Do LLMs Follow Their Own Rules? A Reflexive Audit of Self-Stated Safety Policies","url":"https://www.microsoft.com/en-us/research/publication/do-llms-follow-their-own-rules-a-reflexive-audit-of-self-stated-safety-policies/","published":"2026-04-10","authors":["Avni Mittal"],"abstract":"LLMs internalize safety policies through RLHF, yet these policies are never formally specified and remain difficult to inspect. Existing benchmarks evaluate models against external standards but do not measure whether models understand and enforce their own stated boundaries. 
We introduce the Symbolic-Neural Consistency Audit (SNCA), a framework that (1) extracts a model's self-stated safety rules via structured prompts, (2) formalizes them as typed predicates (Absolute, Conditional, Adaptive), and (3) measures behavioral compliance via deterministic comparison against harm benchmarks. Evaluating four frontier models across 45 harm categories and 47,496 observations reveals systematic gaps between stated policy and observed behavior: models claiming absolute refusal frequently comply with harmful prompts, reasoning models achieve the highest self-consistency but fail to articulate polici...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/memento-teaching-llms-to-manage-their-own-context","title":"MEMENTO: Teaching LLMs to Manage Their Own Context","url":"https://www.microsoft.com/en-us/research/publication/memento-teaching-llms-to-manage-their-own-context/","published":"2026-04-10","authors":["Vasilis Kontonis","Yuchen Zeng","Shivam Garg","Lingjiao Chen","Hao Tang","Ziyan Wang","Ahmed Awadallah","Eric Horvitz","John Langford","Dimitris Papailiopoulos"],"abstract":"Reasoning models think in long, unstructured streams with no mechanism for compressing or organizing their own intermediate state. 
We introduce MEMENTO: a method that teaches models to segment reasoning into blocks, compress each block into a memento, i.e., a dense state summary, and reason forward by attending only to mementos, reducing context, KV cache, and compute. To train MEMENTO models, we release OpenMementos, a public dataset of 228K reasoning traces derived from OpenThoughts-v3, segmented and annotated with intermediate summaries. We show that a two-stage SFT recipe on OpenMementos is effective across different model families (Qwen3, Phi-4, Olmo 3) and scales (8B--32B parameters). Trained models maintain strong accuracy on math, science, and coding benchmarks while achieving ${sim}2.5times$ peak KV cache reduction. We extend vLLM to support our inference method, achieving ${sim...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ff3r-feedforward-feature-3d-reconstruction-from-unconstrained-views","title":"FF3R: Feedforward Feature 3D Reconstruction from Unconstrained views","url":"https://www.microsoft.com/en-us/research/publication/ff3r-feedforward-feature-3d-reconstruction-from-unconstrained-views/","published":"2026-04-10","authors":["Chaoyi Zhou","Runze Wang","Feng Luo","Mert D. 
Pes'e","Zhiwen Fan","Yiqi Zhong","Siyu Huang"],"abstract":"Recent advances in vision foundation models have revolutionized geometry reconstruction and semantic understanding. Yet, most of the existing approaches treat these capabilities in isolation, leading to redundant pipelines and compounded errors. This paper introduces FF3R, a fully annotation-free feed-forward framework that unifies geometric and semantic reasoning from unconstrained multi-view image sequences. Unlike previous methods, FF3R does not require camera poses, depth maps, or semantic labels, relying solely on rendering supervision for RGB and feature maps, establishing a scalable paradigm for unified 3D reasoning. In addition, we address two critical challenges in feedforward feature reconstruction pipelines, namely global semantic inconsistency and local structural inconsistency, through two key innovations: (i) a Token-wise Fusion Module that enriches geometry tokens with sem...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/confident-in-a-confidence-score-investigating-the-sensitivity-of-confidence-scores-to-supervised-fine-tuning","title":"Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised 
Fine-Tuning","url":"https://www.microsoft.com/en-us/research/publication/confident-in-a-confidence-score-investigating-the-sensitivity-of-confidence-scores-to-supervised-fine-tuning/","published":"2026-04-10","authors":["Lorenzo Jaime Flores","Cesare Spinoso di-Piano","Jackie Cheung"],"abstract":"Uncertainty quantification is a set of techniques that measure confidence in language models. They can be used, for example, to detect hallucinations or alert users to review uncertain predictions. To be useful, these confidence scores must be correlated with the quality of the output. However, recent work found that fine-tuning can affect the correlation between confidence scores and quality. Hence, we investigate the underlying behavior of confidence scores to understand its sensitivity to supervised fine-tuning (SFT). We find that post-SFT, the correlation of various confidence scores degrades, which can stem from changes in confidence scores due to factors other than the output quality, such as the output's similarity to the training distribution. 
We demonstrate via a case study how failing to address this miscorrelation reduces the usefulness of the confidence scores on a downstream...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Human language technologies","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7153324563","title":"Automatic Propagation of Profile Information through the Optimization Pipeline","url":"https://doi.org/10.1145/3798247","published":"2026-04-10","authors":["Elisa Fröhlich","Angélica Aparecida Moreira","F. B. Pereira"],"abstract":"Profile-guided optimization (PGO) is a well-established technique for improving program performance, being integrated into major compilers such as GCC, LLVM/Clang, and Microsoft Visual C++. PGO collects information about a program's execution and uses it to guide optimizations such as inlining, and code layout. However, these very transformations alter the program's control flow, rendering the collected profiles stale or inaccurate. To deal with this problem, this paper investigates how to reuse profile data after optimization without re-executing the program. We study two complementary strategies: prediction, which estimates likely hot code paths in the optimized program, and projection, which transfers profile information from the original control-flow graph to its transformed version. 
We evaluate several techniques for reconstructing profile data, including a large language model (LLM...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3798247","openalex_id":"https://openalex.org/W7153324563","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Universidade Federal de Minas Gerais"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7997999787330627},{"id":"https://openalex.org/C169590947","display_name":"Compiler","score":0.569599986076355},{"id":"https://openalex.org/C52173422","display_name":"Opcode","score":0.4984000027179718},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.4968000054359436},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4819999933242798},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.4666000008583069},{"id":"https://openalex.org/C75608658","display_name":"Pascal (unit)","score":0.45969998836517334},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.4404999911785126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2602.23065","title":"LLM-Powered Silent Bug Fuzzing in Deep Learning Libraries via Versatile and Controlled Bug Transfer","url":"http://arxiv.org/abs/2602.23065","published":"2026-04-10","authors":["Kunpeng Zhang","Dongwei Xiao","Daoyuan Wu","Shuai Wang","Jiali Zhao","Yuanyi Lin","Tongtong Xu","shuai wang"],"abstract":"Deep learning (DL) libraries are widely used in critical applications, where even subtle silent bugs can lead to 
serious consequences. While existing DL fuzzing techniques have made progress in detecting crashes, they inherently struggle to detect silent bugs due to the lack of effective test programs and corresponding oracles. Building on the observation that historical bug reports contain rich, underutilized information about silent bugs, we leverage large language models (LLMs) to perform versatile yet controlled bug transfer for silent bug fuzzing. Specifically, our approach uses LLMs to extract context-aware bug patterns from historical issues, match semantically related Application Programming Interfaces (APIs) using functionality-based embeddings, and synthesize test cases with customized oracles. This enables proactive detection of silent bugs by transferring high-risk contexts a...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3798258","openalex_id":"https://openalex.org/W7131838316","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Central University of Finance and Economics","Hong Kong University of Science and Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C111065885","display_name":"Fuzz testing","score":0.8109999895095825},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7455000281333923},{"id":"https://openalex.org/C55166926","display_name":"Oracle","score":0.5630999803543091},{"id":"https://openalex.org/C1009929","display_name":"Software bug","score":0.5054000020027161},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5042999982833862},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.4855000078678131},{"id":"https://openalex.org/C108583219","display_name":"Deep 
learning","score":0.45249998569488525},{"id":"https://openalex.org/C168065819","display_name":"Debugging","score":0.3921999931335449}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/human-ai-collaboration-field-experiment","title":"Scaffolding Human-AI Collaboration: A Field Experiment on Behavioral Protocols and Cognitive Reframing","url":"https://www.microsoft.com/en-us/research/publication/human-ai-collaboration-field-experiment/","published":"2026-04-09","authors":["Alex Farach","Alexia Cambon","Lev Tankelevitch","Connie Hsueh","Rebecca Janssen"],"abstract":"Organizations have widely deployed generative AI tools, yet productivity gains remain uneven, suggesting that how people use AI matters as much as whether they have access. We conducted a field experiment with 388 employees at a Fortune 500 retailer to test two scaffolding interventions for human-AI collaboration. All participants had access to the same AI tool; we varied only the structure surrounding its use. A behavioral scaffolding intervention (a structured protocol requiring joint AI use within pairs) was associated with lower document quality relative to unstructured use and substantially lower document production. A cognitive scaffolding intervention (partnership training that reframed AI as a thought partner) was associated with higher individual document quality at the top of the distribution. 
Treatment participants also showed greater positive belief change across the session,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Economics","Human-computer interaction","Social sciences","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-gaze-to-guidance-interpreting-and-adapting-to-users-cognitive-needs-with-multimodal-gaze-aware-ai-assistants","title":"From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants","url":"https://www.microsoft.com/en-us/research/publication/from-gaze-to-guidance-interpreting-and-adapting-to-users-cognitive-needs-with-multimodal-gaze-aware-ai-assistants/","published":"2026-04-09","authors":["Valdemar Danry","Javier Hernandez","Andrew D. Wilson","Pattie Maes","Judith Amores"],"abstract":"Current LLM assistants are powerful at answering questions, but they have limited access to the behavioral context that reveals when and where a user is struggling. We present a gaze-grounded multimodal LLM assistant that uses egocentric video with gaze overlays to identify likely points of difficulty and target follow-up retrospective assistance. We instantiate this vision in a controlled study (n=36) comparing the gaze-aware AI assistant to a text-only LLM assistant. 
Compared to a conventional LLM assistant, the gaze-aware assistant was rated as significantly more accurate and personalized in its assessments of users'reading behavior and significantly improved people's ability to recall information. Users spoke significantly fewer words with the gaze-aware assistant, indicating more efficient interactions. Qualitative results underscored both perceived benefits in comprehension and cha...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Artificial intelligence","Human-computer interaction","accessibility","Computer science","LLM","personalized","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/to-copilot-and-beyond-22-ai-systems-developers-want-built","title":"To Copilot and Beyond: 22 AI Systems Developers Want Built","url":"https://www.microsoft.com/en-us/research/publication/to-copilot-and-beyond-22-ai-systems-developers-want-built/","published":"2026-04-09","authors":["Rudrajit Choudhuri","Christian Bird","Carmen Badea","Anita Sarma"],"abstract":"Developers spend roughly one-tenth of their workday writing code, yet most AI tooling targets that fraction. This paper asks what should be built for the rest. We surveyed 860 Microsoft developers to understand where they want AI support, and where they want it to stay out. Using a human-in-the-loop, multi-model council-based thematic analysis, we identify 22 AI systems that developers want built across five task categories. 
For each, we describe the problem it solves, what makes it hard to build, and the constraints developers place on its behavior. Our findings point to a growing right-shift burden in AI-assisted development: developers wanted systems that embed quality signals earlier in their workflow to keep pace with accelerating code generation, while enforcing explicit authority scoping, provenance, uncertainty signaling, and least-privilege access throughout. This tension reveal...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","Human Computer Interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/faithful-grpo-improving-visual-spatial-reasoning-in-multimodal-language-models-via-constrained-policy-optimization","title":"Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization","url":"https://www.microsoft.com/en-us/research/publication/faithful-grpo-improving-visual-spatial-reasoning-in-multimodal-language-models-via-constrained-policy-optimization/","published":"2026-04-09","authors":["Sai Srinivas Kancheti","Aditya Kanade","Rohit Sinha","Vineeth N Balasubramanian","Tanuja Ganu"],"abstract":"Multimodal reasoning models (MRMs) trained with reinforcement learning with verifiable rewards (RLVR) show improved accuracy on visual reasoning benchmarks. 
However, we observe that accuracy gains often come at the cost of reasoning quality: generated Chain-of-Thought (CoT) traces are frequently inconsistent with the final answer and poorly grounded in the visual evidence. We systematically study this phenomenon across seven challenging real-world spatial reasoning benchmarks and find that it affects contemporary MRMs such as ViGoRL-Spatial, TreeVGR as well as our own models trained with standard Group Relative Policy Optimization (GRPO). We characterize CoT reasoning quality along two complementary axes:\"logical consistency\"(does the CoT entail the final answer?) and\"visual grounding\"(does each reasoning step accurately describe objects, attributes, and spatial relationships in the imag...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","Multimodal Large Language Models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/entropy-gradient-grounding-training-free-evidence-retrieval-in-vision-language-models","title":"Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models","url":"https://www.microsoft.com/en-us/research/publication/entropy-gradient-grounding-training-free-evidence-retrieval-in-vision-language-models/","published":"2026-04-09","authors":["Marcel Gropl","Jaewoo Jung","Seungryong Kim","Marc Pollefeys","Sung‐Jin Hong"],"abstract":"Despite rapid progress, 
pretrained vision-language models still struggle when answers depend on tiny visual details or on combining clues spread across multiple regions, as in documents and compositional queries. We address this by framing grounding as test-time evidence retrieval: given a query, the model should actively identify where to look next to resolve ambiguity. To this end, we propose a training-free, model-intrinsic grounding method that uses uncertainty as supervision. Specifically, we compute the entropy of the model's next-token distribution and backpropagate it to the visual token embeddings to obtain an entropy-gradient relevance map, without auxiliary detectors or attention-map heuristics. We then extract and rank multiple coherent regions to support multi-evidence queries, and introduce an iterative zoom-and-reground procedure with a spatial-entropy stopping rule to avo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/avgen-bench-a-task-driven-benchmark-for-multi-granular-evaluation-of-text-to-audio-video-generation","title":"AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video 
Generation","url":"https://www.microsoft.com/en-us/research/publication/avgen-bench-a-task-driven-benchmark-for-multi-granular-evaluation-of-text-to-audio-video-generation/","published":"2026-04-09","authors":["Ziwei Zhou","Zeyuan Lai","Rui Wang","Yifan Yang","Zhening Xing","Yuqing Yang","Qi Dai","Lili Qiu","Chong Luo"],"abstract":"Text-to-Audio-Video (T2AV) generation is rapidly becoming a core interface for media creation, yet its evaluation remains fragmented. Existing benchmarks largely assess audio and video in isolation or rely on coarse embedding similarity, failing to capture the fine-grained joint correctness required by realistic prompts. We introduce AVGen-Bench, a task-driven benchmark for T2AV generation featuring high-quality prompts across 11 real-world categories. To support comprehensive assessment, we propose a multi-granular evaluation framework that combines lightweight specialist models with Multimodal Large Language Models (MLLMs), enabling evaluation from perceptual quality to fine-grained semantic controllability. 
Our evaluation reveals a pronounced gap between strong audio-visual aesthetics and weak semantic reliability, including persistent failures in text rendering, speech coherence, phy...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","media"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-generates-well-liked-but-templatic-empathic-responses","title":"AI generates well-liked but templatic empathic responses","url":"https://www.microsoft.com/en-us/research/publication/ai-generates-well-liked-but-templatic-empathic-responses/","published":"2026-04-09","authors":["Emma S. Gueorguieva","Hongli Zhan","Jina Suh","Javier Hernandez","Tatiana Lau","Junyi Jessy Li","Desmond C. Ong"],"abstract":"Recent research shows that greater numbers of people are turning to Large Language Models (LLMs) for emotional support, and that people rate LLM responses as more empathic than human-written responses. We suggest a reason for this success: LLMs have learned and consistently deploy a well-liked template for expressing empathy. We develop a taxonomy of 10 empathic language\"tactics\"that include validating someone's feelings and paraphrasing, and apply this taxonomy to characterize the language that people and LLMs produce when writing empathic responses. 
Across a set of 2 studies comparing a total of n = 3,265 AI-generated (by six models) and n = 1,290 human-written responses, we find that LLM responses are highly formulaic at a discourse functional level. We discovered a template -- a structured sequence of tactics -- that matches between 83--90% of LLM responses (and 60--83% in a held out...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial intelligence","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/afgnn-api-misuse-detection-using-graph-neural-networks-and-clustering","title":"AFGNN: API Misuse Detection using Graph Neural Networks and Clustering","url":"https://www.microsoft.com/en-us/research/publication/afgnn-api-misuse-detection-using-graph-neural-networks-and-clustering/","published":"2026-04-09","authors":["P. Pirapuraj","Tamalika Mondal","Sharanya Gupta","Akash Lal","Somak Aditya","Jyothi Vedurada"],"abstract":"Application Programming Interfaces (APIs) are crucial to software development, enabling integration of existing systems with new applications by reusing tried and tested code, saving development time and increasing software safety. In particular, the Java standard library APIs, along with numerous third-party APIs, are extensively utilized in the development of enterprise application software. However, their misuse remains a significant source of bugs and vulnerabilities. 
Furthermore, due to the limited examples in the official API documentation, developers often rely on online portals and generative AI models to learn unfamiliar APIs, but using such examples may introduce unintentional errors in the software. In this paper, we present AFGNN, a novel Graph Neural Network (GNN)-based framework for efficiently detecting API misuses in Java code. AFGNN uses a novel API Flow Graph (AFG) repr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Programming languages and software engineering","Computer science","Graph neural networks"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:c624402669d847e6","title":"Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning","url":"https://ai.meta.com/research/publications/think-in-strokes-not-pixels-process-driven-image-generation-via-interleaved-reasoning/","published":"2026-04-09","authors":["Lei Zhang","Junjiao Tian","Zhipeng Fan","Kunpeng Li","Jialiang Wang","Weifeng Chen","Markos Georgopoulos","Felix Xu","Yuxiao Bao","Julian McAuley","Manling Li","Zecheng He"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Human & Machine Intelligence","Computer 
Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=1"}},{"id":"apple:jxutaf43ugmay1lt2shoieif","title":"LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss","url":"https://machinelearning.apple.com/research/lacy","published":"2026-04-09","authors":["Szilvia Ujváry","Louis Béthune","Pierre Ablin","João Monteiro","Marco Cuturi","Michael Kirchhof"],"abstract":"This paper was accepted at the Workshop on Memory for LLM-Based Agentic Systems at ICLR.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","memory"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"hf-org-paper:Tencent-Hunyuan:2604.05072","title":"Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling","url":"https://huggingface.co/papers/2604.05072","published":"2026-04-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"arxiv:2510.05598","title":"AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents","url":"http://arxiv.org/abs/2510.05598","published":"2026-04-09","authors":["Mingdai Yang","Nurendra Choudhary","Jiangshu Du","Edward W Huang","Philip S. Yu","Karthik Subbian","Danai Koutra"],"abstract":"Recent agent-based recommendation frameworks aim to simulate user behaviors by incorporating memory mechanisms and prompting strategies, but they struggle with hallucinating non-existent items and full-catalog ranking. Besides, a largely underexplored opportunity lies in leveraging LLMs' commonsense reasoning to capture user intent through substitute and complement relationships between items, which are usually implicit in datasets and difficult for traditional ID-based recommenders to capture. In this work, we propose a novel LLM-agent framework, AgentDR, which bridges LLM reasoning with scalable recommendation tools. Our approach delegates full-ranking tasks to traditional models while utilizing LLMs to (i) integrate multiple recommendation outputs based on personalized tool suitability and (ii) reason over substitute and complement relationships grounded in user history. 
This design m...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792304","openalex_id":"https://openalex.org/W4414977855","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","personalized","memory","agent"],"author_affiliations":["Amazon (United States)","University of Illinois Chicago","University of Michigan"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7829999923706055},{"id":"https://openalex.org/C112313634","display_name":"Complement (music)","score":0.7117999792098999},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.5831999778747559},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.5324000120162964},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5315999984741211},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.5220999717712402},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5160999894142151},{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.47699999809265137}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.21603","title":"Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research","url":"http://arxiv.org/abs/2510.21603","published":"2026-04-09","authors":["Kuicai Dong","Shurui Huang","Fangda Ye","Wei Han","Zhi Zhang","Dexun Li","Wenjun Li","Qu Yang","Gang Wang","Yichao Wang","Chen Zhang","Yong Liu"],"abstract":"Deep Research systems have revolutionized how LLMs solve complex questions through iterative reasoning and evidence gathering. 
However, current systems remain fundamentally constrained to textual web data, overlooking the vast knowledge embedded in multimodal documents: scientific papers, technical reports, and financial documents where critical information exists in figures, tables, charts, and equations. Processing such documents demands sophisticated parsing to preserve visual semantics, intelligent chunking to maintain structural coherence, and adaptive retrieval across modalities, which are capabilities absent in existing systems. In response, we present Doc-Researcher, a unified system that bridges this gap through three integrated components: (i) deep multimodal parsing that preserves layout structure and visual semantics while creating multi-granular representations from chunk to...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792599","openalex_id":"https://openalex.org/W4417328505","cited_by_count":0,"quality_score":49,"matched_keywords":["retrieval","agent","multi-agent"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8300999999046326},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.6004999876022339},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5835999846458435},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.5038999915122986},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.49790000915527344},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.4957999885082245},{"id":"https://openalex.org/C185798385","display_name":"Benchmark 
(surveying)","score":0.39800000190734863},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3783999979496002}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152615548","title":"Aligning Query Rewriting with Human Cognition and Preference in E-Commerce Search","url":"https://doi.org/10.1145/3774904.3792806","published":"2026-04-09","authors":["Ruize Ou","Kai Wang","Jianzhi Shao","Tao Zhang","Chengfu Huo"],"abstract":"In e-commerce search, the diverse ways in which users express intentions lead to lexical and semantic gaps between queries and product descriptions, making query rewriting (QR) indispensable for improving matching efficiency. With the development of LLMs, QR has evolved from discriminative approaches to various LLM-based alignment methods. However, these methods typically treat all queries uniformly, without fundamentally distinguishing their rewriting difficulty or underlying linguistic issues, making the rewritten query deviate from human expectations. To address this limitation, we propose AWHCP (Aligning with Human Cognition and Preference), a novel framework that adopts a human-centric perspective and introduces the Problem–Intention–Fix–Rewrite (PIFR) paradigm. 
Built upon PIFR, AWHCP establishes a multi-granularity alignment training framework that simultaneously aligns with both s...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792806","openalex_id":"https://openalex.org/W7152615548","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","preference","retrieval"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6406000256538391},{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.609000027179718},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5479999780654907},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49000000953674316},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.46309998631477356},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39629998803138733},{"id":"https://openalex.org/C192028432","display_name":"Query language","score":0.3504999876022339},{"id":"https://openalex.org/C2989070954","display_name":"Database query","score":0.29649999737739563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.08048","title":"TaoSR-AGRL: Adaptive Guided Reinforcement Learning Framework for E-commerce Search Relevance","url":"http://arxiv.org/abs/2510.08048","published":"2026-04-09","authors":["Jianhui Yang","Yiming Jin","Pengkun Jiao","Chenhe Dong","Zerui Huang","Shaowei Yao","Xuejun Zhou","Dan Ou","Haihong Tang"],"abstract":"Query-product relevance prediction is fundamental to e-commerce search and has become even more critical in the era of 
AI-powered shopping, where semantic understanding and complex reasoning directly shape the user experience and business conversion. Large Language Models (LLMs) enable generative, reasoning-based approaches, typically aligned via supervised fine-tuning (SFT) or preference optimization methods like Direct Preference Optimization (DPO). However, the increasing complexity of business rules and user queries exposes the inability of existing methods to endow models with robust reasoning capacity for long-tail and challenging cases. Efforts to address this via reinforcement learning strategies like Group Relative Policy Optimization (GRPO) often suffer from sparse terminal rewards, offering insufficient guidance for multi-step reasoning and slowing convergence. To address thes...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792799","openalex_id":"https://openalex.org/W4416401411","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","preference"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.8583999872207642},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7670999765396118},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7502999901771545},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5958999991416931},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5523999929428101},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.508899986743927},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.4796000123023987},{"id":"https://openalex.org/C22367795","display_name":"Structured 
prediction","score":0.3926999866962433}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.08618","title":"SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support","url":"https://arxiv.org/abs/2604.08618","published":"2026-04-09","authors":["Xingyan Liu","Xiyue Luo","Linyu Li","Ganghong Huang","Jianfeng Liu","Honglin Qiao"],"abstract":"Deploying LLM-powered agents in enterprise scenarios such as cloud technical support demands high-quality, domain-specific skills. However, existing skill creators lack domain grounding, producing skills poorly aligned with real-world task requirements. Moreover, once deployed, there is no systematic mechanism to trace execution failures back to skill deficiencies and drive targeted refinements, leaving skill quality stagnant despite accumulating operational evidence. We introduce SkillForge, a self-evolving framework that closes an end-to-end creation-evaluation-refinement loop. To produce well-aligned initial skills, a Domain-Contextualized Skill Creator grounds skill synthesis in knowledge bases and historical support tickets. 
To enable continuous self-optimization, a three-stage pipeline -- Failure Analyzer, Skill Diagnostician, and Skill Optimizer -- automatically diagnoses executio...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154426910","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.7190999984741211},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6912000179290771},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6322000026702881},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6241000294685364},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5942999720573425},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5881999731063843},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5788000226020813},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5374000072479248}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152608437","title":"ST-LEGO: Large Language Models as Modular Architects for Traffic Prediction","url":"https://doi.org/10.1145/3774904.3792132","published":"2026-04-09","authors":["Shuhao Li","Weidong Yang","Yue Cui","Lipeng Ma","Yixuan Li","C. -H. 
Wu","Lu Qin","Fan Zhang"],"abstract":"Traffic prediction serves as a cornerstone for systems and network services such as the Web of Vehicles (WoV), online navigation, and smart city applications. Despite the proliferation of model architectures in recent years, existing approaches often suffer from highly customized structures and weak transferability, making it difficult to cope with increasing task heterogeneity and modeling complexity. To address these challenges, we propose ST-LEGO, a modular assembly framework driven by large language models (LLMs) that supports flexible structural composition and automated code generation. ST-LEGO employs a multi-agent collaborative system comprising a Prompt Agent, Assemble Agent, and Code Agent, which are responsible for understanding task requirements, dynamically assembling structural modules, and automatically generating executable PyTorch code. By introducing a standardized modu...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792132","openalex_id":"https://openalex.org/W7152608437","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Alibaba Group (China)","Fudan University","Guangzhou University","University of Technology Sydney","Zhuhai Fudan Innovation Research Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6262000203132629},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.5615000128746033},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3425000011920929},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.28040000796318054},{"id":"https://openalex.org/C179603123","display_name":"Modeling 
language","score":0.26829999685287476},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.2671000063419342},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.26409998536109924},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.2639999985694885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152570016","title":"Reinforcement Learning from Human and AI Feedback for Large Language Model Alignment: A Review","url":"https://doi.org/10.63503/j.ijssic.2026.234","published":"2026-04-09","authors":["Tanay Chowdhury"],"abstract":"Safe and effective deployment of AI requires that large language models (LLMs) generate it in a way that complies with human values and preferences. The application of Reinforcement Learning from Human Feedback (RLHF) has effectively been applied to fine-tune models on human judgment-based categories, enhancing the helpfulness, coherence, and safety. Nonetheless, RLHF suffers certain limitations, such as using high-quality human labels, being costly, slow to iterate, and not being consistent owing to the subjectivity of annotators. The Reinforcement Learning AI Feedback (RLAIF) has become a scalable and effective method of resolving the challenges. The RLAIF provides an opportunity to use AI-generated preferences, revisions, and reward modeling to automatically fine-tune LLMs without violating ethical and safety standards. 
This will decrease human efforts, enhance reproducibility, and en...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63503/j.ijssic.2026.234","openalex_id":"https://openalex.org/W7152570016","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","personalization"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7483999729156494},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7145000100135803},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.6535999774932861},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5187000036239624},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.46140000224113464},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44760000705718994},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4343999922275543},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.4341000020503998}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152482019","title":"ReSuMe: Retriever-Summarizer Mutual Enhancement via Reinforcement Learning","url":"https://doi.org/10.1145/3774904.3792959","published":"2026-04-09","authors":["Owais Makroo","Nikhil Pattisapu","Karan Gupta","Ankit Gandhi","Vijay Huddar","Atul Saroop"],"abstract":"We present ReSuMe, a general framework for mutual enhancement of dense retrieval systems and document summarizers through reinforcement learning. 
The framework jointly optimizes a language model for generating retrieval-oriented summaries and adapts the retrieval model to these summaries through alternating fine-tuning phases. We employ Group Relative Policy Optimization (GRPO) to fine-tune the language model based on retrieval relevance rather than linguistic quality alone, while the retrieval model is iteratively updated using contrastive learning on the generated summaries. This co-optimization process addresses the fundamental distribution shift problem that arises when retrieval models trained on full documents must operate on synthetic summaries during inference. By progressively reducing this distribution gap, our framework yields two key benefits: improved retrieval performance a...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792959","openalex_id":"https://openalex.org/W7152482019","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Amazon (United States)","Indian Institute of Technology Kharagpur"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5041999816894531},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45890000462532043},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.3977000117301941},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.29580000042915344},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.29420000314712524},{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.29249998927116394},{"id":"https://openalex.org/C165064840","display_name":"Matching 
(statistics)","score":0.2612999975681305},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2493000030517578}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152696460","title":"Probe-and-Fetch: Dynamic KV Cache Pruning for Accelerated Long-Context Inference in Web-Scale AI Search","url":"https://doi.org/10.1145/3774904.3792794","published":"2026-04-09","authors":["Yuchen Li","Rui Kong","Xinran Chen","Chengzhe Zhang","Jiamin Chen","Cheng Deng","Xinyu Ma","Haojie Zhang","Tianhao Peng","Hengyi Cai","Shuaiqiang Wang","Jiashu Zhao"],"abstract":"Generative inference with Large Language Models (LLMs) is the cornerstone of web-scale AI search, where queries are answered using vast, heterogeneous documents retrieved via Retrieval-Augmented Generation (RAG). This paradigm is critically bottlenecked by the cost of self-attention mechanism on long context. The sheer diversity of retrieved web content (multi-sourced, multi-lingual, multi-faceted) makes simple Key-Value (KV) cache optimizations with pre-fixed subsets ineffective, demanding a dynamic, content-aware approach. This challenge, however, introduces a classic chicken-and-egg problem: the model cannot foresee the necessary KV entries for attention without first inferring on the content, yet doing so on the full context is prohibitively expensive. 
This paper introduces P&F, a unified framework that resolves this dilemma through a core ''probe-and-fetch'' mechanism, which ingenio...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792794","openalex_id":"https://openalex.org/W7152696460","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","retrieval"],"author_affiliations":["Baidu (China)","City University of Hong Kong","Shanghai Jiao Tong University","University College London","University of Edinburgh","Wilfrid Laurier University","York University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7085999846458435},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5497999787330627},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.5458999872207642},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.48989999294281006},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.42010000348091125},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.32350000739097595},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.32249999046325684},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3208000063896179}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.07809","title":"PolicyLong: Towards On-Policy Context Extension","url":"http://arxiv.org/abs/2604.07809","published":"2026-04-09","authors":["Junlong Jia","Ziyang Chen","Xing Wu","Chaochen Gao","TingHao Yu","Feng Zhang","Songlin Hu"],"abstract":"Extending LLM context windows is hindered by scarce high-quality long-context 
data. Recent methods synthesize data with genuine long-range dependencies via information-theoretic verification, selecting contexts that reduce a base model's predictive entropy. However, their single-pass offline construction with a fixed model creates a fundamental off-policy gap: the static screening landscape misaligns with the model's evolving capabilities, causing the training distribution to drift. We propose PolicyLong, shifting data construction towards a dynamic on-policy paradigm. By iteratively re-executing data screening (entropy computation, retrieval, and verification) using the current model, PolicyLong ensures the training distribution tracks evolving capabilities, yielding an emergent self-curriculum. Crucially, both positive and hard negative contexts derive from the current model's entropy....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7153670765","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.7649999856948853},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7063999772071838},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.541100025177002},{"id":"https://openalex.org/C2778029271","display_name":"Extension (predicate logic)","score":0.5347999930381775},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4431000053882599},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4187999963760376},{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of 
time)","score":0.4074000120162964},{"id":"https://openalex.org/C42058472","display_name":"Base (topology)","score":0.3619000017642975}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2507.23581","title":"GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning","url":"http://arxiv.org/abs/2507.23581","published":"2026-04-09","authors":["Caiyang Yu","Kuo Zhao","Yuhan Li","Heng Chang","Mingjian Feng","Xiangzhe Jiang","Yufei Sun","Jia Li","Yuzhi Zhang","Qingyun Sun","Jianxin Li","Ziwei Zhang"],"abstract":"Graph Retrieval-Augmented Generation (GraphRAG) has shown great effectiveness in enhancing the reasoning abilities of Large Language Models (LLMs) by leveraging graph structures for knowledge representation and modeling complex real-world relationships. However, existing GraphRAG methods still face significant bottlenecks when handling complex problems that require multi-hop reasoning, as their query and retrieval phases are largely based on pre-defined heuristics and do not fully utilize the reasoning potentials of LLMs. To address this problem, we propose GraphRAG-R1, an adaptive GraphRAG framework by training LLMs with process-constrained outcome-based reinforcement learning (RL) to enhance the multi-hop reasoning ability. Our method can decompose complex problems, autonomously invoke retrieval tools to acquire necessary information, and perform effective reasoning. 
Specifically, we u...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3774904.3792589","openalex_id":"https://openalex.org/W4414921206","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Beihang University","Guangzhou HKUST Fok Ying Tung Research Institute","Hong Kong University of Science and Technology","Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Nankai University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7423999905586243},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.659500002861023},{"id":"https://openalex.org/C127705205","display_name":"Heuristics","score":0.65420001745224},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5519999861717224},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5432000160217285},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5005999803543091},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4749999940395355},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.33180001378059387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152432201","title":"Generalized Pseudo-Relevance Feedback","url":"https://doi.org/10.1145/3774904.3792078","published":"2026-04-09","authors":["Yiteng Tu","Weihang Su","Yujia Zhou","Yiqun Liu","Fen Lin","Qin Liu","Qingyao Ai"],"abstract":"Query rewriting is a fundamental technique in information retrieval (IR). 
It typically employs the retrieval result as relevance feedback to refine the query and thereby addresses the vocabulary mismatch between user queries and relevant documents. Traditional pseudo-relevance feedback (PRF) and its vector-based extension (VPRF) improve retrieval performance by leveraging top-retrieved documents as relevance feedback. However, they are constructed based on two major hypotheses: the relevance assumption (top documents are relevant) and the model assumption (rewriting methods need to be designed specifically for particular model architectures). While recent large language models (LLMs)-based generative relevance feedback (GRF) enables model-free query reformulation, it either suffers from severe LLM hallucination or, again, relies on the relevance assumption to guarantee the effectiveness....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792078","openalex_id":"https://openalex.org/W7152432201","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2779532271","display_name":"Relevance feedback","score":0.8190000057220459},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.7781999707221985},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7717999815940857},{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.6582000255584717},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5633000135421753},{"id":"https://openalex.org/C99016210","display_name":"Query expansion","score":0.5411999821662903},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.4796999990940094},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4327999949455261}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152578024","title":"Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA","url":"https://doi.org/10.1145/3774904.3792813","published":"2026-04-09","authors":["Xing Tang","Hao Chen","Shiwei Li","Fuyuan Lyu","Weijie Shi","Lingjie Li","Dugang Liu","Weihong Luo","Xiku Du","Xiuqiang He"],"abstract":"Large language models (LLMs) have been incorporated into numerous industrial applications. Meanwhile, a vast array of API assets is scattered across various functions in the financial domain. An online financial question-answering system can leverage both LLMs and private APIs to provide timely financial analysis and information. The key is equipping the LLM model with function calling capability tailored to a financial scenario. However, a generic LLM requires customized financial APIs to call and struggles to adapt to the financial domain. Additionally, online user queries are diverse and contain out-of-distribution parameters compared with the required function input parameters, which makes it more difficult for a generic LLM to serve online users. 
In this paper, we propose a data-driven pipeline to enhance function calling in LLM for our online, deployed financial QA, comprising data...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792813","openalex_id":"https://openalex.org/W7152578024","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Hong Kong University of Science and Technology","Huazhong University of Science and Technology","McGill University","Shenzhen Technology University","Shenzhen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5853000283241272},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.5806999802589417},{"id":"https://openalex.org/C10138342","display_name":"Finance","score":0.4383000135421753},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.35409998893737793},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3237000107765198},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.30649998784065247},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.3025999963283539},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.2808000147342682}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152650418","title":"Smart Eye: LLM-Guided Proposer-Verifier Framework for Industrial-Scale Log Anomaly Detection","url":"https://doi.org/10.1145/3774904.3792804","published":"2026-04-09","authors":["Changhua Pei","Hang Cui","Jingjing Li","Yuxuan 
Li","Zihan Liu","Xinyuan Liao","Cenjie Hu","Jiabao Wang","Zheyuan Li","Zexin Wang","Haotian Si","Ke Xiang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792804","openalex_id":"https://openalex.org/W7152650418","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Chinese Academy of Sciences","Computer Network Information Center","Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Shenyang Institute of Automation","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4864000082015991},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.4178999960422516},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3686000108718872},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32420000433921814},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.29260000586509705},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.29089999198913574},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.2904999852180481},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.28139999508857727}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152604881","title":"Real-Time Trend Prediction via Continually-Aligned LLM Query Generation","url":"https://doi.org/10.1145/3774904.3792950","published":"2026-04-09","authors":["Zijing Hui","Wenhan Lyu","Shusen Wang","Li Chen","Chu 
Wang"],"abstract":"","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792950","openalex_id":"https://openalex.org/W7152604881","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5942999720573425},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3598000109195709},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3549000024795532},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.27379998564720154},{"id":"https://openalex.org/C165838908","display_name":"Calibration","score":0.26510000228881836},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.26179999113082886},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.25040000677108765},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.2459000051021576}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.07784","title":"PLUM: Adapting Pre-trained Language Models for Industrial-scale Generative Recommendations","url":"http://arxiv.org/abs/2510.07784","published":"2026-04-09","authors":["R. He","Lukasz Heldt","Lichan Hong","Raghunandan H. Keshavan","Shifan Mao","Nikhil Mehta","Zhengyang Su","Alicia Y. Tsai","Yueqi Wang","Shao-Chuan Wang","Xinyang Yi","Lexi Baugher"],"abstract":"Large Language Models (LLMs) pose a new paradigm of modeling and computation for information tasks. 
Recommendation systems are a critical application domain poised to benefit significantly from the sequence modeling capabilities and world knowledge inherent in these large models. In this paper, we introduce PLUM, a framework designed to adapt pre-trained LLMs for industry-scale recommendation tasks. PLUM consists of item tokenization using Semantic IDs, continued pre-training (CPT) on domain-specific data, and task-specific fine-tuning for recommendation objectives. For fine-tuning, we focus particularly on generative retrieval, where the model is directly trained to generate Semantic IDs of recommended items based on user context. We conduct comprehensive experiments on large-scale internal video recommendation datasets. Our results demonstrate that PLUM achieves substantial improvement...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792802","openalex_id":"https://openalex.org/W4415318445","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Google (United States)","Mountain View College"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8109999895095825},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5605000257492065},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5537999868392944},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5277000069618225},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5242000222206116},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4880000054836273},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.48750001192092896},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4471000134944916}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152706713","title":"NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations","url":"https://doi.org/10.1145/3774904.3792797","published":"2026-04-09","authors":["Yejing Wang","Shengyu Zhou","Jinyu Lu","Ziwei Liu","Langming Liu","Maolin Wang","Wenlin Zhang","Feng Li","Wenbo Su","Pengjie Wang","Jian Xu","Xiangyu Zhao"],"abstract":"Generative Recommendation (GR), powered by Large Language Models (LLMs), represents a promising new paradigm for industrial recommender systems. However, their practical application is severely hindered by high inference latency, making them infeasible for high-throughput, real-time services and limiting their overall business impact. While Speculative Decoding (SD) has been proposed to accelerate the autoregressive generation process, existing implementations introduce new bottlenecks: they typically require separate draft models and model-based verifiers, which require additional training and increase latency overhead. In this paper, we address these challenges with NEZHA, a novel architecture that achieves hyperspeed decoding for GR systems without sacrificing recommendation quality. 
Specifically, NEZHA integrates a nimble autoregressive draft head directly into the primary model, ena...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792797","openalex_id":"https://openalex.org/W7152706713","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","City University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6682000160217285},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.5098999738693237},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4706999957561493},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4650999903678894},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.3483000099658966},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.3292999863624573},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.32519999146461487},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.30889999866485596}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152446058","title":"ARCHER: Shooting Straight in Multimodal E-Commerce Search at Alibaba with Progressive Alignment","url":"https://doi.org/10.1145/3774904.3792819","published":"2026-04-09","authors":["Maolin Wang","Lang Fu","Jun Chu","Kai Guo","Chen‐Jie Qin","Xinxin Wang","Siyu Wu","Wen Jiang","Xiangyu Zhao"],"abstract":"In the rapidly evolving landscape of e-commerce, visual search has become a cornerstone of user experience, 
enabling customers to find products using images rather than traditional text queries. However, a comprehensive analysis reveals a persistent challenge: nearly half of retrieval failures stem from systems that prioritize superficial visual similarity over semantic relevance, resulting in frustrating user experiences where searches return visually similar but functionally different products. This limitation becomes particularly acute in Business-to-Business environments, where incorrect product recommendations can have significant operational and safety implications. In this paper, we propose a novel solution, Adaptive Retrieval with Category-aware Hierarchical sEmantic Refinement (ARCHER), which presents a novel multimodal retrieval framework that addresses these challenges through...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792819","openalex_id":"https://openalex.org/W7152446058","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","City University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5755000114440918},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5436000227928162},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5357000231742859},{"id":"https://openalex.org/C199639397","display_name":"Engineering drawing","score":0.3149999976158142},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.31369999051094055},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3043000102043152},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.27059999108314514},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.25200000405311584}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152501276","title":"AFE-Master: Enhancing LLM-Driven Autonomous Feature Engineering with Domain-Specific Language Parsing and Guided Local Search","url":"https://doi.org/10.1145/3774904.3792816","published":"2026-04-09","authors":["Hebin Liang","Jianye Hao","Jinyi Liu","Yi Ma","Zilin Cao","Jing Liang","Kun Shao","Zhaocheng Du","Fei Ni","Yifu Yuan","Yan Zheng"],"abstract":"Autonomous Feature Engineering (AFE) is critical for improving predictive performance on tabular data by relieving humans from manual feature crafting. However, traditional AFE lacks the semantic guidance needed to fully exploit domain knowledge. Although large language models (LLMs) can, in principle, emulate experts, existing approaches typically operate in an open code space that directly generates and rewrites entire features; without a compositional structural representation and invariant constraints, edits are coarse and non-local, making it hard to distill interpretable features with high information content and rich hierarchical structure.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792816","openalex_id":"https://openalex.org/W7152501276","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Imperial College London","Shanxi University","Tianjin University","Tianjin University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.6776000261306763},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6089000105857849},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5192999839782715},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.45969998836517334},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44209998846054077},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.3165000081062317},{"id":"https://openalex.org/C2778827112","display_name":"Feature engineering","score":0.30239999294281006},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2824999988079071}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2511.12932","title":"Text2Traffic: a text-to-image generation and editing method for traffic scenes","url":"http://arxiv.org/abs/2511.12932","published":"2026-04-09","authors":["Feng Lv","Haoxuan Feng","Zilu Zhang","Chunlong Xia","Y. Li"],"abstract":"With the rapid advancement of intelligent transportation systems, text-driven image generation and editing techniques have demonstrated significant potential in providing rich, controllable visual scene data for applications such as traffic monitoring and autonomous driving. However, several challenges remain, including insufficient semantic richness of generated traffic elements, limited camera viewpoints, low visual fidelity of synthesized images, and poor alignment between textual descriptions and generated content. To address these issues, we propose a unified text-driven framework for both image generation and editing, leveraging a controllable mask mechanism to seamlessly integrate the two tasks. 
Furthermore, we incorporate both vehicle-side and roadside multi-view data to enhance the geometric diversity of traffic scenes. Our training strategy follows a two-stage paradigm: first,....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/12.3109071","openalex_id":"https://openalex.org/W4417192525","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beijing Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7994999885559082},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.5649999976158142},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4945000112056732},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4821000099182129},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.4352000057697296},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4291999936103821},{"id":"https://openalex.org/C113364801","display_name":"High fidelity","score":0.38909998536109924},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.38119998574256897}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152719604","title":"Fate: <u>Fa</u> s <u>s</u> s <u>sE</u> sdge Inference of Mixture-of-Experts Models via Cross-Layer Gate","url":"https://doi.org/10.1145/3774904.3792527","published":"2026-04-09","authors":["Zhiyuan Fang","Xingfan Yu","Yuegui Huang","Zicong Hong","Yufeng Lyu","Wuhui Chen","Yue Yu","Yu 
Fan"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792527","openalex_id":"https://openalex.org/W7152719604","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Huawei Technologies (China)","Peng Cheng Laboratory","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5511999726295471},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.42080000042915344},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37450000643730164},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.37369999289512634},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.29750001430511475},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.25369998812675476},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.25279998779296875},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.24449999630451202}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.04088","title":"Embedding Enhancement via Fine-Tuned Language Models for Learner-Item Cognitive Modeling","url":"http://arxiv.org/abs/2604.04088","published":"2026-04-09","authors":["Yuanhao Liu","Zihan Zhou","Kaiying Wu","Shuo Liu","Yiyang Huang","Jiajun Guo","Aimin Zhou","Hong Qian"],"abstract":"Learner-item cognitive modeling plays a central role in the web-based online intelligent education system by enabling cognitive diagnosis (CD) across diverse online 
educational scenarios. Although ID embedding remains the mainstream approach in cognitive modeling due to its effectiveness and flexibility, recent advances in language models (LMs) have introduced new possibilities for incorporating rich semantic representations to enhance CD performance. This highlights the need for a comprehensive analysis of how LMs enhance embeddings through semantic integration across mainstream CD tasks. This paper identifies two key challenges in fully leveraging LMs in existing work: Misalignment between the training objectives of LMs and CD models creates a distribution gap in feature spaces; A unified framework is essential for integrating textual embeddings across varied CD tasks while preserving....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792542","openalex_id":"https://openalex.org/W7152024169","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["East China Normal University","Shanghai Normal University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7878000140190125},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.6208999752998352},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5246000289916992},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49470001459121704},{"id":"https://openalex.org/C161407221","display_name":"Cognitive model","score":0.4860999882221222},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.46560001373291016},{"id":"https://openalex.org/C63479239","display_name":"Robustness 
(evolution)","score":0.4632999897003174},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.4244000017642975}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152510654","title":"Distribution-Aligned Synthetic Text Generation via Tail-Aware Enhancement","url":"https://doi.org/10.1145/3774904.3792520","published":"2026-04-09","authors":["Yuan Fan","Xiaoyuan Liu","Bo Liu","Wubing Wang","Jia Sun","Wenzhi Chen","Huaikang Fang","Lifeng Tao","Fan Mo"],"abstract":"Recent advances in generative AI have popularized synthetic content for training, offering a practical alternative to costly data curation while addressing privacy concerns. However, accumulating evidence shows that the indiscriminate reuse of synthetic data can induce model collapse—a degenerative process that contracts the learned distribution and erodes rare features. For instance, when models are iteratively trained on their own synthetic outputs, the upper tail of the perplexity distribution substantially compresses, with high-percentile values dropping by nearly half—a clear indicator of severe diversity loss.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774904.3792520","openalex_id":"https://openalex.org/W7152510654","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5529000163078308},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4083000123500824},{"id":"https://openalex.org/C204321447","display_name":"Natural 
language processing","score":0.2992999851703644},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2671000063419342},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.2563000023365021},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.24729999899864197},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.23569999635219574},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.21940000355243683}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.08124","title":"Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search","url":"http://arxiv.org/abs/2604.08124","published":"2026-04-09","authors":["Chuzhan Hao","Wenfeng Feng","Guochao Jiang","Guofeng Quan","Guohua Liu","Yuewei Zhang"],"abstract":"Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcome rewards, leading to inefficient reasoning trajectories and unstable training. To address these issues, we propose a novel framework, Hierarchical Experience (HiExp), to enhance the performance and training stability of search agents. Specifically, we extract empirical knowledge through contrastive analysis and a multi-level clustering mechanism, transforming raw reasoning trajectories into hierarchical experience knowledge. 
By leveraging experience-aligned training, we effectively regularize stochastic exploration, evolving it into a strategic and experience-driven search...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7153669831","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7008000016212463},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6144000291824341},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6103000044822693},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.5670999884605408},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5263000130653381},{"id":"https://openalex.org/C148220186","display_name":"Outcome (game theory)","score":0.4796999990940094},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.4706000089645386},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.4383000135421753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bridging-natural-language-and-interactive-what-if-interfaces-via-llm-generated-declarative-specification","title":"Bridging Natural Language and Interactive What-If Interfaces via LLM-Generated Declarative 
Specification","url":"https://www.microsoft.com/en-us/research/publication/bridging-natural-language-and-interactive-what-if-interfaces-via-llm-generated-declarative-specification/","published":"2026-04-08","authors":["Sneha Gathani","Sirui Zeng","Diya Patel","Ryan A. Rossi","Dan Marshall","Çağatay Demiralp","Steven Drucker","Zhicheng Liu"],"abstract":"What-if analysis (WIA) is an iterative, multi-step process where users explore and compare hypothetical scenarios by adjusting parameters, applying constraints, and scoping data through interactive interfaces. Current tools fall short of supporting effective interactive WIA: spreadsheet and BI tools require time-consuming and laborious setup, while LLM-based chatbot interfaces are semantically fragile, frequently misinterpret intent, and produce inconsistent results as conversations progress. To address these limitations, we present a two-stage workflow that translates natural language (NL) WIA questions into interactive visual interfaces via an intermediate representation, powered by the Praxa Specification Language (PSL): first, LLMs generate PSL specifications from NL questions capturing analytical intent and logic, enabling validation and repair of erroneous specifications; and secon...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Computer science","Natural language","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flowinoneunifying-multimodal-generation-as-image-in-image-out-flow-matching","title":"FlowInOne:Unifying Multimodal Generation as Image-in, Image-out Flow Matching","url":"https://www.microsoft.com/en-us/research/publication/flowinoneunifying-multimodal-generation-as-image-in-image-out-flow-matching/","published":"2026-04-08","authors":["Junchao Yi","Rui Zhao","Jiahao Tang","Weixian Lei","Linjie Li","Qi Su","Zhengyuan Yang","Lijuan Wang","Xiaofeng Zhu","Alex Jinpeng Wang"],"abstract":"Multimodal generation has long been dominated by text-driven pipelines where language dictates vision but cannot reason or create within it. We challenge this paradigm by asking whether all modalities, including textual descriptions, spatial layouts, and editing instructions, can be unified into a single visual representation. We present FlowInOne, a framework that reformulates multimodal generation as a purely visual flow, converting all inputs into visual prompts and enabling a clean image-in, image-out pipeline governed by a single flow matching model. This vision-centric formulation naturally eliminates cross-modal alignment bottlenecks, noise scheduling, and task-specific architectural branches, unifying text-to-image generation, layout-guided editing, and visual instruction following under one coherent paradigm. 
To support this, we introduce VisPrompt-5M, a large-scale dataset of 5...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/does-a-global-perspective-help-prune-sparse-moes-elegantly","title":"Does a Global Perspective Help Prune Sparse MoEs Elegantly?","url":"https://www.microsoft.com/en-us/research/publication/does-a-global-perspective-help-prune-sparse-moes-elegantly/","published":"2026-04-08","authors":["Zeliang Zhang","Nikhil Ghosh","Jiani Liu","Bin Yu","Xiaodong Liu"],"abstract":"Empirical scaling laws for language models have encouraged the development of ever-larger LLMs, despite their growing computational and memory costs. Sparse Mixture-of-Experts (MoEs) offer a promising alternative by activating only a subset of experts per forward pass, improving efficiency without sacrificing performance. However, the large number of expert parameters still leads to substantial memory consumption. Existing pruning methods typically allocate budgets uniformly across layers, overlooking the heterogeneous redundancy that arises in sparse MoEs. We propose GRAPE (Global Redundancy-Aware Pruning of Experts), a global pruning strategy that dynamically allocates pruning budgets based on cross-layer redundancy. 
Experiments on Mixtral-8x7B, Mixtral-8x22B, DeepSeek-MoE, Qwen-MoE, and GPT-OSS show that, under the same pruning budget, GRAPE consistently achieves the best average perfo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/training-free-spatially-grounded-geometric-shape-encoding-technical-report","title":"Training-free Spatially Grounded Geometric Shape Encoding (Technical Report)","url":"https://www.microsoft.com/en-us/research/publication/training-free-spatially-grounded-geometric-shape-encoding-technical-report/","published":"2026-04-08","authors":["Yuhan He"],"abstract":"Positional encoding has become the de facto standard for grounding deep neural networks on discrete point-wise positions, and it has achieved remarkable success in tasks where the input can be represented as a one-dimensional sequence. However, extending this concept to 2D spatial geometric shapes demands carefully designed encoding strategies that account not only for shape geometry and pose, but also for compatibility with neural network learning. 
In this work, we address these challenges by introducing a training-free, general-purpose encoding strategy, dubbed XShapeEnc, that encodes an arbitrary spatially grounded 2D geometric shape into a compact representation exhibiting five favorable properties, including invertibility, adaptivity, and frequency richness. Specifically, a 2D spatially grounded geometric shape is decomposed into its normalized geometry within the unit disk and its....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1430","title":"Not all tokens contribute equally to diffusion learning","url":"https://seed.bytedance.com/en/research/not-all-tokens-contribute-equally-to-diffusion-learning","published":"2026-04-08","authors":["Guoqing Zhang","Lu Shi","Wanru Xu","Linna Zhang","Sen Wang","Fangfang Wang","Yigang Cen"],"abstract":"With the rapid development of conditional diffusion models, significant progress has been made in text-to-video generation. However, we observe that these models often neglect semantically important tokens during inference, leading to biased or incomplete generations under classifier-free guidance. We attribute this issue to two key factors: distributional bias caused by the long-tailed token frequency in training data, and spatial misalignment in cross-attention where semantically important tokens are overshadowed by less informative ones. 
To address these issues, we propose Distribution-Aware Rectification and Spatial Ensemble (DARE), a unified framework that improves semantic guidance in diffusion models from the perspectives of distributional debiasing and spatial consistency. First, we introduce Distribution-Rectified Classifier-Free Guidance (DR-CFG), which regularizes the training...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:9f0cb8dcf5d75f3d","title":"Veo 3.1 Lite Model Card","url":"https://deepmind.google/models/model-cards/veo-3-1-lite/","published":"2026-04-08","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Veo 3.1 Lite"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"hf-org-paper:tencent:2604.08364","title":"MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style 
Mapping","url":"https://huggingface.co/papers/2604.08364","published":"2026-04-08","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7152600598","title":"Retrieval-Augmented Generation (RAG) for Large Language Models: A Comprehensive Survey","url":"https://doi.org/10.63503/j.ijaimd.2026.233","published":"2026-04-08","authors":["Tanay Chowdhury"],"abstract":"The Retrieval-Augmented Generation (RAG) paradigm has been proposed as a potent concept that would increase the factual accuracy, reliability, and adaptability of Large Language Models (LLMs) by incorporating external information retrieval and text generation. In comparison to an independent LLM, which leverages solely parametric knowledge, RAG dynamically accesses non-parametric knowledge sources during inference that are up-to-date and are founded on the evidence that was accessed to generate the response. This paper summarizes the foundations of RAG, its architecture, major components (retriever and generator) and an indexing-retrieval-generation methodology. 
It critically examines retrieval strategies in which sparse, dense and hybrid retrieval are examined with their efficiency trade-offs, semantic insight, interpretability and application to domain-specific tasks including healthca...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63503/j.ijaimd.2026.233","openalex_id":"https://openalex.org/W7152600598","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.8242999911308289},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7670000195503235},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6586999893188477},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.6388999819755554},{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.5509999990463257},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.54339998960495},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5426999926567078},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4253999888896942}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.07484","title":"ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training","url":"https://arxiv.org/abs/2604.07484","published":"2026-04-08","authors":["Yu Liang","Liangxin Liu","Longzheng Wang","Yan Wang","Yueyang Zhang","Long Xia","Zhiyuan Sun","Daiting Shi"],"abstract":"Generative reward models (GRMs) have emerged as a promising 
approach for aligning Large Language Models (LLMs) with human preferences by offering greater representational capacity and flexibility than traditional scalar reward models. However, GRMs face two major challenges: reliance on costly human-annotated data restricts scalability, and self-training approaches often suffer from instability and vulnerability to reward hacking. To address these issues, we propose ConsistRM, a self-training framework that enables effective and stable GRM training without human annotations. ConsistRM incorporates the Consistency-Aware Answer Reward, which produces reliable pseudo-labels with temporal consistency, thereby providing more stable model optimization. Moreover, the Consistency-Aware Critique Reward is introduced to assess semantic consistency across multiple critiques and allocates fine-grain...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7153669768","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7150999903678894},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6480000019073486},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.6182000041007996},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5950000286102295},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5845000147819519},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5837000012397766},{"id":"https://openalex.org/C39890363","display_name":"Generative 
grammar","score":0.5181000232696533},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5130000114440918}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/human-values-matter-investigating-how-misalignment-shapes-collective-behaviors-in-llm-agent-communities","title":"Human Values Matter: Investigating How Misalignment Shapes Collective Behaviors in LLM Agent Communities","url":"https://www.microsoft.com/en-us/research/publication/human-values-matter-investigating-how-misalignment-shapes-collective-behaviors-in-llm-agent-communities/","published":"2026-04-07","authors":["Xiangxu Zhang","Jiaming Wang","Qinlin Zhao","Hanze Guo","Linzhuo Li","Jing Yao","Xiao Zhou","Xiaoyuan Yi","Xing Xie"],"abstract":"As LLMs become increasingly integrated into human society, evaluating their orientations on human values from social science has drawn growing attention. Nevertheless, it is still unclear why human values matter for LLMs, especially in LLM-based multi-agent systems, where group-level failures may accumulate from individually misaligned actions. We ask whether misalignment with human values alters the collective behavior of LLM agents and what changes it induces? In this work, we introduce CIVA, a controlled multi-agent environment grounded in social science theories, where LLM agents form a community and autonomously communicate, explore, and compete for resources, enabling systematic manipulation of value prevalence and behavioral analysis. Through comprehensive simulation experiments, we reveal three key findings. 
(1) We identify several structurally critical values that substantially....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computation and Language","Computer science","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-reasoning-as-trajectories-step-specific-representation-geometry-and-correctness-signals","title":"LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals","url":"https://www.microsoft.com/en-us/research/publication/llm-reasoning-as-trajectories-step-specific-representation-geometry-and-correctness-signals/","published":"2026-04-07","authors":["Lihao Sun","Hang Dong","Bo Qiao","Qingwei Lin","Dongmei Zhang","Saravan Rajmohan"],"abstract":"This work characterizes large language models'chain-of-thought generation as a structured trajectory through representation space. We show that mathematical reasoning traverses functionally ordered, step-specific subspaces that become increasingly separable with layer depth. This structure already exists in base models, while reasoning training primarily accelerates convergence toward termination-related subspaces rather than introducing new representational organization. While early reasoning steps follow similar trajectories, correct and incorrect solutions diverge systematically at late stages. 
This late-stage divergence enables mid-reasoning prediction of final-answer correctness with ROC-AUC up to 0.87. Furthermore, we introduce trajectory-based steering, an inference-time intervention framework that enables reasoning correction and length control based on derived ideal trajectories...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computation and Language","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agentopt-v0-1-technical-report-client-side-optimization-for-llm-based-agent","title":"AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent","url":"https://www.microsoft.com/en-us/research/publication/agentopt-v0-1-technical-report-client-side-optimization-for-llm-based-agent/","published":"2026-04-07","authors":["Wenyue Hua","Sripad Karne","Qian Xie","Armaan Agrawal","Nikos Pagonas","Kostis Kaffes","Tianyi Peng"],"abstract":"AI agents are increasingly deployed in real-world applications, including systems such as Manus, OpenClaw, and coding agents. Existing research has primarily focused on server-side efficiency, proposing methods such as caching, speculative execution, traffic scheduling, and load balancing to reduce the cost of serving agentic workloads. 
However, as users increasingly construct agents by composing local tools, remote APIs, and diverse models, an equally important optimization problem arises on the client side. Client-side optimization asks how developers should allocate the resources available to them, including model choice, local tools, and API budget across pipeline stages, subject to application-specific quality, cost, and latency constraints. Because these objectives depend on the task and deployment setting, they cannot be determined by server-side systems alone. We introduce....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","AI agents","Computer science","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ragen-2-reasoning-collapse-in-agentic-rl","title":"RAGEN-2: Reasoning Collapse in Agentic RL","url":"https://www.microsoft.com/en-us/research/publication/ragen-2-reasoning-collapse-in-agentic-rl/","published":"2026-04-07","authors":["Zihan Wang","Chi Gui","Xing Jin","Qineng Wang","Licheng Liu","Kangrui Wang","Shiqi Chen","Linjie Li","Zhengyuan Yang","Pingyue Zhang","Yiping Lu","Jiajun Wu"],"abstract":"RL training of multi-turn LLM agents is inherently unstable, and reasoning quality directly determines task performance. Entropy is widely used to track reasoning stability. However, entropy only measures diversity within the same input, and cannot tell whether reasoning actually responds to different inputs. 
In RAGEN-2, we find that even with stable entropy, models can rely on fixed templates that look diverse but are input-agnostic. We call this template collapse, a failure mode invisible to entropy and all existing metrics. To diagnose this failure, we decompose reasoning quality into within-input diversity (Entropy) and cross-input distinguishability (Mutual Information, MI), and introduce a family of mutual information proxies for online diagnosis. Across diverse tasks, mutual information correlates with final performance much more strongly than entropy, making it a more reliable pr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/artificial-intelligence-and-the-structure-of-mathematics","title":"Artificial Intelligence and the Structure of Mathematics","url":"https://www.microsoft.com/en-us/research/publication/artificial-intelligence-and-the-structure-of-mathematics/","published":"2026-04-07","authors":["Maissam Barkeshli","Michael R. Douglas","Michael Freedman"],"abstract":"Recent progress in artificial intelligence (AI) is unlocking transformative capabilities for mathematics. There is great hope that AI will help solve major open problems and autonomously discover new mathematical concepts. 
In this essay, we further consider how AI may open a grand perspective on mathematics by forging a new route, complementary to mathematical logic, to understanding the global structure of formal proofs. We begin by providing a sketch of the formal structure of mathematics in terms of universal proof and structural hypergraphs and discuss questions this raises about the foundational structure of mathematics. We then outline the main ingredients and provide a set of criteria to be satisfied for AI models capable of automated mathematical discovery. As we send AI agents to traverse Platonic mathematical worlds, we expect they will teach us about th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Mathematics","Computer science","mathematics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/toward-consistent-world-models-with-multi-token-prediction-and-latent-semantic-enhancement","title":"Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement","url":"https://www.microsoft.com/en-us/research/publication/toward-consistent-world-models-with-multi-token-prediction-and-latent-semantic-enhancement/","published":"2026-04-07","authors":["Qimin Zhong","Hao Liao","Haiming Qin","Mingyang Zhou","Rui Mao","Wei Chen","Naipeng Chao"],"abstract":"Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. 
While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical perspective analyzing the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes the convergence toward internal belief states by inducing representational contractivity via gradient coupling. However, we reveal that standard MTP often suffers from structural hallucinations, where discrete token supervision encourages illegal shortcuts in latent space that violate environmental constraints. To address this, we propose a novel method Latent Semantic Enhancement MTP (LSE-MTP), which anchors predictions to ground-truth hidden state trajectories....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2604.07430","title":"HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents","url":"https://huggingface.co/papers/2604.07430","published":"2026-04-07","authors":["Tencent/Hunyuan"],"abstract":"We introduce HY-Embodied-0.5, a family of foundation models specifically designed for real-world embodied agents. 
To bridge the gap between general Vision-Language Models (VLMs) and the demands of embodied agents, our models are developed to enhance the core capabilities required by embodied intelligence: spatial and temporal visual perception, alongside advanced embodied reasoning for prediction, interaction, and planning. The HY-Embodied-0.5 suite comprises two primary variants: an efficient model with 2B activated parameters designed for edge deployment, and a powerful model with 32B activated parameters targeted for complex reasoning. To support the fine-grained visual perception essential for embodied tasks, we adopt a Mixture-of-Transformers (MoT) architecture to enable modality-specific computing. By incorporating latent tokens, this design effectively enhances the perceptual repr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","efficient","distillation"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"arxiv:2604.06425","title":"Neural Computers","url":"https://huggingface.co/papers/2604.06425","published":"2026-04-07","authors":["Mingchen Zhuge","Changsheng Zhao","Haozhe Liu","Zijian Zhou","Shuming Liu","Wenyi Wang","Ernie Chang","Gael Le Lan","Junjie Fei","Wenxuan Zhang","Yasheng Sun","Zhipeng Cai"],"abstract":"We propose a new frontier: Neural Computers (NCs) -- an emerging machine form that unifies computation, memory, and I/O in a learned runtime state. 
Unlike conventional computers, which execute explicit programs, agents, which act over external execution environments, and world models, which learn environment dynamics, NCs aim to make the model itself the running computer. Our long-term goal is the Completely Neural Computer (CNC): the mature, general-purpose realization of this emerging machine form, with stable execution, explicit reprogramming, and durable capability reuse. As an initial step, we study whether early NC primitives can be learned solely from collected I/O traces, without instrumented program state. Concretely, we instantiate NCs as video models that roll out screen frames from instructions, pixels, and user actions (when available) in CLI and GUI settings. These implemen...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["memory","long-term"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:baidu:2604.05643","title":"Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs","url":"https://huggingface.co/papers/2604.05643","published":"2026-04-06","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/baidu/papers"}},{"id":"apple:ill4tft5bpuqkln7celsm28k","title":"SQUIRE: Interactive UI Authoring via Slot QUery Intermediate REpresentations","url":"https://machinelearning.apple.com/research/squire","published":"2026-04-06","authors":["Alan Leung","Ruijia Cheng","Jason Wu","Jeffrey Nichols","Titus Barik"],"abstract":"Frontend developers create UI prototypes to evaluate alternatives, which is a time-consuming process of repeated iteration and refinement. Generative AI code assistants enable rapid prototyping simply by prompting through a chat interface rather than writing code. However, while this interaction gives developers flexibility since they can write any prompt they wish, it makes it challenging to control what is generated. First, natural language on...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2604.05265","title":"Semantic Reality: Interactive Context-Aware Visualization of Inter-Object Relationships in Augmented Reality","url":"http://arxiv.org/abs/2604.05265","published":"2026-04-06","authors":["Xiaoan Liu","Eric J. Gonzalez","Nels Numan","Andrea Colaço","Lucy Abramyan","Chen Zhu-Tian","Ryo Suzuki","Mar Gonzalez-Franco"],"abstract":"Bridging the physical and digital world through interaction remains a core challenge in augmented reality (AR). Existing systems target single objects, limiting support for planning, comparison, and assembly tasks that depend on relationships among multiple items. 
We present Semantic Reality, an AR system focused on surfacing inter-object connectivity and making it interactive. Leveraging multimodal reasoning, spatial anchoring, and physical action recognition, Semantic Reality maintains a persistent model of objects around the user and their relationships. Connections are visualized in-situ to highlight compatibility, reveal next steps, and reduce ambiguity during tasks. We contribute a connectivity-centered interaction paradigm and a system architecture that couples anchor tracking, action sensing, and model inference to construct a live connectivity graph. In an exploratory study comp...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7152934335","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","University of Colorado Boulder","University of Colorado System","University of Minnesota System"],"concepts":[{"id":"https://openalex.org/C153715457","display_name":"Augmented reality","score":0.7926999926567078},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7548999786376953},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6779000163078308},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.6141999959945679},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.6071000099182129},{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.5676000118255615},{"id":"https://openalex.org/C2780791683","display_name":"Action 
(physics)","score":0.5149999856948853},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.45590001344680786}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-art-of-building-verifiers-for-computer-use-agents","title":"The Art of Building Verifiers for Computer Use Agents","url":"https://www.microsoft.com/en-us/research/publication/the-art-of-building-verifiers-for-computer-use-agents/","published":"2026-04-05","authors":["Corby Rosset","Pratyusha Sharma","Andrew Zhao","M. González-Fernández","Ahmed Awadallah"],"abstract":"Verifying the success of computer use agent (CUA) trajectories is a critical challenge: without reliable verification, neither evaluation nor training signal can be trusted. In this paper, we present lessons learned from building a best-in-class verifier for web tasks we call the Universal Verifier. 
We design the Universal Verifier around four key principles: 1) constructing rubrics with meaningful, non-overlapping criteria to reduce noise; 2) separating process and outcome rewards that yield complementary signals, capturing cases where an agent follows the right steps but gets blocked or succeeds through an unexpected path; 3) distinguishing between controllable and uncontrollable failures scored via a cascading-error-free strategy for finer-grained failure understanding; and 4) a divide-and-conquer context management scheme that attends to all screenshots in a trajectory, improving rel...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Security, privacy, and cryptography","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/quantifying-trust-financial-risk-management-for-trustworthy-ai-agents","title":"Quantifying Trust: Financial Risk Management for Trustworthy AI Agents","url":"https://www.microsoft.com/en-us/research/publication/quantifying-trust-financial-risk-management-for-trustworthy-ai-agents/","published":"2026-04-05","authors":["Wenyue Hua","Tianyi Peng","Chi Wang","I. Kaufman","Bryan Lim","Chandler Fang"],"abstract":"Prior work on trustworthy AI emphasizes model-internal properties such as bias mitigation, adversarial robustness, and interpretability. 
As AI systems evolve into autonomous agents deployed in open environments and increasingly connected to payments or assets, the operational meaning of trust shifts to end-to-end outcomes: whether an agent completes tasks, follows user intent, and avoids failures that cause material or psychological harm. These risks are fundamentally product-level and cannot be eliminated by technical safeguards alone because agent behavior is inherently stochastic. To address this gap between model-level reliability and user-facing assurance, we propose a complementary framework based on risk management. Drawing inspiration from financial underwriting, we introduce the textbf{Agentic Risk Standard (ARS)}, a payment settlement standard for AI-mediated transactions. ARS....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Economics","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/effects-of-generative-ai-errors-on-user-reliance-across-task-difficulty","title":"Effects of Generative AI Errors on User Reliance Across Task Difficulty","url":"https://www.microsoft.com/en-us/research/publication/effects-of-generative-ai-errors-on-user-reliance-across-task-difficulty/","published":"2026-04-05","authors":["Jacy Reese Anthis","Hannah Cha","Solon Barocas","Alexandra Chouldechova","Jake Hofman"],"abstract":"The capabilities of artificial intelligence (AI) lie along a jagged frontier, where AI systems 
surprisingly fail on tasks that humans find easy and succeed on tasks that humans find hard. To investigate user reactions to this phenomenon, we developed an incentive-compatible experimental methodology based on diagram generation tasks, in which we induce errors in generative AI output and test effects on user reliance. We demonstrate the interface in a preregistered 3x2 experiment (N = 577) with error rates of 10%, 30%, or 50% on easier or harder diagram generation tasks. We confirmed that observing more errors reduces use, but we unexpectedly found that easy-task errors did not significantly reduce use more than hard-task errors, suggesting that people are not averse to jaggedness in this experimental setting. We encourage future work that varies task difficulty at the same time as other f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772363.3798463","openalex_id":"https://openalex.org/W7151913260","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Generative AI"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research New York City (United States)","Stanford University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2604.04020","title":"Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models","url":"http://arxiv.org/abs/2604.04020","published":"2026-04-05","authors":["Sailesh Kiran Kurra","Shiek Ruksana","Vishal Borusu"],"abstract":"This paper primarily focuses on the hallucinations caused due to AI language models(LLMs).LLMs have 
shown extraordinary Language understanding and generation capabilities .Still it has major a disadvantage hallucinations which give outputs which are factually incorrect ,misleading or unsupported by input data . These hallucinations cause serious problems in scenarios like medical diagnosis or legal reasoning.Through this work,we propose causal graph attention network (GCAN) framework that reduces hallucinations through interpretation of internal attention flow within a transformer architecture with the help of constructing token level graphs that combine self attention weights and gradient based influence scores.our method quantifies each tokens factual dependency using a new metric called the Causal Contribution Score (CCS). We further introduce a fact-anchored graph reweighting layer t...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7152331577","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","Nalsar University of Law"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6255999803543091},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.45649999380111694},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.4440999925136566},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.4352000057697296},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4291999936103821},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41449999809265137},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.4092000126838684},{"id":"https://openalex.org/C152124472","display_name":"Redundancy 
(engineering)","score":0.40790000557899475}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7151322685","title":"Schema-Aware Planning and Hybrid Knowledge Toolset for Reliable Knowledge Graph Triple Verification","url":"https://doi.org/10.48550/arxiv.2604.04190","published":"2026-04-05","authors":["Xinyan Ma","Xianhao Ou","Wen Zhang","Shixin Jiang","Runxuan Liu","Dandan Tu","Libin Chen","Ming Liu","Bing Qin"],"abstract":"Knowledge Graphs (KGs) serve as a critical foundation for AI systems, yet their automated construction inevitably introduces noise, compromising data trustworthiness. Existing triple verification methods, based on graph embeddings or language models, often suffer from single-source bias by relying on either internal structural constraints or external semantic evidence, and usually follow a static inference paradigm. As a result, they struggle with complex or long-tail facts and provide limited interpretability. To address these limitations, we propose SHARP (Schema-Hybrid Agent for Reliable Prediction), a training-free autonomous agent that reformulates triple verification as a dynamic process of strategic planning, active investigation, and evidential reasoning. 
Specifically, SHARP combines a Memory-Augmented Mechanism with Schema-Aware Strategic Planning to improve reasoning stability,...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2604.04190","openalex_id":"https://openalex.org/W7151322685","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","agent"],"author_affiliations":["Beijing Normal University","Harbin Institute of Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.8601999878883362},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7731000185012817},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5791000127792358},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5145999789237976},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.4921000003814697},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48829999566078186},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.46480000019073486},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41760000586509705}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.04269","title":"Beyond Fluency: Toward Reliable Trajectories in Agentic IR","url":"https://arxiv.org/abs/2604.04269","published":"2026-04-05","authors":["Anushree Sinha","Srikanth Ranganathan","Debanshu Das","Abhishek Dharmaratnakar"],"abstract":"Information Retrieval is shifting from passive document ranking toward autonomous agentic workflows that operate in multi-step 
Reason-Act-Observe loops. In such long-horizon trajectories, minor early errors can cascade, leading to functional misalignment between internal reasoning and external tool execution despite continued linguistic fluency. This position paper synthesizes failure modes observed in industrial agentic systems, categorizing errors across planning, retrieval, reasoning, and execution. We argue that safe deployment requires moving beyond endpoint accuracy toward trajectory integrity and causal attribution. To address compounding error and deceptive fluency, we propose verification gates at each interaction unit and advocate systematic abstention under calibrated uncertainty. Reliable Agentic IR systems must prioritize process correctness and grounded execution over plaus...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7152330729","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.7193999886512756},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6926000118255615},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.6205999851226807},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.48089998960494995},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.45669999718666077},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.45500001311302185},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.4487000107765198},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4481000006198883}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/funfact-building-probabilistic-functional-3d-scene-graphs-via-factor-graph-reasoning","title":"FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning","url":"https://www.microsoft.com/en-us/research/publication/funfact-building-probabilistic-functional-3d-scene-graphs-via-factor-graph-reasoning/","published":"2026-04-04","authors":["Zhengyu Fu","Ren'e Zurbrugg","Kaixian Qu","Marc Pollefeys","Marco Hutter","Hermann Blum","Z. Bauer"],"abstract":"Recent work in 3D scene understanding is moving beyond purely spatial analysis toward functional scene understanding. However, existing methods often consider functional relationships between object pairs in isolation, failing to capture the scene-wide interdependence that humans use to resolve ambiguity. We introduce FunFact, a framework for constructing probabilistic open-vocabulary functional 3D scene graphs from posed RGB-D images. FunFact first builds an object- and part-centric 3D map and uses foundation models to propose semantically plausible functional relations. These candidates are converted into factor graph variables and constrained by both LLM-derived common-sense priors and geometric priors. This formulation enables joint probabilistic inference over all functional edges and their marginals, yielding substantially better calibrated confidence scores. 
To benchmark this sett...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7149380049","title":"Comparative Analysis of RAG Algorithms and LLM Fine-Tuning Methods for Domain-Specific Search Tasks","url":"https://doi.org/10.37547/tajet/volume08issue04-03","published":"2026-04-04","authors":["Kapil Verma"],"abstract":"The article examines the comparative properties of Retrieval-Augmented Generation algorithms and large-language-model fine-tuning methods in the context of domain-specific search tasks with a high cost of error. The aim is to identify operating regimes in which RAG and fine-tuning differentially affect the accuracy of top-ranked results, the evidential quality of answers, and the safety of handling sensitive data. The relevance of the study is driven by the rapid growth of industrial domain-specific search systems that must simultaneously ensure knowledge updatability, strict citation-based verifiability, and regulatory discipline. 
The novelty lies in the fact that the comparison is conducted not in the abstract form of RAG versus fine-tuning, but at the level of individual pipeline components and from the perspective of operational trade-offs: it is shown that retrieval and ranking form...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.37547/tajet/volume08issue04-03","openalex_id":"https://openalex.org/W7149380049","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","efficient"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6771000027656555},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5871000289916992},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.5867000222206116},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5350000262260437},{"id":"https://openalex.org/C2778738651","display_name":"Novelty","score":0.5148000121116638},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.49480000138282776},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.49149999022483826},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.4690000116825104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vert-reliable-llm-judges-for-radiology-report-evaluation","title":"VERT: Reliable LLM Judges for Radiology Report 
Evaluation","url":"https://www.microsoft.com/en-us/research/publication/vert-reliable-llm-judges-for-radiology-report-evaluation/","published":"2026-04-03","authors":["Federica Bologna","Jean-Philippe Corbeil","Matthew Wilkens","Asma Ben Abacha"],"abstract":"Current literature on radiology report evaluation has focused primarily on designing LLM-based metrics and fine-tuning small models for chest X-rays. However, it remains unclear whether these approaches are robust when applied to reports from other modalities and anatomies. Which model and prompt configurations are best suited to serve as LLM judges for radiology evaluation? We conduct a thorough correlation analysis between expert and LLM-based ratings. We compare three existing LLM-as-a-judge metrics (RadFact, GREEN, and FineRadScore) alongside VERT, our proposed LLM-based metric, using open- and closed-source models (reasoning and non-reasoning) of different sizes across two expert-annotated datasets, RadEval and RaTE-Eval, spanning multiple modalities and anatomies. We further evaluate few-shot approaches, ensembling, and parameter-efficient fine-tuning using RaTE-Eval. 
To better und...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Algorithms","Programming languages and software engineering","Computer science","Medical Imaging","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/actionnex-a-virtual-outage-manager-for-cloud","title":"ActionNex: A Virtual Outage Manager for Cloud Computing","url":"https://www.microsoft.com/en-us/research/publication/actionnex-a-virtual-outage-manager-for-cloud/","published":"2026-04-03","authors":["Zhenfeng Lin","Hao Hu","Mingyuan Hao","Xuchao Zhang","Ryan Zhang","Junhao Li","Ze Li","Oleg Kulygin","Chetan Bansal","H. Tuna","Murali Chintalapati","Sheila Jiang"],"abstract":"Outage management in large-scale cloud operations remains heavily manual, requiring rapid triage, cross-team coordination, and experience-driven decisions under partial observability. We present \\textbf{ActionNex}, a production-grade agentic system that supports end-to-end outage assistance, including real-time updates, knowledge distillation, and role- and stage-conditioned next-best action recommendations. ActionNex ingests multimodal operational signals (e.g., outage content, telemetry, and human communications) and compresses them into critical events that represent meaningful state transitions. 
It couples this perception layer with a hierarchical memory subsystem: long-term Key-Condition-Action (KCA) knowledge distilled from playbooks and historical executions, episodic memory of prior outages, and working memory of the live context. A reasoning agent aligns current critical events....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","memory","long-term","distillation","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-tool-illusion-rethinking-tool-use-in-web-agents","title":"The Tool Illusion: Rethinking Tool Use in Web Agents","url":"https://www.microsoft.com/en-us/research/publication/the-tool-illusion-rethinking-tool-use-in-web-agents/","published":"2026-04-03","authors":["Renze Lou","Baolin Peng","Wenlin Yao","Qianhui Wu","Hao Cheng","Suman Nath","Wenpeng Yin","Jianfeng Gao"],"abstract":"As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools, their conclusions are often drawn from limited experimental scales and sometimes non-comparable settings. As a result, several fundamental questions remain unclear: i) whether tools provide consistent gains for web agents, ii) what practical design principles characterize effective tools, and iii) what side effects tool use may introduce. 
To establish a stronger empirical foundation for future research, we revisit tool use in web agents through an extensive and carefully controlled study across diverse tool sources, backbone models, tool-use frameworks, and evaluation benchmarks. Our findings both revise some prior conclusions and complement others with broader ev...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2604.04953","title":"Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity","url":"http://arxiv.org/abs/2604.04953","published":"2026-04-03","authors":["Abhishek Dharmaratnakar","Srikanth Ranganathan","Debanshu Das","Anushree Sinha"],"abstract":"The domain of automatic video trailer generation is currently undergoing a profound paradigm shift, transitioning from heuristic-based extraction methods to deep generative synthesis. While early methodologies relied heavily on low-level feature engineering, visual saliency, and rule-based heuristics to select representative shots, recent advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and diffusion-based video synthesis have enabled systems that not only identify key moments but also construct coherent, emotionally resonant narratives. 
This survey provides a comprehensive technical review of this evolution, with a specific focus on generative techniques including autoregressive Transformers, LLM-orchestrated pipelines, and text-to-video foundation models like OpenAI's Sora and Google's Veo. We analyze the architectural progression from Graph Convo...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7152933672","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","Google (Canada)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7268000245094299},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6200000047683716},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6169999837875366},{"id":"https://openalex.org/C127705205","display_name":"Heuristics","score":0.5961999893188477},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.47600001096725464},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.46209999918937683},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.4066999852657318},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.39719998836517334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7148762181","title":"Abstract 1298: Structure-sequence integration for peptide:MHC class II binding prediction using AI foundation models (AlphaFold 3 and ESM2).","url":"https://doi.org/10.1158/1538-7445.am2026-1298","published":"2026-04-03","authors":["Kamel 
Lahouel","Mete Mulazimoglu","Jorge Soria-Bustos","Kameron Bates","Erin Kelley","Lawson J. Woods","Kunjur Manasa Upadhyaya","Gonzalo J. Acevedo","Sophie Pénisson","Matteo Munini","Ehsan Variani","Margaret E. Feeney"],"abstract":"Abstract Background: Accurate prediction of CD4 T cell epitopes is essential for vaccine design and immunotherapy development but remains challenging due to MHC class II polymorphism and the complexity of antigen presentation. We present a foundation-model-based framework that integrates structural representations from AlphaFold 3 (AF3) and sequence embeddings from ESM2 within a graph neural network (GNN) to predict peptide:MHC II binding. Model and Experimental Procedures: For each peptide-MHC II complex, AF3 was used to generate a 3D structural model, from which we constructed a residue-level graph where edges represent geometric proximity. Node features combined AF3-derived spatial descriptors with contextual embeddings from ESM2 corresponding to each amino-acid token in the peptide sequence. 
The hybrid GNN was trained on experimental data from a multiplexed MHCII-PepSeq assay, a high...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1158/1538-7445.am2026-1298","openalex_id":"https://openalex.org/W7148762181","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Google (United States)","Translational Genomics Research Institute","University of California, San Francisco"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5796999931335449},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.4480000138282776},{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.4458000063896179},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4440000057220459},{"id":"https://openalex.org/C188280979","display_name":"Human leukocyte antigen","score":0.4399999976158142},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.41780000925064087},{"id":"https://openalex.org/C195616568","display_name":"Epitope","score":0.4124999940395355},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.39910000562667847}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.03136","title":"StoryScope: Investigating idiosyncrasies in AI fiction","url":"https://arxiv.org/abs/2604.03136","published":"2026-04-03","authors":["Jenna Russell","Rishanth Rajendhran","Pham, Chau Minh","Iyyer, Mohit","John Wieting"],"abstract":"As AI-generated fiction becomes increasingly prevalent, questions of authorship and originality are becoming central to how written 
work is evaluated. While most existing work in this space focuses on identifying surface-level signatures of AI writing, we ask instead whether AI-generated stories can be distinguished from human ones without relying on stylistic signals, focusing on discourse-level narrative choices such as character agency and chronological discontinuity. We propose StoryScope, a pipeline that automatically induces a fine-grained, interpretable feature space of discourse-level narrative features across 10 dimensions. We apply StoryScope to a parallel corpus of 10,272 writing prompts, each written by a human author and five LLMs, yielding 61,608 stories, each ~5,000 words, and 304 extracted features per story. Narrative features alone achieve 93.2% macro-F1 for human vs. A...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7152330698","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","University of Maryland, College Park"],"concepts":[{"id":"https://openalex.org/C199033989","display_name":"Narrative","score":0.8901000022888184},{"id":"https://openalex.org/C2780861071","display_name":"Character (mathematics)","score":0.5656999945640564},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5163000226020813},{"id":"https://openalex.org/C108170787","display_name":"Agency (philosophy)","score":0.49810001254081726},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.4975000023841858},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.47360000014305115},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.427700012922287},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.41929998993873596}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7148632004","title":"A foundation model for multivariate time series forecasting","url":"https://doi.org/10.21203/rs.3.rs-9096522/v1","published":"2026-04-03","authors":["Abdul Fatir Ansari","Oleksandr Shchur","Jaris Küken","Andreas Auer","Boran Han","Pedro Mercado","Syama Sundar Rangapuram","Huibin Shen","Lorenzo Stella","Xiyuan Zhang","Mononito Goswami","Shubham Kapoor"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-9096522/v1","openalex_id":"https://openalex.org/W7148632004","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","Amazon (United States)","University of Freiburg"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6467000246047974},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.607200026512146},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.593999981880188},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5920000076293945},{"id":"https://openalex.org/C122282355","display_name":"Probabilistic forecasting","score":0.5917999744415283},{"id":"https://openalex.org/C161584116","display_name":"Multivariate statistics","score":0.5763999819755554},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.5458999872207642},{"id":"https://openalex.org/C149782125","display_name":"Econometrics","score":0.5098999738693237}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/coral-towards-autonomous-multi-agent-evolution-for-open-ended-discovery","title":"CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery","url":"https://www.microsoft.com/en-us/research/publication/coral-towards-autonomous-multi-agent-evolution-for-open-ended-discovery/","published":"2026-04-02","authors":["Ao Qu","Handi Zheng","Zijian Zhou","Yihao Yan","Yihong Tang","S. Y. Ong","Fenglu Hong","Kai-Qing Zhou","Chonghe Jiang","Minwei Kong","Jiacheng Zhu","Xuan Jiang"],"abstract":"Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first framework for autonomous multi-agent evolution on open-ended problems. CORAL replaces rigid control with long-running agents that explore, reflect, and collaborate through shared persistent memory, asynchronous multi-agent execution, and heartbeat-based interventions. It also provides practical safeguards, including isolated workspaces, evaluator separation, resource management, and agent session and health management. 
Evaluated on diverse mathematical, algorithmic, and systems optimization tasks, CORAL sets new state-of-the-art results on 10 tasks, achieving 3-10 times hig...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Computer science","large language models","LLM","language model","memory","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/geoai-agency-primitives","title":"GeoAI Agency Primitives","url":"https://www.microsoft.com/en-us/research/publication/geoai-agency-primitives/","published":"2026-04-02","authors":["Akram Zaytar","Rohan Sawahn","Caleb Robinson","Gilles Quentin Hacheme","Girmaw Abebe Tadesse","I. Becker-Reshef","Rahul Dodhia","Juan M. Lavista Ferres"],"abstract":"We present ongoing research on agency primitives for GeoAI assistants -- core capabilities that connect Foundation models to the artifact-centric, human-in-the-loop workflows where GIS practitioners actually work. Despite advances in satellite image captioning, visual question answering, and promptable segmentation, these capabilities have not translated into productivity gains for practitioners who spend most of their time producing vector layers, raster maps, and cartographic products. The gap is not model capability alone but the absence of an agency layer that supports iterative collaboration. 
We propose a vocabulary of $9$ primitives for such a layer -- including navigation, perception, geo-referenced memory, and dual modeling -- along with a benchmark that measures human productivity. Our goal is a vocabulary that makes agentic assistance in GIS implementable, testable, and compara...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Ecology and environment","Cartography","Computer science","Computer Vision and Pattern Recognition","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/strive-structured-spatiotemporal-exploration-for-reinforcement-learning-in-video-question-answering","title":"STRIVE: Structured Spatiotemporal Exploration for Reinforcement Learning in Video Question Answering","url":"https://www.microsoft.com/en-us/research/publication/strive-structured-spatiotemporal-exploration-for-reinforcement-learning-in-video-question-answering/","published":"2026-04-02","authors":["E. Bahrami","Olga Zatsarynna","Parth Pathak","Sunando Sengupta","Juergen Gall","Mohsen Fayyaz"],"abstract":"We introduce STRIVE (SpatioTemporal Reinforcement with Importance-aware Variant Exploration), a structured reinforcement learning framework for video question answering. 
While group-based policy optimization methods have shown promise in large multimodal models, they often suffer from low reward variance when responses exhibit similar correctness, leading to weak or unstable advantage estimates. STRIVE addresses this limitation by constructing multiple spatiotemporal variants of each input video and performing joint normalization across both textual generations and visual variants. By expanding group comparisons beyond linguistic diversity to structured visual perturbations, STRIVE enriches reward signals and promotes more stable and informative policy updates. To ensure exploration remains semantically grounded, we introduce an importance-aware sampling mechanism that prioritizes frames...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","Computer Vision and Pattern Recognition","Reinforcement learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/magic-madness-heaven-sin-llm-output-diversity-is-everything-everywhere-all-at-once","title":"Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once","url":"https://www.microsoft.com/en-us/research/publication/magic-madness-heaven-sin-llm-output-diversity-is-everything-everywhere-all-at-once/","published":"2026-04-02","authors":["Harnoor Dhingra"],"abstract":"Research on Large Language Models (LLMs) studies output variation across generation, reasoning, 
alignment, and representational analysis, often under the umbrella of\"diversity.\"Yet the terminology remains fragmented, largely because the normative objectives underlying tasks are rarely made explicit. We introduce the Magic, Madness, Heaven, Sin framework, which models output variation along a homogeneity-heterogeneity axis, where valuation is determined by the task and its normative objective. We organize tasks into four normative contexts: epistemic (factuality), interactional (user utility), societal (representation), and safety (robustness). For each, we examine the failure modes and vocabulary such as hallucination, mode collapse, bias, and erasure through which variation is studied. We apply the framework to analyze all pairwise cross-contextual interactions, revealing that optimizin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-ai-generated-images-of-cultural-artifacts-with-community-informed-rubrics","title":"Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics","url":"https://www.microsoft.com/en-us/research/publication/evaluating-ai-generated-images-of-cultural-artifacts-with-community-informed-rubrics/","published":"2026-04-02","authors":["Nari Johnson","Deepthi Sudharsan","Hamna","Samantha Dalal","Theo 
Holroyd","Anja Thieme","Hoda Heidari","Daniela Massiceti","Jennifer Wortman Vaughan","Cecily Morrison"],"abstract":"Measurement is essential to improving AI performance and mitigating harms for marginalized groups. As generative AI systems are rapidly deployed across geographies and contexts, AI measurement practices must be designed to support repeatable, automatable application across different models, datasets, and evaluation settings. But the drive to automate measurement can be in tension with the ability for measurement instruments to capture the expertise and perspectives of communities impacted by AI. Recent work advocates for breaking measurement into several key stages: first moving from an abstract concept to be measured into a precise, \"systematized\" concept; next operationalizing the systematized concept into a concrete measurement instrument; and finally applying the measurement instrument on data to produce measurements. This opens up an opportunity to concentrate community engagement i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Graphics and multimedia","Computer science","Generative AI","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/livemathematicianbench-a-live-benchmark-for-mathematician-level-reasoning-with-proof-sketches","title":"LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof 
Sketches","url":"https://www.microsoft.com/en-us/research/publication/livemathematicianbench-a-live-benchmark-for-mathematician-level-reasoning-with-proof-sketches/","published":"2026-04-02","authors":["Linyang He","Qiyao Yu","Hanze Dong","Baohao Liao","Xinxing Xu","Micah Goldblum","Jiang Bian","N. Mesgarani"],"abstract":"Mathematical reasoning is a hallmark of human intelligence, and whether large language models (LLMs) can meaningfully perform it remains a central question in artificial intelligence and cognitive science. As LLMs are increasingly integrated into scientific workflows, rigorous evaluation of their mathematical capabilities becomes a practical necessity. Existing benchmarks are limited by synthetic settings and data contamination. We present LiveMathematicianBench, a dynamic multiple-choice benchmark for research-level mathematical reasoning built from recent arXiv papers published after model training cutoffs. By grounding evaluation in newly published theorems, it provides a realistic testbed beyond memorized patterns. 
The benchmark introduces a thirteen-category logical taxonomy of theorem types (e.g., implication, equivalence, existence, uniqueness), enabling fine-grained evaluation ac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","large language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dynavid-learning-to-generate-highly-dynamic-videos-using-synthetic-motion-data","title":"DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data","url":"https://www.microsoft.com/en-us/research/publication/dynavid-learning-to-generate-highly-dynamic-videos-using-synthetic-motion-data/","published":"2026-04-02","authors":["Wonjoon Jin","J. Won","Janghyeok Han","Qi Dai","Chong Luo","Seung-Hwan Baek","Sunghyun Cho"],"abstract":"Despite recent progress, video diffusion models still struggle to synthesize realistic videos involving highly dynamic motions or requiring fine-grained motion controllability. A central limitation lies in the scarcity of such examples in commonly used training datasets. To address this, we introduce DynaVid, a video synthesis framework that leverages synthetic motion data in training, which is represented as optical flow and rendered using computer graphics pipelines. This approach offers two key advantages. 
First, synthetic motion offers diverse motion patterns and precise control signals that are difficult to obtain from real data. Second, unlike rendered videos with artificial appearances, rendered optical flow encodes only motion and is decoupled from appearance, thereby preventing models from reproducing the unnatural look of synthetic videos. Building on this idea, DynaVid adopts....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:xs1ag3vmevjgeqwdhhhy5mlc","title":"Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment","url":"https://machinelearning.apple.com/research/personalized-group","published":"2026-04-02","authors":["Jialu Wang","Heinrich Peters","Asad A. Butt","Navid Hashemi","Alireza Hashemi","Pouya M. Ghari","Joseph Hoover","James Rae","Morteza Dehghani"],"abstract":"Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning with Human Feedback (RLHF), optimize for a single, global objective. 
While Group Relative Policy Optimization (GRPO) is a widely adopted on-policy reinforcement learning framework, its group-based normalization implicitly assumes that all...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["personalized","preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:c116e719fcfc389d","title":"Gemma 4 Model Card","url":"https://ai.google.dev/gemma/docs/core/model_card_4?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content","published":"2026-04-02","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemma 4"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:fb026f9c925d16bb","title":"Emotion concepts and their function in a large language model","url":"https://www.anthropic.com/research/emotion-concepts-function","published":"2026-04-02","authors":["Anthropic"],"abstract":"All modern language models sometimes act like they have emotions. What’s behind these behaviors? 
Our interpretability team investigates.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic research page https://www.anthropic.com/research"}},{"id":"arxiv:2604.01707","title":"Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework","url":"http://arxiv.org/abs/2604.01707","published":"2026-04-02","authors":["Yanchen Wu","Tenghui Lin","Yingli Zhou","Fangyuan Zhang","Qintian Guo","Xun Zhou","Sibo Wang","Xilin Liu","Yuchi Ma","Yixiang Fang"],"abstract":"Memory emerges as the core module in the large language model (LLM)-based agents for long-horizon complex tasks (e.g., multi-turn dialogue, game playing, scientific discovery), where memory can enable knowledge accumulation, iterative reasoning and self-evolution. A number of memory methods have been proposed in the literature. However, these methods have not been systematically and comprehensively compared under the same experimental settings. In this paper, we first summarize a unified framework that incorporates all the existing agent memory methods from a high-level perspective. We then extensively compare representative agent memory methods on two well-known benchmarks and examine the effectiveness of all methods, providing a thorough analysis of those methods. 
As a byproduct of our experimental analysis, we also design a new memory method by exploiting modules in the existing metho...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7149873440","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","memory","agent"],"author_affiliations":["Huawei Technologies (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8263000249862671},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.6114000082015991},{"id":"https://openalex.org/C12186640","display_name":"Memory model","score":0.4514000117778778},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38499999046325684},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.37540000677108765},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.31679999828338623},{"id":"https://openalex.org/C2779478453","display_name":"Modularity (biology)","score":0.3154999911785126},{"id":"https://openalex.org/C176649486","display_name":"Memory management","score":0.3068999946117401}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7151321456","title":"PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL","url":"https://doi.org/10.1145/3786679","published":"2026-04-02","authors":["Gaurav Tarlok Kakkar","Yeounoh Chung","Fatma Özcan","Steve Mussmann","Joy Arulraj"],"abstract":"Large language models (LLMs) have achieved strong performance on natural language to SQL (NL2SQL) tasks, but their practical effectiveness depends on tuning a complex pipeline of 
interacting components. Real-world deployments must navigate a critical trade-off between execution accuracy and monetary cost, a factor that has been largely overlooked by prior work focused primarily on maximizing accuracy. Navigating this trade-off is non-trivial: the ideal configuration of components (e.g., LLM, prompting strategy, schema linking) is not only interdependent but also highly sensitive to the target database schema. This creates a challenging, schema-aware configuration tuning problem that lacks a systematic solution. We present PRISM, a framework that systematically identifies high-accuracy, cost-efficient NL2SQL configurations tailored to each schema. Adopting an optimize-then-deploy strategy...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786679","openalex_id":"https://openalex.org/W7151321456","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Georgia Institute of Technology","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7470999956130981},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.44769999384880066},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43799999356269836},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.3961000144481659},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38940000534057617},{"id":"https://openalex.org/C510870499","display_name":"SQL","score":0.3750999867916107},{"id":"https://openalex.org/C124101348","display_name":"Data 
mining","score":0.3720000088214874},{"id":"https://openalex.org/C67666897","display_name":"Prism","score":0.3686000108718872}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7151422544","title":"Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views: [Experiments & Analysis]","url":"https://doi.org/10.1145/3786690","published":"2026-04-02","authors":["Tingyang Chen","Cong Fu","Jiahua Wu","Haotian Wu","Hua Fan","Xiangyu Ke","Yunjun Gao","Yabo Ni","Anxiang Zeng"],"abstract":"Vector Similarity Search (VSS) in high-dimensional spaces is rapidly emerging as core functionality in next-generation database systems for numerous data-intensive services -- from embedding lookups in large language models (LLMs), to semantic information retrieval and recommendation engines. Current benchmarks, however, evaluate VSS primarily on the recall-latency trade-off against a ground truth defined solely by distance metrics, neglecting how retrieval quality ultimately impacts downstream tasks. This disconnect can mislead both academic research and industrial practice. We present Iceberg, a holistic benchmark suite for end-to-end evaluation of VSS methods in realistic application contexts. 
From a task-centric view, Iceberg uncovers the Information Loss Funnel , which identifies three principal sources of end-to-end performance degradation: (1) Embedding Loss during feature extract...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786690","openalex_id":"https://openalex.org/W7151422544","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Merck (Singapore)","Nanyang Technological University","Ningbo University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7773000001907349},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6152999997138977},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.5532000064849854},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5295000076293945},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5245000123977661},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.4336000084877014},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4325000047683716},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.4311999976634979}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.01682","title":"PRISM: Probability Reallocation with In-Span Masking for Knowledge-Sensitive Alignment","url":"http://arxiv.org/abs/2604.01682","published":"2026-04-02","authors":["Chenning Xu","Mao Zheng","Mingyang Song"],"abstract":"Supervised fine-tuning (SFT) with token-level hard labels can 
amplify overconfident imitation of factually unsupported targets, causing hallucinations that propagate in multi-sentence generation. We study an augmented SFT setting in which training instances include coarse sentence-level factuality risk labels and inter-sentence dependency annotations, providing structured signals about where factual commitments are weakly supported. We propose \\textbf{PRISM}, a differentiable risk-gated framework that modifies learning only at fact-critical positions. PRISM augments standard SFT with a lightweight, model-aware probability reallocation objective that penalizes high-confidence predictions on risky target tokens, with its scope controlled by span-level risk weights and model-aware gating. Experiments on hallucination-sensitive factual benchmarks and general evaluations show that PRISM impro...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7149873681","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2777402240","display_name":"Masking (illustration)","score":0.6639000177383423},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6312999725341797},{"id":"https://openalex.org/C2778012447","display_name":"Scope (computer science)","score":0.5834000110626221},{"id":"https://openalex.org/C126388530","display_name":"Imitation","score":0.5214999914169312},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.5011000037193298},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4902999997138977},{"id":"https://openalex.org/C67666897","display_name":"Prism","score":0.4772999882698059},{"id":"https://openalex.org/C19768560","display_name":"Dependency (UML)","score":0.4490000009536743}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.01496","title":"From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents","url":"http://arxiv.org/abs/2604.01496","published":"2026-04-02","authors":["Nikolai Ludwig","Wasi Uddin Ahmad","Somshubra Majumdar","Boris Ginsburg"],"abstract":"We introduce SWE-ZERO to SWE-HERO, a two-stage SFT recipe that achieves state-of-the-art results on SWE-bench by distilling open-weight frontier LLMs. Our pipeline replaces resource-heavy dependencies with an evolutionary refinement strategy: (1) SWE-ZERO utilizes large-scale, execution-free trajectories to master code semantics and repository-level reasoning, and (2) SWE-HERO applies targeted, execution-backed refinement to transition these semantic intuitions into rigorous engineering workflows. Our empirical results set a new benchmark for open-source models of comparable size. We release a dataset of 300k SWE-ZERO and 13k SWE-HERO trajectories distilled from Qwen3-Coder-480B, alongside a suite of agents based on the Qwen2.5-Coder series. Notably, SWE-HERO-32B achieves a 62.2% resolution rate on SWE-bench Verified. 
Furthermore, despite being trained exclusively on Python, our agents d...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7149874344","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7116000056266785},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5927000045776367},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.5608000159263611},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5536999702453613},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.5306000113487244},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5152000188827515},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.5058000087738037},{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.46219998598098755}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/refinerl-advancing-competitive-programming-with-self-refinement-reinforcement-learning","title":"RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/refinerl-advancing-competitive-programming-with-self-refinement-reinforcement-learning/","published":"2026-04-01","authors":["Shaopeng Fu","Xingxing Zhang","Li Dong","Furu Wei"],"abstract":"While large language models (LLMs) have demonstrated strong 
performance on complex reasoning tasks such as competitive programming (CP), existing methods predominantly focus on single-attempt settings, overlooking their capacity for iterative refinement. In this paper, we present RefineRL, a novel approach designed to unleash the self-refinement capabilities of LLMs for CP problem solving. RefineRL introduces two key innovations: (1) Skeptical-Agent, an iterative self-refinement agent equipped with local execution tools to validate generated solutions against public test cases of CP problems. This agent always maintains a skeptical attitude towards its own outputs and thereby enforces rigorous self-refinement even when validation suggests correctness. (2) A reinforcement learning (RL) solution to incentivize LLMs to self-refine with only standard RLVR data (i.e., problems paired with the...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","large language models","Machine learning","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-binary-groundedness-to-support-relations-towards-a-reader-centred-taxonomy-for-comprehension-of-ai-output","title":"From Binary Groundedness to Support Relations: Towards a Reader-Centred Taxonomy for Comprehension of AI 
Output","url":"https://www.microsoft.com/en-us/research/publication/from-binary-groundedness-to-support-relations-towards-a-reader-centred-taxonomy-for-comprehension-of-ai-output/","published":"2026-04-01","authors":["Advait Sarkar","Christian Poelitz","Viktor Kewenig"],"abstract":"Generative AI tools often answer questions using source documents, e.g., through retrieval augmented generation. Current groundedness and hallucination evaluations largely frame the relationship between an answer and its sources as binary (the answer is either supported or unsupported). However, this obscures both the syntactic moves (e.g., direct quotation vs. paraphrase) and the interpretive moves (e.g., induction vs. deduction) performed when models reformulate evidence into an answer. This limits both benchmarking and user-facing provenance interfaces. We propose the development of a reader-centred taxonomy of grounding as a set of support relations between generated statements and source documents. We explain how this might be synthesised from prior research in linguistics and philosophy of language, and evaluated through a benchmark and human annotation protocol. 
Such a framework wo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Human-computer interaction","Human–computer interaction","Natural language processing","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/speech-llms-are-contextual-reasoning-transcribers","title":"Speech LLMs are Contextual Reasoning Transcribers","url":"https://www.microsoft.com/en-us/research/publication/speech-llms-are-contextual-reasoning-transcribers/","published":"2026-04-01","authors":["Keqi Deng","Ruchao Fan","Bo Ren","Yiming Wang","Jinyu Li"],"abstract":"Despite extensions to speech inputs, effectively leveraging the rich knowledge and contextual understanding of large language models (LLMs) in automatic speech recognition (ASR) remains non-trivial, as the task primarily involves direct speech-to-text mapping. To address this, this paper proposes chain-of-thought ASR (CoT-ASR), which constructs a reasoning chain that enables LLMs to first analyze the input speech and generate contextual analysis, thereby fully exploiting their generative capabilities. With this contextual reasoning, CoT-ASR then performs more informed speech recognition and completes both reasoning and transcription in a single pass. 
Moreover, CoT-ASR naturally supports user-guided transcription: while designed to self-generate reasoning, it can also seamlessly incorporate user-provided context to guide transcription, further extending ASR functionality. To reduce the mo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7149210512","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Audio and Acoustics","Computer science","large language models","Speech recognition","LLM"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/personal-validation-effect-in-llms-positive-ai-responses-bias-perceptions-of-validity-reliability-personalization-and-usefulness-of-fictitious-predictions","title":"Personal Validation Effect in LLMs: Positive AI Responses Bias Perceptions of Validity, Reliability, Personalization, and Usefulness of Fictitious Predictions","url":"https://www.microsoft.com/en-us/research/publication/personal-validation-effect-in-llms-positive-ai-responses-bias-perceptions-of-validity-reliability-personalization-and-usefulness-of-fictitious-predictions/","published":"2026-04-01","authors":["Pat Pataranutaporn","Eunhae Lee","Judith Amores","Pattie Maes"],"abstract":"Large Language Models (LLMs) are becoming increasingly ubiquitous in daily life, impacting decision-making across various domains. 
A substantial body of prior work has shown that individuals tend to evaluate positive predictions more favorably than negative ones---a phenomenon often referred to as the personal validation effect---across various non-AI prediction sources. Building on this foundation, we extend this well-established psychological effect to the context of LLM-based predictions, examining how prediction valence influences users’ perceptions when the source is an AI system. We investigate how positive AI-generated responses affect perceived validity, personalization, reliability, and usefulness of chatbot predictions, even when those predictions are fictitious and pre-scripted. In a study of 238 participants, positive predictions were perceived as significantly more valid (36...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Human–computer interaction","1970-01-01","LLM","personalized","personalization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/chat-should-i-leave-him-risks-rewards-and-roles-for-ai-in-relationship-advice","title":"\"Chat, Should I Leave Him?\" Risks, Rewards, and Roles for AI in Relationship Advice","url":"https://www.microsoft.com/en-us/research/publication/chat-should-i-leave-him-risks-rewards-and-roles-for-ai-in-relationship-advice/","published":"2026-04-01","authors":["Emily Tseng","Calvin A. 
Liang"],"abstract":"As more people turn to chatbots for socioemotional support—often termed psychosocial AI—the stakes of understanding these interactions grow. Psychosocial AI might foster healthier human-human relationships—and also might exacerbate loneliness, abuse, and self-harm. We provide an empirical account of one less-studied facet: seeking AI advice on sex, dating, and relationships with other people. We recruited 25 people who use AI for relationship advice to a questionnaire, collecting 90 prompts illustrating their practices. Interviews with 17 further explored how they navigate AI’s limitations to achieve intimacy goals. Our findings detail (1) the roles that users imagine for AI in relationship advice; (2) how users navigate risks like sycophancy and overreliance to attain relational benefits; and (3) the folk theories users hold and the prompting tactics they employ to overcome AI’s limitat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Social sciences","Human–computer interaction","Sociotechnical system","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/helping-me-versus-doing-it-for-me-designing-for-agency-in-llm-infused-writing-tools-for-science-journalism","title":"“Helping Me Versus Doing It for Me”: Designing for Agency in LLM-Infused Writing Tools for Science 
Journalism","url":"https://www.microsoft.com/en-us/research/publication/helping-me-versus-doing-it-for-me-designing-for-agency-in-llm-infused-writing-tools-for-science-journalism/","published":"2026-04-01","authors":["Sachita Nishal","Mina Lee","Nicholas Diakopoulos","Jennifer Wortman Vaughan"],"abstract":"Journalists rely on their agency—the ability to exercise independent judgment in alignment with their values—to fulfill their democratic social role. In this study, we investigate how LLM-infused writing tools reshape journalists’ agency in editorial decision making. In interviews with 20 science journalists, we presented four hypothetical LLM-infused writing tools representing a range of possible design space configurations. We find that journalists are selectively willing to cede control: they view AI that gathers information or offers feedback as supporting their efficiency by automating execution while leaving decision making intact. In contrast, they see AI that generates core ideas or drafts as a threat to their autonomy, skill development, self-fulfillment, and professional relationships. 
This sensitivity extends to seemingly automatable tasks such as manipulating writing voice w...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01","LLM","journalism"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/universal-yoco-for-efficient-depth-scaling","title":"Universal YOCO for Efficient Depth Scaling","url":"https://www.microsoft.com/en-us/research/publication/universal-yoco-for-efficient-depth-scaling/","published":"2026-04-01","authors":["Yutao Sun","Li Dong","Tianzhu Ye","Shaohan Huang","Jianyong Wang","Furu Wei"],"abstract":"The rise of test-time scaling has remarkably boosted the reasoning and agentic proficiency of Large Language Models (LLMs). Yet, standard Transformers struggle to scale inference-time compute efficiently, as conventional looping strategies suffer from high computational overhead and a KV cache that inflates alongside model depth. We present Universal YOCO (YOCO-U), which combines the YOCO decoder-decoder architecture with recursive computation to achieve a synergistic effect greater than either alone. Built on the YOCO framework, YOCO-U implements a Universal Self-Decoder that performs multiple iterations via parameter sharing, while confining the iterative process to shallow, efficient-attention layers. 
This combination yields a favorable capability-efficiency tradeoff that neither YOCO nor recursion achieves independently. The YOCO architecture provides a constant global KV cache and l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","large language models","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tackling-the-complexity-of-cancer-with-generative-models","title":"Tackling the complexity of cancer with generative models","url":"https://www.microsoft.com/en-us/research/publication/tackling-the-complexity-of-cancer-with-generative-models/","published":"2026-04-01","authors":["Ashley Conard","Madeline Hughes","Jimmy Hall","Neil Tenenholtz","Eric Zimmermann","Lorin Crawford","Ava P. Amini","Kristen Severson"],"abstract":"The Hallmarks of Cancer framework has played a seminal role in developing our understanding of cancer biology. By design, these hallmarks abstract cancer into a common set of functional capabilities. The hallmarks thus constitute an intentionally reductionist framework that has unified diverse observations and yielded valuable mechanistic insight, while leaving unresolved how these processes interact across scales. Complementary tools are therefore needed to capture cancer's inherently complex, multimodal, and multiscale nature. 
Here, we posit that generative models, built on the recent advances of artificial intelligence, are the key technology to capture this complexity and to thereby improve how we diagnose, understand, and intervene in cancer. Specifically, because of their ability to recognize complex patterns, process unstructured inputs, and synthesize multimodal inputs, generativ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1016/j.cell.2026.03.027","openalex_id":"https://openalex.org/W7154582675","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","Biology","Cancer","Oncology"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/not-another-ehr-reimagining-physician-information-needs-with-generative-ai-technology","title":"Not Another EHR: Reimagining Physician Information Needs with Generative AI Technology","url":"https://www.microsoft.com/en-us/research/publication/not-another-ehr-reimagining-physician-information-needs-with-generative-ai-technology/","published":"2026-04-01","authors":["Ruican Zhong","Jicahen Li","Gary Hsieh","David W. McDonald","Selin S. Everett","Alyssa Unell","Jonathan M. Carlson","Katie Claveau","Noel Codella","Khalil Malik","Scott Mackie","Eduardo Olvera"],"abstract":"This position paper explores how generative AI and dynamic user interfaces (UI) can reshape clinicians’ interactions with patient data. 
Based on interviews with physician technologists, we frame clinical work as an investigative, sensemaking process and argue that dynamic UI enables more adaptive, clinician centered ways to surface and synthesize information in real time. We examine emerging mental models of AI in clinical practice and identify key design considerations around trust, safety, and human–AI collaboration. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Human-computer interaction","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/netpress-dynamically-generated-llm-benchmarks-for-network-applications","title":"NetArena: Dynamic Benchmarks for AI Agents in Network Automation","url":"https://www.microsoft.com/en-us/research/publication/netpress-dynamically-generated-llm-benchmarks-for-network-applications/","published":"2026-04-01","authors":["Yajie Zhou","Jiajun Ruan","Eric S. Wang","Sadjad Fouladi","Francis Y. Yan","Kevin Hsieh","Zaoxing Liu"],"abstract":"As AI agents expand into high-stakes domains like network system operations, evaluating their real-world reliability becomes increasingly critical. However, existing benchmarks risk contamination due to static design, show high statistical variance from limited dataset size, and fail to reflect the complexity of production environments. 
We present NetArena, a dynamic benchmark generation framework for network applications. NetArena introduces a novel abstraction and unified interface that generalize across diverse tasks, enabling dynamic benchmarking despite the heterogeneity of network workloads. At runtime, users can generate unlimited queries on demand. NetArena integrates with network emulators to measure correctness, safety, and latency during execution. We demonstrate NetArena on three representative applications and find that (1) NetArena significantly improves statistical reliabi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learning-to-generate-unit-test-via-adversarial-reinforcement-learning","title":"Learning to Generate Unit Test via Adversarial Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/learning-to-generate-unit-test-via-adversarial-reinforcement-learning/","published":"2026-04-01","authors":["Dongjun Lee","Changho Hwang","Kimin Lee"],"abstract":"Unit testing is a core practice in programming, enabling systematic evaluation of programs produced by human developers or large language models (LLMs). 
Given the challenges in writing comprehensive unit tests, LLMs have been employed to automate unit test generation, yet methods for training LLMs to produce high-quality unit tests remain underexplored. In this work, we propose UTRL, a novel reinforcement learning (RL) framework that trains an LLM to generate high-quality unit test given a programming instruction. Our key idea is to iteratively train two LLMs, the unit test generator and the code generator, in an adversarial manner via RL: (1) the unit test generator is trained to maximize a discrimination reward, encouraging it to produce tests that reveal faults in the code generator’s solutions; and (2) the code generator is trained to maximize a code reward, encouraging it to produce...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/latent-darm-bridging-discrete-diffusion-and-autoregressive-models-for-reasoning","title":"Latent-DARM: Bridging Discrete Diffusion and Autoregressive Models for Reasoning","url":"https://www.microsoft.com/en-us/research/publication/latent-darm-bridging-discrete-diffusion-and-autoregressive-models-for-reasoning/","published":"2026-04-01","authors":["Lina Berrayana","Ahmed Heakl","Muhammad Abdullah Sohail","Thomas Hofmann","Salman Khan","Wei Chen"],"abstract":"Most 
multi-agent systems rely exclusively on autoregressive language models (ARMs) that are based on sequential generation. Although effective for fluent text, ARMs limit global reasoning and plan revision. On the other hand, Discrete Diffusion Language Models (DDLMs) enable non-sequential, globally revisable generation and have shown strong planning capabilities, but their limited text fluency hinders direct collaboration with ARMs. We introduce Latent-DARM, a latent-space communication framework bridging DDLM (planners) and ARM (executors), maximizing collaborative benefits. Across mathematical, scientific, and commonsense reasoning benchmarks, Latent-DARM outperforms text-based interfaces on average, improving accuracy from 27.0% to 36.0% on DART-5 and from 0.0% to 14.0% on AIME 2024. Latent-DARM approaches the results of state-of-the-art reasoning models while using less than 2.2% of...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/identifying-harm-in-personalized-generative-ai-systems-require-user-centered-auditing-at-the-interaction-level","title":"Identifying Harm in Personalized, Generative AI Systems Require User-Centered Auditing at the Interaction 
Level","url":"https://www.microsoft.com/en-us/research/publication/identifying-harm-in-personalized-generative-ai-systems-require-user-centered-auditing-at-the-interaction-level/","published":"2026-04-01","authors":["Hannah Cha"],"abstract":"Personalized, generative AI systems increasingly adapt their behavior to individual users over time, fundamentally changing model behavior. While existing auditing approaches have been effective at surfacing harms in non-personalized contexts, they often rely on static, simulated evaluations and definitions of harm that aggregate across broad, group categories. In this position paper, we argue that such approaches can fail to capture emergent harms in personalized generative AI systems, where harms emerge through interpretations of ongoing interaction and evolve with user history. We identify three presuppositions underlying many harm auditing paradigms: that harms can be (1) specified outside real-world interaction, (2) defined non-pluralistically within groups, and (3) treated as static. 
One might argue that personalized systems could simply learn definitions of what constitutes as har...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01","personalized","personalization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-framework-to-characterize-reporting-on-generative-ai-use","title":"A Framework to Characterize Reporting on Generative AI Use","url":"https://www.microsoft.com/en-us/research/publication/a-framework-to-characterize-reporting-on-generative-ai-use/","published":"2026-04-01","authors":["Agathe Balayn","Varun Nagaraj Rao","Su Lin Blodgett","Aylin Caliskan","Solon Barocas"],"abstract":"Unlike with traditional predictive AI models, today's generative AI models are increasingly designed to be general-purpose, able to perform a wide range of tasks. This makes it challenging to develop a reliable and useful understanding of the ways in which this technology is and could be used. As a result, academic and policy researchers and generative AI providers have started to publish the results of their own investigations about the use of generative AI. This information is, however, fragmented, potentially incomplete, sometimes ambiguous, and often lacking in methodological specificity. 
In this paper, we conducted an integrative review to build a multi-dimensional framework that specifies what kind of information about generative AI use could be reported and how, and illustrated its analytical utility by applying the framework to a collection of over 110 industry documents. Our ana...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772318.3791649","openalex_id":"https://openalex.org/W7154152607","cited_by_count":1,"quality_score":73,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Generative AI","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (Canada)","Microsoft (United States)","Microsoft Research Montréal (Canada)","Microsoft Research New York City (United States)","Princeton University","University of Washington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tools-for-thought-understanding-protecting-and-augmenting-human-cognition-with-generative-ai-from-vision-to-implementation","title":"Tools for Thought: Understanding, Protecting, and Augmenting Human Cognition with Generative AI — From Vision to Implementation","url":"https://www.microsoft.com/en-us/research/publication/tools-for-thought-understanding-protecting-and-augmenting-human-cognition-with-generative-ai-from-vision-to-implementation/","published":"2026-04-01","authors":["Zelun Tony Zhang","Nick von Felten","Leon Reicherts","Lev Tankelevitch","Zhitong Guan","Sean Rintel","Yue Fu","Jessica He","Kenneth Holstein","Advait Sarkar","Gonzalo Ramos","Anuschka 
Schmitt"],"abstract":"About this CHI 2026 workshop: GenAI radically widens the scope and capability of automation for work, learning, and creativity. While impactful, it also changes workflows, raising questions about its effects on cognition, including critical thinking and learning. Yet GenAI also offers opportunities for designing “tools for thought” (TfT) that protect and augment cognition. Such systems provoke critical thinking, provide personalized tutoring, or enable novel ways of sensemaking, among other approaches.How does GenAI change workflows and human cognition? What are opportunities and challenges for designing GenAI systems that protect and augment thinking? Which theories, perspectives, and methods are relevant? This workshop aims to develop a multidisciplinary community interested in exploring these questions to protect against the erosion, and fuel the augmentation, of human cognition using...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/quotient-space-diffusion-model","title":"Quotient-Space Diffusion Model","url":"https://www.microsoft.com/en-us/research/publication/quotient-space-diffusion-model/","published":"2026-04-01","authors":["Yixian Xu","Yusong Wang","Shengjie Luo","Kaiyuan Gao","Tianyu He","Di He","Chang Liu"],"abstract":"Diffusion-based generative models 
have reformed generative AI, and have enabled new capabilities in the science domain, for example, generating 3D structures of molecules. Due to the intrinsic problem structure of certain tasks, there is often a symmetry in the system, which identifies objects that can be converted by a group action as equivalent, hence the target distribution is essentially defined on the quotient space with respect to the group. In this work, we establish a formal framework for diffusion modeling on a general quotient space, and apply it to molecular structure generation which follows the special Euclidean group SE(3) symmetry. The framework reduces the necessity of learning the component corresponding to the group action, hence simplifies learning difficulty over conventional group-equivariant diffusion models, and the sampler guarantees recovering the target distribu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Mathematics","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-design-and-vibe-coding-rethinking-the-design-development-divide-for-ui-prototyping","title":"Generative Design and Vibe Coding: Rethinking The Design-Development Divide for UI Prototyping","url":"https://www.microsoft.com/en-us/research/publication/generative-design-and-vibe-coding-rethinking-the-design-development-divide-for-ui-prototyping/","published":"2026-04-01","authors":["Xinqi 
Zhang","Hari Subramonyam","Advait Sarkar","Ian Drosos","Jack Wang","Kyungho Lee","Veronica Pimenova","Xiang \"Anthony\" Chen","Kai Lukoff"],"abstract":"About this CHI 2026 meetup: Generative Design & Vibe Coding: Rethinking the design–development divide for UI prototyping. An interactive CHI 2026 meetup exploring how Gen-AI is reshaping prototyping across Houde & Hill's dimensions of look and feel and implementation. Through hands-on activities and reflection, we'll discuss opportunities, breakdowns, and best practices for human–AI collaboration. Prototyping has long been central to HCI as a way of knowing. Recent advances in Generative AI are reshaping who prototypes and how, blurring boundaries between designers and developers, enabling faster workflows while raising new challenges around trust, authorship, and control. Generative AI blurs the traditional boundary between design and development. Two emerging paradigms, Generative Design and Vibe Coding, are transforming the one-directional handoff into a collaborative, AI-mediated co-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Programming languages and software engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/veristruct","title":"VeriStruct: AI-assisted Automated Verification of Data-Structure Modules in 
Verus","url":"https://www.microsoft.com/en-us/research/publication/veristruct/","published":"2026-04-01","authors":["Chuyue Sun","Yican Sun","Ethan Zhang","Daneshvar Amrollahi","Shuvendu Lahiri","Shan Lu","David Dill","Clark Barrett"],"abstract":"We introduce VeriStruct, a novel framework that extends AI-assisted automated verification from single functions to more complex data structure modules in Verus. VeriStruct employs a planner module to orchestrate the systematic generation of abstractions, type invariants, specifications, and proof code. To address the challenge that LLMs often misunderstand Verus' annotation syntax and verification-specific semantics, VeriStruct embeds syntax guidance within prompts and includes a repair stage to automatically correct annotation errors. In an evaluation on eleven Rust data structure modules, VeriStruct succeeds on ten of the eleven, successfully verifying 128 out of 129 functions (99.2%) in total. These results represent an important step toward the goal of automatic AI-assisted formal verification. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-032-22749-2_6","openalex_id":"https://openalex.org/W4415909449","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","Systems and networking","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)","Peking University","Stanford University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:205","title":"Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes","url":"https://www.noahlab.com.hk/en/scientific_research/beyond-masks-efficient-flexible-diffusion-language-models-via-deletion-insertion-processes","published":"2026-04-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICLR 2026. 
External paper link: https://openreview.net/forum?id=VbvXjs5f72","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Model architecture and optimization","ICLR 2026","2026","efficient"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-diffusion-models-for-class-imbalanced-training-data-via-capacity-manipulation","title":"Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation","url":"https://www.microsoft.com/en-us/research/publication/improving-diffusion-models-for-class-imbalanced-training-data-via-capacity-manipulation/","published":"2026-04-01","authors":["Feng Hong","Jiangchao Yao","Yifei Shen","Dongsheng Li","Ya Zhang","Yanfeng Wang"],"abstract":"While diffusion models have achieved remarkable performance in image generation, they often struggle with the imbalanced datasets frequently encountered in real-world applications, resulting in significant performance degradation on minority classes. In this paper, we identify model capacity allocation as a key and previously underexplored factor contributing to this issue, providing a perspective that is orthogonal to existing research. Our empirical experiments and theoretical analysis reveal that majority classes monopolize an unnecessarily large portion of the model's capacity, thereby restricting the representation of minority classes. To address this, we propose Capacity Manipulation (CM), which explicitly reserves model capacity for minority classes. 
Our approach leverages a low-rank decomposition of model parameters and introduces a capacity manipulation loss to allocate appropri...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-the-use-of-llms-for-relevance-labelling","title":"On the Use of LLMs for Relevance Labelling","url":"https://www.microsoft.com/en-us/research/publication/on-the-use-of-llms-for-relevance-labelling/","published":"2026-04-01","authors":["Marwah Alaofi","Paul Thomas","Falk Scholer","Mark Sanderson"],"abstract":"Large Language Models (LLMs) are increasingly used to replace human judges to assess the relevance of information objects, raising concerns about circularity, bias, and whether simulated preferences can substitute for human judgement. This work presents experiments using multiple LLMs to label passages for relevance. It examines their gullibility – how easily they are misled into labelling irrelevant passages as relevant. It also compares LLMs with human judges in ranking systems, analysing differences in discriminative power and whether some systems benefit under LLM-based evaluation. Results show that LLMs are influenced by the presence of query terms, even with irrelevant or random passages. 
Moreover, LLM-generated rankings are highly correlated with those of human judges, with strong agreement on which system is better in pairwise comparisons. However, LLMs may exhibit lower discrimi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2604.00626","title":"A Survey of On-Policy Distillation for Large Language Models","url":"http://arxiv.org/abs/2604.00626","published":"2026-04-01","authors":["Mingyang Song","Mao Zheng"],"abstract":"Knowledge distillation has become a primary mechanism for transferring reasoning and domain expertise from frontier Large Language Models (LLMs) to smaller, deployable students. However, the dominant paradigm remains \\textit{off-policy}: students train on static teacher-generated data and never encounter their own errors during learning. This train--test mismatch, an instance of \\textit{exposure bias}, causes prediction errors to compound autoregressively at inference time. On-Policy Distillation (OPD) addresses this by letting the student generate its own trajectories and receive teacher feedback on these self-generated outputs, grounding distillation in the theory of interactive imitation learning. Despite rapid growth spanning divergence minimization, reward-guided learning, and self-play, the OPD literature remains fragmented with no unified treatment. 
This survey provides the first....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7149209681","cited_by_count":0,"quality_score":45,"matched_keywords":["distillation","agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6840000152587891},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.6402000188827515},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.567300021648407},{"id":"https://openalex.org/C207390915","display_name":"Divergence (linguistics)","score":0.5364999771118164},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5006999969482422},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.45969998836517334},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.44359999895095825},{"id":"https://openalex.org/C126388530","display_name":"Imitation","score":0.4002000093460083}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7147692500","title":"Surgical RARP copilot: a vision language model for robot-assisted radical prostatectomy","url":"https://doi.org/10.1038/s44484-025-00003-1","published":"2026-04-01","authors":["Wouter Bogaert","François Remy","Javier Gamazo Tejero","Sean D. Huver","Edoardo Beatrici","Frederiek D’Hondt","Niki Rashidian","Mahdi Azizian","Tony Belpaeme","A. Mottrie","Pieter De Backer"],"abstract":"Complex surgical procedures may benefit from AI systems that integrate visual and textual data for holistic scene understanding. 
We present Surgical RARP Copilot, a vision-language model for robot-assisted radical prostatectomy (RARP) that enables open question answering during surgery. We adapted a large language model to RARP literature and used it to generate a dataset of RARP images paired with ~1 million Q&A examples to train the model. Performance was evaluated for open-domain Q&A, surgical phase recognition, and instrument detection, and the system was deployed and tested in real time during a live operation—the first surgical VLM implemented in live robotic surgery. On unseen RARP procedures, Copilot showed robust performance across tasks. This work demonstrates feasible real-time AI guidance and suggests benefits for training, team communication, and knowledge support; future wo...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s44484-025-00003-1","openalex_id":"https://openalex.org/W7147692500","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Ghent University","Humanitas University","Nvidia (United States)","ORSI Academy","Onze Lieve Vrouwziekenhuis Hospital"],"concepts":[{"id":"https://openalex.org/C2779466945","display_name":"Prostatectomy","score":0.7924000024795532},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.619700014591217},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.45739999413490295},{"id":"https://openalex.org/C3019611579","display_name":"Surgical procedures","score":0.429500013589859},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.40130001306533813},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3864000141620636},{"id":"https://openalex.org/C168167062","display_name":"Component 
(thermodynamics)","score":0.37709999084472656},{"id":"https://openalex.org/C19527891","display_name":"Medical physics","score":0.3416000008583069}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:6b32a4e4ac229236","title":"Mythos Preview System Card","url":"https://www.anthropic.com/claude-mythos-preview-system-card","published":"2026-04","authors":["Anthropic"],"abstract":"Official Anthropic system card for Mythos Preview.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Mythos Preview"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"official:9a6471ae1c18144a","title":"Claude Opus 4.7 System Card","url":"https://anthropic.com/claude-opus-4-7-system-card","published":"2026-04","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Opus 4.7.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Opus 4.7"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"official:ef6b93541859556e","title":"TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language 
Models","url":"https://research.nvidia.com/publication/2026-04_timeomni-1-incentivizing-complex-reasoning-time-series-large-language-models","published":"2026-04","authors":["Tong Guan","Huck Yang","Sabato Marco Siniscalchi","Qingsong Wen","Ming Jin","Shirui Pan"],"abstract":"Official NVIDIA Research publication. ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2026&page=0"}},{"id":"official:45a350e3a4e026ab","title":"Test-Time Alignment for Large Language Models via Textual Model Predictive Control","url":"https://research.nvidia.com/publication/2026-04_test-time-alignment-large-language-models-textual-model-predictive-control","published":"2026-04","authors":["Kuang-Da Wang","Teng-Ruei Chen","Yu Heng Hung","Guo-Xun Ko","Shuoyang Ding","Frank Wang","Huck Yang","Wen-Chih Peng","Ping-Chun Hsieh"],"abstract":"Official NVIDIA Research publication. 
ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2026&page=0"}},{"id":"official:573c2216d3aead98","title":"QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding","url":"https://research.nvidia.com/publication/2026-04_qcaleval-benchmarking-vision-language-models-quantum-calibration-plot","published":"2026-04","authors":["Shuxiang Cao","Zijian Zhang","Abhishek Agarwal","Grace Bratrud","Niyaz R. Beysengulov","Daniel C. Cole","Alejandro Gomez Frieiro","Elena O. Glen","Hao Hsu","Gang Huang","Raymond Jow","Greshma Shaji"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2026&page=0"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/predicting-neuromodulation-outcome-for-parkinsons-disease-with-generative-virtual-brain-model","title":"Predicting Neuromodulation Outcome for Parkinson's Disease with Generative Virtual Brain 
Model","url":"https://www.microsoft.com/en-us/research/publication/predicting-neuromodulation-outcome-for-parkinsons-disease-with-generative-virtual-brain-model/","published":"2026-03-31","authors":["Siyuan Du","Siyi Li","Shuwei Bai","Ang Li","Haolin Li","Mingqing Xiao","Yang Pan","Dongsheng Li","Weidi Xie","Yanfeng Wang","Ya Zhang","Chencheng Zhang"],"abstract":"Parkinson's disease (PD) affects over ten million people worldwide. Although temporal interference (TI) and deep brain stimulation (DBS) are promising therapies, inter-individual variability limits empirical treatment selection, increasing non-negligible surgical risk and cost. Previous explorations either resort to limited statistical biomarkers that are insufficient to characterize variability, or employ AI-driven methods which is prone to overfitting and opacity. We bridge this gap with a pretraining-finetuning framework to predict outcomes directly from resting-state fMRI. Critically, a generative virtual brain foundation model, pretrained on a collective dataset (2707 subjects, 5621 sessions) to capture universal disorder patterns, was finetuned on PD cohorts receiving TI (n=51) or DBS (n=55) to yield individualized virtual brains with high fidelity to empirical functional connectiv...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Medical, health and genomics","Biology","Computer science","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/omnisch-a-multimodal-pcb-schematic-benchmark-for-structured-diagram-visual-reasoning","title":"OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning","url":"https://www.microsoft.com/en-us/research/publication/omnisch-a-multimodal-pcb-schematic-benchmark-for-structured-diagram-visual-reasoning/","published":"2026-03-31","authors":["Taiting Lu","Kaiyuan Lin","Yuxin Tian","Yubo Wang","Muchuan Wang","Sharique Khatri","Akshit Kartik","Yixi Wang","Amey Rane","Yida Wang","Yifan Yang","Yi-Chao Chen"],"abstract":"Recent large multimodal models (LMMs) have made rapid progress in visual grounding, document understanding, and diagram reasoning tasks. However, their ability to convert Printed Circuit Board (PCB) schematic diagrams into machine-readable spatially weighted netlist graphs, jointly capturing component attributes, connectivity, and geometry, remains largely underexplored, despite such graph representations are the backbone of practical electronic design automation (EDA) workflows. To bridge this gap, we introduce OmniSch, the first comprehensive benchmark designed to assess LMMs on schematic understanding and spatial netlist graph construction. 
OmniSch contains 1,854 real-world schematic diagrams and includes four tasks: (1) visual grounding for schematic entities, with 109.9K grounded instances aligning 423.4K diagram semantic labels to their visual regions; (2) diagram-to-graph reasonin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","Computer Vision and Pattern Recognition","Multimodal Large Language Models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/drift-aware-continual-tokenization-for-generative-recommendation","title":"Drift-Aware Continual Tokenization for Generative Recommendation","url":"https://www.microsoft.com/en-us/research/publication/drift-aware-continual-tokenization-for-generative-recommendation/","published":"2026-03-31","authors":["Yu-Hao Feng","Jiahao Liu","Mingzhe Han","Dongsheng Li","H. Gu","Peng Zhang","Tun Lu","Ning Gu"],"abstract":"Generative recommendation commonly adopts a two-stage pipeline in which a learnable tokenizer maps items to discrete token sequences (i.e. identifiers) and an autoregressive generative recommender model (GRM) performs prediction based on these identifiers. Recent tokenizers further incorporate collaborative signals so that items with similar user-behavior patterns receive similar codes, substantially improving recommendation quality. 
However, real-world environments evolve continuously: new items cause identifier collision and shifts, while new interactions induce collaborative drift in existing items (e.g., changing co-occurrence patterns and popularity). Fully retraining both tokenizer and GRM is often prohibitively expensive, yet naively fine-tuning the tokenizer can alter token sequences for the majority of existing items, undermining the GRM's learned token-embedding alignment. To b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Search and information retrieval","Computer science","Information retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.29620","title":"Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis","url":"https://huggingface.co/papers/2603.29620","published":"2026-03-31","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","agent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"official:a52bc48df530a184","title":"How 
Australia Uses Claude: Findings from the Anthropic Economic Index","url":"https://www.anthropic.com/research/how-australia-uses-claude","published":"2026-03-31","authors":["Anthropic"],"abstract":"Economic Research","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Economic Research"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic research page https://www.anthropic.com/research"}},{"id":"openalex:W7147435254","title":"Meta-reasoning in autonomous agents: performance gains across benchmarks and models","url":"https://doi.org/10.20935/acadai8229","published":"2026-03-31","authors":["Wrick Talukdar","Anjanava Biswas","Gowtham Shankar","Varun Shinde","Gaurav Parekh"],"abstract":"Introduction: Meta-reasoning, the ability of an autonomous agent to monitor and regulate its own reasoning processes, has emerged as a critical component for achieving adaptive and trustworthy artificial intelligence. However, limited empirical evidence quantifies how meta-reasoning impacts agent performance across diverse contexts and model architectures. 
This study presents a comprehensive empirical analysis of meta-reasoning capabilities in large language model (LLM)-based autonomous agents using two established benchmarks: GAIA (General AI Assistants) and AgentBench.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.20935/acadai8229","openalex_id":"https://openalex.org/W7147435254","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","agent"],"author_affiliations":["Amazon (United States)","PDF Solutions (United States)","Wojewódzki Szpital Specjalistyczny nr 5 im. św. Barbary w Sosnowcu"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.671500027179718},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.6119999885559082},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.542900025844574},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4544999897480011},{"id":"https://openalex.org/C120936955","display_name":"Empirical research","score":0.41179999709129333},{"id":"https://openalex.org/C166052673","display_name":"Empirical evidence","score":0.3668000102043152},{"id":"https://openalex.org/C13687954","display_name":"Autonomous agent","score":0.3521000146865845},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.2856999933719635}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.29093","title":"APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay","url":"http://arxiv.org/abs/2603.29093","published":"2026-03-31","authors":["Pratyay Banerjee","Masud 
Moshtaghi","Ankit Chadha"],"abstract":"LLM-based autonomous agents lack persistent procedural memory: they re-derive solutions from scratch even when structurally identical tasks have been solved before. We present APEX-EM, a non-parametric online learning framework that accumulates, retrieves, and reuses structured procedural plans without modifying model weights. APEX-EM introduces: (1) a structured experience representation encoding the full procedural-episodic trace of each execution -- planning steps, artifacts, iteration history with error analysis, and quality scores; (2) a Plan-Retrieve-Generate-Iterate-Ingest (PRGII) workflow with Task Verifiers providing multi-dimensional reward signals; and (3) a dual-outcome Experience Memory with hybrid retrieval combining semantic search, structural signature matching, and plan DAG traversal -- enabling cross-domain transfer between tasks sharing no lexical overlap but analogous...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7148175972","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","memory","retrieval"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8289999961853027},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6402000188827515},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5690000057220459},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5049999952316284},{"id":"https://openalex.org/C140745168","display_name":"Tree traversal","score":0.49000000953674316},{"id":"https://openalex.org/C168167062","display_name":"Component 
(thermodynamics)","score":0.4884999990463257},{"id":"https://openalex.org/C22367795","display_name":"Structured prediction","score":0.4722000062465668},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4575999975204468}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7147010900","title":"A Framework for Governing Generative AI Workloads in Cloud Environments","url":"https://doi.org/10.1109/mc.2025.3615616","published":"2026-03-31","authors":["Goutham Bandapati","Srinivasa Rao Atta","Ananya Ghosh Chowdhury"],"abstract":"This article presents a platform-agnostic governance framework with a structured approach to handle the complete lifecycle of generative artificial intelligence workloads on cloud platforms, presenting the framework across five pillars: cost optimization, security, resilience, operational efficiency, and model governance.","companies":["Google/DeepMind","Microsoft"],"matched_orgs":["Google/DeepMind","Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mc.2025.3615616","openalex_id":"https://openalex.org/W7147010900","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["Google (United States)","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7222999930381775},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.715399980545044},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5562000274658203},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4153999984264374},{"id":"https://openalex.org/C167966045","display_name":"Generative 
model","score":0.32850000262260437},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.31520000100135803},{"id":"https://openalex.org/C39389867","display_name":"Corporate governance","score":0.2987000048160553},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.27459999918937683}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7143658416","title":"AGENTIC AI IN MULTI-AGENT SYSTEMS: EXPLORING THE COORDINATION, NEGOTIATION, AND COOPERATION OF AUTONOMOUS ARTIFICIAL AGENTS IN COMPETITIVE AND COLLABORATIVE DIGITAL ECOSYSTEMS","url":"https://doi.org/10.29121/jissi.v2.i1.2026.43","published":"2026-03-31","authors":["Deepthi Talasila"],"abstract":"This study investigates the dynamics of agentic AI within multi-agent systems (MAS), focusing on coordination, negotiation, and cooperation mechanisms in competitive and collaborative digital ecosystems. Employing a simulation-based methodology utilizing multi-agent reinforcement learning (MARL) frameworks, the research analyzes hypothetical yet realistic datasets derived from environments like the StarCraft Multi-Agent Challenge (SMAC) and Multi-Agent Particle Environment (MPE). Key findings reveal that collaborative scenarios yield higher success rates (up to 64.47%) and shorter negotiation times compared to competitive ones, while mixed environments exhibit balanced but volatile cooperation indices. Algorithms such as Proximal Policy Optimization (PPO) demonstrate superior stability in convergence, though Deep Q-Networks (DQN) excel in reward maximization. 
The analysis underscores the...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.29121/jissi.v2.i1.2026.43","openalex_id":"https://openalex.org/W7143658416","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C199776023","display_name":"Negotiation","score":0.7333999872207642},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6276000142097473},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6238999962806702},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5681999921798706},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5056999921798706},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.49239999055862427},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.47130000591278076},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.4708000123500824}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7147396993","title":"Task-Dependent Sensitivity of VLA Models to Instruction Wording","url":"https://doi.org/10.22541/au.177497139.90709706/v1","published":"2026-03-31","authors":["Jihwan Woo"],"abstract":"Vision-Language-Action (VLA) models condition robot actions on natural language, yet sensitivity to instruction wording has not been characterised. This letter evaluates OpenVLA-7B on three manipulation tasks, comparing action differences from synonymous rephrasing (e.g., ”put” vs. ”place” vs. 
”set”) against differences from specificity variation (brief vs. step-by-step). Across 5 scenes per task with balanced comparisons (n = 50 pairs each), phrasing sensitivity is task-dependent: one task shows significantly larger phrasing than specificity differences (1.6x, p = 0.018), one shows no difference (p = 0.957), and one trends in the opposite direction (p = 0.092). In aggregate, phrasing and specificity produce comparable action differences (p = 0.395). Both exceed the stochastic noise floor by 2-4x. The results indicate that VLA instruction sensitivity is real but task-specific, and that d...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.22541/au.177497139.90709706/v1","openalex_id":"https://openalex.org/W7147396993","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C21200559","display_name":"Sensitivity (control systems)","score":0.7057999968528748},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7010999917984009},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6547999978065491},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5273000001907349},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.5092999935150146},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4959999918937683},{"id":"https://openalex.org/C2778334786","display_name":"Variation (astronomy)","score":0.4341999888420105},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.4239000082015991}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/see-it-to-place-it-evolving-macro-placements-with-vision-language-models","title":"See it to Place it: Evolving Macro Placements with Vision-Language Models","url":"https://www.microsoft.com/en-us/research/publication/see-it-to-place-it-evolving-macro-placements-with-vision-language-models/","published":"2026-03-30","authors":["Ikechukwu Uchendu","Swati Goel","Karly Hou","Ebrahim M. Songhori","V. Reddi","Vincent Zhuang","Google DeepMind"],"abstract":"We propose using Vision-Language Models (VLMs) for macro placement in chip floorplanning, a complex optimization task that has recently shown promising advancements through machine learning methods. Because human designers rely heavily on spatial reasoning to arrange components on the chip canvas, we hypothesize that VLMs with strong visual reasoning abilities can effectively complement existing learning-based approaches. We introduce VeoPlace (Visual Evolutionary Optimization Placement), a novel framework that uses a VLM, without any fine-tuning, to guide the actions of a base placer by constraining them to subregions of the chip canvas. The VLM proposals are iteratively optimized through an evolutionary search strategy with respect to resulting placement quality. 
On open-source benchmarks, VeoPlace outperforms the best prior learning-based approach on 9 of 10 benchmarks with peak wirel...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Hardware and devices","Computer science","Routing (electronic design automation)"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rethinking-language-model-scaling-under-transferable-hypersphere-optimization","title":"Rethinking Language Model Scaling under Transferable Hypersphere Optimization","url":"https://www.microsoft.com/en-us/research/publication/rethinking-language-model-scaling-under-transferable-hypersphere-optimization/","published":"2026-03-30","authors":["Liliang Ren","Yang Liu","Yelong Shen","Weizhu Chen"],"abstract":"Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent training instability at scale. Recent hypersphere optimization methods constrain weight matrices to a fixed-norm hypersphere, offering a promising alternative for more stable scaling. We introduce HyperP (Hypersphere Parameterization), the first framework for transferring optimal learning rates across model width, depth, training tokens, and Mixture-of-Experts (MoE) granularity under the Frobenius-sphere constraint with the Muon optimizer. 
We prove that weight decay is a first-order no-op on the Frobenius sphere, show that Depth-μP remains necessary, and find that the optimal learning rate follows the same data-scaling power law with the \"magic exponent\" 0.32 previously....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adapttoken-entropy-based-adaptive-token-selection-for-mllm-long-video-understanding","title":"AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding","url":"https://www.microsoft.com/en-us/research/publication/adapttoken-entropy-based-adaptive-token-selection-for-mllm-long-video-understanding/","published":"2026-03-30","authors":["Haozhe Qi","Kevin Qu","Mahdi Rad","Rui Wang","Alexander Mathis","Marc Pollefeys"],"abstract":"Long video understanding remains challenging for Multi-modal Large Language Models (MLLMs) due to high memory costs and context-length limits. Prior approaches mitigate this by scoring and selecting frames/tokens within short clips, but they lack a principled mechanism to (i) compare relevance across distant video clips and (ii) stop processing once sufficient evidence has been gathered. We propose AdaptToken, a training-free framework that turns an MLLM's self-uncertainty into a global control signal for long-video token selection. 
AdaptToken splits a video into groups, extracts cross-modal attention to rank tokens within each group, and uses the model's response entropy to estimate each group's prompt relevance. This entropy signal enables a global token budget allocation across groups and further supports early stopping (AdaptToken-Lite), skipping the remaining groups when the model b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:stepfun-ai:2603.28547","title":"GEditBench v2: A Human-Aligned Benchmark for General Image Editing","url":"https://huggingface.co/papers/2603.28547","published":"2026-03-30","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"apple:kphtkimnntc6g6l1i42wgbsf","title":"Entropy-Preserving Reinforcement 
Learning","url":"https://machinelearning.apple.com/research/entropy-preserving-reinforcement-learning","published":"2026-03-30","authors":["Aleksei Petrenko","Ben Lipkin","Kevin Chen","Erik Wijmans","Marco Cusumano-Towner","Raja Giryes","Philipp Krähenbühl"],"abstract":"Policy gradient algorithms have driven many recent advancements in language model reasoning. An appealing property is their ability to learn from exploration on their own trajectories, a process crucial for fostering diverse and creative solutions. As we show in this paper, many policy gradient algorithms naturally reduce the entropy—and thus the diversity of explored trajectories—as part of training, yielding a policy increasingly limited in its...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7135428798","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple","Apple (Israel)","Apple (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/progressvla-progress-guided-diffusion-policy-for-vision-language-robotic-manipulation","title":"ProgressVLA: Progress-Guided Diffusion Policy for Vision-Language Robotic Manipulation","url":"https://www.microsoft.com/en-us/research/publication/progressvla-progress-guided-diffusion-policy-for-vision-language-robotic-manipulation/","published":"2026-03-29","authors":["Hongyu Yan","Qiwei Li","Jiaolong Yang","Yadong Mu"],"abstract":"Most existing vision-language-action (VLA) models for robotic manipulation lack progress awareness, typically relying on hand-crafted heuristics for task termination. 
This limitation is particularly severe in long-horizon tasks involving cascaded sub-goals. In this work, we investigate the estimation and integration of task progress, proposing a novel model named ProgressVLA. Our technical contributions are twofold: (1) robust progress estimation: We pre-train a progress estimator on large-scale, unsupervised video-text robotic datasets. This estimator achieves a low prediction residual (0.07 on a scale of [0, 1]) in simulation and demonstrates zero-shot generalization to unseen real-world samples, and (2) differentiable progress guidance: We introduce an inverse dynamics world model that maps predicted action tokens into future latent visual states. These latents are then...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","Robotics","vision language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reasoning-beyond-labels-measuring-llm-sentiment-in-low-resource-culturally-nuanced-contexts","title":"Reasoning Beyond Labels: Measuring LLM Sentiment in Low-Resource, Culturally Nuanced Contexts","url":"https://www.microsoft.com/en-us/research/publication/reasoning-beyond-labels-measuring-llm-sentiment-in-low-resource-culturally-nuanced-contexts/","published":"2026-03-28","authors":["Millicent Ochieng","Anja Thieme","Ignatius Ezeani","Risa Ueno","Samuel Chege Maina","Keshet Ronen","Javier González","Jacki 
O'Neill"],"abstract":"Sentiment analysis in low-resource, culturally nuanced contexts challenges conventional NLP approaches that assume fixed labels and universal affective expressions. We present a diagnostic framework that treats sentiment as a context-dependent, culturally embedded construct, and evaluate how large language models (LLMs) reason about sentiment in informal, code-mixed WhatsApp messages from Nairobi youth health groups. Using human-annotated data, sentiment-flipped counterfactuals, and rubric-based explanation evaluation, we probe LLM interpretability, robustness, and alignment with human reasoning. Framing our evaluation through a social science measurement lens, we operationalize LLM outputs as an instrument for measuring the abstract concept of sentiment. Our findings reveal significant variation in model reasoning quality, with top-tier LLMs demonstrating greater interpretive stability,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/prue-a-practical-recipe-for-field-boundary-segmentation-at-scale","title":"PRUE: A Practical Recipe for Field Boundary Segmentation at Scale","url":"https://www.microsoft.com/en-us/research/publication/prue-a-practical-recipe-for-field-boundary-segmentation-at-scale/","published":"2026-03-28","authors":["Gedeon Muhawenayo","Caleb Robinson","Subash Khanal","Zhanpei 
Fang","I. Corley","A. Wollam","Tianyi Gao","L. Strnad","Ryan Avery","Lyndon D. Estes","A. M. Tárano","Nathan Jacobs"],"abstract":"Large-scale maps of field boundaries are essential for agricultural monitoring tasks. Existing deep learning approaches for satellite-based field mapping are sensitive to illumination, spatial scale, and changes in geographic location. We conduct the first systematic evaluation of segmentation and geospatial foundation models (GFMs) for global field boundary delineation using the Fields of The World (FTW) benchmark. We evaluate 18 models under unified experimental settings, showing that a U-Net semantic segmentation model outperforms instance-based and GFM alternatives on a suite of performance and deployment metrics. We propose a new segmentation approach that combines a U-Net backbone, composite loss functions, and targeted data augmentations to enhance performance and robustness under real-world conditions. Our model achieves a 76% IoU and 47% object-F1 on FTW, an increase of 6% and 9...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mimetic-alignment-with-aspect-evaluation-of-ai-inferred-personal-profiles","title":"Mimetic Alignment with ASPECT: Evaluation of AI-inferred Personal 
Profiles","url":"https://www.microsoft.com/en-us/research/publication/mimetic-alignment-with-aspect-evaluation-of-ai-inferred-personal-profiles/","published":"2026-03-27","authors":["Ruoxi Shang","Dan Marshall","Edward Cutrell","Denae Ford"],"abstract":"AI agents that communicate on behalf of individuals need to capture how each person actually communicates, yet current approaches either require costly per-person fine-tuning, produce generic outputs from shallow persona descriptions, or optimize preferences without modeling communication style. We present ASPECT (Automated Social Psychometric Evaluation of Communication Traits), a pipeline that directs LLMs to assess constructs from a validated communication scale against behavioral evidence from workplace data, without per-person training. In a case study with 20 participants (1,840 paired item ratings, 600 scenario evaluations), ASPECT-generated profiles achieved moderate alignment with self-assessments, and ASPECT-generated responses were preferred over generic and self-report baselines on aggregate, with substantial variation across individuals and scenarios. 
During the profile revi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Human-computer interaction","Computer science","Human–computer interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/developers-and-generative-ai-a-study-of-self-admitted-usage-in-open-source-projects","title":"Developers and generative AI: A study of self-admitted usage in open source projects","url":"https://www.microsoft.com/en-us/research/publication/developers-and-generative-ai-a-study-of-self-admitted-usage-in-open-source-projects/","published":"2026-03-27","authors":["Rosalia Tufano","Federica Pepe","Fiorella Zampetti","A. Mastropaolo","Ozren Dabić","Massimiliano Di Penta","Gabriele Bavota"],"abstract":"The availability of generative Artificial Intelligence (AI) tools such as ChatGPT or GitHub Copilot is reshaping the way in which software is developed, evolved, and maintained. Oftentimes, developers leave traces of such an usage in software artifacts. This allows not only to understand how AI is used in software development, but also to let others be aware how such software artifacts were created, e.g., for licensing or trustworthiness purposes. This paper, building upon our preliminary work presented at MSR 2024, aims at qualitatively investigating on the self-admitted use of two very popular generative AI tools (ChatGPT and GitHub Copilot) in software development. 
To this aim, we mined GitHub for such traces, by looking at commits, issues and pull requests (PRs). Then, through a manual coding, we create a taxonomy of 64 different ChatGPT and GitHub Copilot usage tasks, grouped into 7...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ovi-mapopen-vocabulary-instance-semantic-mapping","title":"OVI-MAP: Open-Vocabulary Instance-Semantic Mapping","url":"https://www.microsoft.com/en-us/research/publication/ovi-mapopen-vocabulary-instance-semantic-mapping/","published":"2026-03-27","authors":["Zilong Deng","Federico Tombari","Marc Pollefeys","Johanna Wald","Dániel Baráth"],"abstract":"Incremental open-vocabulary 3D instance-semantic mapping is essential for autonomous agents operating in complex everyday environments. However, it remains challenging due to the need for robust instance segmentation, real-time processing, and flexible open-set reasoning. Existing methods often rely on the closed-set assumption or dense per-pixel language fusion, which limits scalability and temporal consistency. We introduce OVI-MAP that decouples instance reconstruction from semantic inference. 
We propose to build a class-agnostic 3D instance map that is incrementally constructed from RGB-D input, while semantic features are extracted only from a small set of automatically selected views using vision-language models. This design enables stable instance tracking and zero-shot semantic labeling throughout online exploration. Our system operates in real time and outperforms state-of-the-a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:zai-org:2603.26648","title":"Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification","url":"https://huggingface.co/papers/2603.26648","published":"2026-03-27","authors":["Z.ai/Zhipu"],"abstract":"","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","zai-org","agent"],"author_affiliations":["Z.ai/Zhipu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/zai-org/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lacon-training-text-to-image-model-from-uncurated-data","title":"LACON: Training 
Text-to-Image Model from Uncurated Data","url":"https://www.microsoft.com/en-us/research/publication/lacon-training-text-to-image-model-from-uncurated-data/","published":"2026-03-27","authors":["Zhiyang Liang","Ziyu Wan","Hongyu Liu","Dong Chen","Qiu Shen","Hao Zhu","Dongdong Chen"],"abstract":"The success of modern text-to-image generation is largely attributed to massive, high-quality datasets. Currently, these datasets are curated through a filter-first paradigm that aggressively discards low-quality raw data based on the assumption that it is detrimental to model performance. Is the discarded bad data truly useless, or does it hold untapped potential? In this work, we critically re-examine this question. We propose LACON (Labeling-and-Conditioning), a novel training framework that exploits the underlying uncurated data distribution. Instead of filtering, LACON re-purposes quality signals, such as aesthetic scores and watermark probabilities as explicit, quantitative condition labels. The generative model is then trained to learn the full spectrum of data quality, from bad to good. 
By learning the explicit boundary between high- and low-quality content, LACON achieves superi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:fpbhgr80m95451zt9zsgml4g","title":"Athena: Intermediate Representations for Iterative Scaffolded App Generation with an LLM","url":"https://machinelearning.apple.com/research/athena","published":"2026-03-27","authors":["Jazbo Beason","Ruijia Cheng","Eldon Schoop","Jeffrey Nichols"],"abstract":"It is challenging to generate the code for a complete user interface using a Large Language Model (LLM). User interfaces are complex and their implementations often consist of multiple, inter-related files that together specify the contents of each screen, the navigation flows between the screens, and the data model used throughout the application. 
It is challenging to craft a single prompt for an LLM that contains enough detail to generate a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2603.26556","title":"When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models","url":"http://arxiv.org/abs/2603.26556","published":"2026-03-27","authors":["Juan Gabriel Kostelec","Xiang Wang","Axel Laborieux","Christos Sourmpis","Qinghai Guo"],"abstract":"Converting a pretrained Transformer into a more efficient hybrid model through distillation offers a promising approach to reducing inference costs. However, achieving high-quality generation in distilled models requires careful joint design of both the student architecture and the distillation process. Many prior distillation works evaluate downstream multiple-choice benchmarks by ranking candidate answers with log-likelihood rather than requiring autoregressive generation, which can obscure important differences in model quality. For example, we show that a 7B parameter distilled model that nearly matches its teacher to within 0.2\\,pp under log-likelihood scoring actually falls behind by 20.8\\,pp when the model must generate answers autoregressively. 
We propose a Hybrid Kimi Delta Attention (Hybrid-KDA) architecture paired with GenDistill, a multi-stage distillation pipeline, and use g...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7144391286","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","efficient","distillation"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Huawei Technologies (United States)"],"concepts":[{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.7016000151634216},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6960999965667725},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.546999990940094},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5174000263214111},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5083000063896179},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4837999939918518},{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.41659998893737793},{"id":"https://openalex.org/C100279451","display_name":"Perplexity","score":0.415800005197525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dflop-a-data-driven-framework-for-multimodal-llm-training-pipeline-optimization","title":"DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline 
Optimization","url":"https://www.microsoft.com/en-us/research/publication/dflop-a-data-driven-framework-for-multimodal-llm-training-pipeline-optimization/","published":"2026-03-26","authors":["H. An","Sihyun Kim","Chaerim Lim","Hyunjoong Kim","Rathijit Sen","Sangmin Jung","Hye Yoon Lee","Dongwook Kim","Takki Yu","Jinkyu Jeong","Youngsok Kim","Kwanghyun Park"],"abstract":"Multimodal Large Language Models (MLLMs) have achieved remarkable advances by integrating text, image, and audio understanding within a unified architecture. However, existing distributed training frameworks remain fundamentally data-blind: they parallelize computation without accounting for variations in input data characteristics. This data unawareness leads to severe computation skew across stages and microbatches, where heterogeneous multimodal inputs incur different processing costs. Consequently, GPU resources are unevenly utilized, synchronization delays accumulate, and overall training efficiency degrades. To address this limitation, we present DFLOP, a data-driven framework for multimodal LLM training pipeline optimization. 
DFLOP continuously profiles runtime behavior to capture data-induced computation variance and employs predictive scheduling to balance workloads across stage...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3802037","openalex_id":"https://openalex.org/W7161597045","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Systems and networking","Computer science","large language models","Machine learning","LLM"],"author_affiliations":["Microsoft","Microsoft (United States)","Yonsei University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/megaflow-zero-shot-large-displacement-optical-flow","title":"MegaFlow: Zero-Shot Large Displacement Optical Flow","url":"https://www.microsoft.com/en-us/research/publication/megaflow-zero-shot-large-displacement-optical-flow/","published":"2026-03-26","authors":["Dingxi Zhang","Fangjinhua Wang","Marc Pollefeys","Haofei Xu"],"abstract":"Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search or/and domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow, a simple yet powerful model for zero-shot large displacement optical flow. Rather than relying on highly complex, task-specific architectural designs, MegaFlow adapts powerful pre-trained vision priors to produce temporally consistent motion fields. 
In particular, we formulate flow estimation as a global matching problem by leveraging pre-trained global Vision Transformer features, which naturally capture large displacements. This is followed by a few lightweight iterative refinements to further improve the sub-pixel accuracy. Extensive experiments demonstrate that MegaFl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer vision","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1432","title":"TopoMesh: High-Fidelity Mesh Autoencoding via Topological Unification","url":"https://seed.bytedance.com/en/research/topomesh-high-fidelity-mesh-autoencoding-via-topological-unification","published":"2026-03-26","authors":["Guan Luo","Xiu Li","Rui Chen","Xuanyu Yi","Jing Lin","Chia-Hao Chen","Jiahang Liu","Song-Hai Zhang","Jianfeng Zhang"],"abstract":"The dominant paradigm for high-fidelity 3D generation relies on a VAE-Diffusion pipeline, where the VAE's reconstruction capability sets a firm upper bound on generation quality. A fundamental challenge limiting existing VAEs is the representation mismatch between ground-truth meshes and network predictions: GT meshes have arbitrary, variable topology, while VAEs typically predict fixed-structure implicit fields (\\eg, SDF on regular grids). 
This inherent misalignment prevents establishing explicit mesh-level correspondences, forcing prior work to rely on indirect supervision signals such as SDF or rendering losses. Consequently, fine geometric details, particularly sharp features, are poorly preserved during reconstruction. To address this, we introduce TopoMesh, a sparse voxel-based VAE that unifies both GT and predicted meshes under a shared Dual Marching Cubes (DMC) topological framew...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision and Pattern Recognition","Vision","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hispatial-taming-hierarchical-3d-spatial-understanding-in-vision-language-models","title":"HiSpatial: Taming Hierarchical 3D Spatial Understanding in Vision-Language Models","url":"https://www.microsoft.com/en-us/research/publication/hispatial-taming-hierarchical-3d-spatial-understanding-in-vision-language-models/","published":"2026-03-26","authors":["Huizhi Liang","Yichao Shen","Yu Deng","Sicheng Xu","Zhiyuan Feng","Tong Zhang","Yaobo Liang","Jiaolong Yang"],"abstract":"Achieving human-like spatial intelligence for vision-language models (VLMs) requires inferring 3D structures from 2D observations, recognizing object properties and relations in 3D space, and performing high-level spatial reasoning. 
In this paper, we propose a principled hierarchical framework that decomposes the learning of 3D spatial understanding in VLMs into four progressively complex levels, from geometric perception to abstract spatial reasoning. Guided by this framework, we construct an automated pipeline that processes approximately 5M images with over 45M objects to generate 3D spatial VQA pairs across diverse tasks and scenes for VLM supervised fine-tuning. We also develop an RGB-D VLM incorporating metric-scale point maps as auxiliary inputs to further enhance spatial understanding. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on mu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1336","title":"Hessian-informed machine learning interatomic potential towards bridging theory and experiments","url":"https://seed.bytedance.com/en/research/hessian-informed-machine-learning-interatomic-potential-towards-bridging-theory-and-experiments","published":"2026-03-26","authors":["Bangchen Yin","Jian Ouyang","Zhen Fan","Kailai Lin","Hanshi Hu","Dingshun Lv","Weiluo Ren","Hai Xiao","Ji Chen","Changsu Cao"],"abstract":"Local curvature of potential energy surfaces is critical for predicting certain experimental observables of molecules and materials from first principles, yet it remains far beyond reach for complex systems. 
In this work, we introduce a Hessian-informed Machine Learning Interatomic Potential (Hi-MLIP) that captures such curvature reliably, thereby enabling accurate analysis of associated thermodynamic and kinetic phenomena. To make Hessian supervision practically viable, we develop a highly efficient training protocol, termed Hessian INformed Training (HINT), achieving two to four orders of magnitude reduction for the requirement of expensive Hessian labels. HINT integrates critical techniques, including Hessian pre-training, configuration sampling, curriculum learning and stochastic projection Hessian loss. Enabled by HINT, Hi-MLIP significantly improves transition-state search and brin...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine Learning","AI for Science","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bizgeneval-a-systematic-benchmark-for-commercial-visual-content-generation","title":"BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation","url":"https://www.microsoft.com/en-us/research/publication/bizgeneval-a-systematic-benchmark-for-commercial-visual-content-generation/","published":"2026-03-26","authors":["Yan Li","Zezi Zeng","Ziwei Zhou","Xin Gao","Muzhao Tian","Yifan Yang","Ming-Hung Cheng","Qiaomin Dai","Yuqing Yang","Lili Qiu","Zhendong Wang","Zhengyuan Yang"],"abstract":"Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content 
creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-world commercial design tasks. In this work, we introduce BizGenEval, a systematic benchmark for commercial visual content generation. The benchmark spans five representative document types: slides, charts, webpages, posters, and scientific figures, and evaluates four key capability dimensions: text rendering, layout control, attribute binding, and knowledge-based reasoning, forming 20 diverse evaluation tasks. BizGenEval contains 400 carefully curated prompts and 8000 human-verified checklist questions to rigorously assess whether generated images satisfy complex visual and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-decade-scale-benchmark-evaluating-llms-clinical-practice-guidelines-detection-and-adherence-in-multi-turn-conversations","title":"A Decade-Scale Benchmark Evaluating LLMs' Clinical Practice Guidelines Detection and Adherence in Multi-turn Conversations","url":"https://www.microsoft.com/en-us/research/publication/a-decade-scale-benchmark-evaluating-llms-clinical-practice-guidelines-detection-and-adherence-in-multi-turn-conversations/","published":"2026-03-26","authors":["Andong Tan","Shuyun Dai","Jinglu Wang","Fengtao 
Zhou","Yan Lu","Xi (Ada) Wang","Ying-Che Chen","Can Yang","Shujie Liu","Hao Chen"],"abstract":"Clinical practice guidelines (CPGs) play a pivotal role in ensuring evidence-based decision-making and improving patient outcomes. While Large Language Models (LLMs) are increasingly deployed in healthcare scenarios, it is unclear to which extend LLMs could identify and adhere to CPGs during conversations. To address this gap, we introduce CPGBench, an automated framework benchmarking the clinical guideline detection and adherence capabilities of LLMs in multi-turn conversations. We collect 3,418 CPG documents from 9 countries/regions and 2 international organizations published in the last decade spanning across 24 specialties. From these documents, we extract 32,155 clinical recommendations with corresponding publication institute, date, country, specialty, recommendation strength, evidence level, etc. One multi-turn conversation is generated for each recommendation accordingly to evalu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:baidu:2603.25743","title":"RefAlign: Representation Alignment for Reference-to-Video 
Generation","url":"https://huggingface.co/papers/2603.25743","published":"2026-03-26","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"hf-org-paper:Qwen:2603.25804","title":"RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation","url":"https://huggingface.co/papers/2603.25804","published":"2026-03-26","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"apple:y06oyycnw5nikfn39kfxk5sc","title":"Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training","url":"https://machinelearning.apple.com/research/downstream-metrics","published":"2026-03-26","authors":["Jakub Krajewski","Amitis Shidani","Dan Busbridge","Sam Wiseman","Jason Ramapuram"],"abstract":"While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. 
This paper challenges that view by proposing a direct framework to model the scaling of benchmark performance from the training budget. We find that for a fixed token-to-parameter ratio, a simple power law can accurately describe the scaling behavior of log accuracy...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:758bba465d7cc365","title":"A foundation model of vision, audition, and language for in-silico neuroscience","url":"https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/","published":"2026-03-26","authors":["Stéphane d'Ascoli","Jérémy Rapin","Yohann Benchetrit","Teon Brooks","Katelyn Begany","Josephine Raugel","Hubert Jacob Banville","Jean Remi King"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Human & Machine Intelligence"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=1"}},{"id":"arxiv:2603.25891","title":"Few Shots Text to Image Retrieval: New Benchmarking Dataset and Optimization 
Methods","url":"http://arxiv.org/abs/2603.25891","published":"2026-03-26","authors":["Ofer Idan","Vladi Vexler","G. Lederman","Dima Sivov","Aviad Cohen Zada","Shir Niego Komforti"],"abstract":"Pre-trained vision-language models (VLMs) excel in multimodal tasks, commonly encoding images as embedding vectors for storage in databases and retrieval via approximate nearest neighbor search (ANNS). However, these models struggle with compositional queries and out-of-distribution (OOD) image-text pairs. Inspired by human cognition's ability to learn from minimal examples, we address this performance gap through few-shot learning approaches specifically designed for image retrieval. We introduce the Few-Shot Text-to-Image Retrieval (FSIR) task and its accompanying benchmark dataset, FSIR-BD - the first to explicitly target image retrieval by text accompanied by reference examples, focusing on the challenging compositional and OOD queries. The compositional part is divided to urban scenes and nature species, both in specific situations or with distinctive features. 
FSIR-BD contains 38,3...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7144391590","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United States)"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7864000201225281},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7857999801635742},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.6682000160217285},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.6128000020980835},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6097999811172485},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6003000140190125},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.576200008392334},{"id":"https://openalex.org/C146849305","display_name":"Ground truth","score":0.5738000273704529}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.25248","title":"ColBERT-Att: Late-Interaction Meets Attention for Enhanced Retrieval","url":"http://arxiv.org/abs/2603.25248","published":"2026-03-26","authors":["Raj Nath Patel","Sourav Dutta"],"abstract":"Vector embeddings from pre-trained language models form a core component in Neural Information Retrieval systems across a multitude of knowledge extraction tasks. The paradigm of late interaction, introduced in ColBERT, demonstrates high accuracy along with runtime efficiency. 
However, the current formulation fails to take into account the attention weights of query and document terms, which intuitively capture the \"importance\" of similarities between them, that might lead to a better understanding of relevance between the queries and documents. This work proposes ColBERT-Att, to explicitly integrate attention mechanism into the late interaction framework for enhanced retrieval performance. Empirical evaluation of ColBERT-Att depicts improvements in recall accuracy on MS-MARCO as well as on a wide range of BEIR and LoTTE benchmark datasets.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7142557107","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7885000109672546},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6176000237464905},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5163999795913696},{"id":"https://openalex.org/C81669768","display_name":"Precision and recall","score":0.5040000081062317},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4993000030517578},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.4740999937057495},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.4433000087738037},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.4381999969482422}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7140619430","title":"Revisiting the <i>Six Human-Centered Artificial Intelligence Grand Challenges</i> in the Age of Generative AI","url":"https://doi.org/10.1080/10447318.2026.2641703","published":"2026-03-26","authors":["Brent Winslow","Özlem Özmen Garibay","Tesh Goyal","Sean Koon","George Margetis","Gavriel Salvendy","Ben Shneiderman","Aida Tayebi","Laura Vardoulakis"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1080/10447318.2026.2641703","openalex_id":"https://openalex.org/W7140619430","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["FORTH Institute of Computer Science","Google (United States)","Kaiser Permanente","University of Central Florida","University of Maryland, College Park"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.597000002861023},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4814999997615814},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3677999973297119},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.30649998784065247},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.30250000953674316},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.27160000801086426},{"id":"https://openalex.org/C19273510","display_name":"Artificial life","score":0.25290000438690186},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.2434999942779541}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/why-does-self-distillation-sometimes-degrade-the-reasoning-capability-of-llms","title":"Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?","url":"https://www.microsoft.com/en-us/research/publication/why-does-self-distillation-sometimes-degrade-the-reasoning-capability-of-llms/","published":"2026-03-25","authors":["Jeonghye Kim","Xufang Luo","Minbeom Kim","Sangmook Lee","Dohyung Kim","Jiwon Jeon","Dongsheng Li","Yuqing Yang"],"abstract":"Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of epistemic verbalization - the model's expression of uncertainty during reasoning. Through controlled experiments varying conditioning context richness and task coverage, we show that conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization with limited task coverage but harming OOD performance, where unseen problems benefit from expressing uncertainty and adjusting accordingly. Across Qwen3-8B, DeepSeek-Distill-Qwen-7B, and Olmo3-7B-Instruct, we observe performance drops of up to 40%. 
Our findings highlight that exposing appropriate le...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","large language models","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/counting-without-numbers-finding-without-words","title":"Counting Without Numbers &Finding Without Words","url":"https://www.microsoft.com/en-us/research/publication/counting-without-numbers-finding-without-words/","published":"2026-03-25","authors":["B. N. Patro"],"abstract":"Every year, 10 million pets enter shelters, separated from their families. Despite desperate searches by both guardians and lost animals, 70% never reunite, not because matches do not exist, but because current systems look only at appearance, while animals recognize each other through sound. We ask, why does computer vision treat vocalizing species as silent visual objects? Drawing on five decades of cognitive science showing that animals perceive quantity approximately and communicate identity acoustically, we present the first multimodal reunification system integrating visual and acoustic biometrics. Our species-adaptive architecture processes vocalizations from 10Hz elephant rumbles to 4kHz puppy whines, paired with probabilistic visual matching that tolerates stress-induced appearance changes. 
This work demonstrates that AI grounded in biological communication principles can serve....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Audio and Acoustics","Computer vision","Computer science","Multimodal"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/willful-disobedience-automatically-detecting-failures-in-agentic-traces","title":"Willful Disobedience: Automatically Detecting Failures in Agentic Traces","url":"https://www.microsoft.com/en-us/research/publication/willful-disobedience-automatically-detecting-failures-in-agentic-traces/","published":"2026-03-25","authors":["Reshabh K Sharma","Shraddha Barke","Ben Zorn"],"abstract":"AI agents are increasingly embedded in real software systems, where they execute multi-step workflows through multi-turn dialogue, tool invocations, and intermediate decisions. These long execution histories, called agentic traces, make validation difficult. Outcome-only benchmarks can miss critical procedural failures, such as incorrect workflow routing, unsafe tool usage, or violations of prompt-specified rules. This paper presents AgentPex, an AI-powered tool designed to systematically evaluate agentic traces. AgentPex extracts behavioral rules from agent prompts and system instructions, then uses these specifications to automatically evaluate traces for compliance. 
We evaluate AgentPex on 424 traces from {tau}2-bench across models in telecom, retail, and airline customer service. Our results show that AgentPex distinguishes agent behavior across models and surfaces specification viol...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page","openalex"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3786335.3813153","openalex_id":"https://openalex.org/W7161020803","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","AI agents","Computer science","agent"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Washington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-price-reversal-phenomenon-when-cheaper-reasoning-models-end-up-costing-more","title":"The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More","url":"https://www.microsoft.com/en-us/research/publication/the-price-reversal-phenomenon-when-cheaper-reasoning-models-end-up-costing-more/","published":"2026-03-25","authors":["Lingjiao Chen","Chi Zhang","Yeye He","Ion Stoica","Matei A. Zaharia","James Zou"],"abstract":"Developers and consumers increasingly choose reasoning language models (RLMs) based on their listed API prices. However, how accurately do these prices reflect actual inference costs? We conduct the first systematic study of this question, evaluating 8 frontier RLMs across 9 diverse tasks covering competition math, science QA, code generation, and multi-domain reasoning. 
We uncover the pricing reversal phenomenon: in 21.8% of model-pair comparisons, the model with a lower listed price actually incurs a higher total cost, with reversal magnitude reaching up to 28x. For example, Gemini 3 Flash's listed price is 78% cheaper than GPT-5.2's, yet its actual cost across all tasks is 22% higher. We trace the root cause to vast heterogeneity in thinking token consumption: on the same query, one model may use 900% more thinking tokens than another. In fact, removing thinking token costs reduces ra...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","reasoning language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2603.24533","title":"UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience","url":"https://huggingface.co/papers/2603.24533","published":"2026-03-25","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","agent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2603.24458","title":"OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning","url":"https://huggingface.co/papers/2603.24458","published":"2026-03-25","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2506.02943","title":"Hallucination to Consensus: Multi-Agent LLMs for End-to-End JUnit Test Generation","url":"http://arxiv.org/abs/2506.02943","published":"2026-03-25","authors":["Qinghua Xu","Guancheng Wang","Lionel Briand","Kui Liu"],"abstract":"Unit testing plays a critical role in ensuring software correctness. However, writing unit tests manually is labor-intensive, especially for strongly typed languages like Java, motivating the need for automated approaches. Traditional methods primarily rely on search-based or randomized algorithms to generate tests that achieve high code coverage and produce regression oracles, which are assertions derived from the program's current behavior rather than its intended functionality. Recent advances in large language models (LLMs) have enabled oracle generation from natural language descriptions, aligning better with user requirements. However, existing LLM-based methods often require LLM fine-tuning or rely on external tools such as EvoSuite for test prefix generation, making them costly or cumbersome to apply in practice. 
In this work, we propose CANDOR, a novel end-to-end, prompt enginee...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3803418","openalex_id":"https://openalex.org/W4414430094","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Huawei Technologies (China)","Lero","Science Foundation Ireland","University of Limerick"],"concepts":[{"id":"https://openalex.org/C55166926","display_name":"Oracle","score":0.8881999850273132},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7050999999046326},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.4885999858379364},{"id":"https://openalex.org/C148027188","display_name":"Unit testing","score":0.42640000581741333},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.41780000925064087},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.41200000047683716},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.3935000002384186},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.3840999901294708}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.24373","title":"PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks","url":"http://arxiv.org/abs/2603.24373","published":"2026-03-25","authors":["Cheng Cui","Yubo Zhang","Ting Sun","Xueqing Wang","Hongen Liu","Manhui Lin","Yue Zhang","Tingquan Gao","Changda Zhou","Jiaxuan Liu","Zelun Zhang","Jing Zhang"],"abstract":"The advent of \"OCR 2.0\" and large-scale vision-language models (VLMs) has set 
new benchmarks in text recognition. However, these unified architectures often come with significant computational demands, challenges in precise text localization within complex layouts, and a propensity for textual hallucinations. Revisiting the prevailing notion that model scale is the sole path to high accuracy, this paper introduces PP-OCRv5, a meticulously optimized, lightweight OCR system with merely 5 million parameters. We demonstrate that PP-OCRv5 achieves performance competitive with many billion-parameter VLMs on standard OCR benchmarks, while offering superior localization precision and reduced hallucinations. The cornerstone of our success lies not in architectural expansion but in a data-centric investigation. We systematically dissect the role of training data by quantifying three critical dimen...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7141772451","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.808899998664856},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5511000156402588},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.45410001277923584},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.4341999888420105},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.430400013923645},{"id":"https://openalex.org/C2780616401","display_name":"Cornerstone","score":0.41690000891685486},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.41370001435279846},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4104999899864197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sortedrl-accelerating-rl-training-for-llms-through-online-length-aware-scheduling","title":"SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling","url":"https://www.microsoft.com/en-us/research/publication/sortedrl-accelerating-rl-training-for-llms-through-online-length-aware-scheduling/","published":"2026-03-24","authors":["Yiqi Zhang","Huiqiang Jiang","Xufang Luo","Zhihe Yang","Chengruidong Zhang","Yifei Shen","Dongsheng Li","Yuqing Yang","Lili Qiu","Yang You"],"abstract":"Scaling reinforcement learning (RL) has shown strong promise for enhancing the reasoning abilities of large language models (LLMs), particularly in tasks requiring long chain-of-thought generation. However, RL training efficiency is often bottlenecked by the rollout phase, which can account for up to 70% of total training time when generating long trajectories (e.g., 16k tokens), due to slow autoregressive generation and synchronization overhead between rollout and policy updates. We propose SortedRL, an online length-aware scheduling strategy designed to address this bottleneck by improving rollout efficiency and maintaining training stability. SortedRL reorders rollout samples based on output lengths, prioritizing short samples forming groups for early updates. 
This enables large rollout batches, flexible update batches, and near on-policy micro-curriculum construction simultaneously.....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","large language models","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1334","title":"SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM","url":"https://seed.bytedance.com/en/research/simart-decomposing-monolithic-meshes-into-sim-ready-articulated-assets-via-mllm","published":"2026-03-24","authors":["Chuanrui Zhang","Minghan Qin","Yuang Wang","Baifeng Xie","Hang Li","Ziwei Wang"],"abstract":"High-quality articulated 3D assets are indispensable for embodied AI and physical simulation, yet 3D generation still focuses on static meshes, leaving a gap in \"sim-ready\" interactive objects. Most recent articulated object creation methods rely on multi-stage pipelines that accumulate errors across decoupled modules. Alternatively, unified MLLMs offer a single-stage path to joint static asset understanding and sim-ready asset generation. However dense voxel-based 3D tokenization yields long 3D token sequences and high memory overhead, limiting scalability to complex articulated objects. To address this, we propose SIMART, a unified MLLM framework that jointly performs part-level decomposition and kinematic prediction. By introducing a Sparse 3D VQ-VAE, SIMART reduces token counts by 70% vs. 
dense voxel tokens, enabling high-fidelity multi-part assemblies. SIMART achieves state-of-the-a...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision and Pattern Recognition","Robotics","arXiv","memory"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:1428","title":"UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation","url":"https://seed.bytedance.com/en/research/unigrpo-unified-policy-optimization-for-reasoning-driven-visual-generation","published":"2026-03-24","authors":["Jie Liu","Zilyu Ye","Linxiao Yuan","Shenhan Zhu","Yu Gao","Jie Wu","Kunchang Li","Xionghui Wang","Xiaonan Nie","Weilin Huang","Wanli Ouyang"],"abstract":"Unified models capable of interleaved generation have emerged as a promising paradigm, with the community increasingly converging on autoregressive modeling for text and flow matching for image generation. To advance this direction, we propose a unified reinforcement learning framework tailored for interleaved generation. We validate our approach on its fundamental unit: a single round of reasoning-driven image generation, where the model first expands the user prompt through reasoning, followed by image synthesis. Formulating this multimodal generation process as a Markov Decision Process with sparse terminal rewards, we introduce UniGRPO to jointly optimize text and image generation policies using GRPO. 
Adopting a minimalist methodology to avoid over-design, we leverage established training recipes for both modalities by seamlessly integrating standard GRPO for reasoning and FlowGRPO f...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:wabbxsvfgv9lg6rwn0jxlsri","title":"Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs","url":"https://machinelearning.apple.com/research/trained-on-tokens","published":"2026-03-24","authors":["Preetum Nakkiran","Arwen Bradley","Adam Goliński","Eugene Ndiaye","Michael Kirchhof","Sinead Williamson"],"abstract":"Large Language Models (LLMs) often lack meaningful confidence estimates for their outputs. While base LLMs are known to exhibit next-token calibration, it remains unclear whether they can assess confidence in the actual meaning of their responses beyond the token level. 
We find that, when using a certain sampling-based notion of semantic calibration, base LLMs are remarkably well-calibrated: they can meaningfully assess confidence in open-domain...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:cykqs8ltyz9hkdyq6356oifh","title":"Scaling Synthetic Task Generation for Agents via Exploration","url":"https://machinelearning.apple.com/research/scaling-synthetic-task","published":"2026-03-24","authors":["Ram Ramrakhya","Andrew Szot","Omar Attia","Yuhao Yang","Anh Nguyen","Bogdan Mazoure","Zhe Gan","Harsh Agrawal","Alexander Toshev"],"abstract":"Post-Training Multimodal Large Language Models (MLLMs) to build interactive agents holds promise across domains such as computer-use, web navigation, and robotics. A key challenge in scaling such post-training is lack of high-quality downstream agentic task datasets with tasks that are diverse, feasible, and verifiable. 
Existing approaches for task generation rely heavily on human annotation or prompting MLLM with limited downstream environment...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:v0q44nrous538gfmqapp345p","title":"SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation","url":"https://machinelearning.apple.com/research/safetypairs","published":"2026-03-24","authors":["Alec Helbling","Shruti Palaskar","Kundan Krishna","Polo Chau","Leon Gatys","Joseph Yitan Cheng"],"abstract":"This paper was accepted at the Principled Design for Trustworthy AI — Interpretability, Robustness, and Safety across Modalities Workshop at ICLR 2026.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2603.22779","title":"KARMA: Knowledge-Action Regularized Multimodal Alignment for Personalized Search at Taobao","url":"http://arxiv.org/abs/2603.22779","published":"2026-03-24","authors":["Zhi Sun","Wenming Zhang","Yi Wei","Liren Yu","Zhixuan Zhang","Dan Ou","Haihong Tang"],"abstract":"Large Language Models (LLMs) are equipped with profound semantic knowledge, making them a 
natural choice for injecting semantic generalization into personalized search systems. However, in practice we find that directly fine-tuning LLMs on industrial personalized tasks (e.g. next item prediction) often yields suboptimal results. We attribute this bottleneck to a critical Knowledge--Action Gap: the inherent conflict between preserving pre-trained semantic knowledge and aligning with specific personalized actions by discriminative objectives. Empirically, action-only training objectives induce Semantic Collapse, such as attention \"sinks\". This degradation severely cripples the LLM's generalization, failing to bring improvements to personalized search systems. We propose KARMA (Knowledge--Action Regularized Multimodal Alignment), a unified framework that treats semantic reconstruction as a....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7141772138","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","personalized","retrieval"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7569000124931335},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5990999937057495},{"id":"https://openalex.org/C547328371","display_name":"Karma","score":0.5706999897956848},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5619000196456909},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5346999764442444},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.48890000581741333},{"id":"https://openalex.org/C166423231","display_name":"Semantic search","score":0.48489999771118164},{"id":"https://openalex.org/C90312973","display_name":"Semantic data 
model","score":0.43479999899864197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.22942","title":"Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning","url":"http://arxiv.org/abs/2603.22942","published":"2026-03-24","authors":["Anshul Solanki","Sanchit Latawa","Koushik Chakraborty","Navneet Kamboj"],"abstract":"Translating Natural Language to SQL (NL2SQL) remains a critical bottleneck for democratization of data in enterprises. Although Large Language Models (LLMs) like Gemini 2.5 and other LLMs have demonstrated impressive zero-shot capabilities, their high inference costs limit deployment at scale. This paper explores the efficacy of fine-tuning both large and small language models on NL2SQL tasks. Our research reveals a counter-intuitive scaling phenomenon. Fine-tuning large models (Gemini 2.5 Flash/Lite) on standard datasets yields negligible returns, often leading to overfitting on complex queries. Conversely, small models (Qwen) show significant gains. Fine-tuning improved the small model baseline from 36% to 45%, and further enriching the dataset with explicit Chain-of-Thought (CoT) reasoning surged accuracy to 54.5%(Fig 2). 
While this is still lower than the accuracy of large models lik...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7140954458","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Global Services (Slovakia)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7663000226020813},{"id":"https://openalex.org/C22019652","display_name":"Overfitting","score":0.7210999727249146},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6111000180244446},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.6100999712944031},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5235000252723694},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5131999850273132},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.511900007724762},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.44040000438690186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/early-discoveries-of-algorithmist-i-promise-of-provable-algorithm-synthesis-at-scale","title":"Early Discoveries of Algorithmist I: Promise of Provable Algorithm Synthesis at Scale","url":"https://www.microsoft.com/en-us/research/publication/early-discoveries-of-algorithmist-i-promise-of-provable-algorithm-synthesis-at-scale/","published":"2026-03-23","authors":["Janardhan (Jana) Kulkarni"],"abstract":"Designing algorithms with provable guarantees that also work well in 
practice remains difficult, requiring both mathematical reasoning and careful implementation. Existing approaches that bridge worst-case theory and empirical performance, such as beyond-worst-case analysis and data-driven algorithm selection, typically assume prior distributional knowledge or restrict attention to a fixed pool of algorithms. Recent progress in LLMs suggests a new possibility: provable algorithm synthesis on the fly. To study this, we built Algorithmist, an autonomous researcher agent on top of GitHub Copilot that runs a multi-agent research-and-review loop, with separate stages for idea generation, algorithm and proof development, proof-guided implementation, and review of proofs, code, and their alignment. We evaluate Algorithmist on research-level tasks in private data analysis and clustering. When as...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Programming languages and software engineering","Computer science","large language models","software engineering","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/designing-medical-chatbots-where-accuracy-and-acceptability-are-in-conflict-an-exploratory-vignette-based-study-in-urban-india","title":"Designing Medical Chatbots where Accuracy and Acceptability are in Conflict: An Exploratory, Vignette-based Study in Urban 
India","url":"https://www.microsoft.com/en-us/research/publication/designing-medical-chatbots-where-accuracy-and-acceptability-are-in-conflict-an-exploratory-vignette-based-study-in-urban-india/","published":"2026-03-23","authors":["Ananditha Raghunath","W. Thies","Mohit Jain"],"abstract":"When medical chatbots provide advice that conflicts with users' lived care experiences, users are left to interpret, negotiate, and evaluate the legitimacy of that guidance. In India, the widespread overuse of antibiotics, antidiarrheals, and injections has shifted patient expectations away from the guideline-aligned advice that chatbots are trained to provide. We present a mixed-methods, vignette-based study with 200 urban Indian adults examining preferences for and against guideline-aligned, norm-divergent advice in chatbot transcripts. We find that a majority of users reject such advice, drawing on diverse rationales grounded in their lived expectations. Through the design and introduction of context-aware nudges, we support expectation alignment that shifts preferences towards transcripts containing guideline-aligned advice. 
In doing so, we surface key tensions in the equitable desig...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Medical, health and genomics","Computer science","Health care","Human Computer Interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cap-x-a-framework-for-benchmarking-and-improving-coding-agents-for-robot-manipulation","title":"CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation","url":"https://www.microsoft.com/en-us/research/publication/cap-x-a-framework-for-benchmarking-and-improving-coding-agents-for-robot-manipulation/","published":"2026-03-23","authors":["Max Fu","Justin Yu","Karim El-Refai","Ethan Kou","Haoru Xue","Huang Huang","Wenli Xiao","Guanzhi Wang","Fei-Fei Li","Guanya Shi","Jiajun Wu","Shankar Sastry"],"abstract":"\"Code-as-Policy\"considers how executable code can complement data-intensive Vision-Language-Action (VLA) methods, yet their effectiveness as autonomous controllers for embodied manipulation remains underexplored. We present CaP-X, an open-access framework for systematically studying Code-as-Policy agents in robot manipulation. At its core is CaP-Gym, an interactive environment in which agents control robots by synthesizing and executing programs that compose perception and control primitives. 
Building on this foundation, CaP-Bench evaluates frontier language and vision-language models across varying levels of abstraction, interaction, and perceptual grounding. Across 12 models, CaP-Bench reveals a consistent trend: performance improves with human-crafted abstractions but degrades as these priors are removed, exposing a dependence on designer scaffolding. At the same time, we observe that...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Systems and networking","Computer science","Robotics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2603.22117","title":"On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation","url":"https://huggingface.co/papers/2603.22117","published":"2026-03-23","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Qwen","LLM"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:Qwen:2603.22446","title":"Sparse but Critical: A Token-Level Analysis of 
Distributional Shifts in RLVR Fine-Tuning of LLMs","url":"https://huggingface.co/papers/2603.22446","published":"2026-03-23","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.21872","title":"Manifold-Aware Exploration for Reinforcement Learning in Video Generation","url":"https://huggingface.co/papers/2603.21872","published":"2026-03-23","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"official:0bb3da748fb5ff59","title":"Creating with Sora Safely","url":"https://openai.com/index/creating-with-sora-safely","published":"2026-03-23","authors":["OpenAI"],"abstract":"To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we’ve built Sora 2 and the Sora app with safety at the foundation. 
Our approach is anchored in concrete protections.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:zc1oyvefbew99hi70t8kug51","title":"Optimal Splitting of Language Models from Mixtures to Specialized Domains","url":"https://machinelearning.apple.com/research/optimal-splitting","published":"2026-03-23","authors":["Skyler Seto","Pierre Ablin","Anastasiia Filippova","Jiayuan Ye","Louis Béthune","Angelos Katharopoulos","David Grangier"],"abstract":"This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR 2026.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2603.22455","title":"SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale","url":"http://arxiv.org/abs/2603.22455","published":"2026-03-23","authors":["YanZhao Zheng","ZhenTao Zhang","Chao Ma","Yuanqiang Yu","JiHuan Zhu","Baohua Dong","Hangcheng Zhu"],"abstract":"As LLM agent ecosystems grow, the number of available skills (tools, plugins) has reached tens of thousands, making it infeasible to inject all skills into an agent's context. 
This creates a need for skill routing -- retrieving the most relevant skills from a large pool given a user task. The problem is compounded by pervasive functional overlap in community skill repositories, where many skills share similar names and purposes yet differ in implementation details. Despite its practical importance, skill routing remains under-explored. Current agent architectures adopt a progressive disclosure design -- exposing only skill names and descriptions to the agent while keeping the full implementation body hidden -- implicitly treating metadata as sufficient for selection. We challenge this assumption through a systematic empirical study on a benchmark of ~$80K skills and 75 expert-verified qu...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7140954339","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","agent"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7160000205039978},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6363999843597412},{"id":"https://openalex.org/C74172769","display_name":"Routing (electronic design automation)","score":0.6049000024795532},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.5206000208854675},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5012000203132629},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.49549999833106995},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.4713999927043915},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.45969998836517334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/niyama-breaking-the-silos-of-llm-inference-serving","title":"QoServe : Breaking the Silos of LLM Inference Serving","url":"https://www.microsoft.com/en-us/research/publication/niyama-breaking-the-silos-of-llm-inference-serving/","published":"2026-03-22","authors":["Kanishk Goel","Jayashree Mohan","Nipun Kwatra","Ravi Shreyas Anupindi","Ramachandran Ramjee"],"abstract":"The widespread adoption of Large Language Models (LLMs) has enabled diverse applications with very different latency requirements. Existing LLM serving frameworks rely on siloed infrastructure with coarse-grained workload segregation -- interactive and batch -- leading to inefficient resource utilization and limited support for fine-grained Quality-of-Service (QoS) differentiation. This results in operational inefficiencies, over-provisioning and poor load management during traffic surges. We present Niyama, a novel QoS-driven inference serving system that enables efficient co-scheduling of diverse workloads on shared infrastructure. Niyama introduces fine-grained QoS classification allowing applications to specify precise latency requirements, and dynamically adapts scheduling decisions based on real-time system state. 
Leveraging the predictable execution characteristics of LLM inference...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","systems","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1407","title":"Beyond Token Eviction: Mixed-Dimension Budget Allocation for Efficient KV Cache Compression","url":"https://seed.bytedance.com/en/research/beyond-token-eviction-mixed-dimension-budget-allocation-for-efficient-kv-cache-compression","published":"2026-03-21","authors":["Ruijie Miao","Zhiming Wang","Wang Li","Shiwei Wu","Shufan Liu","Yanbing Jiang","Tong Yang"],"abstract":"Key-value (KV) caching is widely used to accelerate transformer inference, but its memory cost grows linearly with input length, limiting long-context deployment. Existing token eviction methods reduce memory by discarding less important tokens, which can be viewed as a coarse form of dimensionality reduction that assigns each token either zero or full dimension. We propose MixedDimKV, a mixed-dimension KV cache compression method that allocates dimensions to tokens at a more granular level, and MixedDimKV-H, which further integrates head-level importance information. Experiments on long-context benchmarks show that MixedDimKV outperforms prior KV cache compression methods that do not rely on head-level importance profiling. When equipped with the same head-level importance information, MixedDimKV-H consistently outperforms HeadKV. 
Notably, our approach achieves comparable performance to...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Machine Learning","Infrastructures","arXiv","memory","efficient","compression"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W7139982091","title":"The most important features in generalized additive models might be groups of features","url":"https://doi.org/10.1038/s41598-026-43928-4","published":"2026-03-21","authors":["Tomas M. Bosschieter","Luis França","Jessica Wolk","Yiyuan Wu","Bella Mehta","Joseph Dehoney","Orsolya Kiss","Fiona C. Baker","Qingyu Zhao","Rich Caruana","Kilian M. Pohl"],"abstract":"While analyzing the importance of features has become ubiquitous in interpretable machine learning, the joint signal from a group of related features is sometimes overlooked or inadvertently excluded. Neglecting the joint signal could bypass a critical insight: in many instances, the most significant predictors are not isolated features, but rather the combined effect of groups of features. This can be especially problematic for datasets that contain natural groupings of features, including multimodal datasets. This paper introduces a novel approach to determine the importance of a group of features for Generalized Additive Models (GAMs) that is efficient, requires no model retraining, allows defining groups posthoc, permits overlapping groups, and remains meaningful in high-dimensional settings. 
We showcase properties of our method on three synthetic experiments that illustrate the beha...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-026-43928-4","openalex_id":"https://openalex.org/W7139982091","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Blue Marble Space","Cornell University","Hospital for Special Surgery","Menlo School","Microsoft (United States)","SRI International","Stanford University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5985999703407288},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.5590999722480774},{"id":"https://openalex.org/C2781311116","display_name":"Group (periodic table)","score":0.5440999865531921},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5167999863624573},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5019999742507935},{"id":"https://openalex.org/C194648359","display_name":"Generalized additive model","score":0.43709999322891235},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.40310001373291016},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.3716999888420105}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/an-agentic-multi-agent-architecture-for-cybersecurity-risk-management","title":"An Agentic Multi-Agent Architecture for Cybersecurity Risk 
Management","url":"https://www.microsoft.com/en-us/research/publication/an-agentic-multi-agent-architecture-for-cybersecurity-risk-management/","published":"2026-03-20","authors":["Ravi Gupta","Saket Kumar","Shreeya Sharma","Maulik Dang","Abhishek Aggarwal BigCommerce","U. A. Buffalo","The State University of New York","Buffalo","Ny","Usa","Microsoft","Amazon"],"abstract":"Getting a real cybersecurity risk assessment for a small organization is expensive -- a NIST CSF-aligned engagement runs $15,000 on the low end, takes weeks, and depends on practitioners who are genuinely scarce. Most small companies skip it entirely. We built a six-agent AI system where each agent handles one analytical stage: profiling the organization, mapping assets, analyzing threats, evaluating controls, scoring risks, and generating recommendations. Agents share a persistent context that grows as the assessment proceeds, so later agents build on what earlier ones concluded -- the mechanism that distinguishes this from standard sequential agent pipelines. 
We tested it on a 15-person HIPAA-covered healthcare company and compared outputs to independent assessments by three CISSP practitioners -- the system agreed with them 85% of the time on severity classifications, covered 92% of i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Miscellaneous","Artificial intelligence","Security, privacy, and cryptography","Systems and networking","Computer science","Computer security","Engineering","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/memory-over-maps-3d-object-localization-without-reconstruction","title":"Memory Over Maps: 3D Object Localization Without Reconstruction","url":"https://www.microsoft.com/en-us/research/publication/memory-over-maps-3d-object-localization-without-reconstruction/","published":"2026-03-20","authors":["Ruifa Zhou","Xander Yap","Jian Cao","Allison Lau","Boyang Sun","Marc Pollefeys"],"abstract":"Target localization is a prerequisite for embodied tasks such as navigation and manipulation. Conventional approaches rely on constructing explicit 3D scene representations to enable target localization, such as point clouds, voxel grids, or scene graphs. While effective, these pipelines incur substantial mapping time, storage overhead, and scalability limitations. 
Recent advances in vision-language models suggest that rich semantic reasoning can be performed directly on 2D observations, raising a fundamental question: is a complete 3D scene reconstruction necessary for object localization? In this work, we revisit object localization and propose a map-free pipeline that stores only posed RGB-D keyframes as a lightweight visual memory--without constructing any global 3D representation of the scene. At query time, our method retrieves candidate views, re-ranks them with a vision-language....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2603.19835","title":"FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization","url":"https://huggingface.co/papers/2603.19835","published":"2026-03-20","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/Qwen/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/memma-coordinating-the-memory-cycle-through-multi-agent-reasoning-and-in-situ-self-evolution","title":"MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution","url":"https://www.microsoft.com/en-us/research/publication/memma-coordinating-the-memory-cycle-through-multi-agent-reasoning-and-in-situ-self-evolution/","published":"2026-03-19","authors":["Min Lin","Zhiwei Zhang","Hanqing Lu","Hui Liu","Xianfeng Tang","Qi He","XiangRui Zhang","Suhang Wang"],"abstract":"Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of the memory cycle, where construction and retrieval are driven by local heuristics rather than explicit strategic reasoning, and sparse, delayed supervision on the backward path, where downstream failures rarely translate into direct repairs of the memory bank. To address these challenges, we propose MemMA, a plug-and-play multi-agent framework that coordinates the memory cycle along both the forward and backward paths. On the forward path, a Meta-Thinker produces structured guidance that steers a Memory Manager during construction and directs a Query Reasoner during iterative retrieval. 
On the backward path, MemMA intr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM","memory","retrieval","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/prorl-agent-rollout-as-a-service-for-rl-training-of-multi-turn-llm-agents","title":"ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents","url":"https://www.microsoft.com/en-us/research/publication/prorl-agent-rollout-as-a-service-for-rl-training-of-multi-turn-llm-agents/","published":"2026-03-19","authors":["Hao Zhang","Mingjie Liu","Shaokun Zhang","Song Han","Jian Hu","Zhenghui Jin","Yuchi Zhang","Shizhe Diao","Ximing Lu","Binfeng Xu","Zhiding Yu","Jan Kautz"],"abstract":"Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, making systems hard to migrate and maintain. Under the rollout-as-a-service philosophy, we present ProRL Agent , a scalable infrastructure that serves the full agentic rollout lifecycle through an API service. 
ProRL Agent also provides standardized and extensible sandbox environments that support diverse agentic tasks in rootless HPC settings. We validate ProRL Agent through RL training on software engineering, math, STEM, and coding tasks. ProRL Agent is open-sourced and integrated as part of NVIDIA NeMo Gym.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Reinforcement learning","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/act-while-thinking-accelerating-llm-agents-via-pattern-aware-speculative-tool-execution","title":"Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution","url":"https://www.microsoft.com/en-us/research/publication/act-while-thinking-accelerating-llm-agents-via-pattern-aware-speculative-tool-execution/","published":"2026-03-19","authors":["Yifan Sui","Han Zhao","Rui Ma","Zhiyuan He","Hao Wang","Jianxun Li","Yuqing Yang"],"abstract":"LLM-powered agents are emerging as a dominant paradigm for autonomous task solving. Unlike standard inference workloads, agents operate in a strictly serial \"LLM-tool\" loop, where the LLM must wait for external tool execution at every step. This execution model introduces severe latency bottlenecks. To address this problem, we propose PASTE, a Pattern-Aware Speculative Tool Execution method designed to hide tool latency through speculation. 
PASTE is based on the insight that although agent requests are semantically diverse, they exhibit stable application level control flows (recurring tool-call sequences) and predictable data dependencies (parameter passing between tools). By exploiting these properties, PASTE improves agent serving performance through speculative tool execution. Experimental results against state of the art baselines show that PASTE reduces average task completion time b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sol-execbench-speed-of-light-benchmarking-for-real-world-gpu-kernels-against-hardware-limits","title":"SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits","url":"https://www.microsoft.com/en-us/research/publication/sol-execbench-speed-of-light-benchmarking-for-real-world-gpu-kernels-against-hardware-limits/","published":"2026-03-19","authors":["Edward Lin","Sahil Modi","S. Hari","Qijing Huang","Zhifan Ye","N. Qin","Fengzhe Zhou","Yuan Zhang","Jingquan Wang","S. Damani","Dheeraj Peri","Ouye Xie"],"abstract":"As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. 
We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward and backward workloads across BF16, FP8, and NVFP4, including kernels whose best performance is expected to rely on Blackwell-specific capabilities. Unlike prior benchmarks that evaluate kernels primarily relative to software implementations, SOL-ExecBench measures performance against analytically derived Speed-of-Light (SOL) bounds computed by SOLAR, our pipeline for deriving hardware-grounded S...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-servers-to-sites-compositional-power-trace-generation-of-llm-inference-for-infrastructure-planning","title":"From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning","url":"https://www.microsoft.com/en-us/research/publication/from-servers-to-sites-compositional-power-trace-generation-of-llm-inference-for-infrastructure-planning/","published":"2026-03-19","authors":["Grant Wilkins","Fiodar Kazhamiaka","Ram Rajagopal"],"abstract":"Datacenter operators and electrical utilities rely on power traces at different spatiotemporal scales. 
Operators use fine-grained traces for provisioning, facility management, and scheduling, while utilities use site-level load profiles for capacity and interconnection planning. Existing datacenter power models do not capture LLM inference workloads, in which GPUs shift rapidly among compute-intensive prefill, lower-power decode, and idle states, and facility demand depends on how these states evolve and synchronize across many devices. We show that LLM inference power can be represented compositionally through two components: workload-driven transitions among operating states and configuration-specific power distributions within those states. Building on this observation, we develop a trace-generation framework that learns from measured traces and synthesizes power profiles for new traf...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:baidu:2603.19228","title":"SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video 
Editing","url":"https://huggingface.co/papers/2603.19228","published":"2026-03-19","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"arxiv:2603.19415","title":"Scalable Prompt Routing via Fine-Grained Latent Task Discovery","url":"http://arxiv.org/abs/2603.19415","published":"2026-03-19","authors":["Yunyi Zhang","Soji Adeshina","Sheng Guan","Ashwin Ganesh","Zhen Han","Vassilis N. Ioannidis","Huzefa Rangwala","George Karypis"],"abstract":"Prompt routing dynamically selects the most appropriate large language model from a pool of candidates for each query, optimizing performance while managing costs. As model pools scale to include dozens of frontier models with narrow performance gaps, existing approaches face significant challenges: manually defined task taxonomies cannot capture fine-grained capability distinctions, while monolithic routers struggle to differentiate subtle differences across diverse tasks. We propose a two-stage routing architecture that addresses these limitations through automated fine-grained task discovery and task-aware quality estimation. Our first stage employs graph-based clustering to discover latent task types and trains a classifier to assign prompts to discovered tasks. 
The second stage uses a mixture-of-experts architecture with task-specific prediction heads for specialized quality estimat...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7140346981","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.824400007724762},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6430000066757202},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5561000108718872},{"id":"https://openalex.org/C95623464","display_name":"Classifier (UML)","score":0.5019000172615051},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.4796000123023987},{"id":"https://openalex.org/C74172769","display_name":"Routing (electronic design automation)","score":0.4514000117778778},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4431000053882599},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.435699999332428}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138851455","title":"Cosmos-Surg-DVRK: World Foundation Model-Based Automated Online Evaluation of Surgical Robot Policy Learning","url":"https://doi.org/10.1109/lra.2026.3675962","published":"2026-03-19","authors":["Lukas Zbinden","Nigel Nelson","Juo-Tung Chen","Xinhao Chen","Ji Woong Kim","Mahdi Azizian","Axel Krieger","Sean D. 
Huver"],"abstract":"The rise of robot-assisted surgery and vision language-action models has accelerated progress in autonomous surgical policies and efficient assessment strategies. However, evaluating these policies directly on physical robotic platforms such as the da Vinci Research Kit (dVRK) remains hindered by high costs, time demands, reproducibility challenges, and variability in execution. World foundation models (WFM) for physical AI offer a transformative approach to simulate complex real-world surgical tasks, such as soft tissue deformation, with a high degree of realism. This work introduces Cosmos-Surg dVRK, a surgical finetune of the Cosmos WFM, which, together with a trained video classifier, enables fully automated online evaluation and benchmarking of surgical policies. We evaluate Cosmos-Surg-dVRK using two independent, teleoperated, real world dVRK surgical datasets. On tabletop suture p...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2026.3675962","openalex_id":"https://openalex.org/W7138851455","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Johns Hopkins University","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.7045000195503235},{"id":"https://openalex.org/C3017684034","display_name":"Surgical robot","score":0.5680000185966492},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5582000017166138},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5361999869346619},{"id":"https://openalex.org/C2779370443","display_name":"Surgical 
planning","score":0.4154999852180481},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.412200003862381},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.40290001034736633},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.3939000070095062}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.19002","title":"RADIUS: Ranking, Distribution, and Significance - A Comprehensive Alignment Suite for Survey Simulation","url":"http://arxiv.org/abs/2603.19002","published":"2026-03-19","authors":["Weronika Łajewska","Paul Missault","George Davidson","Saab Mansour"],"abstract":"Simulation of surveys using LLMs is emerging as a powerful application for generating human-like responses at scale. Prior work evaluates survey simulation using metrics borrowed from other domains, which are often ad hoc, fragmented, and non-standardized, leading to results that are difficult to compare. Moreover, existing metrics focus mainly on accuracy or distributional measures, overlooking the critical dimension of ranking alignment. In practice, a simulation can achieve high accuracy while still failing to capture the option most preferred by humans - a distinction that is critical in decision-making applications. We introduce RADIUS, a comprehensive two-dimensional alignment suite for survey simulation that captures: 1) RAnking alignment and 2) DIstribUtion alignment, each complemented by statistical Significance testing. 
RADIUS highlights the limitations of existing metrics, ena...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7140000982","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Luxembourg"],"concepts":[{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.8517000079154968},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.723800003528595},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6815999746322632},{"id":"https://openalex.org/C33676613","display_name":"Dimension (graph theory)","score":0.6013000011444092},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5349000096321106},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4903999865055084},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.44609999656677246},{"id":"https://openalex.org/C198477413","display_name":"Survey data collection","score":0.39089998602867126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.18806","title":"dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models","url":"https://huggingface.co/papers/2603.18806","published":"2026-03-19","authors":["Wenxuan Zhang","Lemeng Wu","Changsheng Zhao","Ernie Chang","Mingchen Zhuge","Zechun Liu","Andy Su","Hanxian Huang","Jun Chen","Chong Zhou","Raghuraman Krishnamoorthi","Vikas Chandra"],"abstract":"Diffusion Large Language Models (dLLMs) introduce a new paradigm for language generation, which in turn presents new challenges for aligning them with human preferences. 
In this work, we aim to improve the policy optimization for dLLMs by reducing the cost of the trajectory probability calculation, thereby enabling scaled-up offline policy training. We prove that: (i) under reference policy regularization, the probability ratio of the newly unmasked tokens is an unbiased estimate of that of intermediate diffusion states, and (ii) the probability of the full trajectory can be effectively estimated with a single forward pass of a re-masked final state. By integrating these two trajectory reduction strategies into a policy optimization objective, we propose Trajectory Reduction Policy Optimization (dTRPO). We evaluate dTRPO on 7B dLLMs across instruction-following and reasoning benchmarks.....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/loc3r-vlm-language-based-localization-and-3d-reasoning-with-vision-language-models","title":"Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models","url":"https://www.microsoft.com/en-us/research/publication/loc3r-vlm-language-based-localization-and-3d-reasoning-with-vision-language-models/","published":"2026-03-18","authors":["Kevin Qu","Haozhe Qi","Mihai Dusmanu","Mahdi Rad","Rui Wang","Marc Pollefeys"],"abstract":"Multimodal Large Language Models (MLLMs) have made impressive progress in connecting vision and language, but they still struggle with spatial understanding and viewpoint-aware reasoning. 
Recent efforts aim to augment the input representations with geometric cues rather than explicitly teaching models to reason in 3D space. We introduce Loc3R-VLM, a framework that equips 2D Vision-Language Models with advanced 3D understanding capabilities from monocular video input. Inspired by human spatial cognition, Loc3R-VLM relies on two joint objectives: global layout reconstruction to build a holistic representation of the scene structure, and explicit situation modeling to anchor egocentric perspective. These objectives provide direct spatial supervision that grounds both perception and language in a 3D context. To ensure geometric consistency and metric-scale alignment, we leverage lightweight....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Computer Vision and Pattern Recognition","Multimodal Large Language Models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learning-to-reason-with-curriculum-i-provable-benefits-of-autocurriculum","title":"Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum","url":"https://www.microsoft.com/en-us/research/publication/learning-to-reason-with-curriculum-i-provable-benefits-of-autocurriculum/","published":"2026-03-18","authors":["Nived Rajaraman","Audrey Huang","Miro Dudík","Robert E. 
Schapire","Dylan Foster","Akshay Krishnamurthy"],"abstract":"Chain-of-thought reasoning, where language models expend additional computation by producing thinking tokens prior to final responses, has driven significant advances in model capabilities. However, training these reasoning models is extremely costly in terms of both data and compute, as it involves collecting long traces of reasoning behavior from humans or synthetic generators and further post-training the model via reinforcement learning. Are these costs fundamental, or can they be reduced through better algorithmic design? We show that autocurriculum, where the model uses its own performance to decide which problems to focus training on, provably improves upon standard training recipes for both supervised fine-tuning (SFT) and reinforcement learning (RL). For SFT, we show that autocurriculum requires exponentially fewer reasoning demonstrations than non-adaptive fine-tuning, by focus...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","mathematics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/offload-or-overload-a-platform-measurement-study-of-mobile-robotic-manipulation-workloads","title":"Offload or Overload: A Platform Measurement Study of Mobile Robotic Manipulation 
Workloads","url":"https://www.microsoft.com/en-us/research/publication/offload-or-overload-a-platform-measurement-study-of-mobile-robotic-manipulation-workloads/","published":"2026-03-18","authors":["Sara Pohland","Xenofon Foukas","Ganesh Ananthanarayanan","Andrey Kolobov","Sanjeev Mehrotra","Bozidar Radunovic","Ankit Verma"],"abstract":"Mobile robotic manipulation–the ability of robots to navigate spaces and interact with objects–is a core capability of physical AI. Foundation models have led to breakthroughs in their performance, but at a significant computational cost. We present the first measurement study of mobile robotic manipulation workloads across onboard, edge, and cloud GPU platforms. We find that the full workload stack is infeasible to run on smaller onboard GPUs, while larger onboard GPUs drain robot batteries several hours faster. Offloading alleviates these constraints but introduces its own challenges, as additional network latency degrades task accuracy, and the bandwidth requirement makes naive cloud offloading impractical. Finally, we quantify opportunities and pitfalls of sharing compute across robot fleets. 
We believe our measurement study will be crucial to designing inference systems for mobile r...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Tech Report","Artificial intelligence","Systems and networking","Robotics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:wekzdq9itlyv4euuilw7399v","title":"Prose2Policy (P2P): A Practical LLM Pipeline for Translating Natural-Language Access Policies into Executable Rego","url":"https://machinelearning.apple.com/research/prose2policy","published":"2026-03-18","authors":["Vatsal Gupta","Darshan Sreenivasamurthy"],"abstract":"Prose2Policy (P2P) is a LLM-based practical tool that translates natural-language access control policies (NLACPs) into executable Rego code (the policy language of Open Policy Agent, OPA). It provides a modular, end-to-end pipeline that performs policy detection, component extraction, schema validation, linting, compilation, automatic test generation and execution. 
Prose2Policy is designed to bridge the gap between human-readable access...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ev8x17240hty3n2l678e828p","title":"Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning","url":"https://machinelearning.apple.com/research/goldilocks","published":"2026-03-18","authors":["Ilia Mahrooghi","Aryo Lotfi","Emmanuel Abbe"],"abstract":"Reinforcement learning has emerged as a powerful paradigm for unlocking reasoning capabilities in large language models. However, relying on sparse rewards makes this process highly sample-inefficient, as models must navigate vast search spaces with minimal feedback. While classic curriculum learning aims to mitigate this by ordering data based on complexity, the right ordering for a specific model is often unclear. 
To address this, we propose...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7139120435","title":"From tool to teammate in a randomized controlled trial of clinician-AI collaborative workflows for diagnosis","url":"https://doi.org/10.1038/s41746-026-02545-1","published":"2026-03-18","authors":["Selin Everett","Bryan Bunning","Priyank Jain","Ivan Lopez","Anup Agarwal","Manisha Desai","Robert Gallo","Ethan Goh","Vinay B. Kadiyala","Zahir Kanjee","Jacob M. Koshy","Andrew Olson"],"abstract":"Early studies of large language models (LLMs) in clinical settings have largely treated artificial intelligence (AI) as a tool rather than an active collaborator. As LLMs demonstrate expert-level diagnostic performance, the focus shifts from whether AI can offer valuable suggestions to how it integrates into physicians' diagnostic workflows. We conducted a randomized controlled trial (n = 70 clinicians) to assess a custom system designed for collaborative diagnostic reasoning. The design involved independent diagnostic assessments by the clinician and AI, followed by an AI-generated synthesis integrating both perspectives, highlighting agreements, disagreements, and offering commentary. We evaluated two collaborative workflows: AI as first opinion (preceding clinician) and AI as second opinion (following clinician). 
Both improved clinician diagnostic accuracy over conventional resources,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41746-026-02545-1","openalex_id":"https://openalex.org/W7139120435","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beth Israel Deaconess Medical Center","Cambridge Health Alliance","Center for Innovation","Harvard University","Microsoft (United States)","Microsoft Research (United Kingdom)","Stanford Medicine","Stanford University","University of Minnesota Medical Center","VA Palo Alto Health Care System"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.8691999912261963},{"id":"https://openalex.org/C168563851","display_name":"Randomized controlled trial","score":0.6067000031471252},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5794000029563904},{"id":"https://openalex.org/C112313634","display_name":"Complement (music)","score":0.4722999930381775},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.4011000096797943},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3971000015735626},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3896999955177307},{"id":"https://openalex.org/C3020132585","display_name":"Diagnostic accuracy","score":0.35010001063346863}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-llm-simulated-conversations-in-modeling-inconsistent-and-uncollaborative-behaviors-in-human-social-interaction","title":"Evaluating LLM-Simulated Conversations in Modeling 
Inconsistent and Uncollaborative Behaviors in Human Social Interaction","url":"https://www.microsoft.com/en-us/research/publication/evaluating-llm-simulated-conversations-in-modeling-inconsistent-and-uncollaborative-behaviors-in-human-social-interaction/","published":"2026-03-17","authors":["Ryo Kamoi","Ameya Godbole","Longqi Yang","Rui Zhang","Mengting Wan","Pei Zhou"],"abstract":"Simulating human conversations using large language models (LLMs) has emerged as a scalable methodology for modeling human social interaction. However, simulating human conversations is challenging because they inherently involve inconsistent and uncollaborative behaviors, such as misunderstandings and interruptions. Analysis comparing inconsistent and uncollaborative behaviors in human- and LLM-generated conversations remains limited, although reproducing these behaviors is integral to simulating human-like and complex social interaction. In this work, we introduce CoCoEval, an evaluation framework that analyzes LLM-simulated conversations by detecting 10 types of inconsistent and uncollaborative behaviors at the turn level using an LLM-as-a-Judge. 
Using CoCoEval, we evaluate GPT-4.1, GPT-5.1, and Claude Opus 4 by comparing the frequencies of detected behaviors in conversations simulate...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pauth-precise-task-scoped-authorization-for-agents","title":"PAuth - Precise Task-Scoped Authorization For Agents","url":"https://www.microsoft.com/en-us/research/publication/pauth-precise-task-scoped-authorization-for-agents/","published":"2026-03-17","authors":["Reshabh K Sharma","Linxi Jiang","Zhiqiang Lin","Shuo Chen"],"abstract":"The emerging agentic web envisions AI agents that reliably fulfill users' natural-language (NL)-based tasks by interacting with existing web services. However, existing authorization models are misaligned with this vision. In particular, today's operator-scoped authorization, exemplified by OAuth, grants broad permissions tied to operators (e.g., the transfer operator) rather than to the specific operations (e.g., transfer $100 to Bob) implied by a user's task. This will inevitably result in overprivileged agents. We introduce Precise Task-Scoped Implicit Authorization (PAuth), a fundamentally different model in which submitting an NL task implicitly authorizes only the concrete operations required for its faithful execution. 
To make this enforceable at servers, we propose NL slices: symbolic specifications of the calls each service expects, derived from the task and upstream results. Co...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Human language technologies","Computer science","Natural language processing","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/molmob0t-large-scale-simulation-enables-zero-shot-manipulation","title":"MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation","url":"https://www.microsoft.com/en-us/research/publication/molmob0t-large-scale-simulation-enables-zero-shot-manipulation/","published":"2026-03-17","authors":["Abhay Deshpande","Maya Guru","Rose Hendrix","Snehal Jauhri","Haoquan Fang","Wilbert Pumacay","Yejin Kim","Quinn Pfeifer","Ying-Chun Lee","Piper Wolters","Omar Rayyan","Mingtong Zhang"],"abstract":"A prevailing view in robot learning is that simulation alone is not enough; effective sim-to-real transfer is widely believed to require at least some real-world data collection or task-specific fine-tuning to bridge the gap between simulated and physical environments. We challenge that assumption. With sufficiently large-scale and diverse simulated synthetic training data, we show that zero-shot transfer to the real world is not only possible, but effective for both static and mobile manipulation. 
We introduce MolmoBot-Engine, a fully open-source pipeline for procedural data generation across robots, tasks, and diverse simulated environments in MolmoSpaces. With it, we release MolmoBot-Data, a dataset of 1.8 million expert trajectories for articulated object manipulation and pick-and-place tasks. We train three policy classes: MolmoBot, a Molmo2-based multi-frame vision-language model w...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Robotics","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-scientist-via-synthetic-task-scaling","title":"AI Scientist via Synthetic Task Scaling","url":"https://www.microsoft.com/en-us/research/publication/ai-scientist-via-synthetic-task-scaling/","published":"2026-03-17","authors":["Ziyang Cai","Harkirat Behl"],"abstract":"With the advent of AI agents, automatic scientific discovery has become a tenable goal. Many recent works scaffold agentic systems that can perform machine learning research, but don't offer a principled way to train such agents -- and current LLMs often generate plausible-looking but ineffective ideas. To make progress on training agents that can learn from doing, we provide a novel synthetic environment generation pipeline targeting machine learning agents. 
Our pipeline automatically synthesizes machine learning challenges compatible with the SWE-agent framework, covering topic sampling, dataset proposal, and code generation. The resulting synthetic tasks are 1) grounded in real machine learning datasets, because the proposed datasets are verified against the Huggingface API and are 2) verified for higher quality with a self-debugging loop. To validate the effectiveness of our syntheti...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7139147037","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Computer vision","Computer science","agent"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/intent-formalization-a-grand-challenge-for-reliable-coding-in-the-age-of-ai-agents","title":"Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents","url":"https://www.microsoft.com/en-us/research/publication/intent-formalization-a-grand-challenge-for-reliable-coding-in-the-age-of-ai-agents/","published":"2026-03-17","authors":["Shuvendu Lahiri"],"abstract":"Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: emph{does the generated code actually do what the user intended?} The gap between informal natural language requirements and precise program behavior -- the emph{intent gap} -- has always plagued software engineering, but AI-generated code amplifies it to an 
unprecedented scale. This article argues that textbf{intent formalization} -- the translation of informal user intent into a set of checkable formal specifications -- is the key challenge that will determine whether AI makes software more reliable or merely more abundant. Intent formalization offers a tradeoff spectrum suitable to the reliability needs of different contexts: from lightweight tests that disambiguate likely misinterpretations, through full functional specifications for formal verification, to domain-specific languages....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Programming languages and software engineering","Computer science","Programming language"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2603.17024","title":"HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning","url":"https://huggingface.co/papers/2603.17024","published":"2026-03-17","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/Qwen/papers"}},{"id":"apple:nlhyvya1ltgkb790vm3ckzh3","title":"AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval","url":"https://machinelearning.apple.com/research/ames","published":"2026-03-17","authors":["Tony Joseph","Carlos Pareja","David Lopes Pegna","Abhishek Singh"],"abstract":"We present AMES (Approximate Multimodal Enterprise Search), a unified multimodal late interaction retrieval architecture which is backend agnostic. AMES demonstrates that fine-grained multimodal late interaction retrieval can be deployed within a production grade enterprise search engine without architectural redesign. Text tokens, image patches, and video frames are embedded into a shared representation space using multi-vector encoders,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2603.16455","title":"Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval","url":"http://arxiv.org/abs/2603.16455","published":"2026-03-17","authors":["Weiqing Li","Jinyue Guo","Yaqi Wang","Haiyang Xiao","Yuewei Zhang","Guohua LIU","Hao Henry Wang"],"abstract":"Visual-language models (VLMs) excel at data mappings, but real-world document heterogeneity and unstructuredness disrupt the consistency of cross-modal embeddings. 
Recent late-interaction methods enhance image-text alignment through multi-vector representations, yet traditional training with limited samples and static strategies cannot adapt to the model's dynamic evolution, causing cross-modal retrieval confusion. To overcome this, we introduce Evo-Retriever, a retrieval framework featuring an LLM-guided curriculum evolution built upon a novel Viewpoint-Pathway collaboration. First, we employ multi-view image alignment to enhance fine-grained matching via multi-scale and multi-directional perspectives. Then, a bidirectional contrastive learning strategy generates \"hard queries\" and establishes complementary learning paths for visual and textual disambiguation to rebalance supervision. F...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7139143969","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7849000096321106},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.7235000133514404},{"id":"https://openalex.org/C47177190","display_name":"Curriculum","score":0.6114000082015991},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5116999745368958},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5097000002861023},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.41780000925064087},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4129999876022339},{"id":"https://openalex.org/C157657479","display_name":"Closed 
captioning","score":0.38609999418258667}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137870410","title":"Large language models possess some ecological knowledge, but how much?","url":"https://doi.org/10.1016/j.ecoinf.2026.103699","published":"2026-03-17","authors":["Filip Dorm","Joseph Millard","Drew W. Purves","Michael Harfoot","Oisin Mac Aodha"],"abstract":"Large Language Models (LLMs) have shown remarkable capabilities in question answering across various domains, yet their effectiveness in ecological knowledge remains underexplored. Understanding their potential to recall and synthesize ecological information is crucial as AI tools become increasingly integrated into scientific workflows. Here, we assess the ecological knowledge of two LLMs, Gemini 1.5 Pro and GPT-4o , across a suite of ecologically focused tasks. These tasks evaluate an LLM’s ability to predict species presence, generate range maps, list critically endangered species, classify threats, and estimate species traits. We introduce a new benchmark dataset to quantify LLM performance against expert-derived data. 
While the LLMs tested outperform naive baselines, achieving around 20 percentage points higher accuracy in species presence prediction, they reach only a third of the....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.ecoinf.2026.103699","openalex_id":"https://openalex.org/W7137870410","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["Google (United Kingdom)","Google DeepMind (United Kingdom)","University of Cambridge","University of Edinburgh","Vizzuality (Spain)","École Polytechnique Fédérale de Lausanne"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5751000046730042},{"id":"https://openalex.org/C18903297","display_name":"Ecology","score":0.5141000151634216},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3804999887943268},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.31360000371932983},{"id":"https://openalex.org/C172606299","display_name":"Social ecological model","score":0.2955000102519989},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.28200000524520874},{"id":"https://openalex.org/C179603123","display_name":"Modeling language","score":0.2754000127315521},{"id":"https://openalex.org/C49539007","display_name":"Theoretical ecology","score":0.2752000093460083}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2603.16110","title":"VIGIL: Towards Edge-Extended Agentic AI for Enterprise IT Support","url":"http://arxiv.org/abs/2603.16110","published":"2026-03-17","authors":["Sarthak Ahuja","Neda Kordjazi","Evren Yortucboylu","Vishaal Kapoor","Mariam Dundua","Yiming 
Li","Derek Ho","Vaibhavi Padala","Jennifer Whitted","Rebecca Steinert"],"abstract":"Enterprise IT support is constrained by heterogeneous devices, evolving policies, and long-tail failure modes that are difficult to resolve centrally. We present VIGIL, an edge-extended agentic AI system that deploys desktop-resident agents to perform situated diagnosis, retrieval over enterprise knowledge, and policy-governed remediation directly on user devices with explicit consent and end-to-end observability. In a 10-week pilot of VIGIL's operational loop on 100 resource-constrained endpoints, VIGIL reduces interaction rounds by 39%, achieves at least 4 times faster diagnosis, and supports self-service resolution in 82% of matched cases. Users report excellent usability, high trust, and low cognitive workload across four validated instruments, with qualitative feedback highlighting transparency as critical for trust. Notably, users rated the system higher when no historical matches....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7139144888","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (Germany)","Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C132829578","display_name":"Situated","score":0.61080002784729},{"id":"https://openalex.org/C36299963","display_name":"Observability","score":0.550000011920929},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5259000062942505},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.5109000205993652},{"id":"https://openalex.org/C2778476105","display_name":"Workload","score":0.4799000024795532},{"id":"https://openalex.org/C2780233690","display_name":"Transparency 
(behavior)","score":0.4659999907016754},{"id":"https://openalex.org/C108154423","display_name":"Salience (neuroscience)","score":0.4507000148296356},{"id":"https://openalex.org/C4554734","display_name":"Knowledge base","score":0.37869998812675476}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.20275","title":"Understanding Pruning Regimes in Vision-Language Models Through Domain-Aware Layer Selection","url":"http://arxiv.org/abs/2603.20275","published":"2026-03-17","authors":["Saeed Khaki","Nima Safaei","Kamal Ginotra"],"abstract":"Transformer-based vision-language models (VLMs) contain substantial depth redundancy, yet the effect of removing specific decoder layers remains poorly understood, especially for domains that require tight coupling between perception and multi-step reasoning. We study structured decoder layer pruning through the lens of domain-aware activation similarity, measuring how strongly each layer transforms representations for math versus non-math inputs. This yields simple math-aware, non-math-aware, and mixed ranking criteria that identify layers whose input-output activations change least within a target domain. 
Across two state-of-the-art VLMs and a broad suite of math and general multimodal benchmarks, we uncover a consistent three-regime structure: at low pruning budgets, performance is highly sensitive to which layers are removed; at moderate budgets, methods converge as structural damage...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7140347093","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.7159000039100647},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.6208999752998352},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.5929999947547913},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.5467000007629395},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.5454000234603882},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5432000160217285},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.539900004863739},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5393000245094299}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/amplification-effects-in-test-time-reinforcement-learning-safety-and-reasoning-vulnerabilities","title":"Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning 
Vulnerabilities","url":"https://www.microsoft.com/en-us/research/publication/amplification-effects-in-test-time-reinforcement-learning-safety-and-reasoning-vulnerabilities/","published":"2026-03-16","authors":["Vanshaj Khattar","Md. Rafi Ur Rashid","Moumita Choudhury","Jing Liu","T. Koike-Akino","Ming Jin","Ye Wang"],"abstract":"Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the model directly learns from test data without access to labels. However, this reliance on test data also makes TTT methods vulnerable to harmful prompt injections. In this paper, we investigate safety vulnerabilities of TTT methods, where we study a representative self-consistency-based test-time learning method: test-time reinforcement learning (TTRL), a recent TTT method that improves LLM reasoning by rewarding self-consistency using majority vote as a reward signal. We show that harmful prompt injection during TTRL amplifies the model's existing behaviors, i.e., safety amplification when the base model is relatively safe, and harmfulness amplification when it is vulnerable to the injected data. 
In both cases, there is a decline in reasonin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Unpublished","Artificial intelligence","Security, privacy, and cryptography","Systems and networking","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1426","title":"Mixture-of-Depths Attention","url":"https://seed.bytedance.com/en/research/mixture-of-depths-attention","published":"2026-03-16","authors":["Lianghui Zhu","Yuxin Fang","Bencheng Liao","Shijie Wang","Tianheng Cheng","Zilong Huang","Chen Chen","Lai Wei","Yutao Zeng","Ya Wang","Yi Lin","Yu Li"],"abstract":"Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers. We introduce mixture-of-depths attention (MoDA), a mechanism that allows each attention head to attend to sequence KV pairs at the current layer and depth KV pairs from preceding layers. We further describe a hardware-efficient algorithm for MoDA that resolves non-contiguous memory-access patterns, achieving 97.3% of FlashAttention-2's efficiency at a sequence length of 64K. Experiments on 1.5B-parameter models demonstrate that MoDA consistently outperforms strong baselines. 
Notably, it improves average perplexity by 0.2 across 10 validation benchmarks and increases average performance by 2.11% on 10 downstream....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computation and Language","Multimodal","arXiv","memory","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/distributional-open-ended-evaluation-of-llm-cultural-value-alignment-based-on-value-codebook","title":"Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook","url":"https://www.microsoft.com/en-us/research/publication/distributional-open-ended-evaluation-of-llm-cultural-value-alignment-based-on-value-codebook/","published":"2026-03-16","authors":["Jaehyeok Lee","Xiaoyuan Yi","Jingtao Yao","Hyunjin Hwang","Roy Ka-Wei Lee","Xing Xie","Jinyeong Bak"],"abstract":"As LLMs are globally deployed, aligning their cultural value orientations is critical for safety and user engagement. However, existing benchmarks face the Construct-Composition-Context ($C^3$) challenge: relying on discriminative, multiple-choice formats that probe value knowledge rather than true orientations, overlook subcultural heterogeneity, and mismatch with real-world open-ended generation. We introduce DOVE, a distributional evaluation framework that directly compares human-written text distributions with LLM-generated outputs. 
DOVE utilizes a rate-distortion variational optimization objective to construct a compact value-codebook from 10K documents, mapping text into a structured value space to filter semantic noise. Alignment is measured using unbalanced optimal transport, capturing intra-cultural distributional structures and sub-group diversity. Experiments across 12 LLMs sh...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Human language technologies","Computation and Language","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/molora-composable-specialization-via-per-token-adapter-routing","title":"MoLoRA: Composable Specialization via Per-Token Adapter Routing","url":"https://www.microsoft.com/en-us/research/publication/molora-composable-specialization-via-per-token-adapter-routing/","published":"2026-03-16","authors":["Shrey B. Shah","Justin Wagle"],"abstract":"Multi-adapter serving systems route entire sequences to a single adapter, forcing a choice when requests span multiple domains. This assumption fails in two important settings: (1) multimodal generation, where text and image tokens require different adapters within the same sequence, and (2) mixed-capability requests like\"write code to solve this equation,\"which need expertise from multiple specialized adapters. 
We introduce per-token routing, which routes individual tokens to adapters based on either vocabulary structure (for multimodal models) or learned gating (for semantic specialization). Per-token routing is provably optimal, achieving work N for N tokens versus K cdot N for per-sequence routing with K adapter types. Our key contribution is MoLoRA (Mixture of LoRA), which enables composable specialization: load multiple domain-specific adapters and let a learned router select the a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2603.14864","title":"Shopping Companion: A Memory-Augmented LLM Agent for Real-World E-Commerce Tasks","url":"http://arxiv.org/abs/2603.14864","published":"2026-03-16","authors":["Zijian Yu","Kejun Xiao","Huaipeng Zhao","Tao Luo","Xiaoyi Zeng"],"abstract":"In e-commerce, LLM agents show promise for shopping tasks such as recommendations, budgeting, and bundle deals, where accurately capturing user preferences from long-term conversations is critical. However, two challenges hinder realizing this potential: (1) the absence of benchmarks for evaluating long-term preference-aware shopping tasks, and (2) the lack of end-to-end optimization due to existing designs that treat preference identification and shopping assistance as separate components. 
In this paper, we introduce a novel benchmark with a long-term memory setup, spanning two shopping tasks over 1.2 million real-world products, and propose Shopping Companion, a unified framework that jointly tackles memory retrieval and shopping assistance while supporting user intervention. To train such capabilities, we develop a dual-reward reinforcement learning strategy with tool-wise rewards to....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7139145170","cited_by_count":0,"quality_score":61,"matched_keywords":["LLM","preference","memory","long-term","retrieval","agent"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7609999775886536},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.722100019454956},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6784999966621399},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6567999720573425},{"id":"https://openalex.org/C2778134712","display_name":"Bundle","score":0.5898000001907349},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5637999773025513},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4593000113964081},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.459199994802475}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:moonshotai:2603.15031","title":"Attention 
Residuals","url":"https://huggingface.co/papers/2603.15031","published":"2026-03-16","authors":["Moonshot/Kimi"],"abstract":"","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","moonshotai"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"apple:q5hhmc8re2iptk6vj1ftncoj","title":"RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning","url":"https://machinelearning.apple.com/research/rubicap","published":"2026-03-16","authors":["Tzu-Heng Huang","Sirajul Salekin","Javier Movellan","Frederic Sala","Manjot Bilkhu"],"abstract":"Dense image captioning is critical for cross-modal alignment in vision-language pretraining and text-to-image generation, but scaling expert-quality annotations is prohibitively expensive. While synthetic captioning via strong vision-language models (VLMs) is a practical alternative, supervised distillation often yields limited output diversity and weak generalization. 
Reinforcement learning (RL) could overcome these limitations, but its...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7136147378","title":"Insulin resistance prediction from wearables and routine blood biomarkers","url":"https://doi.org/10.1038/s41586-026-10179-2","published":"2026-03-16","authors":["Ahmed A. Metwally","A. Ali Heydari","Daniel McDuff","Alexandru Solot","Zeinab Esmaeilpour","Anthony Z. Faranesh","Menglian Zhou","Girish Narayanswamy","Maxwell A. Xu","Xin Liu","Yuzhe Yang","David B. Savage"],"abstract":", median age = 45 years, median haemoglobin A1c (HbA1c) = 5.4%) that uses time-series data from wearable devices and routine blood biomarkers to train deep neural networks against a ground-truth measure of IR (homeostatic model assessment of IR; HOMA-IR). Using a HOMA-IR cut-off of 2.9, our multimodal model achieved robust performance (area under the receiver operating characteristic curve (AUROC) = 0.80, sensitivity = 76%, specificity = 84%) with data from wearable devices, together with demographic and routine blood biomarker data. To enhance the use of time-series data from wearables, we fine-tuned a wearable foundation model (WFM) pretrained on 40 million hours of sensor data. In an independent validation cohort (n = 72), a model integrating WFM-derived representations with demographic data surpassed a demographics-only baseline (AUROC = 0.75 versus 0.66). 
Moreover, adding WFM-derive...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41586-026-10179-2","openalex_id":"https://openalex.org/W7136147378","cited_by_count":1,"quality_score":46,"matched_keywords":["language model","personalized"],"author_affiliations":["Google (United States)","University of Cambridge"],"concepts":[{"id":"https://openalex.org/C150594956","display_name":"Wearable computer","score":0.8586000204086304},{"id":"https://openalex.org/C2781197716","display_name":"Biomarker","score":0.649399995803833},{"id":"https://openalex.org/C2777391703","display_name":"Insulin resistance","score":0.6388999819755554},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.5881999731063843},{"id":"https://openalex.org/C54290928","display_name":"Wearable technology","score":0.5613999962806702},{"id":"https://openalex.org/C58471807","display_name":"Receiver operating characteristic","score":0.5246999859809875},{"id":"https://openalex.org/C72563966","display_name":"Cohort","score":0.4388999938964844},{"id":"https://openalex.org/C60644358","display_name":"Bioinformatics","score":0.4242999851703644}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2603.15386","title":"RieMind: Geometry-Grounded Spatial Agent for Scene Understanding","url":"http://arxiv.org/abs/2603.15386","published":"2026-03-16","authors":["Fernando Ropero","Erkin Turkoz","Daniel Matos","Junqing Du","Antonio Ruiz–Cortés","Zhang, Yanfeng","L. Liu","Mingwei Sun","Yejie Wang"],"abstract":"Visual Language Models (VLMs) have increasingly become the main paradigm for understanding indoor scenes, but they still struggle with metric and spatial reasoning. 
Current approaches rely on end-to-end video understanding or large-scale spatial question answering fine-tuning, inherently coupling perception and reasoning. In this paper, we investigate whether decoupling perception and reasoning leads to improved spatial reasoning. We propose an agentic framework for static 3D indoor scene reasoning that grounds an LLM in an explicit 3D scene graph (3DSG). Rather than ingesting videos directly, each scene is represented as a persistent 3DSG constructed by a dedicated perception module. To isolate reasoning performance, we instantiate the 3DSG from ground-truth annotations. The agent interacts with the scene exclusively through structured geometric tools that expose fundamental properties....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7139144656","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Huawei Technologies (United States)"],"concepts":[{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.689300000667572},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.6467000246047974},{"id":"https://openalex.org/C155911833","display_name":"Spatial intelligence","score":0.6392999887466431},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6351000070571899},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5856999754905701},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5077999830245972},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4927000105381012},{"id":"https://openalex.org/C193221554","display_name":"Commonsense 
reasoning","score":0.4603999853134155}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7136998336","title":"Erratum: Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach","url":"https://doi.org/10.1145/3795143","published":"2026-03-16","authors":["Junjie Zhang","Ruobing Xie","Yupeng Hou","Wayne Xin Zhao","Leyu Lin","Ji-Rong Wen"],"abstract":"This is an erratum for the article “Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach” published in ACM Trans. Inf. Syst. 43, 5, Article 114 (July 2025), 37 pages.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3795143","openalex_id":"https://openalex.org/W7136998336","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Renmin University of China","Tencent (China)","University of California San Diego"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9010000228881836},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.7074999809265137},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44830000400543213},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.42660000920295715},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.39640000462532043},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39010000228881836},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.3465000092983246},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.3296999931335449}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7136958704","title":"Deepfake Generation and Detection: A Benchmark and Survey","url":"https://doi.org/10.1145/3801962","published":"2026-03-16","authors":["Gan Pei","Jiangning Zhang","Menghan Hu","Zhenyu Zhang","Chengjie Wang","Yunsheng Wu","Guangtao Zhai","Jian Yang","Dacheng Tao"],"abstract":"Deepfake technology aims to synthesize highly realistic facial images and videos, with broad application potential in entertainment, film production, and digital human modeling. Deep learning has driven major progress in generative modeling, from VAEs and GANs to the recent rise of diffusion models. The latter have sparked a renewed wave of research through their superior generation quality. In addition to deepfake generation, corresponding detection technologies continuously evolve to regulate the potential misuse of deepfakes, such as privacy invasion and phishing attacks. This survey comprehensively reviews the latest developments in deepfake generation and detection, summarizing and analyzing current state-of-the-arts in this rapidly evolving field. First, we unify task definitions, comprehensively introduce datasets and metrics, and summarize the underlying technologies. 
Then, we re...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3801962","openalex_id":"https://openalex.org/W7136958704","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["East China Normal University","Nanjing University","Nanyang Technological University","Shanghai Jiao Tong University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8892999887466431},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7461000084877014},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.6466000080108643},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.5900999903678894},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5447999835014343},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4984000027179718},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4957999885082245},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4611999988555908}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W7136216575","title":"AI Agents as Universal Task Solvers","url":"https://doi.org/10.3390/e28030332","published":"2026-03-16","authors":["Alessandro Achille","Stefano Soatto"],"abstract":", the role of which in learning has remained largely 
unexplored.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/e28030332","openalex_id":"https://openalex.org/W7136216575","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7964000105857849},{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.6459000110626221},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6220999956130981},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6108999848365784},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5537999868392944},{"id":"https://openalex.org/C126042441","display_name":"Frame (networking)","score":0.5148000121116638},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.508899986743927},{"id":"https://openalex.org/C151201525","display_name":"Limit (mathematics)","score":0.45739999413490295}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.15970","title":"100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models","url":"https://arxiv.org/abs/2603.15970","published":"2026-03-16","authors":["Yeounoh Chung","Rushabh Desai","Jian He","Yu Xiao","Thibaud Hottelier","Yves-Laurent Kom Samo","Khadilkar, Pushkar","Xianshun Chen","Sam Idicula","Fatma Özcan","Alon Halevy","Yannis Papakonstantinou"],"abstract":"Several data warehouse and database providers have recently introduced extensions to SQL called AI Queries, enabling users to specify functions and conditions in SQL 
that are evaluated by LLMs, thereby broadening significantly the kinds of queries one can express over the combination of structured and unstructured data. LLMs offer remarkable semantic reasoning capabilities, making them an essential tool for complex and nuanced queries that blend structured and unstructured data. While extremely powerful, these AI queries can become prohibitively costly when invoked thousands of times. This paper provides an extensive evaluation of a recent AI query approximation approach that enables low cost analytics and database applications to benefit from AI queries. The approach delivers >100x cost and latency reduction for the semantic filter operator and also important gains for semantic ranking....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3802002","openalex_id":"https://openalex.org/W7138915493","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8618999719619751},{"id":"https://openalex.org/C2780148112","display_name":"Proxy (statistics)","score":0.6414999961853027},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5478000044822693},{"id":"https://openalex.org/C75165309","display_name":"Search engine indexing","score":0.5220999717712402},{"id":"https://openalex.org/C82876162","display_name":"Latency 
(audio)","score":0.5055999755859375},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.4650000035762787},{"id":"https://openalex.org/C510870499","display_name":"SQL","score":0.4609000086784363},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.4296000003814697}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.06205","title":"Tool-MCoT: Tool Augmented Multimodal Chain-of-Thought for Content Safety Moderation","url":"http://arxiv.org/abs/2604.06205","published":"2026-03-15","authors":["Shutong Zhang","Dylan Zhou","Yinxiao Liu","Yang Yang","Huiwen Luo","Wenfei Zou"],"abstract":"The growth of online platforms and user content requires strong content moderation systems that can handle complex inputs from various media types. While large language models (LLMs) are effective, their high computational cost and latency present significant challenges for scalable deployment. To address this, we introduce Tool-MCoT, a small language model (SLM) fine-tuned for content safety moderation leveraging external framework. By training our model on tool-augmented chain-of-thought data generated by LLM, we demonstrate that the SLM can learn to effectively utilize these tools to improve its reasoning and decision-making. Our experiments show that the fine-tuned SLM achieves significant performance gains. 
Furthermore, we show that the model can learn to use these tools selectively, achieving a balance between moderation accuracy and inference efficiency by calling tools only when....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7153670533","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","media"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7605000138282776},{"id":"https://openalex.org/C93225998","display_name":"Moderation","score":0.6955000162124634},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6342999935150146},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.516700029373169},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.4945000112056732},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4537999927997589},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.39980000257492065},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.39410001039505005}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7141535823","title":"Invited: Polymath: Self-Improving Hierarchical Workflow for Multi-Domain Problem Solving","url":"https://doi.org/10.1145/3764386.3779639","published":"2026-03-15","authors":["Chia-Tung Ho","Jing Gong","Haoyu Yang","Abhishek B. Akkur","Haoxing Ren"],"abstract":"Large language models (LLMs) excel at solving complex tasks by executing agentic workflows composed of detailed instructions and structured operations. 
However, building agents for diverse applications by manually embedding foundation models into agentic systems such as Chain-of-Thought, Self-Reflection, and ReACT through text interfaces limits scalability and efficiency. Recently, researchers have explored automating workflow generation using code-based representations, but most methods depend on labeled data, limiting their applicability to real-world, dynamic hardware design problems. We introduce Polymath, a self-improving agent with a dynamic hierarchical workflow that combines task flow graphs with code-represented workflows to address these challenges. Polymath employs an experience-driven optimization framework that integrates multi-level graph optimization using surrogate scores...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3764386.3779639","openalex_id":"https://openalex.org/W7141535823","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6553999781608582},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.4717999994754791},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.31779998540878296},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.29319998621940613},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.27950000762939453},{"id":"https://openalex.org/C25343380","display_name":"Relation (database)","score":0.27079999446868896},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.2703999876976013},{"id":"https://openalex.org/C18762648","display_name":"Work 
(physics)","score":0.2599000036716461}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7136078069","title":"DesInsert: Strategic descriptive term insertion fools text-to-image generation","url":"https://doi.org/10.1016/j.neunet.2026.108857","published":"2026-03-15","authors":["Fanyu Bu","Kai Ye","Jianan Ma","Tianyi Chen","Zhen Wang"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neunet.2026.108857","openalex_id":"https://openalex.org/W7136078069","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Hangzhou Dianzi University","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C100279451","display_name":"Perplexity","score":0.819100022315979},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7829999923706055},{"id":"https://openalex.org/C188441871","display_name":"Softmax function","score":0.6887999773025513},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.6521000266075134},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5907999873161316},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.5562000274658203},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5536999702453613},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5494999885559082}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/greedy-information-projection-for-llm-data-selection","title":"Greedy Information Projection for LLM Data Selection","url":"https://www.microsoft.com/en-us/research/publication/greedy-information-projection-for-llm-data-selection/","published":"2026-03-14","authors":["Victor Ye Dong","Kuan-Yun Lee","Jiamei Shuai","Shen Liu","Yi Liu","Jian Jiao"],"abstract":"We present Greedy Information Projection (GIP), a principled framework for choosing training examples for large language model fine-tuning. GIP casts selection as maximizing mutual information between a subset of examples and task-specific query signals, which may originate from LLM quality judgments, metadata, or other sources. The framework involves optimizing a closed-form mutual information objective defined using both data and query embeddings, naturally balancing quality and diversity. Optimizing this score is equivalent to maximizing the projection of the query embedding matrix onto the span of the selected data, which provides a geometric explanation for the co-emergence of quality and diversity. Building on this view, we employ a fast greedy matching-pursuit procedure with efficient projection-based updates. 
On instruction-following and mathematic...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sirens-whisper-inaudible-near-ultrasonic-jailbreaks-of-speech-driven-llms","title":"Sirens'Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs","url":"https://www.microsoft.com/en-us/research/publication/sirens-whisper-inaudible-near-ultrasonic-jailbreaks-of-speech-driven-llms/","published":"2026-03-14","authors":["Zijian Ling","Pingyi Hu","Xiuyong Gao","Xiaojing Ma","Man Zhou","Jun Feng","Songfeng Lu","Dongmei Zhang","Bin Benjamin Zhu"],"abstract":"Speech-driven large language models (LLMs) are increasingly accessed through speech interfaces, introducing new security risks via open acoustic channels. We present Sirens'Whisper (SWhisper), the first practical framework for covert prompt-based attacks against speech-driven LLMs under realistic black-box conditions using commodity hardware. SWhisper enables robust, inaudible delivery of arbitrary target baseband audio-including long and structured prompts-on commodity devices by encoding it into near-ultrasound waveforms that demodulate faithfully after acoustic transmission and microphone nonlinearity. 
This is achieved through a simple yet effective approach to modeling nonlinear channel characteristics across devices and environments, combined with lightweight channel-inversion pre-compensation. Building on this high-fidelity covert channel, we design a voice-aware jailbreak generati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Audio and Acoustics","Computer science","large language models","Speech recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multi-object-advertisement-creative-generation","title":"Multi-Object Advertisement Creative Generation","url":"https://www.microsoft.com/en-us/research/publication/multi-object-advertisement-creative-generation/","published":"2026-03-14","authors":["Jialu Gao","Mithun Gupta","Qun Li","Raveena Kshatriya","Andrew D. Wilson","Keng-hao Chang","Balasaravanan Thoravi Kumaravel"],"abstract":"Lifestyle images are photographs that capture environments and objects in everyday settings. In furniture product marketing, advertisers often create lifestyle images containing products to resonate with potential buyers, allowing buyers to visualize how the products fit into their daily lives. 
While recent advances in Generative Artificial Intelligence (GenAI) have given rise to realistic image content creation, their application in e-commerce advertising is challenging because high-quality ads must authentically representing the products in realistic scearios. Therefore, manual intervention is usually required for individual generations, making it difficult to scale to larger product catalogs. To understand the challenges faced by advertisers using GenAI to create lifestyle images at scale, we conducted evaluations on ad images generated using state-of-the-art image generation models a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Graphics and multimedia","Computer science","Generative AI"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7138229367","title":"Schema-Guided Scene-Graph Reasoning Based on Multi-Agent Large Language Model System","url":"https://doi.org/10.1609/aaai.v40i36.40285","published":"2026-03-14","authors":["Yiye Chen","Harpreet Sawhney","Nicholas Gydé","Yanan Jian","Jack Saunders","PATRICIO VELA","Benjamin Lundell"],"abstract":"Scene graphs have emerged as a structured and serializable environment representation for grounded spatial reasoning with Large Language Models (LLMs). In this work, we propose SG2, an iterative Schema-Guided Scene-Graph reasoning framework based on multi-agent LLMs. 
The agents are grouped into two modules: a (1) Reasoner module for abstract task planning and graph information queries generation, and a (2) Retriever module for extracting corresponding graph information based on code-writing following the queries. Two modules collaborate iteratively, enabling sequential reasoning and adaptive attention to graph information. The scene graph schema, prompted to both modules, serves to not only streamline both reasoning and retrieval process, but also guide the cooperation between two modules. This eliminates the need to prompt LLMs with full graph data, reducing the chance of hallucination....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i36.40285","openalex_id":"https://openalex.org/W7138229367","cited_by_count":0,"quality_score":57,"matched_keywords":["LLM","language model","retrieval","agent","multi-agent"],"author_affiliations":["American Rock Mechanics Association","Georgia Institute of Technology","Microsoft (United States)","Microsoft Research (United Kingdom)","Nvidia (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C9616225","display_name":"Semantic reasoner","score":0.7838000059127808},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.724399983882904},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.6035000085830688},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.4745999872684479},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4733000099658966},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.4406000077724457},{"id":"https://openalex.org/C2776359362","display_name":"Representation 
(politics)","score":0.3646000027656555},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.3562999963760376}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138314860","title":"MemGuide: Intent-Driven Memory Selection for Goal-Oriented Multi-Session LLM Agents","url":"https://doi.org/10.1609/aaai.v40i36.40313","published":"2026-03-14","authors":["Yiming Du","Bingbing Wang","Yang He","Bin Liang","Baojun Wang","Zhongyang Li","Lin Gui","Jeff Z. Pan","Ruifeng Xu","Kam-Fai Wong"],"abstract":"Modern task-oriented dialogue (TOD) systems increasingly rely on large language model (LLM) agents, leveraging Retrieval-Augmented Generation (RAG) and long-context capabilities for long-term memory utilization. However, these methods prioritise semantic similarity over task intent, degrading multi-session coherence. We propose MemGuide, a two-stage intent-driven memory selection framework: (1) Intent‑Aligned Retrieval retrieves goal-consistent QA‑formatted memory units; (2) Missing‑Slot Guided Filtering reranks units by slot-completion gain via a chain‑of‑thought reasoner and fine‑tuned LLaMA‑8B filter. We also introduce the MS-TOD, the first multi-session TOD benchmark with 132 diverse personas, 956 task goals, and annotated intent-aligned memory targets. 
Evaluations on MS-TOD show that MemGuide boosts task success rate by 11% (88%→99%) and reduces dialogue length by 2.84 turns, and ma...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i36.40313","openalex_id":"https://openalex.org/W7138314860","cited_by_count":0,"quality_score":57,"matched_keywords":["LLM","language model","memory","long-term","retrieval"],"author_affiliations":["Chinese University of Hong Kong","Harbin Institute of Technology","Hong Kong University of Science and Technology","Huawei Technologies (China)","King's College London","Microsoft (Germany)","University of Edinburgh"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.791100025177002},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7031000256538391},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6705999970436096},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6466000080108643},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.641700029373169},{"id":"https://openalex.org/C9616225","display_name":"Semantic reasoner","score":0.6118999719619751},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5351999998092651},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.4878000020980835}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138303837","title":"VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement 
Learning","url":"https://doi.org/10.1609/aaai.v40i3.37152","published":"2026-03-14","authors":["Siran Chen","BoYu Chen","Yuxiao Luo","Chenyun Yu","Yi Ouyang","Lei Cheng","Chengxiang Zhuo","Zang Li","Yali Wang"],"abstract":"Large language model (LLM) agents have emerged as a promising solution for enhancing recommendation systems via user simulation. However, existing studies predominantly resort to prompt-based simulation using frozen LLMs, which frequently results in suboptimal item modeling and user preference learning, thereby ultimately constraining recommendation performance. To address these challenges, we introduce VRAgent-R1, a novel agent-based paradigm that incorporates human-like intelligence in user simulation. Specifically, VRAgent-R1 comprises two distinct agents: the Item Perception (IP) Agent and the User Simulation (US) Agent, designed for interactive user-item modeling. Firstly, the IP Agent emulates human-like progressive thinking based on MLLMs, effectively capturing hidden recommendation semantics in videos. 
With a more comprehensive multimodal content understanding provided by the IP....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i3.37152","openalex_id":"https://openalex.org/W7138303837","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","preference","agent"],"author_affiliations":["Shenzhen Institutes of Advanced Technology","Sun Yat-sen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8130999803543091},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6330999732017517},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.5964999794960022},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.516700029373169},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.4844000041484833},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47909998893737793},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.46709999442100525},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43810001015663147}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139056871","title":"SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention","url":"https://doi.org/10.1609/aaai.v40i41.40747","published":"2026-03-14","authors":["Bohan Yu","Wei Huang","Kang Liu"],"abstract":"This paper proposes SR-KI, a novel approach for integrating real-time and large-scale structured knowledge bases 
(KBs) into large language models (LLMs). SR-KI begins by encoding KBs into key-value pairs using a pretrained encoder, and injects them into LLMs' KV cache. Building on this representation, we employ a two-stage training paradigm: first locating a dedicated retrieval layer within the LLM, and then applying an attention-based loss at this layer to explicitly supervise attention toward relevant KB entries. Unlike traditional retrieval-augmented generation methods that rely heavily on the performance of external retrievers and multi-stage pipelines, SR-KI supports end-to-end inference by performing retrieval entirely within the model’s latent space. This design enables efficient compression of injected knowledge and facilitates dynamic knowledge updates. Comprehensive experiments...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i41.40747","openalex_id":"https://openalex.org/W7139056871","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","retrieval","efficient","compression"],"author_affiliations":["Baidu (China)","Beijing Academy of Artificial Intelligence","Institute of Automation","Shandong Institute of Automation"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7813000082969666},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6852999925613403},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6675999760627747},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.6564000248908997},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6345000267028809},{"id":"https://openalex.org/C44291984","display_name":"Question 
answering","score":0.5110999941825867},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.5073000192642212},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4797999858856201}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138861429","title":"ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning","url":"https://doi.org/10.1609/aaai.v40i39.40644","published":"2026-03-14","authors":["Juyuan Wang","Rongchen Zhao","Wei Wei","Yufeng Wang","M. K. Yu","Jie Zhou","Jin Xu","Liyan Xu"],"abstract":"Narrative comprehension on long stories and novels has been a challenging domain attributed to their intricate plotlines and entangled, often evolving relations among characters and entities. Given the LLM's diminished reasoning over extended context and its high computational cost, retrieval-based approaches remain a pivotal role in practice. However, traditional RAG methods could fall short due to their stateless, single-step retrieval process, which often overlooks the dynamic nature of capturing interconnected relations within long-range context. In this work, we propose ComoRAG, holding the principle that narrative reasoning is not a one-shot process, but a dynamic, evolving interplay between new evidence acquisition and past knowledge consolidation, analogous to human cognition on reasoning with memory-related signals in the brain. 
Specifically, when encountering a reasoning impass...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i39.40644","openalex_id":"https://openalex.org/W7138861429","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","memory","retrieval"],"author_affiliations":["South China University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7450000047683716},{"id":"https://openalex.org/C199033989","display_name":"Narrative","score":0.6643999814987183},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.6158000230789185},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6148999929428101},{"id":"https://openalex.org/C22927095","display_name":"Stateful firewall","score":0.5719000101089478},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.48539999127388},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47350001335144043},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.40459999442100525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7137864603","title":"VirtualEnv: A Platform for Embodied AI Research","url":"https://doi.org/10.1609/aaai.v40i22.38923","published":"2026-03-14","authors":["Kabir Swain","Sijie Han","Ayush Raina","Jin Zhang","Shuang Li","Michael Stopa","Antonio Torralba"],"abstract":"As large language models (LLMs) continue to improve in reasoning and decision-making, there is a growing need for realistic and interactive environments where their abilities can be rigorously evaluated. 
We present VirtualEnv, a next-generation simulation platform built on Unreal Engine 5 that enables fine-grained benchmarking of LLMs in embodied and interactive scenarios. VirtualEnv supports rich agent–environment interactions, including object manipulation, navigation, and adaptive multi-agent collaboration, as well as game-inspired mechanics like escape rooms and procedurally generated environments. We provide a user-friendly API built on top of Unreal Engine, allowing researchers to deploy and control LLM-driven agents using natural language instructions. We integrate large-scale LLMs and vision-language models (VLMs), such as GPT-based models, to generate novel environments and stru...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i22.38923","openalex_id":"https://openalex.org/W7137864603","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Google (United States)","Massachusetts Institute of Technology","University of Toronto"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.7943000197410583},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6983000040054321},{"id":"https://openalex.org/C64543145","display_name":"Intersection (aeronautics)","score":0.6934000253677368},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6710000038146973},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.6302000284194946},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6079000234603882},{"id":"https://openalex.org/C185798385","display_name":"Benchmark 
(surveying)","score":0.43299999833106995},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.42579999566078186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138058965","title":"TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing","url":"https://doi.org/10.1609/aaai.v40i27.39417","published":"2026-03-14","authors":["Jongha Kim","Minseong Bae","Sanghyeok Lee","Jinsung Yoon","Hyunwoo J. Kim"],"abstract":"Table images present unique challenges for effective and efficient understanding due to the need for question-specific focus and the presence of redundant background regions. Existing Multimodal Large Language Model (MLLM) approaches often overlook these characteristics, resulting in uninformative and redundant visual representations. To address these issues, we aim to generate visual features that are both informative and compact for improved table understanding. We first propose progressive question conditioning, which injects the question into Vision Transformer layers with gradually increasing frequency, considering each layer’s capacity to handle additional information, to generate question-aware visual features. To reduce redundancy, we introduce a pruning strategy that discards background tokens, thereby improving efficiency. 
To mitigate information loss from pruning, we further p...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i27.39417","openalex_id":"https://openalex.org/W7138058965","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","memory","efficient"],"author_affiliations":["Google (United States)","Kootenay Association for Science & Technology","Korea Advanced Institute of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7580999732017517},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.682699978351593},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5514000058174133},{"id":"https://openalex.org/C45235069","display_name":"Table (database)","score":0.5145000219345093},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4616999924182892},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.45969998836517334},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.4489000141620636},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.36970001459121704}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138851585","title":"S³-MSD: Large Vision-Language Model for Explainable and Generalizable Multi-modal Sarcasm Detection","url":"https://doi.org/10.1609/aaai.v40i41.40834","published":"2026-03-14","authors":["Zhihong Zhu","Fan Zhang","Yunyan Zhang","Jinghan Sun","Guimin Hu","Hao Wu","Yuyan Chen","Bowen Xing","Xian Wu"],"abstract":"Multimodal sarcasm detection (MSD) aims to identify sarcasm polarity from 
diverse modalities (i.e., image–text pairs), a task that has received increasing attention. While significant progress has been made, existing approaches still face two major issues: lack of explainability and weak generalizability. In this paper, we introduce a new large vision–language model (LVLM) dubbed S³-MSD for explainable and generalizable MSD through three key components. For explainability, we develop (1) a self-training paradigm that automatically bootstraps answers with explanations, and (2) a self-calibrating mechanism that rectifies flawed explanations. For generalizability, we design (3) a self-focusing module that amplifies visual semantic entities through preference optimization, thereby mitigating textual over-reliance. Experimental results on both in-distribution and out-of-distribution (OOD) ben...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i41.40834","openalex_id":"https://openalex.org/W7138851585","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","preference","persuasive"],"author_affiliations":["Chinese University of Hong Kong","Cornell University","Tencent (China)","University of Copenhagen","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C2776207355","display_name":"Sarcasm","score":0.9613999724388123},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.704800009727478},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6158999800682068},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5769000053405762},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5196999907493591},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.5110999941825867},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5063999891281128},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.48080000281333923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138935137","title":"Multi-Agent VLMs Guided Self-Training with PNU Loss for Low-Resource Offensive Content Detection","url":"https://doi.org/10.1609/aaai.v40i46.41288","published":"2026-03-14","authors":["Han Wang","Deyi Ji","Junyu Lu","Lanyun Zhu","Hailong Zhang","Haiyang Wu","Liqun Liu","Peng Shu","Roy Ka-Wei Lee"],"abstract":"Accurate detection of offensive content on social media demands high-quality labeled data; however, such data is often scarce due to the low prevalence of offensive instances and the high cost of manual annotation. To address this low-resource challenge, we propose a self-training framework that leverages abundant unlabeled data through collaborative pseudo-labeling. Starting with a lightweight classifier trained on limited labeled data, our method iteratively assigns pseudo-labels to unlabeled instances with the support of Multi-Agent Vision-Language Models (MA-VLMs). Unlabeled data on which the classifier and MA-VLMs agree are designated as the Agreed-Unknown set, while conflicting samples form the Disagreed-Unknown set. To enhance label reliability, MA-VLMs simulate dual perspectives, moderator and user, capturing both regulatory and subjective viewpoints. 
The classifier is optimized....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i46.41288","openalex_id":"https://openalex.org/W7138935137","cited_by_count":0,"quality_score":49,"matched_keywords":["media","agent","multi-agent"],"author_affiliations":["Dalian University of Technology","Nanyang Technological University","Singapore University of Technology and Design","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7890999913215637},{"id":"https://openalex.org/C176856949","display_name":"Offensive","score":0.7221999764442444},{"id":"https://openalex.org/C95623464","display_name":"Classifier (UML)","score":0.6960999965667725},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.6557999849319458},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6014000177383423},{"id":"https://openalex.org/C2776145971","display_name":"Labeled data","score":0.5440999865531921},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5120999813079834},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.47920000553131104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138385128","title":"EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens","url":"https://doi.org/10.1609/aaai.v40i25.39254","published":"2026-03-14","authors":["Ze Feng","sen Yang","Boqiang Duan","Wankou Yang","Jingdong Wang"],"abstract":"Efficient Multimodal Large Language Models (MLLMs) compress vision tokens to reduce resource consumption, but the loss of visual information can degrade 
comprehension capabilities. Although some priors introduce Knowledge Distillation to enhance student models, they overlook the fundamental differences in fine-grained vision comprehension caused by unbalanced vision tokens between the efficient student and vanilla teacher. In this paper, we propose EM-KD, a novel paradigm that enhances the Efficient MLLMs with Knowledge Distillation. To overcome the challenge of unbalanced vision tokens, we first calculate the Manhattan distance between the vision logits of teacher and student, and then align them in the spatial dimension with the Hungarian matching algorithm. After alignment, EM-KD introduces two distillation strategies: 1) Vision-Language Affinity Distillation (VLAD) and 2) Vision Sema...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i25.39254","openalex_id":"https://openalex.org/W7138385128","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","efficient","distillation"],"author_affiliations":["Baidu (China)","Southeast University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.766700029373169},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.614300012588501},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.527999997138977},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5163000226020813},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.475600004196167},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.4408000111579895},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4172999858856201},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4156999886035919}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137961846","title":"When Top-ranked Recommendations Fail: Modeling Multi-Granular Negative Feedback for Explainable and Robust Video Recommendation","url":"https://doi.org/10.1609/aaai.v40i24.39114","published":"2026-03-14","authors":["Siran Chen","BoYu Chen","Chenyun Yu","Yi Ouyang","Lei Cheng","Chengxiang Zhuo","Zang Li","Yali Wang"],"abstract":"Existing video recommendation systems, relying mainly on ID-based embedding mapping and collaborative filtering, often fail to capture in-depth video content semantics. Moreover, most struggle to address biased user behaviors (e.g., accidental clicks, fast skips), leading to inaccurate interest modeling and frequent negative feedback in top recommendations with unclear causes. To tackle this issue, we collect real-world user video-watching sequences, annotate the reasons for users' dislikes, and construct a benchmark dataset for personalized explanations. 
We then introduce the Agentic Explainable Negative Feedback (ENF) framework, which integrates three core components: (1) the Profile Agent, extracting behavioral cues from users' historical data to derive psychological and personality profiles; (2) the Video Agent, performing comprehensive multimodal video analysis; and (3) the Reason A...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i24.39114","openalex_id":"https://openalex.org/W7137961846","cited_by_count":0,"quality_score":45,"matched_keywords":["personalized","agent"],"author_affiliations":["Shenzhen Institutes of Advanced Technology","Sun Yat-sen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8205000162124634},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.583899974822998},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5683000087738037},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5512999892234802},{"id":"https://openalex.org/C2984870255","display_name":"User engagement","score":0.5351999998092651},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5210000276565552},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.49230000376701355},{"id":"https://openalex.org/C193081819","display_name":"Video feedback","score":0.4860000014305115}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138457928","title":"VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality 
Evaluation","url":"https://doi.org/10.1609/aaai.v40i15.38269","published":"2026-03-14","authors":["Shi-Xue Zhang","Hongfa Wang","Duojun Huang","Xin Li","Xiaobin Zhu","Xu-Cheng Yin"],"abstract":"Video captions play a crucial role in text-to-video generation tasks, as their quality directly influences the semantic coherence and visual fidelity of the generated videos. Although large vision-language models (VLMs) have demonstrated significant potential in caption generation, existing benchmarks inadequately address fine-grained evaluation, particularly in capturing spatial-temporal details critical for video generation. To address this gap, we introduce the Fine-grained Video Caption Evaluation Benchmark (VCapsBench), the first large-scale fine-grained benchmark comprising 5,677 (5K+) videos and 109,796 (100K+) question-answer pairs. These QA-pairs are systematically annotated across 21 fine-grained dimensions (e.g., camera movement, and shot type) that are empirically proven critical for text-to-video generation. 
We further introduce three metrics (Accuracy (AR), Inconsistency Ra...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i15.38269","openalex_id":"https://openalex.org/W7138457928","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Tencent (China)","Tsinghua–Berkeley Shenzhen Institute","University Town of Shenzhen","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8514000177383423},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.8465999960899353},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6771000027656555},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5246999859809875},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4893999993801117},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.4659999907016754},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.45829999446868896},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.39250001311302185}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138061288","title":"TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model","url":"https://doi.org/10.1609/aaai.v40i38.40451","published":"2026-03-14","authors":["Yixing Li","Ruobing Xie","Zhen Yang","Xingwu Sun","Shuaipeng Li","Weidong Han","Zhanhui Kang","Di Wang","Yu Cheng"],"abstract":"Transformers are the 
cornerstone of modern large language models, but their quadratic computational complexity limits efficiency in long-sequence processing. Recent advancements in Mamba, a state space model (SSM) with linear complexity, offer promising efficiency gains but suffer from unstable contextual learning and multitask generalization. Some works conduct layer-level hybrid structures that combine Transformer and Mamba layers, aiming to make full use of both advantages. This paper proposes TransMamba, a novel sequence-level hybrid framework that unifies Transformer and Mamba through shared parameter matrices (QKV and CBx), and thus could dynamically switch between attention and SSM mechanisms at different token lengths and layers. We design the Memory Converter to bridge Transformer and Mamba by converting attention outputs into SSM-compatible states, ensuring seamless information...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i38.40451","openalex_id":"https://openalex.org/W7138061288","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","memory"],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","University of Macau"],"concepts":[{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.7524999976158142},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6945000290870667},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5372999906539917},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4733000099658966},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38019999861717224},{"id":"https://openalex.org/C206729178","display_name":"Scheduling (production 
processes)","score":0.32850000262260437},{"id":"https://openalex.org/C113775141","display_name":"Computer engineering","score":0.3111000061035156},{"id":"https://openalex.org/C2778648169","display_name":"Compatibility (geochemistry)","score":0.30790001153945923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138363741","title":"TFRank: Think-Free Reasoning Enables Practical Pointwise LLM Ranking","url":"https://doi.org/10.1609/aaai.v40i25.39244","published":"2026-03-14","authors":["Yongqi Fan","Xiaoyang Chen","Dezhi Ye","Jie Liu","Haijin Liang","Jin Ma","Ben He","Yingfei Sun","Tong Ruan"],"abstract":"Reasoning-intensive ranking models built on Large Language Models (LLMs) have made notable progress. However, existing approaches often rely on large-scale LLMs and explicit Chain-of-Thought (CoT) reasoning, resulting in high computational cost and latency that limit real-world use. To address this, we propose TFRank, an efficient pointwise reasoning ranker based on small-scale LLMs. To improve ranking performance, TFRank effectively integrates CoT data, fine-grained score supervision, and multi-task training. Furthermore, it achieves an efficient \"Think-Free\" reasoning capability by employing a \"think-mode switch\" and pointwise format constraints. Specifically, this allows the model to leverage explicit reasoning during training while delivering precise relevance scores for complex queries at inference without generating any reasoning chains. 
Experiments show that TFRank achieves perfor...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i25.39244","openalex_id":"https://openalex.org/W7138363741","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["East China University of Science and Technology","Institute of Software","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C2777984123","display_name":"Pointwise","score":0.7953000068664551},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7134000062942505},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6735000014305115},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6735000014305115},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.654699981212616},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5167999863624573},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4713999927043915},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.4528999924659729}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138122487","title":"Seeing Is Believing: Grounding Long-Video Understanding in Spatio-Temporal Visual Evidence","url":"https://doi.org/10.1609/aaai.v40i13.38031","published":"2026-03-14","authors":["Zhaoyang Wei","Guoliang Wang","Guohua Gao","Yanchao Hao","Mingda Li","Wenchao Ding","Xi Chen","Shizhu He","Xuehui Yu"],"abstract":"Although Vision Language Models (VLMs) have excelled at image 
and video understanding, applying them to hour-long videos is held back by two interrelated challenges: exorbitant computational expense and a qualitative breakdown in long-term temporal reasoning. Thus, models tend to generate answers based on speculation instead of solid visual facts, causing both factually incorrect and plausible hallucinations. This problem is compounded by current benchmarks that, by only emphasizing final answers, lack an effective mechanism to check whether reasoning is substantiated by specific visual evidence. This makes it hard to differentiate between true understanding and pretend comprehension, inhibiting targeted model refinement. To address these interrelated challenges of model fragility and evaluation weakness, we adopt a twofold strategy. First, we present EV²-Bench, a large-scale benchmark t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i13.38031","openalex_id":"https://openalex.org/W7138122487","cited_by_count":0,"quality_score":45,"matched_keywords":["long-term","compression"],"author_affiliations":["Institute of Automation","Tencent (China)","The University of Sydney","University of Chinese Academy of Sciences","Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6891999840736389},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.6449000239372253},{"id":"https://openalex.org/C197115733","display_name":"Forcing (mathematics)","score":0.5462999939918518},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.5145000219345093},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49950000643730164},{"id":"https://openalex.org/C66024118","display_name":"Computational 
model","score":0.41830000281333923},{"id":"https://openalex.org/C80191262","display_name":"Fragility","score":0.4156999886035919},{"id":"https://openalex.org/C47941915","display_name":"Speculation","score":0.3862999975681305}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137819120","title":"Parameter-, Memory-, Time-Efficient Multi-Task Dense Vision Adaptation","url":"https://doi.org/10.1609/aaai.v40i14.38171","published":"2026-03-14","authors":["Haiming Yao","Wei Luo","Qiyu Chen","Jianxing Liao","Wei You"],"abstract":"While adapting pretrained vision models to downstream dense prediction tasks is widely used, current methods often overlook adaptation efficiency, especially in the context of multi-task learning (MTL). Although parameter-efficient fine-tuning (PEFT) methods can enhance parameter efficiency, broader aspects such as GPU memory and training time efficiency remain underexplored. In this paper, we propose a new paradigm that simultaneously achieves efficiency in Parameters, GPU Memory, and Training Time for Multi-Task Dense Vision Adaptation. Specifically, we propose a dual-branch framework, in which a frozen pretrained backbone serves as the generic main branch, and the proposed Bi-Directional Task Adaptation (BDTA) modules are integrated in parallel to form a task bypass branch that extracts adaptation features required by multiple specific tasks. 
This adaptation module is lightweight, eff...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i14.38171","openalex_id":"https://openalex.org/W7137819120","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","efficient"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8119000196456909},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.7746999859809875},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7129999995231628},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6550999879837036},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5490999817848206},{"id":"https://openalex.org/C155032097","display_name":"Backpropagation","score":0.507099986076355},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4706000089645386},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.39259999990463257}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138368391","title":"MoSE: Hierarchical Self-Distillation Enhances Early Layer Embeddings","url":"https://doi.org/10.1609/aaai.v40i37.40348","published":"2026-03-14","authors":["Andrea Gurioli","Federico Pennino","Joao Monteiro","Maurizio Gabbrielli"],"abstract":"Deploying language models often requires navigating accuracy vs. performance trade-offs to meet latency constraints while preserving utility. 
Traditional model distillation reduces size but incurs substantial costs through training separate models. We introduce ModularStarEncoder (MoSE), a 1-billion-parameter multi-exit encoder for code retrieval and classification that employs a novel Self-Distillation mechanism. This approach significantly enhances lower-layer representations, enabling flexible deployment of different model portions with favorable performance trade-offs. Our architecture improves text-to-code and code-to-code search by targeting specific encoder layers as exit heads, where higher layers guide earlier ones during training, thereby improving intermediate representations at minimal additional cost. We further enhance MoSE with a repository-level contextual loss that maxim...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i37.40348","openalex_id":"https://openalex.org/W7138368391","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","distillation"],"author_affiliations":["Apple (Israel)","Apple (United States)","University of Bologna"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7936000227928162},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.6951000094413757},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6579999923706055},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6287000179290771},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5349000096321106},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.5322999954223633},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.5041000247001648},{"id":"https://openalex.org/C105339364","display_name":"Software 
deployment","score":0.4950000047683716}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138843526","title":"LLM-Oriented Token-Adaptive Knowledge Distillation","url":"https://doi.org/10.1609/aaai.v40i40.40701","published":"2026-03-14","authors":["Xi Xie","Zhucun Xue","Jiafu Wu","Jian Li","Yabiao Wang","Xiaobin Hu","Yong Liu","Jiangning Zhang"],"abstract":"Knowledge Distillation (KD) is a key technique for compressing Large-scale Language Models (LLMs), but prevailing logit-based methods employ static strategies misaligned with the student’s dynamic learning process. By treating all tokens indiscriminately with a fixed temperature, these methods result in suboptimal knowledge transfer. To address this, we propose LLM-oriented token-Adaptive Knowledge Distillation (AdaKD), a framework that adapts the distillation process to each token’s real-time learning state. AdaKD consists of two synergistic modules driven by a unified token difficulty metric. First, the Loss-driven Adaptive Token Focusing (LATF) module dynamically concentrates distillation on valuable tokens by monitoring the student’s learning stability. 
Second, Inverse Difficulty Temperature Scaling (IDTS) introduces a counterintuitive token-level temperature: low for difficult token...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i40.40701","openalex_id":"https://openalex.org/W7138843526","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","distillation"],"author_affiliations":["National University of Singapore","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.8528000116348267},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7682999968528748},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6410999894142151},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.640500009059906},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.4560999870300293},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42809998989105225},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.3538999855518341},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.33180001378059387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137823514","title":"Inference Scaling Law for Retrieval Augmented Generation","url":"https://doi.org/10.1609/aaai.v40i19.38692","published":"2026-03-14","authors":["Shu Zhou","yuxuan ao","Yunyang Xuan","Xin Eric Wang","Tao Fan","Hao Henry Wang"],"abstract":"Retrieval-augmented generation (RAG) has recently emerged as a powerful framework for knowledge-intensive natural language 
processing tasks, which leverages the strengths of both pre-trained language models and external knowledge. While significant progress has been made, the scaling behavior of these approaches during inference remains poorly understood. Towards this end, this paper presents a comprehensive study of inference scaling law for RAG models, which investigates how inference performance scales with respect to key factors including retriever model scale, generator model scale, number of retrieved documents, and context window size. Through extensive experiments on benchmark datasets, we establish empirical scaling laws that reveal power-law and sigmoid-type relationships between these factors and performance. We further build a joint inference scaling law with theoretical just...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i19.38692","openalex_id":"https://openalex.org/W7137823514","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Baidu (China)","Nanjing University of Finance and Economics","Nanjing University of Information Science and Technology"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.8680999875068665},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.742900013923645},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6176999807357788},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5605999827384949},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.5260999798774719},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48010000586509705},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.4530999958515167},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.40290001034736633}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138874415","title":"Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement","url":"https://doi.org/10.1609/aaai.v40i40.40738","published":"2026-03-14","authors":["Jiashu Yao","Heyan Huang","Shuang Zeng","Chuwei Luo","Wangjie You","Jie Tang","Qingsong Liu","Yuhang Guo","Yu Kang"],"abstract":"Through reinforcement learning (RL) with outcome correctness rewards, large reasoning models (LRMs) with scaled inference computation have demonstrated substantial success on complex reasoning tasks. However, the one-sided reward, focused solely on final correctness, limits its ability to provide detailed supervision over internal reasoning process. This deficiency leads to suboptimal internal reasoning quality, manifesting as issues like over-thinking, under-thinking, redundant-thinking, and disordered-thinking. Inspired by the recent progress in LRM self-rewarding, we introduce self-rewriting framework, where a model rewrites its own reasoning texts, and subsequently learns from the rewritten reasoning to improve the internal thought process quality. 
For algorithm design, we propose a selective rewriting approach wherein only \"simple\" samples, defined by the model's consistent correctn...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i40.40738","openalex_id":"https://openalex.org/W7138874415","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Beijing Institute of Technology","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7598000168800354},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.7182999849319458},{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.6908000111579895},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5914999842643738},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5825999975204468},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.534600019454956},{"id":"https://openalex.org/C159032336","display_name":"Non-monotonic logic","score":0.5238999724388123},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5055999755859375}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138177537","title":"Importance-Aware Data Selection for Efficient LLM Instruction Tuning","url":"https://doi.org/10.1609/aaai.v40i37.40396","published":"2026-03-14","authors":["Tingyu Jiang","Shen Li","Yiyao Song","Lan Zhang","Hualei Zhu","Yuan Zhao","Xiaohang Xu","Kenjiro Taura","Hao Henry Wang"],"abstract":"Instruction tuning plays a critical role in enhancing the performance 
and efficiency of Large Language Models (LLMs). Its success depends not only on the quality of the instruction data but also on the inherent capabilities of the LLM itself. Some studies suggest that even a small amount of high-quality data can achieve instruction fine-tuning results that are on par with, or even exceed, those from using a full-scale dataset. However, rather than focusing solely on calculating data quality scores to evaluate instruction data, there is a growing need to select high-quality data that maximally enhances the performance of instruction tuning for a given LLM. In this paper, we propose the Model Instruction Weakness Value (MIWV) as a novel metric to quantify the importance of instruction data in enhancing model's capabilities. The MIWV metric is derived from the discrepancies in the model’s r...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i37.40396","openalex_id":"https://openalex.org/W7138177537","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","The University of Tokyo"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7979999780654907},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.6593999862670898},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5192000269889832},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.49779999256134033},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4964999854564667},{"id":"https://openalex.org/C2780898871","display_name":"Performance metric","score":0.4788999855518341},{"id":"https://openalex.org/C120936955","display_name":"Empirical 
research","score":0.429500013589859},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41839998960494995}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138180034","title":"Harnessing Vision-Language Models for Time Series Anomaly Detection","url":"https://doi.org/10.1609/aaai.v40i26.39319","published":"2026-03-14","authors":["Zelin He","Sarah Alnegheimish","Matthew Reimherr"],"abstract":"Time-series anomaly detection (TSAD) has played a vital role in a variety of fields, including healthcare, finance, and sensor-based condition monitoring. Prior methods, which mainly focus on training domain-specific models on numerical data, lack the visual–temporal reasoning capacity that human experts have to identify contextual anomalies. To fill this gap, we explore a solution based on vision language models (VLMs). Recent studies have shown the ability of VLMs for visual reasoning tasks, yet their direct application to time series has fallen short on both accuracy and efficiency. 
To harness the power of VLMs for TSAD, we propose a two-stage solution, with (1) ViT4TS, a vision-screening stage built on a relatively lightweight pre-trained vision encoder, which leverages 2-D time series representations to accurately localize candidate anomalies; (2) VLM4TS, a VLM-based stage that inte...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i26.39319","openalex_id":"https://openalex.org/W7138180034","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","efficient"],"author_affiliations":["Amazon (United States)","Massachusetts Institute of Technology","Pennsylvania State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.738099992275238},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.7149999737739563},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6075000166893005},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5821999907493591},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5803999900817871},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.5699999928474426},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.4779999852180481},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.4740000069141388}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138142549","title":"From Macro to Micro: Probing Dataset Diversity in Language Model 
Fine-Tuning","url":"https://doi.org/10.1609/aaai.v40i37.40426","published":"2026-03-14","authors":["Haoyu Li","Xuhong Li","Yiming Dong","Kun Liu"],"abstract":"Dataset diversity plays a pivotal role for the successful training of many machine learning models, particularly in the supervised fine-tuning (SFT) stage of large language model (LLM) development. Despite increasing recognition of its importance, systematic analyses of dataset diversity still remain underexplored. To address this gap, this work presents a systematic taxonomy of existing diversity-control strategies, which primarily focus on the instruction component, operating at either macroscopic (entire instruction semantics) or mesoscopic levels (instruction units), and furthermore introduces a novel analysis of microscopic diversity within the response component, specifically analyzing the statistical distribution of tokens in SFT training samples. In the experimental evaluation, we construct fixed-size datasets (e.g., 10,000 samples each) from a corpus of 117,000 open-source SFT s...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i37.40426","openalex_id":"https://openalex.org/W7138142549","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Baidu (China)","Beijing Institute of Technology","King University","Peking University"],"concepts":[{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.6769000291824341},{"id":"https://openalex.org/C2781316041","display_name":"Diversity (politics)","score":0.6743000149726868},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.663100004196167},{"id":"https://openalex.org/C166955791","display_name":"Macro","score":0.607699990272522},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5430999994277954},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.4616999924182892},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.43970000743865967},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.41940000653266907}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137966048","title":"F2RVLM: Boosting Fine-grained Fragment Retrieval for Multi-Modal Long-form Dialogue with Vision Language Model","url":"https://doi.org/10.1609/aaai.v40i17.38466","published":"2026-03-14","authors":["Hanbo Bi","Zhiqiang Yuan","Zexi Jia","Jiapei Zhang","Chongyang Li","Peixiang Luo","Ying Deng","Xiaoyue Duan","Jinchao Zhang"],"abstract":"Traditional dialogue retrieval aims to select the most appropriate utterance or image from recent dialogue history. However, they often fail to meet users’ actual needs for revisiting semantically coherent content scattered across long-form conversations. To fill this gap, we define the Fine-grained Fragment Retrieval (FFR) task, requiring models to locate query-relevant fragments, comprising both utterances and images, from multimodal long-form dialogues. As a foundation for FFR, we construct MLDR, the longest-turn multimodal dialogue retrieval dataset to date, averaging 25.45 turns per dialogue, with each naturally spanning three distinct topics. To evaluate generalization in real-world scenarios, we curate and annotate a WeChat-based test set comprising real-world multimodal dialogues with an average of 75.38 turns. 
Building on these resources, we explore existing generation-based Vis...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i17.38466","openalex_id":"https://openalex.org/W7137966048","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Aerospace Information Research Institute","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7954999804496765},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.6202999949455261},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6119999885559082},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.5170999765396118},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.51419997215271},{"id":"https://openalex.org/C2775852435","display_name":"Utterance","score":0.5134000182151794},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.4763000011444092},{"id":"https://openalex.org/C169903167","display_name":"Test set","score":0.42590001225471497}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138711743","title":"Efficient Thought Space Exploration Through Strategic Intervention","url":"https://doi.org/10.1609/aaai.v40i38.40459","published":"2026-03-14","authors":["Ziheng Li","Hengyi Cai","Xiaochi Wei","Yuchen Li","Shuaiqiang Wang","Zhi-Hong Deng","Dawei Yin"],"abstract":"While large language models (LLMs) demonstrate emerging reasoning capabilities, current inference-time expansion 
methods incur prohibitive computational costs through exhaustive sampling. Through analyzing decoding trajectories, we observe that most next-token predictions align well with the golden output, except for a few critical tokens that lead to deviations. Inspired by this phenomenon, we propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components: 1) a hinter (powerful LLM) that provides probabilistic guidance at critical decision points, and 2) a practitioner (efficient smaller model) that executes major reasoning steps. The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), a theoretically-grounded metric that dynamically identifies intervention points by quantifying the divergence be...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i38.40459","openalex_id":"https://openalex.org/W7138711743","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Baidu (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.8021000027656555},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.6352999806404114},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6154999732971191},{"id":"https://openalex.org/C193221554","display_name":"Commonsense reasoning","score":0.5652999877929688},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.44589999318122864},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4426000118255615},{"id":"https://openalex.org/C113174947","display_name":"Tree (set 
theory)","score":0.39649999141693115},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.39149999618530273}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139107659","title":"Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents","url":"https://doi.org/10.1609/aaai.v40i43.40981","published":"2026-03-14","authors":["Yuan Zhao","Hongzi Zhu","tingyu jiang","Shen Li","X. J. Xu","Hao Henry Wang"],"abstract":"Graphical User Interface (GUI) task automation constitutes a critical frontier in artificial intelligence research. While effective GUI agents synergistically integrate planning and grounding capabilities, current methodologies exhibit two fundamental limitations: (1) insufficient exploitation of cross-model synergies, and (2) over-reliance on synthetic data generation without sufficient utilization. To address these challenges, we propose Co-EPG, a self-iterative training framework for Co-Evolution of Planning and Grounding. Co-EPG establishes an iterative positive feedback loop: through this loop, the planning model explores superior strategies under grounding-based reward guidance via Group Relative Policy Optimization (GRPO), generating diverse data to optimize the grounding model. 
Concurrently, the optimized Grounding model provides more effective rewards for subsequent GRPO trainin...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i43.40981","openalex_id":"https://openalex.org/W7139107659","cited_by_count":0,"quality_score":45,"matched_keywords":["distillation","agent"],"author_affiliations":["Alibaba Group (China)","The University of Tokyo"],"concepts":[{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.660099983215332},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.59170001745224},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.5392000079154968},{"id":"https://openalex.org/C113843644","display_name":"Interface (matter)","score":0.45329999923706055},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43529999256134033},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.424699991941452},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.3707999885082245},{"id":"https://openalex.org/C13687954","display_name":"Autonomous agent","score":0.3546000123023987}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137962370","title":"Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions","url":"https://doi.org/10.1609/aaai.v40i36.40254","published":"2026-03-14","authors":["Guy Bar-Shalom","Fabrizio Frasca","Derek Lim","Yoav Gelberg","Yftah Ziser","Ran El-Yaniv","Gal Chechik","Haggai Maron"],"abstract":"The automated detection of hallucinations and training data contamination 
is pivotal to the safe deployment of Large Language Models (LLMs). These tasks are particularly challenging in settings where no access to model internals is available. Current approaches in this setup typically leverage only the probabilities of actual tokens in the text, relying on simple task-specific heuristics. Crucially, they overlook the information contained in the full sequence of next-token probability distributions. We propose to go beyond hand-crafted decision rules by learning directly from the complete observable output of LLMs — consisting not only of next-token probabilities, but also the full sequence of next-token distributions. We refer to this as the LLM Output Signature (LOS), and treat it as a reference data type for detecting hallucinations and data contamination. To that end, we introduce LO...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i36.40254","openalex_id":"https://openalex.org/W7137962370","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Bar-Ilan University","Center for Open Science","Nvidia (United Kingdom)","Nvidia (United States)","Oxford Centre for Mission Studies","Technion – Israel Institute of Technology","University of Groningen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8102999925613403},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6563000082969666},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5254999995231628},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4927999973297119},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4645000100135803},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.45669999718666077},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.4560000002384186},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4352000057697296}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138840773","title":"BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation","url":"https://doi.org/10.1609/aaai.v40i40.40664","published":"2026-03-14","authors":["Yuhao Wang","Ruiyang Ren","Yucheng Wang","Jing Liu","Xin Zhao","Hua Wu","Haifeng Wang"],"abstract":"With the rapid advancement of large language models (LLMs), retrieval-augmented generation (RAG) has emerged as a critical approach to supplement the inherent knowledge limitations of LLMs. However, due to the typically large volume of retrieved information, RAG tends to operate with long context lengths. From the perspective of entropy engineering, we identify unconstrained entropy growth and attention dilution due to long retrieval context as significant factors affecting RAG performance. In this paper, we propose the balanced entropy-engineered RAG (BEE-RAG) framework, which improves the adaptability of RAG systems to varying context lengths through the principle of entropy invariance. By leveraging balanced context entropy to reformulate attention dynamics, BEE-RAG separates attention sensitivity from context length, ensuring a stable entropy level. 
Building upon this, we introduce a...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i40.40664","openalex_id":"https://openalex.org/W7138840773","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Baidu (China)","Beijing Academy of Artificial Intelligence","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.6261000037193298},{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of time)","score":0.6180999875068665},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5608000159263611},{"id":"https://openalex.org/C167981619","display_name":"Cross entropy","score":0.46480000019073486},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.435699999332428},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.382999986410141},{"id":"https://openalex.org/C125252325","display_name":"Entropy rate","score":0.3531999886035919},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3481999933719635}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138362239","title":"ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation","url":"https://doi.org/10.1609/aaai.v40i19.38619","published":"2026-03-14","authors":["Shu Wang","Yixiang Fang","Yingli Zhou","Xilin Liu","Yuchi Ma"],"abstract":"Retrieval-Augmented Generation (RAG) has proven effective in integrating external knowledge into large language models (LLMs) for solving question-answer (QA) tasks. 
The state-of-the-art RAG approaches often use the graph data as the external data since they capture the rich semantic information and link relationships between entities. However, existing graph-based RAG approaches cannot accurately identify the relevant information from the graph and also consume large numbers of tokens in the online retrieval process. To address these issues, we introduce a novel graph-based RAG approach, called Attributed Community-based Hierarchical RAG (ArchRAG), by augmenting the question using attributed communities, and also introducing a novel LLM-based hierarchical clustering method. To retrieve the most relevant information from the graph for the question, we build a novel hierarchical index str...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i19.38619","openalex_id":"https://openalex.org/W7138362239","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Cloud Computing Center","Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Huawei Technologies (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7778000235557556},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5636000037193298},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.5195000171661377},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.48010000586509705},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4708999991416931},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.4408000111579895},{"id":"https://openalex.org/C22047676","display_name":"Clustering 
coefficient","score":0.41429999470710754},{"id":"https://openalex.org/C92835128","display_name":"Hierarchical clustering","score":0.39980000257492065}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135221716","title":"OBIMD: A Multi-modal Dataset for Contextual Interpretation of Oracle Bone Inscriptions","url":"https://doi.org/10.1038/s41597-026-06967-0","published":"2026-03-14","authors":["Bang Li","Jing Yang","Yujie Liang","Xiaobin Hu","Zengmao Ding","Xu Peng","Shengwei Han","Peichao Qin","Donghao Luo","Taisong Jin","Feng Gao","Yongge Liu"],"abstract":"Oracle bone inscriptions, the earliest known form of Chinese writing, hold immense historical and linguistic significance. However, existing digital datasets are typically limited to isolated characters and lack contextual and structural information essential for comprehensive analysis. We present the Oracle Bone Inscriptions Multi-modal Dataset (OBIMD), a large-scale, publicly available corpus to provide pixel-aligned rubbing and facsimile images, character-level annotations, and sentence-level transcriptions with corresponding reading sequences. OBIMD encompasses 10,077 oracle bone inscription images spanning five phases of the Shang Dynasty, featuring 93,652 annotated characters, 21,667 recorded missing-character positions, 21,941 sentence units, and 4,192 non-sentential elements. 
By integrating visual, structural, and linguistic modalities, OBIMD supports multi-modal learning and div...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41597-026-06967-0","openalex_id":"https://openalex.org/W7135221716","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Anyang Normal University","Tencent (China)","University of Cambridge","Xiamen University"],"concepts":[{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.7441999912261963},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6850000023841858},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6011000275611877},{"id":"https://openalex.org/C55166926","display_name":"Oracle","score":0.5546000003814697},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5206000208854675},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4487000107765198},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3262999951839447},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.23499999940395355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7138217008","title":"Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving","url":"https://doi.org/10.1609/aaai.v40i36.40291","published":"2026-03-14","authors":["Yao Cheng","Yibo Zhao","Jiapeng Zhu","Yao Liu","Xing Sun","Xiang Li"],"abstract":"Large Language Models (LLMs) have demonstrated significant potential across various domains. 
However, they often struggle with integrating external knowledge and performing complex reasoning, leading to hallucinations and unreliable outputs. Retrieval Augmented Generation (RAG) has emerged as a promising paradigm to mitigate these issues by incorporating external knowledge. Yet, conventional RAG approaches, especially those based on vector similarity, fail to effectively capture relational dependencies and support multi-step reasoning. In this work, we propose CogGRAG, a human cognition-inspired, graph-based RAG framework designed for Knowledge Graph Question Answering (KGQA). CogGRAG models the reasoning process as a tree-structured mind map that decomposes the original problem into interrelated subproblems and explicitly encodes their semantic relationships. This structure not only pro...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i36.40291","openalex_id":"https://openalex.org/W7138217008","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["East China Normal University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6847000122070312},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.555899977684021},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5299999713897705},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5148000121116638},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.45879998803138733},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.412200003862381},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.4090000092983246},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.40549999475479126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7139107105","title":"Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning","url":"https://doi.org/10.1609/aaai.v40i40.40676","published":"2026-03-14","authors":["Xiaolong Wei","Yuehu Dong","Xin Wang","Xi Zhang","Zhejun Zhao","Dongdong Shen","Long Xia","Dawei Yin"],"abstract":"Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps due to their reliance on incremental decision-making processes. To address these limitations, we propose a novel Planner-centric Plan-Execute paradigm that fundamentally resolves local optimization bottlenecks through architectural innovation. Central to our approach is a novel Planner model that performs global Directed Acyclic Graph (DAG) planning for complex queries, enabling optimized execution beyond conventional tool coordination. We also introduce ComplexTool-Plan, a large-scale benchmark dataset featuring complex queries that demand sophisticated multi-tool composition and coordination capabilities. 
Additionally, we develop a two-stage training methodology that integrates Supervised Fine-Tunin...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i40.40676","openalex_id":"https://openalex.org/W7139107105","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","Beihang University","Beijing Jiaotong University","Beijing University of Posts and Telecommunications"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7426000237464905},{"id":"https://openalex.org/C2776999362","display_name":"Planner","score":0.7258999943733215},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6960999965667725},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5217999815940857},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.4819999933242798},{"id":"https://openalex.org/C114073186","display_name":"Automated planning and scheduling","score":0.48069998621940613},{"id":"https://openalex.org/C74197172","display_name":"Directed acyclic graph","score":0.4779999852180481},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.44850000739097595}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7138876343","title":"Audio-Thinker: Guiding Large Audio Language Model When and How to Think via Reinforcement Learning","url":"https://doi.org/10.1609/aaai.v40i40.40689","published":"2026-03-14","authors":["Shu Wu","Chenxing Li","Wenfu Wang","Hao Zhang","Hualei Wang","M. K. 
Yu","Dong Yu"],"abstract":"Recent advancements in large language models, multimodal large language models, and large audio language models (LALMs) have significantly improved their reasoning capabilities through reinforcement learning utilizing rule-based rewards. However, the explicit reasoning process has not yet yielded substantial benefits for audio question answering, and effectively leveraging deep reasoning remains an open challenge, with LALMs still falling short of achieving human-level auditory-language reasoning. To address these limitations, we propose Audio-Thinker, a reinforcement learning framework designed to enhance the reasoning capabilities of LALMs through improved adaptability, consistency, and effectiveness. Our approach introduces an adaptive think accuracy reward, enabling the model to adjust its reasoning strategies based on task complexity. Furthermore, we incorporate an external reward m...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i40.40689","openalex_id":"https://openalex.org/W7138876343","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7522000074386597},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6801999807357788},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5989000201225281},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5561000108718872},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5067999958992004},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.474700003862381},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.46230000257492065},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4383000135421753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7138000221","title":"UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment","url":"https://doi.org/10.1609/aaai.v40i15.38279","published":"2026-03-14","authors":["Wei Zhang","Yeying Jin","Xin Li","Yan Zhang","Xiaofeng Cong","Cong Wang","Fengcai Qiao","Zhichao Lian"],"abstract":"Image-based virtual try-on (VTON) aims to synthesize photorealistic images of a person wearing specified garments. Despite significant progress, building a universal VTON framework that can flexibly handle diverse and complex tasks remains a major challenge. Recent methods explore multi-task VTON frameworks guided by textual instructions, yet they still face two key limitations: (1) semantic gap between text instructions and reference images, and (2) data scarcity in complex scenarios. To address these challenges, we propose UniFit, a universal VTON framework driven by a Multimodal Large Language Model (MLLM). Specifically, we introduce an MLLM-Guided Semantic Alignment Module (MGSA), which integrates multimodal inputs using an MLLM and a set of learnable queries. 
By imposing a semantic alignment loss, MGSA captures cross-modal semantic relationships and provides coherent and explicit se...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i15.38279","openalex_id":"https://openalex.org/W7138000221","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Nanjing University of Science and Technology","National University of Defense Technology","Tencent (China)","University of California, San Francisco","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8317000269889832},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5587000250816345},{"id":"https://openalex.org/C2775955345","display_name":"Semantic mapping","score":0.5410000085830688},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5117999911308289},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.510200023651123},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5034000277519226},{"id":"https://openalex.org/C86034646","display_name":"Semantic gap","score":0.44690001010894775},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.44589999318122864}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138188203","title":"TDSS: Task Dynamic-Synergistic Skill Adaptation for Boosting Efficient and Scalable Multi-Task Learning in Dense Visual Prediction","url":"https://doi.org/10.1609/aaai.v40i14.38172","published":"2026-03-14","authors":["Haiming 
Yao","Qiyu Chen","Wei Luo","Zheng Zhang","Jianxing Liao","Wei You"],"abstract":"The transfer of knowledge from large-scale pre-trained models to diverse downstream tasks has achieved remarkable success. Beyond the traditional full fine-tuning paradigm, Parameter-Efficient Fine-Tuning (PEFT) has emerged as a more efficient model adaptation approach. However, applying existing PEFT methods to adapt dense vision models, particularly in multi-task settings, remains inadequately explored due to their low efficiency, limited task scalability, and neglect of cross-task fine-tuning interactions. To address these challenges, we propose the Task Dynamic-Synergistic Skill Adaptation, termed TDSS, an efficient and scalable multi-task model adaptation framework for dense visual predictions. TDSS comprises two key components: Task-Dynamic Skill Adapters (TDSA) and Task-Synergistic Adaptation Interaction (TSAI). Specifically, TDSA are inserted in parallel into pre-trained vision m...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i14.38172","openalex_id":"https://openalex.org/W7138188203","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8004000186920166},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6589999794960022},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.6388999819755554},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.6089000105857849},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5680999755859375},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5311999917030334},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.5307999849319458},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.48339998722076416}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137897638","title":"SwiftVideo: A Unified Framework for Few-Step Video Generation Through Trajectory-Distribution Alignment","url":"https://doi.org/10.1609/aaai.v40i11.37881","published":"2026-03-14","authors":["Yanxiao Sun","Jiafu Wu","Yun Cao","Chengming Xu","Yabiao Wang","Weijian Cao","Donghao Luo","Chengjie Wang","Yanwei Fu"],"abstract":"Diffusion-based or flow-based models have achieved significant progress in video synthesis but require multiple iterative sampling steps, which incurs substantial computational overhead. While many distillation methods that are solely based on trajectory-preserving or distribution-matching have been developed to accelerate video generation models, these approaches often suffer from performance breakdown or increased artifacts in few-step settings. To address these limitations, we propose SwiftVideo, a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our approach introduces continuous-time consistency distillation to ensure precise preservation of ODE trajectories. 
Subsequently, We propose a dual-perspective alignment encompassing distribution alignment between synthetic and real data along with trajector...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i11.37881","openalex_id":"https://openalex.org/W7137897638","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7932000160217285},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.652999997138977},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6111999750137329},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5963000059127808},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.5087000131607056},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4375999867916107},{"id":"https://openalex.org/C2776449333","display_name":"View synthesis","score":0.39559999108314514},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3855000138282776}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139122609","title":"SMPRO: Self-Supervised Visual Preference Alignment via Differentiable Multi-Preference Multi-Group Ranking","url":"https://doi.org/10.1609/aaai.v40i44.41132","published":"2026-03-14","authors":["Sirnam Swetha","Rui Meng","Sabita Ram","Tal Neiman","Son Tran","Mubarak Shah"],"abstract":"Direct Preference Optimization (DPO) has emerged as a simple and effective approach for aligning 
models with human preferences. However, existing DPO-based methods suffer from 3 key drawbacks: they rely on only a single positive-negative preference pair per question, restricting the diversity and richness of feedback; they often emphasize minimizing negative preference scores while neglecting to strengthen the positive preferences; and they depend on either human-annotated preferences or expert model outputs - both expensive and difficult to scale. Moreover, the deterministic ranking assumptions of recent Group-based preference optimization methods break down in open-ended tasks such as Visual Question Answering (VQA), where multiple answers can be equally plausible but differ subtly in relevance or specificity. Given this subtle variance in preferences, we propose to perform ranking ove...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i44.41132","openalex_id":"https://openalex.org/W7139122609","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","University of Central Florida"],"concepts":[{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.7950999736785889},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.76419997215271},{"id":"https://openalex.org/C181204326","display_name":"Preference learning","score":0.7544999718666077},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.607699990272522},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5863999724388123},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.5641000270843506},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.5623999834060669},{"id":"https://openalex.org/C202615002","display_name":"Differentiable function","score":0.4885999858379364}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138224785","title":"Phased One-Step Adversarial Equilibrium for Video Diffusion Models","url":"https://doi.org/10.1609/aaai.v40i5.37318","published":"2026-03-14","authors":["Jiaxiang Cheng","Bing Ma","Xuhua Ren","Hongyi Jin","Kai Yu","Peng Zhang","Wenyue Li","Yuan Zhou","Tianxiang Zheng","Qinglin Lu"],"abstract":"Video diffusion generation suffers from critical sampling efficiency bottlenecks, particularly for large-scale models and long contexts. Existing video acceleration methods, adapted from image-based techniques, lack a single-step distillation ability for large-scale video models and task generalization for conditional downstream tasks. To bridge this gap, we propose the Video Phased Adversarial Equilibrium (V-PAE), a distillation framework that enables high-quality, single-step video generation from large-scale video models. Our approach employs a two-phase process. (i) Stability priming is a warm-up process to align the distributions of real and generated videos. It improves the stability of single-step adversarial distillation in the following process. 
(ii) Unified adversarial equilibrium is a flexible self-adversarial process that reuses generator parameters for the discriminator back...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i5.37318","openalex_id":"https://openalex.org/W7138224785","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.722599983215332},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.47769999504089355},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4645000100135803},{"id":"https://openalex.org/C2779803651","display_name":"Discriminator","score":0.421999990940094},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.41690000891685486},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.3993000090122223},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.3763999938964844},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.35510000586509705}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138637823","title":"Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation","url":"https://doi.org/10.1609/aaai.v40i10.37737","published":"2026-03-14","authors":["Fangyuan Mao","Aiming Hao","Jintao Chen","Dongxia Liu","Xiaokun Feng","Jiashu Zhu","Meiqi Wu","Chubin Chen","Jiahong Wu","Xiangxiang Chu"],"abstract":"Visual effects (VFX) are essential visual enhancements fundamental to modern 
cinematic production. Although video generation models offer cost-efficient solutions for VFX production, current methods are constrained by per-effect LoRA training, which limits generation to single effects. This fundamental limitation impedes applications that require spatially controllable composite effects, i.e., the concurrent generation of multiple effects at designated locations. However, integrating diverse effects into a unified framework faces major challenges: interference from effect variations and spatial uncontrollability during multi-VFX joint training. To tackle these challenges, we propose Omni-Effects, a first unified framework capable of generating prompt-guided effects and spatially controllable composite effects. The core of our framework comprises two key innovations: (1) LoRA-based Mixtur...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i10.37737","openalex_id":"https://openalex.org/W7138637823","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (Cayman Islands)","Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7537000179290771},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6435999870300293},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5569000244140625},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5547999739646912},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4569999873638153},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.33730000257492065},{"id":"https://openalex.org/C2164484","display_name":"Core 
(optical fiber)","score":0.3301999866962433},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.3269999921321869}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139124303","title":"Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports from Scratch with Agentic Framework","url":"https://doi.org/10.1609/aaai.v40i40.40734","published":"2026-03-14","authors":["Zhaorui Yang","Bo Pan","Han Wang","Yiyao Wang","Xi Liu","Luoxuan Weng","Yingchaojie Feng","Haozhe Feng","Minfeng Zhu","Bo Zhang","Weiqiu Chen"],"abstract":"Visualizations play a crucial part in effective communication of concepts and information. Recent advances in reasoning and retrieval augmented generation have enabled Large Language Models (LLMs) to perform deep research and generate comprehensive reports. Despite its progress, existing deep research frameworks primarily focus on generating text-only content, leaving the automated generation of interleaved texts and visualizations underexplored. This novel task poses key challenges in designing informative visualizations and effectively integrating them with text reports. To address these challenges, we propose Formal Description of Visualization (FDV), a structured textual representation of charts that enables LLMs to learn from and generate diverse, high-quality visualizations. 
Building on this representation, we introduce Multimodal DeepResearcher, an agentic framework that decompose...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i40.40734","openalex_id":"https://openalex.org/W7139124303","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["National University of Singapore","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8256000280380249},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.605400025844574},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5708000063896179},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5485000014305115},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5296000242233276},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5026000142097473},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47620001435279846},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.47519999742507935}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138002186","title":"MMhops-R1: Multimodal Multi-hop Reasoning","url":"https://doi.org/10.1609/aaai.v40i33.40068","published":"2026-03-14","authors":["Tao Zhang","Ziqi Zhang","Zongyang Ma","Yuxin Chen","Bing Li","Chunfeng Yuan","Guangting Wang","Fengyun Rao","Ying Shan","Weiming Hu"],"abstract":"The ability to perform multi-modal multi-hop reasoning by iteratively integrating information 
across various modalities and external knowledge is critical for addressing complex real-world challenges. However, existing Multi-modal Large Language Models (MLLMs) are predominantly limited to single-step reasoning, as existing benchmarks lack the complexity needed to evaluate and drive multi-hop abilities. To bridge this gap, we introduce MMhops, a novel, large-scale benchmark designed to systematically evaluate and foster multi-modal multi-hop reasoning. MMhops dataset comprises two challenging task formats, Bridging and Comparison, which necessitate that models dynamically construct complex reasoning chains by integrating external knowledge. To tackle the challenges posed by MMhops, we propose MMhops-R1, a novel multi-modal Retrieval-Augmented Generation (mRAG) framework for dynamic reason...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i33.40068","openalex_id":"https://openalex.org/W7138002186","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Institute of Automation","Shandong Institute of Automation","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7972000241279602},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6086999773979187},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.6035000085830688},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5519999861717224},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5149999856948853},{"id":"https://openalex.org/C114073186","display_name":"Automated planning and 
scheduling","score":0.5008000135421753},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4726000130176544},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.450300008058548}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138461808","title":"How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective","url":"https://doi.org/10.1609/aaai.v40i10.37781","published":"2026-03-14","authors":["Bo Peng","Pi Bu","Keyu Pan","Xinrun Xu","Yingxiu Zhao","Miao Chen","Yang Du","Lin Li","Jun Song","Tong Xu"],"abstract":"Recent advances in vision–language models (VLMs) have shed light on human-level embodied intelligence. However, existing benchmarks for VLM-driven embodied agents still rely on high-level commands or discretised action spaces—``non-native'' settings that diverge markedly from the real world. Moreover, current benchmarks focus exclusively on high-level tasks, while lacking joint evaluation and analysis on both low- and high-level. To bridge these gaps, we present \\textbf{NativeEmbodied}, a challenging benchmark for VLM-driven embodied agents that adopts a unified, native low-level action space. Built upon diverse simulated scenes, NativeEmbodied first designs three representative high-level tasks in complex scenarios to evaluate overall performance. 
For more detailed and comprehensive performance analysis, we further decouple the entangled skills behind complex tasks and construct four ty...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i10.37781","openalex_id":"https://openalex.org/W7138461808","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Alibaba Group (Cayman Islands)","Alibaba Group (China)","Alibaba Group (United States)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.9505000114440918},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.6449000239372253},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.6330000162124634},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.6046000123023987},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.558899998664856},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5388000011444092},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5023000240325928},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.4652999937534332}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138376140","title":"GigaMoE: Sparsity-Guided Mixture of Experts for Efficient Gigapixel Object Detection","url":"https://doi.org/10.1609/aaai.v40i21.38810","published":"2026-03-14","authors":["Xiang Li","Wenxi Li","Yuetong Wang","Chenyang Lyu","Haozhe Lin","Guiguang Ding","Yuchen 
Guo"],"abstract":"Object detection in High-Resolution Wide (HRW) shots, or gigapixel images, presents unique challenges due to extreme object sparsity and vast scale variations. State-of-the-art methods like SparseFormer have pioneered sparse processing by selectively focusing on important regions, yet they apply a uniform computational model to all selected regions, overlooking their intrinsic complexity differences. This leads to a suboptimal trade-off between performance and efficiency. In this paper, we introduce GigaMoE, a novel backbone architecture that pioneers adaptive computation for this domain by replacing the standard Feed-Forward Networks (FFNs) with a Mixture-of-Experts (MoE) module. Our architecture first employs a shared expert to provide a robust feature baseline for all selected regions. Upon this foundation, our core innovation---a novel Sparsity-Guided Routing mechanism---insightfully...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i21.38810","openalex_id":"https://openalex.org/W7138376140","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","East China Normal University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7814000248908997},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.6754999756813049},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5764999985694885},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5435000061988831},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5217999815940857},{"id":"https://openalex.org/C88796919","display_name":"Backbone 
network","score":0.5151000022888184},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5033000111579895},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.49720001220703125}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137921786","title":"GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning","url":"https://doi.org/10.1609/aaai.v40i39.40626","published":"2026-03-14","authors":["Chenglong Wang","Yongyu Mu","Hang Zhou","Yifu Huo","Ziming Zhu","Jiali Zeng","Murun Yang","Bei Li","Xiaoyang Hao","Chunliang Zhang","Fandong Meng","Jingbo Zhu"],"abstract":"Major progress in reward modeling over recent years has been driven by a paradigm shift from task-specific designs to generalist reward models. Despite this trend, developing effective reward models remains a fundamental challenge: the heavy reliance on large-scale labeled preference data. Pre-training on abundant unlabeled data offers a promising direction, but existing approaches fall short in instilling explicit reasoning capabilities into reward models. To bridge this gap, we propose a self-training approach that can leverage unlabeled data to scale up reward reasoning in reward models. Based on this approach, we develop GRAM-R² a generative reward model trained to produce not only preference labels but also accompanying reward rationales. 
GRAM-R² can serve as a foundation model for reward reasoning and can be applied to a wide range of tasks with minimal or no additional fine-tuning...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i39.40626","openalex_id":"https://openalex.org/W7137921786","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Northeastern University","Tencent (China)","Universidad del Noreste"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5921000242233276},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5830000042915344},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5561000108718872},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5483999848365784},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5404999852180481},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.4828000068664551},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.4399000108242035},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4341999888420105}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138214508","title":"FAM: Fine-Grained Alignment Matters in Multimodal Embedding Learning with Large Vision-Language Models","url":"https://doi.org/10.1609/aaai.v40i32.39918","published":"2026-03-14","authors":["Tianhang Xiang","Yirui Li","Lizhao Liu","Hongyan Zhi","Chuanshen Chen","Qing Du","Mingkui Tan"],"abstract":"Learning multimodal representation 
is a fundamental task that supports a wide range of applications such as visual-text retrieval. While pioneering approaches e.g., CLIP paves the way by learning separated encoders for different modalities, they struggle to model complex interactions between modalities, resulting in inferior vision and language representation. Recently, researchers have begun to leverage powerful Large Vision-Language Models (LVLMs) for unimodal or multimodal encoding, showing substantial improvement over separated encoder methods. However, we find that directly adapting LVLMs to embedding models suffers from insufficient visual representation and coarse multimodal alignment. To address these issues, we propose a simple yet effective Fine-grained Alignment Matters (FAM) method to achieve fine-grained vision-language embedding learning with LVLMs. First, to close the gap....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i32.39918","openalex_id":"https://openalex.org/W7138214508","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Peng Cheng Laboratory","South China University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.873199999332428},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7107999920845032},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6075999736785889},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5755000114440918},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5299000144004822},{"id":"https://openalex.org/C2776359362","display_name":"Representation 
(politics)","score":0.5235999822616577},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.5200999975204468},{"id":"https://openalex.org/C2780660688","display_name":"Multimodal learning","score":0.4934999942779541}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138080906","title":"Enhancing Stability and Fidelity for Zero-Shot TTS with a Multi-Level Evaluator","url":"https://doi.org/10.1609/aaai.v40i39.40636","published":"2026-03-14","authors":["Hualei Wang","Na Li","Chuke Wang","Shu Wu","Zhifeng Li","Dong Yu"],"abstract":"Recent advances in zero-shot text-to-speech (TTS), driven by language models, diffusion models and masked generation, have achieved impressive naturalness in speech synthesis. Nevertheless, stability and fidelity remain key challenges, manifesting as mispronunciations, audible noise, and quality degradation. To address these issues, we introduce Vox-Evaluator, a multi-level evaluator designed to guide the correction of erroneous speech segments and preference alignment for TTS systems. It is capable of identifying the temporal boundaries of erroneous segments and providing a holistic quality assessment of the generated speech. 
Specifically, to refine erroneous segments and enhance the robustness of the zero-shot TTS model, we propose to automatically identify acoustic errors with the evaluator, mask the erroneous segments, and finally regenerate speech conditioning on the correct portion...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i39.40636","openalex_id":"https://openalex.org/W7138080906","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Tencent (China)","Vision Technology (United States)"],"concepts":[{"id":"https://openalex.org/C134537474","display_name":"Naturalness","score":0.8547999858856201},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7889000177383423},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.7361000180244446},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6284000277519226},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.605400025844574},{"id":"https://openalex.org/C113364801","display_name":"High fidelity","score":0.5885000228881836},{"id":"https://openalex.org/C60048801","display_name":"Intelligibility (philosophy)","score":0.5217000246047974},{"id":"https://openalex.org/C2780844864","display_name":"Pronunciation","score":0.5175999999046326}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138741883","title":"Enhancing Spatial Reasoning Through Visual and Textual Thinking","url":"https://doi.org/10.1609/aaai.v40i28.39514","published":"2026-03-14","authors":["Xun Liang","Xin Guo","Zhongming Jin","Weihang Pan","Penghui Shang","Deng Cai","Binbin Lin","Jieping 
Ye"],"abstract":"The spatial reasoning task aims to reason about the spatial relationships in 2D and 3D space, which is a fundamental capability for Visual Question Answering (VQA) and robotics. Although vision language models (VLMs) have developed rapidly in recent years, they are still struggling with the spatial reasoning task. In this paper, we introduce a method that can enhance Spatial reasoning through Visual and Textual thinking Simultaneously (SpatialVTS). In the spatial visual thinking phase, our model is trained to generate location-related specific tokens of important targets automatically. Not only are the objects mentioned in the problem addressed, but also the potential objects related to the reasoning are considered. During the spatial textual thinking phase, our model conducts long-term thinking based on visual cues and dialogues and gradually inferences the answers to spatial reasoning....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i28.39514","openalex_id":"https://openalex.org/W7138741883","cited_by_count":0,"quality_score":41,"matched_keywords":["long-term"],"author_affiliations":["Alibaba Group (China)","Cloud Computing Center","State Key Laboratory of Chemical Engineering","Xihu Institute of Electronic Research","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C155911833","display_name":"Spatial intelligence","score":0.8090000152587891},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7017999887466431},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.6345000267028809},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5870000123977661},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.5778999924659729},{"id":"https://openalex.org/C27511587","display_name":"Spatial relation","score":0.5593000054359436},{"id":"https://openalex.org/C183521366","display_name":"Psychology of reasoning","score":0.5393000245094299},{"id":"https://openalex.org/C552651612","display_name":"Visual thinking","score":0.5218999981880188}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137970978","title":"Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond","url":"https://doi.org/10.1609/aaai.v40i25.39211","published":"2026-03-14","authors":["Jiaxin Deng","Qingcheng Zhu","Junbiao Pang","Linlin Yang","Zhihui Fu","Baochang Zhang"],"abstract":"Little research explores the correlation between the expressive ability and generalization ability of the low-rank adaptation (LoRA). Sharpness-Aware Minimization (SAM) improves model generalization for both Convolutional Neural Networks (CNNs) and Transformers by encouraging convergence to locally flat minima. However, the connection between sharpness and generalization has not been fully explored for LoRA due to the lack of tools to either empirically seek flat minima or develop theoretical methods. In this work, we propose Flat Minima LoRA (FMLoRA) and its efficient version i.e., EFMLoRA, to seek flat minima for LoRA. Concretely, we theoretically demonstrate that perturbations in the full parameter space can be transferred to the low-rank subspace. This approach eliminates the potential interference introduced by perturbations across multiple matrices in the low-rank subspace. 
Our ext...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i25.39211","openalex_id":"https://openalex.org/W7137970978","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Beihang University","Beijing University of Technology","Communication University of China","Huawei Technologies (China)","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C186633575","display_name":"Maxima and minima","score":0.9294999837875366},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.7297000288963318},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6517999768257141},{"id":"https://openalex.org/C147764199","display_name":"Minification","score":0.503000020980835},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4921000003814697},{"id":"https://openalex.org/C2777303404","display_name":"Convergence (economics)","score":0.4595000147819519},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.44350001215934753},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38440001010894775}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138007162","title":"Difficulty Is Not Enough: Curriculum Learning for LLMs Fine-tuning Must Consider Utility","url":"https://doi.org/10.1609/aaai.v40i37.40400","published":"2026-03-14","authors":["Zishang Jiang","Jinyi Han","Tingyun Li","Xinyi Wang","Sihang Jiang","Xiaojun Meng","Jiansheng Wei","Jiaqing Liang","Yanghua Xiao"],"abstract":"Fine-tuning plays an essential role in improving the performance of large language models (LLMs) on specific 
tasks. A central challenge lies in designing data-efficient strategy to achieve better fine-tuning performance. Curriculum learning, which organizes data from easy to hard, has become a widely adopted technique in LLMs training. However, existing methods for curriculum learning focus only on the difficulty of samples, while neglecting their contribution to improving model performance, making them vulnerable when applied to fine-tuning LLMs. To address this, we propose Difficulty-Utility Curriculum Learning (DUCL), a curriculum learning framework that jointly considers difficulty and utility. DUCL introduces a novel scoring method, Difficulty-Utility Evaluation (DUE), and a soft scheduling strategy called Window Ordering, which together promote efficient and effective fine-tuning.....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i37.40400","openalex_id":"https://openalex.org/W7138007162","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["East China Normal University","Fudan University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C47177190","display_name":"Curriculum","score":0.755299985408783},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5910999774932861},{"id":"https://openalex.org/C206729178","display_name":"Scheduling (production processes)","score":0.4194999933242798},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.4018000066280365},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.37950000166893005},{"id":"https://openalex.org/C145420912","display_name":"Mathematics 
education","score":0.3783000111579895},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.3698999881744385},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3384999930858612}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138181719","title":"Cross-Scale Collaboration between LLMs and Lightweight Sequential Recommenders with Domain-Specific Latent Reasoning","url":"https://doi.org/10.1609/aaai.v40i19.38680","published":"2026-03-14","authors":["Yipeng Zhang","Xin Eric Wang","Hong Chen","Junwei Pan","Qian Li","Jun Zhang","Jie Jiang","Hong Mei","Wenwu Zhu"],"abstract":"Sequential recommendation aims to predict the next item based on historical interactions. To further enhance the reasoning capability in sequential recommendation, LLMs are employed to predict the next item or generate semantic IDs for item representation, given LLMs' extensive domain knowledge and reasoning ability. However, existing LLM-based methods suffer from two limitations. (i) The scarcity of recommendation data with reasoning paths makes it challenging to design suitable chain-of-thought prompting templates, and the full potential of LLMs' reasoning abilities remains underutilized. (ii) Upon obtaining semantic IDs, the LLMs and their representations are excluded from the subsequent recommendation model training, preventing downstream models from fully utilizing the rich semantic information encoded within these IDs. 
To address these issues, we propose a novel CoderRec framework,...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i19.38680","openalex_id":"https://openalex.org/W7138181719","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Peking University","Software (Spain)","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7652000188827515},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.691100001335144},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6299999952316284},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5796999931335449},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5047000050544739},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.414900004863739},{"id":"https://openalex.org/C109747225","display_name":"Scarcity","score":0.37770000100135803},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.3763999938964844}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138854239","title":"Beyond Step Pruning: Information Theory Based Step-level Optimization for Self-Refining Large Language Models","url":"https://doi.org/10.1609/aaai.v40i41.40798","published":"2026-03-14","authors":["Jinman Zhao","Erxue Min","Hui Wu","Ziheng Li","Zexu Sun","Hengyi Cai","Shuaiqiang Wang","Xu Chen","Gerald Penn"],"abstract":"Large language models (LLMs) have shown impressive capabilities in natural 
language tasks, yet they continue to struggle with multi-step mathematical reasoning, where correctness depends on a precise chain of intermediate steps. Preference optimization methods such as Direct Preference Optimization (DPO) have improved answer-level alignment, but they often overlook the reasoning process itself, providing little supervision over intermediate steps that are critical for complex problem-solving. Existing fine-grained approaches typically rely on strong annotators or reward models to assess the quality of individual steps. However, reward models are vulnerable to reward hacking. To address this, we propose ISLA, a reward-model-free framework that constructs step-level preference data directly from SFT gold traces. ISLA also introduces a self-improving pruning mechanism that identifies inform...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i41.40798","openalex_id":"https://openalex.org/W7138854239","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Aerospace Information Research Institute","Baidu (China)","Chinese Academy of Sciences","Peking University","Renmin University of China","University of Toronto"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7750999927520752},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.652899980545044},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6413000226020813},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.6043999791145325},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5946999788284302},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.5515000224113464},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5436000227928162},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5371999740600586}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138997087","title":"Automatic Funny Scene Extraction from Long-form Cinematic Videos","url":"https://doi.org/10.1609/aaai.v40i47.41480","published":"2026-03-14","authors":["Sibendu Paul","Haotian Jiang","Caren Chen"],"abstract":"Automatically extracting engaging and high-quality humorous scenes from cinematic titles is pivotal for creating captivating video previews and snackable content, boosting user engagement on streaming platforms. Long-form cinematic titles, with their extended duration and complex narratives, challenge scene localization, while humor’s reliance on diverse modalities and its nuanced style add further complexity. This paper introduces an end-to-end system for automatically identifying and ranking humorous scenes from long-form cinematic titles, featuring shot detection, multimodal scene localization, and humor tagging optimized for cinematic content. Key innovations include a novel scene segmentation approach combining visual and textual cues, improved shot representations via guided triplet mining, and a multimodal humor tagging framework leveraging both audio and text modalities. 
Our syst...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i47.41480","openalex_id":"https://openalex.org/W7138997087","cited_by_count":0,"quality_score":41,"matched_keywords":["media"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7742999792098999},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6290000081062317},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5846999883651733},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5548999905586243},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5051000118255615},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.45509999990463257},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4374000132083893},{"id":"https://openalex.org/C2778598663","display_name":"Video content analysis","score":0.4253999888896942}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135376562","title":"A survey on imitation learning for contact-rich tasks in robotics","url":"https://doi.org/10.1177/02783649261417694","published":"2026-03-14","authors":["Toshiaki Tsuji","Yasuhiro Kato","Gökhan Solak","Heng Zhang","Tadej Petrič","Francesco Nori","Arash Ajoudani"],"abstract":"This paper comprehensively surveys research trends in imitation learning (IL) for contact-rich robotic tasks. 
Contact-rich tasks, which require complex physical interactions with the environment, represent a central challenge in robotics due to their nonlinear dynamics and sensitivity to small positional deviations. The paper examines demonstration collection methodologies, including teaching methods and sensory modalities crucial for capturing subtle interaction dynamics. We then analyze IL approaches, highlighting their applications to contact-rich manipulation. Recent advances in multimodal learning and foundation models have significantly enhanced performance in complex contact tasks across industrial, household, and healthcare domains. Through systematic organization of current research and identification of challenges, this survey provides a foundation for future advancements in co...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1177/02783649261417694","openalex_id":"https://openalex.org/W7135376562","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","Intelligent Machines (Sweden)","Italian Institute of Technology","Jožef Stefan Institute","Purdue University West Lafayette","Saitama University","The University of Tokyo"],"concepts":[{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.7329000234603882},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7174000144004822},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5760999917984009},{"id":"https://openalex.org/C126388530","display_name":"Imitation","score":0.5706999897956848},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5552999973297119},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.4884999990463257},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.44519999623298645},{"id":"https://openalex.org/C11207580","display_name":"Developmental robotics","score":0.39419999718666077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7139004623","title":"STAR-1: Safer Alignment of Reasoning LLMs with 1K Data","url":"https://doi.org/10.1609/aaai.v40i44.41136","published":"2026-03-14","authors":["Zijun Wang","Haoqin Tu","Yuhan Wang","Juncheng Wu","Yanqing Liu","Jieru Mei","Brian R. Bartoldson","B Kailkhura","Cihang Xie"],"abstract":"This paper introduces STAR-1, a high-quality, just-1k-scale safety dataset specifically designed for large reasoning models (LRMs) like DeepSeek-R1. Built on three core principles --- diversity, deliberative reasoning, and rigorous filtering --- STAR-1 aims to address the critical needs for safety alignment in LRMs. Specifically, we begin by integrating existing open-source safety datasets from diverse sources. Then, we curate safety policies to generate policy-grounded deliberative reasoning samples. Lastly, we apply a GPT-4o-based safety scoring system to select training examples aligned with best practices. Experimental results show that fine-tuning LRMs with STAR-1 leads to an average 40% improvement in safety performance across four benchmarks, while only incurring a marginal decrease (e.g., an average of 1.1%) in reasoning ability measured across five reasoning tasks. 
Extensive abl...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i44.41136","openalex_id":"https://openalex.org/W7139004623","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)","Lawrence Livermore National Laboratory","University of California, Santa Cruz"],"concepts":[{"id":"https://openalex.org/C2776654903","display_name":"SAFER","score":0.7491000294685364},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6351000070571899},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.3901999890804291},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33730000257492065},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.33309999108314514},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.27619999647140503},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.273499995470047},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.2718000113964081}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7138463056","title":"Facial-R1: Aligning Reasoning and Recognition for Facial Emotion Analysis","url":"https://doi.org/10.1609/aaai.v40i32.39906","published":"2026-03-14","authors":["Jiulong Wu","Yi Shen","Lingyong Yan","Haixin Sun","Deguo Xia","Jizhou Huang","Min Cao"],"abstract":"Facial Emotion Analysis (FEA) extends traditional facial emotion recognition by incorporating explainable, fine-grained reasoning. 
The task integrates three subtasks—emotion recognition, facial Action Unit (AU) recognition, and AU-based emotion reasoning—to jointly model affective states. While recent approaches leverage Vision-Language Models (VLMs) and achieve promising results, they face two critical limitations: (1) hallucinated reasoning, where VLMs generate plausible but inaccurate explanations due to insufficient emotion-specific knowledge; and (2) misalignment between emotion reasoning and recognition, caused by fragmented connections between observed facial features and final labels. We propose Facial-R1, a three-stage alignment framework that effectively addresses both challenges with minimal supervision. First, we employ instruction fine-tuning to establish basic emotional rea...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i32.39906","openalex_id":"https://openalex.org/W7138463056","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Baidu (China)","Soochow University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7484999895095825},{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.7247999906539917},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6175000071525574},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5393000245094299},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4828999936580658},{"id":"https://openalex.org/C2777438025","display_name":"Emotion recognition","score":0.47209998965263367},{"id":"https://openalex.org/C195704467","display_name":"Facial expression","score":0.45410001277923584},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement 
learning","score":0.4496000111103058}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7138287536","title":"Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception","url":"https://doi.org/10.1609/aaai.v40i42.40907","published":"2026-03-14","authors":["Yuankun Xie","Ruibo Fu","Xiaopeng Wang","Zhiyong Wang","Songjun Cao","Long Ma","Haonan Cheng","Long Ye"],"abstract":"The rapid advancement of audio generation technologies has escalated the risks of malicious deepfake audio across speech, sound, singing voice, and music, threatening multimedia security and trust. While existing countermeasures (CMs) perform well in single-type audio deepfake detection (ADD), their performance declines in cross-type scenarios. This paper is dedicated to studying the all-type ADD task. We are the first to comprehensively establish an all-type ADD benchmark to evaluate current CMs, incorporating cross-type deepfake detection across speech, sound, singing voice, and music. Then, we introduce the prompt tuning self-supervised learning (PT-SSL) training paradigm, which optimizes SSL front-end by learning specialized prompt tokens for ADD, requiring 458× fewer trainable parameters than fine-tuning (FT). 
Considering the auditory perception of different audio types, we propose....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i42.40907","openalex_id":"https://openalex.org/W7138287536","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Beijing Academy of Artificial Intelligence","Chinese Academy of Sciences","Communication University of China","Shandong Institute of Automation","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8224999904632568},{"id":"https://openalex.org/C47432892","display_name":"Wavelet","score":0.5759000182151794},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5375999808311462},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5049999952316284},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.47859999537467957},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4528000056743622},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.39879998564720154},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3659999966621399}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7138891962","title":"CMID: Towards Medical Visual Question Answering via Contrastive Mutual Information Decoding","url":"https://doi.org/10.1609/aaai.v40i41.40835","published":"2026-03-14","authors":["Zhihong Zhu","Yunyan Zhang","Fan Zhang","Bowen Xing","Xian Wu"],"abstract":"Medical Visual Question Answering 
(Med-VQA) aims to generate accurate answers for clinical questions grounded in medical images, which has attracted increasing research attention due to its potential to streamline diagnostics and reduce clinical burden. Recent advances in Large Vision-Language Models (LVLMs) have shown great promise for Med-VQA, but still suffer from two inference-time issues: (1) attention shift, where the LVLM over-relies on textual priors; and (2) attention dispersion, where it fails to focus on critical diagnostic regions. To tackle these issues, we propose Contrastive Mutual Information Decoding (CMID), a training-free inference-time intervention grounded in information theory for Med-VQA. Concretely, CMID first identifies the Principal Focus Area (PFA) from decoder attention maps, then constructs focus-preserving and focus-excluding views to derive dual contrastive...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i41.40835","openalex_id":"https://openalex.org/W7138891962","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.7343000173568726},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6823999881744385},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.6166999936103821},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.578000009059906},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.5716000199317932},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4950000047683716},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48339998722076416},{"id":"https://openalex.org/C152139883","display_name":"Mutual information","score":0.4645000100135803}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7138284635","title":"Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning","url":"https://doi.org/10.1609/aaai.v40i10.37733","published":"2026-03-14","authors":["Ziyu Ma","Chenhui Gou","Yiming Hu","Yong Wang","Bohan Zhuang","Jianfei Cai"],"abstract":"Large Multimodal Models (LMMs) have shown promising in-context learning (ICL) capabilities, but scaling to many-shot settings remains difficult due to limited context length and high inference cost. To address these challenges, task-vector-based methods have been explored by inserting compact representations of many-shot in-context demonstrations into model activations. However, existing task-vector-based methods either overlook the importance of where to insert task vectors or struggle to determine suitable values for each location. To this end, we propose a novel Sensitivity-aware Task Vector insertion framework (STV) to figure out where and what to insert. Our key insight is that activation deltas across query-context pairs exhibit consistent structural patterns, providing a reliable cue for insertion. 
Based on the identified sensitive-aware locations, we construct a pre-clustered act...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i10.37733","openalex_id":"https://openalex.org/W7138284635","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Monash University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7261000275611877},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6711999773979187},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5961999893188477},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5806999802589417},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5568000078201294},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5375999808311462},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.5192999839782715},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.4959999918937683}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138031529","title":"Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models","url":"https://doi.org/10.1609/aaai.v40i12.38005","published":"2026-03-14","authors":["Zehao Wang","Xinpeng Liu","Yudonglin Zhang","Xiaoqian Wu","Zhou Fang","Yifan Fang","Junfu Pu","Cewu Lu","Yong-Lu Li"],"abstract":"Multimodal Large Language Models (MLLMs) have garnered 
significant attention recently and demonstrate outstanding capabilities in various tasks such as OCR, VQA, captioning, etc. However, hallucination remains a persistent issue. While numerous methods have been proposed to mitigate hallucinations, achieving notable improvements, these methods primarily focus on mitigating hallucinations related to object/noun concepts. Verb concepts, which are crucial for understanding human actions, have been largely overlooked. In this paper, to the best of our knowledge, we are the first to investigate the verb hallucination phenomenon of MLLMs from various perspectives. Our findings reveal that most state-of-the-art MLLMs suffer from severe verb hallucination. To assess the effectiveness of existing mitigation methods for object concept hallucination in relation to verb hallucination, we evaluated t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i12.38005","openalex_id":"https://openalex.org/W7138031529","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Institute of Natural Science","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2776397901","display_name":"Verb","score":0.9248999953269958},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.529699981212616},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5224000215530396},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4578999876976013},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.44830000400543213},{"id":"https://openalex.org/C111360854","display_name":"Reflexive verb","score":0.44279998540878296},{"id":"https://openalex.org/C192209626","display_name":"Focus 
(optics)","score":0.42080000042915344},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4205999970436096}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138098690","title":"VPN: Visual Prompt Navigation","url":"https://doi.org/10.1609/aaai.v40i22.38888","published":"2026-03-14","authors":["Shuo Feng","Zihan Wang","Yuchen Li","Rui Kong","Hengyi Cai","Shuaiqiang Wang","Gim Hee Lee","Piji Li","Shuqiang Jiang"],"abstract":"While natural language is commonly used to guide embodied agents, the inherent ambiguity and verbosity of language often hinder the effectiveness of language-guided navigation in complex environments. To this end, we propose Visual Prompt Navigation (VPN), a novel paradigm that guides agents to navigate using only user-provided visual prompts within 2D top-view maps. This visual prompt primarily focuses on marking the visual navigation trajectory on a top-down view of a scene, offering intuitive and spatially grounded guidance without relying on language instructions. It is more friendly for non-expert users and reduces interpretive ambiguity. We build VPN tasks in both discrete and continuous navigation settings, constructing two new datasets, R2R-VP and R2R-CE-VP, by extending existing R2R and R2R-CE episodes with corresponding visual prompts. 
Furthermore, we introduce VPNet, a dedicat...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i22.38888","openalex_id":"https://openalex.org/W7138098690","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Nanjing University of Aeronautics and Astronautics","National University of Singapore","University of Chinese Academy of Sciences","Artificial Intelligence in Medicine (Canada)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.7071999907493591},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6919999718666077},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5787000060081482},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5224000215530396},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5090000033378601},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.46970000863075256},{"id":"https://openalex.org/C2780878386","display_name":"Visual language","score":0.45419999957084656},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.4512999951839447}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137994066","title":"TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability","url":"https://doi.org/10.1609/aaai.v40i29.39603","published":"2026-03-14","authors":["Fengji Ma","Hei Victor Cheng","Chenxing Li","Li Liu"],"abstract":"Achieving zero-shot adversarial robustness without sacrificing 
generalization remains challenging for foundation models such as CLIP, especially under large adversarial perturbations. Through empirical analyses, we identify three critical yet overlooked issues: (1) Logit margins exhibit a stable offset between small and large adversarial perturbations, suggesting that explicitly adjusting margins could improve robustness against unseen large perturbations. (2) A significant negative correlation exists between logit margin and inter-class semantic similarity, indicating that semantic structures are insufficiently leveraged by existing methods. (3) Existing methods for adjusting text embeddings disrupt the intrinsic semantic consistency established by pre-trained models, undermining generalization capability. Motivated by these findings, we propose a novel Text-Image Mutual Awareness (TIMA...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i29.39603","openalex_id":"https://openalex.org/W7137994066","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Aarhus University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.7688999772071838},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6873000264167786},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6218000054359436},{"id":"https://openalex.org/C774472","display_name":"Margin (machine learning)","score":0.5397999882698059},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5286999940872192},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.5015000104904175},{"id":"https://openalex.org/C140331021","display_name":"Logit","score":0.4805000126361847},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4729999899864197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137961651","title":"Sortblock: Similarity-Aware Feature Reuse for Diffusion Model","url":"https://doi.org/10.1609/aaai.v40i4.37276","published":"2026-03-14","authors":["Hanqi Chen","Xu Zhang","Xiao Guan","Lielin Jiang","Guanzhong Wang","Zeyu Chen","Yi Liu"],"abstract":"Diffusion Transformers (DiTs) have demonstrated remarkable generative capabilities, particularly benefiting from Transformer architectures that enhance visual and artistic fidelity. However, their inherently sequential denoising process results in high inference latency, limiting their deployment in real-time scenarios. Existing training-free acceleration approaches typically reuse intermediate features at fixed timesteps or layers, overlooking the evolving semantic focus across denoising stages and Transformer blocks.To address this, we propose Sortblock, a training-free inference acceleration framework that dynamically caches block-wise features based on their similarity across adjacent timesteps. By ranking the evolution of residuals, Sortblock adaptively determines a recomputation ratio, selectively skipping redundant computations while preserving generation quality. 
Furthermore, we....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i4.37276","openalex_id":"https://openalex.org/W7137961651","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Wuhan University","Zhejiang University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7409999966621399},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7146000266075134},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.6007999777793884},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.5857999920845032},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.5241000056266785},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.5115000009536743},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4675999879837036},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4212000072002411}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138987661","title":"ReCode: Updating Code API Knowledge with Reinforcement Learning","url":"https://doi.org/10.1609/aaai.v40i40.40683","published":"2026-03-14","authors":["Haoze Wu","Yunzhi Yao","Wenhao Yu","Ningyu Zhang"],"abstract":"Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. 
This critical limitation, stemming from reliance on outdated API knowledge from their training data, even with access to current documentation, impedes reliable code generation in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics human programmer adaptation to API changes. Specifically, we construct a dataset of approximately 2,000 data entries to train the LLMs to perform version migration based on updated information. Then, we introduce a modified string similarity metric for code evaluation as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs' code generation performance in dynamic API scenar...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i40.40683","openalex_id":"https://openalex.org/W7138987661","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","Zhejiang University","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8274000287055969},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7620000243186951},{"id":"https://openalex.org/C2778514511","display_name":"Programmer","score":0.6568999886512756},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6456999778747559},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5200999975204468},{"id":"https://openalex.org/C157486923","display_name":"String (physics)","score":0.5103999972343445},{"id":"https://openalex.org/C176217482","display_name":"Metric 
(unit)","score":0.5037000179290771},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.49309998750686646}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137942562","title":"RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing","url":"https://doi.org/10.1609/aaai.v40i38.40467","published":"2026-03-14","authors":["Jianxing Liao","Tian Zhang","Xiao Feng","Yusong Zhang","Haorui Wang","Bosi Wen","Ziying Wang","Runzhi Shi"],"abstract":"Large language models are extensively utilized in creative writing applications. Creative writing requires a balance between subjective writing quality (e.g., literariness and emotional expression) and objective constraint following (e.g., format requirements and word limits). Existing reinforcement learning methods struggle to balance these two aspects: single reward strategies fail to improve both abilities simultaneously, while fixed-weight mixed-reward methods lack the ability to adapt to different writing scenarios. To address this problem, we propose Reinforcement Learning with Mixed Rewards (RLMR), utilizing a dynamically mixed reward system from a writing reward model evaluating subjective writing quality and a constraint verification model assessing objective constraint following. 
The constraint following reward weight is adjusted dynamically according to the writing quality wit...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i38.40467","openalex_id":"https://openalex.org/W7137942562","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Peking University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6859999895095825},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6344000101089478},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.580299973487854},{"id":"https://openalex.org/C2776036281","display_name":"Constraint (computer-aided design)","score":0.5564000010490417},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4668999910354614},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.40630000829696655},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.37369999289512634},{"id":"https://openalex.org/C184898388","display_name":"Pairwise comparison","score":0.36500000953674316}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138251520","title":"PBR3DGen: A VLM-Guided Mesh Generation with High-Quality PBR Texture","url":"https://doi.org/10.1609/aaai.v40i13.38030","published":"2026-03-14","authors":["Xiaokang Wei","Bowen Zhang","Xianghui Yang","Yuxuan Wang","Chunchao Guo","Xi Zhao","Yan Luximon"],"abstract":"Generating high-quality physically based rendering (PBR) materials is important to achieve realistic rendering 
in the downstream tasks, yet it remains challenging due to the intertwined effects of materials and lighting. While existing methods have made breakthroughs by incorporating material decomposition in the 3D generation pipeline, they tend to bake highlights into albedo and ignore spatially varying properties of metallicity and roughness. In this work, we present PBR3DGen, a two-stage mesh generation method with high-quality PBR materials that integrates the novel multi-view PBR material estimation model and a 3D PBR mesh reconstruction model. Specifically, PBR3DGen leverages vision language models (VLM) to guide multi-view diffusion, precisely capturing the spatial distribution and inherent attributes of reflective-metalness material. Additionally, we incorporate view-dependent i...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i13.38030","openalex_id":"https://openalex.org/W7138251520","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Hong Kong Polytechnic University","Nanyang Technological University","Tencent (China)","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.7565000057220459},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.6301000118255615},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5954999923706055},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4893999993801117},{"id":"https://openalex.org/C181145010","display_name":"Mesh generation","score":0.44679999351501465},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.43959999084472656},{"id":"https://openalex.org/C50494287","display_name":"Texture 
synthesis","score":0.3675999939441681},{"id":"https://openalex.org/C189950617","display_name":"Property (philosophy)","score":0.34119999408721924}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137978937","title":"MagicPaint: Operate Anything for Image Inpainting with Diffusion Model","url":"https://doi.org/10.1609/aaai.v40i14.38151","published":"2026-03-14","authors":["Qinhong Yang","Dongdong Chen","Qi Chu","Tao Gong","Qiankun Liu","Zhentao Tan","Xulin Li","Huamin Feng","Nenghai Yu"],"abstract":"Recent diffusion-based models have significantly improved inpainting quality. However, existing methods struggle with multi-task inpainting due to conflicting optimization objectives, and current datasets are typically limited to task-specific scenarios, hindering joint training. To address these challenges, we propose MagicPaint, a unified diffusion-based inpainting model that supports object addition, removal, and unconditional inpainting across both text and image modalities. MagicPaint semantically decouples operation types and target content by learnable tokens in MMToken Module, effectively reconciling conflicting optimization objectives and enabling robust multi-task, multi-modal inpainting. Besides, a novel inpainting paradigm named MagicMask, encodes operating intent directly into the mask and applies a mask loss for spatially precise supervision. 
In addition, existing inpaintin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i14.38151","openalex_id":"https://openalex.org/W7137978937","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Electronic Science and Technology Institute","Microsoft (United States)","Microsoft Research (United Kingdom)","University of Science and Technology Beijing","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C11727466","display_name":"Inpainting","score":0.9818999767303467},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7114999890327454},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6956999897956848},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6523000001907349},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6351000070571899},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.579200029373169},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.46299999952316284},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.4392000138759613}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137876994","title":"MMIFEvol: Towards Evolutionary Multimodal Instruction Following","url":"https://doi.org/10.1609/aaai.v40i31.39824","published":"2026-03-14","authors":["Haoyu Wang","Sihang Jiang","Xiangru Zhu","Yuyan Chen","Xiaojun Meng","Jiansheng Wei","Yitong Wang","Yanghua Xiao"],"abstract":"Multimodal Instruction Following serves as a fundamental capability of 
multimodal language models, involving accurate comprehension and execution of user-provided instructions. However, existing multimodal instruction-following datasets and benchmarks face the shortcomings outlined below: (a) Lack of Difficulty Stratification, they collect diverse instruction categories but neglect the stratification of difficulty levels across these categories, which leads to overlap, bias, and low interpretability. (b) Lack of Fine-Grained Metrics, they conflate the model's ability to ``solve tasks\" and ``follow constraints\" into a single metric, which fails to accurately reflect its instruction-following capability. (c) Lack of Multi-Task Instructions, they overlook the fact that real-world user instructions often consist of multiple combined tasks. This paper proposes MMIFEvol, a framework for multim...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i31.39824","openalex_id":"https://openalex.org/W7137876994","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Cornell University","Fudan University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7943999767303467},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.6295999884605408},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5562000274658203},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.5475000143051147},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5364000201225281},{"id":"https://openalex.org/C130440534","display_name":"Conflation","score":0.49459999799728394},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.43290001153945923},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4327000081539154}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138205136","title":"LSAP-PV: High-Fidelity Palm Vein Image Synthesis via Layered Spectral Absorption Projection-Guided Diffusion Model","url":"https://doi.org/10.1609/aaai.v40i11.37837","published":"2026-03-14","authors":["Sheng Shang","Chenglong Zhao","Ruixin Zhang","Jianlong Jin","Jingyun Zhang","Jun Wang","Yang Zhao","Shouhong Ding","Wei Jia"],"abstract":"Palm vein recognition has emerged as a promising biometric technology, yet its development remains constrained by the scarcity of large-scale publicly available datasets. Several methods of palm vein image generation have been proposed to address this issue. These methods usually focus on the anatomical realism of palm vein patterns, but overlook the biophysical correlation between identities and vein patterns, particularly in simulating identity-specific vein contrast. To tackle this limitation, we propose a novel biophysics-driven synthesis method. Our method constructs a 3D palm vascular tree via established modeling method. Then, a projection model is proposed to map the 3D tree into 2D space to derive palm vein patterns. 
The projection model is based on skin spectral absorption and simulates the natural attenuation of light passing through the skin using a layer integration method.....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i11.37837","openalex_id":"https://openalex.org/W7138205136","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Hefei University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C184297639","display_name":"Biometrics","score":0.5525000095367432},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5371000170707703},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5103999972343445},{"id":"https://openalex.org/C113174947","display_name":"Tree (set theory)","score":0.47600001096725464},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4659000039100647},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.4390000104904175},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4117000102996826},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3549000024795532}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137866498","title":"InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization","url":"https://doi.org/10.1609/aaai.v40i38.40500","published":"2026-03-14","authors":["Yuhang Liu","Zeyu Liu","Shuanghe Zhu","Pengxiang Li","Congkai Xie","Jiasheng Wang","Xueyu Hu","Xiaotian Han","Jianbo Yuan","Xinyao Wang","Shengyu Zhang","Hongxia Yang"],"abstract":"The emergence 
of Multimodal Large Language Models (MLLMs) has propelled the development of autonomous agents that operate on Graphical User Interfaces (GUIs) using pure visual input. A fundamental challenge is robustly grounding natural language instructions. This requires a precise spatial alignment, which accurately locates the coordinates of each element, and, more critically, a correct semantic alignment, which matches the instructions to the functionally appropriate UI element. Although Reinforcement Learning with Verifiable Rewards (RLVR) has proven to be effective at improving spatial alignment for these MLLMs, we find that inefficient exploration bottlenecks semantic alignment, which prevents models from learning difficult semantic associations. To address this exploration problem, we present Adaptive Exploration Policy Optimization (AEPO), a new policy optimization framework. AE...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i38.40500","openalex_id":"https://openalex.org/W7137866498","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Heilongjiang University of Technology","Hong Kong Polytechnic University","University of Chicago","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7473999857902527},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6108999848365784},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5882999897003174},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5498999953269958},{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.4496999979019165},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.42890000343322754},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.42149999737739563},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4180999994277954}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138314334","title":"HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding","url":"https://doi.org/10.1609/aaai.v40i37.40419","published":"2026-03-14","authors":["Chen Li","Peiji Yang","Yicheng Zhong","Jianxing Yu","Zhisheng Wang","Zihao Gou","Wenqing Chen","Jian Yin"],"abstract":"Recent advances in Speech Large Language Models (Speech LLMs) have led to great progress in speech understanding tasks such as Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER). However, whether these models can achieve human-level auditory perception, particularly in terms of their ability to comprehend latent intentions and implicit emotions in real-world spoken language, remains underexplored. To this end, we introduce the Human-level Perception in Spoken Speech Understanding (HPSU), a new benchmark for fully evaluating the human-level perceptual and understanding capabilities of Speech LLMs. HPSU comprises over 20,000 expert-validated spoken language understanding samples in English and Chinese. 
It establishes a comprehensive evaluation framework by encompassing a spectrum of tasks, ranging from basic speaker attribute recognition to complex inference of latent....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i37.40419","openalex_id":"https://openalex.org/W7138314334","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["College of Tourism","Sun Yat-sen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7459999918937683},{"id":"https://openalex.org/C2776230583","display_name":"Spoken language","score":0.6523000001907349},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.555899977684021},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5383999943733215},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5277000069618225},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5242999792098999},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5095999836921692},{"id":"https://openalex.org/C99209842","display_name":"Speech perception","score":0.4934000074863434}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137959354","title":"Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment","url":"https://doi.org/10.1609/aaai.v40i4.37300","published":"2026-03-14","authors":["Yang chen","Xiaowei Xu","Shuai Wang","Chenhui Zhu","Ruxue Wen","Xubin Li","Tiezheng Ge","Limin Wang"],"abstract":"Normalizing Flows (NFs) are a class of generative models 
distinguished by a mathematically invertible architecture, where the forward pass transforms data into a latent space for density estimation, and the reverse pass generates new samples from this space. This characteristic creates an intrinsic synergy between representation learning and data generation. However, the generative quality of standard NFs is limited by poor semantic representations from log-likelihood optimization. To remedy this, we propose a novel alignment strategy that creatively leverages the invertibility of NFs: instead of regularizing the forward pass, we align the intermediate features of the generative (reverse) pass with representations from a powerful vision foundation model, demonstrating superior effectiveness over naive alignment. We also introduce a novel training-free, test-time optimization algorithm fo...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i4.37300","openalex_id":"https://openalex.org/W7137959354","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Nanjing University"],"concepts":[{"id":"https://openalex.org/C96442724","display_name":"Invertible matrix","score":0.7071999907493591},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6585000157356262},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6389999985694885},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6269999742507935},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5789999961853027},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4999000132083893},{"id":"https://openalex.org/C2777212361","display_name":"Class 
(philosophy)","score":0.46939998865127563},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.39149999618530273}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138090585","title":"D²Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning","url":"https://doi.org/10.1609/aaai.v40i15.38234","published":"2026-03-14","authors":["Evelyn Zhang","Fufu Yu","Aoqi Wu","Zichen Wen","Ke Yan","Shouhong Ding","Biqing Qi","Linfeng Zhang"],"abstract":"Processing long visual token sequences poses a significant computational burden on Multimodal Large Language Models (MLLMs). While token pruning offers a path to acceleration, we find that current methods, while adequate for general understanding, catastrophically fail on fine-grained localization tasks. We attribute this failure to the inherent flaws of the two prevailing strategies: importance-based methods suffer from a strong positional bias, an inherent model artifact that distracts from semantic content, while diversity-based methods exhibit structural blindness, disregarding the user's prompt and spatial redundancy. To address this, we introduce D²Pruner, a framework that rectifies these issues by uniquely combining debiased importance with a structural pruning mechanism. 
Our method first secures a core set of the most critical tokens as pivots based on a debiased attention score....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i15.38234","openalex_id":"https://openalex.org/W7138090585","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["ShangHai JiAi Genetics & IVF Institute","Shanghai Artificial Intelligence Laboratory","Shanghai Jiao Tong University","Tencent (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.8361999988555908},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.73580002784729},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.7121999859809875},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.6008999943733215},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5615000128746033},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.5424000024795532},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4997999966144562},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.492000013589859}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139136507","title":"DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models","url":"https://doi.org/10.1609/aaai.v40i40.40663","published":"2026-03-14","authors":["Yuanyuan Wang","Dongchao Yang","Yiwen Shao","Hangting Chen","Jiankun Zhao","Zhiyong Wu","Helen 
Meng","Xixin Wu"],"abstract":"Extending pre-trained text Large Language Models (LLMs)’s speech understanding or generation abilities by introducing various effective speech tokens has attracted great attention in the speech research community. However, building a unified speech understanding and generation model still faces the following challenges: (1) Due to the huge modality gap between speech and text tokens, extending text LLMs to unified speech LLMs relies on large-scale paired data for fine-tuning, and (2) Generation and understanding tasks prefer information at different levels, e.g., generation benefits from detailed acoustic features, while understanding favors high-level semantics. This divergence leads to difficult performance optimization in one unified model. To solve these challenges, in this paper, we present two key insights in speech tokenization and speech language modeling. Specifically, we first....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i40.40663","openalex_id":"https://openalex.org/W7139136507","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.79339998960495},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5623000264167786},{"id":"https://openalex.org/C2776187449","display_name":"Natural language generation","score":0.5615000128746033},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5358999967575073},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.5069000124931335},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.4828000068664551},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.47760000824928284},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43479999899864197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139011883","title":"CoT-VLNBench: A Benchmark for Visual Chain-of-Thought Reasoning in Vision-Language-Navigation Robots","url":"https://doi.org/10.1609/aaai.v40i43.40980","published":"2026-03-14","authors":["Xiao Zhao","Chang Liu","Ruiteng Ji","Zheyuan Zhang","Mingxu Zhu","Linna Song","Zhe Ren","Luo Qingliang","Yuhang Gao","Zhaolong Du","Chufan Guo","Kuifeng Su"],"abstract":"Recent advances in vision language models (VLMs) have demonstrated remarkable potential in embodied navigation tasks. However, existing robot-centric datasets primarily focus on traditional 3D tasks such as perception and prediction, lacking adequate support for vision-language tasks. Vision-language-navigation (VLN) is a key capability for achieving human-like and interpretable navigation in complex environments. In this study, we present CoT-VLNBench, the first large-scale benchmark and dataset designed for chain-of-thought (CoT) reasoning in quadruped robot navigation. Our dataset encompasses a diverse range of indoor and outdoor scenes, multi-step navigation trajectories, and rich natural language instructions, all annotated with fine-grained CoT reasoning traces. Specifically, it contains 175K frames, 5.25M 3D bounding boxes, and 875K vision–question–answer (VQA) pairs. 
This compreh...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i43.40980","openalex_id":"https://openalex.org/W7139011883","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.73089998960495},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6804999709129333},{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.6532999873161316},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6252999901771545},{"id":"https://openalex.org/C64543145","display_name":"Intersection (aeronautics)","score":0.6197999715805054},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.5626000165939331},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5562999844551086},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.5526000261306763}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7137849539","title":"CR³: Boosting Compositional Reasoning in MLLMs Through Rule-Based Reinforcement Learning","url":"https://doi.org/10.1609/aaai.v40i29.39680","published":"2026-03-14","authors":["Shun Qian","Bingquan Liu","Chengjie Sun","Peijin Xie","Zhen Xu","Baoxun Wang"],"abstract":"Compositional reasoning is a critical capability for multimodal models, enabling systematic understanding of complex scenes through structured combinations of objects, attributes, and relations. 
However, existing research on this ability primarily focuses on vision-language models (VLMs, e.g., CLIP and SigLIP), with limited exploration of multimodal large language models (MLLMs). To address this gap, we introduce CR³, a novel framework that enhances compositional reasoning abilities of MLLMs via rule-based reinforcement learning. CR³ leverages rule-based rewards to optimize the MLLM's policy on systematically curated multimodal instruction-following tasks, guided by a model-adaptive dynamic task mixing strategy. Our approach boosts performance by over 19% on three compositional reasoning benchmarks, significantly outperforming supervised fine-tuning (SFT) by at least 12%. Crucially, CR³....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i29.39680","openalex_id":"https://openalex.org/W7137849539","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7581999897956848},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6944000124931335},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6577000021934509},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.6492000222206116},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5547999739646912},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.49160000681877136},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4848000109195709},{"id":"https://openalex.org/C22367795","display_name":"Structured 
prediction","score":0.3540000021457672}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138401343","title":"Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration","url":"https://doi.org/10.1609/aaai.v40i39.40621","published":"2026-03-14","authors":["Ante Wang","Yujie Lin","Jingyao Liu","Suhang Wu","Hao Liu","Xinyan Xiao","Jinsong Su"],"abstract":"Critical thinking is essential for building robust AI systems, preventing them from blindly accepting flawed data or biased reasoning. However, prior work has primarily focused on passive critical thinking, where models simply reject problematic queries without taking constructive steps to address user requests. In this work, we introduce proactive critical thinking, a paradigm where models actively seek missing or clarifying information from users to resolve their queries better. To evaluate this capability, we present GSM-MC and GSM-MCE, two novel benchmarks based on GSM8K for assessing mathematical reasoning under incomplete or misleading conditions. Experiments on Qwen3 and Llama series models show that, while these models excel in traditional reasoning tasks, they struggle with proactive critical thinking, especially smaller ones. 
However, we demonstrate that reinforcement learning....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i39.40621","openalex_id":"https://openalex.org/W7138401343","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Xiamen University","Xiamen University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6837999820709229},{"id":"https://openalex.org/C2778701210","display_name":"Constructive","score":0.5882999897003174},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.5356000065803528},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.4699000120162964},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4052000045776367},{"id":"https://openalex.org/C113336015","display_name":"Complete information","score":0.38670000433921814},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.3774000108242035},{"id":"https://openalex.org/C533356498","display_name":"Critical thinking","score":0.3700999915599823}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138019143","title":"Benchmarking LLMs’ Mathematical Reasoning with Unseen Random Variables Questions","url":"https://doi.org/10.1609/aaai.v40i37.40362","published":"2026-03-14","authors":["Zijin Hong","Hao Wu","Su Dong","Junnan Dong","Yilin Xiao","Yujing Zhang","Zhu Wang","Feiran Huang","Linyi Li","Hongxia Yang","Xiao Huang"],"abstract":"Recent studies have raised significant concerns regarding the reliability of current mathematical benchmarks, highlighting key limitations such as 
simplistic design and potential data contamination that undermine evaluation accuracy. Consequently, developing a reliable benchmark that effectively evaluates large language models' (LLMs) genuine capabilities in mathematical reasoning remains a critical challenge. To address these concerns, we propose RV-Bench, a novel evaluation methodology for Benchmarking LLMs with Random Variables in mathematical reasoning. Specifically, we develop question-generating functions to produce random variable questions (RVQs), whose background content mirrors the original benchmark problems, but with randomized variable combinations, rendering them \"unseen\" to LLMs. Models must completely understand the inherent question pattern to correctly answer RVQs with....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v40i37.40362","openalex_id":"https://openalex.org/W7138019143","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beihang University","Hong Kong Polytechnic University","Simon Fraser University","Tencent (China)","University of Electronic Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8300999999046326},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6047000288963318},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5819000005722046},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5356000065803528},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5163000226020813},{"id":"https://openalex.org/C182365436","display_name":"Variable 
(mathematics)","score":0.4684999883174896},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4666000008583069},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4510999917984009}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cti-realm-benchmark-to-evaluate-agent-performance-on-security-detection-rule-generation-capabilities","title":"CTI-REALM: Benchmark to Evaluate Agent Performance on Security Detection Rule Generation Capabilities","url":"https://www.microsoft.com/en-us/research/publication/cti-realm-benchmark-to-evaluate-agent-performance-on-security-detection-rule-generation-capabilities/","published":"2026-03-13","authors":["Arjun Chakraborty","Sandra Ho","Adam Cook","Manuel Mel'endez"],"abstract":"CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat intelligence (CTI) and develop detection rules. The benchmark provides a realistic environment that replicates the security analyst workflow. This enables agents to examine CTI reports, execute queries, understand schema structures, and construct detection rules. Evaluation involves emulated attacks of varying complexity across Linux systems, cloud platforms, and Azure Kubernetes Service (AKS), with ground truth data for accurate assessment. Agent performance is measured through both final detection results and trajectory-based rewards that capture decision-making effectiveness. This work demonstrates the potential of AI agents to support labor-intensive aspects of detection engineering. 
Our comprehensive evaluation of 16 frontier models sho...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7139144877","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Security, privacy, and cryptography","Computer science","Cryptography and Security","LLM","memory","agent"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/spatially-grounded-long-horizon-task-planning-in-the-wild","title":"Spatially Grounded Long-Horizon Task Planning in the Wild","url":"https://www.microsoft.com/en-us/research/publication/spatially-grounded-long-horizon-task-planning-in-the-wild/","published":"2026-03-13","authors":["Sehun Jung","Hyunjee Song","Donghyun Kim","Reuben Tan","Jianfeng Gao","Yong Jae Lee","Donghyun Kim"],"abstract":"Recent advances in robot manipulation increasingly leverage Vision-Language Models (VLMs) for high-level reasoning, such as decomposing task instructions into sequential action plans expressed in natural language that guide downstream low-level motor execution. However, current benchmarks do not assess whether these plans are spatially executable, particularly in specifying the exact spatial locations where the robot should interact to execute the plan, limiting evaluation of real-world manipulation capability. 
To bridge this gap, we define a novel task of grounded planning and introduce GroundedPlanBench, a newly curated benchmark for spatially grounded long-horizon action planning in the wild. GroundedPlanBench jointly evaluates hierarchical sub-action planning and spatial action grounding (where to act), enabling systematic assessment of whether generated sub-actions are spatially exe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Vision-language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2603.16929","title":"MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning","url":"https://huggingface.co/papers/2603.16929","published":"2026-03-13","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"apple:y7pyrh2cqd67m80hup9pvbev","title":"mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For 
RLVR","url":"https://machinelearning.apple.com/research/macereason-math","published":"2026-03-13","authors":["Konstantin Dobler","Simon Lehnerer","Federico Scozzafava","Jonathan Janke","Mohamed Ali"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) has been successfully applied to significantly boost the capabilities of pretrained large language models, especially in the math and logic problem domains. However, current research and available training datasets remain English-centric. While multilingual training data and benchmarks have been created in the past, they were not created with RLVR and current model capability in mind, and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7135229697","title":"Constructing a global ground truth: A news-derived dataset for socioeconomic drought event validation","url":"https://doi.org/10.5194/egusphere-egu26-3402","published":"2026-03-13","authors":["Yonatan Nakar","Grey Nearing","Rotem Mayo","Oleg Zlydenko","Frederik Kratzert","Moral Bootbool","Amitay Sicherman","Ido Zemach","Deborah Cohen"],"abstract":"Meteorological drought indices (e.g., SPI) and composite products (e.g., USDM) serve as standard benchmarks for evaluating drought forecasting models. However, these metrics are physical proxies rather than direct measures of societal impact. A precipitation deficit does not always manifest as a drought. Yet, when a true drought impacts agriculture, water supply, or ecosystems, it is typically reported in local or national media. 
To capture this reality, we introduce a comprehensive global dataset of socioeconomic drought events, designed to serve as an independent ground truth for model validation.Our approach utilizes a scalable, two-stage pipeline. We first filter global web news data to identify candidate articles, followed by a targeted analysis of approximately 600,000 texts using Gemini. Unlike traditional keyword scraping, the LLM allows for nuanced semantic filtering. It explici...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.5194/egusphere-egu26-3402","openalex_id":"https://openalex.org/W7135229697","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","news","media"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C9770341","display_name":"Geospatial analysis","score":0.7253000140190125},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.6207000017166138},{"id":"https://openalex.org/C4438859","display_name":"Timeline","score":0.5455999970436096},{"id":"https://openalex.org/C39410599","display_name":"Natural hazard","score":0.527400016784668},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5113999843597412},{"id":"https://openalex.org/C146849305","display_name":"Ground truth","score":0.4893999993801117},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.37139999866485596},{"id":"https://openalex.org/C196083921","display_name":"Variance (accounting)","score":0.3546000123023987}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135191240","title":"Hammer: An Expert-Level Large Language Model for Hydro-Science and Engineering Balancing 
Domain Expertise and General Intelligence","url":"https://doi.org/10.5194/egusphere-egu26-2906","published":"2026-03-13","authors":["Xinpeng Yu","Wenbo Shan","Ye Li","Shiruo Hu","Dingxiao Liu","Zhijun Zheng","Jing Liu","Wei Luo","Lizhi Wang","Bin Xu","Jianshi Zhao"],"abstract":"Large Language Models (LLMs) have demonstrated outstanding performance across natural language processing tasks. However, when deployed in specialized domains such as hydro-science and engineering (HydroSE), these models face challenges such as insufficient domain knowledge and catastrophic forgetting during domain adaption. In this work, we constructed a multi-dimensional corpus for the HydroSE and trained a domain-specific LLM named Hammer. We propose a comprehensive training paradigm that integrates multi-dimensional knowledge injection with a multi-model merging method, effectively balancing domain expertise with general intelligence. First, to overcome knowledge scarcity, multi-disciplinary knowledge involved in HdyroSE is collected from various sources (such as textbooks, papers, laws and industry standards, etc.). 
Second, to mitigate catastrophic forgetting, we implemented a progr...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.5194/egusphere-egu26-2906","openalex_id":"https://openalex.org/W7135191240","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Beijing Institute of Big Data Research","Tsinghua University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7617999911308289},{"id":"https://openalex.org/C4554734","display_name":"Knowledge base","score":0.6463000178337097},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5530999898910522},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5493999719619751},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5486000180244446},{"id":"https://openalex.org/C207685749","display_name":"Domain knowledge","score":0.5354999899864197},{"id":"https://openalex.org/C105002631","display_name":"Subject-matter expert","score":0.5271999835968018},{"id":"https://openalex.org/C92548554","display_name":"Domain model","score":0.4544999897480011}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135183802","title":"SHRUG-FM: Reliability-Aware Foundation Models for Earth Observation","url":"https://doi.org/10.5194/egusphere-egu26-9940","published":"2026-03-13","authors":["Kai-Hendrik Cohrs","Maria Gonzalez-Calabuig","Vishal Nedungadi","Zuzanna Osika","Ruben Cartuyvels","Steffen Knoblauch","Joppe Massant","Shruti Nath","Patrick Ebel","Vasileios Sitokonstantinou"],"abstract":"Following recent advances of foundation 
models in natural language processing and computer vision, there is growing interest in leveraging geospatial foundation models (GFMs) for Earth system monitoring and climate-relevant applications. In particular, GFMs promise to support large-scale observation of climate-driven extreme events such as wildfires, floods and landslides. However, despite strong benchmark results, recent studies indicate that GFMs for land-cover modelling and hazard mapping models can behave unreliably under real-world conditions. Pretraining datasets often underrepresent rare or extreme environmental regimes, leading to degraded model performance precisely in situations where robust predictions are most critical for climate risk assessment and disaster response. Furthermore, GFMs are often surpassed by simple supervised baselines, highlighting the need for systematic r...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.5194/egusphere-egu26-9940","openalex_id":"https://openalex.org/W7135183802","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Delft University of Technology","European Space Research Institute","Ghent University Hospital","Google (United States)","Heidelberg University","Parc Científic de la Universitat de València","Universitat de València","University of Oxford","Wageningen University & Research"],"concepts":[{"id":"https://openalex.org/C9770341","display_name":"Geospatial analysis","score":0.7785000205039978},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6273000240325928},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.5425000190734863},{"id":"https://openalex.org/C32230216","display_name":"Uncertainty 
quantification","score":0.5189999938011169},{"id":"https://openalex.org/C74256435","display_name":"Flood myth","score":0.4449000060558319},{"id":"https://openalex.org/C39399123","display_name":"Earth observation","score":0.38920000195503235},{"id":"https://openalex.org/C49261128","display_name":"Hazard","score":0.3865000009536743},{"id":"https://openalex.org/C29852176","display_name":"Critical infrastructure","score":0.3824000060558319}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138426444","title":"NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL","url":"https://doi.org/10.48550/arxiv.2603.13606","published":"2026-03-13","authors":["Amos Goldman","Nimrod Boker","Maayan Sheraizin","Nimrod Admoni","Artem Y. Polyakov","Subhadeep Bhattacharya","Fan Yu","Kai Sun","Georgios Theodorakis","Hsin-Chun Yin","Peter-Jan Gootzen","Aamir Shafi"],"abstract":"Mixture-of-Experts (MoE) architectures have become essential for scaling large language models, driving the development of specialized device-initiated communication libraries such as DeepEP, Hybrid-EP, and others. These libraries demonstrate the performance benefits of GPU-initiated RDMA for MoE dispatch and combine operations. This paper presents NCCL EP (Expert Parallelism), a ground-up MoE communication library built entirely on NCCL's Device API. NCCL EP provides unified ncclEpDispatch and ncclEpCombine primitives with both C and Python interfaces, supporting Low-Latency (LL) mode for inference decoding and High-Throughput (HT) mode for training and inference prefill. LL targets small batch sizes (1-128 tokens) using direct all-to-all RDMA+NVLink mesh connectivity with double-buffered communication for overlapping dispatch and combine phases. 
HT targets large batches (4096+ tokens)....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2603.13606","openalex_id":"https://openalex.org/W7138426444","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7753000259399414},{"id":"https://openalex.org/C130795937","display_name":"Remote direct memory access","score":0.7264000177383423},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5388000011444092},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5059000253677368},{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.4291999936103821},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.3560999929904938},{"id":"https://openalex.org/C74193536","display_name":"Kernel (algebra)","score":0.35019999742507935},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.28700000047683716}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evotok-a-unified-image-tokenizer-via-residual-latent-evolution-for-visual-understanding-and-generation","title":"EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation","url":"https://www.microsoft.com/en-us/research/publication/evotok-a-unified-image-tokenizer-via-residual-latent-evolution-for-visual-understanding-and-generation/","published":"2026-03-12","authors":["Yan Li","Ning Liao","Xiangyu Zhao","Shaofeng 
Zhang","Xiaoxing Wang","Yifan Yang","Junchi Yan","Xue Yang"],"abstract":"The development of unified multimodal large language models (MLLMs) is fundamentally challenged by the granularity gap between visual understanding and generation: understanding requires high-level semantic abstractions, while image generation demands fine-grained pixel-level representations. Existing approaches usually enforce the two supervision on the same set of representation or decouple these two supervision on separate feature spaces, leading to interference and inconsistency, respectively. In this work, we propose EvoTok, a unified image tokenizer that reconciles these requirements through a residual evolution process within a shared latent space. Instead of maintaining separate token spaces for pixels and semantics, EvoTok encodes an image into a cascaded sequence of residual tokens via residual vector quantization. This residual sequence forms an evolution trajectory where earl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Multimodal Large Language Models","language model","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/actprompt-in-domain-feature-adaptation-via-action-cues-for-video-temporal-grounding","title":"ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal 
Grounding","url":"https://www.microsoft.com/en-us/research/publication/actprompt-in-domain-feature-adaptation-via-action-cues-for-video-temporal-grounding/","published":"2026-03-12","authors":["Yubin Wang","Xinyang Jiang","De Cheng","Dongsheng Li","Cairong Zhao"],"abstract":"Video temporal grounding, including moment retrieval and highlight detection, is an emerging topic aiming to identify specific clips within videos. In addition to pre-trained video models, contemporary methods utilize pre-trained vision-language models (VLMs) to capture detailed characteristics of diverse scenes and objects from video frames. However, as pre-trained on images, directly using pre-extracted VLM features neglects the domain gap between the pre-trained and temporal grounding datasets, thus inducing domain shifts due to the data-level distribution disparity. As a result, VLMs may struggle to distinguish action-sensitive patterns from static objects, making it necessary to adapt them to specific data domains for effective feature representation over temporal grounding. In this work, we address two primary challenges to achieve this goal. 
Specifically, to mitigate high adaptati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Computer vision","Computer science","Medicine","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/matching-features-not-tokens-energy-based-fine-tuning-of-language-models","title":"Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models","url":"https://www.microsoft.com/en-us/research/publication/matching-features-not-tokens-energy-based-fine-tuning-of-language-models/","published":"2026-03-12","authors":["Samy Jelassi","Mujin Kwun","Rosie Zhao","Yuanzhi Li","Nicolo Fusi","Yilun Du","Sham Kakade","Carles Domingo-Enrich"],"abstract":"Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequence-level statistics of the completion distribution, providing dense semantic feedback without requiring a task-specific verifier or preference model. To optimize this objective efficiently, we propose energy-based fine-tuning (EBFT), which uses strided block-parallel sampling to generate multiple rollouts from nested prefixes concurrently, batches feature extraction over these rollouts, and uses the resulting embeddings to perform an on-policy policy-gradient update. 
We present a theoretical perspective connecting EBFT to KL-regularized feature-matching and energy-based modeling. Empirically,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Language model","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flashmotion-few-step-controllable-video-generation-with-trajectory-guidance","title":"FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance","url":"https://www.microsoft.com/en-us/research/publication/flashmotion-few-step-controllable-video-generation-with-trajectory-guidance/","published":"2026-03-12","authors":["Quanhao Li","Zhen Xing","Rui Wang","Haidong Cao","Qiaofei Dai","Daoguo Dong","Zuxuan Wu"],"abstract":"Recent advances in trajectory-controllable video generation have achieved remarkable progress. Previous methods mainly use adapter-based architectures for precise motion control along predefined trajectories. However, all these methods rely on a multi-step denoising process, leading to substantial time redundancy and computational overhead. While existing video distillation methods successfully distill multi-step generators into few-step, directly applying these approaches to trajectory-controllable video generation results in noticeable degradation in both video quality and trajectory accuracy. 
To bridge this gap, we introduce FlashMotion, a novel training framework designed for few-step trajectory-controllable video generation. We first train a trajectory adapter on a multi-step video generator for precise trajectory control. Then, we distill the generator into a few-step version to ac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","video generation","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.12255","title":"Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training","url":"https://huggingface.co/papers/2603.12255","published":"2026-03-12","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:zai-org:2603.12201","title":"IndexCache: Accelerating Sparse Attention via Cross-Layer Index 
Reuse","url":"https://huggingface.co/papers/2603.12201","published":"2026-03-12","authors":["Z.ai/Zhipu"],"abstract":"","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","zai-org"],"author_affiliations":["Z.ai/Zhipu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/zai-org/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/does-llm-alignment-really-need-diversity-an-empirical-study-of-adapting-rlvr-methods-for-moral-reasoning","title":"Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning","url":"https://www.microsoft.com/en-us/research/publication/does-llm-alignment-really-need-diversity-an-empirical-study-of-adapting-rlvr-methods-for-moral-reasoning/","published":"2026-03-11","authors":["Zhaowei Zhang","Xiaohan Liu","Xue-Peng Zhu","Junchao Huang","Ceyao Zhang","Zhiyuan Feng","Yaodong Yang","Xiaoyuan Yi","Xing Xie"],"abstract":"Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in logical reasoning tasks, yet whether large language model (LLM) alignment requires fundamentally different approaches remains unclear. Given the apparent tolerance for multiple valid responses in moral reasoning, a natural hypothesis is that alignment tasks inherently require diversity-seeking distribution-matching algorithms rather than reward-maximizing policy-based methods. We conduct the first comprehensive empirical study comparing both paradigms on MoReBench. To enable stable RLVR training, we build a rubric-grounded reward pipeline by training a Qwen3-1.7B judge model. 
Contrary to our hypothesis, we find that distribution-matching approaches do not demonstrate significant advantages over reward-maximizing methods as expected on alignment tasks. Through semantic visualization mapping high-rewar...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:stepfun-ai:2603.13391","title":"WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics","url":"https://huggingface.co/papers/2603.13391","published":"2026-03-11","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.10702","title":"UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic 
Representations","url":"https://huggingface.co/papers/2603.10702","published":"2026-03-11","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:tencent:2603.11421","title":"ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation","url":"https://huggingface.co/papers/2603.11421","published":"2026-03-11","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:baidu:2603.13398","title":"Qianfan-OCR: A Unified End-to-End Model for Document 
Intelligence","url":"https://huggingface.co/papers/2603.13398","published":"2026-03-11","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"hf-org-paper:Qwen:2603.10757","title":"CodePercept: Code-Grounded Visual STEM Perception for MLLMs","url":"https://huggingface.co/papers/2603.10757","published":"2026-03-11","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"openalex:W7159762460","title":"Optimizing LLM Prompt Engineering with DSPy-Based Declarative Learning","url":"https://doi.org/10.1109/esci68015.2026.11493364","published":"2026-03-11","authors":["Shiek Ruksana","Sailesh Kiran Kurra","Thipparthi Sanjay 
Baradwaj"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/esci68015.2026.11493364","openalex_id":"https://openalex.org/W7159762460","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Administrative Staff College of India","Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6162999868392944},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.40470001101493835},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.35280001163482666},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.350600004196167},{"id":"https://openalex.org/C146206909","display_name":"Declarative programming","score":0.2955999970436096},{"id":"https://openalex.org/C14156362","display_name":"Descriptive knowledge","score":0.2842999994754791},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.2782999873161316},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.2732999920845032}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.11027","title":"Beyond the Illusion of Consensus: From Surface Heuristics to Knowledge-Grounded Evaluation in LLM-as-a-Judge","url":"http://arxiv.org/abs/2603.11027","published":"2026-03-11","authors":["Mingyang Song","Mao Zheng","Chenning Xu"],"abstract":"The paradigm of LLM-as-a-judge relies on a critical assumption, namely that high inter-evaluator agreement indicates reliable and objective evaluation. We present two complementary findings that challenge this assumption. 
\\textbf{First}, we demonstrate that this consensus is frequently illusory. We identify and formalize \\textbf{Evaluation Illusion}, a phenomenon where LLM judges generate sophisticated critiques yet anchor scores on shared surface heuristics rather than substantive quality. Through a large-scale study of 105,600 evaluation instances (32 LLMs $\\times$ 3 frontier judges $\\times$ 100 tasks $\\times$ 11 temperatures), we show that model-level agreement (Spearman $ρ= 0.99$) masks fragile sample-level agreement (Pearson $\\bar{r} = 0.72$; absolute agreement ICC $= 0.67$), that merely sharing rubric structure restores 62\\% of total agreement, and that high-quality outputs paradox...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7135156227","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C111640148","display_name":"Rubric","score":0.973800003528595},{"id":"https://openalex.org/C127705205","display_name":"Heuristics","score":0.8317999839782715},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5389999747276306},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.46219998598098755},{"id":"https://openalex.org/C49831778","display_name":"Pluralism (philosophy)","score":0.4519999921321869},{"id":"https://openalex.org/C50335755","display_name":"Phenomenon","score":0.4503999948501587},{"id":"https://openalex.org/C2777655017","display_name":"Toolbox","score":0.44909998774528503},{"id":"https://openalex.org/C184047640","display_name":"Illusion","score":0.42489999532699585}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-the-practical-effectiveness-of-llm-driven-index-tuning-with-microsoft-database-tuning-advisor","title":"Evaluating the Practical Effectiveness of LLM-Driven Index Tuning with Microsoft Database Tuning Advisor","url":"https://www.microsoft.com/en-us/research/publication/evaluating-the-practical-effectiveness-of-llm-driven-index-tuning-with-microsoft-database-tuning-advisor/","published":"2026-03-10","authors":["Xiaoying Wang","Wentao Wu","Vivek Narasayya","Surajit Chaudhuri"],"abstract":"Index tuning is critical for the performance of modern database systems. Industrial index tuners, such as the Database Tuning Advisor (DTA) developed for Microsoft SQL Server, rely on the\"what-if\"API provided by the query optimizer to estimate the cost of a query given an index configuration, which can lead to suboptimal recommendations when the estimations are inaccurate. Large language model (LLM) offers a new approach to index tuning, with knowledge learned from web-scale training datasets. However, the effectiveness of LLM-driven index tuning, especially beyond what is already achieved by commercial index tuners, remains unclear. In this paper, we study the practical effectiveness of LLM-driven index tuning using both industrial benchmarks and real-world enterprise customer workloads, and compare it with DTA. 
Our results show that although DTA is generally more reliable, with a few i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7134992077","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Computer science","large language models","LLM","language model"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/social-r1-towards-human-like-social-reasoning-in-llms","title":"Social-R1: Towards Human-like Social Reasoning in LLMs","url":"https://www.microsoft.com/en-us/research/publication/social-r1-towards-human-like-social-reasoning-in-llms/","published":"2026-03-10","authors":["Jincenzi Wu","Yuxuan Lei","Jianxun Lian","Yitian Huang","Lexin Zhou","Haotian Li","Xing Xie","Helen M. Meng"],"abstract":"While large language models demonstrate remarkable capabilities across numerous domains, social intelligence - the capacity to perceive social cues, infer mental states, and generate appropriate responses - remains a critical challenge, particularly for enabling effective human-AI collaboration and developing AI that truly serves human needs. Current models often rely on superficial patterns rather than genuine social reasoning. We argue that cultivating human-like social intelligence requires training with challenging cases that resist shortcut solutions. To this end, we introduce ToMBench-Hard, an adversarial benchmark designed to provide hard training examples for social reasoning. 
Building on this, we propose Social-R1, a reinforcement learning framework that aligns model reasoning with human cognition through multi-dimensional rewards. Unlike outcome-based RL, Social-R1 supervises t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7134917117","title":"<scp>Bat:</scp> Efficient Generative Recommender Serving with Bipartite Attention","url":"https://doi.org/10.1145/3779212.3790131","published":"2026-03-10","authors":["Jie Sun","Shaohang Wang","Zimo Zhang","Zhengyu Liu","Yunlong Xu","Peng Sun","Bo Zhao","Bingsheng He","Fei Wu","Zeke Wang"],"abstract":"Generative Recommenders (GRs) have recently emerged as promising alternatives to traditional Deep Learning Recommendation Models (DLRMs). Despite their potential, GRs remain computationally expensive in inference, exhibiting compute-bound characteristics similar to the prefill stage of Large Language Model (LLM) inference. Prefix caching can reduce redundant computation by reusing previously constructed KV caches. 
However, the unique properties of GRs, i.e., highly personalized user profiles and real-time item retrieval, make cache reuse across queries challenging, resulting in limited computational savings.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3779212.3790131","openalex_id":"https://openalex.org/W7134917117","cited_by_count":0,"quality_score":57,"matched_keywords":["LLM","language model","personalized","retrieval","efficient"],"author_affiliations":["Aalto University","Alibaba Group (China)","National University of Singapore","University of Hong Kong","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8198999762535095},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.7038000226020813},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6251999735832214},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5194000005722046},{"id":"https://openalex.org/C141603448","display_name":"Prefix","score":0.486299991607666},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.4754999876022339},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.4528000056743622},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.41690000891685486}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134945412","title":"Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter","url":"https://doi.org/10.1145/3779212.3790231","published":"2026-03-10","authors":["Qinghao Hu","Shang Yang","Junxian Guo","Xiaozhe Yao","Yujun Lin","Yuxian Gu","Han 
Cai","Chuang Gan","Ana Klimovic","Song Han"],"abstract":"The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: response generation during RL training exhibits a persistent long-tail distribution, where a few very long responses dominate execution time, wasting resources and inflating costs. To address this, we propose TLT, a system that accelerates reasoning RL training losslessly by integrating adaptive speculative decoding. Applying speculative decoding in RL is challenging due to the dynamic workloads, evolving target model, and draft model training overhead. TLT overcomes these obstacles with two synergistic components: (1) Adaptive Drafter, a lightweight draft model trained continuously on idle GPUs du...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3779212.3790231","openalex_id":"https://openalex.org/W7134945412","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","efficient"],"author_affiliations":["ETH Zurich","IIT@MIT","Nvidia (United Kingdom)","Nvidia (United States)","University of Massachusetts Amherst"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7972000241279602},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6873999834060669},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.6496000289916992},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5218999981880188},{"id":"https://openalex.org/C2776760102","display_name":"Code (set 
theory)","score":0.45509999990463257},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.44780001044273376},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.3644999861717224},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.35010001063346863}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134902181","title":"TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference","url":"https://doi.org/10.1145/3779212.3790237","published":"2026-03-10","authors":["Xiaojuan Tang","Fanxu Meng","Pingzhi Tang","Yuxuan Wang","Di Yin","Xing Sun","Muhan Zhang"],"abstract":"Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, compresses key–value states into a low-rank latent vector cKV, caching only this vector to reduce memory. In tensor parallelism (TP), however, attention heads are computed across multiple devices, and each device must load the full cKV, eroding the advantage of MLA over Grouped Query Attention (GQA). We present TPLA, a scheme that partitions both the latent representation and each head's input dimension across devices, performs attention independently on each shard, and aggregates the results with an all-reduce. Unlike GLA, every attention head in TPLA still attends to the full latent space, preserving MLA's representational capacity while reducing the per-device KV cache. 
To make TPLA drop-in compatible with MLA checkpoints, we further derive orthogonal reparameterizations of RMSNorm and softmax---instantiated with Hadamard a...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3779212.3790237","openalex_id":"https://openalex.org/W7134902181","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","efficient"],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7192999720573425},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5792999863624573},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5778999924659729},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5414999723434448},{"id":"https://openalex.org/C155281189","display_name":"Tensor (intrinsic definition)","score":0.5242000222206116},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.522599995136261},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.5080999732017517},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.45649999380111694}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.09938","title":"Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions","url":"http://arxiv.org/abs/2603.09938","published":"2026-03-10","authors":["Mingyang Song","Mao Zheng"],"abstract":"Model merging combines the parameters of multiple neural networks into a single model without additional training. 
As fine-tuned large language models (LLMs) proliferate, merging offers a computationally efficient alternative to ensembles and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey examines model merging in the LLM era through the \\textbf{FUSE} taxonomy, organized along \\textbf{F}oundations, \\textbf{U}nification Strategies, \\textbf{S}cenarios, and \\textbf{E}cosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry and mode connectivity, then systematically review the algorithmic space spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization. We further examine downstream applications acro...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7134992069","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.766700029373169},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5964999794960022},{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.5781000256538391},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5702000260353088},{"id":"https://openalex.org/C188087704","display_name":"Standardization","score":0.5641999840736389},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.536899983882904},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.3959999978542328},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.38920000195503235}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2511.03782","title":"Expert evaluation of LLM world models: A high-T <sub> <i>c</i> </sub> superconductivity case study","url":"http://arxiv.org/abs/2511.03782","published":"2026-03-10","authors":["Haoyu Guo","Maria Tikhanovskaya","Paul Raccuglia","Alexey Vlaskin","Chris Co","Daniel J. Liebling","Scott Ellsworth","Matthew Abraham","Elizabeth H. Dorfman","N. P. Armitage","Chunhan Feng","Antoine Georges"],"abstract":"Large Language Models (LLMs) show great promise as a powerful tool for scientific literature exploration. However, their effectiveness in providing scientifically accurate and comprehensive answers to complex questions within specialized domains remains an active area of research. Using the field of high-temperature cuprates as an exemplar, we evaluate the ability of LLM systems to understand the literature at the level of an expert. We construct an expert-curated database of 1,726 scientific papers that covers the history of the field, and a set of 67 expert-formulated questions that probe deep understanding of the literature. We then evaluate six different LLM-based systems for answering these questions, including both commercially available closed models and a custom retrieval-augmented generation (RAG) system capable of retrieving images alongside text. 
Experts then evaluate the answ...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1073/pnas.2533676123","openalex_id":"https://openalex.org/W4416021964","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Brookhaven National Laboratory","Center for Theoretical Physics","Centre de Physique Théorique","City University of New York","Collaborative Innovation Center of Quantum Matter","College of Staten Island","Collège de France","Commissariat à l'Énergie Atomique et aux Énergies Alternatives","Cornell University","Ewha Womans University Medical Center","Flatiron Health (United States)","Flatiron Institute","Google (United States)","Harvard University Press","Instituto de Física Teórica","Johns Hopkins University","Massachusetts Institute of Technology","Mountain View College","Stanford University","The Graduate Center, CUNY","University of Geneva","Université Paris-Saclay","Yukhnovskii Institute for Condensed Matter Physics of the National Academy of Sciences of Ukraine","École Polytechnique"],"concepts":[{"id":"https://openalex.org/C111640148","display_name":"Rubric","score":0.8781999945640564},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6225000023841858},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5964999794960022},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5669999718666077},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5544000267982483},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.49470001459121704},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.4577000141143799},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.3305000066757202}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134835413","title":"Embodied Referring Expression Comprehension in Human-Robot Interaction","url":"https://doi.org/10.1145/3757279.3785601","published":"2026-03-10","authors":["Mofijul Islam","Alexi Gladstone","Sujan Sarker","Ganesh Nanduru","Md Fahim","Keyan Du","Aman Chadha","Tariq Iqbal"],"abstract":"As robots enter human workspaces, there is a crucial need for them to comprehend embodied human instructions, enabling intuitive and fluent human-robot interaction (HRI). However, accurate comprehension is challenging due to a lack of large-scale datasets that capture natural embodied interactions in diverse HRI settings. Existing datasets suffer from perspective bias, single-view data collection, inadequate coverage of nonverbal gestures, and a predominant focus on indoor environments. To address these issues, we present the Refer360 dataset, a large-scale dataset of embodied verbal and nonverbal interactions collected across diverse viewpoints in both indoor and outdoor settings. Additionally, we introduce MuRes, a multimodal guided residual module designed to improve embodied referring expression comprehension. 
MuRes acts as an information bottleneck, extracting salient modality-speci...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757279.3785601","openalex_id":"https://openalex.org/W7134835413","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Apple (United States)","University of Dhaka","University of Virginia"],"concepts":[{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.5390999913215637},{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.5087000131607056},{"id":"https://openalex.org/C90559484","display_name":"Expression (computer science)","score":0.4830000102519989},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4307999908924103},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4239000082015991},{"id":"https://openalex.org/C46312422","display_name":"Communication","score":0.35830000042915344},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.3165999948978424},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3021000027656555}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134913586","title":"Deep Multi-modal Species Occupancy Modeling","url":"https://doi.org/10.5194/wbf2026-347","published":"2026-03-10","authors":["Timm Haucke","Lauren Harrell","Yunyi Shen","Levente Ioan Klein","David Rolnick","Lauren Gillespie","Sara Beery"],"abstract":"Occupancy models are tools for modeling the relationship between habitat and species occurrence while accounting for the fact that species may still be present even if not detected. 
The types of environmental variables typically used for characterizing habitats in such ecological models, such as precipitation or tree cover, are frequently of low spatial resolution, with a single value for a spatial pixel size of, e.g., 1 km2. This spatial scale fails to capture the nuances of micro-habitat conditions that can strongly influence species presence, and additionally, as many of these are derived from satellite data, there are aspects of the environment they cannot capture, such as the structure of vegetation below the forest canopy. To address these gaps, we propose to combine high-resolution satellite and ground-level imagery to produce multi-modal environmental features that better capture...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.5194/wbf2026-347","openalex_id":"https://openalex.org/W7134913586","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","McGill University","Mila - Quebec Artificial Intelligence Institute","Moscow Institute of Thermal Technology","University of Michigan"],"concepts":[{"id":"https://openalex.org/C160331591","display_name":"Occupancy","score":0.6959999799728394},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5982999801635742},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5080999732017517},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4553999900817871},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.4514000117778778},{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.4474000036716461},{"id":"https://openalex.org/C107673813","display_name":"Bayesian 
probability","score":0.4325000047683716},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4318999946117401}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reject-resample-repeat-understanding-parallel-reasoning-in-language-model-inference","title":"Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference","url":"https://www.microsoft.com/en-us/research/publication/reject-resample-repeat-understanding-parallel-reasoning-in-language-model-inference/","published":"2026-03-09","authors":["Noah Golowich","Fan Chen","Dhruv Rohatgi","Raghav Singhal","Carles Domingo-Enrich","Dylan Foster","Akshay Krishnamurthy"],"abstract":"Inference-time methods that aggregate and prune multiple samples have emerged as a powerful paradigm for steering large language models, yet we lack any principled understanding of their accuracy-cost tradeoffs. In this paper, we introduce a route to rigorously study such approaches using the lens of particle filtering algorithms such as Sequential Monte Carlo (SMC). Given a base language model and a process reward model estimating expected terminal rewards, we ask: how accurately can we sample from a target distribution given some number of process reward evaluations? Theoretically, we identify (1) simple criteria enabling non-asymptotic guarantees for SMC; (2) algorithmic improvements to SMC; and (3) a fundamental limit faced by all particle filtering methods. 
Empirically, we demonstrate that our theoretical criteria effectively govern the sampling error of SMC, though not necessarily....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7134860114","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","mathematics","language model"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/synthcraft-an-ai-partner-for-synthetic-data-generation-to-support-data-access-and-augmentation-in-healthcare","title":"SynthCraft: an AI partner for synthetic data generation to support data access and augmentation in healthcare","url":"https://www.microsoft.com/en-us/research/publication/synthcraft-an-ai-partner-for-synthetic-data-generation-to-support-data-access-and-augmentation-in-healthcare/","published":"2026-03-09","authors":["Thomas Callender","Anders Boyd","Robert Davis","Silas Ruhrberg Estevez","Juan M. Lavista Ferres","Mihaela van der Schaar"],"abstract":"Access to high-quality data provides the foundation for biomedical research. But data access is often limited or challenging due to privacy constraints, whilst the data themselves may be unrepresentative or sparse. Synthetic data can support both privacy-preserving data access and advanced analytical workflows, including data augmentation or the development of digital twins. 
However, the use of synthetic data remains limited due to the complexity of the methods themselves, their use, and their evaluation. To address this, we developed SynthCraft, an AI tool to support the principled, transparent, application of state-of-the-art synthetic data generation methods. SynthCraft couples a reinforcement learning-based reasoning engine with large language models (LLMs) to orchestrate the workflow necessary for the generation of synthetic data based on dynamic interaction with the user through na...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1371/journal.pdig.0001290","openalex_id":"https://openalex.org/W7134262195","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft","Amsterdam University Medical Centers","Microsoft (United States)","Netherlands Center for Occupational Diseases","University Hospital of Bern","University of Amsterdam","University of Bern","University of Cambridge"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.08703","title":"HiAR: Efficient Autoregressive Long Video Generation via Hierarchical 
Denoising","url":"https://huggingface.co/papers/2603.08703","published":"2026-03-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","efficient"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:tencent:2603.09151","title":"Deep Tabular Research via Continual Experience-Driven Execution","url":"https://huggingface.co/papers/2603.09151","published":"2026-03-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2603.08369","title":"M$^3$-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering","url":"http://arxiv.org/abs/2603.08369","published":"2026-03-09","authors":["Peijin Xie","Zhen Xu","Bingquan Liu","Baoxun Wang"],"abstract":"Multimodal large language models have recently shown promising progress in visual mathematical reasoning. 
However, their performance is often limited by a critical yet underexplored bottleneck: inaccurate visual perception. Through systematic analysis, we find that the most failures originate from incorrect or incomplete visual evidence extraction rather than deficiencies in reasoning capability. Moreover, models tend to remain overly confident in their initial perceptions, making standard strategies such as prompt engineering, multi-round self-reflection, or posterior guidance insufficient to reliably correct errors. To address this limitation, we propose M3-ACE, a multi-agentic context engineering framework designed to rectify visual perception in multimodal math reasoning. Instead of directly aggregating final answers, our approach decouples perception and reasoning by dynamically mai...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7134860390","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.7003999948501587},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.695900022983551},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.6407999992370605},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6029999852180481},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.574999988079071},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5555999875068665},{"id":"https://openalex.org/C178253425","display_name":"Visual perception","score":0.44859999418258667},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.41769999265670776}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134282567","title":"Vision-Language Efficient Tuning for Mitigating Catastrophic Forgetting in Multi-Modal Learning","url":"https://doi.org/10.1007/s11263-025-02690-2","published":"2026-03-09","authors":["Yaoming Wang","Yuchen Liu","Wenrui Dai","Xiaopeng Zhang","Han Li","Chenglin Li","Junni Zou","Qi Tian","Hongkai Xiong"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-025-02690-2","openalex_id":"https://openalex.org/W7134282567","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6524999737739563},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.6359999775886536},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6065000295639038},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3734000027179718},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.36959999799728394},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34459999203681946},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.28290000557899475},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.27379998564720154}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7155417063","title":"Large Language Models for Explainable and Self-Service Business Intelligence in Large Enterprises","url":"https://doi.org/10.1109/icoeca68095.2026.11485058","published":"2026-03-09","authors":["Ajith Suresh"],"abstract":"Business intelligence (BI) in large enterprises refers to the systematic process of collecting, integrating, analyzing, and interpreting organizational data to support strategic and operational decision-making. Traditional enterprise BI systems rely heavily on predefined dashboards, static reports, and manually authored Structured Query Language (SQL) queries, which ensure governance and reliability but require technical expertise and limit flexibility for non-technical users. As data volumes and analytical demands increase, these approaches struggle to provide accessible, interactive, and explainable analytics at scale. Recent advances in large language models (LLMs) have introduced new possibilities for natural language-driven analytics; however, existing LLM-based BI solutions often lack semantic grounding, governance enforcement, and explanation fidelity, rendering them unsuitable fo...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icoeca68095.2026.11485058","openalex_id":"https://openalex.org/W7155417063","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5412999987602234},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.4537000060081482},{"id":"https://openalex.org/C2767350","display_name":"Business intelligence","score":0.3677000105381012},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.3328000009059906},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.3253999948501587},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.30959999561309814},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.29280000925064087},{"id":"https://openalex.org/C4216890","display_name":"Business model","score":0.29170000553131104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134829514","title":"High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning With Gaussian Splatting","url":"https://doi.org/10.1109/lra.2026.3671535","published":"2026-03-09","authors":["Haoyu Zhao","Cheng Zeng","Linghao Zhuang","Yaxi Zhao","Shengke Xue","Hao Wang","Xingyue Zhao","Zhongyu Li","Kehan Li","Siteng Huang","Mingxiu Chen","Xin Li"],"abstract":"The scalability of robotic learning is fundamentally bottlenecked by the significant cost and labor of real-world data collection. While simulated data offers a scalable alternative, it often fails to generalize to the real world due to significant gaps in visual appearance, physical properties, and object interactions. To address this, we propose RoboSimGS, a novel Real2Sim2Real framework that converts multi-view real-world images into highfidelity, and physically interactive simulation environments for robotic manipulation. Our approach reconstructs scenes using a hybrid representation: 3D Gaussian Splatting (3DGS) captures the photorealistic appearance of the environment, while mesh primitives for interactive objects ensure accurate physics simulation. 
Crucially, our framework leverages a Multi-modal Large Language Model (MLLM) to automate the creation of physically plausible, articul...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2026.3671535","openalex_id":"https://openalex.org/W7134829514","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong","Huazhong University of Science and Technology","Tsinghua University","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8126000165939331},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6538000106811523},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5888000130653381},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.5354999899864197},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4966000020503998},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.4699999988079071},{"id":"https://openalex.org/C39920418","display_name":"Kinematics","score":0.46380001306533813},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.42910000681877136}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155424040","title":"A Comprehensive Study on Auto - BI Systems using Generative AI for Scalable and Explainable Enterprise Analytics","url":"https://doi.org/10.1109/icoeca68095.2026.11485569","published":"2026-03-09","authors":["Ajith Suresh"],"abstract":"Enterprise organizations increasingly rely on 
business intelligence (BI) systems to support data-driven decision-making across strategic, tactical, and operational levels. Automated Business Intelligence (Auto-BI) extends traditional BI by reducing manual intervention across data preparation, analysis, and reporting processes; however, enterprise-scale deployments continue to encounter limitations related to scalability, transparency, and trust. Recent advancements in generative artificial intelligence (GenAI), particularly large language models, have introduced new capabilities for Auto-BI systems, including natural language querying, automated insight generation, and narrative explanation of analytical results. These capabilities significantly alter how users interact with enterprise analytics platforms and how insights are produced and communicated. This study presents a comprehensive...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icoeca68095.2026.11485569","openalex_id":"https://openalex.org/W7155424040","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7289000153541565},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.5579000115394592},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5465999841690063},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3977999985218048},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3779999911785126},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.37220001220703125},{"id":"https://openalex.org/C180198813","display_name":"Information 
system","score":0.2992999851703644},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.29350000619888306}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-data-difficulty-improving-coding-models-via-reinforcement-learning-on-fresh-and-challenging-problems","title":"Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems","url":"https://www.microsoft.com/en-us/research/publication/scaling-data-difficulty-improving-coding-models-via-reinforcement-learning-on-fresh-and-challenging-problems/","published":"2026-03-08","authors":["Zongqian Li","Tengchao Lv","Shaohan Huang","Yixuan Su","Qinzheng Sun","Qiufeng Yin","Ying Xin","Scarlett Li","Lei Cui","Nigel Collier","Furu Wei"],"abstract":"Training next-generation code generation models requires high-quality datasets, yet existing datasets face difficulty imbalance, format inconsistency, and data quality problems. We address these challenges through systematic data processing and difficulty scaling. We introduce a four-stage Data Processing Framework encompassing collection, processing, filtering, and verification, incorporating Automatic Difficulty Filtering via an LLM-based predict-calibrate-select framework that leverages multi-dimensional difficulty metrics across five weighted dimensions to retain challenging problems while removing simplistic ones. The resulting MicroCoder dataset comprises tens of thousands of curated real competitive programming problems from diverse platforms, emphasizing recency and difficulty. 
Evaluations on strictly unseen LiveCodeBench demonstrate that MicroCoder achieves 3x larger performance...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Reinforcement learning","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/breaking-training-bottlenecks-effective-and-stable-reinforcement-learning-for-coding-models","title":"Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models","url":"https://www.microsoft.com/en-us/research/publication/breaking-training-bottlenecks-effective-and-stable-reinforcement-learning-for-coding-models/","published":"2026-03-08","authors":["Zongqian Li","Shaohan Huang","Zewen Chi","Yixuan Su","Lexin Zhou","Li Dong","Nigel Collier","Furu Wei"],"abstract":"Modern code generation models exhibit longer outputs, accelerated capability growth, and changed training dynamics, rendering traditional training methodologies, algorithms, and datasets ineffective for improving their performance. To address these training bottlenecks, we propose MicroCoder-GRPO, an improved Group Relative Policy Optimization approach with three innovations: conditional truncation masking to improve long output potential while maintaining training stability, diversity-determined temperature selection to maintain and encourage output diversity, and removal of KL loss with high clipping ratios to facilitate solution diversity. 
MicroCoder-GRPO achieves up to 17.6% relative improvement over strong baselines on LiveCodeBench v6, with more pronounced gains under extended context evaluation. Additionally, we release MicroCoder-Dataset, a more challenging training corpus that a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Reinforcement learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.07236","title":"HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing","url":"https://huggingface.co/papers/2603.07236","published":"2026-03-07","authors":["Tencent/Hunyuan"],"abstract":"Foundation models are transitioning from offline predictors to deployed systems expected to operate over long time horizons. In real deployments, objectives are not fixed: domains drift, user preferences evolve, and new tasks appear after the model has shipped. This elevates continual learning and instant personalization from optional features to core architectural requirements. Yet most adaptation pipelines still follow a static weight paradigm: after training (or after any adaptation step), inference executes a single parameter vector regardless of user intent, domain, or instance-specific constraints. This treats the trained or adapted model as a single point in parameter space. 
In heterogeneous and continually evolving regimes, distinct objectives can induce separated feasible regions over parameters, forcing any single shared update into compromise, interference, or overspecializati...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","personalization","memory"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"arxiv:2603.07300","title":"AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery","url":"https://huggingface.co/papers/2603.07300","published":"2026-03-07","authors":["Nilesh Jain","Rohit Yadav","Sagar Kotian","Claude AI"],"abstract":"We present AutoResearch-RL, a framework in which a reinforcement learning agent conducts open-ended neural architecture and hyperparameter research without human supervision, running perpetually until a termination oracle signals convergence or resource exhaustion. At each step the agent proposes a code modification to a target training script, executes it under a fixed wall clock time budget, observes a scalar reward derived from validation bits-per-byte (val-bpb), and updates its policy via Proximal Policy Optimisation (PPO). 
The key design insight is the separation of three concerns: (i) a frozen environment (data pipeline, evaluation protocol, and constants) that guarantees fair cross-experiment comparison; (ii) a mutable target file (train.py) that represents the agent's editable state; and (iii) a meta-learner (the RL agent itself) that accumulates a growing trajectory of experimen...","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:tencent:2603.06569","title":"Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders","url":"https://huggingface.co/papers/2603.06569","published":"2026-03-06","authors":["Tencent/Hunyuan"],"abstract":"Vision Language Model (VLM) development has largely relied on scaling model size, which hinders deployment on compute-constrained mobile and edge devices such as smartphones and robots. In this work, we explore the performance limits of compact (e.g., 2B and 8B) VLMs. We challenge the prevailing practice that state-of-the-art VLMs must rely on vision encoders initialized via massive contrastive pretraining (e.g., CLIP/SigLIP). We identify an objective mismatch: contrastive learning, optimized for discrimination, enforces coarse and category-level invariances that suppress fine-grained visual cues needed for dense captioning and complex VLM reasoning. To address this issue, we present Penguin-VL, whose vision encoder is initialized from a text-only LLM. 
Our experiments reveal that Penguin-Encoder serves as a superior alternative to traditional contrastive pretraining, unlocking a higher d...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"https://openalex.org/W7134291381","cited_by_count":0,"quality_score":72,"matched_keywords":["HuggingFace org papers","tencent","LLM","language model","efficient"],"author_affiliations":["Tencent/Hunyuan","Tencent (China)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lumina-llm-guided-gpu-architecture-exploration-via-bottleneck-analysis","title":"LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis","url":"https://www.microsoft.com/en-us/research/publication/lumina-llm-guided-gpu-architecture-exploration-via-bottleneck-analysis/","published":"2026-03-06","authors":["Tao Zhang","Rui Ma","Shuotao Xu","Yongqiang Xiong","Peng Cheng"],"abstract":"GPU design space exploration (DSE) for modern AI workloads, such as Large-Language Model (LLM) inference, is challenging because of GPUs'vast, multi-modal design spaces, high simulation costs, and complex design optimization objectives (e.g. performance, power and area trade-offs). Existing automated DSE methods are often prohibitively expensive, either requiring an excessive number of exploration samples or depending on intricate, manually crafted analyses of interdependent critical paths guided by human heuristics. We present LUMINA, an LLM-driven GPU architecture exploration framework that leverage AI to enhance the DSE efficiency and efficacy for GPUs. 
LUMINA extracts architectural knowledge from simulator code and performs sensitivity studies to automatically compose DSE rules,which are auto-corrected during exploration. A core component of LUMINA is a DSE Benchmark that comprehensi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Hardware and devices","Computer science","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lost-in-stories-consistency-bugs-in-long-story-generation-by-llms","title":"Lost in Stories: Consistency Bugs in Long Story Generation by LLMs","url":"https://www.microsoft.com/en-us/research/publication/lost-in-stories-consistency-bugs-in-long-story-generation-by-llms/","published":"2026-03-06","authors":["Junjie Li","Xinru Guo","Yuhao Wu","Roy Ka-Wei Lee","Hongzhi Li","Yutao Xie"],"abstract":"What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generate narratives spanning tens of thousands of words, but they often fail to maintain consistency throughout. When generating long-form narratives, these models can contradict their own established facts, character traits, and world rules. Existing story generation benchmarks focus mainly on plot quality and fluency, leaving consistency errors largely unexplored. To address this gap, we present ConStory-Bench, a benchmark designed to evaluate narrative consistency in long-form story generation. 
It contains 2,000 prompts across four task scenarios and defines a taxonomy of five error categories with 19 fine-grained subtypes. We also develop ConStory-Checker, an automated pipeline that detects contradictions and grounds each judgment in explicit textual evidence. Evaluating a range of LLMs throug...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:has76o67cc3p4urvit2tq4v0","title":"GenCtrl -- A Formal Controllability Toolkit for Generative Models","url":"https://machinelearning.apple.com/research/genctrl","published":"2026-03-06","authors":["Emily Cheng","Carmen Amo Alonso","Federico Danieli","Arno Blaas","Luca Zappella","Pau Rodríguez","Xavier Suau"],"abstract":"As generative models become ubiquitous, there is a critical need for fine-grained control over the generation process. Yet, while controlled generation methods from prompting to fine-tuning proliferate, a fundamental question remains unanswered: are these models truly controllable in the first place? In this work, we provide a theoretical framework to formally answer this question. 
Framing human-model interaction as a control process, we propose...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7155387028","title":"Evaluating the Effect of Retrieval Augmentation on Social Biases","url":"https://doi.org/10.48448/bra2-2d08","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Danushka Bollegala","Tianhui Zhang","Yi Zhou"],"abstract":"Retrieval Augmented Generation (RAG) is a popular method for injecting up-to-date information into Large Language Model (LLM)-based Natural Language Generation (NLG) systems. While RAG can enhance factual accuracy, its effect on the social biases inherent in LLMs is not well understood. This paper systematically investigates how RAG modulates social biases across three languages (English, Japanese, and Chinese) and four categories (gender, race, age, and religion). By evaluating various generator LLMs on the BBQ benchmark, we analyse how document collections with controlled stereotypical content affect RAG outputs. We find that biases present in the retrieved documents are often significantly amplified in the generated texts, even when the base LLM itself has a low-level of intrinsic bias. 
These findings raise concerns about the social fairness of RAG systems, underscoring the urgent nee...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/bra2-2d08","openalex_id":"https://openalex.org/W7155387028","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Amazon (United States)","Cardiff University","University of Liverpool"],"concepts":[{"id":"https://openalex.org/C2776035688","display_name":"Affect (linguistics)","score":0.5910999774932861},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.5349000096321106},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.5072000026702881},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.4916999936103821},{"id":"https://openalex.org/C2776187449","display_name":"Natural language generation","score":0.4449000060558319},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.41600000858306885},{"id":"https://openalex.org/C88629717","display_name":"Information bias","score":0.40790000557899475},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.3424000144004822}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155383689","title":"CASE – Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement","url":"https://doi.org/10.48448/w2tf-0t96","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Danushka Bollegala","Gaifan Zhang","Yi Zhou"],"abstract":"The meaning conveyed by a sentence often depends on the context in which it appears. 
Despite the progress of sentence embedding methods, it remains unclear as how to best modify a sentence embedding conditioned on its context. To address this problem, we propose Condition-Aware Sentence Embeddings (CASE), an efficient and accurate method to create an embedding for a sentence under a given condition. First, CASE creates an embedding for the condition using an Large Language Model (LLM) encoder, where the sentence influences the attention scores computed for the tokens in the condition during pooling. Next, a supervised method is learnt to align the LLM-based text embeddings with the Conditional Semantic Textual Similarity (C-STS) task. We find that subtracting the condition embedding will consistently improve the C-STS performance of LLM-based text embeddings and improve the isotropy of t...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/w2tf-0t96","openalex_id":"https://openalex.org/W7155383689","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Amazon (United States)","Cardiff University","University of Liverpool"],"concepts":[{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.8623999953269958},{"id":"https://openalex.org/C2777530160","display_name":"Sentence","score":0.8216999769210815},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.64410001039505},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6302000284194946},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5839999914169312},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5781000256538391},{"id":"https://openalex.org/C103278499","display_name":"Similarity 
(geometry)","score":0.5557000041007996},{"id":"https://openalex.org/C2780876879","display_name":"Meaning (existential)","score":0.5343999862670898}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155390040","title":"Tokenizer-Aware Cross-Lingual Adaptation of Decoder-Only LLMs through Embedding Relearning and Swapping","url":"https://doi.org/10.48448/ht3x-z174","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Grace Chung","Trevor Cohn","Fan Jiang","Honglin Yu"],"abstract":"Extending Large Language Models (LLMs) to new languages is challenging, with most methods proposed suffering from high computational cost and catastrophic forgetting of original model capabilities. Embedding relearning~\\citep{artetxe-etal-2020-cross}, a technique that creates new tokenizers and tunes embeddings on fixed model weights for target language adaptation, is both light-weight and performant. However, it has only been shown to work for older generation encoder-only models and for high resource languages. In this paper, we extend this framework to decoder-only LLMs focusing on joint adaptation to many languages, including low-resource ones. We experiment in three language groups over 100 languages each. We adapt a pre-trained LLM via switching to a customized tokenizer, and relearning the embedding layer. 
Across 96 diverse languages spanning both classification and generation tas...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/ht3x-z174","openalex_id":"https://openalex.org/W7155390040","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Google (United States)","The University of Melbourne"],"concepts":[{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7907000184059143},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7148000001907349},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.6902999877929688},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.6478999853134155},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5544999837875366},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.5238000154495239},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49149999022483826},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3882000148296356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155369956","title":"Learning to Ideate for Machine Learning Engineering Agents","url":"https://doi.org/10.48448/y5jn-x988","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Lin Cheong","Haibo Ding","Kiran Ramnath","Sangmin Woo","Zhichao Xu","Yunxiang Zhang","Kang Zhou","Yun Zhou"],"abstract":"Existing machine learning engineering (MLE) agents struggle to iteratively optimize their implemented algorithms for effectiveness. 
To address this, we introduce MLE-Ideator, a dual-agent framework that separates ideation from implementation. In our system, an implementation agent can request strategic help from a dedicated Ideator. We show this approach is effective in two ways. First, in a prompting-based setup, our framework significantly outperforms implementation-only agent baselines on MLE-Bench. Second, we demonstrate that the Ideator can be trained with reinforcement learning (RL) to generate more effective ideas. With only 1K training samples from 10 MLE tasks, our RL-trained Qwen3-8B Ideator achieves an 11.5% relative improvement compared to its prompted counterpart and surpasses Claude Sonnet 3.5. These results highlight that enabling LLM agents to learn to ideate offers a pro...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/y5jn-x988","openalex_id":"https://openalex.org/W7155369956","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","University of Illinois Urbana-Champaign","University of Michigan"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7229999899864197},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7159000039100647},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6773999929428101},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6559000015258789},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.41119998693466187},{"id":"https://openalex.org/C77967617","display_name":"Active learning (machine learning)","score":0.37220001220703125},{"id":"https://openalex.org/C2778827112","display_name":"Feature 
engineering","score":0.34880000352859497},{"id":"https://openalex.org/C170477896","display_name":"Ideation","score":0.3305000066757202}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155368512","title":"ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders","url":"https://doi.org/10.48448/vghp-t262","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Krisztian Balog","Craig Boutilier","Avi Caciularu","Amir Globerson","Sally Goldman","Jihwan Jeong","Ofer Meshi","Guy Tennenholtz"],"abstract":"The promise of LLM-based user simulators to improve conversational AI is hindered by a critical \"realism gap,\" leading to systems that are optimized for simulated interactions, but may fail in the real world. We introduce ConvApparel, a new dataset of human-AI conversations designed to better understand this gap. Its unique dual-agent data collection protocol, using both \"good\" and \"bad\" recommenders, enables counterfactual validation by capturing a wide spectrum of user experiences, enriched with first-person annotations of user satisfaction. We propose a comprehensive validation framework that combines statistical alignment, a human-likeness score, and counterfactual validation to test for generalization. Our experiments reveal a significant realism gap across all simulators. 
However, the framework also shows that data-driven methods consistently outperform a prompted baseline, particu...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/vghp-t262","openalex_id":"https://openalex.org/W7155368512","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Google (Israel)","Google (United States)","Tel Aviv University","University of Stavanger"],"concepts":[{"id":"https://openalex.org/C108650721","display_name":"Counterfactual thinking","score":0.9232000112533569},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7854999899864197},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7425000071525574},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.44269999861717224},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4327999949455261},{"id":"https://openalex.org/C2781170535","display_name":"Noisy data","score":0.4162999987602234},{"id":"https://openalex.org/C3019813237","display_name":"Model validation","score":0.40709999203681946},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3898000121116638}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155378750","title":"Attribute-Controlled Translation with Preference Optimization","url":"https://doi.org/10.48448/4stj-5202","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Vimal Bhat","Iñigo Jauregi Unanue","Zhu Liu","Massimo Piccardi","Najmeh Sadoughi"],"abstract":"Attribute-controlled translation (ACT) seeks to produce translations that satisfy 
specific constraints on linguistic and stylistic attributes. While careful prompt engineering can enable large language models to perform strongly in this task, its effectiveness is mainly limited to models of very large size. For this reason, in this paper we set to improve the performance of language models of more contained size by leveraging the contrastive nature of ACT tasks with preference optimization, as well as exploiting knowledge distillation with synthetically-generated training samples from larger models. As a resource for this investigation, we also introduce PREF-FAME-MT, a large, contrastive, formality-controlled parallel corpus which has been generated by expanding the existing FAME-MT dataset with synthetic contrastive samples. Experiments conducted over three datasets for formality- and....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/4stj-5202","openalex_id":"https://openalex.org/W7155378750","cited_by_count":0,"quality_score":45,"matched_keywords":["preference","distillation"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","University of Technology Sydney"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7682999968528748},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6107000112533569},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6040999889373779},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.5784000158309937},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5763999819755554},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5719000101089478},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract 
data type)","score":0.5480999946594238},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5250999927520752}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155363778","title":"Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement","url":"https://doi.org/10.48448/52z1-wf35","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Rama Akkiraju","Lu An","Ryan Angilly","Abhinav Balasubramanian","David Farris","Sidney Knowles","Meenakshi Madugula","Santiago Pombo","Jiaxiang Ren","Aaditya Shukla","Anbang Xu"],"abstract":"Enterprise AI agents must continuously adapt to maintain accuracy, reduce latency, and remain aligned with user needs. We present a practical implementation of a data flywheel in NVInfo AI, NVIDIA's Mixture-of-Experts (MoE) Knowledge Assistant serving over 30,000 employees. By operationalizing a MAPE-driven data flywheel, we built a closed-loop system that systematically addresses failures in retrieval-augmented generation (RAG) pipelines and enables continuous learning. Over a 3-month post-deployment period, we monitored feedback and collected 495 negative samples. Analysis revealed two major failure modes: routing errors (5.25\\%) and query rephrasal errors (3.2\\%). Using NVIDIA NeMo Microservices, we implemented targeted improvements through fine-tuning. 
For routing, we replaced a Llama 3.1 70B model with a fine-tuned 8B variant, achieving 96\\% accuracy, a 10× reduction in model size,....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/52z1-wf35","openalex_id":"https://openalex.org/W7155363778","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","agent"],"author_affiliations":["Baidu (China)","Nvidia (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7918999791145325},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6047000288963318},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.5127999782562256},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.38519999384880066},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.36230000853538513},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3610999882221222},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.35839998722076416},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3416000008583069}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155397979","title":"What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects","url":"https://doi.org/10.48448/3x7r-zh41","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Shuaichen Chang","Naihao Deng","Chung-Wei Hang","Yiqun Hu","Hideo Kobayashi","Alexander Hanbo Li","Patrick Ng","Jiani Zhang","Sheng Zhang","Henghui Zhu"],"abstract":"Table modeling has progressed for decades. 
In this work, we revisit this trajectory and highlight emerging challenges in the LLM era, particularly the paradox of choice: the difficulty of attributing performance gains amid diverse base models and training sets in the context of table instruction tuning. We replicate four table LLMs by instruction-tuning three foundation models on four existing datasets, yielding 12 models. We then evaluate these models across 16 table benchmarks. Our study is the first to quantitatively disentangle the effects of training data and base model selection, revealing that base model choice plays a more dominant role than the training data itself. Generalization and reasoning remain challenging, inviting future effort on table modeling. Based on our findings, we share our thoughts on the future directions for table modeling.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/3x7r-zh41","openalex_id":"https://openalex.org/W7155397979","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","University of Michigan"],"concepts":[{"id":"https://openalex.org/C45235069","display_name":"Table (database)","score":0.8866999745368958},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6428999900817871},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6407999992370605},{"id":"https://openalex.org/C42058472","display_name":"Base (topology)","score":0.6086000204086304},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5992000102996826},{"id":"https://openalex.org/C2781162219","display_name":"Replicate","score":0.5654000043869019},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.4523000121116638},{"id":"https://openalex.org/C2776505523","display_name":"Plan 
(archaeology)","score":0.3587000072002411}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155354866","title":"We Are What We Repeatedly Do: Improving Long Context Instruction Following","url":"https://doi.org/10.48448/3se8-zc91","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Ehsan Amid","Andrew Hard","Taylor Johnson","Rajiv Mathews","Swaroop Ramaswamy","Preston K. Robinette"],"abstract":"Large language model context lengths have grown rapidly in recent years, from 512 tokens in GPT to 2M tokens in Gemini 1.5 Pro. Larger context windows enable models to condition on significantly more input tokens, leading to higher quality responses for some user prompts. However, longer contexts also pose challenges to system instruction adherence. In this work, we formalize verifiable instructions to evaluate model compliance based on clear, measurable criteria. From this criteria, we present VerIFY, a Verifiable Instruction Following Yardstick dataset designed to benchmark the compliance and accuracy of LLMs in adhering to various types of instructions across multi-turn, long-context conversations. 
From experiments with open-source models, we reveal insights into instruction-following failures in long contexts, helping to improve the reliability, safety, and precision of these models....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/3se8-zc91","openalex_id":"https://openalex.org/W7155354866","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C82261069","display_name":"Yardstick","score":0.7949000000953674},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7409999966621399},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.7299000024795532},{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.6726999878883362},{"id":"https://openalex.org/C2781460075","display_name":"Compliance (psychology)","score":0.550599992275238},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5016000270843506},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.484499990940094},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.4083000123500824}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155364617","title":"Steerable Agentic Data Generation for Deep Search with Execution Feedback","url":"https://doi.org/10.48448/fmg1-d834","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Yanfei Chen","Eunsol Choi","Rujun Han","I-Hung Hsu","Chen-Yu Lee","Tomas Pfister","Vishy Tirumalashetty","Zifeng 
Wang","Fangyuan Xu","Jun Yan"],"abstract":"Deep search, which aims to answer complicated questions requiring searching for and reasoning across multiple documents, is becoming one of the most important agentic use-cases. Yet, it is not scalable to leverage human annotations to collect deep search data due to long and complex exploration trajectories. We propose an agentic pipeline to automatically generate high-quality deep search question answer pairs for a given corpus. Our initial data generator agent is able to produce QA pairs that need multiple rounds of search based on an input document. To further improve the steer-ability of our data generation framework, we 1) add a natural language prompt to specify a level of difficulty; 2) incorporate the execution traces from a task execution agent as feedback to correct undesirable data. Our intrinsic evaluation shows our framework significantly increases the correctness and diffic...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/fmg1-d834","openalex_id":"https://openalex.org/W7155364617","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["California Southern University","Google (United States)","University of Southern California"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.832099974155426},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6956999897956848},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.6875},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6195999979972839},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6029999852180481},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.5418999791145325},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.49799999594688416},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.49720001220703125}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155359542","title":"SpARK: An Embarrassingly Simple Sparse Watermarking in LLMs with Enhanced Text Quality","url":"https://doi.org/10.48448/fyt4-m230","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Rui Chu","Khoa Doan","Duy Hoang","Yingjie Lao","Thanh Quoc Hung Le","Ping Li","Weijie Zhao"],"abstract":"With the widespread adoption of Large Language Models (LLMs), concerns about potential misuse have emerged. To this end, watermarking has been adapted to LLM, enabling a simple and effective way to detect and monitor generated text. However, while the existing methods can differentiate between watermarked and unwatermarked text with high accuracy, they often face a trade-off between the quality of the generated text and the effectiveness of the watermarking process. In this work, we present a novel type of LLM watermark, Sparse WatermARK (or SpARK), which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text. 
To demonstrate this type of watermark, we introduce two novel variants, SpARK-P and SpARK-R, which achieve sparsity by anchoring watermarked tokens to words that have specific Part-of-Speech (POS) tags and specific h...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/fyt4-m230","openalex_id":"https://openalex.org/W7155359542","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","Tufts University","VinUniversity"],"concepts":[{"id":"https://openalex.org/C150817343","display_name":"Digital watermarking","score":0.9035999774932861},{"id":"https://openalex.org/C164112704","display_name":"Watermark","score":0.8122000098228455},{"id":"https://openalex.org/C99138194","display_name":"Hash function","score":0.6906999945640564},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6840000152587891},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.524399995803833},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.49540001153945923},{"id":"https://openalex.org/C140642157","display_name":"Pseudorandom number generator","score":0.4787999987602234},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.42590001225471497}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155417354","title":"PromptPrism: A Linguistically-Inspired Taxonomy for Prompts","url":"https://doi.org/10.48448/xqea-kc44","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Lin Cheong","Haibo Ding","Sullam Jeoung","Ninad Kulkarni","Shuai Wang","Yi Zhang"],"abstract":"Prompts are the interface for 
eliciting the capabilities of large language models (LLMs). Understanding their structure and components is critical for analyzing LLM behavior and optimizing performance. However, the field lacks a comprehensive framework for systematic prompt analysis and understanding. We introduce PromptPrism, a linguistically-inspired taxonomy that enables prompt analysis across three hierarchical levels: functional structure, semantic component, and syntactic pattern. By applying linguistic concepts to prompt analysis, PromptPrism bridges traditional language understanding and modern LLM research, offering insights that purely empirical approaches might miss. We show the practical utility of PromptPrism by applying it to three applications: (1) a taxonomy-guided prompt refinement approach that automatically improves prompt quality and enhances model performance across....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/xqea-kc44","openalex_id":"https://openalex.org/W7155417354","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (Germany)","Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7893000245094299},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy (biology)","score":0.6694999933242798},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5138000249862671},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4724000096321106},{"id":"https://openalex.org/C187191949","display_name":"Profiling (computer programming)","score":0.4674000144004822},{"id":"https://openalex.org/C168167062","display_name":"Component 
(thermodynamics)","score":0.3179999887943268},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.3140999972820282},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3019999861717224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155402915","title":"LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts","url":"https://doi.org/10.48448/pqmm-j972","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Yang Li","Weikang Li","Jiahui Liang","Yang Liu","Lingyong Yan","Jiaye Yang"],"abstract":"We introduce LM-Lexicon, an innovative definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small language models are trained as domain experts, LM-Lexicon achieves substantial improvements (+7% BLEU score compared with the prior state-of-the-art model) over existing methods on five widely used benchmarks. Empirically, we demonstrate that 1) the clustering strategy enables fine-grained expert specialization with nearly 10% improvement in definition quality; 2) the semantic-aware domain-level routing mechanism achieves higher expert efficacy (+1%) than conventional token-level routing; and 3) further performance gains can be obtained through test-time compute and semantic expert scaling. 
Our work advance...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/pqmm-j972","openalex_id":"https://openalex.org/W7155402915","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Beijing Academy of Artificial Intelligence","Beijing Institute for General Artificial Intelligence"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8421000242233276},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6118999719619751},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5479000210762024},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5031999945640564},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.45899999141693115},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.45739999413490295},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.42669999599456787},{"id":"https://openalex.org/C90312973","display_name":"Semantic data model","score":0.39660000801086426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155375212","title":"Hierarchical Text Classification with LLM-Refined Taxonomies","url":"https://doi.org/10.48448/b3bx-av09","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Jonas Golde","Nicolaas Jedema","Ravikiran Krishnan","Phong Le"],"abstract":"Hierarchical text classification (HTC) depends on taxonomies that organize labels into structured hierarchies. 
However, many real-world taxonomies introduce ambiguities, such as identical leaf names under similar parent nodes, which prevent language models (LMs) from learning clear decision boundaries. In this paper, we present TaxMorph, a framework that uses large language models (LLMs) to transform entire taxonomies through operations such as renaming, merging, splitting, and reordering. Unlike prior work, our method revises the full hierarchy to better match the semantics encoded by LMs. Experiments across three HTC benchmarks show that LLM-refined taxonomies consistently outperform human-curated ones in various settings up to +2.9pp. in F1. To better understand these improvements, we compare how well LMs can assign leaf nodes to parent nodes and vice versa across human-curated and LL...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/b3bx-av09","openalex_id":"https://openalex.org/W7155375212","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","Humboldt-Universität zu Berlin"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7372000217437744},{"id":"https://openalex.org/C31170391","display_name":"Hierarchy","score":0.6194000244140625},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5934000015258789},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5784000158309937},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.5515000224113464},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5299999713897705},{"id":"https://openalex.org/C2781140086","display_name":"Confusion","score":0.4514000117778778},{"id":"https://openalex.org/C2778828372","display_name":"Distributional semantics","score":0.32589998841285706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155382408","title":"Enhancing User Safety: Context-Aware Detection of Offensive Query-Ad Pairs in Multimodal Search Advertising","url":"https://doi.org/10.48448/x1ar-yd98","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Emilio Antunez","Tanmaya Shekhar Dabral","Zhongli Ding","Danqing Fu","Hooshang Ghasemi","Abishek Krishnamoorthy","Gaurav Kumar","Rui Min","Pradyumna Narayana","Qiangjian Xi"],"abstract":"The proliferation of multi-modal online advertisements necessitates robust content moderation to ensure user safety, as offensive ad content can cause user distress and erode platform trust. This paper addresses the detection of content that becomes offensive only when a user’s search query is paired with a specific ad, a context-dependent challenge that simple moderation often misses. Key challenges include the nuanced, multi-modal nature of ads, severe data scarcity and class imbalance due to the rarity of offensive content, and the high cost of human labeling. To overcome these limitations, we introduce a novel, context-aware detection framework centered on a large-scale, Multi-modal Teacher-Student Knowledge Distillation architecture. A powerful Gemini encoder-only “teacher” model distills its knowledge into a lightweight student model suitable for low-latency deployment. 
We enhance....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/x1ar-yd98","openalex_id":"https://openalex.org/W7155382408","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C176856949","display_name":"Offensive","score":0.9700999855995178},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7132999897003174},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6238999962806702},{"id":"https://openalex.org/C109747225","display_name":"Scarcity","score":0.5156000256538391},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.420199990272522},{"id":"https://openalex.org/C93225998","display_name":"Moderation","score":0.40290001034736633},{"id":"https://openalex.org/C110875604","display_name":"The Internet","score":0.39980000257492065},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.31690001487731934}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155394467","title":"Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World","url":"https://doi.org/10.48448/8j4s-fr05","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Soheil Feizi","Rajiv Mathews","Vinu Sankar Sadasivan","Lun Wang"],"abstract":"This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. 
We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors, such as eliciting responses to wake-keywords (e.g., \"Hey Qwen\"), or triggering harmful behaviors (e.g., \"Change my calendar event\"). Subsequently, we show that playing adversarial background noise during user interaction with the ALLMs can significantly degrade the response quality. Crucially, our research illustrates the scalability of these attacks to real-world scenarios, impacting other innocent users when these adversarial noises are played through the air. Further, we discuss the transferability of the attack and potential defensive measures.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/8j4s-fr05","openalex_id":"https://openalex.org/W7155394467","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","University of Maryland, College Park"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.8579999804496765},{"id":"https://openalex.org/C61272859","display_name":"Transferability","score":0.7317000031471252},{"id":"https://openalex.org/C41065033","display_name":"Adversary","score":0.7196999788284302},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6717000007629395},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.6305000185966492},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5958999991416931},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.43860000371932983},{"id":"https://openalex.org/C38652104","display_name":"Computer 
security","score":0.38920000195503235}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155369968","title":"Scaling Cultural Resources for Improving Generative Models","url":"https://doi.org/10.48448/8m2v-dh41","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Sunipa Dev","Rutledge Chin Feman","Charu Kalia","Erin MacMurray van Liemt","Vinodkumar Prabhakaran","Hayk Stepanyan","Aishwarya Verma","Andrew Zaldivar"],"abstract":"Generative models are known to have reduced performance in different global cultural contexts and languages. While continual data updates have been known to be conducted to improve overall model performance, bolstering and evaluating this cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a multi-pronged pipeline to collect and contribute culturally salient, multilingual data. 
We posit that such data can assess the state of the global applicability of our models and thus, in turn, help identify and improve upon cross-cultural gaps.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/8m2v-dh41","openalex_id":"https://openalex.org/W7155369968","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7372999787330627},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6288999915122986},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5634999871253967},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5314000248908997},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4593000113964081},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.42489999532699585},{"id":"https://openalex.org/C100521375","display_name":"Competence (human resources)","score":0.3905999958515167},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.35109999775886536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155400119","title":"Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis","url":"https://doi.org/10.48448/1fn9-d881","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Manoj Ghuhan Arivazhagan","Vinayshekhar Bannihatti Kumar","Rashmi Gangadharaiah","Disha Makhija"],"abstract":"Membership inference attacks (MIAs) reveal 
whether specific data was used to train machine learning models, serving as important tools for privacy auditing and compliance assessment. Recent studies have reported that MIAs perform only marginally better than random guessing against large language models, suggesting that modern pre-training approaches with massive datasets may be free from privacy leakage risks. Our work offers a complementary perspective to these findings by exploring how examining LLMs' internal representations, rather than just their outputs, may provide additional insights into potential membership inference signals. Our framework, \\emph{memTrace}, follows what we call \\enquote{neural breadcrumbs} extracting informative signals from transformer hidden states and attention patterns as they process candidate sequences. By analyzing layer-wise representation dynamics, att...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/1fn9-d881","openalex_id":"https://openalex.org/W7155400119","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7146999835968018},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6624000072479248},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6032999753952026},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5857999920845032},{"id":"https://openalex.org/C169258074","display_name":"Random forest","score":0.5181999802589417},{"id":"https://openalex.org/C30038468","display_name":"Memorization","score":0.5067999958992004},{"id":"https://openalex.org/C2776359362","display_name":"Representation 
(politics)","score":0.46299999952316284},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.4138000011444092}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155402721","title":"How Many Ratings per Item are Necessary for Reliable Significance Testing?","url":"https://doi.org/10.48448/qkp2-my09","published":"2026-03-06","authors":["Association for Computational Linguistics 2026","Christopher M. Homan","Flip Korn","Deepak Pandita","Chris Welty"],"abstract":"A cornerstone of machine learning evaluation is the (often hidden) assumption that model and human responses are reliable enough to evaluate models against unitary, authoritative, ``gold standard'' data, via simple metrics such as accuracy, precision, and recall. The generative AI revolution would seem to explode this assumption, given the critical role stochastic inference plays. Yet, in spite of public demand for more transparency in AI---along with strong evidence that humans are unreliable judges---estimates of model reliability are conventionally based on, at most, a few output responses per input item. We adapt a method, previously used to evaluate the reliability of various metrics and estimators for machine learning evaluation, to determine whether an (existing or planned) dataset has enough responses per item to assure reliable null hypothesis statistical testing. 
We show that,....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/qkp2-my09","openalex_id":"https://openalex.org/W7155402721","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Rochester Institute of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6262000203132629},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.546500027179718},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.541700005531311},{"id":"https://openalex.org/C185429906","display_name":"Estimator","score":0.5328999757766724},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5156999826431274},{"id":"https://openalex.org/C87007009","display_name":"Statistical hypothesis testing","score":0.4636000096797943},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.45239999890327454},{"id":"https://openalex.org/C191988596","display_name":"Null hypothesis","score":0.4343000054359436}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134060225","title":"A Novel Approach to Overseeing the Clinical Application of Generative AI","url":"https://doi.org/10.1001/jamahealthforum.2025.6947","published":"2026-03-06","authors":["Bakul Patel","David Blumenthal"],"abstract":"This Viewpoint discusses current regulatory frameworks governing the use of generative artificial intelligence (AI) in health care and proposes an alternative oversight 
system.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1001/jamahealthforum.2025.6947","openalex_id":"https://openalex.org/W7134060225","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Harvard University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5238999724388123},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38989999890327454},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.36250001192092896},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2976999878883362},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.2727000117301941},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.2727000117301941},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.27090001106262207},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.258899986743927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sparse-bitnet-1-58-bit-llms-are-naturally-friendly-to-semi-structured-sparsity","title":"Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity","url":"https://www.microsoft.com/en-us/research/publication/sparse-bitnet-1-58-bit-llms-are-naturally-friendly-to-semi-structured-sparsity/","published":"2026-03-05","authors":["Di Zhang","Xun Wu","Shaohan Huang","Yudong Wang","Hanyong Shao","Yingbo Hao","Zewen Chi","Li Dong","Ting Song","Yan 
Xia","Zhifang Sui","Furu Wei"],"abstract":"Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propose Sparse-BitNet, a unified framework that jointly applies 1.58-bit quantization and dynamic N:M sparsification while ensuring stable training for the first time. Across multiple model scales and training regimes (sparse pretraining and dense-to-sparse schedules), 1.58-bit BitNet consistently exhibits smaller performance degradation than full-precision baselines at the same sparsity levels and can tolerate higher structured sparsity before accuracy collapse. Moreover, using our custom sparse ten...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","large language models","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/slidesparse-fast-and-flexible-2n-22n-structured-sparsity","title":"SlideSparse: Fast and Flexible (2N-2):2N Structured 
Sparsity","url":"https://www.microsoft.com/en-us/research/publication/slidesparse-fast-and-flexible-2n-22n-structured-sparsity/","published":"2026-03-05","authors":["Hanyong Shao","Yingbo Hao","Ting Song","Yan Xia","Di Zhang","Shaohan Huang","Xun Wu","Songcheng Xu","Le Xu","Li Dong","Zewen Chi","Yinxue Zou"],"abstract":"NVIDIA's 2:4 Sparse Tensor Cores deliver 2x throughput but demand strict 50% pruning -- a ratio that collapses LLM reasoning accuracy (Qwen3: 54% to 15%). Milder $(2N-2):2N$ patterns (e.g., 6:8, 25% pruning) preserve accuracy yet receive no hardware support, falling back to dense execution without any benefit from sparsity. We present SlideSparse, the first system to unlock Sparse Tensor Core acceleration for the $(2N-2):2N$ model family on commodity GPUs. Our Sliding Window Decomposition reconstructs any $(2N-2):2N$ weight block into $N-1$ overlapping 2:4-compliant windows without any accuracy loss; Activation Lifting fuses the corresponding activation rearrangement into per-token quantization at near-zero cost. 
Integrated into vLLM, SlideSparse is evaluated across various GPUs (A100, H100, B200, RTX 4090, RTX 5080, DGX-spark), precisions (FP4, INT8, FP8, BF16, FP16), and model families...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","Machine learning","LLM","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/repolaunch-automating-buildtest-pipeline-of-code-repositories-on-any-language-and-any-platform","title":"RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform","url":"https://www.microsoft.com/en-us/research/publication/repolaunch-automating-buildtest-pipeline-of-code-repositories-on-any-language-and-any-platform/","published":"2026-03-05","authors":["Kenan Li","Rongzhi Li","Linghao Zhang","Qirui Jin","Liao Zhu","Xiaosong Huang","Geng Zhang","Yikai Zhang","Shilin He","Chengxing Xie","Xin Zhang","Zijian Jin"],"abstract":"Building software repositories typically requires significant manual effort. Recent advances in large language model (LLM) agents have accelerated automation in software engineering (SWE). We introduce RepoLaunch, the first agent capable of automatically resolving dependencies, compiling source code, and extracting test results for repositories across arbitrary programming languages and operating systems. 
To demonstrate its utility, we further propose a fully automated pipeline for SWE dataset creation, where task design is the only human intervention. RepoLaunch automates the remaining steps, enabling scalable benchmarking and training of coding agents and LLMs. Notably, several works on agentic benchmarking and training have recently adopted RepoLaunch for automated task generation.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","LLM","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-agentic-capabilities-not-context-efficient-reinforcement-finetuning-for-large-toolspaces","title":"Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces","url":"https://www.microsoft.com/en-us/research/publication/scaling-agentic-capabilities-not-context-efficient-reinforcement-finetuning-for-large-toolspaces/","published":"2026-03-05","authors":["Karan Gupta","Pranav Vajreshwari","Yash Pandya","Raghav Magazine","Akshay Nambi","Ahmed Awadallah"],"abstract":"Agentic systems operating over large tool ecosystems must plan and execute long-horizon workflows under weak or non-verifiable supervision. 
While frontier models mitigate these challenges through scale and large context budgets, small language models (SLMs) remain brittle: eager tool loading saturates context, execution errors compound over time, and sparse rewards limit learning. We introduce ATLAS, a reinforcement finetuning framework that enables SLMs to operate effectively in large-scale toolspace environments by learning how to acquire context and how to execute actions. Our approach makes two key contributions. First, we treat context control and execution structure as learnable decisions, combining iterative tool loading with programmatic tool orchestration to bound context growth and stabilize long-horizon trajectories. Second, we propose rubric-based reinforcement finetuning, wh...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1416","title":"Learn Hard Problems During RL with Reference Guided Fine-tuning","url":"https://seed.bytedance.com/en/research/learn-hard-problems-during-rl-with-reference-guided-fine-tuning","published":"2026-03-05","authors":["Yangzhen Wu","Shanda Li","Zixin Wen","Xin Zhou","Ameet Talwalkar","Yiming Yang","Wenhao Huang","Tianle Cai"],"abstract":"Reinforcement learning (RL) for mathematical reasoning can suffer from reward sparsity: for challenging problems, LLM fails to sample any correct trajectories, preventing RL from receiving meaningful 
positive feedback. At the same time, there often exist human-written reference solutions along with the problem (e.g., problems from AoPS), but directly fine-tuning on these solutions offers no benefit because models often cannot imitate human proofs that lie outside their own reasoning distribution.We introduce Reference-Guided Fine-Tuning (ReGFT), a simple and effective method that utilizes human-written reference solutions to synthesize positive trajectories on hard problems and train on them before RL. For each problem, we provide the model with a partial reference solution and let it generate its own reasoning trace, ensuring the resulting trajectories remain in the model's reasoning sp...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine Learning","LLM","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:9361c62d8440ca56","title":"GPT-5.4 Thinking System Card","url":"https://openai.com/index/gpt-5-4-thinking-system-card","published":"2026-03-05","authors":["OpenAI"],"abstract":"","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"arxiv:2603.04800","title":"MASQuant: Modality-Aware 
Smoothing Quantization for Multimodal Large Language Models","url":"http://arxiv.org/abs/2603.04800","published":"2026-03-05","authors":["Lulu Hu","Wenhu Xiao","Xin Chen","Xinhua Xu","Bowen Xu","Kun Li","Yongliang Tao"],"abstract":"Post-training quantization (PTQ) with computational invariance for Large Language Models~(LLMs) have demonstrated remarkable advances, however, their application to Multimodal Large Language Models~(MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. 
MASQuant demonstrates st...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7134094133","cited_by_count":0,"quality_score":41,"matched_keywords":["quantization"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C3770464","display_name":"Smoothing","score":0.9355999827384949},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.7971000075340271},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6579999923706055},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5292999744415283},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.45320001244544983},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43050000071525574},{"id":"https://openalex.org/C179799912","display_name":"Computational complexity theory","score":0.4189000129699707},{"id":"https://openalex.org/C199833920","display_name":"Vector quantization","score":0.4106000065803528}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:Qwen:2603.03825","title":"From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal 
Reasoning","url":"https://huggingface.co/papers/2603.03825","published":"2026-03-04","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"openalex:W7133527361","title":"Merlin: a computed tomography vision–language foundation model and dataset","url":"https://doi.org/10.1038/s41586-026-10181-8","published":"2026-03-04","authors":["Louis Blankemeier","Ashwin Kumar","Joseph Paul Cohen","Jiaming Liu","Longchao Liu","Dave Van Veen","Syed Jamal Safdar Gardezi","Hongkun Yu","Magdalini Paschali","Zhihong Chen","Jean-Benoit Delbrouck","Eduardo Reis"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41586-026-10181-8","openalex_id":"https://openalex.org/W7133527361","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Artificial Intelligence in Medicine (Canada)","Cardiovascular Institute of the South","Chang Gung Memorial Hospital","Google (United States)","Hospital Israelita Albert Einstein","Stanford Medicine","Stanford University","University Hospital of Zurich","University of California, Berkeley","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C544519230","display_name":"Computed tomography","score":0.6966999769210815},{"id":"https://openalex.org/C2780966255","display_name":"Foundation 
(evidence)","score":0.5493999719619751},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5200999975204468},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.367000013589859},{"id":"https://openalex.org/C163716698","display_name":"Tomography","score":0.34450000524520874},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.2863999903202057},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.2736999988555908},{"id":"https://openalex.org/C40706702","display_name":"Computed tomography laser mammography","score":0.2574000060558319}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W7147281985","title":"Hierarchical Graph Vision Transformers with Multimodal Contrastive Fusion for Semantically Consistent Biomedical Image Analysis","url":"https://doi.org/10.1109/iciss67859.2026.11453748","published":"2026-03-04","authors":["Nithin Reddy Kumbham","Raghavendra Reddy","Amulya Mallepalli","Venkata Gopi Siva Sai Nallapati","Vineetha Batchu"],"abstract":"Biomedical image analysis is a cornerstone of modern healthcare, enabling early disease detection, treatment planning, and precision diagnostics. However, traditional deep learning methods — including convolutional neural networks and standalone transformers — often suffer from limitations such as inadequate structural understanding, poor semantic consistency, and the inability to effectively integrate multimodal biomedical information. To address these challenges, we propose HGVT-MCF (Hierarchical Graph Vision Transformer with Multimodal Contrastive Fusion), a novel framework that unifies graph-based spatial reasoning, transformer-driven global attention, and contrastive multimodal fusion to deliver semantically consistent and biologically meaningful image analysis. 
The proposed architecture models complex topological relationships in medical images through hierarchical graph constructi...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iciss67859.2026.11453748","openalex_id":"https://openalex.org/W7147281985","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","Amazon (United States)","Dunwoody College of Technology","Logan University","Short and Associates (United States)"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.7421000003814697},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.732200026512146},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6833000183105469},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.5099999904632568},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.4580000042915344},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.453900009393692},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.45320001244544983},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.4462999999523163}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7147437413","title":"Contrastive Multimodal Graph Vision Transformers for Context-Aware Visual Representation and Relational Reasoning","url":"https://doi.org/10.1109/iciss67859.2026.11453927","published":"2026-03-04","authors":["Vineetha Batchu","Nithin Reddy Kumbham","Raghavendra Reddy","Amulya Mallepalli","Venkata Gopi Siva Sai Nallapati"],"abstract":"Vision 
transformers have recently advanced fast vision recognition operations, yet have low capacity to capture relational dependency and contextual semantics which impedes higher level reasoning. In the quest to overcome these shortcomings, this paper introduces CMG-ViT (Contrastive Multimodal Graph Vision Transformer) - a single framework that integrates the transformer-based learning of global features, graph-based relational reasoning, and contrastive multimodal alignment to the context-based visual understanding. This is done by using the input images and encoding them into patch-level representations and constructing a scene graph whereby objects, properties and binary relationships are represented. The heterogeneous representations are then added with the assistance of a graphconscience self-attention model, and it is fed into the transformer with topological priors to become supe...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iciss67859.2026.11453927","openalex_id":"https://openalex.org/W7147437413","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","EY Technologies (United States)","Innovative Technology Applications (United States)","Short and Associates (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6159999966621399},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6072999835014343},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5105999708175659},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.43970000743865967},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.3840000033378601},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3758000135421753},{"id":"https://openalex.org/C177877439","display_name":"Statistical relational learning","score":0.37459999322891235},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3743000030517578}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.03975","title":"Phi-4-reasoning-vision-15B Technical Report","url":"https://huggingface.co/papers/2603.03975","published":"2026-03-04","authors":["Jyoti Aneja","Michael Harrison","Neel Joshi","Tyler LaBonte","John Langford","Eduardo Salinas"],"abstract":"We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed its development. Our goal is to contribute practical insight to the research community on building smaller, efficient multimodal reasoning models and to share the result of these learnings as an open-weight model that is good at common vision and language tasks and excels at scientific and mathematical reasoning and understanding user interfaces. Our contributions include demonstrating that careful architecture choices and rigorous data curation enable smaller, open-weight multimodal models to achieve competitive performance with significantly less training and inference-time compute and tokens. 
The most substantial improvements come from systematic filtering, error correction, and synthetic augmentation -- reinforcin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/contextualized-privacy-defense-for-llm-agents","title":"Contextualized Privacy Defense for LLM Agents","url":"https://www.microsoft.com/en-us/research/publication/contextualized-privacy-defense-for-llm-agents/","published":"2026-03-03","authors":["Yule Wen","Yanzhe Zhang","Jianxun Lian","Xiaoyuan Yi","Xing Xie","Diyi Yang"],"abstract":"LLM agents increasingly act on users'personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, such as prompting and guarding. These paradigms are insufficient for supporting contextual, proactive privacy decisions in multi-step agent execution. We propose Contextualized Defense Instructing (CDI), a new privacy defense paradigm in which an instructor model generates step-specific, context-aware privacy guidance during execution, proactively shaping actions rather than merely constraining or vetoing them. Crucially, CDI is paired with an experience-driven optimization framework that trains the instructor via reinforcement learning (RL), where we convert failure trajectories with privacy violations into learning environments. 
We formalize baseline defenses and CDI as distinct intervention po...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Security, privacy, and cryptography","Computer science","large language models","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/trade-offs-in-ensembling-merging-and-routing-among-parameter-efficient-experts","title":"Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts","url":"https://www.microsoft.com/en-us/research/publication/trade-offs-in-ensembling-merging-and-routing-among-parameter-efficient-experts/","published":"2026-03-03","authors":["Sanae Lotfi","Lucas Caccia","Alessandro Sordoni","Jordan Ash","Miroslav Dudík"],"abstract":"While large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks, their performance on individual tasks depends on the fine-tuning strategy. Fusing independently trained models with different strengths has shown promise for multi-task learning through three main strategies: ensembling, which combines outputs from independent models; merging, which fuses model weights via parameter averaging; and routing, which integrates models in an input-dependent fashion. However, many design decisions in these approaches remain understudied, and the relative benefits of more sophisticated ensembling, merging and routing techniques are not fully understood. 
We empirically evaluate their trade-offs, addressing two key questions: What are the advantages of going beyond uniform ensembling or merging? And does the flexibility of routing justify its co...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","Machine learning","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learning-when-to-act-or-refuse-guarding-agentic-reasoning-models-for-safe-multi-step-tool-use","title":"Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use","url":"https://www.microsoft.com/en-us/research/publication/learning-when-to-act-or-refuse-guarding-agentic-reasoning-models-for-safe-multi-step-tool-use/","published":"2026-03-03","authors":["Aradhye Agarwal","Gurdit Siyan","Yash Pandya","Joykirat Singh","Akshay Nambi","Ahmed Awadallah"],"abstract":"Agentic language models operate in a fundamentally different safety regime than chat models: they must plan, call tools, and execute long-horizon actions where a single misstep, such as accessing files or entering credentials, can cause irreversible harm. Existing alignment methods, largely optimized for static generation and task completion, break down in these settings due to sequential decision-making, adversarial tool feedback, and overconfident intermediate reasoning. 
We introduce MOSAIC, a post-training framework that aligns agents for safe multi-step tool use by making safety decisions explicit and learnable. MOSAIC structures inference as a plan, check, then act or refuse loop, with explicit safety reasoning and refusal as first-class actions. To train without trajectory-level labels, we use preference-based reinforcement learning with pairwise trajectory comparisons, which captu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","agentic AI","Computer science","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:41417356c844f636","title":"Gemini 3.1 Flash-Lite Model Card","url":"https://deepmind.google/models/model-cards/gemini-3-1-flash-lite/","published":"2026-03-03","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 3.1 Flash-Lite"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:aee1df4ab8363f71","title":"GPT-5.3 Instant System 
Card","url":"https://openai.com/index/gpt-5-3-instant-system-card","published":"2026-03-03","authors":["OpenAI"],"abstract":"","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:majnn4yedf7v47qd6yheunma","title":"On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment","url":"https://machinelearning.apple.com/research/separating-intelligence","published":"2026-03-03","authors":["Sarah Ball","Greg Gluch","Shafi Goldwasser","Frauke Kreuter§","Omer Reingold¶","Guy N. Rothblum"],"abstract":"With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters to prevent the generation of unsafe information. Two natural points of intervention are the filtering of the input prompt before it reaches the model, and filtering the output after generation. 
Our main results demonstrate computational challenges in...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:eba8gsucdicfari92d1oebqk","title":"Learning to Reason for Hallucination Span Detection","url":"https://machinelearning.apple.com/research/hallucination-span-detection","published":"2026-03-03","authors":["Hsuan Su","Ting-Yao Hu","Hema Swetha Koppula","Kundan Krishna","Hadi Pouransari","Cheng-Yu Hsieh","Cem Koc","Joseph Yitan Cheng","Oncel Tuzel","Raviteja Vemulapalli"],"abstract":"Large language models (LLMs) often generate hallucinations -- unsupported content that undermines reliability. While most prior works frame hallucination detection as a binary task, many real-world applications require identifying hallucinated spans, which is a multi-step decision making process. This naturally raises the question of whether explicit reasoning can help the complex task of detecting hallucination spans. 
To answer this question, we...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7133523620","title":"Enhancing Search Efficiency through LLM-Based User Memory Systems for Query Matching and Intent Modeling","url":"https://doi.org/10.20944/preprints202603.0182.v1","published":"2026-03-03","authors":["Xudong Yu"],"abstract":"Traditional search engines primarily rely on keyword matching and ranking algorithms, which often fail to capture users’ implicit intents and contextual needs. This paper presents an LLM-based search framework that integrates user memory and behavioral modeling to enable proactive, context-aware retrieval. By continuously analyzing user interaction patterns such as past queries, click behavior, and temporal preferences the system builds dynamic user profiles that guide the generation of adaptive query embeddings. This approach allows the model to infer what users intend to search, rather than what they type, resulting in faster response times and significantly higher relevance in returned results. Experimental evaluations demonstrate that the proposed LLM-memory framework reduces query latency by 21.8% and improves top-1 precision by 15.6% compared to traditional retrieval systems. 
The s...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.20944/preprints202603.0182.v1","openalex_id":"https://openalex.org/W7133523620","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","memory","retrieval"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8438000082969666},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6089000105857849},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.5695000290870667},{"id":"https://openalex.org/C99016210","display_name":"Query expansion","score":0.567799985408783},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5649999976158142},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5031999945640564},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.48809999227523804},{"id":"https://openalex.org/C67712803","display_name":"User modeling","score":0.45910000801086426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7133335326","title":"\"Un-default\" Behavior Tuning: Specifying Model Behavior outside the Norm with LLM Self-Playing and Self-Improving","url":"https://doi.org/10.1145/3742413.3789119","published":"2026-03-03","authors":["Soya Park","J.D. Zamfirescu-Pereira","Chinmay Kulkarni"],"abstract":"Specifying model behavior is challenging—especially when the desired behavior is unpopular relative to the model’s training data. 
Reversing the influence of massive training corpora is both time-consuming and costly, and such interventions are typically inaccessible to end users. While Large Language Models (LLMs) make it easier to write instructions using natural language, specifying unpopular behaviors remains a difficult task.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3742413.3789119","openalex_id":"https://openalex.org/W7133335326","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Health Initiatives for Youth","International Computer Science Institute","Microsoft (United States)","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C191795146","display_name":"Norm (philosophy)","score":0.6184999942779541},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5981000065803528},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.41100001335144043},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.40689998865127563},{"id":"https://openalex.org/C2781085045","display_name":"Reversing","score":0.33970001339912415},{"id":"https://openalex.org/C27415008","display_name":"Psychological intervention","score":0.3314000070095062},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3131999969482422},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.28949999809265137}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7133295865","title":"Multi-degree-of-freedom active alignment for camera modules via a sequential method and streamlined 
setup","url":"https://doi.org/10.1364/oe.590303","published":"2026-03-03","authors":["Wenjiang Zhu","Chenwei Yang","Jiajian He","Anqin Yao","Yungui Ma"],"abstract":"The cumulative misalignments in optical modules have become a major constraint on imaging quality, making precision alignment crucial in advanced manufacturing. However, traditional passive alignment is inefficient, and existing active alignment (AA) often relies on specialized equipment such as wavefront sensors, struggling to balance accuracy, speed, and practicality. This study introduces a sequential AA method using the modulation transfer function (MTF) as the key metric. The proposed method implements a sequential process: first, compensating for lens group decenters using a sensitivity matrix derived from defocus curves; second, optimizing lens group tilt via Bayesian optimization (BO); and finally, fine-tuning the image sensor by leveraging physical information from multiple fields of view (FoVs). The experimental results demonstrate that our method achieves alignment in merely 8...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1364/oe.590303","openalex_id":"https://openalex.org/W7133295865","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Sohu (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C120665830","display_name":"Optics","score":0.6330999732017517},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5282999873161316},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.37540000677108765},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3276999890804291},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.3257000148296356},{"id":"https://openalex.org/C165838908","display_name":"Calibration","score":0.296099990606308},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.2870999872684479},{"id":"https://openalex.org/C19819980","display_name":"Beam splitter","score":0.2538999915122986}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reasoning-as-gradient-scaling-mle-agents-beyond-tree-search","title":"Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search","url":"https://www.microsoft.com/en-us/research/publication/reasoning-as-gradient-scaling-mle-agents-beyond-tree-search/","published":"2026-03-02","authors":["Yifei Zhang","Xu Yang","Xiao Yang","Bowen Xian","Qizheng Li","Shikai Fang","Jingyuan Li","Jian Wang","Mingrui Xu","Weiqing Liu","Jiang Bian"],"abstract":"LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable efficient descent over random search. We introduce textsc{Gome}, an MLE agent that operationalizes gradient-based optimization. textsc{Gome} maps structured diagnostic reasoning to gradient computation, success memory to momentum, and multi-trace execution to distributed optimization. Under a closed-world protocol that isolates architectural effects from external knowledge, textsc{Gome} achieves a state-of-the-art 35.1% any-medal rate on MLE-Bench with a restricted 12-hour budget on a single V100 GPU. 
Scaling experiments across 10 models revea...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","LLM","memory","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learning-to-draft-adaptive-speculative-decoding-with-reinforcement-learning","title":"Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/learning-to-draft-adaptive-speculative-decoding-with-reinforcement-learning/","published":"2026-03-02","authors":["Jiebin Zhang","Zhenghan Yu","Liang Wang","Nan Yang","Eugene J. Yu","Zheng Li","Yifan Song","Dawei Zhu","Xingxing Zhang","Furu Wei","Sujian Li"],"abstract":"Speculative decoding accelerates large language model (LLM) inference by using a small draft model to generate candidate tokens for a larger target model to verify. The efficacy of this technique hinges on the trade-off between the time spent on drafting candidates and verifying them. However, current state-of-the-art methods rely on a static time allocation, while recent dynamic approaches optimize for proxy metrics like acceptance length, often neglecting the true time cost and treating the drafting and verification phases in isolation. To address these limitations, we introduce Learning to Draft (LTD), a novel method that directly optimizes for throughput of each draft-and-verify cycle. 
We formulate the problem as a reinforcement learning environment and train two co-adaptive policies to dynamically coordinate the draft and verification phases. This encourages the policies to adapt to...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Reinforcement learning","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/care-towards-clinical-accountability-in-multi-modal-medical-reasoning-with-an-evidence-grounded-agentic-framework","title":"CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework","url":"https://www.microsoft.com/en-us/research/publication/care-towards-clinical-accountability-in-multi-modal-medical-reasoning-with-an-evidence-grounded-agentic-framework/","published":"2026-03-02","authors":["Yuexi Du","Jinglu Wang","Shujie Liu","N. Dvornek","Yan Lu"],"abstract":"Large visual language models (VLMs) have shown strong multi-modal medical reasoning ability, but most operate as end-to-end black boxes, diverging from clinicians'evidence-based, staged workflows and hindering clinical accountability. Complementarily, expert visual grounding models can accurately localize regions of interest (ROIs), providing explicit, reliable evidence that improves both reasoning accuracy and trust. 
In this paper, we introduce CARE, advancing Clinical Accountability in multi-modal medical Reasoning with an Evidence-grounded agentic framework. Unlike existing approaches that couple grounding and reasoning within a single generalist model, CARE decomposes the task into coordinated sub-modules to reduce shortcut learning and hallucination: a compact VLM proposes relevant medical entities; an expert entity-referring segmentation model produces pixel-level ROI evidence; and...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science","Medical Large Vision Language Models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2603.02049","title":"WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories","url":"https://huggingface.co/papers/2603.02049","published":"2026-03-02","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.01562","title":"RubricBench: Aligning Model-Generated Rubrics with Human Standards","url":"https://huggingface.co/papers/2603.01562","published":"2026-03-02","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.01571","title":"Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models","url":"https://huggingface.co/papers/2603.01571","published":"2026-03-02","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/individual-turing-test-a-case-study-of-llm-based-simulation-using-longitudinal-personal-data","title":"Individual Turing Test: A Case Study of LLM-based Simulation Using Longitudinal Personal 
Data","url":"https://www.microsoft.com/en-us/research/publication/individual-turing-test-a-case-study-of-llm-based-simulation-using-longitudinal-personal-data/","published":"2026-03-01","authors":["Minghao Guo","Ziying Ye","Wujiang Xu","Xi Zhu","Wenyue Hua","Dimitris N. Metaxas"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable human-like capabilities, yet their ability to replicate a specific individual remains under-explored. This paper presents a case study to investigate LLM-based individual simulation with a volunteer-contributed archive of private messaging history spanning over ten years. Based on the messaging data, we propose the\"Individual Turing Test\"to evaluate whether acquaintances of the volunteer can correctly identify which response in a multi-candidate pool most plausibly comes from the volunteer. We investigate prevalent LLM-based individual simulation approaches including: fine-tuning, retrieval-augmented generation (RAG), memory-based approach, and hybrid methods that integrate fine-tuning and RAG or memory. 
Empirical results show that current LLM-based simulation methods do not pass the Individual Turing Test, but they perform substa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","large language models","LLM","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autoadapt-an-automated-domain-adaptation-framework-for-llms","title":"AutoAdapt: An Automated Domain Adaptation Framework for LLMs","url":"https://www.microsoft.com/en-us/research/publication/autoadapt-an-automated-domain-adaptation-framework-for-llms/","published":"2026-03-01","authors":["Sidharth Sinha","Anson Bastos","Xuchao Zhang","Akshay Nambi","Chetan Bansal","Saravan Rajmohan"],"abstract":"Large language models (LLMs) excel in open domains but struggle in specialized settings with limited data and evolving knowledge. Existing domain adaptation practices rely heavily on manual trial-and-error processes, incur significant hyperparameter complexity, and are highly sensitive to data and user preferences, all under the high cost of LLM training. Moreover, the interactions and transferability of hyperparameter choices across models/domains remain poorly understood, making adaptation gains uncertain even with substantial effort. To solve these challenges, we present AutoAdapt, a novel end-to-end automated framework for efficient and reliable LLM domain adaptation. 
AutoAdapt leverages curated knowledge bases from literature and open-source resources to reduce expert intervention. To narrow the search space, we design a novel multi-agent debating system in which proposal and critic...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","LLM","efficient","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/online-experiential-learning-for-language-models","title":"Online Experiential Learning for Language Models","url":"https://www.microsoft.com/en-us/research/publication/online-experiential-learning-for-language-models/","published":"2026-03-01","authors":["Tianzhu Ye","Li Dong","Qingxiu Dong","Xun Wu","Shaohan Huang","Furu Wei"],"abstract":"The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. OEL operates in two stages: first, transferable experiential knowledge is extracted and accumulated from interaction trajectories collected on the user side; second, this knowledge is consolidated into model parameters via on-policy context distillation, requiring no access to the user-side environment. 
The two stages are iterated to form an online learning loop, where the improved model collects higher-quality trajectories that yield richer experiential knowledge for subsequent rounds. We evaluate OEL on text-based....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Computation and Language","Deep learning","Machine learning","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/streamwise-serving-multi-modal-generation-in-real-time-at-scale","title":"StreamWise: Serving Multi-Modal Generation in Real-Time at Scale","url":"https://www.microsoft.com/en-us/research/publication/streamwise-serving-multi-modal-generation-in-real-time-at-scale/","published":"2026-03-01","authors":["Haoran Qiu","Gohar Irfan Chaudhry","Chaojie Zhang","Íñigo Goiri","Esha Choukse","Rodrigo Fonseca","Ricardo Bianchini"],"abstract":"Advances in multi-modal generative models are enabling new applications, from storytelling to automated media synthesis. Most current workloads generate simple outputs (e.g., image generation from a prompt) in batch mode, often requiring several seconds even for basic results. 
Serving real-time multi-modal workflows at scale is costly and complex, requiring efficient coordination of diverse models (each with unique resource needs) across language, audio, image, and video, all under strict latency and resource constraints.We tackle these challenges through the lens of real-time podcast video generation, integrating LLMs, text-to-speech, and video-audio generation. To meet tight SLOs, we design an adaptive, modular serving system, StreamWise, that dynamically manages quality (e.g., resolution, sharpness), model/content parallelism, and resource-aware scheduling. We leverage heterogeneous h...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Systems and networking","media","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cinescene-implicit-3d-as-effective-scene-representation-for-cinematic-video-generation","title":"CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation","url":"https://www.microsoft.com/en-us/research/publication/cinescene-implicit-3d-as-effective-scene-representation-for-cinematic-video-generation/","published":"2026-03-01","authors":["Kaiyi Huang","Yukun Huang","Yu Li","Jianhong Bai","Xintao Wang","Zinan Lin","Xuefei Ning","Jiwen Yu","Pengfei Wan","Yu Wang","Xihui Liu"],"abstract":"Cinematic video production requires control over scene-subject composition and camera movement, but live-action shooting remains costly due 
to the need for constructing physical sets. To address this, we introduce the task of cinematic video generation with decoupled scene context: given multiple images of a static environment, the goal is to synthesize high-quality videos featuring dynamic subject while preserving the underlying scene consistency and following a user-specified camera trajectory. We present CineScene, a framework that leverages implicit 3D-aware scene representation for cinematic video generation. Our key innovation is a novel context conditioning mechanism that injects 3D-aware features in an implicit way: By encoding scene images into visual representations through VGGT, CineScene injects spatial priors into a pretrained text-to-video generation model by additional con...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer Vision and Pattern Recognition","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-at-your-fingertips-wearable-ring-as-a-low-friction-interface-for-agentic-ai","title":"AI at your Fingertips: Wearable Ring as a Low-Friction Interface for Agentic AI","url":"https://www.microsoft.com/en-us/research/publication/ai-at-your-fingertips-wearable-ring-as-a-low-friction-interface-for-agentic-ai/","published":"2026-03-01","authors":["Minghui Zhao","Judith Amores","Vaishnavi Ranganathan","Xiaofan Jiang","Bodhi Priyantha"],"abstract":"While \"Agentic AI\" can now 
execute complex, multi-step workflows, human interaction with these agents remains tethered to high-friction screens. We present a technology probe designed to explore the experience of screenless, fire-and-forget'' task delegation. Our system consists of a custom-built wearable ring with touch input and haptic feedback, paired with an agentic pipeline that autonomously recovers from failures to ensure robust execution. Through an exploratory user study (N=11) involving real-world scenarios, we identify design tensions in screenless interaction. Our findings reveal a conflict between delegation and verification: participants valued the efficiency of screenless interaction for simple tasks but lacked the confidence to delegate complex workflows without audio/visual feedback. We further highlight the social tension of public voice input, where users prefer whispe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772363.3798736","openalex_id":"https://openalex.org/W7153820868","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Hardware and devices","Human-computer interaction","HCI","1970-01-01"],"author_affiliations":["Microsoft","Columbia University","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-microsoft-northwestern-witness-benchmark-for-deepfake-detection","title":"The Microsoft-Northwestern-WITNESS Benchmark for Deepfake 
Detection","url":"https://www.microsoft.com/en-us/research/publication/the-microsoft-northwestern-witness-benchmark-for-deepfake-detection/","published":"2026-03-01","authors":["Thomas Roca","PhD","Marco Postiglione","Chongyang Gao","Isabel Gortner","Zuzanna Wojciak","Pengce Wang","Mahsa Alimardani","Shirin Anlen","Kevin White","Juan M. Lavista Ferres","Sarit Kraus"],"abstract":"We introduce the Microsoft-Northwestern-WITNESS (MNW) deepfake detection benchmark, a dataset designed to evaluate and improve artificial intelligence (AI)-generated content detection algorithms. The dataset contains more than 50,000 artifacts (images, videos, and audio files) generated by us. It also includes real-world examples of AI-manipulated or suspicious media encountered by journalists and human rights defenders globally, annotated by experts to reflect practical, high-stakes detection scenarios. The MNW dataset will be periodically updated to cover emerging generators and includes adversarial examples created with state-of-the-art attacks. This is a collaborative effort, and we encourage generative AI model developers to help maintain the dataset’s currency. This dataset is intended solely for evaluation purposes and cannot be used for training or commercial purposes. 
We recomme...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/mis.2026.3668398","openalex_id":"https://openalex.org/W7153135439","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer vision","media"],"author_affiliations":["Microsoft","Bar-Ilan University","Microsoft (United States)","Northwestern University","Services Australia","Witness"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-vision-language-models-assess-graphic-design-aesthetics-a-benchmark-evaluation-and-dataset-perspective","title":"Can Vision Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective","url":"https://www.microsoft.com/en-us/research/publication/can-vision-language-models-assess-graphic-design-aesthetics-a-benchmark-evaluation-and-dataset-perspective/","published":"2026-03-01","authors":["Arctanx An","Shizhao Sun","Danqing Huang","Mingxi Cheng","Yang Gao","Ji Li","Yu Qiao","Jiang Bian"],"abstract":"Assessing the aesthetic quality of graphic design is central to visual communication, yet remains underexplored in vision language models (VLMs). We investigate whether VLMs can evaluate design aesthetics in ways comparable to humans. Prior work faces three key limitations: benchmarks restricted to narrow principles and coarse evaluation protocols, a lack of systematic VLM comparisons, and limited training data for model improvement. 
In this work, we introduce AesEval-Bench, a comprehensive benchmark spanning four dimensions, twelve indicators, and three fully quantifiable tasks: aesthetic judgment, region selection, and precise localization. Then, we systematically evaluate proprietary, open-source, and reasoning-augmented VLMs, revealing clear performance gaps against the nuanced demands of aesthetic assessment. Moreover, we construct a training dataset to fine-tune VLMs for this domai...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","vision language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/neither-here-nor-there-cross-lingual-representation-dynamics-of-code-mixed-text-in-multilingual-encoders","title":"Neither Here Nor There: Cross-Lingual Representation Dynamics of Code-Mixed Text in Multilingual Encoders","url":"https://www.microsoft.com/en-us/research/publication/neither-here-nor-there-cross-lingual-representation-dynamics-of-code-mixed-text-in-multilingual-encoders/","published":"2026-03-01","authors":["Debajyoti Mazumder","Divyansh Pathak","Prashant Kodali","Jasabanta Patro"],"abstract":"Multilingual encoder-based language models are widely adopted for code-mixed analysis tasks, yet we know surprisingly little about how they represent code-mixed inputs internally - or whether those representations meaningfully connect to the constituent languages being mixed. 
Using Hindi-English as a case study, we construct a unified trilingual corpus of parallel English, Hindi (Devanagari), and Romanized code-mixed sentences, and probe cross-lingual representation alignment across standard multilingual encoders and their code-mixed adapted variants via CKA, token-level saliency, and entropy-based uncertainty analysis. We find that while standard models align English and Hindi well, code-mixed inputs remain loosely connected to either language - and that continued pre-training on code-mixed data improves English-code-mixed alignment at the cost of English-Hindi alignment. Interpretabili...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1411","title":"How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning","url":"https://seed.bytedance.com/en/research/how-rl-unlocks-the-aha-moment-in-geometric-interleaved-reasoning","published":"2026-03-01","authors":["Xiangxiang Zhang","Caijun Jia","Siyuan Li","Dingyu He","Xiya Xiong","Zheng Sun","Honghao He","Yuchen Wu","Bihui Yu","Linzhuang Sun","Cheng Tan","Jingxuan Wei"],"abstract":"Solving complex geometric problems inherently requires interleaved reasoning: a tight alternation between constructing diagrams and performing logical deductions. 
Although recent Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities in visual generation and plotting, we identify a counter-intuitive and underexplored phenomenon. Naively applying Supervised Fine-Tuning (SFT) on interleaved plot-solution data leads to a substantial degradation in reasoning performance compared to text-only baselines. We argue that this failure stems from a fundamental limitation of SFT, which primarily induces distributional alignment: the model learns to reproduce the surface format of interleaved plotting but fails to internalize the causal dependency between the generated plot and reasoning steps. To overcome this limitation, we propose Faire (Functional alignment for interleaved...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computation and Language","Application","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:Tencent-Hunyuan:2603.01142","title":"ArtLLM: Generating Articulated Assets via 3D LLM","url":"https://huggingface.co/papers/2603.01142","published":"2026-03-01","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org 
papers","Tencent-Hunyuan","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-pixels-to-patches-pooling-strategies-for-earth-embeddings","title":"From pixels to patches: Pooling strategies for earth embeddings","url":"https://www.microsoft.com/en-us/research/publication/from-pixels-to-patches-pooling-strategies-for-earth-embeddings/","published":"2026-03-01","authors":["Isaac Corley","Caleb Robinson","Inbal Becker-Reshef","Juan M. Lavista Ferres"],"abstract":"As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution. The default choice, mean pooling, discards within-patch variability and can reduce accuracy by more than 10% under spatial shift. To evaluate this effect, we introduce EuroSAT-Embed, a dataset of 81,000 embedding GeoTIFFs derived from three foundation models: AlphaEarth, OlmoEarth, and Tessera. We benchmark 11 training-free and 2 parametric pooling methods under both random and geographically disjoint test splits. Results show that richer pooling schemes reduce the geographic generalization gap by up to 40% relative to mean pooling and increase accuracy by up to 5% on spatial splits. 
We recommend Generalized Mean Pooling (GeM) as a drop-in r...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Unpublished","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7136945678","title":"TaiChiGPT: complex sports action generation based on large language models","url":"https://doi.org/10.1007/s00371-026-04428-8","published":"2026-03-01","authors":["Jianwei Li","Kehao Ran","Yanwen Ma","Hongwen Xie","Yihong Wu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s00371-026-04428-8","openalex_id":"https://openalex.org/W7136945678","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Sport University","Chinese Academy of Sciences","Shandong Institute of Automation","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7389000058174133},{"id":"https://openalex.org/C48007421","display_name":"Motion capture","score":0.697700023651123},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.6808000206947327},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6291999816894531},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.597599983215332},{"id":"https://openalex.org/C77660652","display_name":"Computer 
graphics","score":0.5949000120162964},{"id":"https://openalex.org/C39920418","display_name":"Kinematics","score":0.5856999754905701},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5580999851226807}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7140100285","title":"Intelligent Payment and Treasury Platforms Powered by Generative AI","url":"https://doi.org/10.48047/jocaaa.2026.35.03.29","published":"2026-03-01","authors":["Amit Chaudhary"],"abstract":"The rapid growth of real-time payments, cross-border transactions, and digitally integrated financial ecosystems has intensified the complexity of enterprise payment and treasury operations.Traditional treasury management and payment platforms, which rely on static rules and retrospective analytics, are increasingly inadequate for managing dynamic liquidity, risk, and compliance requirements.This study examines the role of generative artificial intelligence in enabling intelligent payment and treasury platforms capable of adaptive learning, predictive insight generation, and prescriptive decision support.Using a mixedmethod, design-oriented analytical framework, the research evaluates the impact of generative AI on payment efficiency, cash flow forecasting accuracy, liquidity optimization, governance effectiveness, and system resilience.Empirical results based on simulated enterprise-sca...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.48047/jocaaa.2026.35.03.29","openalex_id":"https://openalex.org/W7140100285","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United 
States)"],"concepts":[{"id":"https://openalex.org/C2780889827","display_name":"Treasury","score":0.8432000279426575},{"id":"https://openalex.org/C145097563","display_name":"Payment","score":0.5698000192642212},{"id":"https://openalex.org/C39389867","display_name":"Corporate governance","score":0.525600016117096},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47999998927116394},{"id":"https://openalex.org/C183582576","display_name":"Market liquidity","score":0.47519999742507935},{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.4580000042915344},{"id":"https://openalex.org/C190978112","display_name":"Payment processor","score":0.40369999408721924},{"id":"https://openalex.org/C75949130","display_name":"Database transaction","score":0.3903000056743622}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135102609","title":"Cloud Database Product Usage Prediction Based on Component Decomposition and Multimodal Fusion","url":"https://doaj.org/article/e0507ae5513948a582e026c2e79809b7","published":"2026-03-01","authors":["YANG Dingyu, DENG Yufeng, QIAN Shiyou, CAO Jian, XUE Guangtao"],"abstract":"Cloud database technology has been widely used because of its flexible expansion, ease of management, and on-demand charging. Businesses usually select cloud database products based on their specific application scenarios and requirements. Service providers determine the usage of different types of resources, such as computing and storage, to satisfy service requirements. Accurate prediction of cloud product usage is critical for improving resource usage efficiency, reducing operational costs, and ensuring Quality of Service (QoS). However, predicting cloud database product usage is complex. 
A usage sequence typically comprises multiple interrelated components with complex entanglements. Additionally, the behavioral characteristics of different businesses vary according to cloud products and billing items, which poses a significant challenge for accurate usage prediction. To solve this p...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.19678/j.issn.1000-3428.0069841","openalex_id":"https://openalex.org/W7135102609","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.8493000268936157},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7102000117301941},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.6642000079154968},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.5902000069618225},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5389000177383423},{"id":"https://openalex.org/C2778974508","display_name":"Cloud database","score":0.5016999840736389},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.42080000042915344},{"id":"https://openalex.org/C116537","display_name":"Service provider","score":0.3747999966144562}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.01211","title":"A Unified Framework to Quantify Cultural Intelligence of AI","url":"http://arxiv.org/abs/2603.01211","published":"2026-03-01","authors":["Sunipa Dev","Vinodkumar Prabhakaran","Rutledge Chin Feman","Aida Mostafazadeh Davani","Remi Denton","Charu Kalia","Piyawat L 
Kumjorn","Madhurima Maji","Rida Qadri","Negar Rostamzadeh","Renee Shelby","Romina Stella"],"abstract":"As generative AI technologies are increasingly being launched across the globe, assessing their competence to operate in different cultural contexts is exigently becoming a priority. While recent years have seen numerous and much-needed efforts on cultural benchmarking, these efforts have largely focused on specific aspects of culture and evaluation. While these efforts contribute to our understanding of cultural competence, a unified and systematic evaluation approach is needed for us as a field to comprehensively assess diverse cultural dimensions at scale. Drawing on measurement theory, we present a principled framework to aggregate multifaceted indicators of cultural capabilities into a unified assessment of cultural intelligence. We start by developing a working definition of culture that includes identifying core domains of culture. We then introduce a broad-purpose, systematic, an...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7133365569","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C9354725","display_name":"Operationalization","score":0.7890999913215637},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.49549999833106995},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4740000069141388},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4388999938964844},{"id":"https://openalex.org/C2777900618","display_name":"Cultural intelligence","score":0.43209999799728394},{"id":"https://openalex.org/C169087156","display_name":"Framing 
(construction)","score":0.41920000314712524},{"id":"https://openalex.org/C125209646","display_name":"Cultural diversity","score":0.41690000891685486},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.4147000014781952}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:huawei-noah:2603.22078","title":"Do World Action Models Generalize Better than VLAs? A Robustness Study","url":"https://huggingface.co/papers/2603.22078","published":"2026-03","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","huawei-noah"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"official:aa1684ae50935a11","title":"3D-GENERALIST: Vision-Language-Action Models for Crafting 3D Worlds","url":"https://research.nvidia.com/publication/2026-03_3d-generalist-vision-language-action-models-crafting-3d-worlds","published":"2026-03","authors":["Fan-Yun Sun","Shengguang Wu","Christian Jacobsen","Thomas Yim","Haoming Zou","Alex Zook","Shangru Li","Yu-Hsin Chou","Ethem Can","Xunlei Wu","Clemens Eppner","Valts Blukis"],"abstract":"Official NVIDIA Research 
publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2026&page=0"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/texterial-a-text-as-material-interaction-paradigm-for-llm-mediated-writing","title":"Texterial: A Text-as-Material Interaction Paradigm for LLM-Mediated Writing","url":"https://www.microsoft.com/en-us/research/publication/texterial-a-text-as-material-interaction-paradigm-for-llm-mediated-writing/","published":"2026-02-28","authors":["Jocelyn Shen","Nicolai Marquardt","Hugo Romat","Ken Hinckley","Nathalie Henry Riche","Fanny Chevalier"],"abstract":"We explore how materials can inspire new interactions with text. Building on insights from a formative study, we propose Texterial, a conceptual framework and prototype implementations that treat text-as-material. Our work uses materiality to foster a new way of interacting with generative AI. What if text could be sculpted and refined like clay -- or cultivated and pruned like a plant? Texterial reimagines text as a material that users can grow, sculpt, and transform. Current generative-AI models enable rich text operations, yet rigid, linear interfaces often mask such capabilities. We explore how the text-as-material metaphor can reveal AI-enabled operations, reshape the writing process, and foster compelling user experiences. 
A formative study shows that users readily reason with text-as-material, in...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","Human–computer interaction","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2603.00729","title":"Qwen3-Coder-Next Technical Report","url":"https://huggingface.co/papers/2603.00729","published":"2026-02-28","authors":["Alibaba/Qwen"],"abstract":"We present Qwen3-Coder-Next, an open-weight language model specialized for coding agents. Qwen3-Coder-Next is an 80-billion-parameter model that activates only 3 billion parameters during inference, enabling strong coding capability with efficient inference. In this work, we explore how far strong training recipes can push the capability limits of models with small parameter footprints. To achieve this, we perform agentic training through large-scale synthesis of verifiable coding tasks paired with executable environments, allowing learning directly from environment feedback via mid-training and reinforcement learning. Across agent-centric benchmarks including SWE-Bench and Terminal-Bench, Qwen3-Coder-Next achieves competitive performance relative to its active parameter count. 
We release both base and instruction-tuned open-weight versions to support research and real-world coding agent...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["HuggingFace org papers","Qwen","language model","efficient","agent"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"openalex:W7132851297","title":"Unified AI Approach Using Encoding and Generative Large Language Models for Variant Product Matching in e-Commerce","url":"https://doi.org/10.1177/2167647x261423127","published":"2026-02-28","authors":["Pedro Herrero-Vidal","You-Lin Chen","Wai‐Ching Liu","Bin Xu","Prithviraj Sen","Lichao Wang"],"abstract":"what the attributes are that vary between them. To satisfy these two requirements, we developed a strategy that leverages the strengths of both encoding and generative AI models. First, we construct a dataset that captures webpage product links, and therefore variant product relationships, to train an encoding large language model (LLM) to predict variant matches for any given pair of products. Second, we use retrieval-augmented generation-prompted generative LLMs to extract variation and common attributes amongst groups of variant products. To validate our strategy, we evaluated model performance using real data from one of the world's leading e-commerce retailers. 
The results showed that our strategy outperforms alternative solutions and paves the way to exploiting these new types of product relationships.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1177/2167647x261423127","openalex_id":"https://openalex.org/W7132851297","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7328000068664551},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.6769999861717224},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.6707000136375427},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6377999782562256},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.6101999878883362},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.5975000262260437},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5968000292778015},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5515999794006348}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7132844451","title":"Do LLMs Understand Collaborative Signals? Diagnosis and Repair","url":"https://doi.org/10.1145/3786304.3787876","published":"2026-02-28","authors":["Shahrooz Pouryousef","Ali Montazeralghaem"],"abstract":"Collaborative information from user-item interactions is a fundamental source of signal in successful recommender systems. 
Recently, researchers have attempted to incorporate this knowledge into large language model-based recommender approaches (LLMRec) to enhance their performance. However, there has been little fundamental analysis of whether LLMs can effectively reason over collaborative information. In this paper, we analyze the ability of LLMs to reason about collaborative information in recommendation tasks, comparing their performance to traditional matrix factorization (MF) models. We propose a simple and effective method to improve LLMs’ reasoning capabilities using retrieval-augmented generation (RAG) over the user-item interaction matrix with four different prompting strategies. Our results show that the LLM outperforms the MF model whenever we provide relevant information in....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786304.3787876","openalex_id":"https://openalex.org/W7132844451","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Amherst College","Google (United States)","University of Massachusetts Amherst"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6305000185966492},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.525600016117096},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5054000020027161},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.462799996137619},{"id":"https://openalex.org/C42355184","display_name":"Matrix decomposition","score":0.3846000134944916},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.35370001196861267},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.32580000162124634},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.30250000953674316}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.10936","title":"Can Instructed Retrieval Models Really Support Exploration?","url":"http://arxiv.org/abs/2601.10936","published":"2026-02-28","authors":["Piyush Maheshwari","Sheshera Mysore","Hamed Zamani"],"abstract":"Exploratory searches are characterized by under-specified goals and evolving query intents. In such scenarios, retrieval models that can capture user-specified nuances in query intent and adapt results accordingly are desirable — instruction-following retrieval models promise such a capability. In this work, we evaluate instructed retrievers for the prevalent yet under-explored application of aspect-conditional seed-guided exploration using an expert-annotated test collection. We evaluate both recent LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for ranking with the highly performant Pairwise Ranking Prompting. We find that the best instructed retrievers improve on ranking relevance compared to instruction-agnostic approaches. 
However, we also find that instruction following performance, crucial to the user experience of interacting with models, does not mirr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786304.3787888","openalex_id":"https://openalex.org/W7124700920","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Microsoft (United States)","Seattle University","University of Massachusetts Amherst"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.8877999782562256},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.8125},{"id":"https://openalex.org/C184898388","display_name":"Pairwise comparison","score":0.7098000049591064},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.685699999332428},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6011000275611877},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.4973999857902527},{"id":"https://openalex.org/C3018260909","display_name":"Exploratory analysis","score":0.4602000117301941},{"id":"https://openalex.org/C85973986","display_name":"Exploratory research","score":0.35440000891685486}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multimodal-alignment-improves-generalizability-of-genomic-biomarker-prediction-in-computational-pathology","title":"Multimodal Alignment Improves Generalizability of Genomic Biomarker Prediction in Computational 
Pathology","url":"https://www.microsoft.com/en-us/research/publication/multimodal-alignment-improves-generalizability-of-genomic-biomarker-prediction-in-computational-pathology/","published":"2026-02-27","authors":["Ekaterina Redekop","Eric Zimmermann","Ava P. Amini","Alex Lu","Neil Tenenholtz","Jimmy Hall","Lorin Crawford","Kristen Severson"],"abstract":"Computational pathology models that use digitized histopathology whole-slide images have the potential to become a cost-effective and scalable alternative to molecular assays for the prediction of genomic biomarkers, a key task in precision oncology. However, as new genomic biomarkers are discovered or quantified, large, labeled datasets must be prospectively collected to train new models. To address this challenge, we developed MARBLE, a multimodal contrastive pretraining strategy that integrates structured biomarker knowledge into representation learning of histopathology images. MARBLE aligns histopathology-derived representations with representations of genomic biomarkers generated by a large language model (LLM) and a protein language model (PLM). This biologically informed alignment enables data-efficient generalization to novel, out-of-distribution biomarkers. 
Using the MSK-IMPACT...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Biology","large language models","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/keep-a-kv-cache-centric-memory-management-system-for-efficient-embodied-planning","title":"KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning","url":"https://www.microsoft.com/en-us/research/publication/keep-a-kv-cache-centric-memory-management-system-for-efficient-embodied-planning/","published":"2026-02-27","authors":["Zebin Yang","Tong Xie","Baotong Lu","Shaoshan Liu","Bo Yu","Meng Li"],"abstract":"Memory-augmented Large Language Models (LLMs) have demonstrated remarkable capability for complex and long-horizon embodied planning. By keeping track of past experiences and environmental states, memory enables LLMs to maintain a global view, thereby avoiding repetitive exploration. However, existing approaches often store the memory as raw text, leading to excessively long prompts and high prefill latency. While it is possible to store and reuse the KV caches, the efficiency benefits are greatly undermined due to frequent KV cache updates. In this paper, we propose KEEP, a KV-cache-centric memory management system for efficient embodied planning. 
KEEP features 3 key innovations: (1) a Static-Dynamic Memory Construction algorithm that reduces KV cache recomputation by mixed-granularity memory group; (2) a Multi-hop Memory Re-computation algorithm that dynamically identifies important cr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reasoning-driven-multimodal-llm-for-domain-generalization","title":"Reasoning-Driven Multimodal LLM for Domain Generalization","url":"https://www.microsoft.com/en-us/research/publication/reasoning-driven-multimodal-llm-for-domain-generalization/","published":"2026-02-27","authors":["Zhipeng Xu","Zilong Wang","Xinyang Jiang","Dongsheng Li","De Cheng","Nannan Wang"],"abstract":"This paper addresses the domain generalization (DG) problem in deep learning. While most DG methods focus on enforcing visual feature invariance, we leverage the reasoning capability of multimodal large language models (MLLMs) and explore the potential of constructing reasoning chains that derives image categories to achieve more robust predictions under domain shift. To this end, we systematically study the role of reasoning in DG using DomainBed-Reasoning, a newly constructed extension of DomainBed dataset, in which each sample is paired with class-relevant reasoning chains. 
Our analysis reveals two key challenges: (i) fine-tuning MLLMs with reasoning chains for classification is more challenging than direct label supervision, since the model must optimize complex reasoning sequences before label prediction; and (ii) mismatches in reasoning patterns between supervision signals and fine...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Multimodal Large Language Models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/position-science-of-ai-evaluation-requires-item-level-benchmark-data","title":"Position: Science of AI Evaluation Requires Item-level Benchmark Data","url":"https://www.microsoft.com/en-us/research/publication/position-science-of-ai-evaluation-requires-item-level-benchmark-data/","published":"2026-02-27","authors":["Hang Jiang","Susu Zhang","Xiaoyuan Yi","Xing Xie","Ziang Xiao"],"abstract":"AI evaluations have become the primary evidence for deploying generative AI systems across high-stakes domains. However, current evaluation paradigms often exhibit systemic validity failures. These issues, ranging from unjustified design choices to misaligned metrics, remain intractable without a principled framework for gathering validity evidence and conducting granular diagnostic analysis. In this position paper, we argue that item-level AI benchmark data is essential for establishing a rigorous science of AI evaluation. 
Item-level analysis enables fine-grained diagnostics and principled validation of benchmarks. We substantiate this position by dissecting current validity failures and revisiting evaluation paradigms across computer science and psychometrics. Through illustrative analyses of item properties and latent constructs, we demonstrate the unique insights afforded by item-lev...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1393","title":"Does Your Reasoning Model Implicitly Know When to Stop Thinking?","url":"https://seed.bytedance.com/en/research/does-your-reasoning-model-implicitly-know-when-to-stop-thinking","published":"2026-02-27","authors":["Zixuan Huang","Xin Xia","Yuxi Ren","Jianbin Zheng","Xuanda Wang","Zhixia Zhang","Hongyan Xie","Songshi Liang","Zehao Chen","Xuefeng Xiao","Fuzhen Zhuang","Jianxin Li"],"abstract":"Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even be detrimental to accuracy. 
In a further in-depth analysis of this phenomenon, we surprisingly uncover and empirically verify that LRMs implicitly know the appropriate time to stop thinking, while this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this efficient reasoning potential. Furthermore, integrating SAGE as mixed sampling into group-based....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial Intelligence","Vision","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:1412","title":"CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation","url":"https://seed.bytedance.com/en/research/cuda-agent-large-scale-agentic-rl-for-high-performance-cuda-kernel-generation","published":"2026-02-27","authors":["Weinan Dai","Hanlin Wu","Qiying Yu","Huan-ang Gao","Jiahao Li","Chengquan Jiang","Weiqiang Lou","Yufan Song","Hongli Yu","Jiaze Chen","Wei-Ying Ma","Ya-Qin Zhang"],"abstract":"GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large language models (LLMs) remain uncompetitive with compiler-based systems such as this http URL for CUDA kernel generation. 
Existing CUDA code generation approaches either rely on training-free refinement or fine-tune models within fixed multi-turn execution-feedback loops, but both paradigms fail to fundamentally improve the model's intrinsic CUDA optimization ability, resulting in limited performance gains. We present CUDA Agent, a large-scale agentic reinforcement learning system that develops CUDA kernel expertise through three components: a scalable data synthesis pipeline, a skill-augmented CUDA development environment with automated verification and profiling to provide reliable reward signals...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine Learning","LLM","arXiv","agent"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:1420","title":"Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution","url":"https://seed.bytedance.com/en/research/steerable-instruction-following-coding-data-synthesis-with-actor-parametric-schema-co-evolution","published":"2026-02-27","authors":["Tinglin Huang","Bo Chen","Xiao Zhang","Kai Shen","Rex Ying"],"abstract":"Interpreting and following human instructions is a critical capability of large language models (LLMs) in automatic programming. However, synthesizing large-scale instruction-paired coding data remains largely unexplored and is particularly challenging when ensuring logical compatibility among multiple constraints. In this study, we propose IFCodeEvolve, an actor-schema co-evolution framework for instruction following coding data generation. 
By representing instructions as parametric function schema, we construct a library that covers the vast instruction space via dynamic constraint instantiation. Building upon this, Monte Carlo Tree Search (MCTS) sampler is applied to efficiently navigate this space, utilizing actor model feedback as a dynamic termination signal. Furthermore, to progressively explore challenging problems, we introduce a co-evolving paradigm that iteratively advances bo...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Software Engineering","LLM","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:38469d6a841a83fd","title":"Unified Vision–Language Modeling via Concept Space Alignment","url":"https://ai.meta.com/research/publications/unified-vision-language-modeling-via-concept-space-alignment/","published":"2026-02-27","authors":["Yifu Qiu","Paul-Ambroise Duquenne","Holger Schwenk"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Human & Machine Intelligence"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=1"}},{"id":"apple:oe6xy419hvqt6vy3so6crqzz","title":"Scaling Search 
Relevance: Augmenting App Store Ranking with LLM-Generated Judgments","url":"https://machinelearning.apple.com/research/augmenting-app","published":"2026-02-27","authors":["Evangelia Christakopoulou","Vivekkumar Patel","Hemanth Velaga","Sandip Gaikwad","Sean Suchter","Venkat Sundaranatha"],"abstract":"Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7131909598","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple","Apple (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7131784568","title":"A scalable framework for evaluating health language models","url":"https://doi.org/10.1038/s41746-026-02492-x","published":"2026-02-27","authors":["Neil Mallinar","A. Ali Heydari","Xin Liu","Anthony Z. Faranesh","Brent Winslow","Nova Hammerquist","Benjamin Graef","Cathy Speed","Mark Malhotra","Shwetak Patel","Javier L. Prieto","Daniel McDuff"],"abstract":"Large language models (LLMs) have emerged as powerful tools for analyzing and interpreting complex datasets. Recent studies demonstrate their potential to generate useful, personalized responses when provided with patient-specific health information that encompasses lifestyle, biomarkers, and context. 
As LLM-driven health applications are increasingly adopted, rigorous and efficient one-sided evaluation methodologies are crucial to ensure response quality across multiple dimensions, including accuracy, personalization, relevance and safety. However, current evaluation practices, particularly for open-ended text responses, heavily rely on human experts. This approach introduces human factors (perspectives, potential biases, inconsistencies) and is often cost-prohibitive, labor-intensive, and hinders scalability, especially in complex domains like healthcare where response assessment neces...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41746-026-02492-x","openalex_id":"https://openalex.org/W7131784568","cited_by_count":2,"quality_score":55,"matched_keywords":["LLM","personalized","personalization","efficient"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7577999830245972},{"id":"https://openalex.org/C111640148","display_name":"Rubric","score":0.7113000154495239},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.6184999942779541},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6032999753952026},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5371000170707703},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4706999957561493},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.42989999055862427},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4293000102043152}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7131906619","title":"Towards multi-language repository-level code generation: From-scratch to guided tasks","url":"https://doi.org/10.1016/j.neucom.2026.133204","published":"2026-02-27","authors":["Jingjing Liu","Silin Li","Zeming Liu","Zihao Cheng","Yuhang Guo","Yuanfang Guo","Yunhong Wang","Haifeng Wang"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neucom.2026.133204","openalex_id":"https://openalex.org/W7131906619","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beihang University","Beijing Institute of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8810999989509583},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.5939000248908997},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5598000288009644},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.536899983882904},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.5270000100135803},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49799999594688416},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.4927000105381012},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.4781000018119812}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131783941","title":"Towards fine-grained vision-language alignment for few-shot anomaly 
detection","url":"https://doi.org/10.1016/j.patcog.2026.113316","published":"2026-02-27","authors":["Yuanting Fan","Jun Liu","Xiaochen Chen","Bin-Bin Gao","Jian Li","Yong Liu","Jinlong Peng","Chengjie Wang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2026.113316","openalex_id":"https://openalex.org/W7131783941","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7426999807357788},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.7182999849319458},{"id":"https://openalex.org/C87619178","display_name":"Concatenation (mathematics)","score":0.6301000118255615},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6126999855041504},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5885000228881836},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5846999883651733},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5778999924659729},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5486999750137329}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:4102c05c92d75f91","title":"Gemini 3.1 Flash Image Model Card","url":"https://deepmind.google/models/model-cards/gemini-3-1-flash-image/","published":"2026-02-26","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 3.1 Flash Image"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"openalex:W7152707616","title":"Deploying Disconnected Generative AI for Naval Maintenance and Decision Superiority","url":"https://doi.org/10.48448/x019-q055","published":"2026-02-26","authors":["Armed Forces Communications and Electronics Association 2026","Kelly Jones"],"abstract":"As Naval forces increasingly operate in contested, disconnected, intermittent, and limited environments, reliance on cloud-tethered Artificial Intelligence becomes a critical vulnerability. Large Language Models and Generative AI offer clear value for predictive maintenance, technical document synthesis, and decision support, but most current architectures depend on reach-back to hyper-scale data centers. This dependency is not viable for vessels operating under EMCON or in bandwidth-denied theaters. This presentation examines the architectural and operational considerations for deploying sovereign AI at the tactical edge. The focus is the convergence of ruggedized compute and air-gapped Generative AI models running entirely within the shipboard perimeter. We introduce a reference architecture for a disconnected maintenance advisor. 
The system uses Retrieval-Augmented Generation to inges...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/x019-q055","openalex_id":"https://openalex.org/W7152707616","cited_by_count":0,"quality_score":49,"matched_keywords":["retrieval","efficient","quantization"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7103000283241272},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5349000096321106},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5080000162124634},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4887999892234802},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4853000044822693},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.42170000076293945},{"id":"https://openalex.org/C2779231336","display_name":"Sketch","score":0.4146000146865845},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.3968999981880188}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131656703","title":"Patient Experiences in the Cochlear Implant Reddit Community: Comparing Human and Large Language Model Categorization","url":"https://doi.org/10.1044/2025_aja-25-00216","published":"2026-02-26","authors":["Daniel R. S. Habib","Kiran S Depala","Jack Lin","Samuel Le","Natalie McFall","Shiv S. Dewan","Justin Huang","Michael W. S. Habib","Anthony E. Bishay","Konrad Siebor","Gizem Babaoğlu","Naweed I. 
Chowdhury"],"abstract":"PURPOSE: Although some work has leveraged automated analyses of online communities to gain cochlear implant (CI) patient insights, there remains a gap in comparing human versus automated analysis of the nuanced, real-world experiences patients share outside clinical settings. This study characterizes experiences within the r/Cochlearimplants Reddit community and compares human to large language model (LLM) performance in annotating posts. METHOD: Using reflexive thematic analysis, 996 publicly available r/Cochlearimplants posts (October 2024-June 2025) were manually coded and consolidated into themes. Three LLMs-OpenAI o3, Gemini 2.5 Pro, and Claude Sonnet 4-were prompted with the posts and human-generated codebook to perform post categorization. Model performance was evaluated against human coding using Cohen's kappa, percent agreement, sensitivity, specificity, positive predictive valu...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1044/2025_aja-25-00216","openalex_id":"https://openalex.org/W7131656703","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (United States)","Duke University","Vanderbilt University","Vanderbilt University Medical Center"],"concepts":[{"id":"https://openalex.org/C94124525","display_name":"Categorization","score":0.7414000034332275},{"id":"https://openalex.org/C2778882171","display_name":"Cochlear 
implant","score":0.6675000190734863},{"id":"https://openalex.org/C548259974","display_name":"Audiology","score":0.5462999939918518},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.527400016784668},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.37940001487731934},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.37310001254081726},{"id":"https://openalex.org/C2779473830","display_name":"MEDLINE","score":0.3490000069141388},{"id":"https://openalex.org/C3017423656","display_name":"Cochlear implantation","score":0.3474999964237213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2603.19258","title":"MAPLE: Metadata Augmented Private Language Evolution","url":"http://arxiv.org/abs/2603.19258","published":"2026-02-26","authors":["Eli Chien","Yuzheng Hu","Ryan McKenna","Shanshan Wu","Zheng Xu","Peter Kairouz"],"abstract":"While differentially private (DP) fine-tuning of large language models (LLMs) is a powerful tool, it is often computationally prohibitive or infeasible when state-of-the-art models are only accessible via proprietary APIs. In such settings, generating DP synthetic data has emerged as a crucial alternative, offering the added benefits of arbitrary reuse across downstream tasks and transparent exploratory data analysis without the opaque constraints of a model's parameter space. Private Evolution (PE) is a promising API-based framework for this goal; however, its performance critically depends on initialization. 
When the private data distribution deviates substantially from the foundation model's pre-training priors--particularly in highly specialized domains--PE frequently struggles to align with the target data, resulting in degraded utility, poor convergence, and inefficient API usage.....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7140238512","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.9078999757766724},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.777999997138977},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.6029000282287598},{"id":"https://openalex.org/C114466953","display_name":"Initialization","score":0.5074999928474426},{"id":"https://openalex.org/C110326360","display_name":"Metadata modeling","score":0.44369998574256897},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.3684999942779541},{"id":"https://openalex.org/C153048206","display_name":"Metadata repository","score":0.3310000002384186},{"id":"https://openalex.org/C2777466982","display_name":"Data extraction","score":0.32170000672340393}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/satext-generative-ai-framework-for-spatio-spectral-satellite-unification-and-beyond","title":"SatExt: Generative AI framework for Spatio-Spectral Satellite Unification and 
Beyond","url":"https://www.microsoft.com/en-us/research/publication/satext-generative-ai-framework-for-spatio-spectral-satellite-unification-and-beyond/","published":"2026-02-25","authors":["Nazish Naeem","Peder Olsen","Vaishnavi Ranganathan"],"abstract":"Satellite imagery is indispensable for remote sensing and environmental monitoring. However, despite the presence of over a thousand Earth-observation satellites, achieving frequent and consistent global monitoring remains challenging due to the heterogeneity across satellite constellations. This work explores the potential of generative AI as a foundation for unifying such heterogeneous imagery into a standardized, high-resolution format. We introduce SatExt, a modular generative framework that enables spatio-spectral unification of multi-satellite data to facilitate frequent and consistent Earth monitoring. SatExt strategically decouples the unification process into two stages: spectral extension and spatial super-resolution. It integrates a lightweight attention-based network for spectral extension with a diffusion-based model for high-resolution spatial reconstruction. 
We evaluate Sa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3789514.3792050","openalex_id":"https://openalex.org/W7133222271","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Ecology and environment","Systems and networking","Technology for emerging markets","Computer science"],"author_affiliations":["Microsoft","Massachusetts Institute of Technology","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1382","title":"FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation","url":"https://seed.bytedance.com/en/research/flowportrait-reinforcement-learning-for-audio-driven-portrait-video-generation","published":"2026-02-25","authors":["Weiting Tan","Andy T. Liu","Ming Tu","Xinghua Qu","Philipp Koehn","Lu Lu"],"abstract":"Generating realistic talking-head videos remains challenging due to persistent issues such as imperfect lip synchronization, unnatural motion, and evaluation metrics that correlate poorly with human perception. We propose FlowPortrait, a reinforcement-learning framework for audio-driven portrait animation built on a multimodal backbone for autoregressive audio-to-video generation. FlowPortrait introduces a human-aligned evaluation system based on Multimodal Large Language Models (MLLMs) to assess lip-sync accuracy, expressiveness, and motion quality. These signals are combined with perceptual and temporal consistency regularizers to form a stable composite reward, which is used to post-train the generator via Group Relative Policy Optimization (GRPO). 
Extensive experiments, including both automatic evaluations and human preference studies, demonstrate that FlowPortrait consistently produ...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision and Pattern Recognition","Speech","arXiv","preference"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:1418","title":"World Guidance: World Modeling in Condition Space for Action Generation","url":"https://seed.bytedance.com/en/research/world-guidance-world-modeling-in-condition-space-for-action-generation","published":"2026-02-25","authors":["Yue Su","Sijin Chen","Haixin Shi","Mingyu Liu","Zhengshen Zhang","Ningyuan Huang","Weiheng Zhong","Zhengbang Zhu","Yuxiao Liu","Xihui Liu"],"abstract":"Leveraging future observation modeling to facilitate action generation presents a promising avenue for enhancing the capabilities of Vision-Language-Action (VLA) models. However, existing approaches struggle to strike a balance between maintaining efficient, predictable future representations and preserving sufficient fine-grained information to guide precise action generation. To address this limitation, we propose WoG (World Guidance), a framework that maps future observations into compact conditions by injecting them into the action inference pipeline. The VLA is then trained to simultaneously predict these compressed conditions alongside future actions, thereby achieving effective world modeling within the condition space for action inference. 
We demonstrate that modeling and predicting this condition space not only facilitates fine-grained action generation but also exhibits superio...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Robotics","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:scgmj1or50b7bn7c0s8u70xq","title":"Closing the Gap Between Text and Speech Understanding in LLMs","url":"https://machinelearning.apple.com/research/closing-the-gap","published":"2026-02-25","authors":["Santiago Cuervo","Skyler Seto","Maureen de Seyssel","Richard He Bai","Zijin Gu","Tatiana Likhomanenko","Navdeep Jaitly","Zakaria Aldeneh"],"abstract":"Large Language Models (LLMs) can be adapted to extend their text capabilities to speech inputs. However, these speech-adapted LLMs consistently underperform their text-based counterparts--and even cascaded pipelines--on language understanding tasks. 
We term this shortfall the text-speech understanding gap: the performance drop observed when a speech-adapted LLM processes spoken inputs relative to when the original text-based LLM processes the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:zolyizxbrl68ygjkzbzkdh7x","title":"Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates","url":"https://machinelearning.apple.com/research/constructive-circuit-amplification","published":"2026-02-25","authors":["Nikhil Prakash","Donghao Ren","Dominik Moritz","Yannick Assogba"],"abstract":"Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing specific tasks. Additionally, it has been shown that model performance improvement through fine-tuning often results from the strengthening of existing circuits in the model. 
Taken together, these findings suggest the possibility of intervening directly on such circuits to make precise,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2601.17676","title":"GazeSummary: Exploring Gaze as an Implicit Prompt for Personalization in Text-based LLM Tasks","url":"http://arxiv.org/abs/2601.17676","published":"2026-02-25","authors":["Jiexin Ding","Yizhuo Zhang","Xinyun Liu","Ke Chen","Yuntao Wang","Shwetak Patel","Akshay Gadre"],"abstract":"Smart glasses are accelerating progress toward more seamless and personalized LLM-based assistance by integrating multimodal inputs. Yet, these inputs rely on obtrusive explicit prompts. The advent of gaze tracking on smart devices offers a unique opportunity to extract implicit user intent for personalization. This paper investigates whether LLMs can interpret user gaze for text-based tasks. We evaluate different gaze representations for personalization and validate their effectiveness in realistic reading tasks. 
Results show that LLMs can leverage gaze to generate high-quality personalized summaries and support users in downstream tasks, highlighting the feasibility and value of gaze-driven personalization for future mobile and wearable LLM applications.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3789514.3792037","openalex_id":"https://openalex.org/W7125800206","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","personalized","personalization"],"author_affiliations":["Google (United States)","Seattle University","Tsinghua University","University of Washington"],"concepts":[{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.8862000107765198},{"id":"https://openalex.org/C2779916870","display_name":"Gaze","score":0.8598999977111816},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6373999714851379},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6103000044822693},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6018000245094299},{"id":"https://openalex.org/C150594956","display_name":"Wearable computer","score":0.49000000953674316},{"id":"https://openalex.org/C56461940","display_name":"Eye tracking","score":0.41119998693466187},{"id":"https://openalex.org/C54290928","display_name":"Wearable technology","score":0.39089998602867126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134193820","title":"Adaptive Orchestration for Large‐Scale Inference on Heterogeneous Accelerator Systems: Balancing Cost, Performance, and Resilience","url":"https://doi.org/10.1002/cpe.70627","published":"2026-02-25","authors":["Yahav 
Biran","Imry Kissos"],"abstract":"ABSTRACT The surge in generative AI workloads has created a need for scalable inference systems that can flexibly harness both GPUs and specialized accelerators while containing operational costs. This paper proposes a hardware‐agnostic control loop that adaptively allocates requests across heterogeneous accelerators based on real‐time cost and capacity signals. The approach sustains low latency and high throughput by dynamically shifting between cost‐optimized and capacity‐optimized modes—ensuring the most efficient use of expensive compute resources under fluctuating availability. Evaluated using the Stable Diffusion model, the framework consistently meets latency targets, automatically redirects traffic during capacity shortfalls, and capitalizes on lower‐cost accelerators when possible. These results highlight how a feedback‐driven deployment strategy, spanning the entire software an...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/cpe.70627","openalex_id":"https://openalex.org/W7134193820","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8913999795913696},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.696399986743927},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6960999965667725},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.6158999800682068},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5965999960899353},{"id":"https://openalex.org/C82876162","display_name":"Latency 
(audio)","score":0.5906999707221985},{"id":"https://openalex.org/C2779585090","display_name":"Resilience (materials science)","score":0.5152999758720398},{"id":"https://openalex.org/C199168358","display_name":"Orchestration","score":0.5034999847412109}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2602.22442","title":"A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines","url":"http://arxiv.org/abs/2602.22442","published":"2026-02-25","authors":["Gaoyuan Du","Amit Ahlawat","Xiaoyang Liu","Jing Wu"],"abstract":"Agent-based AutoML systems rely on large language models to make complex, multi-stage decisions across data processing, model selection, and evaluation. However, existing evaluation practices remain outcome-centric, focusing primarily on final task performance. Through a review of prior work, we find that none of the surveyed agentic AutoML systems report structured, decision-level evaluation metrics intended for post-hoc assessment of intermediate decision quality. To address this limitation, we propose an Evaluation Agent (EA) that performs decision-centric assessment of AutoML agents without interfering with their execution. The EA is designed as an observer that evaluates intermediate decisions along four dimensions: decision validity, reasoning consistency, model quality risks beyond accuracy, and counterfactual decision impact. 
Across four proof-of-concept experiments, we demonstra...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7131910287","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Amazon (Germany)","Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C108650721","display_name":"Counterfactual thinking","score":0.7944999933242798},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5145999789237976},{"id":"https://openalex.org/C199521495","display_name":"Audit","score":0.47749999165534973},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.4307999908924103},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4138999879360199},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.3968000113964081},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.3928000032901764},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.39259999990463257}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/actionengine-from-reactive-to-programmatic-gui-agents-via-state-machine-memory","title":"ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory","url":"https://www.microsoft.com/en-us/research/publication/actionengine-from-reactive-to-programmatic-gui-agents-via-state-machine-memory/","published":"2026-02-24","authors":["Hongbin Zhong","Fazle Faisal","Luis França","Tanakorn Leesatapornwongsa","Adriana Szekeres","Kexin Rong","Suman 
Nath"],"abstract":"Existing Graphical User Interface (GUI) agents operate through step-by-step calls to vision language models--taking a screenshot, reasoning about the next action, executing it, then repeating on the new page--resulting in high costs and latency that scale with the number of reasoning steps, and limited accuracy due to no persistent memory of previously visited pages. We propose ActionEngine, a training-free framework that transitions from reactive execution to programmatic planning through a novel two-agent architecture: a Crawling Agent that constructs an updatable state-machine memory of the GUIs through offline exploration, and an Execution Agent that leverages this memory to synthesize complete, executable Python programs for online task execution. To ensure robustness against evolving interfaces, execution failures trigger a vision-based re-grounding fallback that repairs the failed...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Systems and networking","Computer science","LLM","memory","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sibylsense-adaptive-rubric-learning-via-memory-tuning-and-adversarial-probing","title":"SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing","url":"https://www.microsoft.com/en-us/research/publication/sibylsense-adaptive-rubric-learning-via-memory-tuning-and-adversarial-probing/","published":"2026-02-24","authors":["Yifei 
Xu","Guilherme Potje","Shivam Shandilya","Tiancheng Yuan","Leonardo de Oliveira Nunes","Rakshanda Agarwal","Saeid Asgari","Adam Atkinson","Songwu Lu","Emre Kiciman","Ranveer Chandra","Tusher Chakraborty"],"abstract":"Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-training. Rubrics provide structured, interpretable supervision, but scaling rubric construction is difficult: expert rubrics are costly, prompted rubrics are often superficial or inconsistent, and fixed-pool discriminative rubrics can saturate and drift, enabling reward hacking. We present SibylSense, an inference-time learning approach that adapts a frozen rubric generator through a tunable memory bank of validated rubric items. Memory is updated via verifier-based item rewards measured by reference-candidate answer discriminative gaps from a handful of examples. SibylSense alternates memory tuning with a rubric-adversarial policy update that produces rubric-satisfying candidate answers, shrinking discriminative gaps and driving the rubric generator to capture new quality dimensions. 
Experim...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","large language models","Reinforcement learning","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2602.21548","title":"DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference","url":"https://huggingface.co/papers/2602.21548","published":"2026-02-24","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","deepseek-ai","LLM"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"hf-org-paper:stepfun-ai:2602.20933","title":"Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian 
Splatting","url":"https://huggingface.co/papers/2602.20933","published":"2026-02-24","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"apple:m8e9z2blnaohcov66gxets7z","title":"Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining","url":"https://machinelearning.apple.com/research/beyond-a-single-extractor","published":"2026-02-24","authors":["Jeffrey Li","Josh Gardner","Doug Kang","Fangping Shi","Karanjeet Singh","Chun-Liang Li","Herumb Shandilya","David Hall","Oncel Tuzel","Percy Liang","Ludwig Schmidt","Hadi Pour Ansari"],"abstract":"One of the first pre-processing steps for constructing web-scale LLM pretraining datasets involves extracting text from HTML. Despite the immense diversity of web content, existing open-source datasets predominantly apply a single fixed extractor to all webpages. In this work, we investigate whether this practice leads to suboptimal coverage and utilization of Internet data. 
We first show that while different extractors may lead to similar model...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:fkw07q8wvypyimdg9wnc14c0","title":"The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics","url":"https://machinelearning.apple.com/research/cot","published":"2026-02-24","authors":["Gregor Bachmann","Yichen Jiang","Seyed Mohsen Moosavi Dezfooli","Moin Nabi"],"abstract":"Chain-of-thought (CoT) prompting is a de-facto standard technique to elicit reasoning-like responses from large language models (LLMs), allowing them to spell out individual steps before giving a final answer. While the resemblance to human-like reasoning is undeniable, the driving forces underpinning the success of CoT reasoning still remain largely unclear. 
In this work, we perform an in-depth analysis of CoT traces originating from...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:bmuk1dy3ejg680pqbinu86je","title":"AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding","url":"https://machinelearning.apple.com/research/amuse","published":"2026-02-24","authors":["Sanjay Chowdhury","Karren D. Yang","Xudong Liu","Fartash Faghri","Pavan Kumar Anasosalu Vasu","Oncel Tuzel","Dinesh Manocha","Chun-Liang Li","Raviteja Vemulapalli"],"abstract":"Recent multimodal large language models (MLLMs) such as GPT-4o and Qwen3-Omni show strong perception but struggle in multi-speaker, dialogue-centric settings that demand agentic reasoning tracking who speaks, maintaining roles, and grounding events across time. 
These scenarios are central to multimodal audio-video understanding, where models must jointly reason over audio and visual streams in applications such as conversational video assistants...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7131403784","title":"Impact of tissue staining and scanner variation on the performance of pathology foundation models: a study of sarcomas and their mimics","url":"https://doi.org/10.1002/2056-4538.70080","published":"2026-02-24","authors":["Binghao Chai","Jianan Chen","Paul Cool","Fatine Oumlil","Anna Tollitt","David F. Steiner","Tapabrata Chakraborti","Adrienne M Flanagan"],"abstract":"Histopathological analysis is considered the gold standard for the diagnosis and prognostication of cancer. Recent advances in AI, driven by large-scale digitisation and pan-cancer foundation models, are opening new opportunities for clinical integration. However, it remains unclear how robust these foundation models are to real-world sources of variability, particularly in H&E staining and scanners produced by different manufacturers. In this study, we use soft tissue tumours, a rare and morphologically diverse tumour type, as a challenging test case to systematically investigate the colour-related robustness and generalisability of seven AI models. Controlled staining and scanning experiments were utilised to assess model performance across diverse real-world data sources. 
Foundation models, particularly UNI-v2, Virchow and TITAN, demonstrated encouraging robustness to staining and sca...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/2056-4538.70080","openalex_id":"https://openalex.org/W7131403784","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["CRUK Lung Cancer Centre of Excellence","Google (United States)","Keele University","Robert Jones and Agnes Hunt Orthopaedic Hospital","Royal National Orthopaedic Hospital","The Alan Turing Institute","University College London"],"concepts":[{"id":"https://openalex.org/C74864618","display_name":"Staining","score":0.6157000064849854},{"id":"https://openalex.org/C142724271","display_name":"Pathology","score":0.6090999841690063},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5559999942779541},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5425000190734863},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.5295000076293945},{"id":"https://openalex.org/C2777522853","display_name":"Digital pathology","score":0.4851999878883362},{"id":"https://openalex.org/C136948725","display_name":"Soft tissue","score":0.36149999499320984},{"id":"https://openalex.org/C40993552","display_name":"Gold standard (test)","score":0.33230000734329224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.09577","title":"Generative UI: LLMs are Effective UI Generators","url":"https://arxiv.org/abs/2604.09577","published":"2026-02-24","authors":["Yaniv Leviathan","Dani Valevski","Matan Kalman","Danny Lumen","Eyal Segalis","Eyal Molad","Shlomi Pasternak","Vishnu 
Natchu","Valerie Nygaard","Srinivasan","Venkatachary","James Manyika"],"abstract":"AI models excel at creating content, but typically render it with static, predefined interfaces. Specifically, the output of LLMs is often a markdown \"wall of text\". Generative UI is a long standing promise, where the model generates not just the content, but the interface itself. Until now, Generative UI was not possible in a robust fashion. We demonstrate that when properly prompted and equipped with the right set of tools, a modern LLM can robustly produce high quality custom UIs for virtually any prompt. When ignoring generation speed, results generated by our implementation are overwhelmingly preferred by humans over the standard LLM markdown output. In fact, while the results generated by our implementation are worse than those crafted by human experts, they are at least comparable in 50% of cases. We show that this ability for robust Generative UI is emergent, with substantial imp...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7154538968","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.8338000178337097},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6751000285148621},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.659500002861023},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5370000004768372},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.53329998254776},{"id":"https://openalex.org/C184408114","display_name":"Generative 
Design","score":0.49050000309944153},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4327999949455261},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4041999876499176}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2602.21193","title":"On Data Engineering for Scaling LLM Terminal Capabilities","url":"https://huggingface.co/papers/2602.21193","published":"2026-02-24","authors":["Renjie Pi","Grace Lam","Mohammad Shoeybi","Pooya Jannaty","Bryan Catanzaro","Wei Ping"],"abstract":"Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed-based and skill-based task construction, and (2) a comprehensive analysis of data and training strategies, including filtering, curriculum learning, long context training, and scaling behavior. Our pipeline yields Terminal-Corpus, a large-scale open-source dataset for terminal tasks. 
Using this dataset, we train Nemotron-Terminal, a family of models initialized from Qwen3(8B, 14B, 32B) that achieve substantial gains on Terminal-Bench 2.0: Nemotron-Terminal-8B improves from 2.5% to 13.0% Nemotro...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2602.21201","title":"Aletheia tackles FirstProof autonomously","url":"https://huggingface.co/papers/2602.21201","published":"2026-02-24","authors":["Tony Feng","Junehyuk Jung","Sang-hyun Kim","Carlo Pagano","Sergei Gukov","Chiang-Chiang Tsai","David Woodruff","Adel Javanmard","Aryan Mokhtari","Dawsen Hwang","Yuri Chervonyi","Jonathan N. Lee"],"abstract":"We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 problems (2, 5, 7, 8, 9, 10) out of 10 according to majority expert assessments; we note that experts were not unanimous on Problem 8 (only). For full transparency, we explain our interpretation of FirstProof and disclose details about our experiments as well as our evaluation. 
Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/identifying-explaining-and-correcting-ableist-language-with-ai","title":"Identifying, Explaining, and Correcting Ableist Language with AI","url":"https://www.microsoft.com/en-us/research/publication/identifying-explaining-and-correcting-ableist-language-with-ai/","published":"2026-02-23","authors":["Katy Smith","Lydia B. Chilton","Danielle Bragg"],"abstract":"Ableist language perpetuates harmful stereotypes and exclusion, yet its nuanced nature makes it difficult to recognize and address. Artificial intelligence could serve as a powerful ally in the fight against ableist language, offering tools that detect and suggest alternatives to biased terms. This two-part study investigates the potential of large language models (LLMs), specifically ChatGPT, to rectify ableist language and educate users about inclusive communication. We compared GPT-4o generations with crowdsourced annotations from trained disability community members, then invited disabled participants to evaluate both. Participants reported equal agreement with human and AI annotations but significantly preferred the AI, citing its narrative consistency and accessible style. At the same time, they valued the emotional depth and cultural grounding of human annotations. 
These findings....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:lp5ukyohhsq8gt2rmbc3pd9s","title":"Learning to Evict from Key-Value Cache","url":"https://machinelearning.apple.com/research/evict","published":"2026-02-23","authors":["Luca Moschella","Laura Manduchi","Ozan Sener"],"abstract":"The growing size of Large Language Models (LLMs) makes efficient inference challenging, primarily due to the memory demands of the autoregressive Key-Value (KV) cache. Existing eviction or compression methods reduce cost but rely on heuristics, such as recency or past attention scores, which serve only as indirect proxies for a token's future utility and introduce computational overhead. 
We reframe KV cache eviction as a reinforcement learning...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["memory","efficient","compression"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/satellite-based-detection-of-looted-archaeological-sites-using-machine-learning","title":"Satellite-Based Detection of Looted Archaeological Sites Using Machine Learning","url":"https://www.microsoft.com/en-us/research/publication/satellite-based-detection-of-looted-archaeological-sites-using-machine-learning/","published":"2026-02-23","authors":["Girmaw Abebe Tadesse","Titien Bartette","Andrew Hassanali","Allen Kim","Jonathan Chemla","Andrew Zolli","Yves Ubelmann","Caleb Robinson","Inbal Becker-Reshef","Juan M. Lavista Ferres"],"abstract":"Looting at archaeological sites poses a severe risk to cultural heritage, yet monitoring thousands of remote locations remains operationally difficult. We present a scalable, satellite-based pipeline to detect looted archaeological sites using PlanetScope monthly mosaics (4.7 m/pixel) and a curated dataset of 1,943 archaeological sites in Afghanistan (898 looted, 1,045 preserved) with multi-year imagery (2016–2023) and site-footprint masks. We compare (i) end-to-end CNN classifiers trained on raw RGB patches and (ii) traditional machine learning models trained on handcrafted spectral and texture features as well as embeddings from recent remote-sensing foundation models. 
Results indicate that ImageNet-pretrained CNNs combined with spatial masking reach an F1 score of 0.926, clearly surpassing the strongest traditional machine learning setup, which attains an F1 score of 0.710 using SatCL...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Unpublished","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7131106748","title":"Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models","url":"https://doi.org/10.1145/3795527","published":"2026-02-23","authors":["Zhengliang Shi","Lingyong Yan","Weiwei Sun","Yue Feng","Pengjie Ren","Xinyu Ma","Shuaiqiang Wang","Dawei Yin","Maarten de Rijke","Zhaochun Ren"],"abstract":"Retrieval-augmented generation (RAG) integrates large language models (LLMs) with retrievers to access external knowledge, improving the factuality of LLM generation in knowledge-grounded tasks. To optimize the RAG performance, most previous work independently fine-tunes the retriever to adapt to frozen LLMs or trains the LLMs to use documents retrieved by off-theshelf retrievers, lacking end-to-end training supervision. Recent work addresses this limitation by jointly training these two components but relies on overly simplifying assumptions of document independence, which has been criticized for being far from real-world scenarios. Thus, effectively optimizing the overall RAG performance remains a critical challenge. 
We propose a direct retrieval-augmented optimization framework, named DRO, that enables end-to-end training of two key components: (i) a generative knowledge selection mod...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3795527","openalex_id":"https://openalex.org/W7131106748","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Baidu (China)","Carnegie Mellon University","Leiden University","Shandong University","University of Amsterdam","University of Birmingham"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8580999970436096},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6579999923706055},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6395999789237976},{"id":"https://openalex.org/C2776330181","display_name":"Maximization","score":0.6276000142097473},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6054999828338623},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5909000039100647},{"id":"https://openalex.org/C21308566","display_name":"Permutation (music)","score":0.4675000011920929},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4652000069618225}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2503.03114","title":"PromCopilot: Simplifying Prometheus Metric Querying in Cloud Native Online Service Systems via Large Language Models","url":"http://arxiv.org/abs/2503.03114","published":"2026-02-23","authors":["Chenxi Zhang","Bicheng Zhang","Dingyu Yang","Xin Peng","Miao Chen","Senyu 
Xie","Gang Chen","Wei Bi","Wei Li"],"abstract":"With the increasing complexity of modern online service systems, understanding the state and behavior of the systems is essential for ensuring their reliability and stability. Therefore, metric monitoring systems are widely used and become an important infrastructure in online service systems. Engineers usually interact with metrics data by manually writing domain-specific language (DSL) queries to achieve various analysis objectives. However, writing these queries can be challenging and time-consuming, as it requires engineers to have high programming skills and understand the context of the system. In this paper, we focus on PromQL, which is the metric query DSL provided by the widely used metric monitoring system Prometheus. We aim to simplify metrics querying by enabling engineers to interact with metrics data in Prometheus through natural language, and we call this task text-to-Prom...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3797910","openalex_id":"https://openalex.org/W4415336444","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","Fudan University","Xidian University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8992999792098999},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.5843999981880188},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5313000082969666},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4569000005722046},{"id":"https://openalex.org/C2779343474","display_name":"Context 
(archaeology)","score":0.4307999908924103},{"id":"https://openalex.org/C2780378061","display_name":"Service (business)","score":0.42559999227523804},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4163999855518341},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.3862000107765198}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131071962","title":"Multimodal Large Language Model for Virtual Object Grounding","url":"https://doi.org/10.1145/3796717","published":"2026-02-23","authors":["Ziheng Xia","Chao Li","Ding Ding","Ye Wang","Hao Chen"],"abstract":"We propose a novel task, V irtual O bject G rounding (VOG). It aims to predict plausible locations in an image for inserting virtual objects that align with a given textual description. This VOG task can address the challenge of providing region constraints for object insertion in image editing, thereby ensuring the consistency of irrelevant areas in the image. To support this task, we construct Virtual Seg mentation dataset (VirtualSeg), a dataset of over 92,000 samples automatically generated from VrR-VG via a four-step dataset construction pipeline. This pipeline employs CLIP to automatically filter out low-quality data samples, ensuring the quality of VirtualSeg. Furthermore, we propose the VirLLaVA model, a novel virtual object grounding framework built upon LLaVA-7B. 
By equipping the MLLM backbone with two sequences of learnable tokens and a dual grounding module, and by guiding th...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3796717","openalex_id":"https://openalex.org/W7131071962","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Southeast University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9140999913215637},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6978999972343445},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6237999796867371},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5838000178337097},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5523999929428101},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5253000259399414},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.5095999836921692},{"id":"https://openalex.org/C51970089","display_name":"Virtual image","score":0.486299991607666}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131093170","title":"Analyzing how pre-trained language models capture factual knowledge using attribution methods","url":"https://doi.org/10.1016/j.knosys.2026.115553","published":"2026-02-23","authors":["Shaobo Li","Chengjie Sun","Bingquan Liu","Xiaoguang Li","Lifeng Shang","Zhenhua Dong","Zhenzhou Ji","Xin Jiang","Qun 
Liu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.knosys.2026.115553","openalex_id":"https://openalex.org/W7131093170","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Heilongjiang Institute of Technology","Huawei Technologies (China)","Weihai Science and Technology Bureau"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6187999844551086},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6040999889373779},{"id":"https://openalex.org/C143299363","display_name":"Attribution","score":0.5407000184059143},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45680001378059387},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43790000677108765},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4250999987125397},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.41530001163482666},{"id":"https://openalex.org/C120936955","display_name":"Empirical research","score":0.34880000352859497}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7130817414","title":"A Survey of Multimodal Hallucination Evaluation and Detection","url":"https://doi.org/10.1007/s11263-026-02756-9","published":"2026-02-21","authors":["Zhiyuan Chen","Yuecong Min","Jie Zhang","Bei Yan","Jiahao Wang","Xiaozhen Wang","Shiguang Shan"],"abstract":"Multi-modal Large Language Models (MLLMs) have emerged as a powerful paradigm for integrating visual and textual information, 
supporting a wide range of multi-modal tasks. However, these models often suffer from hallucination, producing content that appears plausible but contradicts the input content or established world knowledge. This survey offers an in-depth review of hallucination evaluation benchmarks and detection methods across Image-to-Text (I2T) and Text-to-image (T2I) generation tasks. Specifically, we first propose a taxonomy of hallucination based on faithfulness and factuality, incorporating the common types of hallucinations observed in practice. Then we provide an overview of existing hallucination evaluation benchmarks for both T2I and I2T tasks, highlighting their construction process, evaluation objectives, and employed metrics. Furthermore, we summarize recent advance...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-026-02756-9","openalex_id":"https://openalex.org/W7130817414","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Institute of Computing Technology","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.9077000021934509},{"id":"https://openalex.org/C2908998935","display_name":"Visual Hallucination","score":0.6554999947547913},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5907999873161316},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy (biology)","score":0.5085999965667725},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4767000079154968},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4690999984741211},{"id":"https://openalex.org/C112313634","display_name":"Complement 
(music)","score":0.4505999982357025},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3698999881744385}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7154997778","title":"Synergetic Adaptive Orchestration and Governance: A Unified Framework for Production-Grade Multi-Agent Systems","url":"https://doi.org/10.1109/southeastcon63549.2026.11476668","published":"2026-02-20","authors":["Khushboo Bhatia"],"abstract":"The field of artificial intelligence is transitioning from isolated Large Language Models (LLMs) to agentic collectives, autonomous networked systems essential for missioncritical deployments. This fundamental shift introduces core challenges of orchestration (dynamic coordination) and governance (oversight of autonomous behaviors), which determine system safety, reliability, and economic value. This paper proposes the Synergetic Adaptive Orchestration and Governance (SAOG) framework, a unified architecture designed to bridge the gap between performance optimization and safety-critical oversight. The Orchestration Layer utilizes a modular, self-optimizing, cellstructured design where agents minimize Variational Free Energy (VFE). 
The Governance Layer establishes a zero-trust communication environment using Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) to prevent imper...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/southeastcon63549.2026.11476668","openalex_id":"https://openalex.org/W7154997778","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.593999981880188},{"id":"https://openalex.org/C199168358","display_name":"Orchestration","score":0.4593000113964081},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.31130000948905945},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3068000078201294},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.30559998750686646},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.29030001163482666},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.26089999079704285},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.26080000400543213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154944725","title":"Quantum-Enhanced Transactional Cloud Operations Using Agentic Ai","url":"https://doi.org/10.1109/southeastcon63549.2026.11476139","published":"2026-02-20","authors":["Shalini Sudarsan","Nihar Karra","Milankumar Rana","Manan Agrawal"],"abstract":"This work investigates the integration to improve transactional operations, tackling important security and efficiency 
issues. We present a new framework using agentic artificial intelligence to coordinate complex decision-making procedures with quantum algorithms to handle safe transaction processing. The approach combines quantum key distribution protocols, Grover's algorithm, and utility-based agent decision models within a cloud-native design. Compared to conventional methods, empirical data show a 64% increase in transaction throughput, an 89% decrease in security vulnerability exposure, and a 42% decrease in processing delay. Measuring user experience reveals a 37% rise in transaction completion rates under little human involvement. These results imply that by offering hitherto unheard-of security guarantees while keeping operational efficiency, quantum-enhanced agentic cloud opera...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/southeastcon63549.2026.11476139","openalex_id":"https://openalex.org/W7154944725","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Amazon (United States)","Charlotte School of Law","Kindercare Pediatrics","Oncology Specialists of Charlotte","University of North Carolina at Charlotte","University of the Cumberlands"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.628600001335144},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5412999987602234},{"id":"https://openalex.org/C68489960","display_name":"Transactional leadership","score":0.30140000581741333},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.29249998927116394},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.2849000096321106},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.28459998965263367},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2799000144004822},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.2630000114440918}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:4b580fdabb0a16e4","title":"Gemini 3.1 Pro Model Card","url":"https://deepmind.google/models/model-cards/gemini-3-1-pro/","published":"2026-02-19","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 3.1 Pro"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"arxiv:2602.17554","title":"A Theoretical Framework for Modular Learning of Robust Generative Models","url":"http://arxiv.org/abs/2602.17554","published":"2026-02-19","authors":["Corinna Cortes","Mehryar Mohri","Yutao Zhong"],"abstract":"Training large-scale generative models is resource-intensive and relies heavily on heuristic dataset weighting. We address two fundamental questions: Can we train Large Language Models (LLMs) modularly-combining small, domain-specific experts to match monolithic performance-and can we do so robustly for any data mixture, eliminating heuristic tuning? We present a theoretical framework for modular generative modeling where a set of pre-trained experts are combined via a gating mechanism. 
We define the space of normalized gating functions, $G_{1}$, and formulate the problem as a minimax game to find a single robust gate that minimizes divergence to the worst-case data mixture. We prove the existence of such a robust gate using Kakutani's fixed-point theorem and show that modularity acts as a strong regularizer, with generalization bounds scaling with the lightweight gate's complexity. Furt...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7130762928","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","distillation"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.7013999819755554},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.666700005531311},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.554099977016449},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.5138999819755554},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4991999864578247},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4909000098705292},{"id":"https://openalex.org/C2779478453","display_name":"Modularity (biology)","score":0.4812999963760376},{"id":"https://openalex.org/C149728462","display_name":"Minimax","score":0.45899999141693115}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7130548767","title":"A short survey on small reasoning models: training, inference, applications, and research 
directions","url":"https://doi.org/10.1007/s11704-025-50990-0","published":"2026-02-19","authors":["Chengyu Wang","Taolin Zhang","Richang Hong","Jun Huang"],"abstract":"Abstract Recently, the reasoning capabilities of Large Reasoning Models (LRMs), such as DeepSeek-R1, have witnessed significant advancements through computationally intensive “slow thinking” processes. These models have demonstrated impressive performance across a variety of complex reasoning tasks. However, despite their remarkable success, LRMs come with substantial computational demands that pose considerable challenges in terms of resource consumption, scalability, and accessibility. In contrast, Small Reasoning Models (SRMs), which are often distilled from larger models, offer a more efficient alternative while still achieving competitive performance. Beyond their efficiency, SRMs frequently exhibit distinct capabilities and cognitive trajectories compared with their larger counterparts, making them particularly interesting from both practical and theoretical perspectives. 
In this w...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11704-025-50990-0","openalex_id":"https://openalex.org/W7130548767","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Hefei University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8317000269889832},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5903000235557556},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.5806000232696533},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5008000135421753},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.4652999937534332},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.4408999979496002},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3995000123977661},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.38909998536109924}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/filmaster-bridging-cinematic-principles-and-generative-ai-for-automated-film-generation","title":"FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation","url":"https://www.microsoft.com/en-us/research/publication/filmaster-bridging-cinematic-principles-and-generative-ai-for-automated-film-generation/","published":"2026-02-18","authors":["Kaiyi Huang","Yukun Huang","Xintao Wang","Zinan Lin","Xuefei Ning","Pengfei Wan","Di 
Zhang","Yu Wang","Xihui Liu"],"abstract":"AI-driven content creation has shown potential in film production. However, existing film generation systems struggle to implement cinematic principles and thus fail to generate professional-quality films, particularly lacking diverse camera language and cinematic rhythm. This results in templated visuals and unengaging narratives. To address this, we introduce FilMaster, an end-to-end AI system that integrates real-world cinematic principles for professional-grade film generation, yielding editable, industry-standard outputs. FilMaster is built on two key principles: (1) learning cinematography from extensive real-world film data and (2) emulating professional, audience-centric post-production workflows. Inspired by these principles, FilMaster incorporates two stages: a Reference-Guided Generation Stage which transforms user input to video clips, and a Generative Post-Production Stage w...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Graphics and multimedia","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ni-sampling-accelerating-discrete-diffusion-sampling-by-token-order-optimization","title":"NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order 
Optimization","url":"https://www.microsoft.com/en-us/research/publication/ni-sampling-accelerating-discrete-diffusion-sampling-by-token-order-optimization/","published":"2026-02-18","authors":["Enshu Liu","Xuefei Ning","Yu Wang","Zinan Lin"],"abstract":"Discrete diffusion language models (dLLMs) have recently emerged as a promising alternative to traditional autoregressive approaches, offering the flexibility to generate tokens in arbitrary orders and the potential of parallel decoding. However, existing heuristic sampling strategies remain inefficient: they choose only a small part of tokens to sample at each step, leaving substantial room for improvement. In this work, we study the problem of token sampling order optimization and demonstrate its significant potential for acceleration. Specifically, we find that fully leveraging correct predictions at each step can reduce the number of sampling iterations by an order of magnitude without compromising accuracy. Based on this, we propose Neural Indicator Sampling (NI Sampling), a general sampling order optimization framework that utilize a neural indicator to decide which tokens should b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:45bf6443b7bde669","title":"Lyria 3 Model 
Card","url":"https://deepmind.google/models/model-cards/lyria-3/","published":"2026-02-18","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Lyria 3"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"apple:ahtgsqc8rp7mtel1bzif3l6d","title":"Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment","url":"https://machinelearning.apple.com/research/query-auto-completion","published":"2026-02-18","authors":["Kai Yuan","Anthony Zheng","Jia Hu","Divyanshu Sheth","Hemanth Velaga","Kylee Kim","Matteo Guarrera","Besim Avci","Xuetao Yin","Jianhua Li","Rajyashree Mukherjee","Sean Suchter"],"abstract":"Query Auto-Completion (QAC) is a critical feature of modern search systems that improves search efficiency by suggesting completions as users type. However, existing approaches face fundamental challenges: traditional retrieve-and-rank pipelines have poor long-tail coverage and require extensive feature engineering, while recent generative methods suffer from hallucination and safety risks. 
We present a unified framework that reformulates QAC as...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7131102438","title":"The Evolution of Agentic AI in Cybersecurity: From Single LLM Reasoners to Multi-Agent Systems and Autonomous Pipelines","url":"https://doi.org/10.1109/icaic67076.2026.11395809","published":"2026-02-18","authors":["Vaishali Vinay"],"abstract":"Cybersecurity operations are increasingly adopting agentic AI solutions due to the time-critical and complex decision-making in security operations centers (SOCs). While large language models (LLMs) are good with summarization tasks or interpreting structured and unstructured reports, real-world SOC workflows have additional requirements such as access to original logs, reproducibility and accountability to triage security incidents. For example, analysts routinely correlate alerts to understand the kill-chain of the cyber-attack and analyze the event telemetries to identify the root cause event which may not have triggered an alert. 
Incorrect and incomplete automations in such settings can directly impact production systems and business operations.In this survey, we examine the architectural shifts from single-model assistants to tool-augmented agents, distributed multiagent systems, an...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icaic67076.2026.11395809","openalex_id":"https://openalex.org/W7131102438","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","memory","agent","multi-agent"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7498000264167786},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7186999917030334},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.524399995803833},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.45500001311302185},{"id":"https://openalex.org/C123606473","display_name":"Complex event processing","score":0.4537999927997589},{"id":"https://openalex.org/C2776007630","display_name":"Accountability","score":0.4390000104904175},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.43140000104904175},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4291999936103821}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131123367","title":"Software Engineering Challenges in the Deployment of Generative AI Models at Scale","url":"https://doi.org/10.1109/icaic67076.2026.11395781","published":"2026-02-18","authors":["Venkata Nagendra Satyam","Braja Gopal 
Mahapatra","Devisharan Mishra"],"abstract":"The rapid development of one of the biggest advances in the sphere of Generative AI has led to natural language generation still being limited to large-scale implementation due to latency, reproducibility, integration complexity, and scalability challenges. This project offers a simple and repeatable deployment pipeline which is based on the Kaggle Fitness Exercises dataset. The final corpus was then fine-tuned with GPT-2 on Hugging Face and PyTorch using extensive preprocessing, that is, data cleaning, instruction merging, normalization, tokenization, and outlier filtering. The model was evaluated on five configurations with a loss of evaluation of approximately 0.4083, 0.4094, and perplexity of 1.5043, 1.5058, and sensitivity and ablation studies indicate the most sensitive hyperparameter is the learning rate. The fine-tuning of the model was performed through a FastAPI service, and it...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icaic67076.2026.11395781","openalex_id":"https://openalex.org/W7131123367","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Mastercard (United States)"],"concepts":[{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.8183000087738037},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7860000133514404},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5740000009536743},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5720999836921692},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5702000260353088},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.5031999945640564},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.4602000117301941},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.4431000053882599}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2508.03716","title":"FeynTune: large language models for high-energy theory","url":"http://arxiv.org/abs/2508.03716","published":"2026-02-18","authors":["Paul Richmond","Constantinos Papageorgakis","Vasilis Niarchos","Borun D. Chowdhury","Prarit Agarwal"],"abstract":"Abstract We present specialized large language models (LLMs) for theoretical high-energy physics, obtained as 20 fine-tuned variants of the 8 billion parameter Llama-3.1 model. Each variant was trained on arXiv abstracts (through August 2024) from different combinations of hep-th, hep-ph and gr-qc. For a comparative study, we also trained models on datasets that contained abstracts from disparate fields such as the q-bio and cs categories. All models were fine-tuned using two distinct low-rank adaptation fine-tuning approaches and varying dataset sizes, and outperformed the base model on hep-th abstract completion tasks. 
We compare performance against leading commercial LLMs (ChatGPT, Claude, Gemini, DeepSeek) and derive insights for further developing specialized language models for high-energy theoretical physics.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1088/2632-2153/ae47bb","openalex_id":"https://openalex.org/W7130384311","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Queen Mary University of London","Regent's University London","University of Crete"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6843000054359436},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6498000025749207},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.54830002784729},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4893999993801117},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.4742000102996826},{"id":"https://openalex.org/C42058472","display_name":"Base (topology)","score":0.40049999952316284},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3522000014781952},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.2842999994754791}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:zai-org:2602.15763","title":"GLM-5: from Vibe Coding to Agentic Engineering","url":"https://huggingface.co/papers/2602.15763","published":"2026-02-17","authors":["Z.ai/Zhipu"],"abstract":"We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. 
Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. To advance model alignment and autonomy, we implement a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupling generation from training. Furthermore, we propose novel asynchronous agent RL algorithms that further improve RL quality, enabling the model to learn from complex, long-horizon interactions more effectively. Through these innovations, GLM-5 achieves state-of-the-art performance on major open benchmarks. Most critically, GLM-5 demonstrates unprecedented capability in real-world coding tasks, sur...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","zai-org","agent"],"author_affiliations":["Z.ai/Zhipu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/zai-org/papers"}},{"id":"apple:dtk87g38phuazu5y82rm7cb3","title":"Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents","url":"https://machinelearning.apple.com/research/ferret-ui","published":"2026-02-17","authors":["Zhen Yang","Zi-Yi Dou","Di Feng","Forrest Huang","Anh Nguyen","Keen You","Omar Attia","Yuhao Yang","Michael Feng","Haotian Zhang","Ram Ramrakhya","Chao Jia"],"abstract":"Developing autonomous agents that effectively interact with Graphic User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. 
In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Utilizing techniques optimized for developing small models, we build our 3B Ferret-UI Lite agent through curating a diverse GUI data...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7129728849","title":"A memory fabric for conversational AI agents enabling shared and persistent multiuser memory","url":"https://doi.org/10.1007/s44163-026-00992-z","published":"2026-02-17","authors":["Anjikya Tiwari","Vibhuti Gupta"],"abstract":"Conversational artificial intelligence is now the most widely adopted platform for interfacing with large language models. Alongside large language models these artificial intelligence systems rely on contexts derived from past conversations and preferences to provide accurate and the most relevant responses to users. The knowledge base and past experiences contribute to long-term memory, while processing ongoing conversations generates short-term memory. Both long-term and short-term memories together provide a comprehensive and coherent context to the user. While most architectures focus on a single user context, there is an emerging need in conversational artificial intelligence to provide a system to generate context from multiple individuals and/or agents. 
Building on this foundation, we introduce memory fabric, a framework that allows conversational artificial intelligence to lever...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s44163-026-00992-z","openalex_id":"https://openalex.org/W7129728849","cited_by_count":0,"quality_score":53,"matched_keywords":["memory","long-term","agent","multi-agent"],"author_affiliations":["Microsoft (United States)","The University of Texas Medical Branch at Galveston"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8019000291824341},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4903999865055084},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.45350000262260437},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.40459999442100525},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.397599995136261},{"id":"https://openalex.org/C2776303644","display_name":"Interfacing","score":0.36329999566078186},{"id":"https://openalex.org/C197914299","display_name":"Semantic memory","score":0.36059999465942383},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.35429999232292175}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7129318890","title":"MedPTQ: a practical pipeline for real post-training quantization in 3D medical image segmentation","url":"https://doi.org/10.1117/1.jmi.13.1.014006","published":"2026-02-17","authors":["Chongyu Qu","Ritchie Zhao","Ye Yu","Bin Liu","Tianyuan Yao","Junchao Zhu","Bennett A. 
Landman","Yucheng Tang","Yuankai Z Huo"],"abstract":"Purpose: Quantizing deep neural networks, reducing the precision (bit-width) of their computations, can remarkably decrease memory usage and accelerate processing, making these models more suitable for large-scale medical imaging applications with limited computational resources. However, many existing methods studied \"simulated quantization,\" which simulates lower precision operations during inference but does not actually reduce model size or improve real-world inference speed. Moreover, the potential of deploying real three-dimensional (3D) low-bit quantization on modern graphics processing units (GPUs) is still unexplored. Approach: We introduce MedPTQ, an open-source pipeline for real post-training quantization that implements true 8-bit (INT8) inference on state-of-the-art (SOTA) 3D medical segmentation models, i.e., U-Net, SegResNet, SwinUNETR, nnU-Net, UNesT, TransUNet, ST-UNet,....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/1.jmi.13.1.014006","openalex_id":"https://openalex.org/W7129318890","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","efficient","quantization"],"author_affiliations":["Nvidia (United States)","Vanderbilt University"],"concepts":[{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.678600013256073},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.661300003528595},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6424999833106995},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.6395999789237976},{"id":"https://openalex.org/C143409427","display_name":"Magnetic resonance imaging","score":0.5313000082969666},{"id":"https://openalex.org/C43521106","display_name":"Pipeline 
(software)","score":0.5238999724388123},{"id":"https://openalex.org/C58693492","display_name":"Neuroimaging","score":0.4925999939441681},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.45820000767707825}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/long-video-understanding-with-learnable-retrieval-in-video-language-models-2","title":"Long Video Understanding with Learnable Retrieval in Video-Language Models","url":"https://www.microsoft.com/en-us/research/publication/long-video-understanding-with-learnable-retrieval-in-video-language-models-2/","published":"2026-02-16","authors":["Jiaqi Xu","Cuiling Lan","Wenxuan Xie","Xuejin Chen","Yan Lu"],"abstract":"The remarkable natural language understanding, reasoning, and generation capabilities of large language models (LLMs) have made them attractive for application to video understanding, utilizing video tokens as contextual input. However, employing LLMs for long video understanding presents significant challenges. The extensive number of video tokens leads to considerable computational costs for LLMs while using aggregated tokens results in loss of vision details. Moreover, the presence of abundant question-irrelevant tokens introduces noise to the video reasoning process. To address these issues, we introduce a simple yet effective learnable retrieval-based video-language model (R-VLM) for efficient long video understanding. 
Specifically, given a question (query) and a long video, our model identifies and selects the most relevant K video chunks and uses their associated visual tokens to....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Article (Journal)","Graphics and multimedia","Computer science","large language models","LLM","language model","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sense-7-taxonomy-and-dataset-for-measuring-user-perceptions-of-empathy-in-sustained-human-ai-conversations","title":"SENSE-7: Taxonomy and Dataset for Measuring User Perceptions of Empathy in Sustained Human-AI Conversations","url":"https://www.microsoft.com/en-us/research/publication/sense-7-taxonomy-and-dataset-for-measuring-user-perceptions-of-empathy-in-sustained-human-ai-conversations/","published":"2026-02-16","authors":["Jina Suh","Lindy Le","Erfan Shayegani","Gonzalo Ramos","Judith Amores","Desmond C. Ong","Mary Czerwinski","Javier Hernandez"],"abstract":"Empathy is increasingly recognized as a key factor in human–AI communication, yet conventional approaches to “digital empathy” often focus on simulating internal, human like emotional states while overlooking the inherently subjective, contextual, and relational facets of empathy as perceived by users. 
In this work, we propose a human-centered taxonomy that emphasizes observable empathic behaviors and introduce a new dataset, SENSE-7, of real-world conversations between information workers and Large Language Models (LLMs), which includes per-turn empathy annotations directly from the users, along with user characteristics, and contextual details, offering a more user-grounded representation of empathy. Analysis of 695 conversations from 109 participants reveals that empathy judgments are highly individualized, context-sensitive, and vulnerable to disruption when conversational continuity...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Artificial intelligence","Human-computer interaction","Affective Computing","Human Computer Interaction","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2602.14721","title":"WebWorld: A Large-Scale World Model for Web Agent Training","url":"https://huggingface.co/papers/2602.14721","published":"2026-02-16","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Qwen","agent"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org 
papers page https://huggingface.co/Qwen/papers"}},{"id":"apple:karpqp4socaxhfw8qd3mytf3","title":"Asynchronous Verified Semantic Caching for Tiered LLM Architectures","url":"https://machinelearning.apple.com/research/semantic-caching","published":"2026-02-16","authors":["Asmit Kumar Singh","Haozhe Wang","Laxmi Naga Santosh Attaluri","Tak Chiam","Weihua Zhu"],"abstract":"Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workflows, making semantic caching essential for reducing inference cost and latency. Production deployments typically use a tiered static-dynamic design: a static cache of curated, offline vetted responses mined from logs, backed by a dynamic cache populated online. In practice, both tiers are commonly governed by a single embedding similarity threshold,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2512.12716","title":"CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning","url":"http://arxiv.org/abs/2512.12716","published":"2026-02-16","authors":["X. L. Liu","Jianglun Feng","Zhuoran Zhuang","Junzhe Zhao","Maofei Que","Jieting Li","Dianlei Wang","Hao Tong","Ye Chen","Pan Li"],"abstract":"Large Language Model (LLM) agents trained with reinforcement learning (RL) show great promise for solving complex, multi-step tasks. However, their performance is often crippled by ''Context Explosion'', where the accumulation of long text outputs overwhelms the model's context window and leads to reasoning failures. 
To address this, we introduce CoDA, a Context-Decoupled hierarchical Agent, a simple but effective reinforcement learning framework that decouples high-level planning from low-level execution. It employs a single, shared LLM backbone that learns to operate in two distinct, contextually isolated roles: a high-level Planner that decomposes tasks within a concise strategic context, and a low-level Executor that handles tool interactions in an ephemeral, isolated workspace. We train this unified agent end-to-end using PECO (Planner-Executor Co-Optimization), a reinforcement lear...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3777986","openalex_id":"https://openalex.org/W4417449994","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","agent"],"author_affiliations":["Alibaba Group (China)","Georgia Institute of Technology"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.902899980545044},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8037999868392944},{"id":"https://openalex.org/C2776999362","display_name":"Planner","score":0.6518999934196472},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.646399974822998},{"id":"https://openalex.org/C180591056","display_name":"Executor","score":0.6294000148773193},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6104999780654907},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.47870001196861267},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.4611000120639801}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":0}},{"id":"arxiv:2503.09382","title":"Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs","url":"http://arxiv.org/abs/2503.09382","published":"2026-02-16","authors":["Jiani Huang","Shijie Wang","Liangbo Ning","Wenqi Fan","Shuaiqiang Wang","Dawei Yin","Qing Li"],"abstract":"Recommender systems (RecSys) are widely used across various modern digital platforms and have garnered significant attention. Traditional recommender systems usually focus only on fixed and simple recommendation scenarios, making it difficult to generalize to new and unseen recommendation tasks in an interactive paradigm. Recently, the advancement of large language models (LLMs) has revolutionized the foundational architecture of RecSys, driving their evolution into more intelligent and interactive personalized recommendation assistants. However, most existing studies rely on fixed task-specific prompt templates to generate recommendations and evaluate the performance of personalized assistants, which limits the comprehensive assessments of their capabilities. 
This is because commonly used datasets lack high-quality textual user queries that reflect real-world recommendation scenarios, m...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3777954","openalex_id":"https://openalex.org/W4415102543","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","personalized"],"author_affiliations":["Baidu (China)","Hong Kong Polytechnic University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7936999797821045},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7922999858856201},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7541000247001648},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.6122999787330627},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5759000182151794},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.42410001158714294},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4007999897003174},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38690000772476196}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2508.15281","title":"MMQ: Multimodal Mixture-of-Quantization Tokenization for Semantic ID Generation and User Behavioral Adaptation","url":"http://arxiv.org/abs/2508.15281","published":"2026-02-16","authors":["Yi Xu","Moyu Zhang","Chenxuan Li","Zhihao Liao","Haibo Xing","Hao Deng","Jinxin Hu","Yu Zhang","Xiaoyi Zeng","Jing Zhang"],"abstract":"Recommender systems traditionally represent items using unique identifiers 
(ItemIDs), but this approach struggles with large, dynamic item corpora and sparse long-tail data, limiting scalability and generalization. Semantic IDs, derived from multimodal content such as text and images, offer a promising alternative by mapping items into a shared semantic space, enabling knowledge transfer and improving recommendations for new or rare items. However, existing methods face two key challenges: (1) balancing cross-modal synergy with modality-specific uniqueness, and (2) bridging the semantic-behavioral gap, where semantic representations may misalign with actual user preferences. To address these challenges, we propose Multimodal Mixture-of-Quantization (MMQ), a two-stage framework that trains a novel multimodal tokenizer. First, a shared-specific tokenizer leverages a multi-expert architectu...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3773966.3777923","openalex_id":"https://openalex.org/W4416050722","cited_by_count":1,"quality_score":46,"matched_keywords":["retrieval","quantization"],"author_affiliations":["Alibaba Group (China)","Beihang University","Peking University","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8658999800682068},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.5978999733924866},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5572999715805054},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5009999871253967},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.47450000047683716},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.44679999351501465},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.39890000224113464},{"id":"https://openalex.org/C95623464","display_name":"Classifier (UML)","score":0.38960000872612}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7129029224","title":"Third Workshop on Generative AI for Recommender Systems and Personalization","url":"https://doi.org/10.1145/3773966.3778018","published":"2026-02-16","authors":["Narges Tabari","Aniket Deshmukh","Wang-Cheng Kang","Julian McAuley","James Caverlee","Neil Shah","George Karypis"],"abstract":"Building personalized recommender systems and search experiences is a cornerstone of the modern data mining and applied machine learning (ML) community. Modern online platforms have a confluence of data including user-item interaction graphs, user and item-associated semantics (text, visual content, etc.), and metadata. Recent advancements in generative models and semantic encoders via large language models (LLMs), visual and audio encoders have significantly impacted research in relevant domains, enabling new directions in knowledge discovery and ability of models to better incorporate semantic context. These techniques are quickly advancing in the academic sphere, and adoption in industrial environments is growing. These advances force large questions about the future of search, recommendation and personalized experiences in the future. 
This workshop bridges the research gap between th...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3778018","openalex_id":"https://openalex.org/W7129029224","cited_by_count":0,"quality_score":45,"matched_keywords":["personalized","personalization"],"author_affiliations":["Google (United States)","Santa Clara University","Snap (United States)","Texas A&M University","University of California San Diego","University of Minnesota"],"concepts":[{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.7955999970436096},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7425000071525574},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6337000131607056},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6100999712944031},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5914999842643738},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5145999789237976},{"id":"https://openalex.org/C2780616401","display_name":"Cornerstone","score":0.4918999969959259},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4814999997615814}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128996139","title":"The Future of Personalized Universal Assistant","url":"https://doi.org/10.1145/3773966.3778027","published":"2026-02-16","authors":["Ed H."],"abstract":"We've moved way beyond the old days of building discovery, recommendation, decision support, and other AI tools using traditional ML and pattern recognition techniques. 
The future of universal personal assistance for discovery and learning is upon us. How will multimodality image, video, and audio understanding, and reasoning abilities of large foundation models change how we build these systems? I will shed some initial light on this topic by discussing 3 trends: First, the move to a single multimodal large model with reasoning abilities; Second, the fundamental research on personalization and user alignment; Third, the combination of System 1 and System 2 cognitive abilities into a single universal assistant.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3778027","openalex_id":"https://openalex.org/W7128996139","cited_by_count":0,"quality_score":45,"matched_keywords":["personalized","personalization"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.8105000257492065},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6538000106811523},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.578000009059906},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.49900001287460327},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.36340001225471497},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.361299991607666},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35370001196861267},{"id":"https://openalex.org/C67712803","display_name":"User modeling","score":0.3160000145435333}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7129076650","title":"Privacy-preserved LLM Cascade via CoT-enhanced Policy Learning","url":"https://doi.org/10.1145/3773966.3777920","published":"2026-02-16","authors":["Kai Zhang","Conchao Wang","Liqian Peng","Alec Go","Xiaozhong Liu"],"abstract":"Large Language Models (LLMs) have attracted significant attention for on-device applications, delivering strong performance across a variety of real-world tasks. However, hardware constraints on edge devices limit model capacity, often resulting in suboptimal performance. A promising remedy is LLM cascading, where a lightweight local model defers selected hard queries to a more capable server model for response generation. While prior work has primarily optimized the performance--cost trade-off, real-world deployments must also address privacy concerns i.e., user information leakage, a requirement that remains largely overlooked. In this work, we go beyond existing confidence- and logit-based cascade methods and propose P3Defer, a novel Chain-of-Thought (CoT)-enhanced policy learning framework coupled with a private memory for privacy-preserved deferral decision-making. 
By jointly optimi...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3777920","openalex_id":"https://openalex.org/W7129076650","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","memory"],"author_affiliations":["Google (United States)","Worcester Polytechnic Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7781999707221985},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.722000002861023},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6604999899864197},{"id":"https://openalex.org/C34146451","display_name":"Cascade","score":0.5602999925613403},{"id":"https://openalex.org/C27286358","display_name":"Information cascade","score":0.4417000114917755},{"id":"https://openalex.org/C151201525","display_name":"Limit (mathematics)","score":0.4339999854564667},{"id":"https://openalex.org/C162307627","display_name":"Enhanced Data Rates for GSM Evolution","score":0.39649999141693115},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3946000039577484}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128991447","title":"Pre-training Language Model for Friend Recommendation: A Case Study of Large Social Graph","url":"https://doi.org/10.1145/3773966.3784972","published":"2026-02-16","authors":["Bin Ren","Wei Sun","Shuqi Feng","Yunzhong He","Jun Xiao"],"abstract":"Despite the demonstrated success of Pre-trained Language Models (PLM) in enhancing various recommendation tasks, their performance in friend recommendation systems, which are heavily influenced by complex social network dynamics, 
remains unproven. On the other hand, for social networks like Facebook, graph-based machine learning approaches in friend suggestion, like Graph Neural Networks (GNNs), struggle with the computational demands posed by social graphs. In is paper we investigate the application of PLM and GNN in the context of PYMK features within Facebook's social graph. We propose a representation learning scheme that captures both the local structure and semantic content within a second-degree connection space. Our approach leverages pre-trained transformer models with a feature aggregation scheme that enables efficient node representation learning in large social networks. We d...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3784972","openalex_id":"https://openalex.org/W7128991447","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","efficient"],"author_affiliations":["BC Platforms (Finland)","Bellevue College","Meta (United Kingdom)","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7588000297546387},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5604000091552734},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.527899980545044},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.48159998655319214},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4578999876976013},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.4510999917984009},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.44999998807907104},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.41999998688697815}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7129069124","title":"How Do LLM-Generated Texts Impact Term-Based Retrieval Models?","url":"https://doi.org/10.1145/3773966.3777988","published":"2026-02-16","authors":["Wei Huang","Keping Bi","Yinqiong Cai","Wei Chen","Jiafeng Guo","Xueqi Cheng"],"abstract":"As more content generated by large language models (LLMs) floods into the Internet, information retrieval (IR) systems now face the challenge of distinguishing and handling a blend of human-authored and machine-generated texts. Recent studies suggest that neural retrievers may exhibit a preferential inclination toward LLM-generated content, while classic term-based retrievers like BM25 tend to favor human-written documents. This paper investigates the influence of LLM-generated content on term-based retrieval models, which are valued for their efficiency and robust generalization across domains. Our linguistic analysis reveals that LLM-generated texts exhibit smoother high-frequency and steeper low-frequency Zipf slopes, higher term specificity, and greater document-level diversity. 
These traits are aligned with LLMs being trained to optimize reader experience through diverse and precise...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3777988","openalex_id":"https://openalex.org/W7129069124","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Baidu (China)","Institute of Computing Technology","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6780999898910522},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.642799973487854},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5504000186920166},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.5407999753952026},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49790000915527344},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4611000120639801},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.4350999891757965},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4205000102519989}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7129015434","title":"Unsupervised Dense Retrieval with Conterfactual Contrastive Learning","url":"https://doi.org/10.1145/3773966.3778015","published":"2026-02-16","authors":["Haitian Chen","Qingyao Ai","Yujia Zhou","Xiao Wang","YiquN Liu","Lin Fen","Qin Liu"],"abstract":"Efficiently retrieving a concise set of candidates from a large doc- ument corpus remains a pivotal challenge 
in Information Retrieval (IR). Neural retrieval models, particularly dense retrieval models built with transformers and pretrained language models, have been popular due to their superior performance. However, criticisms have also been raised on their lack of explainability and vulnerability to adversarial attacks. In response to these challenges, we propose to improve the robustness of dense retrieval models by enhancing their sensitivity of fine-grained relevance signals. A model achieving sensitivity in this context should exhibit high variances when doc- uments' key passages determining their relevance to queries have been modified, while maintaining low variances for other changes in irrelevant passages. This sensitivity allows a dense retrieval model to produce robust resul...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3778015","openalex_id":"https://openalex.org/W7129015434","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Tencent (China)","Tsinghua University","University of International Business and Economics"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7343000173568726},{"id":"https://openalex.org/C108650721","display_name":"Counterfactual thinking","score":0.6945000290870667},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6834999918937683},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6351000070571899},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5719000101089478},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.43799999356269836},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.3553999960422516},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.3244999945163727}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7129065452","title":"MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding","url":"https://doi.org/10.1145/3773966.3777958","published":"2026-02-16","authors":["Daoze Zhang","Chenghan Fu","Zhanheng Nie","Jianyu Liu","Wanxian Guan","Yuan Gao","Jun Song","Pengjie Wang","Jian Xu","Bo Zheng"],"abstract":"With the rapid advancement of e-commerce, exploring general representations rather than task-specific ones has attracted increasing research attention. For product understanding, although existing discriminative dual-flow architectures drive progress in this field, they inherently struggle to model the many-to-one alignment between multiple images and texts of products. Therefore, we argue that generative Multimodal Large Language Models (MLLMs) hold significant potential for improving product representation learning. Nevertheless, achieving this goal still remains non-trivial due to several key challenges: the lack of multimodal and aspect-aware modeling modules in typical LLMs; the common presence of background noise in product images; and the absence of a standard benchmark for evaluation. 
To address these issues, we propose the first generative MLLM-based model named MOON for product...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3777958","openalex_id":"https://openalex.org/W7129065452","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7032999992370605},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.609499990940094},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5777000188827515},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5343000292778015},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.5152000188827515},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5055999755859375},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.48170000314712524},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42419999837875366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7129013692","title":"Code LLMs Still Fall Short of Top Programmers: Evaluating Algorithmic Code Generation Through Computational Thinking","url":"https://doi.org/10.1145/3773966.3778008","published":"2026-02-16","authors":["Shisong Chen","Ziyu Zhou","Yicong Zhao","Chengyi Yang","Zhixu Li","Yanghua Xiao","Xin Lin","Xiaojun Meng","Jiansheng Wei","Kuien Liu"],"abstract":"Evaluating the coding capabilities of models through algorithmic code generation is 
challenging, as it requires deep problem understanding and complex algorithm design. Current benchmarks suffer from a narrow focus on final execution results (such as pass@k), neglecting the crucial reasoning and problem-solving processes inherent in code generation. To address this limitation, we introduce a multi-phase algorithmic code generation benchmark, MUPA, structured around human computational thinking. MUPA dissects the evaluation into four distinct phases: example understanding, algorithm selection, solution description, and code generation. This framework facilitates a comprehensive assessment by providing insights into the model's intermediate problem-solving steps, rather than just the final code. We manually curated 197 high-quality competitive programming problems from Codeforces. Utilizin...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3778008","openalex_id":"https://openalex.org/W7129013692","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Chinese Academy of Sciences","East China Normal University","Fudan University","Huawei Technologies (China)","Institute of Software","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.736299991607666},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.6682999730110168},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6144999861717224},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5533000230789185},{"id":"https://openalex.org/C192209626","display_name":"Focus 
(optics)","score":0.4968000054359436},{"id":"https://openalex.org/C185874996","display_name":"Interdependence","score":0.4936000108718872},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.4625999927520752},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.35199999809265137}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2512.13396","title":"Automated Information Flow Selection for Multi-scenario Multi-task Recommendation","url":"http://arxiv.org/abs/2512.13396","published":"2026-02-16","authors":["Chaohua Yang","Dugang Liu","Shiwei Li","Yuwen Fu","Xing Tang","Weihong Luo","Xiangyu Zhao","Xiuqiang He","Zhong Ming"],"abstract":"Multi-scenario multi-task recommendation (MSMTR) systems must address recommendation demands across diverse scenarios while simultaneously optimizing multiple objectives, such as click-through rate and conversion rate. Existing MSMTR models typically consist of four information units: scenario-shared, scenario-specific, task-shared, and task-specific networks. These units interact to generate four types of relationship information flows, directed from scenario-shared or scenario-specific networks to task-shared or task-specific networks. However, these models face two main limitations: 1) They often rely on complex architectures, such as mixture-of-experts (MoE) networks, which increase the complexity of information fusion, model size, and training cost. 
2) They extract all available information flows without filtering out irrelevant or even harmful content, introducing potential noise.....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3777992","openalex_id":"https://openalex.org/W4417459060","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["City University of Hong Kong","Huazhong University of Science and Technology","Shenzhen Technology University","Shenzhen University","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7940999865531921},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.6552000045776367},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5950000286102295},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5396999716758728},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5343000292778015},{"id":"https://openalex.org/C2779136372","display_name":"Information flow","score":0.5249999761581421},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4977000057697296},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.45820000767707825}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7129001209","title":"Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model","url":"https://doi.org/10.1145/3773966.3777970","published":"2026-02-16","authors":["Bencheng Yan","Shilei Liu","Zhiyuan Zeng","Zihao 
Wang","Yizhen Zhang","Yujin Yuan","Langming Liu","Jiaqi Liu","Di Wang","Wenbo Su","Pengjie Wang","Jian Xu"],"abstract":"Recent advancements in autoregressive Large Language Models (LLMs) have achieved remarkable progress, largely driven by their scalability—commonly formalized as the scaling law. Inspired by these successes, there has been growing interest in adapting LLMs to recommendation systems (RecSys) by reformulating recommendation tasks as generative sequence modeling problems. However, existing End-to-End Generative Recommendation (E2E-GR) methods often sacrifice the practical advantages of traditional Deep Learning-based Recommendation Models (DLRMs)—including mature feature engineering, modular architectures, and production-grade optimization practices. This trade-off introduces critical challenges that hinder the effective application of scaling laws in industrial RecSys. In this paper, we present Large User Model (LUM), a scalable and production-aware framework that bridges the gap between ge...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773966.3777970","openalex_id":"https://openalex.org/W7129001209","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.703499972820282},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.652400016784668},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6396999955177307},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.569100022315979},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility 
(engineering)","score":0.5590999722480774},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5199000239372253},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5031999945640564},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.49300000071525574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/experiential-reinforcement-learning","title":"Experiential Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/experiential-reinforcement-learning/","published":"2026-02-15","authors":["Taiwei Shi","Sihao Chen","Bowen Jiang","Linxin Song","Longqi Yang","Jieyu Zhao"],"abstract":"Reinforcement learning has become the central approach for language models (LMs) to learn from environmental reward or feedback. In practice, the environmental feedback is usually sparse and delayed. Learning from such signals is challenging, as LMs must implicitly infer how observed failures should translate into behavioral changes for future iterations. We introduce Experiential Reinforcement Learning (ERL), a training paradigm that embeds an explicit experience-reflection-consolidation loop into the reinforcement learning process. Given a task, the model generates an initial attempt, receives environmental feedback, and produces a reflection that guides a refined second attempt, whose success is reinforced and internalized into the base policy. 
This process converts feedback into structured behavioral revision, improving exploration and stabilizing optimization while preserving gains....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","large language models","Reinforcement learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2602.14293","title":"KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning","url":"https://huggingface.co/papers/2602.14293","published":"2026-02-15","authors":["Kris Shengjun Dong","Sahil Modi","Dima Nikiforov","Sana Damani","Edward Lin","Siva Kumar Sastry Hari","Christos Kozyrakis"],"abstract":"Optimizing CUDA code across multiple generations of GPU architectures is challenging, as achieving peak performance requires an extensive exploration of an increasingly complex, hardware-specific optimization space. Traditional compilers are constrained by fixed heuristics, whereas finetuning Large Language Models (LLMs) can be expensive. However, agentic workflows for CUDA code optimization have limited ability to aggregate knowledge from prior exploration, leading to biased sampling and suboptimal solutions. We propose KernelBlaster, a Memory-Augmented In-context Reinforcement Learning (MAIC-RL) framework designed to improve CUDA optimization search capabilities of LLM-based GPU coding agents. 
KernelBlaster enables agents to learn from experience and make systematically informed decisions on future tasks by accumulating knowledge into a retrievable Persistent CUDA Knowledge Base. We pr...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","memory"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"apple:phrgm7wkwjmg4exjxgyroy4u","title":"A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation","url":"https://machinelearning.apple.com/research/controlled-experimentation","published":"2026-02-13","authors":["Russ Webb","Jason Ramapuram"],"abstract":"What research can be pursued with small models trained to complete true programs? Typically, researchers study program synthesis via large language models (LLMs) which introduce issues such as knowing what is in or out of distribution, understanding fine-tuning effects, understanding the effects of tokenization, and higher demand on compute and storage to carry out experiments. 
We present a system called Cadmus which includes an integer virtual...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2602.12735","title":"VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph","url":"https://arxiv.org/abs/2602.12735","published":"2026-02-13","authors":["Qiuchen Wang","Shihang Wang","Yu Zeng","Qiang Zhang","Fanrui Zhang","Zhuoning Guo","Bosi Zhang","Wenxuan Huang","Lin yang Chen","Zehui Chen","Pengjun Xie","Ruixue Ding"],"abstract":"Effectively retrieving, reasoning, and understanding multimodal information remains a critical challenge for agentic systems. Traditional Retrieval-augmented Generation (RAG) methods rely on linear interaction histories, which struggle to handle long-context tasks, especially those involving information-sparse yet token-heavy visual data in iterative reasoning scenarios. To bridge this gap, we introduce VimRAG, a framework tailored for multimodal Retrieval-augmented Reasoning across text, images, and videos. Inspired by our systematic study, we model the reasoning process as a dynamic directed acyclic graph that structures the agent states and retrieved multimodal evidence. 
Building upon this structured memory, we introduce a Graph-Modulated Visual Memory Encoding mechanism, with which the significance of memory nodes is evaluated via their topological position, allowing the model to dyn...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7129180369","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","retrieval","agent"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8185999989509583},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.5547999739646912},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49140000343322754},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4586000144481659},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.4537000060081482},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.4440000057220459},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4408999979496002},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.43160000443458557}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128809324","title":"M2HF: Multi-Branch Multi-Modal Hybrid Fusion for Text-Video Retrieval","url":"https://doi.org/10.26599/cvm.2025.9450444","published":"2026-02-13","authors":["Shuo Liu","Weize QUAN","Ming Zhou","Sihong Chen","Jian Kang","Zhe Zhao","Kimmo Yan","Chen Chen","Dong-Ming Yan"],"abstract":"Videos contain multi-modal content, and 
exploring multi-branch cross-modal interactions with natural language queries can be of benefit to the text-video retrieval task (TVR). However, recent methods applying the large-scale pre-trained CLIP model for TVR only focus on visual cues in videos. Furthermore, traditional methods of simply concatenating multimodal features do not exploit fine-grained cross-modal information in videos. In this paper, we propose a multi-branch multi-modal hybrid fusion (M2HF) network to hierarchically explore interaction between text queries and other modality content in videos. Specifically, M2HF first fuses visual features extracted by CLIP with audio and motion features extracted from videos to obtain fused audio-visual features and motion-visual features respectively. The multi-modal completion problem is also considered and solved in this process. Then, vis...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.26599/cvm.2025.9450444","openalex_id":"https://openalex.org/W7128809324","cited_by_count":2,"quality_score":47,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Chinese Academy of Sciences","Donghua University","Shandong Institute of Automation","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8669000267982483},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6025000214576721},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.5679000020027161},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5633999705314636},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.5583999752998352},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.4853000044822693},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.436599999666214},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.421999990940094}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7128790372","title":"Can LLMs Be Effective Sensor Processing Copilots?","url":"https://doi.org/10.1109/jiot.2026.3664751","published":"2026-02-13","authors":["Pengrui Quan","Xiaomin Ouyang","Jeya Vikranth Jeyakumar","Ziqi Wang","Yang Xing","Mani Srivastava"],"abstract":"Effective sensor data processing is critical for cyber-physical and IoT systems but often requires specialized expertise. While Large Language Models (LLMs) show promise as autonomous copilots for sensor processing, their capabilities remain underexplored. We introduce SensorBench, the first comprehensive benchmark for evaluating LLMs across diverse real-world sensor datasets and tasks. SensorBench evaluates three paradigms for leveraging LLMs in sensing tasks: Tool-Augmented Coding (TAC), Standalone Coding (SAC), and Direct Answer (DA). We evaluate 8 leading LLM variants, including 2 Large Reasoning Models (LRMs) and 2 domain-specific LLMs, providing a structured reference for absolute performance, latency, and resource requirements. 
Our analysis reveals that: (1) TAC significantly outp...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jiot.2026.3664751","openalex_id":"https://openalex.org/W7128790372","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Hong Kong University of Science and Technology","Nvidia (United States)","Qualcomm (United States)","UCLA Health"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7570000290870667},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.710099995136261},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.5497999787330627},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.4007999897003174},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.382099986076355},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.35420000553131104},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3490999937057495},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.3199999928474426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128778479","title":"City-Scale Lane-Level Mapping From Crowdsourced Trajectories and Satellite Imagery","url":"https://doi.org/10.1109/lra.2026.3664665","published":"2026-02-13","authors":["Guangwei Liu","Dazhi Zhang","Chengjian Xu","Xiaoyu Zhang","Zichao Zhang","Ji Zhao","Zheng Wu","Jian Zhang"],"abstract":"Lane-level maps are increasingly preferred over Standard-Definition (SD) and High-Definition 
(HD) maps, offering a better trade-off among detail richness, coverage breadth, and data freshness. However, constructing city-scale lane-level maps remains time-consuming and labor-intensive. To address these challenges, this letter presents an automated mapping framework that fuses crowdsourced trajectories with satellite imagery to enable scalable and accurate map generation. Our approach begins by mining billions of trajectories to extract the geometric and topological structure of road networks. To enrich feature representation, we introduce an effective multimodal fusion mechanism that integrates trajectory data with satellite images, leveraging the complementary strengths of both modalities. Furthermore, a spatiotemporal prior-fusion decoding strategy is proposed to enhance the accuracy an...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2026.3664665","openalex_id":"https://openalex.org/W7128778479","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","Chinese University of Hong Kong","Oldham Council","Shenzhen Institutes of Advanced Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.72079998254776},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6104000210762024},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.5498999953269958},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5317000150680542},{"id":"https://openalex.org/C2778102629","display_name":"Satellite imagery","score":0.5295000076293945},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.45719999074935913},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.44209998846054077},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.4284999966621399}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-llm-reasoning-beyond-correctness-and-cot","title":"Evaluating LLM Reasoning Beyond Correctness and CoT","url":"https://www.microsoft.com/en-us/research/publication/evaluating-llm-reasoning-beyond-correctness-and-cot/","published":"2026-02-12","authors":["Soheil Abbasloo"],"abstract":"What does it truly mean for a language model to “reason”? Current evaluations reward models’ correct standalone answers—but correctness alone reveals little about the process that produced them. We argue that reasoning should be understood not as a static chain of steps but as a dynamic trajectory in which ideas interact, clash, and evolve into integrated insights. Building on the philosophical tradition of dialectics, we introduce SIEV, a structured evaluation framework that assesses reasoning through explicit thesis–antithesis–synthesis interactions. 
SIEV produces interpretable trajectories that highlight key properties of reasoning—robustness to challenge, adaptability under conflict, and synthesis across competing viewpoints—dimensions that conventional correctness-based metrics cannot capture. Empirical results on GSM and MMLU demonstrate substantial gaps in the reasoning abilities....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","large language models","Logical reasoning","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1391","title":"Thinking with Drafting: Optical Decompression via Logical Reconstruction","url":"https://seed.bytedance.com/en/research/thinking-with-drafting-optical-decompression-via-logical-reconstruction","published":"2026-02-12","authors":["Jingxuan Wei","Honghao He","Caijun Jia","Siyuan Li","Zheng Sun","Yuhang Xu","Yuanyuan Lin","Linzhuang Sun","Yuchen Wu","Bihui Yu","Xiangxiang Zhang","Cheng Tan"],"abstract":"Existing multimodal large language models have achieved high-fidelity visual perception and exploratory visual generation. However, a precision paradox persists in complex reasoning tasks: optical perception systems transcribe symbols without capturing logical topology, while pixel-based generative models produce visual artifacts lacking mathematical exactness. 
To bridge this gap, we propose that reasoning over visual inputs be reconceptualized as optical decompression-the process of reconstructing latent logical structures from compressed visual tokens. Guided by the axiom that Parsing is Reasoning, we introduce Thinking with Drafting (TwD), which utilizes a minimalist Domain-Specific Language (DSL) as a grounding intermediate representation. Unlike standard approaches that hallucinate answers directly, TwD forces the model to draft its mental model into executable code, rendering deter...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computation and Language","Application","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:tencent:2602.12192","title":"Query-focused and Memory-aware Reranker for Long Context Processing","url":"https://huggingface.co/papers/2602.12192","published":"2026-02-12","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","memory"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2602.12125","title":"Learning beyond 
Teacher: Generalized On-Policy Distillation with Reward Extrapolation","url":"https://huggingface.co/papers/2602.12125","published":"2026-02-12","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","distillation"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:tencent:2602.12108","title":"The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context","url":"https://huggingface.co/papers/2602.12108","published":"2026-02-12","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"apple:rb5iup36xfftgo6xiqkcsr8c","title":"Mapping the Design Space of User Experience for Computer Use Agents","url":"https://machinelearning.apple.com/research/mapping","published":"2026-02-12","authors":["Ruijia Cheng","Jenny T. 
Liang","Eldon Schoop","Jeffrey Nichols"],"abstract":"Large language model (LLM)-based computer use agents execute user commands by interacting with available UI elements, but little is known about how users want to interact with these agents or what design factors matter for their user experience (UX). We conducted a two-phase study to map the UX design space for computer use agents. In Phase 1, we reviewed existing systems to develop a taxonomy of UX considerations, then refined it through...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"hf-org-paper:Tencent-Hunyuan:2602.12036","title":"Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models","url":"https://huggingface.co/papers/2602.12036","published":"2026-02-12","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"apple:hqp0f6zftpm3twy106jxctoz","title":"Trace Length is a Simple Uncertainty Signal in Reasoning 
Models","url":"https://machinelearning.apple.com/research/trace-length","published":"2026-02-12","authors":["Siddhartha Devic","Charlotte Peale","Arwen Bradley","Sinead Williamson","Preetum Nakkiran","Aravind Gollakota"],"abstract":"Uncertainty quantification for LLMs is a key research direction towards addressing hallucination and other issues that limit their reliable deployment. In this work, we show that reasoning trace length is a simple and useful confidence estimator in large reasoning models. Through comprehensive experiments across multiple models, datasets, and prompts, we show that trace length performs in comparable but complementary ways to other zero-shot...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2602.13571","title":"LLM-confidence reranker: A training-free approach for enhancing retrieval-augmented generation systems","url":"http://arxiv.org/abs/2602.13571","published":"2026-02-12","authors":["Zhipeng Song","Xiangyu Kong","Xinrui Bao","Yizhi Zhou","Jiulong Jiao","Sitong Liu","Yuhang Zhou","Heng Qi"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.eswa.2026.131627","openalex_id":"https://openalex.org/W7128691825","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Dalian Ocean University","Dalian University","Dalian University of Technology","Eastern Liaoning 
University","Liaoning Technical University","Qinghai University","Tencent (China)","Tianjin Normal University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8682000041007996},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.7479000091552734},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.6413999795913696},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5340999960899353},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5088000297546387},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.46619999408721924},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.4645000100135803},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43869999051094055}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2602.11551","title":"SIGHT: Reinforcement Learning with Self-Evidence and Information-Gain Diverse Branching for Search Agent","url":"http://arxiv.org/abs/2602.11551","published":"2026-02-12","authors":["Wenlin Zhong","Jinluan Yang","Yiquan Wu","Yi Liu","Jianhang Yao","Kun Kuang"],"abstract":"Reinforcement Learning (RL) has empowered Large Language Models (LLMs) to master autonomous search for complex question answering. However, particularly within multi-turn search scenarios, this interaction introduces a critical challenge: search results often suffer from high redundancy and low signal-to-noise ratios. Consequently, agents easily fall into \"Tunnel Vision,\" where the forced interpretation of early noisy retrievals leads to irreversible error accumulation. 
To address these challenges, we propose SIGHT, a framework that enhances search-based reasoning through Self-Evidence Support (SES) and Information-Gain Driven Diverse Branching. SIGHT distills search results into high-fidelity evidence via SES and calculates an Information Gain score to pinpoint pivotal states where observations maximally reduce uncertainty. This score guides Dynamic Prompting Interventions - including d...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7128864669","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Alibaba Group (China)","Communication University of Zhejiang","Runze (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7229999899864197},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6330999732017517},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.6191999912261963},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5618000030517578},{"id":"https://openalex.org/C1517167","display_name":"Sight","score":0.5065000057220459},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.46779999136924744},{"id":"https://openalex.org/C152124472","display_name":"Redundancy (engineering)","score":0.41940000653266907},{"id":"https://openalex.org/C125583679","display_name":"Search algorithm","score":0.4169999957084656}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154478173","title":"A Machine Learning-Based Approach to Real-Time Traffic Prediction using Large Language Models: 
Model Performance and Scalability Challenges","url":"https://doi.org/10.1109/ic3ecsbhi67834.2026.11469057","published":"2026-02-12","authors":["Chandra Nikitha Shatdarsanam","Srinivasa Rao Kurakula","Satyam Sheshansh","Balaji Salem Balasundram","Riaz Ahmed Mohammed Sait","Sivakoteswararao Katta"],"abstract":"Semantic and accurate real-time contextdependent traffic prediction is essential to develop smart city intelligent transportation systems, which can be used to improve route plan, decrease traffic jams and rapid response of emergency. Despite the partial success of classical machine learning and conventional deep learning paradigms, they failed to accurately model the complex traffic data with spatial and temporal patterns. In this paper, we introduce an innovative machine learning-based framework that exploits sequence modeling of large language models (LLMs) including fine-tuned GPT and LLaMA models on real-time traffic prediction. The framework represents multi-source traffic features to tokenized sequences, which enables LLMs to exploit long-range temporal dependencies and cross-location traffic flow interactions. 
We perform extensive experiments on the benchmark datasets METR-LA, PE...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ic3ecsbhi67834.2026.11469057","openalex_id":"https://openalex.org/W7154478173","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Community-Campus Partnerships for Health","EP Analytics (United States)","Jawaharlal Nehru Technological University, Kakinada","Oracle (United States)","Walmart (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6847000122070312},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5090000033378601},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38530001044273376},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3456999957561493},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.34529998898506165},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3165999948978424},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.31459999084472656},{"id":"https://openalex.org/C79403827","display_name":"Real-time computing","score":0.2867000102996826}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128735846","title":"Generative AI and Digital Ecosystem resilience: A Proactive Lifecycle-Based Survey","url":"https://doi.org/10.36227/techrxiv.177092219.91727099/v1","published":"2026-02-12","authors":["Jonghyun Chung","Rishabh Chaddha","Amanpreet Kaur","Debanshu Das","Sanket Badhe","Nathan Huang"],"abstract":"The proliferation of adversarial 
synthetic content, accelerated by Generative AI (GenAI) is rendering traditional reactive detection methods ineffective. This survey synthesizes emerging research to demonstrate a paradigm shift toward the proactive detection of emerging adversarial synthetic campaigns. In this survey, we adopt a unified, lifecycle-based taxonomy to combine socio-technical lifecycle models of adversarial campaigns with advanced computational methodologies for synthetic content cluster detection. By structuring the analysis around the C5 Interaction Model (Context, Causes, Content, Cycle of Amplification, Consequences), we integrate different research streams from machine learning and social science. To differentiate spread patterns of synthetic amplification from authentic baseline traffic, this paper surveys state-of-the-art techniques for modeling the creation, seeding,...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.36227/techrxiv.177092219.91727099/v1","openalex_id":"https://openalex.org/W7128735846","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.7487999796867371},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.698199987411499},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6086999773979187},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5963000059127808},{"id":"https://openalex.org/C2775945657","display_name":"Structuring","score":0.5655999779701233},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.557699978351593},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5059000253677368},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy (biology)","score":0.39430001378059387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128730772","title":"EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning","url":"https://doi.org/10.1007/s11263-025-02676-0","published":"2026-02-12","authors":["Yuehua Chen","Yuying Ge","Yixiao Ge","Mingyu Ding","Bing Li","Rui Wang","Ruifeng Xu","Ying Shan","Xihui Liu"],"abstract":"Abstract The pursuit of artificial general intelligence (AGI) has been accelerated by Multimodal Large Language Models (MLLMs), which exhibit superior reasoning, generalization capabilities, and proficiency in processing multimodal inputs. A crucial milestone in the evolution of AGI is the attainment of human-level planning, a fundamental ability for making informed decisions in complex environments, and solving a wide range of real-world problems. Despite the impressive advancements in MLLMs, a question remains: How far are current MLLMs from achieving human-level planning? To shed light on this question, we introduce EgoPlan-Bench, a comprehensive benchmark to evaluate the planning abilities of MLLMs in real-world scenarios from an egocentric perspective, mirroring human perception. 
EgoPlan-Bench emphasizes the evaluation of planning capabilities of MLLMs, featuring realistic tasks, di...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-025-02676-0","openalex_id":"https://openalex.org/W7128730772","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Peng Cheng Laboratory","Tencent (China)","University of California, Berkeley","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8227999806404114},{"id":"https://openalex.org/C189645446","display_name":"Mirroring","score":0.7479000091552734},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.7437999844551086},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7343999743461609},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6452999711036682},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5544999837875366},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5418000221252441},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5267000198364258}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154459827","title":"A Temperature-Scaled and Direction-Consistent Softmax Framework for Multimodal Medical Image Fusion","url":"https://doi.org/10.1109/ic3ecsbhi67834.2026.11469039","published":"2026-02-12","authors":["Siddham Jain","Vikram Yadav","Prabhishek Singh","Shailesh Bhosekar","Deepak Garg","Manoj Diwakar"],"abstract":"In this paper, a novel 
multimodal medical image fusion (M3IF) strategy (SWT-TSSF-DCS) that incorporates Stationary Wavelet Transform (SWT) and Temperature-Scaled Softmax Fusion (TSSF) for low-frequency (LF) coefficients and Direction-Consistency Softmax (DCS) for high-frequency (HF) coefficients is proposed. SWT-TSSF-DCS is compared with some traditional M3IF methods. The comparative experimental results on different medical modalities datasets indicate that the SWT-TSSF-DCS approach yields better fusion results. The visual analysis as well as quantitative analysis is done for results interpretation. Overall image quality using parameters like edge, texture, structure preservation etc are used for comparative quality preservation analysis. The quantitative analysis is done using Qab/f, Qy, Qcb, and Qs metrics. Also Orientation-Consistency Index (OCI) comparison of M3IF methods is done fo...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ic3ecsbhi67834.2026.11469039","openalex_id":"https://openalex.org/W7154459827","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Bennett University","Graphic Era University"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7024000287055969},{"id":"https://openalex.org/C188441871","display_name":"Softmax function","score":0.6514999866485596},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6014000177383423},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5343999862670898},{"id":"https://openalex.org/C69744172","display_name":"Image fusion","score":0.49140000343322754},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.43160000443458557},{"id":"https://openalex.org/C153180895","display_name":"Pattern 
recognition (psychology)","score":0.39329999685287476},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.295199990272522}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/optimizing-agent-planning-for-security-and-autonomy","title":"Optimizing Agent Planning for Security and Autonomy","url":"https://www.microsoft.com/en-us/research/publication/optimizing-agent-planning-for-security-and-autonomy/","published":"2026-02-11","authors":["Aashish Kolluri","Rishi Sharma","Manuel Costa","Boris Köpf","Tobias Niessen","Mark Russinovich","Shruti Tople","Santiago Zanella-Béguelin"],"abstract":"Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by enforcing confidentiality and integrity policies, but currently appear costly: they reduce task completion rates and increase token usage compared to probabilistic defenses. We argue that existing evaluations miss a key benefit of system-level defenses: reduced reliance on human oversight. We introduce autonomy metrics to quantify this benefit: the fraction of consequential actions an agent can execute without human-in-the-loop (HITL) approval while preserving security. To increase autonomy, we design a security-aware agent that (i) introduces richer HITL interactions, and (ii) explicitly plans for both task progress and policy compliance. 
We implement this agent design atop an existing information-flow...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","AI agents","Computer science","Security and Privacy","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1404","title":"When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning","url":"https://seed.bytedance.com/en/research/when-to-memorize-and-when-to-stop-gated-recurrent-memory-for-long-context-reasoning","published":"2026-02-11","authors":["Leheng Sheng","Yongtao Zhang","Wenchang Ma","Yaorui Shi","Ting Huang","Xiang Wang","An Zhang","Ke Shen","Tat-Seng Chua"],"abstract":"While reasoning over long context is crucial for various real-world applications, it remains challenging for large language models (LLMs) as they suffer from performance degradation as the context length grows. Recent work MemAgent has tried to tackle this by processing context chunk-by-chunk in an RNN-like loop and updating a textual memory for final answering. However, this naive recurrent memory update faces two crucial drawbacks: (i) memory can quickly explode because it can update indiscriminately, even on evidence-free chunks; and (ii) the loop lacks an exit mechanism, leading to unnecessary computation after even sufficient evidence is collected. To address these issues, we propose GRU-Mem, which incorporates two text-controlled gates for more stable and efficient long-context reasoning. 
Specifically, in GRU-Mem, the memory only updates when the update gate is open and the recurre...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computation and Language","LLM","arXiv","memory","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/temperature-as-a-meta-policy-adaptive-temperature-in-llm-reinforcement-learning","title":"Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/temperature-as-a-meta-policy-adaptive-temperature-in-llm-reinforcement-learning/","published":"2026-02-11","authors":["Haoran Dang","Cuiling Lan","Hai Wan","Xibin Zhao","Yan Lu"],"abstract":"Temperature is a crucial hyperparameter in large language models (LLMs), controlling the trade-off between exploration and exploitation during text generation. High temperatures encourage diverse but noisy outputs, while low temperatures produce focused outputs but may cause premature convergence. Yet static or heuristic temperature schedules fail to adapt to the dynamic demands of reinforcement learning (RL) throughout training, often limiting policy improvement. We propose Temperature Adaptive Meta Policy Optimization (TAMPO), a new framework that recasts temperature control as a learnable meta-policy. TAMPO operates through a hierarchical two-loop process. In the inner loop, the LLM policy is updated (e.g., using GRPO) with trajectories sampled at the temperature selected by the meta-policy. 
In the outer loop, meta-policy updates the distribution over candidate temperatures by rewardi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:stepfun-ai:2602.10604","title":"Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters","url":"https://huggingface.co/papers/2602.10604","published":"2026-02-11","authors":["StepFun"],"abstract":"We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. 
Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achievin...","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["HuggingFace org papers","stepfun-ai","preference","efficient","agent"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/testexplora-benchmarking-llms-for-proactive-bug-discovery-via-repository-level-test-generation","title":"TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation","url":"https://www.microsoft.com/en-us/research/publication/testexplora-benchmarking-llms-for-proactive-bug-discovery-via-repository-level-test-generation/","published":"2026-02-11","authors":["Steven Liu","Jane Luo","Xin Zhang","Aofan Liu","Hao Liu","J. Wu","Ziyang Huang","Yangyu Huang","Yu Kang","Scarlett Li"],"abstract":"Given that Large Language Models (LLMs) are increasingly applied to automate software development, comprehensive software assurance spans three distinct goals: regression prevention, reactive reproduction, and proactive discovery. Current evaluations systematically overlook the third goal. Specifically, they either treat existing code as ground truth (a compliance trap) for regression prevention, or depend on post-failure artifacts (e.g., issue reports) for bug reproduction-so they rarely surface defects before failures. To bridge this gap, we present TestExplora, a benchmark designed to evaluate LLMs as proactive testers within full-scale, realistic repository environments. 
TestExplora contains 2,389 tasks from 482 repositories and hides all defect-related signals. Models must proactively find bugs by comparing implementations against documentation-derived intent, using documentation as...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1397","title":"Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics","url":"https://seed.bytedance.com/en/research/reinforcing-chain-of-thought-reasoning-with-self-evolving-rubrics","published":"2026-02-11","authors":["Leheng Sheng","Wenchang Ma","Ruixin Hong","Xiang Wang","An Zhang","Tat-Seng Chua"],"abstract":"Despite chain-of-thought (CoT) playing crucial roles in LLM reasoning, directly rewarding it is difficult: training a reward model demands heavy human labeling efforts, and static RMs struggle with evolving CoT distributions and reward hacking. These challenges motivate us to seek an autonomous CoT rewarding approach that requires no human annotation efforts and can evolve gradually. Inspired by recent self-evolving training methods, we propose \\textbf{RLCER} (\\textbf{R}einforcement \\textbf{L}earning with \\textbf{C}oT Supervision via Self-\\textbf{E}volving \\textbf{R}ubrics), which enhances the outcome-centric RLVR by rewarding CoTs with self-proposed and self-evolving rubrics. 
We show that self-proposed and self-evolving rubrics provide reliable CoT supervision signals even without outcome rewards, enabling RLCER to outperform outcome-centric RLVR. Moreover, when used as in-prompt hints,...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Artificial Intelligence","LLM","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:1402","title":"BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation","url":"https://seed.bytedance.com/en/research/bagelvla-enhancing-long-horizon-manipulation-via-interleaved-vision-language-action-generation","published":"2026-02-11","authors":["Yucheng Hu","Jianke Zhang","Yuanfei Luo","Yanjiang Guo","Xiaoyu Chen","Xinshu Sun","Kun Feng","Qingzhou Lu","Sheng Chen","Yangang Zhang","Wei Li","Jianyu Chen"],"abstract":"Equipping embodied agents with the ability to reason about tasks, foresee physical outcomes, and generate precise actions is essential for general-purpose manipulation. While recent Vision-Language-Action (VLA) models have leveraged pre-trained foundation models, they typically focus on either linguistic planning or visual forecasting in isolation. These methods rarely integrate both capabilities simultaneously to guide action generation, leading to suboptimal performance in complex, long-horizon manipulation tasks. To bridge this gap, we propose BagelVLA, a unified model that integrates linguistic planning, visual forecasting, and action generation within a single framework. 
Initialized from a pretrained unified understanding and generative model, BagelVLA is trained to interleave textual reasoning and visual prediction directly into the action execution loop. To efficiently couple thes...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Robotics","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:cc0276cbcc881faa","title":"UniT: Unified Multimodal Chain-of-Thought Test-time Scaling","url":"https://ai.meta.com/research/publications/unit-unified-multimodal-chain-of-thought-test-time-scaling/","published":"2026-02-11","authors":["Leon Liangyu Chen","Haoyu Ma","Zhipeng Fan","Ziqi Huang","Animesh Sinha","Xiaoliang Dai","Jialiang Wang","Zecheng He","Jianwei Yang","Chunyuan Li","Junzhe Sun","Chu Wang"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=1"}},{"id":"bytedance-seed:1384","title":"Hydra-Nav: Object Navigation via Adaptive Dual-Process 
Reasoning","url":"https://seed.bytedance.com/en/research/hydra-nav-object-navigation-via-adaptive-dual-process-reasoning","published":"2026-02-10","authors":["Zixuan Wang","Huang Fang","Shaoan Wang","Yuanfei Luo","Heng Dong","Wei Li","Yiming Gan"],"abstract":"While large vision-language models (VLMs) show promise for object goal navigation, current methods still struggle with low success rates and inefficient localization of unseen objects--failures primarily attributed to weak temporal-spatial reasoning. Meanwhile, recent attempts to inject reasoning into VLM-based agents improve success rates but incur substantial computational overhead. To address both the ineffectiveness and inefficiency of existing approaches, we introduce Hydra-Nav, a unified VLM architecture that adaptively switches between a deliberative slow system for analyzing exploration history and formulating high-level plans, and a reactive fast system for efficient execution. We train Hydra-Nav through a three-stage curriculum: (i) spatial-action alignment to strengthen trajectory planning, (ii) memory-reasoning integration to enhance temporal-spatial reasoning over long-horiz...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Robotics","arXiv","memory","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"huawei-noah:161","title":"Zero-shot Model-based Reinforcement Learning using Large Language 
Models","url":"https://www.noahlab.com.hk/en/scientific_research/zero-shot-model-based-reinforcement-learning-using-large-language-models","published":"2026-02-10","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICML2025. External paper link: https://arxiv.org/abs/2410.11711","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Time Series AI","ICML2025","2025"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:163","title":"LiPM: Foundation Model for Lithium-Ion Battery Analysis","url":"https://www.noahlab.com.hk/en/scientific_research/lipm-foundation-model-for-lithium-ion-battery-analysis","published":"2026-02-10","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: KDD2025. 
External paper link: https://dl.acm.org/doi/10.1145/3711896.3737027","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Time Series AI","KDD2025","2025"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:138","title":"Dual Prompting Image Restoration with Diffusion Transformers","url":"https://www.noahlab.com.hk/en/scientific_research/dual-prompting-image-restoration-with-diffusion-transformers","published":"2026-02-10","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: CVPR 2025. External paper link: https://openaccess.thecvf.com/content/CVPR2025/papers/KongDualPromptingImageRestorationwithDiffusionTransformersCVPR2025paper.pdf","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Terminal intelligence","CVPR 2025","2025"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:160","title":"Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and 
Acting","url":"https://www.noahlab.com.hk/en/scientific_research/causal-aware-large-language-models-enhancing-decision-making-through-learning-adapting-and-acting","published":"2026-02-10","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: IJCAI2025. External paper link: https://arxiv.org/abs/2505.24710","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Time Series AI","IJCAI2025","2025"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:164","title":"CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models","url":"https://www.noahlab.com.hk/en/scientific_research/cat-causal-attention-tuning-for-injecting-fine-grained-causal-knowledge-into-large-language-models","published":"2026-02-10","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: EMNLP 2025. 
External paper link: https://arxiv.org/abs/2509.01535","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Time Series AI","EMNLP 2025","2025"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:162","title":"AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting","url":"https://www.noahlab.com.hk/en/scientific_research/adapts-adapting-univariate-foundation-models-to-probabilistic-multivariate-time-series-forecasting","published":"2026-02-10","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICML 2025. 
External paper link: https://arxiv.org/abs/2502.10235","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Time Series AI","ICML 2025","2025"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:142","title":"A^2 Flow: Automating Agentic Workflow Generation via Self-Adaptive Abstraction Operators","url":"https://www.noahlab.com.hk/en/scientific_research/a2flow-automating-agentic-workflow-generation-via-self-adaptive-abstraction-operators","published":"2026-02-10","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: AAAI 2026. 
External paper link: https://arxiv.org/abs/2511.20693","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Intelligent sensing integration","AAAI 2026","2026"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"hf-org-paper:tencent:2602.09823","title":"Covo-Audio Technical Report","url":"https://huggingface.co/papers/2602.09823","published":"2026-02-10","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2602.09829","title":"Internalizing Multi-Agent Reasoning for Accurate and Efficient LLM-based Recommendation","url":"http://arxiv.org/abs/2602.09829","published":"2026-02-10","authors":["Yang Wu","Hao Wang","Qian Li","Jun Zhang","Huan Yu","Jie Jiang"],"abstract":"Large Language Models (LLMs) are reshaping recommender systems by leveraging extensive world knowledge and semantic reasoning to interpret user intent. However, effectively integrating these capabilities with collaborative signals while avoiding prohibitive inference latency remains a critical bottleneck. 
To address this, we propose a trajectory-driven internalization framework to develop a Single-agent Trajectory-Aligned Recommender (STAR). Specifically, to internalize complex reasoning capabilities into a single efficient model, we first design a multi-agent teacher system capable of multi-turn tool usage and reflection. This teacher utilizes a Collaborative Signal Translation mechanism to explicitly convert latent behavioral patterns into descriptive natural language evidence to enhance reasoning accuracy. Subsequently, a trajectory-driven distillation pipeline transfers this agentic....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7128648442","cited_by_count":0,"quality_score":57,"matched_keywords":["LLM","efficient","distillation","agent","multi-agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8062999844551086},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6466000080108643},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6107000112533569},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5026999711990356},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.498199999332428},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4837999939918518},{"id":"https://openalex.org/C2779439875","display_name":"Natural language understanding","score":0.4327999949455261},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.38679999113082886}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"apple:cnrasx4grpzzr7kuoflgwhm0","title":"Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization","url":"https://machinelearning.apple.com/research/parallel-track","published":"2026-02-10","authors":["Chong Wang","Nan Du","Tom Gunter","Tao Lei","Kulin Seth","Senyu Tong","Jianyu Wang","Guoli Yin","Xiyou Zhou","Kelvin Zou","Ruoming Pang"],"abstract":"Efficient large-scale inference of transformer-based large language models (LLMs) remains a fundamental systems challenge, frequently requiring multi-GPU parallelism to meet stringent latency and throughput targets. Conventional tensor parallelism decomposes matrix operations across devices but introduces substantial inter-GPU synchronization, leading to communication bottlenecks and degraded scalability. We propose the Parallel Track (PT)...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2409.07253","title":"Alignment of Diffusion Models: Fundamentals, Challenges, and Future","url":"https://arxiv.org/abs/2409.07253","published":"2026-02-10","authors":["Buhua Liu","Shitong Shao","Bao Li","Lichen Bai","Zhiqiang Xu","Haoyi Xiong","James T. Kwok","Sumi Helal","Zeke Xie"],"abstract":"Diffusion models have emerged as the leading paradigm in generative modeling, excelling in various applications. Despite their success, these models often misalign with human intentions and generate results with undesired properties or even harmful content. 
Inspired by the success and popularity of alignment in tuning large language models, recent studies have investigated aligning diffusion models with human expectations and preferences. This work mainly reviews alignment of diffusion models, covering advancements in fundamentals of alignment, alignment techniques of diffusion models, preference benchmarks, and evaluation for diffusion models. Moreover, we discuss key perspectives on current challenges and promising future directions on solving the remaining challenges in alignment of diffusion models. To the best of our knowledge, our work is the first comprehensive review paper for re...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3796982","openalex_id":"https://openalex.org/W7128533478","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Hong Kong University of Science and Technology","Mohamed bin Zayed University of Artificial Intelligence","Tsinghua University","University of Bologna","University of Florida"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7603999972343445},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.6991000175476074},{"id":"https://openalex.org/C2780586970","display_name":"Popularity","score":0.6901999711990356},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5271999835968018},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.5047000050544739},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.49810001254081726},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.4878999888896942},{"id":"https://openalex.org/C539667460","display_name":"Management 
science","score":0.3869999945163727}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/aro-a-new-lens-on-matrix-optimization-for-large-models","title":"ARO: A New Lens On Matrix Optimization For Large Models","url":"https://www.microsoft.com/en-us/research/publication/aro-a-new-lens-on-matrix-optimization-for-large-models/","published":"2026-02-09","authors":["Wenbo Gong","Javier Zazo","Qijun Luo","Puqian Wang","James Hensman","Chao Ma"],"abstract":"Matrix-based optimizers have attracted growing interest for improving LLM training efficiency, with significant progress centered on orthogonalization/whitening based methods. While yielding substantial performance gains, a fundamental question arises: can we develop new paradigms beyond orthogonalization, pushing the efficiency frontier further? We present \\textbf{Adaptively Rotated Optimization (ARO}, a new matrix optimization framework that treats gradient rotation as a first class design principle. ARO accelerates LLM training by performing normed steepest descent in a rotated coordinate system, where the rotation is determined by a novel norm-informed policy. This perspective yields update rules that go beyond existing orthogonalization and whitening optimizers, improving sample efficiency in practice. 
To make comparisons reliable, we propose a rigorously controlled benchmarking pro...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","mathematics","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2602.09022","title":"WorldCompass: Reinforcement Learning for Long-Horizon World Models","url":"https://huggingface.co/papers/2602.09022","published":"2026-02-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:stepfun-ai:2602.09007","title":"GEBench: Benchmarking Image Generation Models as GUI 
Environments","url":"https://huggingface.co/papers/2602.09007","published":"2026-02-09","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"arxiv:2602.08377","title":"Reinforcement Learning with Backtracking Feedback","url":"https://arxiv.org/abs/2602.08377","published":"2026-02-09","authors":["Bilgehan Sel","Vaishakh Keshava","Phillip Wallis","Lukas Rutishauser","Ming Jin","Dingcheng Li"],"abstract":"Addressing the critical need for robust safety in Large Language Models (LLMs), particularly against adversarial attacks and in-distribution errors, we introduce Reinforcement Learning with Backtracking Feedback (RLBF). This framework advances upon prior methods, such as BSAFE, by primarily leveraging a Reinforcement Learning (RL) stage where models learn to dynamically correct their own generation errors. Through RL with critic feedback on the model's live outputs, LLMs are trained to identify and recover from their actual, emergent safety violations by emitting an efficient \"backtrack by x tokens\" signal, then continuing generation autoregressively. This RL process is crucial for instilling resilience against sophisticated adversarial strategies, including middle filling, Greedy Coordinate Gradient (GCG) attacks, and decoding parameter manipulations. 
To further support the acquisition....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7128554565","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C156884757","display_name":"Backtracking","score":0.8449000120162964},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7372999787330627},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.722000002861023},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.6567999720573425},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5386999845504761},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5105999708175659},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4876999855041504},{"id":"https://openalex.org/C2779585090","display_name":"Resilience (materials science)","score":0.4253000020980835}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128428985","title":"Trainable subnetworks reveal insights into structure knowledge organization in protein language models","url":"https://doi.org/10.1371/journal.pcbi.1013925","published":"2026-02-09","authors":["Ria Vinod","Ava P. Amini","Lorin Crawford","Kevin K. Yang"],"abstract":"Protein language models (PLMs) pretrained via a masked language modeling objective have proven effective across a range of structure-related tasks, including high-resolution structure prediction. 
However, it remains unclear to what extent these models factorize protein structural categories among their learned parameters. In this work, we introduce trainable subnetworks, which mask out the PLM weights responsible for language modeling performance on a structural category of proteins. We systematically trained 39 PLM subnetworks targeting both sequence- and residue-level features at varying degrees of resolution using annotations defined by the CATH taxonomy and secondary structure elements. Using these PLM subnetworks, we assessed how structural factorization in PLMs influences downstream structure prediction. Our results show that PLMs are highly sensitive to sequence-level features and...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1371/journal.pcbi.1013925","openalex_id":"https://openalex.org/W7128428985","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Brown University","Microsoft (United States)","John Brown University","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7303000092506409},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5669999718666077},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5351999998092651},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5156999826431274},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4239000082015991},{"id":"https://openalex.org/C187834632","display_name":"Factorization","score":0.3495999872684479},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.33000001311302185},{"id":"https://openalex.org/C116834253","display_name":"Identification 
(biology)","score":0.30239999294281006}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128377724","title":"AI-generated data contamination erodes pathological variability and diagnostic reliability","url":"https://doi.org/10.21203/rs.3.rs-8753102/v1","published":"2026-02-09","authors":["Dianbo Liu","Hongyu He","Shaowen Xiang","Ye Zhang","Yingtao Zhu","Jin Zhang","Yunyi Lu","Hao Deng","Emily Alsentzer","Yun Liu","Qingyu Chen","Kun‐Hsing Yu"],"abstract":"Abstract Generative artificial intelligence (AI) is rapidly populating medical records with synthetic or partially AI-generated content, creating a feedback loop where future models are increasingly at risk of training on uncurated AI-generated data. However, the clinical consequences of this AI-generated data contamination remain unexplored. Here, we show that in the absence of mandatory human verification, this self-referential cycle drives a rapid erosion of pathological variability and diagnostic reliability of medical data at population scale. By analysing more than 800,000 synthetic data points across clinical text generation, vision–language reporting, and medical image synthesis, we find that models progressively converge toward generic phenotypes regardless of the model architectures. 
Specifically, rare but critical findings, including pneumothorax and effusions, vanish from the...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-8753102/v1","openalex_id":"https://openalex.org/W7128377724","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","Harvard University","Massachusetts General Hospital","National University of Singapore","Singapore Eye Research Institute","Singapore National Eye Center","Stanford University","WinnMed","Yale University","Harvard University Press","Mayo Clinic in Arizona"],"concepts":[{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.5091999769210815},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49390000104904175},{"id":"https://openalex.org/C2908647359","display_name":"Population","score":0.48069998621940613},{"id":"https://openalex.org/C56666940","display_name":"Documentation","score":0.4431000053882599},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.4018000066280365},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39559999108314514},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.3750999867916107},{"id":"https://openalex.org/C43711488","display_name":"Skew","score":0.3481999933719635}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128424260","title":"StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation","url":"https://doi.org/10.1109/jstsp.2026.3662496","published":"2026-02-09","authors":["Chi Zhang","Yiwen Chen","Yijun Fu","Wei 
Cheng","Zhenglin Zhou","Wenjia Jiang","Zhibin Wang","Bin Fu","Tao Chen","Gang Yu","Guosheng Lin","Chenxi Song"],"abstract":"The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models. Nevertheless, the limited availability of diverse 3D resources presents significant challenges to learning. In this paper, we present a novel method for generating high-quality, stylized 3D avatars that utilizes pre-trained image-text diffusion models for data generation and a Generative Adversarial Network (GAN)-based 3D generation network for training. Our method leverages the comprehensive priors of appearance and geometry offered by image-text diffusion models to generate multi-view images of avatars in various styles. During data generation, we employ poses extracted from existing 3D models to guide the generation of multi-view images. To handle inaccurate pose annotations of stylized images, we investigate view-specific prompts and develop a coarse-to-fine d...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jstsp.2026.3662496","openalex_id":"https://openalex.org/W7128424260","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Fudan University","Nanyang Technological University","Tencent (China)","Westlake University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8277000188827515},{"id":"https://openalex.org/C38935604","display_name":"Stylized fact","score":0.7143999934196472},{"id":"https://openalex.org/C2779803651","display_name":"Discriminator","score":0.5996000170707703},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5680999755859375},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.4797999858856201},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.45910000801086426},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.4171000123023987},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.40700000524520874}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7128502050","title":"Multimodal Machine Learning Reveals the Genomic and Proteomic Architecture of Heart Failure with Preserved Ejection Fraction","url":"https://doi.org/10.64898/2026.02.07.26345811","published":"2026-02-09","authors":["J O'sullivan","Taedong Yun","Ruoyi Cai","David Amar","Tim Assimes","Akshay Chaudhari","Dan Say Kim","E D Lewis","Francois Haddad","Farhad Hormozdiari","J. Weston Hughes","Gabriel N. Mannis"],"abstract":"Abstract Heart failure with preserved ejection fraction (HFpEF) affects over 30 million people and lacks disease-modifying therapies. Although genomic-led drug discovery increases success by more than 2.6-fold, HFpEF genomic discovery remains constrained by imprecise phenotyping in biobanks, with only two loci identified to date. Biobanks lack HFpEF diagnostic codes and echocardiograms, yet HFpEF diagnosis exists along a continuum and is inherently probabilistic, presenting an opportunity for multimodal prediction. Here we introduce TRI-modal Assessment and Discovery of HFpEF (TRIAD-HFpEF), a machine learning framework integrating electrocardiograms, cardiac magnetic resonance imaging, and biomarkers to assign HFpEF probabilities. Deployed in UK Biobank, these probabilities validate with respect to mortality, hospitalizations, and structural and functional HFpEF features. 
Genome-wide and...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.02.07.26345811","openalex_id":"https://openalex.org/W7128502050","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Columbia University","Columbia University Irving Medical Center","Google (United States)","Stanford University","Tel Aviv University","University of California, San Francisco","University of Washington","VA Palo Alto Health Care System"],"concepts":[{"id":"https://openalex.org/C2777099384","display_name":"Heart failure with preserved ejection fraction","score":0.8723999857902527},{"id":"https://openalex.org/C116567970","display_name":"Biobank","score":0.6053000092506409},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.552299976348877},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.5248000025749207},{"id":"https://openalex.org/C2778198053","display_name":"Heart failure","score":0.5220000147819519},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4683000147342682},{"id":"https://openalex.org/C2779134260","display_name":"Disease","score":0.45820000767707825},{"id":"https://openalex.org/C74187038","display_name":"Drug discovery","score":0.40860000252723694}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/welfarist-formulations-for-diverse-similarity-search","title":"Welfarist Formulations for Diverse Similarity Search","url":"https://www.microsoft.com/en-us/research/publication/welfarist-formulations-for-diverse-similarity-search/","published":"2026-02-08","authors":["Siddharth 
Barman","Nirjhar Das","Shivam Gupta","Kiran Shiragur"],"abstract":"Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented generations (RAG). In such recent applications, in addition to the relevance (similarity) of the returned neighbors, diversity among the neighbors is a central requirement. In this paper, we develop principled welfare-based formulations in NNS for realizing diversity across attributes. Our formulations are based on welfare functions -- from mathematical economics -- that satisfy central diversity (fairness) and relevance (economic efficiency) axioms. With a particular focus on Nash social welfare, we note that our welfare-based formulations provide objective functions that adaptively balance relevance and diversity in a query-dependent manner. Notably, such a balance was not present in the prior const...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Data platforms and analytics","Search and information retrieval","Computer science","1970-01-01","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-correctness-learning-robust-reasoning-via-transfer","title":"Beyond Correctness: Learning Robust Reasoning via 
Transfer","url":"https://www.microsoft.com/en-us/research/publication/beyond-correctness-learning-robust-reasoning-via-transfer/","published":"2026-02-08","authors":["Hyunseok Lee","Soheil Abbasloo","Jihoon Tack","Jinwoo Shin"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) has recently strengthened LLM reasoning, but its focus on final answer correctness leaves a critical gap: it does not ensure the robustness of the reasoning process itself. We adopt a simple philosophical view, robust reasoning should remain useful beyond the mind that produced it, and treat reasoning as a form of meaning transfer that must survive truncation, reinterpretation, and continuation. Building on this principle, we introduce Reinforcement Learning with Transferable Reward (RLTR), which operationalizes robustness via transfer reward that tests whether a partial reasoning prefix from one model can guide a separate model to the correct answer. This encourages LLMs to produce reasoning that is stable, interpretable, and genuinely generalizable. 
Our approach improves sampling consistency while improving final answer accuracy, an...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","Computer science","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/statistical-estimation-of-adversarial-risk-in-large-language-models-under-best-of-n-sampling","title":"Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling","url":"https://www.microsoft.com/en-us/research/publication/statistical-estimation-of-adversarial-risk-in-large-language-models-under-best-of-n-sampling/","published":"2026-02-08","authors":["Mingqian Feng","Xiaodong Liu","Weiwei Yang","Chenliang Xu","Chris White","Jianfeng Gao"],"abstract":"Large Language Models (LLMs) are typically evaluated for safety under single-shot or low-budget adversarial prompting, which underestimates real-world risk. In practice, attackers can exploit large-scale parallel sampling to repeatedly probe a model until a harmful response is produced. While recent work shows that attack success increases with repeated sampling, principled methods for predicting large-scale adversarial risk remain limited. We propose a scaling-aware Best-of-N estimation of risk, SABER, for modeling jailbreak vulnerability under Best-of-N sampling. 
We model sample-level success probabilities using a Beta distribution, the conjugate prior of the Bernoulli distribution, and derive an analytic scaling law that enables reliable extrapolation of large-N attack success rates from small-budget measurements. Using only n=100 samples, our anchored estimator predicts ASR@1000 with...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Security, privacy, and cryptography","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2602.08030","title":"Free(): Learning to Forget in Malloc-Only Reasoning Models","url":"https://huggingface.co/papers/2602.08030","published":"2026-02-08","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7128380964","title":"Genetic Diagnosis and Discovery Enabled by Large Language Models","url":"https://doi.org/10.1002/advs.202518656","published":"2026-02-08","authors":["Tao Tu","Khaled Saab","W Liu","Zhouqing 
Fang","Zhuanfen Cheng","Svetolik Spasić","Maja Djurišić","Hiroaki Mohri","Wenlong Ren","Anil Palepu","Juraj Gottweis","Alan Karthikesalingam"],"abstract":"Artificial intelligence (AI) has been used in many areas of medicine, and large language models (LLMs) have shown potential utility for various clinical applications. However, to determine if LLMs can accelerate the pace of genetic diagnosis and discovery, we examined whether recently developed LLMs (Med-PaLM 2 and Gemini) could assist in solving four types of genetic problems with sequentially increasing complexity. First, in response to free-text input, Med-PaLM 2 correctly identified murine genes with experimentally verified causative genetic factors for six previously studied murine models of biomedical traits. Second, Med-PaLM 2 identified a novel causative murine genetic factor for spontaneous hearing loss that was validated using knock-in mice. Third, we developed a retrieval and grounding pipeline that enabled Gemini 2.5 Pro to analyze large lists of genes, which contained geneti...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/advs.202518656","openalex_id":"https://openalex.org/W7128380964","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Google (United States)","National Taiwan University","National Taiwan University Hospital","Stanford Health Care","Stanford Medicine","Stanford University"],"concepts":[{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6617000102996826},{"id":"https://openalex.org/C2993153387","display_name":"Genetic diagnosis","score":0.5809000134468079},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.47130000591278076},{"id":"https://openalex.org/C2780673598","display_name":"Genetic 
testing","score":0.460999995470047},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4503999948501587},{"id":"https://openalex.org/C2992519594","display_name":"Genetic model","score":0.4472000002861023},{"id":"https://openalex.org/C23085057","display_name":"Genetic analysis","score":0.4383000135421753},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.41620001196861267}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128298583","title":"Can test cases generated by large language models facilitate automated program repair?","url":"https://doi.org/10.1007/s10664-026-10802-w","published":"2026-02-07","authors":["Chengming Zhang","Haoye Wang","Chuyang Xu","Jiakun Liu","Kui Liu","Zhongxin Liu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10664-026-10802-w","openalex_id":"https://openalex.org/W7128298583","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Huawei Technologies (China)","Zhejiang Lab","Zhejiang University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C168065819","display_name":"Debugging","score":0.7415000200271606},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7404000163078308},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7110999822616577},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6003999710083008},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information 
retrieval)","score":0.47380000352859497},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.46779999136924744},{"id":"https://openalex.org/C128942645","display_name":"Test case","score":0.43479999899864197},{"id":"https://openalex.org/C200601418","display_name":"Reliability engineering","score":0.42419999837875366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/plugmem-a-task-agnostic-plugin-memory-module-for-llm-agents","title":"PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents","url":"https://www.microsoft.com/en-us/research/publication/plugmem-a-task-agnostic-plugin-memory-module-for-llm-agents/","published":"2026-02-06","authors":["Ke Yang","Zixiang Chen","Xuan He","Jize Jiang","Michel Galley","Chenglong Wang","Jianfeng Gao","Jiawei Han","ChengXiang Zhai"],"abstract":"Long-term memory is essential for large language model (LLM) agents operating in complex environments, yet existing memory designs are either task-specific and non-transferable, or task-agnostic but less effective due to low task-relevance and context explosion from raw memory retrieval. We propose PlugMem, a task-agnostic plugin memory module that can be attached to arbitrary LLM agents without task-specific redesign. Motivated by the fact that decision-relevant information is concentrated as abstract knowledge rather than raw experience, we draw on cognitive science to structure episodic memories into a compact, extensible knowledge-centric memory graph that explicitly represents propositional and prescriptive knowledge. 
This representation enables efficient memory retrieval and reasoning over task-relevant knowledge, rather than verbose raw trajectories, and departs from other graph-b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","LLM","language model","memory","long-term","retrieval","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:huawei-noah:2602.06883","title":"Vision Transformer Finetuning Benefits from Non-Smooth Components","url":"https://huggingface.co/papers/2602.06883","published":"2026-02-06","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","huawei-noah"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"openalex:W7128307948","title":"ItemRAG: Retrieval-Augmented Generation with Item-Based Knowledge Computing for E-Commerce Product Question Answering","url":"https://doi.org/10.26599/bdma.2025.9020080","published":"2026-02-06","authors":["Changliang Xu","Yukun Kang","Quan Feng","Jinghua Hua","Piji Li","Feiran Wu","Wei Hu","Xiang 
Chen","Sheng-Jun Huang","Songcan Chen"],"abstract":"The integration of Large Language Models (LLMs) into e-commerce platforms has significantly enhanced user experience through personalized recommendations and automated customer support. However, existing Retrieval-Augmented Generation (RAG) frameworks face challenges when applied to e-commerce product Question Answering (QA), such as handling extensive product catalogs, ensuring timely knowledge updates, and maintaining efficient retrieval performance. In this paper, we propose ItemRAG, a novel framework that combines RAG with item-based knowledge computing to address these challenges. ItemRAG decouples QA templates from specific products by leveraging a dynamic knowledge graph, enabling efficient updates and reducing the size of the knowledge base. The framework includes state analysis to capture user intent and context, grouped indexing for efficient retrieval, and knowledge computing....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.26599/bdma.2025.9020080","openalex_id":"https://openalex.org/W7128307948","cited_by_count":1,"quality_score":54,"matched_keywords":["LLM","personalized","retrieval","efficient"],"author_affiliations":["Alibaba Group (China)","Hunan Xiangdian Test Research Institute (China)","Nanjing University of Aeronautics and Astronautics","University of Chinese Academy of Sciences","Zhejiang Energy Research Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6844000220298767},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.53329998254776},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.4706000089645386},{"id":"https://openalex.org/C115925183","display_name":"Knowledge-based 
systems","score":0.3716000020503998},{"id":"https://openalex.org/C4554734","display_name":"Knowledge base","score":0.35899999737739563},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34709998965263367},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.34689998626708984},{"id":"https://openalex.org/C120567893","display_name":"Knowledge extraction","score":0.31150001287460327}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"official:06cbdeda1508d3bc","title":"ERNIE 5.0: A 2.4 Trillion-Parameter Unified Multimodal Foundation Model","url":"https://ernie.baidu.com/blog/posts/ernie5.0/","published":"2026-02-06","authors":["Baidu"],"abstract":"We introduce ERNIE 5.0: a 2.4 trillion-parameter Unified Multimodal Model trained from scratch. Integrating text, image, video, and audio into a single autoregressive framework, it overcomes the limitations of late-fusion architectures to achieve seamless cross-modal understanding and generation.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://ernie.baidu.com/blog/index.xml"}},{"id":"openalex:W7128400517","title":"Di3PO - Diptych Diffusion DPO for Targeted Improvements in Image Generation","url":"https://doi.org/10.48550/arxiv.2602.06355","published":"2026-02-06","authors":["Sanjana Reddy","Ishaan Malhi","Sally Ma","Dutta, Praneet"],"abstract":"Existing methods for preference tuning of text-to-image (T2I) diffusion models often 
rely on computationally expensive generation steps to create positive and negative pairs of images. These approaches frequently yield training pairs that either lack meaningful differences, are expensive to sample and filter, or exhibit significant variance in irrelevant pixel regions, thereby degrading training efficiency. To address these limitations, we introduce \"Di3PO\", a novel method for constructing positive and negative pairs that isolates specific regions targeted for improvement during preference tuning, while keeping the surrounding context in the image stable. We demonstrate the efficacy of our approach by applying it to the challenging task of text rendering in diffusion models, showcasing improvements over baseline methods of SFT and DPO.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2602.06355","openalex_id":"https://openalex.org/W7128400517","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.6310999989509583},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.6007999777793884},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5638999938964844},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5443000197410583},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5203999876976013},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4772000014781952},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.46149998903274536},{"id":"https://openalex.org/C196083921","display_name":"Variance (accounting)","score":0.42969998717308044}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sema-simple-yet-effective-learning-for-multi-turn-jailbreak-attacks","title":"SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks","url":"https://www.microsoft.com/en-us/research/publication/sema-simple-yet-effective-learning-for-multi-turn-jailbreak-attacks/","published":"2026-02-05","authors":["Mingqian Feng","Xiaodong Liu","Weiwei Yang","Jialin Song","Xuekai Zhu","Chenliang Xu","Jianfeng Gao"],"abstract":"Multi-turn jailbreaks capture the real threat model for safety-aligned chatbots, where single-turn attacks are merely a special case. Yet existing approaches break under exploration complexity and intent drift. We propose SEMA, a simple yet effective framework that trains a multi-turn attacker without relying on any existing strategies or external data. SEMA comprises two stages. Prefilling self-tuning enables usable rollouts by fine-tuning on non-refusal, well-structured, multi-turn adversarial prompts that are self-generated with a minimal prefix, thereby stabilizing subsequent learning. Reinforcement learning with intent-drift-aware reward trains the attacker to elicit valid multi-turn adversarial prompts while maintaining the same harmful objective. 
We anchor harmful intent in multi-turn jailbreaks via an intent-drift-aware reward that combines intent alignment, compliance risk, and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","language model","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2602.05400","title":"OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration","url":"https://huggingface.co/papers/2602.05400","published":"2026-02-05","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","Qwen","language model","efficient"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"bytedance-seed:1398","title":"BABE: Biology Arena BEnchmark","url":"https://seed.bytedance.com/en/research/babe-biology-arena-benchmark","published":"2026-02-05","authors":["Junting Zhou","Jin Chen","Linfeng Hao","Denghui Cao","Zheyu Wang","Qiguang Chen","Chaoyou Fu","Jiaze Chen","Yuchen Wu","Ge Zhang","Mingxuan Wang","Wenhao 
Huang"],"abstract":"The rapid evolution of large language models (LLMs) has expanded their capabilities from basic dialogue to advanced scientific reasoning. However, existing benchmarks in biology often fail to assess a critical skill required of researchers: the ability to integrate experimental results with contextual knowledge to derive meaningful conclusions. To address this gap, we introduce BABE(Biology Arena BEnchmark), a comprehensive benchmark designed to evaluate the experimental reasoning capabilities of biological AI systems. BABE is uniquely constructed from peer-reviewed research papers and real-world biological studies, ensuring that tasks reflect the complexity and interdisciplinary nature of actual scientific inquiry. BABE challenges models to perform causal reasoning and cross-scale inference. Our benchmark provides a robust framework for assessing how well AI systems can reason like prac...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Artificial Intelligence","Seed","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:Tencent-Hunyuan:2602.05327","title":"ProAct: Agentic Lookahead in Interactive 
Environments","url":"https://huggingface.co/papers/2602.05327","published":"2026-02-05","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:tencent:2602.05847","title":"OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention","url":"https://huggingface.co/papers/2602.05847","published":"2026-02-05","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"official:7909627c9d99e03b","title":"GPT-5.3-Codex System Card","url":"https://openai.com/index/gpt-5-3-codex-system-card","published":"2026-02-05","authors":["OpenAI"],"abstract":"GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and professional knowledge capabilities of 
GPT‑5.2.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W7128073339","title":"Automatically Defining Protein Words for Diverse Functional Predictions Based on Attention Analysis of a Protein Language Model","url":"https://doi.org/10.1002/advs.202521970","published":"2026-02-05","authors":["Hedi Chen","Jingrui Zhong","Xi Zhang","Jingke Chen","Lin Guo","Xiaoliang Xiong","Xi Zhang","Xiangyu Liu","Bailong Xiao","Boxue Tian"],"abstract":"Understanding the relationship between protein sequence and function remains a longstanding challenge in bioinformatics, and to date the lion's share of related tools parse proteins at the domain or motif levels. Here, we define \"protein words\" as an alternative to \"motif\" for studying proteins and functional prediction applications. We first developed an unsupervised tool we term Protein Wordwise, which parses analyte protein sequences into protein words by analyzing attention matrices from a protein language model (PLM) through a community detection algorithm. We then developed a supervised sequence-function prediction model called Word2Function, for mapping protein words to GO terms through feature importance analysis. We compared the prediction performance of our protein word-based toolkit with a motif-based method (PROSITE) for multiple protein function datasets. 
We also assembled a...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/advs.202521970","openalex_id":"https://openalex.org/W7128073339","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Tsinghua University","State Key Laboratory of Molecular Oncology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.732200026512146},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.7168999910354614},{"id":"https://openalex.org/C2986374874","display_name":"Protein function","score":0.5958999991416931},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5264000296592712},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43959999084472656},{"id":"https://openalex.org/C207060522","display_name":"Protein function prediction","score":0.4348999857902527},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.41339999437332153},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4108999967575073}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128042511","title":"Building Trustworthy AI for Enterprise Support: An Empirical Study of a RAG-Based Architectural Framework","url":"https://doi.org/10.70315/uloap.ulirs.2026.0301008","published":"2026-02-05","authors":["Udit Joshi","Kapil Verma"],"abstract":"The paper considers an approach to engineering a trustworthy enterprise support service based on a Retrieval-Augmented Generation (RAG) architecture in a high-load ticket-handling environment. 
The study’s relevance stems from the widespread deployment of generative solutions in contact centers and users’ growing sensitivity to hallucinations, stale data, and unpredictable system behavior. The objective of the research is to obtain an empirical assessment of a full-scale RAG architecture for internal support and to identify engineering decisions that critically determine its reliability. Across five experimental series, empirical data were obtained for key metrics: MRR@10 for retrieval quality, overall refusal and correct-refusal rates, the proportion of confident yet factually incorrect answers, p50/p90/p99 end-to-end latency, the time required to incorporate a document into the index, a...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.70315/uloap.ulirs.2026.0301008","openalex_id":"https://openalex.org/W7128042511","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7559000253677368},{"id":"https://openalex.org/C120936955","display_name":"Empirical research","score":0.5613999962806702},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.5378999710083008},{"id":"https://openalex.org/C2779010991","display_name":"Artifact (error)","score":0.475600004196167},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.4722000062465668},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.47209998965263367},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.44850000739097595},{"id":"https://openalex.org/C75165309","display_name":"Search engine 
indexing","score":0.44269999861717224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154617791","title":"The Effects of Audio Sample Rate on Training in Large Scale Foundation Models for Time Series Forecasting","url":"https://doi.org/10.1109/acdsa67686.2026.11468234","published":"2026-02-05","authors":["Logan Boehm","Kaleb E. Smith","Anthony O. Smith"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/acdsa67686.2026.11468234","openalex_id":"https://openalex.org/W7154617791","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Florida Institute of Technology","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6363999843597412},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5738000273704529},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.5623000264167786},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5548999905586243},{"id":"https://openalex.org/C198531522","display_name":"Sample (material)","score":0.5},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.4964999854564667},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37549999356269836},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3504999876022339}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154629777","title":"Multimodal Generative AI for Next-Generation 
Healthcare Diagnostics and Predictive Analytics","url":"https://doi.org/10.1109/acdsa67686.2026.11468232","published":"2026-02-05","authors":["Sai Sukesh Reddy Tummuri"],"abstract":"The rapid evolution of multimodal healthcare data, such as medical images, clinical text, and structured electronic records, has provided new possibilities of smart diagnostics systems. This research introduces a Cross-Modal Generative Transformer (CMGT), the purpose of which is to combine heterogeneous healthcare modalities in a way that guarantees accurate prediction of the diseases and unambiguous generation of clinical knowledge. The suggested architecture brings together feature extraction based on various data sources with generative learning in order to increase data balance and wealth of data representation. It uses explainable-AI elements like SHAP-based feature attribution, attention heatmap and t-SNE latent-space visualization to make sure the decision process is interpretable. Performance metrics such as Precision, Recall, F1-Score, and Accuracy and ROC-AUC and calibration cu...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/acdsa67686.2026.11468232","openalex_id":"https://openalex.org/W7154629777","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6381999850273132},{"id":"https://openalex.org/C83209312","display_name":"Predictive analytics","score":0.5680999755859375},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5077999830245972},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.3937999904155731},{"id":"https://openalex.org/C160735492","display_name":"Health 
care","score":0.38989999890327454},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3382999897003174},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.32170000672340393},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3019999861717224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2602.06139","title":"EgoAVU: Egocentric Audio-Visual Understanding","url":"https://huggingface.co/papers/2602.06139","published":"2026-02-05","authors":["Ashish Seth","Xinhao Mei","Changsheng Zhao","Varun Nagaraja","Ernie Chang","Gregory P. Meyer","Gael Le Lan","Yunyang Xiong","Vikas Chandra","Yangyang Shi","Dinesh Manocha","Zhipeng Cai"],"abstract":"Understanding egocentric videos plays a vital role for embodied intelligence. Recent multi-modal large language models (MLLMs) can accept both visual and audio inputs. However, due to the challenge of obtaining text labels with coherent joint-modality information, whether MLLMs can jointly understand both modalities in egocentric videos remains under-explored. To address this problem, we introduce EgoAVU, a scalable data engine to automatically generate egocentric audio-visual narrations, questions, and answers. EgoAVU enriches human narrations with multimodal context and generates audio-visual narrations through cross-modal correlation modeling. Token-based video filtering and modular, graph-based curation ensure both data diversity and quality. 
Leveraging EgoAVU, we construct EgoAVU-Instruct, a large-scale training dataset of 3M samples, and EgoAVU-Bench, a manually verified evaluation...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/duodrama-supporting-screenplay-refinement-through-llm-assisted-human-reflection","title":"DuoDrama: Supporting Screenplay Refinement Through LLM-Assisted Human Reflection","url":"https://www.microsoft.com/en-us/research/publication/duodrama-supporting-screenplay-refinement-through-llm-assisted-human-reflection/","published":"2026-02-04","authors":["Yuying Tang","Xinyi Chen","Haotian Li","Xing Xie","Xiaojuan Ma","Huamin Qu"],"abstract":"AI has been increasingly integrated into screenwriting practice. In refinement, screenwriters expect AI to provide feedback that supports reflection across the internal perspective of characters and the external perspective of the overall story. However, existing AI tools cannot sufficiently coordinate the two perspectives to meet screenwriters'needs. To address this gap, we present DuoDrama, an AI system that generates feedback to assist screenwriters'reflection in refinement. To enable DuoDrama, based on performance theories and a formative study with nine professional screenwriters, we design the Experience-Grounded Feedback Generation Workflow for Human Reflection (ExReflect). In ExReflect, an AI agent adopts an experience role to generate experience and then shifts to an evaluation role to generate feedback based on the experience. 
A study with fourteen professional screenwriters sh...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Computer science","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reducing-the-costs-of-proof-synthesis-on-rust-systems-by-scaling-up-a-seed-training-set","title":"Reducing the Costs of Proof Synthesis on Rust Systems by Scaling Up a Seed Training Set","url":"https://www.microsoft.com/en-us/research/publication/reducing-the-costs-of-proof-synthesis-on-rust-systems-by-scaling-up-a-seed-training-set/","published":"2026-02-04","authors":["Nongyu Di","Tianyu Chen","Shan Lu","Shuai Lu","Yeyun Gong","Peng Cheng","Jay Lorch","Yuan Yao","Xiaoxing Ma"],"abstract":"Large Language Models (LLMs) are widely used for code generation. However, the correctness of code generated by LLMs remains a concern. A potential remedy to this concern is to have LLMs generate formal correctness proofs along with such code. However, compared with code generation, code-proof generation requires much higher reasoning capability and has much less existing data to learn from. In this paper, we present VeruSyn, a data synthesis pipeline for Verus, a state-of-the-art verification tool for system software written in Rust. 
Through self-synthesis and tutorial-based synthesis, VeruSyn achieves much larger scale and Verus-feature coverage than previous data-synthesis techniques designed for Verus; VeruSyn also supplements its dataset with long-chain-of-thought (CoT) data through agent trajectory synthesis. With VeruSyn, we synthesize the largest set of Verus verified programs: 6...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Systems and networking","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1376","title":"VTok: A Unified Video Tokenizer with Decoupled Spatial-Temporal Latents","url":"https://seed.bytedance.com/en/research/vtok-a-unified-video-tokenizer-with-decoupled-spatial-temporal-latents","published":"2026-02-04","authors":["Feng Wang","Yichun Shi","Ceyuan Yang","Qiushan Guo","Jingxiang Sun","Alan Yuille","Peng Wang"],"abstract":"This work presents VTok, a unified video tokenization framework that can be used for both generation and understanding tasks. Unlike the leading vision-language systems that tokenize videos through a naive frame-sampling strategy, we propose to decouple the spatial and temporal representations of videos by retaining the spatial features of a single key frame while encoding each subsequent frame into a single residual token, achieving compact yet expressive video tokenization. 
Our experiments suggest that VTok effectively reduces the complexity of video representation from the product of frame count and per-frame token count to their sum, while the residual tokens sufficiently capture viewpoint and motion changes relative to the key frame. Extensive evaluations demonstrate the efficacy and efficiency of VTok: it achieves notably higher performance on a range of video understanding and tex...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:Qwen:2602.04649","title":"Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models","url":"https://huggingface.co/papers/2602.04649","published":"2026-02-04","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:tencent:2602.05085","title":"Locas: Your Models are Principled Initializers of Locally-Supported Parametric 
Memories","url":"https://huggingface.co/papers/2602.05085","published":"2026-02-04","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Qwen:2602.06079","title":"Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers","url":"https://huggingface.co/papers/2602.06079","published":"2026-02-04","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:huawei-noah:2602.05027","title":"AudioSAE: Towards Understanding of Audio-Processing Models with Sparse 
AutoEncoders","url":"https://huggingface.co/papers/2602.05027","published":"2026-02-04","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","huawei-noah"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"openalex:W7127366769","title":"GENERator: A Long-Context Generative Genomic Foundation Model","url":"https://doi.org/10.21203/rs.3.rs-8686063/v1","published":"2026-02-04","authors":["Q. Li","Wei Wu","Yong Zhang","Zhihao Zhan","Rui Chen","Mingyang Li","Kun Fu","Junyan Qi","Yongzhou Bao","Chao Wang","Yiheng Zhu","Zhiyun Zhang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-8686063/v1","openalex_id":"https://openalex.org/W7127366769","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Carnegie Mellon University","Chinese Academy of Agricultural Sciences","Hong Kong University of Science and Technology","Mila - Quebec Artificial Intelligence Institute","Moscow State University of Printing Arts","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6542999744415283},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5885000228881836},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit 
theory)","score":0.5486999750137329},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.5223000049591064},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5135999917984009},{"id":"https://openalex.org/C191908910","display_name":"Synthetic biology","score":0.4772000014781952},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.461899995803833},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.44589999318122864}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2602.03587","title":"CL-bench: A Benchmark for Context Learning","url":"https://huggingface.co/papers/2602.03587","published":"2026-02-03","authors":["Tencent/Hunyuan"],"abstract":"Current language models (LMs) excel at reasoning over prompts using pre-trained knowledge. However, real-world tasks are far more complex and context-dependent: models must learn from task-specific context and leverage new knowledge beyond what is learned during pre-training to reason and resolve tasks. We term this capability context learning, a crucial ability that humans naturally possess but has been largely overlooked. To this end, we introduce CL-bench, a real-world benchmark consisting of 500 complex contexts, 1,899 tasks, and 31,607 verification rubrics, all crafted by experienced domain experts. Each task is designed such that the new content required to resolve it is contained within the corresponding context. 
Resolving tasks in CL-bench requires models to learn from the context, ranging from new domain-specific knowledge, rule systems, and complex procedures to laws derived fr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"https://openalex.org/W7127739632","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","retrieval"],"author_affiliations":["Tencent/Hunyuan","Fudan University","Tencent (China)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2602.03647","title":"Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration","url":"https://huggingface.co/papers/2602.03647","published":"2026-02-03","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:tencent:2602.03619","title":"Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report 
Generation","url":"https://huggingface.co/papers/2602.03619","published":"2026-02-03","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2602.03907","title":"HY3D-Bench: Generation of 3D Assets","url":"https://huggingface.co/papers/2602.03907","published":"2026-02-03","authors":["Tencent/Hunyuan"],"abstract":"While recent advances in neural representations and generative models have revolutionized 3D content creation, the field remains constrained by significant data processing bottlenecks. To address this, we introduce HY3D-Bench, an open-source ecosystem designed to establish a unified, high-quality foundation for 3D generation. Our contributions are threefold: (1) We curate a library of 250k high-fidelity 3D objects distilled from large-scale repositories, employing a rigorous pipeline to deliver training-ready artifacts, including watertight meshes and multi-view renderings; (2) We introduce structured part-level decomposition, providing the granularity essential for fine-grained perception and controllable editing; and (3) We bridge real-world distribution gaps via a scalable AIGC synthesis pipeline, contributing 125k synthetic assets to enhance diversity in long-tail categories. 
Validat...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"arxiv:2603.04413","title":"Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries","url":"http://arxiv.org/abs/2603.04413","published":"2026-02-03","authors":["Natalie Perez","Sreyoshi Bhaduri","Aman Chadha"],"abstract":"Meaning in human language is relational, context dependent, and emergent, arising from dynamic systems of signs rather than fixed word-concept mappings. In computational settings, this semiotic and interpretive complexity complicates the generation and evaluation of meaning. This article proposes an interdisciplinary framework for studying meaning in large language model (LLM) generated language by integrating semiotics and hermeneutics with qualitative research methods. We review prior scholarship on meaning and machines, examining how linguistic signs are transformed into vectorized representations in static and contextualized embedding models, and identify gaps between statistical approximation and human interpretive meaning. 
We then introduce the Inductive Conceptual Rating (ICR) metric, a qualitative evaluation approach grounded in inductive content analysis and reflexive thematic a...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7134094467","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (United States)","Komatsu (Japan)","University of Hawaii–West Oahu"],"concepts":[{"id":"https://openalex.org/C2780876879","display_name":"Meaning (existential)","score":0.6053000092506409},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6050000190734863},{"id":"https://openalex.org/C139997677","display_name":"Semiotics","score":0.5530999898910522},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.476500004529953},{"id":"https://openalex.org/C13200473","display_name":"Reflexivity","score":0.4575999975204468},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.44839999079704285},{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.43639999628067017},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4253000020980835}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7127418831","title":"Dual-path consistency constrained concept erasure for text-to-image diffusion models","url":"https://doi.org/10.1007/s00530-025-02168-8","published":"2026-02-03","authors":["Xiaoran Bai","Dan Song","Peng Sun","Shuangyan 
Yue"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s00530-025-02168-8","openalex_id":"https://openalex.org/W7127418831","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Tianjin Economic-Technological Development Area","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8252000212669373},{"id":"https://openalex.org/C2778790127","display_name":"Erasure","score":0.7041000127792358},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6603999733924866},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.6122999787330627},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.483599990606308},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.47519999742507935},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.461899995803833},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4235000014305115}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agentrx-diagnosing-ai-agent-failures-from-execution-trajectories","title":"AgentRx: Diagnosing AI Agent Failures from Execution Trajectories","url":"https://www.microsoft.com/en-us/research/publication/agentrx-diagnosing-ai-agent-failures-from-execution-trajectories/","published":"2026-02-02","authors":["Shraddha Barke","Arnav Goyal","Alind Khare","Avaljot Singh","Suman Nath","Chetan Bansal"],"abstract":"AI agents often fail in ways that are 
difficult to localize because executions are probabilistic, long-horizon, multi-agent, and mediated by noisy tool outputs. We address this gap by manually annotating failed agent runs and release a novel benchmark of 115 failed trajectories spanning structured API workflows, incident management, and open-ended web/file tasks. Each trajectory is annotated with a critical failure step and a category from a grounded-theory derived, cross domain failure taxonomy. To mitigate the human cost of failure attribution, we present AgentRx, an automated domain-agnostic diagnostic framework that pinpoints the critical failure step in a failed agent trajectory. It synthesizes constraints, evaluates them step-by-step, and produces an auditable validation log of constraint violations with associated evidence; an LLM-based judge uses this log to localize the critical...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Unpublished","Artificial intelligence","Programming languages and software engineering","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:huawei-noah:2602.02110","title":"An Empirical Study of World Model Quantization","url":"https://huggingface.co/papers/2602.02110","published":"2026-02-02","authors":["Huawei/Noah"],"abstract":"World models learn an internal representation of environment dynamics, enabling agents to simulate and reason about future states within a compact latent space for tasks such as planning, prediction, and inference. 
However, running world models rely on hevay computational cost and memory footprint, making model quantization essential for efficient deployment. To date, the effects of post-training quantization (PTQ) on world models remain largely unexamined. In this work, we present a systematic empirical study of world model quantization using DINO-WM as a representative case, evaluating diverse PTQ methods under both weight-only and joint weight-activation settings. We conduct extensive experiments on different visual planning tasks across a wide range of bit-widths, quantization granularities, and planning horizons up to 50 iterations. Our results show that quantization effects in worl...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["HuggingFace org papers","huawei-noah","memory","efficient","quantization"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"bytedance-seed:1388","title":"SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning","url":"https://seed.bytedance.com/en/research/sparkling-balancing-signal-preservation-and-symmetry-breaking-for-width-progressive-learning","published":"2026-02-02","authors":["Qifan Yu","Xinyu Ma","Zhijian Zhuo","Minrui Wang","Deyi Liu","Shiyi Zhan","Yiyuan Ma","Liang Xiang","Xingyan Bin","Di He"],"abstract":"Progressive Learning (PL) reduces pre-training computational overhead by gradually increasing model scale. 
While prior work has extensively explored depth expansion, width expansion remains significantly understudied, with the few existing methods limited to the early stages of training. However, expanding width during the mid-stage is essential for maximizing computational savings, yet it remains a formidable challenge due to severe training instabilities. Empirically, we show that naive initialization at this stage disrupts activation statistics, triggering loss spikes, while copy-based initialization introduces gradient symmetry that hinders feature diversity. To address these issues, we propose SPARKLING (balancing {S}ignal {P}reservation {A}nd symmet{R}y brea{K}ing for width-progressive {L}earn{ING}), a novel framework for mid-stage width expansion. Our method achieves signal preser...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine Learning","LLM","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:tencent:2602.03075","title":"ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution","url":"https://huggingface.co/papers/2602.03075","published":"2026-02-02","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org 
papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:moonshotai:2602.02276","title":"Kimi K2.5: Visual Agentic Intelligence","url":"https://huggingface.co/papers/2602.02276","published":"2026-02-02","authors":["Moonshot/Kimi"],"abstract":"We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that the two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across various domains including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to 4.5× over single-agent baselines. 
We release the post-trained Kimi K2.5 model checkpoint to facilitate future rese...","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","moonshotai","agent"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"hf-org-paper:Qwen:2602.02361","title":"SWE-Universe: Scale Real-World Verifiable Environments to Millions","url":"https://huggingface.co/papers/2602.02361","published":"2026-02-02","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:tencent:2602.02103","title":"No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of 
LLMs","url":"https://huggingface.co/papers/2602.02103","published":"2026-02-02","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7127123718","title":"A Content- and Context-Aware Click Model Based on Dynamic Graph Neural Networks","url":"https://doi.org/10.1145/3795515","published":"2026-02-02","authors":["Liu Yang","Jiaxin Mao","Ziyuan Zhao","Qiang Yan"],"abstract":"Click modeling constitutes a pivotal area of study within information retrieval, as it provides insights into user search behavior and enables the extraction of valuable implicit relevance feedback from large-scale click logs. However, existing click models often rely on content-agnostic IDs to represent queries and documents. Additionally, they employ context-independent assumptions, such as the examination hypothesis, in modeling click probabilities. As a result, contemporary click models often fall short of capturing the influence of the diverse, multi-modal content found on modern Search Engine Result Pages (SERPs) and the intricate contextual interactions among heterogeneous search results. To address this issue, we propose a novel Dynamic Graph Neural Click Model (DGCM). 
The proposed model incorporates rich content and context information by jointly representing them as nodes in a....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3795515","openalex_id":"https://openalex.org/W7127123718","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Renmin University of China","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9067999720573425},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6337000131607056},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.583299994468689},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.4876999855041504},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.462799996137619},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45010000467300415},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43560001254081726},{"id":"https://openalex.org/C115174607","display_name":"Click-through rate","score":0.3725999891757965}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/redcodeagent-automatic-red-teaming-agent-against-diverse-code-agents","title":"RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents","url":"https://www.microsoft.com/en-us/research/publication/redcodeagent-automatic-red-teaming-agent-against-diverse-code-agents/","published":"2026-02-01","authors":["Chengquan Guo","Chulin Xie","Yu Yang","Zhaorun Chen","Zinan 
Lin","Xander Davies","Yarin Gal","Dawn Song","Bo Li"],"abstract":"Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic execution, debugging, and interactive programming capabilities. While these advancements have streamlined complex workflows, they have also introduced critical safety and security risks. Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios, as they fail to cover certain boundary conditions, such as the combined effects of different jailbreak tools. In this work, we propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents. With an adaptive memory module, RedCodeAgent can leverage existing jailbreak knowledge, dynamically select the most effective red-teaming tools and tool combinations in a tailore...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":100,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","AI agents","Code generation","Computer security","large language models","Security and Privacy","1970-01-01","LLM","memory","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/genmac-compositional-text-to-video-generation-with-multi-agent-collaboration","title":"GenMAC: Compositional Text-to-Video Generation with Multi-Agent 
Collaboration","url":"https://www.microsoft.com/en-us/research/publication/genmac-compositional-text-to-video-generation-with-multi-agent-collaboration/","published":"2026-02-01","authors":["Kaiyi Huang","Yukun Huang","Xuefei Ning","Zinan Lin","Yu Wang","Xihui Liu"],"abstract":"Text-to-video generation models have shown significant progress in the recent years. However, they still struggle with generating complex dynamic scenes based on compositional text prompts, such as attribute binding for multiple objects, temporal dynamics associated with different objects, and interactions between objects. Our key motivation is that complex tasks can be decomposed into simpler ones, each handled by a role-specialized MLLM agent. Multiple agents can collaborate together to achieve collective intelligence for complex goals. We propose GenMAC, an iterative, multi-agent framework that enables compositional text-to-video generation. The collaborative workflow includes three stages: Design, Generation, and Redesign, with an iterative loop between the Generation and Redesign stages to progressively verify and refine the generated videos. 
The Redesign stage is the most challengi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","AI agents","text-to-video generation","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sutradhara-an-intelligent-orchestrator-engine-co-design-for-tool-based-agentic-inference","title":"SUTRADHARA : An Intelligent Orchestrator-Engine Co-design for Tool-based Agentic Inference","url":"https://www.microsoft.com/en-us/research/publication/sutradhara-an-intelligent-orchestrator-engine-co-design-for-tool-based-agentic-inference/","published":"2026-02-01","authors":["Anish Biswas","Kanishk Goel","Jayashree Mohan","Alind Khare","A. Parayil","Ramachandran Ramjee","Chetan Bansal"],"abstract":"Agentic applications are LLMs that iteratively invoke external tools to accomplish complex tasks. Such tool-based agents are rapidly becoming the dominant paradigm for deploying language models in production. Unlike traditional single-turn inference, agentic workloads chain together multiple LLM calls and tool executions before producing a final response, creating a new performance bottleneck that manifests as increased latency in First Token Rendered (FTR) of the final answer. 
Through analysis of synthetic requests at production scale, we reveal three critical challenges: tool calls account for 30-80% of FTR latency, KV cache hit rates collapse despite substantial context reuse across iterations, and sequential orchestration wastes potential intra-request parallelism by sequentially executing LLM calls and tools. These bottlenecks stem from a design gap in which orchestrators and LLM eng...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7125351876","cited_by_count":0,"quality_score":80,"matched_keywords":["Unpublished","Artificial intelligence","Systems and networking","agentic AI","large language models","Operating system","LLM"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (India)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/training-large-reasoning-models-efficiently-via-progressive-thought-encoding","title":"Training Large Reasoning Models Efficiently via Progressive Thought Encoding","url":"https://www.microsoft.com/en-us/research/publication/training-large-reasoning-models-efficiently-via-progressive-thought-encoding/","published":"2026-02-01","authors":["Zeliang Zhang","Xiaodong Liu","Hao Cheng","Hao Sun","Chenliang Xu","Jianfeng Gao"],"abstract":"Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency: reinforcement learning (RL) training requires long rollouts for outcome-based rewards, where autoregressive decoding dominates time and memory usage. 
While sliding-window cache strategies can bound memory, they disrupt long-context reasoning and degrade performance. We introduce Progressive Thought Encoding, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixed-size caches. By progressively encoding intermediate reasoning into compact representations, our approach eliminates the need to backpropagate through full-cache rollouts, thereby reducing training-time memory usage, while maintaining constant memory during inference. Experiments on three models, including Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, and DeepSeek-R1-Distill-Llama-8B, across six wide...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/text2arch-a-dataset-for-generating-scientific-architecture-diagrams-from-natural-language-descriptions","title":"Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions","url":"https://www.microsoft.com/en-us/research/publication/text2arch-a-dataset-for-generating-scientific-architecture-diagrams-from-natural-language-descriptions/","published":"2026-02-01","authors":["Shivank Garg","Sankalp Mittal","Manish Gupta"],"abstract":"Communicating complex system designs or scientific processes through text alone is inefficient and prone to ambiguity. 
A system that automatically generates scientific architecture diagrams from text with high semantic fidelity can be useful in multiple applications like enterprise architecture visualization, AI-driven software design, and educational content creation. Hence, in this paper, we focus on leveraging language models to perform semantic understanding of the input text description to generate intermediate code that can be processed to generate high-fidelity architecture diagrams. Unfortunately, no clean large-scale open-access dataset exists, implying a lack of effective open models for this task. Hence, we contribute a comprehensive dataset, Text2Arch, comprising scientific architecture images, their corresponding textual descriptions, and associated DOT code representations....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computation and Language","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/score-distillation-beyond-acceleration-generative-modeling-from-corrupted-data","title":"Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data","url":"https://www.microsoft.com/en-us/research/publication/score-distillation-beyond-acceleration-generative-modeling-from-corrupted-data/","published":"2026-02-01","authors":["Yasi Zhang","Tianyu Chen","Zhendong Wang","Yingnian Wu","Mingyuan Zhou","Oscar Leong"],"abstract":"Learning 
generative models directly from corrupted observations is a long-standing challenge across natural and scientific domains. We introduce Distillation from Corrupted Data (DCD), a unified framework for learning high-fidelity, one-step generative models using only degraded data of the form $y = \\mathcal{A}(x) + \\sigma \\varepsilon,\\ x \\sim p_X,\\ \\varepsilon \\sim \\mathcal{N}(0, I_m),$ where the mapping $\\mathcal{A}$ may be the identity or a non-invertible corruption operator (e.g., blur, masking, subsampling, Fourier acquisition). DCD first pretrains a corruption-aware diffusion teacher on the observed measurements, then distills it into an efficient one-step generator whose samples are statistically closer to the clean distribution $p_X$. The framework subsumes identity corruption (denoising task) as a special case of our general formulation. Empiri...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interwhen-a-generalizable-framework-for-verifiable-reasoning-with-test-time-monitors","title":"interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors","url":"https://www.microsoft.com/en-us/research/publication/interwhen-a-generalizable-framework-for-verifiable-reasoning-with-test-time-monitors/","published":"2026-02-01","authors":["Vishak 
Bhat","Prateek Chanda","Ashmit Khandelwal","Maitreyi Swaroop","Vineeth N Balasubramanian","Subbarao Kambhampati","Nagarajan Natarajan","Amit Sharma"],"abstract":"We present a test-time verification framework, interwhen, that ensures that the output of a reasoning model is valid with respect to a given set of verifiers. Verified reasoning is an important goal in high-stakes scenarios such as deploying agents in the physical world or in domains such as law and finance. However, current techniques either rely on the generate-test paradigm that verifies only after the final answer is produced, or verify partial output through a step-extraction paradigm where the task execution is externally broken down into structured steps. The former is inefficient while the latter artificially restricts a model's problem-solving strategies. Instead, we propose to verify a model's reasoning trace as-is, taking full advantage of a model's reasoning capabilities while verifying and steering the model's output only when needed. The key idea is meta-prompting, identifying the ver...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Machine learning","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/materials-are-mission-critical-to-the-sustainability-transition-accelerating-discovery-of-materials-for-sustainability","title":"Materials are Mission Critical to the Sustainability Transition: Accelerating Discovery of Materials for 
Sustainability","url":"https://www.microsoft.com/en-us/research/publication/materials-are-mission-critical-to-the-sustainability-transition-accelerating-discovery-of-materials-for-sustainability/","published":"2026-02-01","authors":["Bichlien Nguyen"],"abstract":"Artificial intelligence (AI) is rapidly transforming the field of materials science, offering a new paradigm to accelerate the discovery and design of sustainable materials. This article explores how AI-driven innovations are enabling breakthroughs across the material spectrum, from recyclable polymers, coolants, to low-carbon cement and crystalline materials for energy applications ‒ advancing goals across material circularity, clean energy transition, and climate resilience. Additionally, it examines current key bottlenecks in AI for materials design, including computational limitations and data gaps. Finally, the article highlights emerging opportunities to close the loop between theory and practice through agentic AI systems and automation, emphasizing the importance of sustained investment and thoughtful deployment to catalyze a more sustainable future. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Ecology and environment","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/trustgen-a-platform-of-dynamic-benchmarking-on-the-trustworthiness-of-generative-foundation-models","title":"TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/trustgen-a-platform-of-dynamic-benchmarking-on-the-trustworthiness-of-generative-foundation-models/","published":"2026-02-01","authors":["TrustGen Team","Jianfeng Gao"],"abstract":"Generative foundation models (GenFMs), such as large language models and text-to-image systems, have demonstrated remarkable capabilities in various downstream applications. As they are increasingly deployed in high-stakes applications, assessing their trustworthiness has become both a critical necessity and a substantial challenge. Existing evaluation efforts are fragmented, rapidly outdated, and often lack extensibility across modalities. This raises a fundamental question: how can we systematically, reliably, and continuously assess the trustworthiness of rapidly advancing GenFMs across diverse modalities and use cases? 
To address these gaps, we introduce TrustGen, a dynamic and modular benchmarking system designed to systematically evaluate the trustworthiness of GenFMs across text-to-image, large language, and vision-language modalities. TrustGen standardizes trust evaluation throug...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/synergizing-understanding-and-generation-with-interleaved-analyzing-drafting-thinking","title":"Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking","url":"https://www.microsoft.com/en-us/research/publication/synergizing-understanding-and-generation-with-interleaved-analyzing-drafting-thinking/","published":"2026-02-01","authors":["Shengqiong Wu","Bobo Li","Xinkai Wang","Xiangtai Li","Lei Cui","Furu Wei","Shuicheng YAN","Hao Fei","Tat-Seng Chua"],"abstract":"Unified Vision–Language Models (UVLMs) aim to advance multimodal learning by supporting both understanding and generation within a single framework. However, existing approaches largely focus on architectural unification while overlooking the need for explicit interaction between the two capabilities during task solving. As a result, current models treat understanding and generation as parallel skills rather than synergistic processes. 
To achieve real synergy, we introduce the interleaved Analyzing–Drafting problem-solving loop (AD-Loop), a new think paradigm that dynamically alternates between analytic and drafting operations. By interleaving textual thoughts with visual thoughts, AD-Loop enables models to iteratively refine both comprehension and outputs, fostering genuine synergy. To train this mechanism, we design a two-stage strategy: supervised learning on interleaved thought data....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/seerattention-r-sparse-attention-adaptation-for-long-reasoning","title":"SeerAttention-R: Sparse Attention Adaptation for Long Reasoning","url":"https://www.microsoft.com/en-us/research/publication/seerattention-r-sparse-attention-adaptation-for-long-reasoning/","published":"2026-02-01","authors":["Yizhao Gao","Shuming Guo","Shijie Cao","Yuqing Xia","Yu Cheng","Lei Wang","Lingxiao Ma","Yutao Sun","Tianzhu Ye","Li Dong","Hayden Kwok-Hay So","Yu Hua"],"abstract":"We introduce SeerAttention-R, a sparse attention framework specifically tailored for the long decoding of reasoning models. Extended from SeerAttention, SeerAttention-R retains the design of learning attention sparsity through a self-distilled gating mechanism, while removing query pooling to accommodate auto-regressive decoding. 
With a lightweight plug-in gating, SeerAttention-R is flexible and can be easily integrated into existing pretrained model without modifying the original parameters. We demonstrate that SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning accuracy with 4K token budget in AIME benchmark under large sparse attention block sizes (64/128). Using TileLang, we develop a highly optimized sparse decoding kernel that achieves near-theoretical speedups of up to 9x over FlashAttention-3 on H100 GPU at 90% sparsity. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2602.01335","title":"Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning","url":"https://huggingface.co/papers/2602.01335","published":"2026-02-01","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/tencent/papers"}},{"id":"openalex:W7127049121","title":"MolSpecFlow: Mass-Constrained Hybrid Flow Matching for Joint Molecular-Spectral Analysis","url":"https://doi.org/10.64898/2026.01.28.702438","published":"2026-02-01","authors":["Yu Wang","Fan Yang","Kaikun Xu","Li Yuan","Jun Zhu","Jingjie Zhang","Zhenchao Tang","Yatao Bian","Cheng Chang","Yonghong Tian","Jianhua Yao"],"abstract":"Abstract Identifying the “dark matter” of the chemical universe requires bridging the fundamental gap between molecules and mass spectra. Existing approaches struggle to accurately map between these distinct data formats, often yielding results that are either chemically implausible or inconsistent with the physical evidence. We introduce MolSpecFlow, a unified foundation model pretrained on 100 million molecules and 42 million spectra, which leverages a hybrid flow matching framework to orchestrate optimal transport paths for spectral peaks and discrete probability flows for molecular tokens. Furthermore, to guarantee the physicochemical validity of the generated structures, we incorporate explicit rule-based constraints into the generative process, utilizing a token-level mass control mechanism that strictly enforces alignment with the precursor molecular weight. 
MolSpecFlow establishe...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.01.28.702438","openalex_id":"https://openalex.org/W7127049121","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Beijing Proteome Research Center","Cancer Hospital of Chinese Academy of Medical Sciences","Center for Life Sciences","Chinese University of Hong Kong","German Research Centre for Artificial Intelligence","National University of Singapore","Peking University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6549000144004822},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5928000211715698},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.5893999934196472},{"id":"https://openalex.org/C38349280","display_name":"Flow (mathematics)","score":0.45010000467300415},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4496000111103058},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.44780001044273376},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3937000036239624},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.38850000500679016}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7127308463","title":"Too Good to Be Human: AI Turing Test in Incomplete Information-Competitive Games","url":"https://doi.org/10.1109/mc.2025.3600295","published":"2026-02-01","authors":["Wancheng Ni","Xiaoqi Wang","Kaiqi Huang","Xiaonan Zhao","Shixian Wang","Jian 
Hu","Xin Li"],"abstract":"Recent advances in artificial intelligence (AI) have significantly improved our ability to address complicated problems. However, concerns about human-like AI are also growing. This article investigates when humans consider AI agents to be human-like in competitive board games based on their game performance.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mc.2025.3600295","openalex_id":"https://openalex.org/W7127308463","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beijing Academy of Artificial Intelligence","Chinese Academy of Sciences","Hong Kong Polytechnic University","Institute of Automation","Shandong Institute of Automation"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6111999750137329},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5863999724388123},{"id":"https://openalex.org/C577917","display_name":"Turing test","score":0.5741999745368958},{"id":"https://openalex.org/C9870796","display_name":"Turing","score":0.49470001459121704},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.44940000772476196},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.4034999907016754},{"id":"https://openalex.org/C113336015","display_name":"Complete information","score":0.336899995803833},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.30390000343322754}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7127118597","title":"Bootstrapping Large Language Models with Outsideknowledge 
for Knowledge-based Visual Question Answering","url":"https://doi.org/10.1007/s11633-025-1591-z","published":"2026-02-01","authors":["Yanze Min","Yawei Sun","Yin Zhu","Jun Zhu","Bo Zhang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11633-025-1591-z","openalex_id":"https://openalex.org/W7127118597","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Nanjing University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.885699987411499},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7404000163078308},{"id":"https://openalex.org/C207609745","display_name":"Bootstrapping (finance)","score":0.7092999815940857},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6435999870300293},{"id":"https://openalex.org/C2776502983","display_name":"Contrast (vision)","score":0.5112000107765198},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4878000020980835},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4523000121116638},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4490000009536743}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:722c3d41e33d4132","title":"Claude Sonnet 4.6 System Card","url":"https://www.anthropic.com/claude-sonnet-4-6-system-card","published":"2026-02","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Sonnet 
4.6.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Sonnet 4.6"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"official:58059138bea2e556","title":"Claude Opus 4.6 System Card","url":"https://anthropic.com/claude-opus-4-6-system-card","published":"2026-02","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Opus 4.6.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Opus 4.6"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"bytedance-seed:1380","title":"Adaptive Ability Decomposing for Unlocking Large Reasoning Model Effective Reinforcement Learning","url":"https://seed.bytedance.com/en/research/adaptive-ability-decomposing-for-unlocking-large-reasoning-model-effective-reinforcement-learning","published":"2026-01-31","authors":["Zhipeng Chen","Xiaobo Qin","Wayne Xin Zhao","Youbin Wu","Ji-Rong Wen"],"abstract":"Reinforcement learning with verifiable rewards (RLVR) has shown great potential to enhance the reasoning ability of large language models (LLMs). 
However, due to the limited amount of information provided during the RLVR process, the model can only engage in largely blind exploration, which often results in failure on challenging problems. To provide additional information for the RLVR process without relying on a teacher model, we propose A2D, an Adaptive Ability Decomposing method for enhancing the effectiveness of RLVR. Specifically, we first train a decomposer via RLVR without distillation, enabling it to decompose complex questions into a set of simpler sub-questions. Next, we use this decomposer to annotate sub-questions for each question in the training dataset, and then train the reasoner under RLVR with sub-question guidance. To better understand A2D, we first compare its perfor...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computation and Language","Multimodal","arXiv","distillation"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W7126415217","title":"Why not transform chat large language models to non-English?","url":"https://doi.org/10.1007/s11704-025-50646-z","published":"2026-01-31","authors":["Xiang Geng","Ming Zhu","Jiahuan Li","Zhejian Lai","Wei Zou","Shuaijie She","Jiaxin Guo","Xiaofeng Zhao","Yinglu Li","Yuang Li","Chang Su","Yanqing 
Zhao"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11704-025-50646-z","openalex_id":"https://openalex.org/W7126415217","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Nanjing University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9190999865531921},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5835000276565552},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5148000121116638},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5040000081062317},{"id":"https://openalex.org/C2776849261","display_name":"Online chat","score":0.4099000096321106},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.39570000767707825},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3181999921798706},{"id":"https://openalex.org/C129792486","display_name":"Language identification","score":0.2937999963760376}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"bytedance-seed:1332","title":"Post-LayerNorm Is Back: Stable, ExpressivE, and Deep","url":"https://seed.bytedance.com/en/research/post-layernorm-is-back-stable-expressive-and-deep","published":"2026-01-30","authors":["Chen Chen","Lai Wei"],"abstract":"Large language model (LLM) scaling is hitting a wall. Widening models yields diminishing returns, and extending context length does not improve fundamental expressivity. 
In contrast, depth scaling offers theoretically superior expressivity, yet current Transformer architectures struggle to train reliably at extreme depths. We revisit the Post-LayerNorm (Post-LN) formulation, whose instability at scale caused its replacement by Pre-LN in modern LLMs. We show that the central failure mode of Post-LN arises from the ResNet-style residual pathway, which introduces gradient vanishing in deep networks. We present Keel, a Post-LN Transformer that replaces this residual path with a Highway-style connection. This modification preserves the gradient flow through the residual branch, preventing signal vanishing from the top layers to the bottom. Unlike prior methods, Keel enables stable training at...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Machine Learning","Multimodal","arXiv","LLM","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-far-are-llms-from-professional-poker-players-revisiting-game-theoretic-reasoning-with-agentic-tool-use","title":"How Far Are LLMs from Professional Poker Players? 
Revisiting Game-Theoretic Reasoning with Agentic Tool Use","url":"https://www.microsoft.com/en-us/research/publication/how-far-are-llms-from-professional-poker-players-revisiting-game-theoretic-reasoning-with-agentic-tool-use/","published":"2026-01-30","authors":["Min Lin","Enyan Dai","Hui Liu","Xianfeng Tang","Yuliang Yan","Zhenwei Dai","Jingying Zeng","Zhiwei Zhang","Fali Wang","Hongcheng Gao","Chen Luo","Xiang Zhang"],"abstract":"As Large Language Models (LLMs) are increasingly applied in high-stakes domains, their ability to reason strategically under uncertainty becomes critical. Poker provides a rigorous testbed, requiring not only strong actions but also principled, game-theoretic reasoning. In this paper, we conduct a systematic study of LLMs in multiple realistic poker tasks, evaluating both gameplay outcomes and reasoning traces. Our analysis reveals LLMs fail to compete against traditional algorithms and identifies three recurring flaws: reliance on heuristics, factual misunderstandings, and a\"knowing-doing\"gap where actions diverge from reasoning. An initial attempt with behavior cloning and step-level reinforcement learning improves reasoning style but remains insufficient for accurate game-theoretic play. 
Motivated by these limitations, we propose ToolPoker, a tool-integrated reasoning framework that c...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1338","title":"Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities","url":"https://seed.bytedance.com/en/research/retrieval-infused-reasoning-sandbox-a-benchmark-for-decoupling-retrieval-and-reasoning-capabilities","published":"2026-01-30","authors":["Shuangshuang Ying","Zheyu Wang","Yunjian Peng","Jin Chen","Yuhao Wu","Hongbin Lin","Dingyu He","Siyi Liu","Gengchen Yu","YinZhu Piao","Yuchen Wu","Xin Gui"],"abstract":"Despite strong performance on existing benchmarks, it remains unclear whether large language models can reason over genuinely novel scientific information. Most evaluations score end-to-end RAG pipelines, where reasoning is confounded with retrieval and toolchain choices, and the signal is further contaminated by parametric memorization and open-web volatility. We introduce DeR2, a controlled deep-research sandbox that isolates document-grounded reasoning while preserving core difficulties of deep search: multi-step synthesis, denoising, and evidence-based conclusion making. 
DeR2 decouples evidence access from reasoning via four regimes--Instruction-only, Concepts (gold concepts without documents), Related-only (only relevant documents), and Full-set (relevant documents plus topically related distractors)--yielding interpretable regime gaps that operationalize retrieval loss vs. reasonin...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial Intelligence","Seed","arXiv","retrieval"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-abstract-to-contextual-what-llms-still-cannot-do-in-mathematics","title":"From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics","url":"https://www.microsoft.com/en-us/research/publication/from-abstract-to-contextual-what-llms-still-cannot-do-in-mathematics/","published":"2026-01-29","authors":["Bowen Cao","Dongdong Zhang","Yixia Li","Junpeng Liu","Shijue Huang","Chufan Shi","Hongyuan Lu","Yaokang Wu","Guanhua Chen","Wai Lam","Furu Wei"],"abstract":"Large language models now solve many benchmark math problems at near-expert levels, yet this progress has not fully translated into reliable performance in real-world applications. We study this gap through contextual mathematical reasoning, where the mathematical core must be formulated from descriptive scenarios. 
We introduce ContextMATH, a benchmark that repurposes AIME and MATH-500 problems into two contextual settings: Scenario Grounding (SG), which embeds abstract problems into realistic narratives without increasing reasoning complexity, and Complexity Scaling (CS), which transforms explicit conditions into sub-problems to capture how constraints often appear in practice. Evaluating 61 proprietary and open-source models, we observe sharp drops: on average, open-source models decline by 13 and 34 points on SG and CS, while proprietary models drop by 13 and 20. Error analysis shows....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Mathematics","Computer science","large language models","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/optimizing-agentic-workflows-using-meta-tools","title":"Optimizing Agentic Workflows using Meta-tools","url":"https://www.microsoft.com/en-us/research/publication/optimizing-agentic-workflows-using-meta-tools/","published":"2026-01-29","authors":["Sami Abuzakuk","Anne-Marie Kermarrec","Rishi Sharma","Rasmus Moorits Veski","M. Vos"],"abstract":"Agentic AI enables LLM to dynamically reason, plan, and interact with tools to solve complex tasks. However, agentic workflows often require many iterative reasoning steps and tool invocations, leading to significant operational expense, end-to-end latency and failures due to hallucinations. 
This work introduces Agent Workflow Optimization (AWO), a framework that identifies and optimizes redundant tool execution patterns to improve the efficiency and robustness of agentic workflows. AWO analyzes existing workflow traces to discover recurring sequences of tool calls and transforms them into meta-tools, which are deterministic, composite tools that bundle multiple agent actions into a single invocation. Meta-tools bypass unnecessary intermediate LLM reasoning steps and reduce operational cost while also shortening execution paths, leading to fewer failures. Experiments on two agentic AI be...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Systems and networking","Computer science","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1339","title":"ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation","url":"https://seed.bytedance.com/en/research/conceptmoe-adaptive-token-to-concept-compression-for-implicit-compute-allocation","published":"2026-01-29","authors":["Zihao Huang","Jundong Zhou","Xingwei Qu","Qiyang Min","Ge Zhang"],"abstract":"Large language models allocate uniform computation across all tokens, ignoring that some sequences are trivially predictable while others require deep reasoning. We introduce ConceptMoE, which dynamically merges semantically similar tokens into concept representations, performing implicit token-level compute allocation. 
A learnable chunk module identifies optimal boundaries by measuring inter-token similarity, compressing sequences by a target ratio R before they enter the compute-intensive concept model. Crucially, the MoE architecture enables controlled evaluation: we reallocate saved computation to match baseline activated FLOPs (excluding attention map computation) and total parameters, isolating genuine architectural benefits. Under these conditions, ConceptMoE consistently outperforms standard MoE across language and vision-language tasks, achieving +0.9 points on language pretrain...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine Learning","LLM","arXiv","compression"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:Qwen:2601.21337","title":"Qwen3-ASR Technical Report","url":"https://huggingface.co/papers/2601.21337","published":"2026-01-29","authors":["Alibaba/Qwen"],"abstract":"In this report, we introduce Qwen3-ASR family, which includes two powerful all-in-one speech recognition models and a novel non-autoregressive speech forced alignment model. Qwen3-ASR-1.7B and Qwen3-ASR-0.6B are ASR models that support language identification and ASR for 52 languages and dialects. Both of them leverage large-scale speech training data and the strong audio understanding ability of their foundation model Qwen3-Omni. We conduct comprehensive internal evaluation besides the open-sourced benchmarks as ASR models might differ little on open-sourced benchmark scores but exhibit significant quality differences in real-world scenarios. 
The experiments reveal that the 1.7B version achieves SOTA performance among open-sourced ASR models and is competitive with the strongest proprietary APIs while the 0.6B version offers the best accuracy-efficiency trade-off. Qwen3-ASR-0.6B can ach...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Qwen","LLM"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"official:2f3da420a065b264","title":"PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing","url":"https://ernie.baidu.com/blog/posts/paddleocr-vl-1.5/","published":"2026-01-29","authors":["Baidu"],"abstract":"🚀 We release PaddleOCR-VL-1.5, an upgraded model achieving a new state-of-the-art (SOTA)accuracy of 94.5% on OmniDocBench v1.5.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["ERNIE","Baidu","technical report"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://ernie.baidu.com/blog/index.xml"}},{"id":"hf-org-paper:baidu:2601.21244","title":"Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification","url":"https://huggingface.co/papers/2601.21244","published":"2026-01-28","authors":["Baidu"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) has 
advanced LLM reasoning, but remains constrained by inefficient exploration under limited rollout budgets, leading to low sampling success and unstable training in complex tasks. We find that many exploration failures arise not from problem difficulty, but from a small number of prompt tokens that introduce interference. Building on this insight, we propose the Less Noise Sampling Framework (LENS), which first prompts by identifying and removing interference tokens. then transfers successful rollouts from the purification process to supervise policy optimization on the original noisy prompts, enabling the model to learn to ignore interference in the real-world, noisy prompting settings. Experimental results show that LENS significantly outperforms GRPO, delivering higher performance and faster convergence, with a 3.88% average gain...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"https://openalex.org/W7126282195","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","baidu","LLM"],"author_affiliations":["Baidu","Baidu (China)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"hf-org-paper:deepseek-ai:2601.20552","title":"DeepSeek-OCR 2: Visual Causal Flow","url":"https://huggingface.co/papers/2601.20552","published":"2026-01-28","authors":["DeepSeek"],"abstract":"We present DeepSeek-OCR 2 to investigate the feasibility of a novel encoder-DeepEncoder V2-capable of dynamically reordering visual tokens upon image semantics. Conventional vision-language models (VLMs) invariably process visual tokens in a rigid raster-scan order (top-left to bottom-right) with fixed positional encoding when fed into LLMs. 
However, this contradicts human visual perception, which follows flexible yet semantically coherent scanning patterns driven by inherent logical structures. Particularly for images with complex layouts, human vision exhibits causally-informed sequential processing. Inspired by this cognitive mechanism, DeepEncoder V2 is designed to endow the encoder with causal reasoning capabilities, enabling it to intelligently reorder visual tokens prior to LLM-based content interpretation. This work explores a novel paradigm: whether 2D image understanding can be...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","deepseek-ai","LLM"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W7125961045","title":"Adaptive Semantic Compression and Transmission for Cognitive Knowledge Coordination in a Hierarchical LLM-Agents System","url":"https://doi.org/10.1109/jiot.2026.3658564","published":"2026-01-28","authors":["Xinju He","Jiayi Liu","Chen Wang","Xuemei Xie","Guangming Shi"],"abstract":"The rapid development of Large Language Model (LLM) agents has facilitated the advancement of multi-agent systems, where cognitive knowledge sharing is crucial for the execution of complex tasks. However, achieving the synchronization of the cognitive Knowledge base (KB) among agents under restricted wireless resources remains a challenge, especially in dynamic real-time environments. Therefore, we propose a hierarchical LLM agent system that consists of a high-level Cluster Brain (CB) and multiple Lower-level LLM Agents (LLAs). 
The cognitive KB of each LLA is represented in the form of a Knowledge Graph (KG). To improve the efficiency of transmitting cognitive KB updates from LLAs to CB, a KG compression framework named MED-EmPress is proposed, which adaptively compresses the semantic features of cognitive KB by applying dimensionality reduction and binary quantization, and then a joint...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jiot.2026.3658564","openalex_id":"https://openalex.org/W7125961045","cited_by_count":0,"quality_score":61,"matched_keywords":["LLM","language model","compression","quantization","agent","multi-agent"],"author_affiliations":["Huawei Technologies (China)","Xidian University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8831999897956848},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.5131999850273132},{"id":"https://openalex.org/C32542511","display_name":"Cognitive network","score":0.45239999890327454},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41269999742507935},{"id":"https://openalex.org/C149946192","display_name":"Cognitive radio","score":0.3946000039577484},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.3801000118255615},{"id":"https://openalex.org/C197914299","display_name":"Semantic memory","score":0.37860000133514404},{"id":"https://openalex.org/C4554734","display_name":"Knowledge base","score":0.37290000915527344}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2601.20430","title":"Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism 
Decoding","url":"https://huggingface.co/papers/2601.20430","published":"2026-01-28","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:moonshotai:2602.02537","title":"WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models","url":"https://huggingface.co/papers/2602.02537","published":"2026-01-28","authors":["Moonshot/Kimi"],"abstract":"","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","moonshotai"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"openalex:W7125967601","title":"JanusQuant: Accurate and Efficient 2-bit KV Cache Quantization for Long-Context Inference","url":"https://doi.org/10.1145/3774934.3786428","published":"2026-01-28","authors":["Chengyu Sun","Yaqi Xia","Hulin Wang","Donglin Yang","Xiaobo Zhou","Dazhao Cheng"],"abstract":"Long-context large language models (LLMs) have seen widespread adoption in recent years. 
However, during inference, the key-value (KV) cache—which stores intermediate activations—consumes significant memory, particularly as sequence lengths grow. Quantization offers a promising path to compress KV cache, but existing 2-bit approaches fall short of achieving optimal inference efficiency due to hardware-unfriendly algorithms and system implementations.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774934.3786428","openalex_id":"https://openalex.org/W7125967601","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","efficient","quantization"],"author_affiliations":["Nvidia (United States)","University of Macau","Wuhan University"],"concepts":[{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.7170000076293945},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7050999999046326},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.659500002861023},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.6065000295639038},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.5648999810218811},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.5073999762535095},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.4433000087738037},{"id":"https://openalex.org/C3018263672","display_name":"Efficient algorithm","score":0.34439998865127563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.21123","title":"CUA-Skill: Develop Skills for Computer Using Agent","url":"https://huggingface.co/papers/2601.21123","published":"2026-01-28","authors":["Tianyi 
Chen","Yinheng Li","Michael Solodko","Sen Wang","Nan Jiang","Tingyuan Cui","Junheng Hao","Jongwoo Ko","Sara Abdali","Suzhen Zheng","Leon Xu","Hao Fan"],"abstract":"Computer-Using Agents (CUAs) aim to autonomously operate computer systems to complete real-world tasks. However, existing agentic systems remain difficult to scale and lag behind human performance. A key limitation is the absence of reusable and structured skill abstractions that capture how humans interact with graphical user interfaces and how to leverage these skills. We introduce CUA-Skill, a computer-using agentic skill base that encodes human computer-use knowledge as skills coupled with parameterized execution and composition graphs. CUA-Skill is a large-scale library of carefully engineered skills spanning common Windows applications, serving as a practical infrastructure and tool substrate for scalable, reliable agent development. Built upon this skill base, we construct CUA-Skill Agent, an end-to-end computer-using agent that supports dynamic skill retrieval, argument instantia...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":43,"matched_keywords":["memory","retrieval","efficient","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W7125930659","title":"Template-guided interpretable reasoning with execution feedback for LLM-based program repair","url":"https://doi.org/10.1016/j.infsof.2026.108058","published":"2026-01-28","authors":["Sichong Hao","Xianjun Shi","Hongwei Liu","Yuyang Yin","Xi 
Chen"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.infsof.2026.108058","openalex_id":"https://openalex.org/W7125930659","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Harbin Institute of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.9068999886512756},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7782999873161316},{"id":"https://openalex.org/C75291252","display_name":"TRACE (psycholinguistics)","score":0.7630000114440918},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.44999998807907104},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43540000915527344},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.3621000051498413},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3546999990940094},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.33820000290870667}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7126022570","title":"BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues","url":"https://doi.org/10.1162/coli.a.602","published":"2026-01-28","authors":["Prashant Jayannavar","Liliang Ren","Marisa Hudspeth","Risham Sidhu","Charlotte Lambert","A. W. 
Cordes","Elizabeth Kaplan","Anjali Narayan-Chen","Julia Hockenmaier"],"abstract":"Abstract Developing interactive agents that can understand language, perceive their surroundings, and act within the physical world is a long-standing goal of AI research. The Minecraft Collaborative Building Task (MCBT) (Narayan-Chen, Jayannavar, and Hockenmaier 2019), a two-player game in which an Architect (A) instructs a Builder (B) to construct a target structure in a simulated 3D Blocks World environment, offers a rich platform to work towards this goal. In this work, we focus on the Builder Action Prediction (BAP) subtask: predicting B’s actions in a multimodal game context (Jayannavar, Narayan-Chen, and Hockenmaier 2020) – a challenging testbed for grounded instruction following, with limited training data. We holistically re-examine this task and introduce BAP v2 to address key challenges in evaluation, training data, and modeling. Specifically, we define an enhanced evaluation....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/coli.a.602","openalex_id":"https://openalex.org/W7126022570","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","Microsoft Research (United Kingdom)","University of Illinois Urbana-Champaign","University of Massachusetts Amherst"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8087999820709229},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6301000118255615},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.617900013923645},{"id":"https://openalex.org/C31395832","display_name":"Testbed","score":0.6078000068664551},{"id":"https://openalex.org/C2779343474","display_name":"Context 
(archaeology)","score":0.5961999893188477},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5521000027656555},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5131000280380249},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.49779999256134033}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.20975","title":"DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents","url":"https://huggingface.co/papers/2601.20975","published":"2026-01-28","authors":["Nikita Gupta","Riju Chatterjee","Lukas Haas","Connie Tao","Andrew Wang","Chang Liu","Hidekazu Oiwa","Elena Gribovskaya","Jan Ackermann","John Blitzer","Sasha Goldshtein","Dipanjan Das"],"abstract":"We introduce DeepSearchQA, a 900-prompt benchmark for evaluating agents on difficult multi-step information-seeking tasks across 17 different fields. Unlike traditional benchmarks that target single answer retrieval or broad-spectrum factuality, DeepSearchQA features a dataset of challenging, handcrafted tasks designed to evaluate an agent's ability to execute complex search plans to generate exhaustive answer lists. This shift in design explicitly tests three critical, yet under-evaluated capabilities: 1) systematic collation of fragmented information from disparate sources, 2) de-duplication and entity resolution to ensure precision, and 3) the ability to reason about stopping criteria within an open-ended search space. 
Each task is structured as a causal chain, where discovering information for one step is dependent on the successful completion of the previous one, stressing long-hori...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["retrieval","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/calibration-without-ground-truth","title":"Calibration without Ground Truth","url":"https://www.microsoft.com/en-us/research/publication/calibration-without-ground-truth/","published":"2026-01-27","authors":["Yuqing Kong","Mingyu Song","Yizhou Wang","Yifan Wu"],"abstract":"Villalobos et al. [2024] predict that publicly available human text will be exhausted within the next decade. Thus, improving models without access to ground-truth labels becomes increasingly important. We propose a label-free post-processing framework that improves a strong but miscalibrated model using a weaker yet better-calibrated reference. Our framework guarantees a strict performance improvement under any proper loss. Our approach is based on a characterization of when strict improvement is possible: when the strong and reference models are not mutually calibrated. We formalize this condition, connect it to arbitrage and no-trade results from economics, and develop an efficient Bregman projection algorithm that guarantees worst-case loss reduction without labels. 
Experiments on representative LLMs across varying scales demonstrate that our label-free method significantly reduces p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1378","title":"Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models","url":"https://seed.bytedance.com/en/research/visual-generation-unlocks-human-like-reasoning-through-multimodal-world-models","published":"2026-01-27","authors":["Jialong Wu","Xiaoying Zhang","Hongyi Yuan","Xiangcheng Zhang","Tianhao Huang","Changjing He","Chaoyi Deng","Renrui Zhang","Youbin Wu","Mingsheng Long"],"abstract":"Humans construct internal world models and reason by manipulating the concepts within these models. Recent advances in AI, particularly chain-of-thought (CoT) reasoning, approximate such human cognitive abilities, where world models are believed to be embedded within large language models. Expert-level performance in formal and abstract domains such as mathematics and programming has been achieved in current systems by relying predominantly on verbal reasoning. However, they still lag far behind humans in domains like physical and spatial intelligence, which require richer representations and prior knowledge. 
The emergence of unified multimodal models (UMMs) capable of both verbal and visual generation has therefore sparked interest in more human-like reasoning grounded in complementary multimodal pathways, though their benefits remain unclear. From a world-model perspective, this paper....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Artificial Intelligence","Multimodal","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:tencent:2601.19280","title":"Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning","url":"https://huggingface.co/papers/2601.19280","published":"2026-01-27","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2601.19798","title":"Youtu-VL: Unleashing Visual Potential via Unified Vision-Language 
Supervision","url":"https://huggingface.co/papers/2601.19798","published":"2026-01-27","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:moonshotai:2601.19228","title":"Towards Pixel-Level VLM Perception via Simple Points Prediction","url":"https://huggingface.co/papers/2601.19228","published":"2026-01-27","authors":["Moonshot/Kimi"],"abstract":"","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","moonshotai"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"apple:hlb433a8rw63h9234s4wtwnq","title":"SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?","url":"https://machinelearning.apple.com/research/selfreflect","published":"2026-01-27","authors":["Michael Kirchhoff","Luca Füger","Adam Goliński","Eeshan Gunesh Dhekane","Arno Blaas","Seong Joon Oh","Sinead Williamson"],"abstract":"The common approach to communicate a large language model's (LLM) uncertainty is to add a percentage number or a hedging word to its response. 
But is this all we can do? Instead of generating a single answer and then hedging it, an LLM that is fully transparent to the user needs to be able to reflect on its internal belief distribution and output a summary of all options it deems possible, and how likely they are. To test whether LLMs possess...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7125826563","title":"Spatial-Temporal Multimodal Large Language Model for Generative Recommendation in Alipay","url":"https://doi.org/10.1109/tkde.2026.3658153","published":"2026-01-27","authors":["Yunhui Xu","Zhixiang Yang","Youru Li","Zhenfeng Zhu","Zujian Weng","Jingjuan Zhao","Chenguang Ma","Jieping Ye","Yao Zhao"],"abstract":"Despite the encouraging achievements, the practical application of recommendation systems still faces two key issues. The first is how to better understand the multimodal real-time requests that are the more mainstream request behavior in industrial scenarios; the other is how to effectively capture users' dynamic needs that change with temporal and spatial conditions. The breakthroughs in text understanding and generation capabilities of Large Language Models (LLMs) have demonstrated their tremendous potential in precise recommendation systems, particularly through the enhancement of the understanding of user intent. To address these issues, we propose a novel Spatial-Temporal Multimodal LLM for generative recommendation. 
Specifically, on the basis of the behavior data constructed from Alipay, spatial-temporal knowledge-guided fine-tuning module is proposed to capture specific needs in....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2026.3658153","openalex_id":"https://openalex.org/W7125826563","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","personalized","preference"],"author_affiliations":["Alibaba Group (China)","Beijing Jiaotong University","Beijing University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8970000147819519},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6930000185966492},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5647000074386597},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5497999787330627},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.47920000553131104},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.47450000047683716},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.46950000524520874},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4293000102043152}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:ddg2j6jwh9h6yb9rxdv2lc7h","title":"VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety","url":"https://machinelearning.apple.com/research/vlsu","published":"2026-01-27","authors":["Shruti Palaskar","Leon Gatys","Mona Abdelrahman","Mar Jacobo","Larry Lindsey","Rutika 
Moharir","Gunnar Lund","Yang Xu","Navid Shiee","Jeffrey Bigham","Charles Maalouf","Joseph Yitan Cheng"],"abstract":"Safety evaluation of multimodal foundation models often treats vision and language inputs separately, missing risks from joint interpretation where benign content becomes harmful in combination. Existing approaches also fail to distinguish clearly unsafe content from borderline cases, leading to problematic over-blocking or under-refusal of genuinely harmful content. We present Vision Language Safety Understanding (VLSU), a comprehensive...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:wrj34tk3btcih0pu0q6kmw8q","title":"Principled Coarse-Grained Acceptance for Speculative Decoding in Speech","url":"https://machinelearning.apple.com/research/coarse-grained","published":"2026-01-27","authors":["Moran Yanuka","Paul Dixon","Eyal Finkelshttein","Daniel Rotman","Raja Giryes"],"abstract":"Speculative decoding accelerates autoregressive speech generation by letting a fast draft model propose tokens that a larger target model verifies. However, for speech LLMs that generate acoustic tokens, exact token matching is overly restrictive: many discrete tokens are acoustically or semantically interchangeable, reducing acceptance rates and limiting speedups. 
We introduce Principled Coarse-Graining (PCG), which verifies proposals at the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:tfcp2jgnzxerbuszpd3j4c1b","title":"Learning to Reason as Action Abstractions with Scalable Mid-Training RL","url":"https://machinelearning.apple.com/research/action-abstractions","published":"2026-01-27","authors":["Shenao Zhang","Donghan Yu","Yihao Feng","Bowen Jin","Zhaoran Wang","John Peebles","Zirui Wang"],"abstract":"Large language models excel with reinforcement learning (RL), but fully unlocking this potential requires a mid-training stage. An effective mid-training phase should identify a compact set of useful actions and enable fast selection among them through online RL. 
We formalize this intuition by presenting the first theoretical result on how mid-training shapes post-training: it characterizes an action subspace that minimizes both the value...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2507.04701","title":"XiYan-SQL: A Novel Multi-Generator Framework for Text-to-SQL","url":"http://arxiv.org/abs/2507.04701","published":"2026-01-27","authors":["Yifu Liu","Yin Zhu","Yingqi Gao","Zhiling Luo","Xiaoxia Li","Xiaorong Shi","Yú Hónɡ","Jinyang Gao","Haijun Yu","Bolin Ding","Jingren Zhou"],"abstract":"To leverage the advantages of LLM in addressing challenges in the Text-to-SQL task, we present XiYan-SQL, an innovative framework effectively generating and utilizing multiple SQL candidates. It consists of three components: 1) a Schema Filter module filtering and obtaining multiple relevant schemas; 2) a multi-generator ensemble approach generating multiple high-quality and diverse SQL queries; 3) a selection model with a candidate reorganization strategy implemented to obtain the optimal SQL query. Specifically, for the multi-generator ensemble, we employ a multi-task fine-tuning strategy to enhance the capabilities of SQL generation models for the intrinsic alignment between SQL and text, and construct multiple generation models with distinct generation styles by fine-tuning across different SQL formats. 
The experimental results and comprehensive analysis demonstrate the effectiveness...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2026.3657851","openalex_id":"https://openalex.org/W4415348225","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7915999889373779},{"id":"https://openalex.org/C510870499","display_name":"SQL","score":0.7145000100135803},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.4765999913215637},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.476500004529953},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4652000069618225},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.3977999985218048},{"id":"https://openalex.org/C32145003","display_name":"PL/SQL","score":0.3944000005722046},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.38609999418258667}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/in-agents-we-trust-but-who-do-agents-trust-latent-source-preferences-steer-llm-generations-2","title":"In Agents We Trust, but Who Do Agents Trust? 
Latent Source Preferences Steer LLM Generations","url":"https://www.microsoft.com/en-us/research/publication/in-agents-we-trust-but-who-do-agents-trust-latent-source-preferences-steer-llm-generations-2/","published":"2026-01-26","authors":["Mohammad Aflah Khan","Mahsa Amani","Soumi Das","Bishwamittra Ghosh","Qinyuan Wu","Krishna P. Gummadi","Manish Gupta","Abhilasha Ravichander"],"abstract":"Agents based on Large Language Models (LLMs) are increasingly being deployed as interfaces to information on online platforms. These agents filter, prioritize, and synthesize information retrieved from the platforms' back-end databases or via web search. In these scenarios, LLM agents govern the information users receive, by drawing users' attention to particular instances of retrieved information at the expense of others. While much prior work has focused on biases in the information LLMs themselves generate, less attention has been paid to the factors that influence what information LLMs select and present to users. We hypothesize that when information is attributed to specific sources (e.g., particular publishers, journals, or platforms), current LLMs exhibit systematic latent source preferences, that is, they prioritize information from some sources over others.
Through controlled expe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","news"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2601.18202","title":"SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback","url":"https://huggingface.co/papers/2601.18202","published":"2026-01-26","authors":["Fangyuan Xu","Rujun Han","Yanfei Chen","Zifeng Wang","I-Hung Hsu","Jun Yan","Vishy Tirumalashetty","Eunsol Choi","Tomas Pfister","Chen-Yu Lee"],"abstract":"Deep search agents, which aim to answer complex questions requiring reasoning across multiple documents, can significantly speed up the information-seeking process. Collecting human annotations for this application is prohibitively expensive due to long and complex exploration trajectories. We propose an agentic pipeline that automatically generates high quality, difficulty-controlled deep search question-answer pairs for a given corpus and a target difficulty level. Our pipeline, SAGE, consists of a data generator which proposes QA pairs and a search agent which attempts to solve the generated question and provide execution feedback for the data generator. The two components interact over multiple rounds to iteratively refine the question-answer pairs until they satisfy the target difficulty level. 
Our intrinsic evaluation shows SAGE generates questions that require diverse reasoning st...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["retrieval","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adareasoner-dynamic-tool-orchestration-for-iterative-visual-reasoning","title":"AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning","url":"https://www.microsoft.com/en-us/research/publication/adareasoner-dynamic-tool-orchestration-for-iterative-visual-reasoning/","published":"2026-01-25","authors":["Mingyang Song","Haoyu Sun","Jiawei Gu","Linjie Li","Luxin Xu","Ranjay Krishna","Yu Cheng"],"abstract":"When humans face problems beyond their immediate capabilities, they rely on tools, providing a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
Together, these compo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Multimodal Large Language Models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-42-enabling-determinism-in-llm-inference-with-verified-speculation","title":"LLM-42: Enabling Determinism in LLM Inference with Verified Speculation","url":"https://www.microsoft.com/en-us/research/publication/llm-42-enabling-determinism-in-llm-inference-with-verified-speculation/","published":"2026-01-25","authors":["Raja Gond","Aditya K Kamath","Ramachandran Ramjee","Ashish Panwar"],"abstract":"In LLM inference, the same prompt may yield different outputs across different runs. At the system level, this non-determinism arises from floating-point non-associativity combined with dynamic batching and GPU kernels whose reduction orders vary with batch size. A straightforward way to eliminate non-determinism is to disable dynamic batching during inference, but doing so severely degrades throughput. Another approach is to make kernels batch-invariant; however, this tightly couples determinism to kernel design, requiring new implementations. This coupling also imposes fixed runtime overheads, regardless of how much of the workload actually requires determinism.Inspired by ideas from speculative decoding, we present LLM-42, a scheduling-based approach to enable determinism in LLM inference. 
Our key observation is that if a sequence is in a consistent state, the next emitted token is li...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Systems and networking","LLMs Inference","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2601.17737","title":"The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation","url":"https://huggingface.co/papers/2601.17737","published":"2026-01-25","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:Qwen:2601.18137","title":"DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable 
Constraints","url":"https://huggingface.co/papers/2601.18137","published":"2026-01-25","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"openalex:W7125637838","title":"Affective computing in the era of large language models: A survey from the NLP perspective","url":"https://doi.org/10.1016/j.knosys.2026.115411","published":"2026-01-25","authors":["Yiqun Zhang","Xiaocui Yang","Xingle Xu","Zeran Gao","Yijie Huang","Shiyi Mu","Shi Feng","Daling Wang","Yifei Zhang","Kaisong Song","Ge Yu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.knosys.2026.115411","openalex_id":"https://openalex.org/W7125637838","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Northeastern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6847000122070312},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.5824999809265137},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.5504999756813049},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4740999937057495},{"id":"https://openalex.org/C139807058","display_name":"Adaptation 
(eye)","score":0.4713999927043915},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.46320000290870667},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.44519999623298645},{"id":"https://openalex.org/C6438553","display_name":"Affective computing","score":0.4097000062465668}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/spatialmath-spatial-comprehension-infused-symbolic-reasoning-for-mathematical-problem-solving","title":"SpatialMath: Spatial Comprehension-Infused Symbolic Reasoning for Mathematical Problem-Solving","url":"https://www.microsoft.com/en-us/research/publication/spatialmath-spatial-comprehension-infused-symbolic-reasoning-for-mathematical-problem-solving/","published":"2026-01-24","authors":["Ashutosh Bajpai","Akshat Bhandari","Akshay Nambi","Tanmoy Chakraborty"],"abstract":"Multimodal Small-to-Medium sized Language Models (MSLMs) have demonstrated strong capabilities in integrating visual and textual information but still face significant limitations in visual comprehension and mathematical reasoning, particularly in geometric problems with diverse levels of visual infusion. Current models struggle to accurately decompose intricate visual inputs and connect perception with structured reasoning, leading to suboptimal performance. To address these challenges, we propose SpatialMath, a novel Spatial Comprehension-Infused Symbolic Reasoning Framework designed to integrate spatial representations into structured symbolic reasoning chains. SpatialMath employs a specialized perception module to extract spatially-grounded representations from visual diagrams, capturing critical geometric structures and spatial relationships. 
These representations are then methodica...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7125587198","title":"Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers","url":"https://doi.org/10.1007/s11263-026-02752-z","published":"2026-01-24","authors":["Yasheng Sun","Zhiliang Xu","Hang Zhou","Jiazhi Guan","Quanwei Yang","Kaisiyuan Wang","Borong Liang","Yingying Li","Haocheng Feng","Jingdong Wang","Ziwei Liu","Koike Hideki"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-026-02752-z","openalex_id":"https://openalex.org/W7125587198","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","King Abdullah University of Science and Technology","Nanyang Technological University","Tokyo Institute of Technology","Tsinghua University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.8463000059127808},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7908999919891357},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.652899980545044},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5648999810218811},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5565000176429749},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.4595000147819519},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4318999946117401},{"id":"https://openalex.org/C159437735","display_name":"Gesture recognition","score":0.4246000051498413}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-medical-imaging-report-generation-with-multimodal-reinforcement-learning","title":"Scaling medical imaging report generation with multimodal reinforcement learning","url":"https://www.microsoft.com/en-us/research/publication/scaling-medical-imaging-report-generation-with-multimodal-reinforcement-learning/","published":"2026-01-23","authors":["Flora Liu","Sheng Zhang","Guanghui Qin","Yu Gu","Ying Jin","Sam Preston","Yanbo Xu","Wen-wai Yim","Sid Kiblawi","Tim Ossowski","Tristan Naumann","Mu Wei"],"abstract":"Frontier models have demonstrated remarkable capabilities in understanding and reasoning with natural-language text, but they still exhibit major competency gaps in multimodal understanding and reasoning especially in high-value verticals such as biomedicine. Medical imaging report generation is a prominent example. Supervised fine-tuning can substantially improve performance, but they are prone to overfitting to superficial boilerplate patterns. In this paper, we introduce Universal Report Generation (UniRG) as a general framework for medical imaging report generation. 
By leveraging reinforcement learning as a unifying mechanism to directly optimize for evaluation metrics designed for end applications, UniRG can significantly improve upon supervised fine-tuning and attain durable generalization across diverse institutions and clinical practices. We trained UniRG-CXR on publicly availabl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Human language technologies","Computation and Language","Computer Vision and Pattern Recognition","Medical diagnosis","Medical Imaging"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1364","title":"Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model","url":"https://seed.bytedance.com/en/research/stable-diffcoder-pushing-the-frontier-of-code-diffusion-large-language-model","published":"2026-01-23","authors":["Chenghao Fan","Wen Heng","Bo Li","Sichen Liu","Yuxuan Song","Jing Su","Xiaoye Qu","Kai Shen","Wei Wei"],"abstract":"Diffusion-based language models (DLLMs) offer non-sequential, block-wise generation and richer data reuse compared to autoregressive (AR) models, but existing code DLLMs still lag behind strong AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce Stable-DiffCoder, a block diffusion code model that reuses the Seed-Coder architecture, data, and training pipeline. 
To enable efficient knowledge learning and stable training, we incorporate a block diffusion continual pretraining (CPT) stage enhanced by a tailored warmup and block-wise clipped noise schedule. Under the same data and architecture, Stable-DiffCoder overall outperforms its AR counterpart on a broad suite of code benchmarks. Moreover, relying only on the CPT and supervised fine-tuning stages, Stable-DiffCoder achieves stronger performance than a wide range of ~8B ARs and DLLMs, demons...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computation and Language","LLM","arXiv","language model","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:Tencent-Hunyuan:2601.17124","title":"iFSQ: Improving FSQ for Image Generation with 1 Line of Code","url":"https://huggingface.co/papers/2601.17124","published":"2026-01-23","authors":["Tencent/Hunyuan"],"abstract":"The field of image generation is currently bifurcated into autoregressive (AR) models operating on discrete tokens and diffusion models utilizing continuous latents. This divide, rooted in the distinction between VQ-VAEs and VAEs, hinders unified modeling and fair benchmarking. Finite Scalar Quantization (FSQ) offers a theoretical bridge, yet vanilla FSQ suffers from a critical flaw: its equal-interval quantization can cause activation collapse. This mismatch forces a trade-off between reconstruction fidelity and information efficiency. 
In this work, we resolve this dilemma by simply replacing the activation function in original FSQ with a distribution-matching mapping to enforce a uniform prior. Termed iFSQ, this simple strategy requires just one line of code yet mathematically guarantees both optimal bin utilization and reconstruction precision. Leveraging iFSQ as a controlled benchmar...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","quantization"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"openalex:W7125637239","title":"Beyond Functional Correctness: Exploring Hallucinations in LLM-Generated Code","url":"https://doi.org/10.1109/tse.2026.3657432","published":"2026-01-23","authors":["Fang Liu","Yang Liu","Lin Shi","Zhen Yang","Li Zhang","Xiaoli Lian","Zhongqi Li","Yuchi Ma"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tse.2026.3657432","openalex_id":"https://openalex.org/W7125637239","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Beihang University","Huawei Technologies (China)","Shandong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7947999835014343},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.5360000133514404},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5353000164031982},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3481000065803528},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3393999934196472},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.3319000005722046},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3183000087738037},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.2978000044822693}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125482931","title":"SDGT: LLMs fine-tuning with seed-driven growth technology based on GPT-4 data expansion","url":"https://doi.org/10.1016/j.neucom.2026.132766","published":"2026-01-23","authors":["Daming Gao","Jiayi Dai","Sen Liu","Linbo Jin","Wen Jiang","Shanqing Yu","Qi Xuan","Xiaoyan Cai","Libin Yang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neucom.2026.132766","openalex_id":"https://openalex.org/W7125482931","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Northwestern Polytechnical University","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.7462999820709229},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.7257000207901001},{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.7142000198364258},{"id":"https://openalex.org/C2781316041","display_name":"Diversity (politics)","score":0.5867999792098999},{"id":"https://openalex.org/C93361087","display_name":"Data consistency","score":0.5285999774932861},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.4447000026702881},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.43479999899864197},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40880000591278076}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2601.15808","title":"Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification","url":"https://huggingface.co/papers/2601.15808","published":"2026-01-22","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7125607202","title":"Open-World Task and Motion Planning via Vision-Language Model Generated Constraints","url":"https://doi.org/10.1109/lra.2026.3656799","published":"2026-01-22","authors":["Nishanth Kumar","William Shen","Fábio Ramos","Dieter Fox","Tomás Lozano-Pérez","Leslie Pack 
Kaelbling","Caelan Reed Garrett"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2026.3656799","openalex_id":"https://openalex.org/W7125607202","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5752000212669373},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5187000036239624},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4569000005722046},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.41620001196861267},{"id":"https://openalex.org/C81074085","display_name":"Motion planning","score":0.41600000858306885},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.38199999928474426},{"id":"https://openalex.org/C2776036281","display_name":"Constraint (computer-aided design)","score":0.34880000352859497},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.30070000886917114}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2601.16093","title":"SAMTok: Representing Any Mask with Two Words","url":"https://huggingface.co/papers/2601.16093","published":"2026-01-22","authors":["Yikang Zhou","Tao Zhang","Dengxian Gong","Yuanzheng Wu","Ye Tian","Haochen Wang","Haobo Yuan","Jiacong Wang","Lu Qi","Hao Fei","Anran Wang","Zhuochen Wang"],"abstract":"Pixel-wise capabilities are essential for building interactive intelligent systems. 
However, pixel-wise multi-modal LLMs (MLLMs) remain difficult to scale due to complex region-level encoders, specialized segmentation decoders, and incompatible training objectives. To address these challenges, we present SAMTok, a discrete mask tokenizer that converts any region mask into two special tokens and reconstructs the mask using these tokens with high fidelity. By treating masks as new language tokens, SAMTok enables base MLLMs (such as the QwenVL series) to learn pixel-wise capabilities through standard next-token prediction and simple reinforcement learning, without architectural modifications and specialized loss design. SAMTok builds on SAM2 and is trained on 209M diverse masks using a mask encoder and residual vector quantizer to produce discrete, compact, and information-rich tokens. With...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/emotionthinker-prosody-aware-reinforcement-learning-for-explainable-speech-emotion-reasoning","title":"EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning","url":"https://www.microsoft.com/en-us/research/publication/emotionthinker-prosody-aware-reinforcement-learning-for-explainable-speech-emotion-reasoning/","published":"2026-01-21","authors":["Dingdong Wang","Shujie Liu","Tianhua Zhang","Youjun Chen","Jinyu Li","Helen M. Meng"],"abstract":"Emotional information in speech plays a unique role in multimodal perception. 
However, current Speech Large Language Models (SpeechLLMs), similar to conventional speech emotion recognition (SER) systems, still treat emotion understanding as a simple classification problem. This provides limited interpretability of predictions, while leaving the LLMs' expressive and reasoning capabilities underutilized. In this work, we take the first step to reformulate SER as a deep reasoning problem through reinforcement learning (RL). We propose EmotionThinker, which is designed to generate accurate emotion predictions with interpretable explanations grounded in fine-grained acoustic cues. To achieve this, we first construct EmotionCoT-35K, an emotional reasoning dataset with Chain-of-Thought annotations and detailed captions. Second, we observe that current SpeechLLMs exhibit weak prosody perception,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Computer science","Reinforcement learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2601.14750","title":"Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent 
Reasoning","url":"https://huggingface.co/papers/2601.14750","published":"2026-01-21","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Qwen:2601.15621","title":"Qwen3-TTS Technical Report","url":"https://huggingface.co/papers/2601.15621","published":"2026-01-21","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"apple:bl0bbc917rm4gucos6sk4ab2","title":"DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation","url":"https://machinelearning.apple.com/research/diffucoder","published":"2026-01-21","authors":["Shansan Gong","Ruixiang Zhang","Huangjie Zheng","Jiatao Gu","Navdeep Jaitly","Lingpeng Kong","Yizhe Zhang"],"abstract":"Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. 
The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding are still under-explored. To demystify the decoding behavior of dLLMs and unlock their potential for coding,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7126155172","title":"WESE: weak exploration to strong exploitation for LLM agents","url":"https://doi.org/10.1007/s11432-024-4554-x","published":"2026-01-21","authors":["Xu Huang","Weiwen Liu","Xiaolong Chen","Xingmei Wang","Defu Lian","Yasheng Wang","Ruiming Tang","Enhong Chen"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11432-024-4554-x","openalex_id":"https://openalex.org/W7126155172","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (Sweden)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6481000185012817},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.46880000829696655},{"id":"https://openalex.org/C205606062","display_name":"Decoupling (probability)","score":0.3594000041484833},{"id":"https://openalex.org/C113336015","display_name":"Complete 
information","score":0.33160001039505005},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3212999999523163},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3197000026702881},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.2892000079154968},{"id":"https://openalex.org/C51823790","display_name":"Greedy algorithm","score":0.2854999899864197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125508931","title":"GDPVAL: Evaluating AI Model Performance on Real-World Economically Valuable Tasks","url":"https://doi.org/10.70777/si.v2i4.17197","published":"2026-01-21","authors":["Tejal Patwardhan","Rachel Dias","Elizabeth Proehl","Grace Kim","Michele Wang","Olivia Watkins","Simón Posada Fishman","Marwan Aljubeh","Phoebe Thacker","Laurance Fauconnet","Natalie S. Kim","Patrick Chao"],"abstract":"We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable tasks. GDPval covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupations across the top 9 sectors contributing to U.S. GDP (Gross Domestic Product). Tasks are constructed from the representative work of industry professionals with an average of 14 years of experience. We find that frontier model performance on GDPval is improving roughly linearly over time, and that the current best frontier models are approaching industry experts in deliverable quality. We analyze the potential for frontier models, when paired with human oversight, to perform GDPval tasks cheaper and faster than unaided experts. We also demonstrate that increased reasoning effort, increased task context, and increased scaffolding improves model performance on GDPval. 
Finally, we open-source...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.70777/si.v2i4.17197","openalex_id":"https://openalex.org/W7125508931","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C2778571376","display_name":"Frontier","score":0.6617000102996826},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6126999855041504},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5888000130653381},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5774000287055969},{"id":"https://openalex.org/C2777286243","display_name":"Grading (engineering)","score":0.4846000075340271},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.4740000069141388},{"id":"https://openalex.org/C51485801","display_name":"Efficient frontier","score":0.4375999867916107},{"id":"https://openalex.org/C21883318","display_name":"Deliverable","score":0.41830000281333923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2601.14594","title":"LFS: Learnable Frame Selector for Event-Aware and Temporally Diverse Video Captioning","url":"http://arxiv.org/abs/2601.14594","published":"2026-01-21","authors":["Lianying Chao","Linfeng Yin","Peiyu Ren","Yifan Jiang","Qiaoyu Ren","Dingcheng Shan","Jing-Cheng Pang","Sijie Wu","Xubin Li","Kai Zhang"],"abstract":"Video captioning models convert frames into visual tokens and generate descriptions with large language models (LLMs). 
Since encoding all frames is prohibitively expensive, uniform sampling is the default choice, but it enforces equal temporal coverage while ignoring the uneven events distribution. This motivates a Learnable Frame Selector (LFS) that selects temporally diverse and event-relevant frames. LFS explicitly models temporal importance to balance temporal diversity and event relevance, and employs a stratified strategy to ensure temporal coverage while avoiding clustering. Crucially, LFS leverages caption feedback from frozen video-LLMs to learn frame selection that directly optimizes downstream caption quality. Additionally, we identify the gap between existing benchmark and human's cognition. Thus, we introduce ICH-CC built from carefully designed questions by annotators that....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7125460065","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Huawei Technologies (United States)"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9358999729156494},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8549000024795532},{"id":"https://openalex.org/C126042441","display_name":"Frame (networking)","score":0.6288999915122986},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5690000057220459},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.5547000169754028},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5271999835968018},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5103999972343445},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.38659998774528503}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.15282","title":"Rethinking Video Generation Model for the Embodied World","url":"https://huggingface.co/papers/2601.15282","published":"2026-01-21","authors":["Yufan Deng","Zilin Pan","Hongyu Zhang","Xiaojie Li","Ruoqing Hu","Yufei Ding","Yiming Zou","Yan Zeng","Daquan Zhou"],"abstract":"Video generation models have significantly advanced embodied intelligence, unlocking new possibilities for generating diverse robot data that capture perception, reasoning, and action in the physical world. However, synthesizing high-quality videos that accurately reflect real-world robotic interactions remains challenging, and the lack of a standardized benchmark limits fair comparisons and progress. To address this gap, we introduce a comprehensive robotics benchmark, RBench, designed to evaluate robot-oriented video generation across five task domains and four distinct embodiments. It assesses both task-level correctness and visual fidelity through reproducible sub-metrics, including structural consistency, physical plausibility, and action completeness. Evaluation of 25 representative models highlights significant deficiencies in generating physically realistic robot behaviors. 
Furth...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:huawei-noah:2601.13599","title":"Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion","url":"https://huggingface.co/papers/2601.13599","published":"2026-01-20","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","huawei-noah"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"openalex:W7124888346","title":"Holistic evaluation of large language models for medical tasks with MedHELM","url":"https://doi.org/10.1038/s41591-025-04151-2","published":"2026-01-20","authors":["Suhana Bedi","Hejie Cui","Miguel Fuentes","Alyssa Unell","Michael Wornow","Juan M. 
Banda","Nikesh Kotecha","Timothy Keyes","Yifan Mai","Mert Oez","Hao Qiu","Shrey Jain"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41591-025-04151-2","openalex_id":"https://openalex.org/W7124888346","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Cardiovascular Institute of the South","Microsoft (United States)","Stanford Health Care","Stanford Medicine","Stanford University","eHealth Africa"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6859999895095825},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6830999851226807},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.5324000120162964},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47049999237060547},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.45170000195503235},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy (biology)","score":0.3896999955177307},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.37720000743865967},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.3741999864578247}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"arxiv:2601.14440","title":"VisTIRA: Closing the Image-Text Modality Gap in Visual Math Reasoning via Structured Tool Integration","url":"http://arxiv.org/abs/2601.14440","published":"2026-01-20","authors":["Saeed Khaki","Ashudeep Singh","Nima Safaei","Kamal Ginotra"],"abstract":"Vision-language models (VLMs) lag behind text-only language models on mathematical reasoning when the same problems are 
presented as images rather than text. We empirically characterize this as a modality gap: the same question in text form yields markedly higher accuracy than its visually typeset counterpart, due to compounded failures in reading dense formulas, layout, and mixed symbolic-diagrammatic context. First, we introduce VisTIRA (Vision and Tool-Integrated Reasoning Agent), a tool-integrated reasoning framework that enables structured problem solving by iteratively decomposing a given math problem (as an image) into natural language rationales and executable Python steps to determine the final answer. Second, we build a framework to measure and improve visual math reasoning: a LaTeX-based pipeline that converts chain-of-thought math corpora (e.g., NuminaMath) into challenging i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7125459675","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.7893999814987183},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5461999773979187},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.541700005531311},{"id":"https://openalex.org/C160145156","display_name":"Executable","score":0.5235000252723694},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.5131999850273132},{"id":"https://openalex.org/C2778775528","display_name":"Closing (real estate)","score":0.5024999976158142},{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.4607999920845032},{"id":"https://openalex.org/C43521106","display_name":"Pipeline 
(software)","score":0.44269999861717224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.13600","title":"Foundations of Global Consistency Checking with Noisy LLM Oracles","url":"http://arxiv.org/abs/2601.13600","published":"2026-01-20","authors":["Paul He","Elke Kirschbaum","Shiva Kasiviswanathan"],"abstract":"Ensuring that collections of natural-language facts are globally consistent is essential for tasks such as fact-checking, summarization, and knowledge base construction. While Large Language Models (LLMs) can assess the consistency of small subsets of facts, their judgments are noisy, and pairwise checks are insufficient to guarantee global coherence. We formalize this problem and show that verifying global consistency requires exponentially many oracle queries in the worst case. To make the task practical, we propose an adaptive divide-and-conquer algorithm that identifies minimal inconsistent subsets (MUSes) of facts and optionally computes minimal repairs through hitting-sets. Our approach has low-degree polynomial query complexity. 
Experiments with both synthetic and real LLM oracles show that our method efficiently detects and localizes inconsistencies, offering a scalable framework...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7125352741","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.8008999824523926},{"id":"https://openalex.org/C55166926","display_name":"Oracle","score":0.6708999872207642},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6525999903678894},{"id":"https://openalex.org/C37279795","display_name":"Consistency model","score":0.6000000238418579},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5860000252723694},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.5712000131607056},{"id":"https://openalex.org/C42058472","display_name":"Base (topology)","score":0.5306000113487244},{"id":"https://openalex.org/C184898388","display_name":"Pairwise comparison","score":0.5241000056266785}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.14243","title":"Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow","url":"https://huggingface.co/papers/2601.14243","published":"2026-01-20","authors":["Haocheng Xi","Charlie Ruan","Peiyuan Liao","Yujun Lin","Han Cai","Yilong Zhao","Shuo Yang","Kurt Keutzer","Song Han","Ligeng Zhu"],"abstract":"Reinforcement learning (RL) is essential for enhancing the complex reasoning capabilities of large language models (LLMs). 
However, existing RL training pipelines are computationally inefficient and resource-intensive, with the rollout phase accounting for over 70% of total training time. Quantized RL training, particularly using FP8 precision, offers a promising approach to mitigating this bottleneck. A commonly adopted strategy applies FP8 precision during rollout while retaining BF16 precision for training. In this work, we present the first comprehensive study of FP8 RL training and demonstrate that the widely used BF16-training + FP8-rollout strategy suffers from severe training instability and catastrophic accuracy collapse under long-horizon rollouts and challenging tasks. Our analysis shows that these failures stem from the off-policy nature of the approach, which introduces subs...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:Qwen:2601.13384","title":"From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning","url":"https://huggingface.co/papers/2601.13384","published":"2026-01-19","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/Qwen/papers"}},{"id":"arxiv:2504.18776","title":"ThinkFL: Self-Refining Failure Localization for Microservice Systems via Reinforcement Fine-Tuning","url":"http://arxiv.org/abs/2504.18776","published":"2026-01-19","authors":["Lingzhe Zhang","Yunpeng Zhai","Tong Jia","Chiming Duan","Siyu Yu","Jinyang Gao","Bolin Ding","Zhonghai Wu","Ying Li"],"abstract":"As modern microservice systems grow increasingly popular and complex—often consisting of hundreds or even thousands of interdependent components—they are becoming more susceptible to frequent and subtle failures. Ensuring system reliability therefore hinges on accurate and efficient failure localization. Traditional failure localization approaches based on small models lack the flexibility to adapt to diverse failure scenarios, while recent LLM-based methods suffer from two major limitations: they often rely on rigid invocation workflows that constrain the model's ability to dynamically explore effective localization paths, and they require resource-intensive inference, making them cost-prohibitive for real-world deployment. 
To address these challenges, we explore the use of reinforcement fine-tuning to equip lightweight LLMs with reasoning and self-refinement capabilities, significantly...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3789262","openalex_id":"https://openalex.org/W4416888443","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","China National Space Administration","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8198999762535095},{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.6678000092506409},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6288999915122986},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.626800000667572},{"id":"https://openalex.org/C185874996","display_name":"Interdependence","score":0.5849999785423279},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4896000027656555},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.48069998621940613},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.47519999742507935}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7124926458","title":"Make-Your-Anchor+: Temporal Consistent 2D Avatar Generation via Video Diffusion Prior","url":"https://doi.org/10.1109/tvcg.2026.3655478","published":"2026-01-19","authors":["Ziyao Huang","Fan Tang","Juan Cao","Yong Zhang","Xiaodong Cun","Yihang Bo","Jintao Li","Tong-Yee Lee"],"abstract":"Despite the 
remarkable process of talking-head-based avatar-creating solutions, directly generating anchor-style videos with full-body motions remains challenging. In this study, we propose Make-Your-Anchor+, a novel system necessitating only a one-minute video clip of an individual for training, subsequently enabling the automatic generation of anchor-style videos with precise torso and hand movements. Specifically, we finetune a proposed structure-guided diffusion model on input video to render 3D mesh conditions into human appearances. We adopt a two-stage training strategy for the diffusion model, effectively mapping movements with specific appearances to create digital avatars for online streamers, live shopping hosts, and other applications. To produce arbitrary long temporal video, we extract human motion information from video diffusion prior by adapting the frame-wise diffusion....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2026.3655478","openalex_id":"https://openalex.org/W7124926458","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Film Academy","Chinese Academy of Sciences","Great Bay University","Institute of Computing Technology","National Cheng Kung University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8657000064849854},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7562000155448914},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7282999753952026},{"id":"https://openalex.org/C523889960","display_name":"Torso","score":0.552299976348877},{"id":"https://openalex.org/C163294075","display_name":"Noise 
reduction","score":0.49799999594688416},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.4943999946117401},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.4754999876022339},{"id":"https://openalex.org/C2777365542","display_name":"Avatar","score":0.45879998803138733}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7130389813","title":"MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning","url":"https://doi.org/10.1109/iceic69189.2026.11386292","published":"2026-01-18","authors":["Kyeonghun Kim","Hyeonseok Jung","Youngung Han","Junsu Lim","Yeonju Jean","Seongbin Park","Eunseob Choi","Hyunsu Go","Seoyoung Ju","Seohyoung Park","Gyeongmin Kim","Minju Kwon"],"abstract":"Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. While pre-training on natural images is common, it results in a significant domain shift, limiting performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, but prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process 3D scans as a collection of independent 2D slices, an approach that fundamentally discards critical axial coherence and the 3D structural context. To address this limitation, we propose the autoencoder for enhanced self-supervised medical image learning(MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. 
The core innovation is the ‘superpatch,’ a 3D....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iceic69189.2026.11386292","openalex_id":"https://openalex.org/W7130389813","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chung-Ang University","Dankook University","Ewha Womans University Medical Center","Gist (Czechia)","Health Outcomes Solutions (United States)","MPI Research (United States)","Nvidia (United Kingdom)","Nvidia (United States)","Sangmyung University","Seoul National University"],"concepts":[{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.8148000240325928},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7457000017166138},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7125999927520752},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.6248000264167786},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.5770999789237976},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5339999794960022},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.520799994468689},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.5188000202178955}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:ljlol9olf05d9osujcytmbkk","title":"The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining","url":"https://machinelearning.apple.com/research/data-quality-illusion","published":"2026-01-16","authors":["Thiziri Nait Saada§","Louis Bethune","Michal Klein","David Grangier","Marco 
Cuturi","Pierre Ablin"],"abstract":"Large-scale models are pretrained on massive web-crawled datasets containing documents of mixed quality, making data filtering essential. A popular method is Classifier-based Quality Filtering (CQF), which trains a binary classifier to distinguish between pretraining data and a small, high-quality set. It assigns each pretraining document a quality score defined as the classifier's score and retains only the top-scoring ones. We provide an...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:eibqf3wfiwgk7479fdqq7m9b","title":"ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models","url":"https://machinelearning.apple.com/research/pararnn","published":"2026-01-16","authors":["Federico Danieli","Pau Rodriguez","Miguel Sarabia","Xavier Suau","Luca Zappella"],"abstract":"Recurrent Neural Networks (RNNs) laid the foundation for sequence modeling, but their intrinsic sequential nature restricts parallel computation, creating a fundamental barrier to scaling. This has led to the dominance of parallelizable architectures like Transformers and, more recently, State Space Models (SSMs). 
While SSMs achieve efficient parallelization through structured linear recurrences, this linearity constraint limits their expressive...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2601.11432","title":"The unreasonable effectiveness of pattern matching","url":"http://arxiv.org/abs/2601.11432","published":"2026-01-16","authors":["Gary Lupyan","Blaise Agüera y Arcas"],"abstract":"We report on an astonishing ability of large language models (LLMs) to make sense of \"Jabberwocky\" language in which most or all content words have been randomly replaced by nonsense strings, e.g., translating \"He dwushed a ghanc zawk\" to \"He dragged a spare chair\". This result addresses ongoing controversies regarding how to best think of what LLMs are doing: are they a language mimic, a database, a blurry version of the Web? The ability of LLMs to recover meaning from structural patterns speaks to the unreasonable effectiveness of pattern-matching. 
Pattern-matching is not an alternative to \"real\" intelligence, but rather a key ingredient.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7124818067","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2780876879","display_name":"Meaning (existential)","score":0.6858000159263611},{"id":"https://openalex.org/C62923972","display_name":"Nonsense","score":0.5835000276565552},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5746999979019165},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.5008999705314636},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.482699990272522},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.4729999899864197},{"id":"https://openalex.org/C2780292567","display_name":"Original meaning","score":0.4262999892234802},{"id":"https://openalex.org/C194648553","display_name":"Spare part","score":0.4108000099658966}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.11147","title":"Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems","url":"https://huggingface.co/papers/2601.11147","published":"2026-01-16","authors":["Zixu Wang","Bingbing Xu","Yige Yuan","Huawei Shen","Xueqi Cheng"],"abstract":"Multi-Agent Systems (MAS) built on large language models typically solve complex tasks by coordinating multiple agents through workflows. Existing approaches generates workflows either at task level or query level, but their relative costs and benefits remain unclear. 
After rethinking and empirical analyses, we show that query-level workflow generation is not always necessary, since a small set of top-K best task-level workflows together already covers equivalent or even more queries. We further find that exhaustive execution-based task-level evaluation is both extremely token-costly and frequently unreliable. Inspired by the idea of self-evolution and generative reward modeling, we propose a low-cost task-level generation framework SCALE, which means \\textbf{S}elf prediction of the optimizer with few shot \\textbf{CAL}ibration for \\textbf{E}valuation instead of full validation execution....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["agent","multi-agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W7124434768","title":"DyDiT++: Diffusion Transformers With Timestep and Spatial Dynamics for Efficient Visual Generation","url":"https://doi.org/10.1109/tpami.2026.3654201","published":"2026-01-15","authors":["Wangbo Zhao","Yizeng Han","Jiasheng Tang","Kai Wang","Hao Luo","Yibing Song","Gao Huang","Fan Wang","Yang You"],"abstract":"Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the static inference paradigm, which inevitably introduces redundant computation in certain diffusion timesteps and spatial regions. To overcome this inefficiency, we propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions. 
Specifically, we introduce a Timestep-wise Dynamic Width (TDW) approach that adapts model width conditioned on the generation timesteps. In addition, we design a Spatial-wise Dynamic Token (SDT) strategy to avoid redundant computation at unnecessary spatial locations. TDW and SDT can be seamlessly integrated into DiT and significantly accelerate the ge...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2026.3654201","openalex_id":"https://openalex.org/W7124434768","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","National University of Singapore","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.800599992275238},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.7537999749183655},{"id":"https://openalex.org/C3826847","display_name":"FLOPS","score":0.6129999756813049},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.6086999773979187},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5519000291824341},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5123000144958496},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.5016000270843506},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.49869999289512634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.10193","title":"GFM4GA: Graph Foundation Model for Group Anomaly Detection","url":"http://arxiv.org/abs/2601.10193","published":"2026-01-15","authors":["Jiujiu Chen","Weijun Zeng","Shaofeng Hu","Shipeng Xie","Hui 
Xiong"],"abstract":"Group anomaly detection is crucial in many network applications, but faces challenges due to diverse anomaly patterns. Motivated by the success of large language models (LLMs) in natural language processing, graph foundation models (GFMs) is proposed to handle few-shot learning task with fewer labeling efforts. GFMs have been successfully applied to detection of individual anomalies but cannot be generalized to group anomalies, as group anomaly patterns must be detected as a whole and individuals in an abnormal group can look rather normal. Therefore, we propose GFM4GA, a novel graph foundation model for group anomaly detection. The pipeline is pretrained via dual-level contrastive learning based on feature-based estimation and group extraction, to capture potential group anomaly structure and feature inconsistencies. In the downstream tasks, the pipeline is finetuned in parameter-constr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7124513622","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["HKUST Shenzhen Research Institute","Hong Kong University of Science and Technology","Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.7063000202178955},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6837000250816345},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6524999737739563},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5778999924659729},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5625},{"id":"https://openalex.org/C2781311116","display_name":"Group (periodic 
table)","score":0.5353000164031982},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4652000069618225},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4277999997138977}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.10323","title":"ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding","url":"https://huggingface.co/papers/2601.10323","published":"2026-01-15","authors":["Xueyun Tian","Wei Li","Bingbing Xu","Heng Dong","Yuanzhuo Wang","Huawei Shen"],"abstract":"Recent Omni-multimodal Large Language Models show promise in unified audio, vision, and text modeling. However, streaming audio-video understanding remains challenging, as existing approaches suffer from disjointed capabilities: they typically exhibit incomplete modality support or lack autonomous proactive monitoring. To address this, we present ROMA, a real-time omni-multimodal assistant for unified reactive and proactive interaction. ROMA processes continuous inputs as synchronized multimodal units, aligning dense audio with discrete video frames to handle granularity mismatches. For online decision-making, we introduce a lightweight speak head that decouples response initiation from generation to ensure precise triggering without task conflict. 
We train ROMA with a curated streaming dataset and a two-stage curriculum that progressively optimizes for streaming format adaptation and pr...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:stepfun-ai:2601.09668","title":"STEP3-VL-10B Technical Report","url":"https://huggingface.co/papers/2601.09668","published":"2026-01-14","authors":["StepFun"],"abstract":"We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language synergy; and second, a scaled post-training pipeline featuring over 1k iterations of reinforcement learning. Crucially, we implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning that explores and synthesizes diverse visual hypotheses. 
Consequently, despite its compact 10B footprint, STEP3-VL-10B rivals or surpasses models 10times-20times larger (e.g., GLM-4.6V-106B, Qwen3-VL-235B) and top-tier...","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","stepfun-ai","efficient"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"official:802fa6a14bdbb4e5","title":"FunctionGemma Model Card","url":"https://ai.google.dev/gemma/docs/functiongemma/model_card","published":"2026-01-14","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","FunctionGemma"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"arxiv:2601.14287","title":"Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents","url":"https://huggingface.co/papers/2601.14287","published":"2026-01-14","authors":["Xiucheng Xu","Bingbing Xu","Xueyun Tian","Zihe Huang","Rongxin Chen","Yunfan Li","Huawei Shen"],"abstract":"External memory systems are pivotal for enabling Large Language Model (LLM) agents to maintain persistent knowledge and perform long-horizon decision-making. 
Existing paradigms typically follow a two-stage process: computationally expensive memory construction (e.g., structuring data into graphs) followed by naive retrieval-augmented generation. However, our empirical analysis reveals two fundamental limitations: complex construction incurs high costs with marginal performance gains, and simple context concatenation fails to bridge the gap between retrieval recall and reasoning accuracy. To address these challenges, we propose CoM (Chain-of-Memory), a novel framework that advocates for a paradigm shift toward lightweight construction paired with sophisticated utilization. CoM introduces a Chain-of-Memory mechanism that organizes retrieved fragments into coherent inference paths through d...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":43,"matched_keywords":["LLM","language model","memory","retrieval"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2601.09258","title":"LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference","url":"http://arxiv.org/abs/2601.09258","published":"2026-01-14","authors":["Du Yin","Jiayi Ren","Xiayu Sun","Tianyao Zhou","Haizhu Zhou","Ruiyan Ma","Danyang Zhang"],"abstract":"LLM inference latency critically determines user experience and operational costs, directly impacting throughput under SLO constraints. Even brief latency spikes degrade service quality despite acceptable average performance. However, distributed inference environments featuring diverse software frameworks and XPU architectures combined with dynamic workloads make latency analysis challenging. 
Constrained by intrusive designs that necessitate service restarts or even suspension, and by hardware-bound implementations that fail to adapt to heterogeneous inference environments, existing AI profiling methods are often inadequate for real-time production analysis. We present LatencyPrism, the first zero-intrusion multi-platform latency sculpting system. It aims to break down the inference latency across pipeline, proactively alert on inference latency anomalies, and guarantee adherence to SLO...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7124358109","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Cloud Computing Center"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.8797000050544739},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8508999943733215},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.7950000166893005},{"id":"https://openalex.org/C26713055","display_name":"Implementation","score":0.513700008392334},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.4977000057697296},{"id":"https://openalex.org/C5119721","display_name":"Quality of service","score":0.46939998865127563},{"id":"https://openalex.org/C187191949","display_name":"Profiling (computer programming)","score":0.400299996137619},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.361299991607666}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.09088","title":"Distribution-Aligned Sequence Distillation for Superior Long-CoT 
Reasoning","url":"http://arxiv.org/abs/2601.09088","published":"2026-01-14","authors":["Shaotian Yan","Kaiyuan Liu","Chen Shen","Bing Wang","Sinan Fan","jun zhang","Yue Wu","Zheng Wang","Jieping Ye"],"abstract":"In this report, we introduce DASD-4B-Thinking, a lightweight yet highly capable, fully open-source reasoning model. It achieves SOTA performance among open-source models of comparable scale across challenging benchmarks in mathematics, scientific reasoning, and code generation -- even outperforming several larger models. We begin by critically reexamining a widely adopted distillation paradigm in the community: SFT on teacher-generated responses, also known as sequence-level distillation. Although a series of recent works following this scheme have demonstrated remarkable efficiency and strong empirical performance, they are primarily grounded in the SFT perspective. Consequently, these approaches focus predominantly on designing heuristic rules for SFT data filtering, while largely overlooking the core principle of distillation itself -- enabling the student model to learn the teacher's...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7124358350","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7214999794960022},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.7138000130653381},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6700999736785889},{"id":"https://openalex.org/C2776359362","display_name":"Representation 
(politics)","score":0.6118999719619751},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.59579998254776},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5110999941825867},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5060999989509583},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.47049999237060547}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gi-bench-a-panoramic-benchmark-revealing-the-knowledge-experience-dissociation-of-multimodal-large-language-models-in-gastrointestinal-endoscopy-against-clinical-standards","title":"GI-Bench: A Panoramic Benchmark Revealing the Knowledge-Experience Dissociation of Multimodal Large Language Models in Gastrointestinal Endoscopy Against Clinical Standards","url":"https://www.microsoft.com/en-us/research/publication/gi-bench-a-panoramic-benchmark-revealing-the-knowledge-experience-dissociation-of-multimodal-large-language-models-in-gastrointestinal-endoscopy-against-clinical-standards/","published":"2026-01-13","authors":["Yan Zhu","Tengfei Luo","Pei-yao Fu","Zhen Zhang","Zilong Wang","Yi-Fan Qu","Zifan Geng","Jia-qi Xu","L. Yao","Li-yun Ma","Wei Su","Wei-Feng Chen"],"abstract":"Multimodal Large Language Models (MLLMs) show promise in gastroenterology, yet their performance against comprehensive clinical workflows and human benchmarks remains unverified. To systematically evaluate state-of-the-art MLLMs across a panoramic gastrointestinal endoscopy workflow and determine their clinical utility compared with human endoscopists. We constructed GI-Bench, a benchmark encompassing 20 fine-grained lesion categories. 
Twelve MLLMs were evaluated across a five-stage clinical workflow: anatomical localization, lesion identification, diagnosis, findings description, and management. Model performance was benchmarked against three junior endoscopists and three residency trainees using Macro-F1, mean Intersection-over-Union (mIoU), and multi-dimensional Likert scale. Gemini-3-Pro achieved state-of-the-art performance. In diagnostic reasoning, top-tier models (Macro-F1 0.641)....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Computer vision","Medical, health and genomics","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:a5d9462db04eb8d9","title":"Veo 3 Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Veo-3-Model-Card.pdf","published":"2026-01-13","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Veo 3"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:3adf2ce37d0b6e90","title":"Conditional Memory via Scalable Lookup: A New Axis of 
Sparsity for Large Language Models","url":"https://github.com/deepseek-ai/Engram/blob/main/Engram_paper.pdf","published":"2026-01-12","authors":["Xin Cheng","Wangding Zeng","Damai Dai","Qinyu Chen","Bingxuan Wang","Zhenda Xie","Kezhao Huang","Xingkai Yu","Zhewen Hao","Yukun Li","Han Zhang","Huishuai Zhang"],"abstract":"The paper introduces conditional memory as a complementary sparsity axis for Large Language Models, instantiated via Engram, an N-gram lookup module with O(1) retrieval. It reports iso-parameter and iso-FLOPs improvements over MoE baselines across knowledge, reasoning, code, math, and long-context retrieval tasks.","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","deepseek-ai","memory","retrieval"],"author_affiliations":["Peking University","DeepSeek-AI","DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W7122726170","title":"Transforming wearable data into personal health insights using large language model agents","url":"https://doi.org/10.1038/s41467-025-67922-y","published":"2026-01-12","authors":["Mike A. Merrill","Akshay Paruchuri","Naghmeh Rezaei","Geza Kovacs","Javier Perez","Yun Liu","Erik Schenck","Nova Hammerquist","Jake Sunshine","Shyam Tailor","Kumar Ayush","Hao-Wei Su"],"abstract":"Deriving personalized insights from popular wearable trackers requires complex numerical reasoning that challenges standard LLMs, necessitating tool-based approaches like code generation. Large language model (LLM) agents present a promising yet largely untapped solution for this analysis at scale. 
We introduce the Personal Health Insights Agent (PHIA), a system leveraging multistep reasoning with code generation and information retrieval to analyze and interpret behavioral health data. To test its capabilities, we create and share two benchmark datasets with over 4000 health insights questions. A 650-hour human expert evaluation shows that PHIA significantly outperforms a strong code generation baseline, achieving 84% accuracy on objective, numerical questions and, for open-ended ones, earning 83% favorable ratings while being twice as likely to achieve the highest quality rating. This....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-025-67922-y","openalex_id":"https://openalex.org/W7122726170","cited_by_count":5,"quality_score":62,"matched_keywords":["LLM","language model","personalized","retrieval","agent"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7728000283241272},{"id":"https://openalex.org/C150594956","display_name":"Wearable computer","score":0.6338000297546387},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6319000124931335},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6315000057220459},{"id":"https://openalex.org/C57501372","display_name":"BitTorrent tracker","score":0.5871999859809875},{"id":"https://openalex.org/C54290928","display_name":"Wearable technology","score":0.5292999744415283},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.492000013589859},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.49079999327659607}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"hf-org-paper:Tencent-Hunyuan:2601.08881","title":"TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts","url":"https://huggingface.co/papers/2601.08881","published":"2026-01-12","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:Qwen:2601.07526","title":"MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era","url":"https://huggingface.co/papers/2601.07526","published":"2026-01-12","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"openalex:W7123342818","title":"AugMMRev: An LLM-Augmented Multimodal Ranking Model for Personalized Image Material 
Retrieval","url":"https://doi.org/10.1109/tce.2026.3652186","published":"2026-01-12","authors":["Yu Li","Jiaxuan He","Xiaoxiao Chen","Zhongyu Wang","Z. B. Chen","Niu Qi","Zulong Chen","Wenjian Xu","Yuyu Yin"],"abstract":"Consumer electronics devices—including smartphones and smart cameras—generate massive volumes of image data. Image retrieval serves as a critical enabling technology for diverse image-centric applications in consumer electronic applications (like AI-powered photo retrieval in smartphone, efficient media asset and creative template retrieval in smart camera, streaming recommendation systems for smart TVs). Nevertheless, text-query-based image retrieval encounters unique challenges within consumer electronics environments. First, in consumer electronics applications, text-to-image retrieving queries are typically concise, frequently leading to ambiguity in intent. Second,numerous images lack textual descriptions and metadata tags, while others contain inaccuracies or unreliable annotations—particularly in consumer-generated content where such semantic gaps critically undermine retrieval ac...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tce.2026.3652186","openalex_id":"https://openalex.org/W7123342818","cited_by_count":3,"quality_score":60,"matched_keywords":["LLM","personalized","retrieval","media","efficient"],"author_affiliations":["Alibaba Group (China)","Hangzhou Dianzi University","Zhejiang Hospital","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.821399986743927},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.7774999737739563},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.7123000025749207},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.5722000002861023},{"id":"https://openalex.org/C75165309","display_name":"Search engine indexing","score":0.4668000042438507},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.4440999925136566},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.43950000405311584},{"id":"https://openalex.org/C189391414","display_name":"Visual Word","score":0.41940000653266907}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"apple:wpiez3tg4gekds2vfp0our0t","title":"Over-Searching in Search-Augmented Large Language Models","url":"https://machinelearning.apple.com/research/search-augmented","published":"2026-01-12","authors":["Roy Xie","Deepak Gopinath","David Qiu","Dong Lin","Haitian Sun","Saloni Potdar","Bhuwan Dhingra"],"abstract":"Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval.However, they often over-search – unnecessarily invoking search tool even when it does not improve response quality,which leads to computational inefficiency and hallucinations by incorporating irrelevant context. 
In this work, we conduct asystematic evaluation of over-searching across multiple dimensions, including query types,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:c47ym9f297b1tk98wyx17ub0","title":"DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search","url":"https://machinelearning.apple.com/research/r1","published":"2026-01-12","authors":["Kartik Narayan","Yang Xu","Tian Cao","Kavya Nerella","Vishal M. Patel","Navid Shiee","Peter Grasch","Chao Jia","Yinfei Yang","Zhe Gan"],"abstract":"Multimodal Large Language Models (MLLMs) in real-world applications require access to external knowledge sources and must remain responsive to the dynamic and ever-changing real-world information in order to address information-seeking and knowledge-intensive user queries. 
Existing approaches, such as retrieval augmented generation (RAG) methods, search agents, and search equipped MLLMs, often suffer from rigid pipelines, excessive search calls,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:yzu8reegjbpyhk16fydsgjq9","title":"MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE","url":"https://machinelearning.apple.com/research/roe","published":"2026-01-12","authors":["Soheil Zibakhsh","Mohammad Samragh","Kumari Nishu","Lauren Hannah","Arnav Kundu","Minsik Cho"],"abstract":"The generation quality of large language models (LLMs) is often improved by utilizing inference-time sequence-level scaling methods (e.g., Chain-of-Thought). We introduce hyper-parallel scaling, a complementary framework that improves prediction quality at the token level. Hyper-parallel scaling computes and aggregates multiple output proposals for a single token from the model. 
We implement this concept in Mixture-of-Experts (MoE) models, which...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7123347033","title":"Designing AI-programmable therapeutics with the EDEN family of foundation models","url":"https://doi.org/10.64898/2026.01.12.699009","published":"2026-01-12","authors":["Geraldene Munsamy","Gavin Ayres","Carla Greco","Keith Kam","Gus Minto-Cowcher","John St John","Tanggis Bohnuud","Matthew H. Bakalar","William Chow","Robert Pecoraro","Marcelo D.T. Torres","Aaron Kollasch"],"abstract":"Abstract The ability to interpret, modify, and design DNA has driven many of the most significant advances in modern medicine, from diagnostics, biologics, and vaccines to cell and gene therapies. However, the inherent complexity of biological systems means that most modern medicines are still engineered using bespoke, labor-intensive processes. To address the need for a generalisable and programmable approach to therapeutic design, we introduce the EDEN (environmentally-derived evolutionary network) family of metagenomic foundation models, including a 28 billion parameter model trained on 9.7 trillion nucleotide tokens from BaseData 1 . 
This dataset, at the time of training, contained more than 10 billion novel genes from over 1 million new species, and is intentionally enriched for environmental and host-associated metagenomes, phage sequences, and mobile genetic elements, enabling the...","companies":["Microsoft","NVIDIA"],"matched_orgs":["Microsoft","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2026.01.12.699009","openalex_id":"https://openalex.org/W7123347033","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["Centre for Genomic Regulation","Johns Hopkins University","Johns Hopkins University Applied Physics Laboratory","Microsoft (Finland)","Microsoft (United States)","Microsoft Research (India)","Nvidia (United Kingdom)","Nvidia (United States)","Oxford University Press (United Kingdom)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C2778738651","display_name":"Novelty","score":0.5303000211715698},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.5216000080108643},{"id":"https://openalex.org/C191908910","display_name":"Synthetic biology","score":0.44440001249313354},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.4230000078678131},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.41659998893737793},{"id":"https://openalex.org/C15151743","display_name":"Metagenomics","score":0.4083000123500824},{"id":"https://openalex.org/C144501496","display_name":"Genome editing","score":0.3578999936580658},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.3571000099182129}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.07349","title":"Reward Modeling from Natural 
Language Human Feedback","url":"http://arxiv.org/abs/2601.07349","published":"2026-01-12","authors":["Zongqi Wang","Rui xue Wang","Yuchuan Wu","Yiyao Yu","Pinyi Zhang","Shaoning Sun","Yujiu Yang","Yongbin Li"],"abstract":"Reinforcement Learning with Verifiable reward (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). Typically in pairwise rewarding tasks, GRMs generate reasoning chains ending with critiques and preference labels, and RLVR then relies on the correctness of the preference labels as the training reward. However, in this paper, we demonstrate that such binary classification tasks make GRMs susceptible to guessing correct outcomes without sound critiques. Consequently, these spurious successes introduce substantial noise into the reward signal, thereby impairing the effectiveness of reinforcement learning. To address this issue, we propose Reward Modeling from Natural Language Human Feedback (RM-NLHF), which leverages natural language feedback to obtain process reward signals, thereby mitigating the problem of limited solution space inher...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7124117416","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7091000080108643},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5920000076293945},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5306000113487244},{"id":"https://openalex.org/C97256817","display_name":"Spurious 
relationship","score":0.5135999917984009},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.512499988079071},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5027999877929688},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.44510000944137573},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.44040000438690186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.07577","title":"Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents","url":"https://huggingface.co/papers/2601.07577","published":"2026-01-12","authors":["Yunfan Li","Bingbing Xu","Xueyun Tian","Xiucheng Xu","Huawei Shen"],"abstract":"Recent advances in large language models (LLMs) have enabled agents to autonomously execute complex, long-horizon tasks, yet planning remains a primary bottleneck for reliable task execution. Existing methods typically fall into two paradigms: step-wise planning, which is reactive but often short-sighted; and one-shot planning, which generates a complete plan upfront yet is brittle to execution errors. Crucially, both paradigms suffer from entangled contexts, where the agent must reason over a monolithic history spanning multiple sub-tasks. This entanglement increases cognitive load and lets local errors propagate across otherwise independent decisions, making recovery computationally expensive. To address this, we propose Task-Decoupled Planning (TDP), a training-free framework that replaces entangled reasoning with task decoupling. 
TDP decomposes tasks into a directed acyclic graph (DA...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"apple:wypyac2i7hi83ayveysx0vdf","title":"MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer","url":"https://machinelearning.apple.com/research/manzano","published":"2026-01-11","authors":["Yanghao Li","Rui Qian","Bowen Pan","Haotian Zhang","Haoshuo Huang","Bowen Zhang","Jialing Tong","Haoxuan You","Xianzhi Du","Zhe Gan","Hyunjik Kim","Chao Jia"],"abstract":"Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. 
A single shared vision encoder feeds two...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7120071429","title":"FedSNA: a federated learning neuro-spiking and algae-optimized agentic AI framework for real-time fraud detection in cloud-based financial services","url":"https://doi.org/10.1007/s41870-025-03087-7","published":"2026-01-10","authors":["Rajesh Sura","Mohan Kumar Meesala","Prayas Lohalekar","Nitya Sri Nellore","Nitin Mukhi","Mahesh Kumar Goyal"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s41870-025-03087-7","openalex_id":"https://openalex.org/W7120071429","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Anna University, Chennai","Binghamton University","Institute of Electrical and Electronics Engineers","Princeton University","University of the Cumberlands","Williams (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8664000034332275},{"id":"https://openalex.org/C2992525071","display_name":"Federated learning","score":0.6438000202178955},{"id":"https://openalex.org/C64869954","display_name":"False positive paradox","score":0.6290000081062317},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.47209998965263367},{"id":"https://openalex.org/C75949130","display_name":"Database 
transaction","score":0.4602999985218048},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45559999346733093},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.45399999618530273},{"id":"https://openalex.org/C2985140798","display_name":"Financial fraud","score":0.4368000030517578}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:stepfun-ai:2601.05593","title":"PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning","url":"https://huggingface.co/papers/2601.05593","published":"2026-01-09","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"hf-org-paper:zai-org:2601.06021","title":"Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards","url":"https://huggingface.co/papers/2601.06021","published":"2026-01-09","authors":["Z.ai/Zhipu"],"abstract":"","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","zai-org"],"author_affiliations":["Z.ai/Zhipu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/zai-org/papers"}},{"id":"apple:w0x4fa1nzufql6uarf4hy2up","title":"Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge","url":"https://machinelearning.apple.com/research/hierarchical-memories","published":"2026-01-09","authors":["Hadi Pouransari","David Grangier","C Thomas","Michael Kirchhof","Oncel Tuzel"],"abstract":"The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a fraction is used per prompt, and impractical for edge devices with limited inference-time memory and compute. We address this shortcoming by a memory-augmented architecture and a pretraining strategy aligned with...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["memory"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:chv8npw381g9eqfv0ivzreu7","title":"AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents","url":"https://machinelearning.apple.com/research/agentbuilder","published":"2026-01-09","authors":["Jenny T. Liang","Titus Barik","Jeffrey Nichols","Eldon Schoop","Ruijia Cheng"],"abstract":"Interface agents powered by generative AI models (referred to as \"agents\") can automate actions based on user commands. 
An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to provide scaffolds for a broader set of individuals beyond AI engineers to prototype agent experiences, since they can contribute valuable perspectives to designing agent experiences. In this work, we explore the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:b92zni3xrhzo361eaq28qr7x","title":"Which Evaluation for Which Model? A Taxonomy for Speech Model Assessment","url":"https://machinelearning.apple.com/research/taxonomy-for-speech","published":"2026-01-09","authors":["Maureen de Seyssel","Eeshan Gunesh Dhekane"],"abstract":"Speech foundation models have recently achieved remarkable capabilities across a wide range of tasks. However, their evaluation remains disjointed across tasks and model types. Different models excel at distinct aspects of speech processing and thus require different evaluation protocols. This paper proposes a unified taxonomy that addresses the question: Which evaluation is appropriate for which model? 
The taxonomy defines three orthogonal axes:...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:y4yz2p945uwsupb5rawj6o8l","title":"AdaBoN: Adaptive Best-of-N Alignment","url":"https://machinelearning.apple.com/research/best-of-n","published":"2026-01-09","authors":["Vinod Raman","Hilal Asi","Satyen Kale"],"abstract":"Recent advances in test-time alignment methods, such as Best-of-N sampling, offer a simple and effective way to steer language models (LMs) toward preferred behaviors using reward models (RM). However, these approaches can be computationally expensive, especially when applied uniformly across prompts without accounting for differences in alignment difficulty. 
In this work, we propose a prompt-adaptive strategy for Best-of-N alignment that...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"hf-org-paper:Qwen:2601.04720","title":"Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking","url":"https://huggingface.co/papers/2601.04720","published":"2026-01-08","authors":["Alibaba/Qwen"],"abstract":"In this report, we introduce the Qwen3-VL-Embedding and Qwen3-VL-Reranker model series, the latest extensions of the Qwen family built on the Qwen3-VL foundation model. Together, they provide an end-to-end pipeline for high-precision multimodal search by mapping diverse modalities, including text, images, document images, and video, into a unified representation space. The Qwen3-VL-Embedding model employs a multi-stage training paradigm, progressing from large-scale contrastive pre-training to reranking model distillation, to generate semantically rich high-dimensional vectors. It supports Matryoshka Representation Learning, enabling flexible embedding dimensions, and handles inputs up to 32k tokens. Complementing this, Qwen3-VL-Reranker performs fine-grained relevance estimation for query-document pairs using a cross-encoder architecture with cross-attention mechanisms. 
Both model serie...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","Qwen","retrieval","distillation"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:tencent:2601.04767","title":"AT^2PO: Agentic Turn-based Policy Optimization via Tree Search","url":"https://huggingface.co/papers/2601.04767","published":"2026-01-08","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2601.05214","title":"Internal Representations as Indicators of Hallucinations in Agent Tool Selection","url":"http://arxiv.org/abs/2601.05214","published":"2026-01-08","authors":["Kait Healy","Bharathi Srinivasan","Visakh Madathil","Jing Wu"],"abstract":"Large Language Models (LLMs) have shown remarkable capabilities in tool calling and tool usage, but suffer from hallucinations where they choose incorrect tools, provide malformed parameters and exhibit 'tool bypass' behavior by performing simulations and generating outputs instead of invoking specialized tools or external systems. 
This undermines the reliability of LLM based agents in production systems as it leads to inconsistent results, and bypasses security and audit controls. Such hallucinations in agent tool selection require early detection and error handling. Unlike existing hallucination detection methods that require multiple forward passes or external validation, we present a computationally efficient framework that detects tool-calling hallucinations in real-time by leveraging LLMs' internal representations during the same forward pass used for generation. We evaluate this a...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7120272621","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","efficient","agent"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7124000191688538},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6450999975204468},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6445000171661377},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5626000165939331},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.534500002861023},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.49619999527931213},{"id":"https://openalex.org/C2779458634","display_name":"Debiasing","score":0.4293999969959259},{"id":"https://openalex.org/C28427503","display_name":"Internal model","score":0.36899998784065247}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.04696","title":"A Method for Constructing 
a Digital Transformation Driving Mechanism Based on Semantic Understanding of Large Models","url":"http://arxiv.org/abs/2601.04696","published":"2026-01-08","authors":["Huayi Liu"],"abstract":"In the process of digital transformation, enterprises are faced with problems such as insufficient semantic understanding of unstructured data and lack of intelligent decision-making basis in driving mechanisms. This study proposes a method that combines a large language model (LLM) and a knowledge graph. First, a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) model is used to perform entity recognition and relationship extraction on multi-source heterogeneous texts, and GPT-4 is used to generate semantically enhanced vector representations; secondly, a two-layer graph neural network (GNN) architecture is designed to fuse the semantic vectors output by LLM with business metadata to construct a dynamic and scalable enterprise knowledge graph; then reinforcement learning is introduced to optimize decision path generation, and the reward function is used to drive....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7120272628","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7965999841690063},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.572700023651123},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.4309999942779541},{"id":"https://openalex.org/C141353440","display_name":"Fuse 
(electrical)","score":0.42399999499320984},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.41679999232292175},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.39590001106262207},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.39570000767707825},{"id":"https://openalex.org/C126082660","display_name":"Digital transformation","score":0.3910999894142151}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.05175","title":"VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice","url":"https://huggingface.co/papers/2601.05175","published":"2026-01-08","authors":["Shuming Liu","Mingchen Zhuge","Changsheng Zhao","Jun Chen","Lemeng Wu","Zechun Liu","Chenchen Zhu","Zhipeng Cai","Chong Zhou","Haozhe Liu","Ernie Chang","Saksham Suri"],"abstract":"Chain-of-thought (CoT) reasoning has emerged as a powerful tool for multimodal large language models on video understanding tasks. However, its necessity and advantages over direct answering remain underexplored. In this paper, we first demonstrate that for RL-trained video models, direct answering often matches or even surpasses CoT performance, despite CoT producing step-by-step analyses at a higher computational cost. Motivated by this, we propose VideoAuto-R1, a video understanding framework that adopts a reason-when-necessary strategy. During training, our approach follows a Thinking Once, Answering Twice paradigm: the model first generates an initial answer, then performs reasoning, and finally outputs a reviewed answer. Both answers are supervised via verifiable rewards. 
During inference, the model uses the confidence score of the initial answer to determine whether to proceed wit...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2601.04992","title":"Learning from Mistakes: Negative Reasoning Samples Enhance Out-of-Domain Generalization","url":"https://huggingface.co/papers/2601.04992","published":"2026-01-08","authors":["Xueyun Tian","Minghua Ma","Bingbing Xu","Nuoyan Lyu","Wei Li","Heng Dong","Zheng Chu","Yuanzhuo Wang","Huawei Shen"],"abstract":"Supervised fine-tuning (SFT) on chain-of-thought (CoT) trajectories demonstrations is a common approach for enabling reasoning in large language models. Standard practices typically only retain trajectories with correct final answers (positives) while ignoring the rest (negatives). We argue that this paradigm discards substantial supervision and exacerbates overfitting, limiting out-of-domain (OOD) generalization. Specifically, we surprisingly find that incorporating negative trajectories into SFT yields substantial OOD generalization gains over positive-only training, as these trajectories often retain valid intermediate reasoning despite incorrect final answers. 
To understand this effect in depth, we systematically analyze data, training dynamics, and inference behavior, identifying 22 recurring patterns in negative chains that serve a dual role: they moderate loss descent to mitigate....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:tencent:2601.04544","title":"TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration","url":"https://huggingface.co/papers/2601.04544","published":"2026-01-07","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","tencent","agent","multi-agent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2503.17523","title":"Bayesian teaching enables probabilistic reasoning in large language models","url":"http://arxiv.org/abs/2503.17523","published":"2026-01-07","authors":["Linlu Qiu","Fei Sha","Kelsey R. Allen","Yoon Kim","Tal Linzen","Sjoerd van Steenkiste"],"abstract":"Large language models (LLMs) are increasingly used as agents that interact with users and with the world. 
To do so successfully, LLMs must construct representations of the world and form probabilistic beliefs about them. To provide personalized recommendations, for example, the LLM needs to infer a user's preferences from their behavior over multiple interactions. The Bayesian inference framework lays out the optimal way for an agent to update its beliefs as it receives new information. We first show that LLMs fall far short of the standard defined by the Bayesian framework. We then show that by teaching LLMs to mimic the predictions of the normative Bayesian model, we can dramatically improve their ability to update their beliefs; this ability generalizes to new tasks. We conclude that LLMs can effectively learn reasoning skills from examples and generalize those skills to new domains.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-025-67998-6","openalex_id":"https://openalex.org/W7118304105","cited_by_count":5,"quality_score":54,"matched_keywords":["LLM","personalized","agent"],"author_affiliations":["Google (United States)","Massachusetts Institute of Technology","Menlo School","University of British Columbia","Vector Institute"],"concepts":[{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.6916999816894531},{"id":"https://openalex.org/C44725695","display_name":"Normative","score":0.6880000233650208},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.6626999974250793},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6607999801635742},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6248999834060669},{"id":"https://openalex.org/C160234255","display_name":"Bayesian 
inference","score":0.5455999970436096},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5418000221252441},{"id":"https://openalex.org/C107673813","display_name":"Bayesian probability","score":0.5304999947547913}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W7128696696","title":"Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning","url":"https://doi.org/10.48448/e77r-p351","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Xueqi Cheng","Maarten de Rijke","Jiafeng Guo","Yu-An Liu","Lixin Su","Shuaiqiang Wang","Wenda Wei","Dawei Yin","Ruqing Zhang"],"abstract":"Retrieval-augmented generation (RAG) has proven effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. Most approaches rely on outcome-based supervision, offering no explicit guidance for intermediate steps. This often leads to reward hacking and degraded response quality. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions. To assess the information completeness of each step, we introduce a bidirectional information distance grounded in Kolmogorov complexity, approximated via language model generation probabilities. 
This quantification measures both how far the current reason...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/e77r-p351","openalex_id":"https://openalex.org/W7128696696","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","retrieval","efficient"],"author_affiliations":["Baidu (China)","Carl Albert State College","Institute of Computing Technology","University of Amsterdam"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7024000287055969},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5846999883651733},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5271999835968018},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.45320001244544983},{"id":"https://openalex.org/C159032336","display_name":"Non-monotonic logic","score":0.4339999854564667},{"id":"https://openalex.org/C86827895","display_name":"Opportunistic reasoning","score":0.37619999051094055},{"id":"https://openalex.org/C17231256","display_name":"Completeness (order theory)","score":0.3522000014781952},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.35199999809265137}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7118515839","title":"HiViTrack: Hierarchical vision transformer with efficient target-prompt update for visual object tracking","url":"https://doi.org/10.1016/j.patcog.2025.112992","published":"2026-01-07","authors":["Yang Fang","Yujie Hu","Bailian Xie","Yujie Wang","Zongyi Xu","Weisheng Li","Xinbo 
Gao"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.112992","openalex_id":"https://openalex.org/W7118515839","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Chongqing University of Posts and Telecommunications","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7680000066757202},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6608999967575073},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6057999730110168},{"id":"https://openalex.org/C57501372","display_name":"BitTorrent tracker","score":0.5968000292778015},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5060999989509583},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.44699999690055847},{"id":"https://openalex.org/C56461940","display_name":"Eye tracking","score":0.4180999994277954},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.4059999883174896}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7128703499","title":"Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model","url":"https://doi.org/10.48448/vdev-dr35","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Shulin Huang","Junshu Pan","Wei Shen","Y. 
Zhang","Qiji Zhou"],"abstract":"Direct Preference Optimization (DPO) simplifies reinforcement learning from human feedback (RLHF) for large language models (LLMs) by directly training on offline preference data to align with human preferences. During DPO training, the reference model serves as a data weight adjuster. However, the common practice of initializing the policy and reference models identically in DPO can lead to inefficient data utilization and impose a performance ceiling. Meanwhile, the absence of a reference model in Simple Preference Optimization (SimPO) reduces training robustness and requires stricter conditions to prevent catastrophic forgetting. In this work, we propose Pre-DPO, a simple yet effective DPO-based training paradigm that improves preference optimization by introducing a guiding reference model. This reference model provides foresight into the desired policy state achievable through the t...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/vdev-dr35","openalex_id":"https://openalex.org/W7128703499","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Baidu (China)","Westlake University"],"concepts":[{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6977999806404114},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6779999732971191},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.6216999888420105},{"id":"https://openalex.org/C114466953","display_name":"Initialization","score":0.573199987411499},{"id":"https://openalex.org/C150189527","display_name":"Reference model","score":0.5497999787330627},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4406000077724457},{"id":"https://openalex.org/C141513077","display_name":"Independent and identically distributed random variables","score":0.4068000018596649},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.37310001254081726}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128744008","title":"Enhancing Conversational Recommender Systems with Tree-Structured Knowledge and Pretrained Language Models","url":"https://doi.org/10.48448/9mgh-r642","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Peng Du","Chuan Qin","Yongwen Ren","Dazhong Shen","Chao Wang","Hui Xiong"],"abstract":"Recent advances in pretrained language models (PLMs) have significantly improved conversational recommender systems (CRS), enabling more fluent and context-aware interactions. To further enhance accuracy and mitigate hallucination, many methods integrate PLMs with knowledge graphs (KGs), but face key challenges: failing to fully exploit PLM reasoning over graph relationships, indiscriminately incorporating retrieved knowledge without context filtering, and neglecting collaborative preferences in multi-turn dialogues. To this end, we propose PCRS-TKA, a prompt-based framework employing retrieval-augmented generation to integrate PLMs with KGs. PCRS-TKA constructs dialogue-specific knowledge trees from KGs and serializes them into texts, enabling structure-aware reasoning while capturing rich entity semantics. 
Our approach selectively filters context-relevant knowledge and explicitly model...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/9mgh-r642","openalex_id":"https://openalex.org/W7128744008","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8456000089645386},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.7279999852180481},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.6244000196456909},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5674999952316284},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.548799991607666},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5245000123977661},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.4812000095844269},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.47369998693466187}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128696695","title":"Collaborative LLM Numerical Reasoning with Local Data Protection","url":"https://doi.org/10.48448/33kh-qb70","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Lin Cheong","Chang-Tien Lu","Yuzhe Lu","Haozhu Wang","Panpan Xu","Min Zhang","Yun Zhou"],"abstract":"Numerical reasoning over documents, which demands both contextual understanding and logical inference, is challenging for low-capacity local models deployed on computation-constrained devices. 
Although such complex reasoning queries could be routed to powerful remote models like GPT-4, exposing local data raises significant data leakage concerns. Existing mitigation methods generate problem descriptions or examples for remote assistance. However, the inherent complexity of numerical reasoning hinders the local model from generating logically equivalent queries and accurately inferring answers with remote guidance. In this paper, we present a model collaboration framework with two key innovations: (1) a context-aware synthesis strategy that shifts the query topics while preserving reasoning patterns; and (2) a tool-based answer reconstruction approach that reuses the remote-generated plug...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/33kh-qb70","openalex_id":"https://openalex.org/W7128696695","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7455000281333923},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6410999894142151},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.3887999951839447},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.3873000144958496},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.37299999594688416},{"id":"https://openalex.org/C203702819","display_name":"Logical data model","score":0.36890000104904175},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.36090001463890076},{"id":"https://openalex.org/C47487241","display_name":"Data 
access","score":0.29670000076293945}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128691903","title":"165 - GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning","url":"https://doi.org/10.48448/6mng-1s85","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Xiaoyang Hao","Yifu Huo","Bei Li","Fandong Meng","Yongyu Mu","Chenglong Wang","Tong Xiao","Murun Yang","Jiali Zeng","Chunliang Zhang","Hang Zhou"],"abstract":"Major progress in reward modeling over recent years has been driven by a paradigm shift from task-specific designs to generalist reward models. Despite this trend, developing effective reward models remains a fundamental challenge: the heavy reliance on large-scale labeled preference data. Pre-training on abundant unlabeled data offers a promising direction, but existing approaches fall short in instilling explicit reasoning capabilities into reward models. To bridge this gap, we propose a self-training approach that can leverage unlabeled data to scale up reward reasoning in reward models. Based on this approach, we develop GRAM-R² a generative reward model trained to produce not only preference labels but also accompanying reward rationales. 
GRAM-R² can serve as a foundation model for reward reasoning and can be applied to a wide range of tasks with minimal or no additional fine-tuning...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/6mng-1s85","openalex_id":"https://openalex.org/W7128691903","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Northeastern University","Tencent (China)","Universidad del Noreste"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.612500011920929},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5776000022888184},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5565999746322632},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5210000276565552},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5187000036239624},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.4729999899864197},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.4526999890804291},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4153999984264374}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128680191","title":"159 - Audio-Thinker: Guiding Large Audio Language Model When and How to Think via Reinforcement Learning","url":"https://doi.org/10.48448/cgj7-z915","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Chenxing Li","Wenfu Wang","Hualei Wang","Shu Wu","Dong Yu","Meng Yu","Hao Zhang"],"abstract":"Recent 
advancements in large language models, multimodal large language models, and large audio language models (LALMs) have significantly improved their reasoning capabilities through reinforcement learning utilizing rule-based rewards. However, the explicit reasoning process has not yet yielded substantial benefits for audio question answering, and effectively leveraging deep reasoning remains an open challenge, with LALMs still falling short of achieving human-level auditory-language reasoning. To address these limitations, we propose Audio-Thinker, a reinforcement learning framework designed to enhance the reasoning capabilities of LALMs through improved adaptability, consistency, and effectiveness. Our approach introduces an adaptive think accuracy reward, enabling the model to adjust its reasoning strategies based on task complexity. Furthermore, we incorporate an external reward m...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/cgj7-z915","openalex_id":"https://openalex.org/W7128680191","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7562000155448914},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6833999752998352},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6047000288963318},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5559999942779541},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5085999965667725},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.4659000039100647},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.44670000672340393},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4462999999523163}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128713567","title":"VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models","url":"https://doi.org/10.48448/wpfz-ht59","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Jinpeng Chen","Zekang Du","Wenqiang Lei","Jason Chun Lok Li","Qinbin Li","Kun Li","Kangcheng Liu","Wenao Ma","Yue Qiu","Jiaheng Wei","Mengyang Wu"],"abstract":"Multimodal Large Language Models (MLLM) have enabled a wide range of advanced vision-language applications, including fine-grained object recognition and contextual understanding. When querying specific regions or objects in an image, human users naturally use \"Visual Prompts\" (VP) like bounding boxes to provide reference. However, no existing benchmark systematically evaluates the ability of MLLMs to interpret such VPs. This gap raises uncertainty about whether current MLLMs can effectively recognize VPs, an intuitive prompting method for humans, and utilize them to solve problems. To address this limitation, we introduce VP-Bench, aiming to assess MLLMs’ capability in VP perception and utilization. 
VP-Bench employs a two-stage evaluation framework: Stage 1 examines models’ ability to perceive VPs in natural scenes, utilizing 100K visualized prompts spanning 8 shapes and 355 attribute c...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/wpfz-ht59","openalex_id":"https://openalex.org/W7128713567","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)","Sichuan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7971000075340271},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7131999731063843},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5472000241279602},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.48669999837875366},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4487000107765198},{"id":"https://openalex.org/C63584917","display_name":"Bounding overwatch","score":0.44440001249313354},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4397999942302704},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4284999966621399}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7138368544","title":"UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation","url":"https://doi.org/10.48448/n8jx-z168","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Chenxing Li","Li Liu","Jinting Wang","Shan Yang","Dong Yu"],"abstract":"Cued Speech (CS) 
enhances lipreading via hand coding, offering visual phonemic cues that support precise speech perception for the hearing-impaired. The task of CS Video-to-Speech generation (CSV2S) aims to convert CS videos into intelligible speech signals. Most existing research focuses on CS Recognition (CSR), which transcribes video content into text. Consequently, a common solution for CSV2S is to integrate CSR with a text-to-speech (TTS) system. However, this pipeline relies on text as an intermediate medium, which may lead to error propagation and temporal misalignment between speech and CS video dynamics. In contrast, directly generating audio speech from CS video (direct CSV2S) often suffer from the inherent multimodal complexity and the limited availability of CS data. To address these challenges, we propose UniCUE, the first unified framework for CSV2S that directly generates....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/n8jx-z168","openalex_id":"https://openalex.org/W7138368544","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8025000095367432},{"id":"https://openalex.org/C83195618","display_name":"Cued speech","score":0.7328000068664551},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.7124999761581421},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5662000179290771},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5633999705314636},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.5134000182151794},{"id":"https://openalex.org/C138954614","display_name":"Mandarin 
Chinese","score":0.5008999705314636},{"id":"https://openalex.org/C54953205","display_name":"Speech analytics","score":0.49380001425743103}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128734482","title":"Generative AI Against Poaching: Latent Composite Flow Matching for Poaching Prediction","url":"https://doi.org/10.48448/fgep-k603","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Vincent Boersch-Supan","Charles Emogor","Lingkai Kong","Milind Tambe","Haichuan Wang","Lily Xu"],"abstract":"Poaching poses significant threats to wildlife and biodiversity. A valuable step in reducing poaching is to forecast poacher behavior, which can inform patrol planning and other conservation interventions. Existing poaching prediction methods based on linear models or decision trees lack the expressivity to capture complex, nonlinear spatiotemporal patterns. Recent advances in generative modeling, particularly flow matching, offer a more flexible alternative. However, training such models on real-world poaching data faces two central obstacles: imperfect detection of poaching events and limited data. To address imperfect detection, we integrate flow matching with an occupancy-based detection model and train the flow in latent space to infer the underlying occupancy state. 
To mitigate data scarcity, we adopt a composite flow initialized from a linear-model prediction rather than random no...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/fgep-k603","openalex_id":"https://openalex.org/W7128734482","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2776217272","display_name":"Poaching","score":0.8679999709129333},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5837000012397766},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5218999981880188},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4999000132083893},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45100000500679016},{"id":"https://openalex.org/C84525736","display_name":"Decision tree","score":0.4101000130176544},{"id":"https://openalex.org/C38349280","display_name":"Flow (mathematics)","score":0.3903999924659729},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3578999936580658}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7133487493","title":"Artificial Intelligence and Machine Learning New Techniques for Generative AI that Drive Outcomes for Health and Transportation","url":"https://doi.org/10.1109/icmcsi67283.2026.11412819","published":"2026-01-07","authors":["Karteek Kotamsetty"],"abstract":"Artificial Intelligence (AI) and Machine Learning (ML) have emerged as powerful tools for data-driven decisionmaking across critical sectors such as healthcare and transportation. 
However, traditional ML models often struggle with data imbalance and scarcity, leading to biased predictions and limited generalization. This study introduces an advanced Generative AI-driven hybrid framework that integrates a Tabular Generative Adversarial Network (TGAN) with the XGBoost classifier to overcome these limitations and enhance predictive reliability. The framework is validated using two benchmark datasets: the heart-disease dataset for binary diagnosis in the healthcare application and the UK roadaccident dataset for multi-class severity prediction in the transportation application. The Generative AI component learns the statistical patterns of real data and generates highquality synthetic record...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icmcsi67283.2026.11412819","openalex_id":"https://openalex.org/W7133487493","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6933000087738037},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.60589998960495},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5157999992370605},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.45010000467300415},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.3456999957561493},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.26919999718666077},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.26660001277923584},{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.2485000044107437}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128700338","title":"AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection","url":"https://doi.org/10.48448/71rd-cy10","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Yuezhi Cai","Bin-Bin Gao","Jun Liu","Yong Liu","Lei Wang","Chengjie Wang","Meng Wang","Jiangtao Yan","Weixi Zhang","Yue Zhou"],"abstract":"Universal visual anomaly detection aims to identify anomalies from novel or unseen vision domains without additional fine-tuning, which is critical in open scenarios. Recent studies have demonstrated that pre-trained vision-language models like CLIP exhibit strong generalization with just zero or a few normal images. However, existing methods struggle with designing prompt templates, complex token interactions, or requiring fine-tuning on target domains, resulting in limited flexibility. In this work, we present a simple yet effective AdaptCLIP based on two key insights. First, adaptive visual and textual representations should be learned alternately rather than jointly. Second, comparative learning between query and normal image prompt should incorporate both contextual and aligned residual features, rather than relying solely on residual features. 
AdaptCLIP treats CLIP models as a foun...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/71rd-cy10","openalex_id":"https://openalex.org/W7128700338","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7452999949455261},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6930000185966492},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6814000010490417},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6407999992370605},{"id":"https://openalex.org/C155512373","display_name":"Residual","score":0.595300018787384},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.5662999749183655},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5105999708175659},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.43779999017715454}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7133492017","title":"A Framework for Explainable Artificial Intelligence in Healthcare Using Model-Agnostic Methods","url":"https://doi.org/10.1109/icmcsi67283.2026.11412454","published":"2026-01-07","authors":["Amulya Mallepalli","Venkata Gopi Siva Sai Nallapati","Vineetha Batchu","Nithin Reddy Kumbham","Raghavendra Reddy"],"abstract":"The growing use of the artificial intelligence (AI) in clinical decision support systems has compounded the necessity of transparent and reliable models with the ability to give reliable explanations. 
The conventional deep learning and ensemble-based healthcare prediction models have high accuracy, but they are black boxes and their use may be restricted in regulated clinical settings. The current paper suggests a unified model-agnostic Explainable Artificial Intelligence (XAI) system that will combine multimodal data processing with stratified interpretability to improve clinical transparency. The framework is comprised of local methods of explaining, including SHAP and LIME, global interpretability using aggregated SHAP values and permutation importance, and actionable explanation using counterfactual analysis and contrastive analysis. The experimental assessment based on the use of st...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icmcsi67283.2026.11412454","openalex_id":"https://openalex.org/W7133492017","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","Amazon (United States)","Dunwoody College of Technology","Logan University","Short and Associates (United States)"],"concepts":[{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.5613999962806702},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5475000143051147},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4447999894618988},{"id":"https://openalex.org/C2989236134","display_name":"Patient care","score":0.296099990606308},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.29269999265670776},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.28790000081062317},{"id":"https://openalex.org/C56739046","display_name":"Knowledge 
management","score":0.2782999873161316},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.27300000190734863}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128697308","title":"1503 - Generative AI Against Poaching: Latent Composite Flow Models for Wildlife Conservation","url":"https://doi.org/10.48448/5r3h-8f74","published":"2026-01-07","authors":["Association for Artificial Intelligence 2026","Vincent Boersch-Supan","Charles Emogor","Lingkai Kong","Milind Tambe","Haichuan Wang","Lily Xu"],"abstract":"Poaching poses significant threats to biodiversity. A valuable step in reducing poaching is to forecast poacher behavior, which can inform patrol deployment and other conservation interventions. Existing poaching prediction methods based on linear models or decision trees lack the expressivity to capture complex, nonlinear spatiotemporal patterns. Recent advances in generative modeling, particularly flow matching, offer a more flexible alternative. However, training such models on real-world poaching data faces two central obstacles: imperfect detection of poaching events and limited data. To address imperfect detection, we integrate flow matching with an occupancy-based detection model and train the flow in latent space to infer the underlying occupancy state. 
To mitigate data scarcity, we adopt a composite flow initialized from a linear-model prediction rather than random noise which i...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/5r3h-8f74","openalex_id":"https://openalex.org/W7128697308","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2776217272","display_name":"Poaching","score":0.7953000068664551},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.580299973487854},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.4713999927043915},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4602999985218048},{"id":"https://openalex.org/C160331591","display_name":"Occupancy","score":0.40959998965263367},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4020000100135803},{"id":"https://openalex.org/C84525736","display_name":"Decision tree","score":0.39500001072883606},{"id":"https://openalex.org/C147366489","display_name":"Wildlife conservation","score":0.38850000500679016}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:ch2z5uzs6w68u76x7sirz77l","title":"Improving User Interface Generation Models from Designer Feedback","url":"https://machinelearning.apple.com/research/designer-feedback","published":"2026-01-06","authors":["Jason Wu","Amanda Swearngin","Arun Krishna Vajjala","Alan Leung","Jeffrey Nichols","Titus Barik"],"abstract":"Despite being trained on vast amounts of data, most LLMs are unable to reliably generate well-designed UIs. 
Designer feedback is essential to improving performance on UI generation; however, we find that existing RLHF methods based on ratings or rankings are not well-aligned with designers' workflows and ignore the rich rationale used to critique and improve UI designs. In this paper, we investigate several approaches for designers to give...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772318.3791567","openalex_id":"https://openalex.org/W4415252568","cited_by_count":1,"quality_score":53,"matched_keywords":[],"author_affiliations":["Apple","Apple (United States)","Purdue University West Lafayette"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:vlxmp7pzf3b5grveiyqaqk5p","title":"NarrativeTrack: Evaluating Video Language Models Beyond the Frame","url":"https://machinelearning.apple.com/research/narrativetrack","published":"2026-01-06","authors":["Hyeonjeong Ha","Jinjin Ge","Bo Feng","Kaixin Ma","Gargi Chakraborty"],"abstract":"Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains underexplored. True narrative understanding requires grounding who is doing what, when, and where, maintaining coherent entity representations across dynamic visual and temporal contexts. 
We introduce NarrativeTrack, the first benchmark to evaluate narrative...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2601.02736","title":"Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism","url":"http://arxiv.org/abs/2601.02736","published":"2026-01-06","authors":["Lingzhe Zhang","Tong Jia","Yunpeng Zhai","Leyi Pan","Chiming Duan","Minghua He","P. Xiao","Ying Li"],"abstract":"Microservice systems have become the backbone of cloud-native enterprise applications due to their resource elasticity, loosely coupled architecture, and lightweight deployment. Yet, the intrinsic complexity and dynamic runtime interactions of such systems inevitably give rise to anomalies. Ensuring system reliability therefore hinges on effective root cause analysis (RCA), which entails not only localizing the source of anomalies but also characterizing the underlying failures in a timely and interpretable manner. Recent advances in intelligent RCA techniques, particularly those powered by large language models (LLMs), have demonstrated promising capabilities, as LLMs reduce reliance on handcrafted features while offering cross-platform adaptability, task generalization, and flexibility. 
However, existing LLM-based methods still suffer from two critical limitations: (a) limited explorat...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786582.3786803","openalex_id":"https://openalex.org/W7118827999","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Peking University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2778505942","display_name":"Microservices","score":0.9150999784469604},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8366000056266785},{"id":"https://openalex.org/C130963320","display_name":"Root cause analysis","score":0.8091999888420105},{"id":"https://openalex.org/C84945661","display_name":"Root cause","score":0.7947999835014343},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7289999723434448},{"id":"https://openalex.org/C171078966","display_name":"Root (linguistics)","score":0.6883000135421753},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.6166999936103821},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5735999941825867}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/fine-tuning-small-language-models-as-efficient-enterprise-search-relevance-labelers","title":"Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers","url":"https://www.microsoft.com/en-us/research/publication/fine-tuning-small-language-models-as-efficient-enterprise-search-relevance-labelers/","published":"2026-01-05","authors":["Yue Kang","Zhuoyi 
Huang","Benji Schussheim","Diana Licon","Dina Atia","Shixing Cao","Jacob Danovitch","Kunho Kim","Billy Norcilien","Jonah Karpman","Mahmound Sayed","Mike Taylor"],"abstract":"In enterprise search, building high-quality datasets at scale remains a central challenge due to the difficulty of acquiring labeled data. To resolve this challenge, we propose an efficient approach to fine-tune small language models (SLMs) for accurate relevance labeling, enabling high-throughput, domain-specific labeling comparable or even better in quality to that of state-of-the-art large language models (LLMs). To overcome the lack of high-quality and accessible datasets in the enterprise domain, our method leverages on synthetic data generation. Specifically, we employ an LLM to synthesize realistic enterprise queries from a seed document, apply BM25 to retrieve hard negatives, and use a teacher LLM to assign relevance scores. The resulting dataset is then distilled into an SLM, producing a compact relevance labeler. 
We evaluate our approach on a high-quality benchmark consisting o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Search and information retrieval","Computer science","LLM","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7118167202","title":"Lifelong Learning of Large Language Model Based Agents: A Roadmap","url":"https://doi.org/10.1109/tpami.2025.3650546","published":"2026-01-05","authors":["Junhao Zheng","Chengming Shi","Xidi Cai","Qiuke Li","Duzhen Zhang","Chenxing Li","Dong Yu","Qianli Ma"],"abstract":"Lifelong learning, also known as continual or incremental learning, is a crucial component for advancing Artificial General Intelligence (AGI) by enabling systems to continuously adapt in dynamic environments. While large language models (LLMs) have demonstrated impressive capabilities in natural language processing, existing LLM agents are typically designed for static systems and lack the ability to adapt over time in response to new challenges. This survey is the first to systematically summarize the potential techniques for incorporating lifelong learning into LLM-based agents. We categorize the core components of these agents into three modules: the perception module for multimodal input integration, the memory module for storing and retrieving evolving knowledge, and the action module for grounded interactions with the dynamic environment. 
We highlight how these pillars collectivel...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3650546","openalex_id":"https://openalex.org/W7118167202","cited_by_count":3,"quality_score":56,"matched_keywords":["LLM","language model","memory","long-term"],"author_affiliations":["Bellevue Hospital Center","Mohamed bin Zayed University of Artificial Intelligence","South China University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C108771440","display_name":"Lifelong learning","score":0.8421000242233276},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7555999755859375},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.6182000041007996},{"id":"https://openalex.org/C94124525","display_name":"Categorization","score":0.5092999935150146},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45829999446868896},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.4377000033855438},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.42649999260902405},{"id":"https://openalex.org/C2779439875","display_name":"Natural language understanding","score":0.4187000095844269}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W7118171927","title":"VDSAgents: A PCS‐Guided Multi‐Agent System for Veridical Data Science Automation","url":"https://doi.org/10.1002/sta4.70126","published":"2026-01-05","authors":["Yunxuan Jiang","Silan Hu","Xiaoning Wang","Yuanyuan Zhang","Xiangyu Chang"],"abstract":"ABSTRACT Large language models (LLMs) become increasingly 
integrated into data science workflows for automated system design. However, these LLM‐driven data science systems rely solely on the internal reasoning of LLMs, lacking guidance from scientific and theoretical principles. This limits their trustworthiness and robustness, especially when dealing with noisy and complex real‐world datasets. This paper provides VDSAgents 1 , a multi‐agent system grounded in the predictability–computability–stability (PCS) principles (Yu and Kumbier, 2020). proposed in the veridical data science (VDS) (Yu and Barter, 2024). Guided by PCS principles, the system implements a modular workflow for data cleaning, feature engineering, modeling and evaluation. Each phase is handled by an elegant agent, incorporating perturbation analysis, unit testing and model validation to ensure both functionality and sci...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/sta4.70126","openalex_id":"https://openalex.org/W7118171927","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Baidu (China)","Communication University of China","National University of Singapore","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7400000095367432},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7019000053405762},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.5027999877929688},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.47209998965263367},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.40639999508857727},{"id":"https://openalex.org/C124101348","display_name":"Data 
mining","score":0.3752000033855438},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.3495999872684479},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.3449000120162964}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131376718","title":"AutoSOC Cyber Analyst (ASOC-CA): Using AI to Automate SOC Tier 1 & 2 Activities","url":"https://doi.org/10.1109/ccwc67433.2026.11393893","published":"2026-01-05","authors":["Iason Mihalopoulos","Jen-Ryann Ngo-Antonio","Matthew Pugh","Damon Neal","Yinzhi Cao","Lanier Watkins"],"abstract":"Hackers are now beginning to use artificial intelligence (AI) to expand the capabilities and sophistication of their attacks, creating challenges for network owners. As state-sponsored adversaries and advanced persistent threats (APTs) utilize AI to their advantage, network owners must evolve concurrently to address the emergent threat landscape shaped by advanced AI systems. Thus, in this paper, we propose an end-to-end Machine Learning Framework to automate and strengthen cybersecurity operations. To demonstrate the feasibility of our approach, in this work-in-progress paper, we built and evaluated, a framework that consists of a machine learning (ML) model that automates the classifications of alerts generated by the Splunk Security Information and Event Management system (SIEM) to map adversary behaviors to MITRE ATT&CK stages. 
Our automated detection efforts are then fed into an LLM...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ccwc67433.2026.11393893","openalex_id":"https://openalex.org/W7131376718","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Johns Hopkins Center for Health Security","Johns Hopkins University","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41065033","display_name":"Adversary","score":0.6859999895095825},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6549000144004822},{"id":"https://openalex.org/C149810388","display_name":"Emulation","score":0.6241000294685364},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.6154000163078308},{"id":"https://openalex.org/C86844869","display_name":"Hacker","score":0.6126000285148621},{"id":"https://openalex.org/C168725872","display_name":"Sophistication","score":0.569599986076355},{"id":"https://openalex.org/C2779585090","display_name":"Resilience (materials science)","score":0.564300000667572},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.5508000254631042}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7126223327","title":"AI-Driven Incident Management for Distributed Cloud Systems: Detection, Mitigation, and Root Cause Automation","url":"https://doi.org/10.52783/jisem.v11i1s.14216","published":"2026-01-05","authors":["Harpreet Singh"],"abstract":"Artificial intelligence for IT operations signifies a paradigmatic shift in managing hyperscale cloud distributions because manual incident management strategies fail to scale. 
This rises from the growing complexity of service dependencies, increasingly high numbers of alerts, and failure propagation patterns. Multiple learning anomaly detection strategies use self-adjusting thresholds and combined signal analysis to isolate occurrences of operation beyond the norm on an operational timeline. Meanwhile, intelligent alert consolidation strategies use intelligence to group alerts based on commonalities. Autopsy-style diagnosis uses causal analysis along with large language models to synthesize incident information from a data mashup of various telemetry data. Meanwhile, predictive repair uses time-series forecasting along with reinforcement learning to predict repair strategies in the form...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.52783/jisem.v11i1s.14216","openalex_id":"https://openalex.org/W7126223327","cited_by_count":0,"quality_score":41,"matched_keywords":["memory"],"author_affiliations":["Microsoft (Finland)","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6104999780654907},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.6003999710083008},{"id":"https://openalex.org/C2780952636","display_name":"Incident management","score":0.5609999895095825},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5349000096321106},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.47780001163482666},{"id":"https://openalex.org/C200601418","display_name":"Reliability engineering","score":0.4578999876976013},{"id":"https://openalex.org/C2909164965","display_name":"Incident report","score":0.4219000041484833},{"id":"https://openalex.org/C130963320","display_name":"Root cause 
analysis","score":0.40389999747276306}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7118120805","title":"A Multi-Modal Machine Learning Architecture for Resource-Efficient Sensing and Sustainable Edge Intelligence","url":"https://doi.org/10.63503/j.ijaimd.2025.214","published":"2026-01-03","authors":["Sudheekar Reddy Pothireddy","Soumya Remella","Yashovardhan Jayaram","Dilliraja Sundar","Jayant Bhat"],"abstract":"Intelligent sensing at the network edge is a tricky issue, even though it is not an easy endeavor to try to maximize accuracy but is rather a skirmish against limited resources. Embedded systems are identifying increased sensors and are becoming omnipresent and the real-time and multi-modal interpretation is booming, rendering traditional and cloud-reliant or computationally intensive machine learning models ineffective. It thus requires the creation of architecture that will handle this wilderness of limited compute and energy in real-time, not monolithic models that have been transplanted out of data centers. The current paper constitutes a computational framework of multi-modal learning at the edge straightforwardly addressing the issue of the efficiency-accuracy trade-off. We do not consider the highly complex suite of WSM-2023 streams of benchmarks as the very classification tasks b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63503/j.ijaimd.2025.214","openalex_id":"https://openalex.org/W7118120805","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Digital Science (United States)","Microsoft (United States)","Structural Analytics (United States)","a.i. 
solutions (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7250999808311462},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6926000118255615},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6420999765396118},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6358000040054321},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.5940999984741211},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.42809998989105225},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.400299996137619},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.38589999079704285}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7118131596","title":"Out of distribution detection with attention head masking for multimodal document classification","url":"https://doi.org/10.1038/s41598-025-32328-9","published":"2026-01-03","authors":["Christos E. Constantinou","Γεώργιος Ιωαννίδης","Aman Chadha","Aaron Elkins","Edwin Simpson"],"abstract":"Detecting out-of-distribution (OOD) data is critical for ensuring the reliability and safety of deployed machine learning systems by mitigating model overconfidence and misclassification. While existing OOD detection methods primarily focus on uni-modal inputs, such as images or text, their effectiveness in multi-modal settings, particularly documents, remains underexplored. Moreover, most approaches prioritize decision mechanisms over optimizing the underlying dense embedding representations for optimal separation. 
In this work, we introduce Attention Head Masking (AHM), a novel technique applied to Transformer-based models for both uni-modal and multi-modal OOD detection. Our empirical results demonstrate that AHM enhances embedding quality, significantly improving the separation between in-distribution and OOD data. Notably, our method reduces the false positive rate (FPR) by up to 10...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-025-32328-9","openalex_id":"https://openalex.org/W7118131596","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","Amazon (United Kingdom)","Amazon (United States)","Carnegie Mellon University","San Diego State University","Stanford University","University of Bristol"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8551999926567078},{"id":"https://openalex.org/C2777402240","display_name":"Masking (illustration)","score":0.5810999870300293},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5637999773025513},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.555400013923645},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5033000111579895},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.49129998683929443},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.48899999260902405},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.4731000065803528}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7118067677","title":"Exploring Multi-Lingual Bias of Large Code Models 
in Code Generation","url":"https://doi.org/10.1145/3786793","published":"2026-01-03","authors":["Chaozheng Wang","Zongjie Li","Cuiyun Gao","Wenxuan Wang","Ting Peng","Hailiang Huang","Yuetang Deng","Shuai Wang","M. R. Lyu"],"abstract":"Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despite the effectiveness, we observe a noticeable multilingual bias in the generation performance of LCMs. Specifically, LCMs demonstrate proficiency in generating solutions when provided with instructions in English, yet may falter when faced with semantically equivalent instructions in other NLs such as Chinese. Moreover, the ability of LCMs to generate code exhibits variety across different programming languages (PLs), such as Python and C++. 
The observed phenomenon indicates the presence of mult...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786793","openalex_id":"https://openalex.org/W7118067677","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Hong Kong University of Science and Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.8730999827384949},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8342999815940857},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6297000050544739},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5698000192642212},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.5379999876022339},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4830000102519989},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4722999930381775},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.4611999988555908}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/user-perceptions-of-an-llm-based-chatbot-for-cognitive-reappraisal-of-stress-feasibility-study","title":"User Perceptions of an LLM-Based Chatbot for Cognitive Reappraisal of Stress: Feasibility 
Study","url":"https://www.microsoft.com/en-us/research/publication/user-perceptions-of-an-llm-based-chatbot-for-cognitive-reappraisal-of-stress-feasibility-study/","published":"2026-01-02","authors":["Ananya Bhattacharjee","Jina Suh","Mohit Chandra","Javier Hernandez"],"abstract":"Cognitive reappraisal is a well-studied emotion regulation strategy that helps individuals reinterpret stressful situations to reduce their impact. Many digital mental health tools struggle to support this process because rigid scripts fail to accommodate how users naturally describe stressors. This study examined the feasibility of an LLM-based single-session intervention (SSI) for workplace stress reappraisal. We assessed short-term changes in stress-related outcomes and examined design tensions during use. We conducted a feasibility study with 100 employees at a large technology company who completed a structured cognitive reappraisal session delivered by a GPT-4o-based chatbot. Pre-post measures included perceived stress intensity, stress mindset, perceived demand, and perceived resources. 
These outcomes were analyzed using paired Wilcoxon signed-rank tests with correction for multip...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Mental health","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7118085507","title":"Retrieval-Augmented Generation for AI-Generated Content: A Survey","url":"https://doi.org/10.1007/s41019-025-00335-5","published":"2026-01-02","authors":["Penghao Zhao","Hailin Zhang","Qinhan Yu","Zhengren Wang","Yunteng Geng","Fangcheng Fu","Ling Yang","Wentao Zhang","Jie Jiang","Bin Cui"],"abstract":"Advancements in model algorithms, the growth of foundational models, and access to high-quality datasets have propelled the evolution of Artificial Intelligence Generated Content (AIGC). Despite its notable successes, AIGC still faces hurdles such as updating knowledge, handling long-tail data, mitigating data leakage, and managing high training and inference costs. Retrieval-augmented generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces the information retrieval process, which enhances the generation process by retrieving relevant objects from available data stores, leading to higher accuracy and better robustness. In this paper, we comprehensively review existing efforts that integrate RAG techniques into AIGC scenarios. 
We first classify RAG foundations according to how the retriever augments the generator, distilling the fundam...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s41019-025-00335-5","openalex_id":"https://openalex.org/W7118085507","cited_by_count":36,"quality_score":71,"matched_keywords":["retrieval"],"author_affiliations":["Peking University","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8248000144958496},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6680999994277954},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.6241999864578247},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5889999866485596},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.5273000001907349},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.47200000286102295},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3093999922275543},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.3003999888896942}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":36}},{"id":"official:24df736add73f9ea","title":"PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation","url":"https://ai.meta.com/research/publications/phygdpo-physics-aware-groupwise-direct-preference-optimization-for-physically-consistent-text-to-video-generation/","published":"2026-01-02","authors":["Yuanhao Cai","Kunpeng Li","Menglin 
Jia","Jialiang Wang","Junzhe Sun","Feng Liang","Weifeng Chen","Felix Xu","Chu Wang","Ali Thabet","Xiaoliang Dai","Xuan Ju"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer Vision","preference"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/legomem-modular-procedural-memory-for-multi-agent-llm-systems-for-workflow-automation","title":"LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation","url":"https://www.microsoft.com/en-us/research/publication/legomem-modular-procedural-memory-for-multi-agent-llm-systems-for-workflow-automation/","published":"2026-01-01","authors":["Dongge Han","Camille Couturier","Daniel Madrigal","Xuchao Zhang","Victor Ruehle","Saravan Rajmohan"],"abstract":"We introduce LEGOMem, a modular procedural memory framework for multi-agent large language model (LLM) systems in workflow automation. LEGOMem decomposes past task trajectories into reusable memory units and flexibly allocates them across orchestrators and task agents to support planning and execution. To explore the design space of memory in multi-agent systems, we use LEGOMem as a lens and conduct a systematic study of procedural memory in multi-agent systems, examining where memory should be placed, how it should be retrieved, and which agents benefit most. 
Experiments on the OfficeBench benchmark show that orchestrator memory is critical for effective task decomposition and delegation, while fine-grained agent memory improves execution accuracy. We find that even teams composed of smaller language models can benefit substantially from procedural memory, narrowing the performance gap....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","language model","memory","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adults-using-aac-to-inform-the-design-of-mimetic-agentic-ai","title":"“I did not know people talk without thinking about it”: How the lived experiences of adults who use AAC can inform the design of mimetic, agentic AI","url":"https://www.microsoft.com/en-us/research/publication/adults-using-aac-to-inform-the-design-of-mimetic-agentic-ai/","published":"2026-01-01","authors":["Erin Beneteau","Humphrey Curtis","John Tang","Sasa Junuzovic","Ann Paradiso","Ed Cutrell","Martez Mott"],"abstract":"With the rapid adoption of generative AI in meetings, we anticipate a near-future where people prepare and deploy personalized AI-powered mimetic agents, or digital twins, to speak on their behalf. 
However, most people have limited experience in preparing for and reviewing conversations made by these agents, and it remains unclear how to design next-generation conversational AI technologies that can support people as they use them in their daily lives. Adults who use Augmentative and Alternative Communication (AAC) have unique expertise in using computing technologies for their daily communication and can provide valuable perspectives on designing AI-mediated conversations. We interviewed nine adults who use AAC to understand their practices and strategies before, during, and after meetings. Our analysis revealed nine communication strategies across these meeting phases that can help inf...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Tech Report","Artificial intelligence","Human-computer interaction","accessibility","Human–computer interaction","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sci-phi-a-large-language-model-spatial-audio-descriptor","title":"Sci-Phi: A Large Language Model Spatial Audio Descriptor","url":"https://www.microsoft.com/en-us/research/publication/sci-phi-a-large-language-model-spatial-audio-descriptor/","published":"2026-01-01","authors":["Xilin Jiang","Sebastian Braun","Hannes Gamper"],"abstract":"Acoustic scene perception involves describing the type of sounds, their timing, their direction and distance, as well as their loudness and reverberation. 
While audio language models excel in sound recognition, single-channel input fundamentally limits spatial understanding. This work presents Sci-Phi, a spatial audio large language model with dual spatial and spectral encoders that estimates a complete parameter set for all sound sources and the surrounding environment. Learning from over 4,000 hours of synthetic first-order Ambisonics recordings including metadata, Sci-Phi enumerates and describes up to four directional sound sources in one pass, alongside non-directional background sounds and room characteristics. We evaluate the model with a permutation-invariant protocol and 15 metrics covering content, location, timing, loudness, and reverberation, and analyze its robustness across...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/ojsp.2026.3657300","openalex_id":"https://openalex.org/W7125601035","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Audio and Acoustics","Audio signal processing","LLM","language model"],"author_affiliations":["Microsoft","Columbia University","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/salad-vae-semantic-audio-compression-with-language-audio-distillation","title":"SALAD-VAE: Semantic Audio Compression with Language-Audio Distillation","url":"https://www.microsoft.com/en-us/research/publication/salad-vae-semantic-audio-compression-with-language-audio-distillation/","published":"2026-01-01","authors":["Sebastian Braun","Hannes Gamper","Dimitra 
Emmanouilidou"],"abstract":"Modern generative and multimodal models increasingly rely on compact latent representations that trade and balance semantic richness with high-fidelity reconstruction. We introduce SALAD-VAE, a continuous and highly compact semantic Audio Variational Autoencoder, which operates in the frequency domain and achieves state-of-the-art compression with very low latent frame rate (7.8 Hz) while surfacing semantic structure and producing high audio quality. We enhance the standard VAE semantic losses and augmentation, specifically contrastive learning and CLAP-based embedding distillation, enabling it to generalize across diverse audio domains. With a significantly less computational complex architecture than comparable state-of-the-art VAEs, SALAD-VAE shows comparably high reconstruction quality while it consistently outperforms them on a wide range of classification benchmarks. Furthermore, t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp55912.2026.11463284","openalex_id":"https://openalex.org/W4416395544","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Audio and Speech Processing","compression","distillation"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:184","title":"Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time 
Scaling","url":"https://www.noahlab.com.hk/en/scientific_research/scaling-up-speeding-up-a-benchmark-of-speculative-decoding-for-efficient-llm-test-time-scaling","published":"2026-01-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICLR 2026. External paper link: https://arxiv.org/pdf/2509.04474","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Industry Intelligence","ICLR 2026","2026","LLM","efficient"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:185","title":"PASER : Post-Training Data Selection for Efficient Pruned Large Language Model Recovery","url":"https://www.noahlab.com.hk/en/scientific_research/paser-post-training-data-selection-for-efficient-pruned-large-language-model-recovery","published":"2026-01-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICLR 2026. 
External paper link: https://arxiv.org/pdf/2502.12594","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Industry Intelligence","ICLR 2026","2026","language model","efficient"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:194","title":"MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling","url":"https://www.noahlab.com.hk/en/scientific_research/moss-efficient-and-accurate-fp8-llm-training-with-microscaling-and-automatic-scaling","published":"2026-01-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICLR 2026. 
External paper link: https://arxiv.org/pdf/2511.05811","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Industry Intelligence","ICLR 2026","2026","LLM","efficient"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:197","title":"PocketLLM: Ultimate Compression of Large Language Models via Meta Networks","url":"https://www.noahlab.com.hk/en/scientific_research/pocketllm-ultimate-compression-of-large-language-models-via-meta-networks","published":"2026-01-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: AAAI 2026. 
External paper link: https://arxiv.org/abs/2511.17637","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Model architecture and optimization","AAAI 2026","2026","compression"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/murakkab-resource-efficient-agentic-workflow-orchestration-in-cloud-platforms","title":"Murakkab: Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms","url":"https://www.microsoft.com/en-us/research/publication/murakkab-resource-efficient-agentic-workflow-orchestration-in-cloud-platforms/","published":"2026-01-01","authors":["Gohar Irfan Chaudhry","Esha Choukse","Haoran Qiu","Íñigo Goiri","Rodrigo Fonseca","Adam Belay","Ricardo Bianchini"],"abstract":"Agentic workflows commonly coordinate multiple models and tools with complex control logic. They are quickly becoming the dominant paradigm for AI applications. However, serving them remains inefficient with today's frameworks. The key problem is that they expose workflows as opaque sequences of model and tool calls that tightly couple agent logic with model and hardware choices. Often, these workflow components are fragmented across different entities, preventing systems from reasoning about trade-offs across accuracy, latency, energy, and cost. This leads to resource waste and degraded service-level objectives (SLOs).We present Murakkab, a resource-efficient serving system for agentic workflows. 
Murakkab introduces a declarative abstraction that decouples workflow specification from execution configuration. A profile-guided optimizer and adaptive runtime jointly manage the full stack:....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-long-context-summarization-with-multi-granularity-retrieval-optimization","title":"Improving Long-Context Summarization with Multi-Granularity Retrieval Optimization","url":"https://www.microsoft.com/en-us/research/publication/improving-long-context-summarization-with-multi-granularity-retrieval-optimization/","published":"2026-01-01","authors":["Xueyu Chen","Kaitao Song","Zifan Song","Dongsheng Li","Cairong Zhao"],"abstract":"Retrieval-Augmented Generation (RAG) is an effective solution to overcome the limitations of Large Language Models (LLMs) in terms of specific-domain knowledge and timely information updates. However, current RAG methods typically respond to queries based on isolated segments, lacking the ability to integrate information within the same document. This undermines performance in real-world tasks requiring coherent understanding across an entire document. Notably, the human brain naturally integrates and summarizes prior knowledge upon reading a given text, progressively formulating a comprehensive understanding. 
Motivated by this cognitive process, we propose the Hierarchical Two-Stage Summarization-based Information Retrieval (HTSIR) method, which preprocesses the corpus prior to retrieval, summarizes continuous texts to obtain integrated information, and constructs a retrieval tree with va...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/byol-bring-your-own-language-into-llms","title":"BYOL: Bring Your Own Language into LLMs","url":"https://www.microsoft.com/en-us/research/publication/byol-bring-your-own-language-into-llms/","published":"2026-01-01","authors":["Waqas Zamir","Wassim Hamidouche","Boulbaba Ben Amor","Luana Marotti","Inbal Becker-Reshef","Juan M. Lavista Ferres"],"abstract":"Large Language Models (LLMs) exhibit strong multilingual capabilities, yet remain fundamentally constrained by the severe imbalance in global language resources. While over 7,000 languages are spoken worldwide, only a small subset (fewer than 100) has sufficient digital presence to meaningfully influence modern LLM training. This disparity leads to systematic underperformance, cultural misalignment, and limited accessibility for speakers of low-resource and extreme-low-resource languages.
To address this gap, we introduce Bring Your Own Language (BYOL), a unified framework for scalable, language-aware LLM development tailored to each language's digital footprint. BYOL begins with a language resource classification that maps languages into four tiers (Extreme-Low, Low, Mid, High) using curated web-scale corpora, and uses this classification to select the appropriate integration pathway. F...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Human language technologies","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:193","title":"Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programming","url":"https://www.noahlab.com.hk/en/scientific_research/constraint-matters-multi-modal-representation-for-reducing-mixed-integer-linear-programming","published":"2026-01-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICLR 2026. 
External paper link: https://arxiv.org/pdf/2508.18742","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Industry Intelligence","ICLR 2026","2026"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"huawei-noah:182","title":"Beyond Speedup - Utilizing KV Cache for Sampling and Reasoning","url":"https://www.noahlab.com.hk/en/scientific_research/beyond-speedup-utilizing-kv-cache-for-sampling-and-reasoning","published":"2026-01-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICLR 2026. External paper link: https://arxiv.org/pdf/2601.20326","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Industry Intelligence","ICLR 2026","2026"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/benchmarking-affordance-generalization-with-busybox","title":"Benchmarking Affordance Generalization with BusyBox","url":"https://www.microsoft.com/en-us/research/publication/benchmarking-affordance-generalization-with-busybox/","published":"2026-01-01","authors":["Dean Fortier","Timothy Adamson","Tess Hellebrekers","Teresa 
LaScala","Kofi Ennin","Michael Murray","Andrey Kolobov","Galen Mullins"],"abstract":"Robot Foundation Models (RFMs), also referred to as Vision-Language Action models (VLAs), have been attracting the attention of researchers and practitioners with a promise of generalizing robot behaviors across tasks, objects, and environments. The community has extensively studied RFMs' generalization capabilities in the vision and language space. However, affordance generalization – RFMs' ability to manipulate new objects with familiar physical features - remains largely unexplored. In the meantime, this meta-skill is plays a critical rule in a person's ability to quickly figure out how to handle hitherto unseen objects. In fact, basic physical interface elements like buttons and switches are designed to look and function similarly across different devices to facilitate affordance generalization in environments inhabited by people. Whether robots can capitalize on these design aids re...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7129089911","title":"MC#: Mixture Compressor for Mixture-of-Experts Large Models","url":"https://doi.org/10.1109/tpami.2026.3664873","published":"2026-01-01","authors":["Wei Huang","Yue Liao","Yukang Chen","Jianhui Liu","Haoru Tan","Si Liu","Shiming Zhang","Shuicheng Yan","Xiaojuan Qi"],"abstract":"Mixture-of-Experts (MoE) has emerged as an effective and efficient scaling 
mechanism for large language models (LLMs) and vision-language models (VLMs). By expanding a single feed-forward network into multiple expert branches, MoE increases model capacity while maintaining efficiency through sparse activation. However, despite this sparsity, the need to preload all experts into memory and activate multiple experts per input introduces significant computational and memory overhead. The expert module becomes the dominant contributor to model size and inference cost, posing a major challenge for deployment. To address this, we propose MC# (Mixture-Compressor-sharp), a unified framework that combines static quantization and dynamic expert pruning by leveraging the significance of both experts and tokens to achieve aggressive compression of MoE-LLMs/VLMs. To reduce storage and loading overhea...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2026.3664873","openalex_id":"https://openalex.org/W7129089911","cited_by_count":0,"quality_score":53,"matched_keywords":["memory","efficient","compression","quantization"],"author_affiliations":["Beihang University","National University of Singapore","Nvidia (United States)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7807999849319458},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.603600025177002},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5020999908447266},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.44209998846054077},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4113999903202057},{"id":"https://openalex.org/C111335779","display_name":"Reduction 
(mathematics)","score":0.38429999351501465},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3743000030517578},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3612000048160553}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2504.11002","title":"Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Immersive Audiobook Generation","url":"http://arxiv.org/abs/2504.11002","published":"2026-01-01","authors":["Yan Rong","Shan Yang","Chengyu Li","Yu Dong","Li Liu"],"abstract":"Audiobook generation aims to create rich, immersive listening experiences from multimodal inputs, but current approaches face three critical challenges: (1) the lack of synergistic generation of diverse audio types (e.g., speech, sound effects, and music) with precise temporal and semantic alignment; (2) the difficulty in conveying expressive, fine-grained emotions, which often results in machine-like vocal outputs; and (3) the absence of automated evaluation frameworks that align with human preferences for complex and diverse audio. To address these issues, we propose Dopamine Audiobook, a novel unified training-free multi-agent system, where a multimodal large language model (MLLM) serves two specialized roles (i.e., speech designer and audio designer) for emotional, human-like, and immersive audiobook generation and evaluation. 
Specifically, we firstly propose a flow-based, context-aw...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2026.3688415","openalex_id":"https://openalex.org/W4416852257","cited_by_count":0,"quality_score":53,"matched_keywords":["language model","retrieval","agent","multi-agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C133378560","display_name":"Paralanguage","score":0.7124999761581421},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6692000031471252},{"id":"https://openalex.org/C177291462","display_name":"Active listening","score":0.5419999957084656},{"id":"https://openalex.org/C542774811","display_name":"Prosody","score":0.49570000171661377},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.438400000333786},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.38190001249313354},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.33180001378059387},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.3050000071525574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7123356248","title":"LSTD: Long Short-Term Temporal Diffusion for Video Generation","url":"https://doi.org/10.1109/tmm.2026.3651052","published":"2026-01-01","authors":["Haoyu Zhao","Jiaxi Gu","Shicong Wang","Tianyi Lu","Xing Zhang","Zuxuan Wu","Hang Xu","Yu-Gang Jiang"],"abstract":"Recently, text-driven video generation has achieved tremendous progress. 
However, existing methods neglect the contexts of long short-term frames in the video, thereby compromising temporal consistency. They also encounter challenges of heavy memory costs due to the use of the standard temporal attention mechanism and misalignment between training videos and captions. Additionally, previous approaches for long video generation are flawed because they are hard to ensure content diversity and consistency. To alleviate these issues, we propose a novel Long Short-term Temporal Diffusion (LSTD) model to generate videos with superior temporal consistency. We introduce two novel temporal modules, i.e., the Short-term Temporal Convolution and the Long-term Temporal Attention. The former can lear...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2026.3651052","openalex_id":"https://openalex.org/W7123356248","cited_by_count":1,"quality_score":50,"matched_keywords":["memory","long-term","efficient"],"author_affiliations":["Fudan University","Huawei Technologies (Sweden)","Shanghai Key Laboratory of Trustworthy Computing","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8830999732017517},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6078000068664551},{"id":"https://openalex.org/C77277458","display_name":"Temporal database","score":0.5990999937057495},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.54339998960495},{"id":"https://openalex.org/C45347329","display_name":"Convolution (computer science)","score":0.5332000255584717},{"id":"https://openalex.org/C31972630","display_name":"Computer
vision","score":0.45010000467300415},{"id":"https://openalex.org/C65483669","display_name":"Video processing","score":0.42570000886917114},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.42149999737739563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7125431163","title":"Unifying Multi-modal Hair Editing via Proxy Feature Blending","url":"https://doi.org/10.1109/tpami.2026.3656763","published":"2026-01-01","authors":["Tianyi Wei","Dongdong Chen","Wenbo Zhou","Jing Liao","Can Wang","Weiming Zhang","Gang Hua","Nenghai Yu"],"abstract":"Hair editing is a long-standing problem in computer vision that demands both fine-grained local control and intuitive user interactions across diverse modalities. Despite the remarkable progress of GANs and diffusion models, existing methods still lack a unified framework that simultaneously supports arbitrary interaction modes (e.g., text, sketch, mask, and reference image) while ensuring precise editing and faithful preservation of irrelevant attributes. In this work, we introduce a novel paradigm that reformulates hair editing as proxy-based hair transfer. Specifically, we leverage the dense and semantically disentangled latent space of StyleGAN for precise manipulation and exploit its feature space for disentangled attribute preservation, thereby decoupling the objectives of editing and preservation. 
Our framework unifies different modalities by converting editing conditions into dis...","companies":["Microsoft","Amazon"],"matched_orgs":["Microsoft","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2026.3656763","openalex_id":"https://openalex.org/W7125431163","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["Amazon (United States)","City University of Hong Kong","Microsoft (United States)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8508999943733215},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.8091999888420105},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6840000152587891},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.6593000292778015},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5645999908447266},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5159000158309937},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.44029998779296875},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.4235000014305115}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2601.09385","title":"SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing","url":"http://arxiv.org/abs/2601.09385","published":"2026-01-01","authors":["Ziyang Ma","Guanrou Yang","Wenxi Chen","Zhifu Gao","Yexing Du","Xiquan Li","Zhisheng Zhen","Haina Zhu","Jianheng Zhuo","Zheshu Song","Ruiyang 
Xu","Tiranrui Wang"],"abstract":"The recent surge in open-source Multimodal Large Language Models (MLLM) frameworks, such as LLaVA, provides a convenient kickoff for artificial intelligence developers and researchers. However, most of the MLLM frameworks take vision as the main input modality, and provide limited in-depth support for the modality of speech, audio, and music. This situation hinders the development of audio-language models, and forces researchers to spend a lot of effort on code writing and hyperparameter tuning. We present SLAM-LLM, an open-source deep learning framework designed to train customized MLLMs, focused on speech, language, audio, and music processing. SLAM-LLM provides a modular configuration of different encoders, projectors, LLMs, and parameter-efficient fine-tuning plugins. SLAM-LLM also includes detailed training and inference recipes for mainstream tasks, along with high-performance chec...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jstsp.2026.3653157","openalex_id":"https://openalex.org/W7123336686","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Alibaba Group (China)","Hong Kong University of Science and Technology","Nanyang Technological University","Peng Cheng Laboratory","Queen Mary University of London","Shanghai Jiao Tong University","The University of Texas at Austin","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8529000282287598},{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.8220999836921692},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.5171999931335449},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4887999892234802},{"id":"https://openalex.org/C184356942","display_name":"Best practice","score":0.4747999906539917},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4657000005245209},{"id":"https://openalex.org/C127220857","display_name":"Audio signal processing","score":0.4348999857902527},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.42730000615119934}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135217867","title":"CausLab: LLM-driven Multi-agent Bayesian Framework for Causal Discovery and Inference","url":"https://doi.org/10.2139/ssrn.6286418","published":"2026-01-01","authors":["Chen Wang","Shan Huang","Shichao Han","Yuyi Wang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.6286418","openalex_id":"https://openalex.org/W7135217867","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C158600405","display_name":"Causal inference","score":0.8112000226974487},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7019000053405762},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6531999707221985},{"id":"https://openalex.org/C107673813","display_name":"Bayesian probability","score":0.6215999722480774},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.5982999801635742},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5899999737739563},{"id":"https://openalex.org/C160234255","display_name":"Bayesian inference","score":0.5788999795913696},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.513700008392334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2405.14093","title":"A Survey on Vision–Language–Action Models for Embodied AI","url":"http://arxiv.org/abs/2405.14093","published":"2026-01-01","authors":["Yueen Ma","Zixing Song","Yuzheng Zhuang","Jianye Hao","Irwin King"],"abstract":"Embodied AI is widely recognized as a cornerstone of artificial general intelligence (AGI) because it involves controlling embodied agents to perform tasks in the physical world. Building on the success of large language models (LLMs) and vision-language models (VLMs), a new category of multimodal models-referred to as vision-language-action (VLA) models-has emerged to address language-conditioned robotic tasks in embodied AI by leveraging their distinct ability to generate actions. The recent proliferation of VLAs necessitates a comprehensive survey to capture the rapidly evolving landscape. To this end, we present the first survey on VLAs for embodied AI. This work provides a detailed taxonomy of VLAs, organized into three major lines of research. The first line focuses on individual components of VLAs. 
The second line is dedicated to developing VLA-based control policies adept at pred...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1109/tnnls.2025.3650584","openalex_id":"https://openalex.org/W4398796510","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)","University of Bristol"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.8956042528152466},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.6346052289009094},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5124750733375549},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.4584502875804901},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44196510314941406},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3473881483078003},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3385435938835144},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.053335726261138916}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W7127974238","title":"Orchestrating Well Analytics with Agentic Intelligence","url":"https://doi.org/10.3997/2214-4609.202639044","published":"2026-01-01","authors":["Y. Gubanov","T.B. Grant","D. Tishechkin"],"abstract":"Summary The energy industry continues to face fragmented data landscapes and complex analytical workflows that impede well performance insights and production optimization. 
While Retrieval Augmented Generation (RAG) approaches dominated early applications of large language models (LLMs) to subsurface data, industry focus has shifted toward Agentic AI architectures that offer greater autonomy in data processing and analytical workflows. These agent-based systems can orchestrate multiple tools, make autonomous decisions, and handle significantly more complex multi-step analytical tasks. This paper outlines a transformative approach that integrates three complementary technologies: Agentic AI architectures, Model Context Protocol (MCP) servers, and adaptive user experience (UX) design. This intelligent, self-orchestrating system seamlessly bridges disparate data sources across geographical....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3997/2214-4609.202639044","openalex_id":"https://openalex.org/W7127974238","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","agent"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6901000142097473},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.6334999799728394},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.604200005531311},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5049999952316284},{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.49399998784065247},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.44690001010894775},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.3797999918460846},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.361299991607666}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2506.02954","title":"Mutation-Guided Unit Test Generation with a Large Language Model","url":"https://arxiv.org/abs/2506.02954","published":"2026-01-01","authors":["Guancheng Wang","Qinghua Xu","Lionel Briand","Kui Liu"],"abstract":"Unit tests play a vital role in uncovering potential faults in software. While tools like EvoSuite focus on maximizing code coverage, recent advances in large language models (LLMs) have shifted attention toward LLM-based test generation. However, code coverage metrics—such as line and branch coverage—remain overly emphasized in reported research, despite being weak indicators of a test suite’s fault-detection capability. In contrast, mutation score offers a more reliable and stringent measure, as demonstrated in our findings where some test suites achieve 100% coverage but only 4% mutation score. Although a few studies consider mutation score, the effectiveness of LLMs in killing mutants remains underexplored. 
...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tse.2026.3682975","openalex_id":"https://openalex.org/W7153155872","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Huawei Technologies (China)","University of Limerick","University of Ottawa"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8357999920845032},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.5027999877929688},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.38519999384880066},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.3806999921798706},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3483999967575073},{"id":"https://openalex.org/C148027188","display_name":"Unit testing","score":0.3393999934196472},{"id":"https://openalex.org/C7166840","display_name":"System testing","score":0.3131999969482422},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3093999922275543}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7123365217","title":"HMS <sup>2</sup> Net: Heterogeneous Multimodal State Space Network via CLIP for Dynamic Scene Classification in Livestreaming","url":"https://doi.org/10.1109/tmm.2025.3632629","published":"2026-01-01","authors":["Jiafeng Li","Jing Zhang","Li Zhuo","Qi Tian"],"abstract":"Livestreaming platforms attract countless daily active users, making online content regulation imperative. 
The complex and diverse multimodal content elements in dynamic livestreaming scene pose a great challenge to video content understanding. Thanks to the success of contrastive language-image pre-training (CLIP) for dynamic scene classification, which is one of the basic tasks of video content understanding. We propose a heterogeneous multimodal state space network (HMS²Net) for dynamic scene classification in livestreaming via CLIP. (1) To fully and efficiently mine the dynamic scene elements in livestreaming, we design a heterogeneous teacher-student Transformer (HT-SFormer) with CLIP to extract multimodal features in an energy-efficient unified pipeline; (2) To cope with the possibl...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2025.3632629","openalex_id":"https://openalex.org/W7123365217","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","efficient"],"author_affiliations":["Beijing University of Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8751999735832214},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5627999901771545},{"id":"https://openalex.org/C83665646","display_name":"Feature vector","score":0.4699000120162964},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.4674000144004822},{"id":"https://openalex.org/C202269582","display_name":"Complementarity (molecular biology)","score":0.4551999866962433},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4372999966144562},{"id":"https://openalex.org/C147168706","display_name":"Recurrent neural 
network","score":0.38119998574256897},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.35659998655319214}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154576002","title":"From Image to Pixels: towards Fine-Grained Medical Vision-Language Models","url":"https://doi.org/10.1109/tpami.2026.3682684","published":"2026-01-01","authors":["Lingdong Shen","Xiaoshuang Huang","Fangxin Shang","Xi Zhang","Yehui Yang","Bin Fan","Shiming Xiang"],"abstract":"Multimodal large language models (MLLMs) offer immense potential for biomedical AI, yet current applications remain limited to coarse-grained image understanding and basic textual queries-falling short of the fine-grained reasoning required in clinical contexts. In this work, we present a comprehensive solution spanning data, model, and training innovations to advance pixel-level multimodal intelligence in biomedicine. First, we construct MeCoVQA, a new visual-language benchmark that spans eight medical imaging modalities and four core tasks, supporting both spatially-grounded reasoning and fine grained diagnostic comprehension. Building on this, we introduce MedPLIB, an end-to-end biomedical MLLM equipped with pixel level visual understanding. 
MedPLIB supports diverse multi modal tasks-including VQA, point- and region-based querying, grounding, and segmentation-through unified modeling....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2026.3682684","openalex_id":"https://openalex.org/W7154576002","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Baidu (China)","Beijing Academy of Artificial Intelligence","Beijing Dance Academy","China Agricultural University","Institute of Automation","Peking University","Shandong Institute of Automation","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7419999837875366},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7041000127792358},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6940000057220459},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5291000008583069},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.4242999851703644},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.42179998755455017},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.39329999685287476},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3693999946117401}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7130726943","title":"FLAME: Enhancing Functional Coverage in Processor Verification via Large Language 
Models","url":"https://doi.org/10.1109/tcad.2026.3666802","published":"2026-01-01","authors":["Xiaopeng Li","J. P. Chen","Ming Yan","Dong Wang","Xingyu Fan","Zhentao Tang","Shixiong Kai","Jianye Hao","Mingxuan Yuan","Zan Wang"],"abstract":"Processor functional verification plays a crucial role in ensuring the quality of processor designs. Traditional techniques like Constrained Random Verification (CRV) struggle to achieve high functional coverage due to the vast instruction space of processors. While LLM-based techniques show potential, merely instructing LLMs has notable limitations, especially when addressing functional points that require deep semantic understanding. To tackle these challenges, we propose a novel technique, FLAME, which specially designs and integrates Retrieval-Augmented Generation (RAG), Chain-of-Thought (CoT), and functional-coverage-guided feedback strategies. This technique establishes semantic mappings between functional points and instructions, enabling the iterative generation of valid and effective test cases. 
Evaluation on four widely-used open-source processor designs shows that FLAME outper...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcad.2026.3666802","openalex_id":"https://openalex.org/W7130726943","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Huawei Technologies (China)","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8119000196456909},{"id":"https://openalex.org/C62460635","display_name":"Functional verification","score":0.6208999752998352},{"id":"https://openalex.org/C64346931","display_name":"Functional design","score":0.5192999839782715},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.4535999894142151},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.3587000072002411},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.3547999858856201},{"id":"https://openalex.org/C62235348","display_name":"Functional requirement","score":0.3192000091075897},{"id":"https://openalex.org/C42383842","display_name":"Functional programming","score":0.31189998984336853}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139125689","title":"DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model With Self-Generated Cross-Modal Alignment","url":"https://doi.org/10.1109/taslpro.2026.3675792","published":"2026-01-01","authors":["K C Lu","Zhehuai Chen","Szu‐Wei Fu","Chao-Han Huck Yang","Sung-Feng Huang","C. C. 
Yang","Chia-Mu Yu","Chun‐Wei Chen","Weiyou Chen","Chien-yu Huang","Yi‐Cheng Lin","Yuxiang Lin"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2026.3675792","openalex_id":"https://openalex.org/W7139125689","cited_by_count":4,"quality_score":45,"matched_keywords":["language model"],"author_affiliations":["National Taipei University","National Taiwan University","Nvidia (United States)","University of Southern California"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8108000159263611},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.555899977684021},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4756999909877777},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4731999933719635},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3946000039577484},{"id":"https://openalex.org/C127220857","display_name":"Audio signal processing","score":0.3921000063419342},{"id":"https://openalex.org/C183322885","display_name":"Context model","score":0.37459999322891235},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.3555999994277954}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W7138839763","title":"Breaking the Observability Tax: Dynamic Resolution Anomaly Detection via Topology-Aware Active LLM Agents","url":"https://doi.org/10.1109/access.2026.3675074","published":"2026-01-01","authors":["Rahul Kapoor","Miray Kas"],"abstract":"The ‘‘observability tax’’—the escalating cost of telemetry in hyperscale data centers—forces a trade-off between 
expensive full-fidelity monitoring and low-cost sampling that misses critical ‘‘gray failures.’’ While traditional AIOps models act as passive post-hoc analyzers, the novelty of this work lies in reformulating observability as an active, hypothesis-driven process. To achieve this, we propose a Dynamic Resolution Architecture where a Topology-Aware LLM Agent dynamically controls the telemetry resolution of the underlying infrastructure. Our Topology-Aware Active LLM Agent employs ‘‘Sentinel Sampling,’’ monitoring 100% of nodes with low-cost essential metrics and only upgrading to high-resolution extended telemetry when semantically justified. Unlike rigid heuristics, the agent uses semantic reasoning to distinguish legitimate heavy workloads from pathological failures, even whe...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2026.3675074","openalex_id":"https://openalex.org/W7138839763","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C36299963","display_name":"Observability","score":0.8788999915122986},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7523999810218811},{"id":"https://openalex.org/C79403827","display_name":"Real-time computing","score":0.5649999976158142},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5378000140190125},{"id":"https://openalex.org/C111335779","display_name":"Reduction (mathematics)","score":0.5242000222206116},{"id":"https://openalex.org/C89377073","display_name":"Indirection","score":0.45890000462532043},{"id":"https://openalex.org/C60229501","display_name":"Global Positioning 
System","score":0.4169999957084656},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.41280001401901245}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135155447","title":"Boosting Tool-Calling Capabilities of Large Language Models via a Novel In-Context Learning Approach","url":"https://doi.org/10.1109/access.2026.3673174","published":"2026-01-01","authors":["Junhao Dong","Wei Zhu"],"abstract":"Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding, yet they struggle with tasks requiring real-time information retrieval, complex computations, or integration with external tools. Tool calling has emerged as a key paradigm to extend LLM functionalities, but existing methods often rely on fine-tuning or generic retrieval models that are computationally expensive or suboptimal for task-specific demonstration selection. In this paper, we propose DRanker, a lightweight and effective in-context learning framework that enhances tool calling through intelligent demonstration retrieval and reranking. DRanker employs a fine-tuned reranker model, optimized with a ranking-aware loss function, to select high-quality demonstrations from a candidate set retrieved via dense embeddings. 
Evaluated on the ToolACE and BFCL benchmarks, DRanker consiste...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2026.3673174","openalex_id":"https://openalex.org/W7135155447","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Amazon (United States)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.8855000138282776},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8256000280380249},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5676000118255615},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.40720000863075256},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.35260000824928284},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34040001034736633},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.33739998936653137},{"id":"https://openalex.org/C183322885","display_name":"Context model","score":0.32589998841285706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7140198059","title":"OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models","url":"https://doi.org/10.1109/tpami.2026.3677075","published":"2026-01-01","authors":["Wenwen Yu","Zhibo Yang","Jianqiang Wan","Sibo Song","Jun Tang","Wenqing Cheng","Yuliang Liu","Xiang Bai"],"abstract":"Visually-situated text parsing (VsTP) has recently seen notable advancements, driven by the growing demand 
for automated document understanding and the emergence of large language models capable of processing document-based questions. While various methods have been proposed to tackle the complexities of VsTP, existing solutions often rely on task-specific architectures and objectives for individual tasks. This leads to modal isolation and complex workflows due to the diversified targets and heterogeneous schemas. In this paper, we introduce OmniParser V2, a universal model that unifies VsTP typical tasks, including text spotting, key information extraction, table recognition, and layout analysis, into a unified framework. Central to our approach is the proposed Structured-Points-of-Thought (SPOT) prompting schemas, which improves model performance across diverse scenarios by leveraging....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2026.3677075","openalex_id":"https://openalex.org/W7140198059","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","East China University of Science and Technology","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8780999779701233},{"id":"https://openalex.org/C2780767217","display_name":"Generality","score":0.7813000082969666},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.7558000087738037},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6007999777793884},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5490999817848206},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5049999952316284},{"id":"https://openalex.org/C45235069","display_name":"Table 
(database)","score":0.47780001163482666},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.46050000190734863}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7123360386","title":"Multi-modal Cross-Attention Guided Network for Audio-Visual Quality Evaluation via Visual Saliency and Mel-spectrum Features","url":"https://doi.org/10.1109/tcsvt.2026.3652641","published":"2026-01-01","authors":["Junhao Lin","Yueli Cui","Chenli Fang","Binghong Pan","Chencheng Pan","Gangyi Jiang","Shiqing Zhang","Siwei Ma","Qi Tian"],"abstract":"The quality evaluation of audio-visual (A/V) content has become increasingly critical in modern multimedia communication systems. Traditional single-modality quality evaluation methods and existing dedicated A/V quality models often fail to accurately assess the quality of A/V signals. To address this challenge, we propose a novel multi-modal cross-attention guided network specifically designed for A/V quality evaluation. By leveraging visual saliency and Mel-spectrum features, our network aims to achieve accurate and comprehensive quality evaluation. Specifically, distorted video frames are first converted into saliency maps, from which perceptually salient patches are selectively extracted and fed into a Convolutional Neural Network (CNN) for intra-frame visual feature extraction. 
Concurrently, the distorted audio signal is transformed into a Mel-spectrum, and time-frequency patches ar...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2026.3652641","openalex_id":"https://openalex.org/W7123360386","cited_by_count":1,"quality_score":42,"matched_keywords":["long-term"],"author_affiliations":["Huawei Technologies (China)","Ningbo University","Peking University","Taizhou University","University of Southern California","Viterbo University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8737000226974487},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6787999868392944},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.6722000241279602},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.6309000253677368},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.6122999787330627},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5806000232696533},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.5267000198364258},{"id":"https://openalex.org/C102392041","display_name":"Sliding window protocol","score":0.4909000098705292}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7156289985","title":"X2Video: Adapting Diffusion Models for Multimodal Controllable Neural Video Rendering","url":"https://doi.org/10.1109/tvcg.2026.3687740","published":"2026-01-01","authors":["Zhitong Huang","Mohan Zhang","Renhan Wang","Rui Tang","Hao Zhu","Jing Liao"],"abstract":"We present X2Video, the first diffusion 
model for rendering photorealistic videos guided by a sequence of intrinsic channels including albedo, normal, roughness, metallicity, and irradiance, while supporting intuitive multi-modal controls with reference images and text prompts for both global and local regions. The intrinsic guidance allows accurate manipulation of color, material, geometry, and lighting, while reference images and text prompts provide intuitive adjustments in the absence of intrinsic information. To enable these functionalities, we extend the intrinsic-guided image generation model XRGB to video generation by employing a novel and efficient Hybrid Self-Attention, which ensures temporal consistency across video frames and also enhances fidelity to reference images. We further develop a Masked Cross-Attention to disentangle global and local text prompts, applying them eff...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2026.3687740","openalex_id":"https://openalex.org/W7156289985","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["City University of Hong Kong","Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8374999761581421},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6319000124931335},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6212000250816345},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.5996000170707703},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.5349000096321106},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5212000012397766},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.3962000012397766},{"id":"https://openalex.org/C172367668","display_name":"Data visualization","score":0.37709999084472656}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155418544","title":"WordCon: Word-level Typography Control in Visual Text Rendering","url":"https://doi.org/10.1109/tcsvt.2026.3686871","published":"2026-01-01","authors":["Wenda Shi","Yiren Song","Zihan Rao","D X Zhang","Jiaming Liu","Xingxing Zou"],"abstract":"Visual text rendering represents a fundamental capability of large-scale text-to-image (T2I) models, yet achieving precise word-level controllability remains a significant challenge in this domain. While existing approaches primarily focus on text content accuracy, they often fail to provide fine-grained control over typographic attributes at the word level. To address this limitation, we introduce a comprehensive solution comprising three key components: (1) a novel word-level controlled scene text dataset and benchmark, (2) the Text-Image Alignment (TIA) framework that leverages cross-modal correspondence between textual queries and local image regions through grounding models, and (3) WordCon, a hybrid parameter-efficient fine-tuning (PEFT) method that employs selective parameter reparameterization to enhance both computational efficiency and model portability. 
The proposed framework....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2026.3686871","openalex_id":"https://openalex.org/W7155418544","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Chongqing University","Hong Kong Polytechnic University","National University of Singapore","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7932000160217285},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.597000002861023},{"id":"https://openalex.org/C166422571","display_name":"Typography","score":0.5317999720573425},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4903999865055084},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.48159998655319214},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.47519999742507935},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.3986000120639801},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.3249000012874603}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7140164214","title":"VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation","url":"https://doi.org/10.1109/tro.2026.3677047","published":"2026-01-01","authors":["Bangguo Yu","Yuzhen Liu","Lei Han","Hamidreza Kasaei","Tingguang Li","Ming Cao"],"abstract":"Following human instructions to explore and search for a specified target in an unfamiliar environment is a crucial skill for mobile service 
robots. Most of the previous works on object goal navigation have typically focused on a single input modality as the target, which may lead to limited consideration of language descriptions containing detailed attributes and spatial relationships. To address this limitation, we propose VLN-Game, a novel zero-shot framework for visual target navigation that can process object names and descriptive language targets effectively. To be more precise, our approach constructs a 3D object-centric spatial map by integrating pre-trained visual-language features with a 3D reconstruction of the physical environment. Then, the framework identifies the most promising areas to explore in search of potential target candidates. A game-theoretic vision-language mode...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tro.2026.3677047","openalex_id":"https://openalex.org/W7140164214","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","University of Groningen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8104000091552734},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5952000021934509},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5922999978065491},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5812000036239624},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.46070000529289246},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4526999890804291},{"id":"https://openalex.org/C26990112","display_name":"Mobile robot 
navigation","score":0.45239999890327454},{"id":"https://openalex.org/C19966478","display_name":"Mobile robot","score":0.4429999887943268}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7140774180","title":"POSITION: Open World 3D Scene CAD Recomposition","url":"https://doi.org/10.1109/tip.2026.3676289","published":"2026-01-01","authors":["Rongkun Yang","Hongda Liu","Yijun Chen","Sheng Ao","Yongjian Zhang","Longguang Wang","Kaiwen Xue","Shunbo Zhou","Yulan Guo"],"abstract":"3D scene CAD recomposition aims to reconstruct a given scene by retrieving and assembling CAD models from a database, so as to accurately simulate the geometric properties and spatial arrangement of the original environment. Recent methods learn this task through training on limited scan-to-CAD annotation data, which hinders their generalization to diverse real-world scenes. In this paper, we propose POSITION, an open-world 3D scene CAD recomposition method to construct the 3D scene with CADs retrieved from an open-set database. POSITION is designed following a divide-and-conquer strategy. Firstly, we extract open-world multi-modal object representations from a captured 3D scene. Secondly, on top of the representations, we propose a coarse-to-fine retrieval method to retrieve CADs that are visually, geometrically and semantically match real objects. 
Thirdly, we present a physically plaus...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2026.3676289","openalex_id":"https://openalex.org/W7140774180","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Cloud Computing Center","Huawei Technologies (China)","Sun Yat-sen University","Xiamen University"],"concepts":[{"id":"https://openalex.org/C194789388","display_name":"CAD","score":0.8460000157356262},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7663999795913696},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6309999823570251},{"id":"https://openalex.org/C108882727","display_name":"Solid modeling","score":0.6294000148773193},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5799999833106995},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5192999839782715},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.44780001044273376},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.42179998755455017}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7133957177","title":"O37: Demystifying base large language model: Reproducibility and accuracy in ACMG/AMP variant classification","url":"https://doi.org/10.1016/j.gimo.2026.104325","published":"2026-01-01","authors":["Sam Nixa","Zack Haugan","Chen Wang","Sanjana Reddy","Angela Pickart","Emily Lauer","Zhiyv Niu"],"abstract":"Large language models (LLMs) are increasingly used by both consumers and clinicians to interpret medical and genetic test results via 
AI-powered chatbots and inference engines. Recent work underscores concern about their reproducibility, safety, and parameter sensitivity in these settings. Although online chatbots are not direct model calls but wrap foundation models via complex structure with multiple API calls and other machine learning models, the intrinsic nature of generative models exhibits random variation.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.gimo.2026.104325","openalex_id":"https://openalex.org/W7133957177","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Google (United States)","WinnMed"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6277999877929688},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5699999928474426},{"id":"https://openalex.org/C42058472","display_name":"Base (topology)","score":0.47429999709129333},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4406000077724457},{"id":"https://openalex.org/C9893847","display_name":"Reproducibility","score":0.4016000032424927},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3319000005722046},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.32829999923706055},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.3249000012874603}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131908978","title":"MambaGesture2: Co-Speech Gesture Generation via Hierarchical Fusion and Spatiotemporal 
Aggregation","url":"https://doi.org/10.1109/tmm.2026.3668541","published":"2026-01-01","authors":["Chencan Fu","Yabiao Wang","Haoyang He","Shuo Wang","Chengjie Wang","Ying Tai","Yong Liu","Jiangning Zhang"],"abstract":"Co-speech gesture generation plays a vital role in producing synchronized and natural human gestures, thereby enhancing the realism of avatars in virtual environments. Although diffusion models have shown strong generative capabilities, their combination with transformer-based architectures often incurs high computational costs due to the quadratic complexity of self-attention. Moreover, as a temporal sequence modeling task, existing methods frequently struggle to effectively capture multi-scale temporal dynamics inherent in speech and gesture signals. To address these challenges, we propose MambaGesture2, a novel framework that integrates a Mamba-based denoising network, Hierarchical U-Net Gesture Mamba (HUG-Mamba), with a multimodal feature fusion module, SEAD. HUG-Mamba combines the efficient state-space modeling of Mamba blocks with the hierarchical sampling of the U-Net architecture...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2026.3668541","openalex_id":"https://openalex.org/W7131908978","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Nanjing University","Suzhou University of Science and Technology","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.880299985408783},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.6696000099182129},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.541100025177002},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4878999888896942},{"id":"https://openalex.org/C159437735","display_name":"Gesture recognition","score":0.45989999175071716},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.45320001244544983},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.4399999976158142},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.43050000071525574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125589661","title":"MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems","url":"https://doi.org/10.1007/978-3-032-13048-8_11","published":"2026-01-01","authors":["Qingliang Meng","Pengju Ren","Li Tian","Changsong Dai","Huizhi Liang"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-032-13048-8_11","openalex_id":"https://openalex.org/W7125589661","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Newcastle University","Tektronix (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.786300003528595},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.766700029373169},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.7390000224113464},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.6764000058174133},{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.6208999752998352},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.46959999203681946},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46560001373291016},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.45190000534057617}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131420977","title":"LSFusion: Ladder-Side Attribute Composition for Multi-Aspect Controllable Text Generation","url":"https://doi.org/10.1109/tce.2026.3667936","published":"2026-01-01","authors":["Xiaosong Yuan","Chen Shen","Shaotian Yan","Renchu Guan","Ying Wang","Xuhang Chen","Kim-Fung Tsang"],"abstract":"Multi-aspect controllable text generation (CTG), opposite to single-aspect CTG, aims to produce texts that align with multiple attributes. Incorporating attribute information through supervised fine-tuning for pre-trained language models (PLMs) on a related task is effective for single-aspect CTG. However, extending PLMs to accommodate new attributes in such an approach presents challenges in scalability, demanding substantial computational resources and often undermining performance on previously established attribute control capabilities. To tackle these challenges, we introduce LSFusion, a parameterefficient ladder-side fusion method tailored for multi-aspect CTG. Concretely, LSFusion consists of two main stages: first, we train several ladder-side networks on corresponding data for distinct attributes with the parameters of backbone PLMs frozen. 
Secondly, we fuse these individual net...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tce.2026.3667936","openalex_id":"https://openalex.org/W7131420977","cited_by_count":0,"quality_score":41,"matched_keywords":["memory"],"author_affiliations":["Alibaba Group (China)","Huizhou University","Jilin Province Science and Technology Department","Jilin University","Shenzhen Institutes of Advanced Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7986999750137329},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7199000120162964},{"id":"https://openalex.org/C141353440","display_name":"Fuse (electrical)","score":0.6675999760627747},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5820000171661377},{"id":"https://openalex.org/C2985684807","display_name":"Text generation","score":0.48910000920295715},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46309998631477356},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.40610000491142273},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.3801000118255615}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139938267","title":"LLM-Enhanced Failure Localization in Microservices: Integrating Multi-Modal Data and Expert Interpretation","url":"https://doi.org/10.1109/tsc.2026.3676262","published":"2026-01-01","authors":["Zhenyu Zhong","Ruowei Fu","Minghua Ma","Shenglin Zhang","Yongqian Sun","Chetan Bansal","Dan Pei"],"abstract":"Failure localization in microservice environments is increasingly 
challenging. While large language models (LLMs) have shown promise in software engineering tasks, existing approaches struggle to effectively integrate multi-modal telemetry data (e.g., log, metric and trace) and provide interpretable results. This paper presents LocaleXpert, a novel failure localization system that combines specialized LLM-based agents with traditional AIOps methods to diagnose issues in microservice environments. LocaleXpert introduces three key innovations: (1) a modular pipeline that transforms metrics, logs, and traces into natural language descriptions that LLMs can effectively process, (2) specialized expert agents that analyze each data type and collaborate to identify root causes, and (3) an interpretation mechanism that produces clear, actionable explanations of its reasoning process. Evaluation....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tsc.2026.3676262","openalex_id":"https://openalex.org/W7139938267","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)","Nankai University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7731000185012817},{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.5116000175476074},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48730000853538513},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.45739999413490295},{"id":"https://openalex.org/C58328972","display_name":"Expert system","score":0.3352000117301941},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3328000009059906},{"id":"https://openalex.org/C67186912","display_name":"Data 
modeling","score":0.3165999948978424},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.2913999855518341}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7140136540","title":"LLM-Based Listwise Reranking Under the Effect of Positional Bias","url":"https://doi.org/10.1007/978-3-032-21289-4_9","published":"2026-01-01","authors":["Jingfen Qiao","Jin Huang","Xinyu Ma","Shuaiqiang Wang","Dawei Yin","Evangelos Kanoulas","Andrew Yates"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-032-21289-4_9","openalex_id":"https://openalex.org/W7140136540","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","Johns Hopkins University","University of Amsterdam","University of Cambridge"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8130000233650208},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5338000059127808},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3239000141620636},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3066999912261963},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.27709999680519104},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.2583000063896179},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.24459999799728394},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.24400000274181366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7118015194","title":"LLM-Based Keyphrase-Augmented Framework for Semantic Relevance Assessment in E-Commerce","url":"https://doi.org/10.1007/978-981-95-4158-4_27","published":"2026-01-01","authors":["Guoliang Zhang","Gang Zhao","Zhiyuan Zeng","Songyan Liu","Haoyue Zhang","Hui Zhao","Tianshu Wu","PengjieWang","Jiayi Xu","Bo Zheng","Baolin Liu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-4158-4_27","openalex_id":"https://openalex.org/W7118015194","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.907800018787384},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.8783000111579895},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5845999717712402},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5317000150680542},{"id":"https://openalex.org/C2779532271","display_name":"Relevance feedback","score":0.5005999803543091},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4900999963283539},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4828999936580658},{"id":"https://openalex.org/C2776207758","display_name":"Downstream 
(manufacturing)","score":0.43790000677108765}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7130536996","title":"Integrating AI and Large Language Models for Automated Data Quality Enhancement in Data Integration Systems","url":"https://doi.org/10.1109/ojcs.2026.3666345","published":"2026-01-01","authors":["Nidhin Karunakaran Ponon","Maria Anurag Reddy Basani"],"abstract":"This paper introduces an AI and LLM-based framework to automate data quality improvement in complex data systems. Traditional methods struggle with semantic inconsistencies and evolving schemas, degrading quality as data scales. The framework incorporates Real-Time Semantic Annotation (RTSA), adaptive ontology reinforcement, contextual similarity for duplicate detection, and continuous auto-healing. Explainability is ensured via SHAP-based alignment for transparency. Evaluated on the GOBY Benchmark dataset, it achieved 89.4% semantic annotation accuracy, outperforming the strongest baseline by 3%. The duplicate reduction rate was 64.5%, and the quality score averaged 83.2%, validating the auto-healing loop's effectiveness. It adapts to evolving data without retraining, confirmed by robust performance under semantic drift. 
The explainability analysis showed a low SHAP divergence of 0.11,....","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ojcs.2026.3666345","openalex_id":"https://openalex.org/W7130536996","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8008999824523926},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5236999988555908},{"id":"https://openalex.org/C24756922","display_name":"Data quality","score":0.5020999908447266},{"id":"https://openalex.org/C90312973","display_name":"Semantic data model","score":0.5016000270843506},{"id":"https://openalex.org/C25810664","display_name":"Ontology","score":0.49570000171661377},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4749999940395355},{"id":"https://openalex.org/C72634772","display_name":"Data integration","score":0.44940000772476196},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.4332999885082245}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154956392","title":"Human-AI Collaboration in Corporate Valuation: Experimental Evidence with a Valuation AI Agent","url":"https://doi.org/10.2139/ssrn.6485198","published":"2026-01-01","authors":["Huan Liu","Miao Liu","Zhizhe Liu","Danqing 
Mei"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.6485198","openalex_id":"https://openalex.org/W7154956392","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Boston College","Cheung Kong Graduate School of Business","Columbia University","Google (United States)"],"concepts":[{"id":"https://openalex.org/C186027771","display_name":"Valuation (finance)","score":0.8461999893188477},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5684999823570251},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.4189999997615814},{"id":"https://openalex.org/C122251271","display_name":"Ex-ante","score":0.3970000147819519},{"id":"https://openalex.org/C180198813","display_name":"Information system","score":0.34689998626708984},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.31459999084472656},{"id":"https://openalex.org/C2781027943","display_name":"Financial statement","score":0.2996000051498413},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.2980000078678131}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7159935193","title":"HD-Custom: Efficient Hierarchical Disentanglement for Coarse-to-Fine Concept Customization in Subject Video Generation","url":"https://doi.org/10.1109/tcsvt.2026.3688304","published":"2026-01-01","authors":["Yuanhang Li","Qi Mao","Xinyan Xiao","Libiao Jin","Siwei 
Ma"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2026.3688304","openalex_id":"https://openalex.org/W7159935193","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Communication University of China","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8016999959945679},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.5543000102043152},{"id":"https://openalex.org/C2777855551","display_name":"Subject (documents)","score":0.4851999878883362},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3504999876022339},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.257999986410141},{"id":"https://openalex.org/C3020028006","display_name":"Electronic mail","score":0.2578999996185303},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.25679999589920044},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.2526000142097473}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7127345171","title":"From Contrastive to Generative Alignment: Large-Scale Hierarchical Multi-Modal Pre-training for Hotspot Detection","url":"https://doi.org/10.1109/tcad.2026.3660602","published":"2026-01-01","authors":["Xinyun Zhang","Yuyang Chen","Yiwen Wu","Su Zheng","Ran Chen","Min Li","Hao Geng","Binwu Zhu","Mingxuan Yuan","Bei Yu"],"abstract":"The continuous reduction in semiconductor feature sizes has made hotspot detection (HSD) a critical yet challenging task in optimizing mask designs for manufacturability. 
While deep learning-based methods show promise, their reliance on large labeled datasets and training from scratch for each design makes them impractical for industrial use due to costly and time-intensive HSD data labeling. To address these challenges, we are the first to investigate self-supervised large-scale multi-modal pre-training for HSD, leveraging both layout images and GDSII text data to learn robust and transferable representations. To enable large-scale pre-training, we construct AugLayout-500K, a dataset of 500K paired layout images and GDSII files generated through an automatic augmentation pipeline. Building on this, we propose a novel self-supervised multi-modal pre-training framework that aligns paired....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcad.2026.3660602","openalex_id":"https://openalex.org/W7127345171","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)","ShanghaiTech University","Southeast University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8192999958992004},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5626999735832214},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4474000036716461},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.436599999666214},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.37459999322891235},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3601999878883362},{"id":"https://openalex.org/C52622490","display_name":"Feature 
extraction","score":0.3521000146865845},{"id":"https://openalex.org/C2778858076","display_name":"Decodes","score":0.3499999940395355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135245498","title":"FinSCRA: An LLM-Powered Multi-Chain Reasoning Framework for Interpretable Node Classification on Text-Attributed Graphs","url":"https://doi.org/10.2139/ssrn.6302180","published":"2026-01-01","authors":["Pengfei Pan","Lizi Chen","Qi He","Keyu Yuan","Han Wang","Wenchao Zhang"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.6302180","openalex_id":"https://openalex.org/W7135245498","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["California Southern University","Fordham University","Google (United States)","New York University","University of Southern California","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.684499979019165},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5651999711990356},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5311999917030334},{"id":"https://openalex.org/C58166","display_name":"Fuzzy logic","score":0.4702000021934509},{"id":"https://openalex.org/C25343380","display_name":"Relation (database)","score":0.4408999979496002},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.43479999899864197},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.421099990606308},{"id":"https://openalex.org/C112799922","display_name":"Choquet 
integral","score":0.38280001282691956}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7124152295","title":"EinsPT: Efficient Instance-Aware Pre-Training of Vision Foundation Models","url":"https://doi.org/10.1109/tip.2026.3652371","published":"2026-01-01","authors":["Zhaozhi Wang","Yunjie Tian","Lingxi Xie","Yaowei Wang","Qixiang Ye"],"abstract":"In this study, we introduce EinsPT, an efficient instance-aware pre-training paradigm designed to reduce the transfer gap between vision foundation models and downstream instance-level tasks. Unlike conventional image-level pre-training that relies solely on unlabeled images, EinsPT leverages both image reconstruction and instance annotations to learn representations that are spatially coherent and instance discriminative. To achieve this efficiently, we propose a proxy-foundation architecture that decouples high-resolution and low-resolution learning: the foundation model processes masked low-resolution images for global semantics, while a lightweight proxy model operates on complete high-resolution images to preserve fine-grained details. The two branches are jointly optimized through reconstruction and instance-level prediction losses on fused features. 
Extensive experiments demonstra...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2026.3652371","openalex_id":"https://openalex.org/W7124152295","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Peng Cheng Laboratory","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7296000123023987},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6105999946594238},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4514000117778778},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4507000148296356},{"id":"https://openalex.org/C17231256","display_name":"Completeness (order theory)","score":0.4480000138282776},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.4390999972820282},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.430400013923645},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.4171999990940094}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7158687668","title":"EEG-VLM: A Hierarchical Vision-Language Model With Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction","url":"https://doi.org/10.1109/jbhi.2026.3685511","published":"2026-01-01","authors":["Xihe Qiu","Gengchen Ma","Haoyu Wang","Chen Zhan","Xiaoyu Tan","Shuo Li"],"abstract":"Sleep stage classification based on electroencephalography (EEG) is fundamental for assessing 
sleep quality and diagnosing sleep-related disorders. However, most traditional machine learning methods rely heavily on prior knowledge and handcrafted features, while existing deep learning models still struggle to jointly capture fine-grained time-frequency patterns and achieve clinical interpretability. Recently, vision-language models (VLMs) have made significant progress in the medical domain, yet their performance remains constrained when applied to physiological waveform data, especially EEG signals, due to their limited visual understanding and insufficient reasoning capability. To address these challenges, we propose EEG-VLM, a hierarchical vision-language framework that integrates multi-level feature alignment with visually enhanced language-guided reasoning for interpretable EEG-base...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jbhi.2026.3685511","openalex_id":"https://openalex.org/W7158687668","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Case Western Reserve University","Shanghai University of Engineering Science","Tencent (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.9178000092506409},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7508999705314636},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7279000282287598},{"id":"https://openalex.org/C522805319","display_name":"Electroencephalography","score":0.6762999892234802},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5914000272750854},{"id":"https://openalex.org/C2910364982","display_name":"Sleep 
Stages","score":0.5656999945640564},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.555899977684021},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4853000044822693}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134957481","title":"Dynamic Learning and Optimal Advertising Mechanism for LLM Platforms","url":"https://doi.org/10.2139/ssrn.6212838","published":"2026-01-01","authors":["Saeed Alaei","Ali Makhdoumi","Azarakhsh Malekian"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.6212838","openalex_id":"https://openalex.org/W7134957481","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Fucape Business School","Google (United States)","Massachusetts Institute of Technology","University of Toronto"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6450999975204468},{"id":"https://openalex.org/C186027771","display_name":"Valuation (finance)","score":0.5273000001907349},{"id":"https://openalex.org/C195487862","display_name":"Revenue","score":0.5098999738693237},{"id":"https://openalex.org/C88626702","display_name":"Continuation","score":0.49619999527931213},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.430400013923645},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4212000072002411},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.41769999265670776},{"id":"https://openalex.org/C126255220","display_name":"Mathematical 
optimization","score":0.3898000121116638}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131068311","title":"DrivingGaussian++: Towards Realistic Reconstruction and Editable Simulation for Surrounding Dynamic Driving Scenes","url":"https://doi.org/10.1109/tpami.2026.3667072","published":"2026-01-01","authors":["Yajiao Xiong","Xiaoyu Zhou","Yongtao Wang","Deqing Sun","Ming-Hsuan Yang"],"abstract":"We present DrivingGaussian++, an efficient and effective framework for realistic reconstruction and controllable editing of surrounding dynamic autonomous driving scenes. DrivingGaussian++ models the static background with incremental 3D Gaussians and reconstructs moving objects with a composite dynamic Gaussian graph, ensuring accurate positions and occlusions. By integrating a LiDAR prior, it achieves detailed and consistent scene reconstruction, outperforming existing methods in dynamic scene reconstruction and photorealistic surround-view synthesis. DrivingGaussian++ supports training-free controllable editing for dynamic driving scenes, including texture modification, weather simulation, and object manipulation, leveraging multi-view images and depth priors. 
By integrating large language models (LLMs) and controllable editing, our method can automatically generate dynamic object mot...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2026.3667072","openalex_id":"https://openalex.org/W7131068311","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","King University","Peking University","University of California, Merced"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.79339998960495},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7802000045776367},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7139999866485596},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5613999962806702},{"id":"https://openalex.org/C37404715","display_name":"Dynamic programming","score":0.4480000138282776},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3774999976158142},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.3695000112056732},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.36579999327659607}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128619344","title":"DegDiT: Controllable Audio Generation With Dynamic Event Graph Guided Diffusion Transformer","url":"https://doi.org/10.1109/taslpro.2026.3663920","published":"2026-01-01","authors":["Yisu Liu","Chenxing Li","Wanqian Zhang","Wenfu Wang","Meng Yu","Ruibo Fu","Zheng Lin","Weiping Wang","Dong Yu"],"abstract":"Controllable 
text-to-audio generation aims to synthesize audio from textual descriptions while satisfying user-specified constraints, including event types, onset and offset timestamps, and temporal sequences. This enables precise control over both the content and temporal structure of the generated audio. Despite recent progress, existing methods still face inherent trade-offs among accurate temporal localization, open-vocabulary scalability, and practical efficiency. To address these challenges, we propose DegDiT, a novel dynamic event graph-guided diffusion transformer framework for open-vocabulary controllable audio generation. DegDiT encodes the events from the text description into structured dynamic graphs. The nodes represent distinct audio events, while edges encode the temporal r...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2026.3663920","openalex_id":"https://openalex.org/W7128619344","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Bellevue College","Chinese Academy of Sciences","Institute of Information Engineering","Shandong Institute of Automation","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8379999995231628},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5192000269889832},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.5069000124931335},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle 
physics)","score":0.4377000033855438},{"id":"https://openalex.org/C66746571","display_name":"ENCODE","score":0.43639999628067017},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4268999993801117},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4242999851703644},{"id":"https://openalex.org/C175291020","display_name":"Offset (computer science)","score":0.4034999907016754}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125766939","title":"DHPT: Dual-Modality Heterogeneous Prompt Tuning for Online Test-time Adaption in Vision-language Models","url":"https://doi.org/10.1109/tcsvt.2026.3657756","published":"2026-01-01","authors":["Guiqin Wang","Peng Zhao","Xiang Wang","Haoran Guo","Nan Qi","Shusen Yang","Qinghai Guo"],"abstract":"Test-Time Adaptation (TTA) has recently emerged as a promising research direction, enabling vision-language models (VLMs) to adapt to unlabeled test data in zero-shot settings. Among TTA approaches, test-time prompt tuning has shown great potential for enhancing the practical applicability of VLMs. However, existing methods typically either focus on adapting a single modality or apply uniform optimization to both modalities, without explicitly defining modality-specific optimization objectives. Such a one-size-fits-all strategy often results in suboptimal performance under test-time conditions. To address this limitation, we propose Dual-modality Heterogeneous Prompt Tuning (DHPT), a novel framework designed to simultaneously capture fine-grained textual semantics and alleviate domain shift noise in the visual modality. 
Specifically, we leverage a large language model to provide textual....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2026.3657756","openalex_id":"https://openalex.org/W7125766939","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Academic Degrees & Graduate Education","Huawei Technologies (China)","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8325999975204468},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7922999858856201},{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.6550999879837036},{"id":"https://openalex.org/C2776434776","display_name":"Domain adaptation","score":0.633899986743927},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5396999716758728},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5205000042915344},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.516700029373169},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4837999939918518}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131415337","title":"Correspondence Calibrating and Dynamic Consistency Learning for Noisy Cross-Modal Retrieval","url":"https://doi.org/10.1109/tmm.2026.3664974","published":"2026-01-01","authors":["Tao Yao","Yizhen Wu","Liang Zhang","Guorui Sheng","Yanfang Li","Qi Tian"],"abstract":"Cross-modal retrieval has drawn an increasing amount of attention due to its effective ability for searching 
semantic relative data points with different modalities. In spite of some progress obtained, such methods often require the data pair maintaining the correct cross-modal correspondence in the training process, which is impractical in real application. To tackle this issue, we propose a Correspondence Calibrating and Dynamic Consistency Learning Network (CCDCL), aiming at optimizing the correspondence of positive samples and deeply investigating the consistency of negative samples. Specifically, to effectively alleviate the false positive issues, co-teaching paradigm is introduced to optimize the correspondence of positive samples by calibrating their confidence scores. To address the false negative sample problem, we propose a Vision-Language Semantic Collaborative Dynamic Margin....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2026.3664974","openalex_id":"https://openalex.org/W7131415337","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Ludong University","Nanjing Normal University","Taiyuan University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8604999780654907},{"id":"https://openalex.org/C774472","display_name":"Margin (machine learning)","score":0.8012999892234802},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.7408000230789185},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6007999777793884},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5404000282287598},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4887999892234802},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.42080000042915344},{"id":"https://openalex.org/C198531522","display_name":"Sample (material)","score":0.40459999442100525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7159665524","title":"Annotating Spatial Multi-Omics Spot-Level Niche Types Using Bi-View Retrieval-Augmented Generation With SpotTypeLLM","url":"https://doi.org/10.1109/tcbbio.2026.3689143","published":"2026-01-01","authors":["Longyi Li","Liyan Dong","Bo Yu","Decheng Li","Hanbo Liu","Yuheng Zhu","Xiyuan Mei","Hao Zhang","Dong Xu"],"abstract":"Spatial multi-omics techniques generate extensive spot-level profiles without accompanying spot -type labels, forcing biologists into labor-intensive manual annotation. Although large language models (LLMs) promise automated annotation, they are poorly equipped to handle high-dimensional numeric inputs, struggle to convert complex spatial-omics structures into interpretable text, and lack the specialized biological knowledge needed to avoid hallucinations. Moreover, the intrinsic sparsity and heterogeneity of spatial omics data undermine robust feature extraction and accurate spot-level niche label assignment. To address these challenges, we propose SpotTypeLLM, a framework for annotating spatial multi-omics spot-level niche types using a Bi-view Retrieval-Augmented Generation (BiRAG) tailored for LLMs. 
Specifically, SpotTypeLLM encodes spatial multi-omics data through scLLM-based embedd...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcbbio.2026.3689143","openalex_id":"https://openalex.org/W7159665524","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Jilin University","University of Missouri"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.760699987411499},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5608000159263611},{"id":"https://openalex.org/C159620131","display_name":"Spatial analysis","score":0.5583000183105469},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.5388000011444092},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4966999888420105},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.48559999465942383},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.424699991941452},{"id":"https://openalex.org/C197115733","display_name":"Forcing (mathematics)","score":0.4153999984264374}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139080605","title":"Alleviating Contextual Misguidance: Response-Aware Prompt Compression for Long-Context Question Answering","url":"https://doi.org/10.1109/taslpro.2026.3675784","published":"2026-01-01","authors":["Haoyuan Wang","Zhen Wang","Wenmeng Zhou","Yang Deng"],"abstract":"The ability of Large Language Models (LLMs) to accurately process long contexts is crucial for many real-world applications. 
However, despite recent advancements in extending context windows, we find that contemporary LLMs still suffer from the phenomenon known as “Lost In The Middle”, where LLMs fail to accurately retrieve information located centrally within long-context prompts. Our analysis reveals that this failure is often caused by “contextual misguidance”, where query-relevant yet non-grounding segments distract the model. To mitigate this issue, we introduce Leave-One-Out ...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2026.3675784","openalex_id":"https://openalex.org/W7139080605","cited_by_count":0,"quality_score":41,"matched_keywords":["compression"],"author_affiliations":["Alibaba Group (China)","Singapore Management University","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7825000286102295},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6628000140190125},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5245000123977661},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.5138999819755554},{"id":"https://openalex.org/C180016635","display_name":"Compression (physics)","score":0.48980000615119934},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48030000925064087},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.47769999504089355},{"id":"https://openalex.org/C183322885","display_name":"Context model","score":0.41920000314712524}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7142127368","title":"AI Agent Traps","url":"https://doi.org/10.2139/ssrn.6372438","published":"2026-01-01","authors":["Matija Franklin","Nenad Tomašev","Julian Jacobs","Joel Z. Leibo","Simon Osindero"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.6372438","openalex_id":"https://openalex.org/W7142127368","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.9631999731063843},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6575999855995178},{"id":"https://openalex.org/C95713431","display_name":"Vulnerability (computing)","score":0.559499979019165},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.5327000021934509},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4823000133037567},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4684999883174896},{"id":"https://openalex.org/C48103436","display_name":"State (computer science)","score":0.44760000705718994},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.44429999589920044}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7130335306","title":"HoloQA: Full Reference Video Quality Assessor of Rendered Human Avatars in Virtual Reality","url":"https://doi.org/10.1109/tip.2026.3663930","published":"2026-01-01","authors":["Avinab Saha","Yu-Chih Chen","Christian Häne","Jean‐Charles Bazin","Ioannis Katsavounidis","Alexandre Chapiro","Alan C. Bovik"],"abstract":"We present HoloQA, a new state-of-the-art Full Reference Video Quality Assessment (VQA) model that was designed using principles of visual neuroscience, information theory, and self-supervised deep learning to accurately predict the quality of rendered digital human avatars in Virtual Reality (VR) and Augmented Reality (AR) systems. The growing adoption of VR/AR applications that aim to transmit digital human avatars over bandwidth-limited video networks has driven the need for VQA algorithms that better account for the kinds of distortions that reduce the quality of rendered and viewed avatars. As we will show, standard VQA models often fail to capture distortions unique to the rendering, transmission, and compression of videos containing human avatars. Towards solving this difficult problem, we adopt a multi-level Mixture-of-Experts approach. 
This involves computing distortion-aware pe...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2026.3663930","openalex_id":"https://openalex.org/W7130335306","cited_by_count":0,"quality_score":41,"matched_keywords":["compression"],"author_affiliations":["Meta (United States)","National Yang Ming Chiao Tung University","The University of Texas at Austin"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8600999712944031},{"id":"https://openalex.org/C2777365542","display_name":"Avatar","score":0.8111000061035156},{"id":"https://openalex.org/C194969405","display_name":"Virtual reality","score":0.5713000297546387},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5710999965667725},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.541100025177002},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.4514000117778778},{"id":"https://openalex.org/C153715457","display_name":"Augmented reality","score":0.44530001282691956},{"id":"https://openalex.org/C160086991","display_name":"Human visual system model","score":0.44020000100135803}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7118018739","title":"Large Language Models as Topological Structure Enhancers for Text-Attributed Graphs","url":"https://doi.org/10.1007/978-981-95-4158-4_7","published":"2026-01-01","authors":["Shengyin Sun","Yuxiang Ren","Jiehao Chen","Chen 
Ma"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-4158-4_7","openalex_id":"https://openalex.org/W7118018739","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7871000170707703},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.48910000920295715},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.4837000072002411},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.4587000012397766},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.44200000166893005},{"id":"https://openalex.org/C184720557","display_name":"Topology (electrical circuits)","score":0.4415000081062317},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3822999894618988},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.382099986076355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7135225187","title":"AIGC video detection based on the fusion of spatial-frequency-optical flow multimodal features","url":"https://doi.org/10.23919/jsee.2026.000049","published":"2026-01-01","authors":["Hong Sheng","Wang Xuanqi","Zhang Chang","Wang Jiacheng","Duan Pingxia","Wang Yuwei"],"abstract":"The rapid evolution of generative artificial intelligence (AI) (e.g., Sora, Hunyuan) makes it essential to develop effective detection strategies that can generalize across ever-evolving 
synthesis techniques. This study is motivated by the observation of a fundamental challenge in generative models: the inherent difficulty of maintaining cross-modal consistency between appearance and motion. To this end, we propose a multi-modal framework for AI generated content (AIGC) video forgery detection tasks, named cross-attention based video forgery detector (CrossAtt-VFD), based on joint multi-view analysis of content. Methodologically, we introduce a dual-branch architecture that simultaneously extracts spatial-frequency and optical-flow features. This approach enables the modeling of videos from complementary perceptual perspectives. The core of this process is a dedicated cross-attention mec...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.23919/jsee.2026.000049","openalex_id":"https://openalex.org/W7135225187","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beihang University","Chinese Academy of Sciences","Institute of Computing Technology","Nanchang University"],"concepts":[{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7731999754905701},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7466999888420105},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7057999968528748},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5658000111579895},{"id":"https://openalex.org/C38349280","display_name":"Flow (mathematics)","score":0.4336000084877014},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.39309999346733093},{"id":"https://openalex.org/C155542232","display_name":"Optical 
flow","score":0.3917999863624573},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.382099986076355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2508.06763","title":"SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding","url":"http://arxiv.org/abs/2508.06763","published":"2026-01-01","authors":["Zihao Sheng","Zilin Huang","Yi Qu","Jiancong Chen","Yuhao Luo","Yen‐Jung Chen","Yue Leng","Sikai Chen"],"abstract":"Multimodal Large Language Models (MLLMs) have achieved remarkable progress across a range of vision-language tasks and demonstrate strong potential for traffic accident understanding. However, existing MLLMs in this domain primarily focus on coarse-grained image-level or video-level comprehension and often struggle to handle fine-grained visual details or localized scene components, limiting their applicability in complex accident scenarios. To address these limitations, we propose SafePLUG, a novel framework that empowers MLLMs with both pixel-level understanding and temporal grounding for comprehensive traffic accident analysis. SafePLUG supports both arbitrary-shaped visual prompts for region-aware question answering and pixel-level segmentation based on language instructions, while also enabling the recognition of temporally anchored events in traffic accident scenarios. 
To advance t...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.23919/chain.2026.000005","openalex_id":"https://openalex.org/W4416176980","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)","Purdue University West Lafayette","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C2780289543","display_name":"Accident (philosophy)","score":0.5964000225067139},{"id":"https://openalex.org/C145804949","display_name":"Situation awareness","score":0.5910999774932861},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5741000175476074},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.5583999752998352},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.5047000050544739},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.47760000824928284},{"id":"https://openalex.org/C2776544517","display_name":"Unexpected events","score":0.4350000023841858},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.40220001339912415}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7148648169","title":"Open-Vocabulary SAM3D: Towards Training-free Open-Vocabulary 3D Scene Understanding","url":"https://doi.org/10.1109/tcsvt.2026.3680794","published":"2026-01-01","authors":["Hanchen Tai","Qingdong He","Yijie Qian","Xiaobin Hu","Xiangtai Li","Yong Liu","Jiangning Zhang"],"abstract":"Open-vocabulary 3D scene understanding presents a significant challenge in the field. 
Recent works have sought to transfer knowledge embedded in vision-language models from 2D to 3D domains. However, these approaches often require prior knowledge from specific 3D scene datasets, limiting their applicability in open-world scenarios. The Segment Anything Model (SAM) has demonstrated remarkable zero-shot segmentation capabilities, prompting us to investigate its potential for comprehending 3D scenes without training. In this paper, we introduce OV-SAM3D, a training-free method that contains a universal framework for understanding open-vocabulary 3D scenes. This framework is designed to perform understanding tasks for any 3D scene without requiring prior knowledge of the scene. Specifically, our method is composed of two key sub-modules: First, we initiate the process by generating superpoin...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2026.3680794","openalex_id":"https://openalex.org/W7148648169","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Peking University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6384000182151794},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5708000063896179},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5090000033378601},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.40939998626708984},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.3386000096797943},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.3257000148296356},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.31850001215934753},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.30869999527931213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7119128502","title":"DeepTrans: Deep Reasoning Translation via Reinforcement Learning","url":"https://doi.org/10.1162/tacl.a.65","published":"2026-01-01","authors":["Jiaan Wang","Fandong Meng","Jie Zhou"],"abstract":"Abstract Recently, deep reasoning LLMs (e.g., OpenAI o1 and DeepSeek-R1) have shown promising performance in various downstream tasks. Free translation is an important and interesting task in the multilingual world, which requires going beyond word-for-word translation. However, the task is still under-explored in deep reasoning LLMs. In this paper, we introduce DeepTrans, a deep reasoning translation model that learns free translation via reinforcement learning (RL). Specifically, we carefully build a reward model with pre-defined scoring criteria on both the translation results and the thought processes. The reward model teaches DeepTrans how to think and free-translate the given sentences during RL. Besides, our RL training does not need any labeled translations, avoiding the human-intensive annotation or resource-intensive data synthesis. 
Experimental results show the effectiveness o...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl.a.65","openalex_id":"https://openalex.org/W7119128502","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8476999998092651},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7423999905586243},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7156000137329102},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6973999738693237},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6349999904632568},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.5996999740600586},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.5177000164985657},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48410001397132874}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7147289030","title":"The Emerging Paradigm of Geospatial Foundation Models: From Pre-training to Agentic Reasoning","url":"https://doi.org/10.1007/978-3-032-18474-0_1","published":"2026-01-01","authors":["Shelley M 
Cazares"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-032-18474-0_1","openalex_id":"https://openalex.org/W7147289030","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8303999900817871},{"id":"https://openalex.org/C9770341","display_name":"Geospatial analysis","score":0.7907000184059143},{"id":"https://openalex.org/C9354725","display_name":"Operationalization","score":0.5464000105857849},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5303999781608582},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5151000022888184},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4722999930381775},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4611000120639801},{"id":"https://openalex.org/C2776434776","display_name":"Domain adaptation","score":0.43209999799728394}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135062635","title":"StarVid: Enhancing Semantic Alignment in Video Diffusion Models via Spatial and SynTactic Guided Attention Refocusing","url":"https://doi.org/10.1109/tmm.2026.3668668","published":"2026-01-01","authors":["Yuanhang Li","Qi Mao","Lan Chen","Zhen Fang","Lei Tian","Xinyan Xiao","Libiao Jin","Hua Wu"],"abstract":"Recent advances in text-to-video (T2V) generation with diffusion models have garnered significant attention. 
However, they typically perform well in scenes with a single object and motion, struggling in compositional scenarios with multiple objects and distinct motions to accurately reflect the semantic content of text prompts. To address these challenges, we propose StarVid, a plug-and-play, training-free method that improves semantic alignment between multiple subjects, their motions, and text prompts in T2V models. StarVid first employs large language models (LLMs) to perform two-stage motion trajectory planning based on the input prompt, providing object-level spatial priors that guide the generation process. These priors are then used to impose a spatial-aware loss that refines cross-attention (CA) maps, encouraging attention to be focused on distinct regions corresponding to diffe...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2026.3668668","openalex_id":"https://openalex.org/W7135062635","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Communication University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8795999884605408},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6126000285148621},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.42660000920295715},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4131999909877777},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.40459999442100525},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.36739999055862427},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer
science)","score":0.3537999987602234},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.33399999141693115}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7124930157","title":"SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild","url":"https://doi.org/10.1109/tcsvt.2026.3656228","published":"2026-01-01","authors":["Jiawei Liu","Yuanzhi Zhu","Feiyu Gao","Zhibo Yang","Peng Wang","Junyang Lin","Xinggang Wang","Wenyu Liu"],"abstract":"Generating visual text in natural scene images is a challenging task with many unsolved problems. Different from generating text on artificially designed images (such as posters, covers, and cartoons), existing methods for natural scene visual text generation still have significant deficiencies: methods based on rendering engines rely on manually crafted rules, which struggle to adapt to diverse backgrounds and leave obvious artificial traces, while their text layouts may be placed in unreasonable areas (e.g., sky or ground) and text content is semantically disconnected from the scene; diffusion model-based methods, on the other hand, face difficulties in generating small characters, depend on manually designed prompts to ensure reasonable layout and content, fail to generate text at precise locations, and cannot effectively control text attributes (e.g., font and color). 
In this paper,....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2026.3656228","openalex_id":"https://openalex.org/W7124930157","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8258000016212463},{"id":"https://openalex.org/C2985684807","display_name":"Text generation","score":0.5978000164031982},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.583899974822998},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.5770000219345093},{"id":"https://openalex.org/C151375590","display_name":"Noisy text analytics","score":0.5738000273704529},{"id":"https://openalex.org/C2777737414","display_name":"Font","score":0.5457000136375427},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5336999893188477},{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.4925999939441681}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7135079331","title":"SCSV: Spatial-Temporal Consistent Dynamic 3D Scene Generation From Sparse Views","url":"https://doi.org/10.1109/tip.2026.3671692","published":"2026-01-01","authors":["Junfeng Li","Junjie He","Wenjie Liu","Tianyu Huang","Shunbo Zhou","Jun Ma","Hesheng Wang","Haoang Li"],"abstract":"Generating dynamic scenes from images has gained increasing attention. 
Existing methods have two major limitations: 1) they can hardly handle sparse images which exhibit limited geometry constraints and insufficient motion; 2) they struggle to maintain spatial-temporal consistency when rendering multi-view videos. To address these limitations, we propose SCSV, a spatial-temporal consistent dynamic scene generation method from sparse views. Our method consists of two stages: scene reconstruction and scene expansion, both of which decouple background and foreground. In the scene reconstruction stage, we first interpolate a set of images between the input images based on a video generation model, followed by the optimization of the scene Gaussian from the interpolated and input images. To improve the spatial-temporal consistency of the reconstructed scene, we propose an uncertainty-aware Ga...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2026.3671692","openalex_id":"https://openalex.org/W7135079331","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Hong Kong University of Science and Technology","Huawei Technologies (China)","Shanghai Jiao Tong University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6499000191688538},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.635699987411499},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6302000284194946},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.38269999623298645},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.38040000200271606},{"id":"https://openalex.org/C141379421","display_name":"Iterative 
reconstruction","score":0.36239999532699585},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.36160001158714294},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.36090001463890076}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7130708373","title":"Revolutionizing Turn-by-Turn Navigation With Cloud-Edge Deep Learning","url":"https://doi.org/10.1109/tits.2026.3662685","published":"2026-01-01","authors":["Yiming Yang","Hao Fu","Fanxiang Zeng","Xikai Yang","Yue Liu","Ning Guo"],"abstract":"Turn-by-turn (TBT) navigation systems are integral to modern driving experiences, providing real-time audio instructions to guide drivers safely to destinations. However, existing audio instruction policy often relies on rule-based approaches that struggle to balance informational content with cognitive load, potentially leading to driver confusion or missed turns in complex environments. To overcome these difficulties, we first model the generation of navigation instructions as a multi-task learning problem by decomposing the audio content into combinations of modular elements. Then, we propose a novel deep learning framework that leverages the powerful spatiotemporal information processing capabilities of Transformers and the strong multi-task learning abilities of Mixture of Experts (MoE) to generate real-time, context-aware audio instructions for TBT driving navigation. 
A cloud-edge....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tits.2026.3662685","openalex_id":"https://openalex.org/W7130708373","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7221999764442444},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6764000058174133},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.65420001745224},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.5949000120162964},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4449999928474426},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4357999861240387},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.42080000042915344},{"id":"https://openalex.org/C127220857","display_name":"Audio signal processing","score":0.4099000096321106}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7126045983","title":"Real-Time Human and Generative AI Interaction: Network Challenges and Opportunities","url":"https://doi.org/10.1109/mnet.2026.3656136","published":"2026-01-01","authors":["Ruizhi Cheng","Guowu Xie","Bo Han"],"abstract":"The rapid rise of real-time, audio/video-based interactions between humans and generative AI (GenAI) introduces fundamental challenges for today’s network infrastructure. 
Our measurement study, combined with an in-depth analysis of existing deployments, reveals two key shortcomings in the current system design for human and GenAI interaction. First, they fall short of providing fast and robust network adaptation, resulting in a degraded user experience under dynamic and lossy network conditions. Second, existing platforms adopt inconsistent and ad hoc transport-layer strategies to reuse legacy architectures that are ill-suited for the unique traffic patterns of human and GenAI interactions. To address these challenges, we first propose leveraging the network edge as an intelligent relay between users and GenAI backends to perform key network adaptation tasks such as congestion control. T...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mnet.2026.3656136","openalex_id":"https://openalex.org/W7126045983","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["George Mason University","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8809999823570251},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6072999835014343},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.5677000284194946},{"id":"https://openalex.org/C193415008","display_name":"Network architecture","score":0.5527999997138977},{"id":"https://openalex.org/C162307627","display_name":"Enhanced Data Rates for GSM Evolution","score":0.4648999869823456},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.4551999866962433},{"id":"https://openalex.org/C138236772","display_name":"Edge 
device","score":0.42890000343322754},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.3946000039577484}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7127905411","title":"RGBT Tracking Based on Multimodal Spatio-Temporal Feature Interaction and Progressive Mamba Fusion","url":"https://doi.org/10.1007/978-981-95-5758-5_39","published":"2026-01-01","authors":["Zhiyuan Chang","Peng Yuan","Zining Song","He Li"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-5758-5_39","openalex_id":"https://openalex.org/W7127905411","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Robotics Research (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8823000192642212},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7164999842643738},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6699000000953674},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5842000246047974},{"id":"https://openalex.org/C82990744","display_name":"RGB color model","score":0.545799970626831},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.46810001134872437},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4578000009059906},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.4575999975204468}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154937310","title":"One-Shot Federated Learning with Pre-Trained Foundation Models via Prototype Learning","url":"https://doi.org/10.2139/ssrn.6614387","published":"2026-01-01","authors":["Yunlu Yan","Yawen Huang","Huafu Zhu","Yuexiang Li","Jinheng Xie","Yarui Xu","Jiaxing Shen","Weiming Wang","Ping Li","Xian Wu","Lei Zhu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.6614387","openalex_id":"https://openalex.org/W7154937310","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Agency for Science, Technology and Research","City University of Hong Kong","Hong Kong Polytechnic University","Institute of High Performance Computing","Lingnan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2992525071","display_name":"Federated learning","score":0.8199999928474426},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8137000203132629},{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.7621999979019165},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.49709999561309814},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.43650001287460327},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41760000586509705},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.40639999508857727},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.3928000032901764}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7131122742","title":"MelodyGLM: Multi-Task Pre-Training for Structured Symbolic Melody Generation","url":"https://doi.org/10.1109/taslpro.2026.3667433","published":"2026-01-01","authors":["Xinda Wu","Zhijie Huang","Kejun Zhang","Jiaxing Yu","Xu Tan","Tieyao Zhang","Youhan Li","Zheng Wang","Lingyun Sun"],"abstract":"Pre-trained language models in natural language processing (NLP) have substantially advanced music understanding and generation. However, traditional pre-training methods for symbolic melody generation are limited in their ability to capture multi-scale, multi-dimensional musical structures, primarily due to fundamental differences between textual and musical domains. In addition, the scarcity of large-scale symbolic melody datasets constrains further progress. In this paper, we introduce MelodyGLM, a novel multi-task pre-training framework tailored for structured symbolic melody generation. MelodyGLM enhances autoregressive blank infilling pre-training through melodic n-gram and long-span sampling strategies, which target loca...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2026.3667433","openalex_id":"https://openalex.org/W7131122742","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Georgia Institute of Technology","Tencent (China)","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C43803900","display_name":"Melody","score":0.9490000009536743},{"id":"https://openalex.org/C41008148","display_name":"Computer
science","score":0.7335000038146973},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.555400013923645},{"id":"https://openalex.org/C207609745","display_name":"Bootstrapping (finance)","score":0.5291000008583069},{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.5200999975204468},{"id":"https://openalex.org/C8112396","display_name":"MIDI","score":0.5080999732017517},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4823000133037567},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48010000586509705}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7121203414","title":"MME-VirtualWorld: Simplifying Multimodal Assessment via Programmable Synthetic Benchmarking","url":"https://doi.org/10.1007/978-981-95-5761-5_28","published":"2026-01-01","authors":["Zedong Liu","Li Chen","Shenao Chen","Xiang He","Aoqi Fu","Jiaxiang Liu","Shikun Feng"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-5761-5_28","openalex_id":"https://openalex.org/W7121203414","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Wuhan University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8988000154495239},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8256999850273132},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.8154000043869019},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5637000203132629},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.5289999842643738},{"id":"https://openalex.org/C205372480","display_name":"Image resolution","score":0.46000000834465027},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.45100000500679016},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.40310001373291016}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7156189649","title":"LVMark: Robust Watermark for Latent Video Diffusion Models","url":"https://doi.org/10.1109/tifs.2026.3688194","published":"2026-01-01","authors":["Youngdong Jang","MinHyuk Jang","Jaehyeok Lee","Feng Yang","Gyeongrok Oh","Jongheon Jeong","Sangpil Kim"],"abstract":"Rapid advancements in video diffusion models have enabled the creation of realistic videos, raising concerns about unauthorized use and driving the demand for techniques to protect model ownership. Existing watermarking methods suffer from two key limitations: they overlook temporal consistency due to conventional watermark decoders and degrade the visual quality of the generated videos. To address these issues, we introduce a robust watermarking method for latent video diffusion models named Latent Video Diffusion Watermarking (LVMark). We propose a novel watermark decoder tailored for generated videos by learning the consistency between adjacent frames. It ensures accurate message decoding, even under malicious attacks, by combining the low-frequency components of the three-dimensional wavelet domain with the color features of the video. 
Additionally, we train a latent decoder to maint...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tifs.2026.3688194","openalex_id":"https://openalex.org/W7156189649","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Korea University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8033999800682068},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6815000176429749},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6359000205993652},{"id":"https://openalex.org/C164112704","display_name":"Watermark","score":0.5094000101089478},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.46790000796318054},{"id":"https://openalex.org/C150817343","display_name":"Digital watermarking","score":0.4242999851703644},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.42149999737739563},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.3806999921798706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128086133","title":"LD-Seg: Training-Free Novel Instance Segmentation Based on LVLM-Driven Vision Foundation Models","url":"https://doi.org/10.1007/978-981-95-5758-5_38","published":"2026-01-01","authors":["Yingnan Guo","Yongliang Lin","Hanqing Yang","Ji Won Han","Yu 
Zhang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-5758-5_38","openalex_id":"https://openalex.org/W7128086133","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","State Key Laboratory of Industrial Control Technology","Suffolk University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8909000158309937},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.7912999987602234},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.7688999772071838},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7347000241279602},{"id":"https://openalex.org/C147037132","display_name":"Minimum bounding box","score":0.6611999869346619},{"id":"https://openalex.org/C63584917","display_name":"Bounding overwatch","score":0.6306999921798706},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.5845000147819519},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5629000067710876}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2507.23590","title":"Identifying Hearing Difficulty Moments in Conversational Audio","url":"http://arxiv.org/abs/2507.23590","published":"2026-01-01","authors":["J.B. Collins","Adrian Buzea","Chris Collier","Alejandro Ballesta Rosen","Julian Maclaren","Richard F. Lyon","Simon Carlile","Simon Carlile"],"abstract":"in everyday conversation. 
Identifying Hearing Difficulty Moments has particular significance in the field of hearing assistive technology where timely interventions are key for real-time hearing assistance. In this article, we propose and compare machine learning solutions for the temporal detection of segments containing Hearing Difficulty Moments in conversational audio. We show that audio language models, through their multimodal reasoning capabilities, can achieve state-of-the-art results for this task, significantly outperforming a simple automatic speech recognition (ASR) hotword heuristic and a more conventional fine-tuning approach with Wav2Vec, an audio-only input architecture that is state-of-the-art for ASR.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1177/23312165261446379","openalex_id":"https://openalex.org/W4414921243","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Macquarie University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.692300021648407},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6363999843597412},{"id":"https://openalex.org/C2780801066","display_name":"Hearing aid","score":0.4796000123023987},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.47870001196861267},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.4480000138282776},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4142000079154968},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4138000011444092},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4108999967575073}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7124180657","title":"Generative AI for Synthetic Data Creation in Privacy-Preserving Data Analytics","url":"https://doi.org/10.1007/978-981-96-8104-4_9","published":"2026-01-01","authors":["Rahul Vadisetty","Anand Polamarasetti","Mahesh Kumar Goyal","Deven Yadav"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-8104-4_9","openalex_id":"https://openalex.org/W7124180657","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Andhra University","Google (United States)","Kern Medical Center","Wayne State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7444000244140625},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6568999886512756},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.644599974155426},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.6086000204086304},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5855000019073486},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.5498999953269958},{"id":"https://openalex.org/C23130292","display_name":"Differential privacy","score":0.5461000204086304},{"id":"https://openalex.org/C175801342","display_name":"Data analysis","score":0.5277000069618225}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7118004732","title":"Generative 
AI for Predictive Modeling of Upcoming Cancers: Identifying Risk Factors and Emerging Disease Patterns","url":"https://doi.org/10.1007/978-981-96-8126-6_26","published":"2026-01-01","authors":["Rahul Vadisetty","Anand Polamarasetti","Mahesh Kumar Goyal","Deven Yadav"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-8126-6_26","openalex_id":"https://openalex.org/W7118004732","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Andhra University","Google (United States)","Kern Medical Center","Wayne State University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7235000133514404},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6499000191688538},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6481999754905701},{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.6021999716758728},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5842999815940857},{"id":"https://openalex.org/C2778136018","display_name":"Predictive power","score":0.5810999870300293},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.5285999774932861},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5281999707221985}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2602.14771","title":"GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive 
Architecture","url":"http://arxiv.org/abs/2602.14771","published":"2026-01-01","authors":["Shih-Fang Chen","Jun-Cheng Chen","I-Hong Jhuo","Yen-Yu Lin"],"abstract":"The human visual system tracks objects by integrating current observations with previously observed information, adapting to target and scene changes, and reasoning about occlusion at fine granularity. In contrast, recent generic object trackers are often optimized for training targets, which limits robustness and generalization in unseen scenarios, and their occlusion reasoning remains coarse, lacking detailed modeling of occlusion patterns. To address these limitations in generalization and occlusion perception, we propose GOT-JEPA, a model-predictive pretraining framework that extends JEPA from predicting image features to predicting tracking models. Given identical historical information, a teacher predictor generates pseudo-tracking models from a clean current frame, and a student predictor learns to predict the same pseudo-tracking models from a corrupted version of the current fra...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2026.3675005","openalex_id":"https://openalex.org/W7138189118","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","National Yang Ming Chiao Tung University","Research Center for Information Technology Innovation, Academia Sinica"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7235000133514404},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6704999804496765},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.6621000170707703},{"id":"https://openalex.org/C2776268601","display_name":"Occlusion","score":0.5424000024795532},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.49320000410079956},{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.43630000948905945},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.4083999991416931},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.40310001373291016}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7152006584","title":"Exploring Cross-Lingual Latent Transplantation: Mutual Opportunities and Open Challenges","url":"https://doi.org/10.1109/taslpro.2026.3682051","published":"2026-01-01","authors":["Yangfan Ye","Xiaocheng Feng","Xiachong Feng","Libo Qin","Yichong Huang","Liang Huang","Weitao Ma","Qichen Hong","Zhirui Zhang","Yunfei Lu","Xiaohui Yan","Duyu Tang"],"abstract":"Current large language models (LLMs) often exhibit imbalances in multilingual capabilities and cultural adaptability, largely attributed to their English-centric pretraining data. In this paper, we introduce and investigate cross-lingual latent transplantation (<inline-formula xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><tex-math notation=\"LaTeX\">$\\mathcal {X}$</tex-math></inline-formula>Transplant), a probing framework which aims to further exploit the model's internalized multilingual knowledge during inference and examine its effects on the multilingual capability and cultural adaptability of LLMs. 
<inline-formula xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><tex-math notation=\"LaTeX\">$\\mathcal {X}$</tex-math></inline-formula>Transplant framework enables models to harness the complementary stren...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2026.3682051","openalex_id":"https://openalex.org/W7152006584","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Central South University","Harbin Institute of Technology","Huawei Technologies (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.567799985408783},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.3555999994277954},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.32179999351501465},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.30410000681877136},{"id":"https://openalex.org/C183322885","display_name":"Context model","score":0.2793000042438507},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.27300000190734863},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.26739999651908875},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.26460000872612}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7133295236","title":"Enhancing Weakly Supervised Multimodal Video Anomaly Detection through Text Guidance","url":"https://doi.org/10.1109/tmm.2026.3668927","published":"2026-01-01","authors":["Shengyang Sun","Jiashen Hua","Junyi Feng","Xiaojin 
Gong"],"abstract":"In recent years, weakly supervised multimodal video anomaly detection, which leverages RGB, optical flow, and audio modalities, has garnered significant attention from researchers, emerging as a vital subfield within video anomaly detection. However, previous studies have inadequately explored the role of text modality in this domain. With the proliferation of large-scale text-annotated video datasets and the advent of video captioning models, obtaining text descriptions from videos has become increasingly feasible. Text modality, carrying explicit semantic information, can more accurately characterize events within videos and identify anomalies, thereby enhancing the model's detection capabilities and reducing false alarms. However, text feature extraction challenges anomaly detection. Pre-trained large language models often struggle to effectively capture the nuances associated with an...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2026.3668927","openalex_id":"https://openalex.org/W7133295236","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8274999856948853},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.590399980545044},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.590399980545044},{"id":"https://openalex.org/C152124472","display_name":"Redundancy (engineering)","score":0.5824000239372253},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.5432999730110168},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer 
interaction)","score":0.5127999782562256},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5055000185966492},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.4982999861240387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7124199540","title":"Enhancing Multimodal Learning in Generative AI: Integrating Visual Context with LLMs for Improved Understanding in Cloud Environments","url":"https://doi.org/10.1007/978-981-96-8104-4_4","published":"2026-01-01","authors":["Rahul Vadisetty","Anand Polamarasetti","Mahesh Kumar Goyal","Harshini Gadam"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-8104-4_4","openalex_id":"https://openalex.org/W7124199540","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Andhra University","Google (United States)","Staples (United States)","Wayne State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7391999959945679},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.6725999712944031},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5309000015258789},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4945000112056732},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4763999879360199},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.40959998965263367},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.40779998898506165},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.3815000057220459}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128605167","title":"Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation","url":"https://doi.org/10.1109/tpami.2026.3663759","published":"2026-01-01","authors":["Tianyi Wei","Dongdong Chen","Yifan Zhou","Xingang Pan"],"abstract":"Representing the cutting-edge technique of text-to-image models, the latest Multimodal Diffusion Transformer (MMDiT) largely mitigates many generation issues existing in previous models. However, we discover that it still suffers from subject neglect or mixing when the input text prompt contains multiple subjects of similar semantics or appearance. We identify three possible ambiguities within the MMDiT architecture that cause this problem: Inter-block Ambiguity, Text Encoder Ambiguity, and Semantic Ambiguity. To address these issues, we propose to repair the ambiguous latent on-the-fly by test-time optimization at early denoising steps. In detail, we design three loss functions: Block Alignment Loss, Text Encoder Alignment Loss, and Overlap Loss, each tailored to mitigate these ambiguities. 
Despite significant improvements, we observe that semantic ambiguity persists when generating mul...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2026.3663759","openalex_id":"https://openalex.org/W7128605167","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Nanyang Technological University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7930999994277954},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5860999822616577},{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.5590000152587891},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5458999872207642},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5095999836921692},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.41780000925064087},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4142000079154968},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.38589999079704285}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7123486136","title":"Enhancing Compositional Reasoning in Multimodal Large Language Models","url":"https://doi.org/10.1007/978-981-95-5679-3_6","published":"2026-01-01","authors":["Shun Qian","Bingquan Liu","Chengjie Sun","Zhen Xu","Baoxun 
Wang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-5679-3_6","openalex_id":"https://openalex.org/W7123486136","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8756999969482422},{"id":"https://openalex.org/C121375916","display_name":"Principle of compositionality","score":0.8607000112533569},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.6381000280380249},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5397999882698059},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.41769999265670776},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4000000059604645},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.396699994802475},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.3781000077724457}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139117483","title":"Chatbots vs AI Agents: The Shift toward Multi-Step Automation in Sales Support Workflows","url":"https://doi.org/10.63282/3050-9416.ijaibdcms-v7i1p127","published":"2026-01-01","authors":["Adish Rai"],"abstract":"Sales organizations rely on automation tools to handle repetitive customer interactions, lead qualification, and support tasks. 
Traditional chatbots execute predefined scripts effectively for straightforward queries but struggle with complex, multi-step workflows requiring contextual understanding and adaptive decision-making. AI agents represent an evolution beyond chatbot capabilities, employing large language models and reasoning frameworks to autonomously manage multi-stage processes including deal coaching, opportunity analysis, and cross-system task orchestration. This paper examines fundamental differences between chatbots and AI agents in sales support contexts, describing when each approach delivers value and how organizations can transition from scripted automation to agentic workflows. The analysis covers implementation considerations, use case selection criteria, and practica...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63282/3050-9416.ijaibdcms-v7i1p127","openalex_id":"https://openalex.org/W7139117483","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.8705999851226807},{"id":"https://openalex.org/C2779041454","display_name":"Chatbot","score":0.864300012588501},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7056999802589417},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.7037000060081482},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.566100001335144},{"id":"https://openalex.org/C61423126","display_name":"Scripting language","score":0.5403000116348267},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.4489000141620636},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic 
algorithm)","score":0.4020000100135803}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128427183","title":"AnchorCrafter: Animate Cyber-Anchors Selling Your Products via Human-Object Interacting Video Generation","url":"https://doi.org/10.1109/tvcg.2026.3662720","published":"2026-01-01","authors":["Ziyi Xu","Ziyao Huang","Juan Cao","Yong Zhang","Xiaodong Cun","Qing Shuai","Yuchen Wang","Linchao Bao","Fan Tang"],"abstract":"The generation of anchor-style product promotion videos presents promising opportunities in e-commerce, advertising, and consumer engagement. Despite advancements in pose-guided human video generation, creating product promotion videos remains challenging. In addressing this challenge, we identify the integration of human-object interactions (HOI) into pose-guided human video generation as a core issue. To this end, we introduce AnchorCrafter, a novel diffusion-based system designed to generate 2D videos featuring a target human and a customized object, achieving high visual fidelity and controllable interactions. 
Specifically, we propose two key innovations: the HOI-appearance perception, which enhances object appearance recognition from arbitrary multi-view perspectives and disentangles object and human appearance, and the HOI-motion injection, which enables complex human-object intera...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2026.3662720","openalex_id":"https://openalex.org/W7128427183","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Great Bay University","Institute of Computing Technology","Meizu (China)","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8817999958992004},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6021999716758728},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6000000238418579},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.58160001039505},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.559499979019165},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.5281000137329102},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.5102999806404114},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5085999965667725}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7124144554","title":"Analyzing and Defending Against Adversarial Attacks on Generative AI in the Cloud 
(Vulnerabilities)","url":"https://doi.org/10.1007/978-981-96-8104-4_5","published":"2026-01-01","authors":["Rahul Vadisetty","Anand Polamarasetti","Mahesh Kumar Goyal","Manikanta Rajendra Kumar Kakarala"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-8104-4_5","openalex_id":"https://openalex.org/W7124144554","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Andhra University","Google (United States)","University of Central Missouri","Wayne State University"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.885200023651123},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7264999747276306},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6895999908447266},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.6100999712944031},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6026999950408936},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5740000009536743},{"id":"https://openalex.org/C2779585090","display_name":"Resilience (materials science)","score":0.489300012588501},{"id":"https://openalex.org/C140547941","display_name":"Threat model","score":0.36320000886917114}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131286449","title":"A Measurement Report Data-Driven Framework for Localized Statistical Channel Modeling","url":"https://doi.org/10.1109/tmc.2026.3667749","published":"2026-01-01","authors":["Xinyu Qin","Qi Yan","S. 
Zhang","Bingsheng Peng","Ye Xue","Tsung‐Hui Chang"],"abstract":"Localized statistical channel modeling (LSCM), a key enabler for digital twin networks, traditionally relies on costly and spatially limited drive test data to estimate the channel angular power spectrum (APS) from reference signal received power measurements. This paper proposes a measurement report (MR) data-driven LSCM framework (MR-LSCM) to leverage low-cost and ubiquitous MR data. However, integrating MR data presents critical challenges: the prevalent lack of location labels required for LSCM, and the mismatch between uniform geographic grids in LSCM and spatially non-uniform MR data in complex propagation environments. To address these issues, our MR-LSCM framework introduces two specialized modules. First, a semi-supervised hypergraph neural network is proposed for MR localization, which exploits multimodal information to achieve robust performance even with scarce labels. Second...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmc.2026.3667749","openalex_id":"https://openalex.org/W7131286449","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Huawei Technologies (China)","Shenzhen Research Institute of Big Data","Shenzhen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8208000063896179},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.678600013256073},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5224999785423279},{"id":"https://openalex.org/C153083717","display_name":"Leverage 
(statistics)","score":0.5217999815940857},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.5184000134468079},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.510200023651123},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4812000095844269},{"id":"https://openalex.org/C187691185","display_name":"Grid","score":0.48100000619888306}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:e36b0e63d7067ebc","title":"Proteina-Complexa: Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute","url":"https://research.nvidia.com/publication/2026-01_proteina-complexa-scaling-atomistic-protein-binder-design-generative","published":"2026-01","authors":["Kieran Didi","Zuobai Zhang","Guoqing Zhou","Danny Reidenbach","Zhonglin Cao","Sooyoung Cha","Tomas Geffner","Christian Dallago","Jian Tang","Michael M. Bronstein","Martin Steinegger","Emine Kucukbenli"],"abstract":"Official NVIDIA Research publication. 
ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2026&page=0"}},{"id":"official:41b1cdb2fee038d9","title":"Using brand knowledge bases and LLM agents to enhance e-commerce retailers' catalog quality","url":"https://www.amazon.science/publications/using-brand-knowledge-bases-and-llm-agents-to-enhance-e-commerce-retailers-catalog-quality","published":"2026","authors":["Hayreddin Ceker","Gang Luo","Kee Kiat Koo","Prashant Mathur","Wencong You","Atharva Amdekar","Rob Barton","Navaneet KL","Vidit Bansal","Karim Bouyarmane"],"abstract":"For e-commerce retailers, high-quality product catalogs are vital to customer experience. Yet, despite lots of data cleaning efforts, catalog quality, especially in large catalogs, remains suboptimal. This paper shows how to use unstructured brand knowledge base data as a reference and a large language model agent to automatically enhance an e-commerce retailer's catalog quality. 
Unlike prior methods that Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3773966.3784969","openalex_id":"https://openalex.org/W7129071952","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","language model","agent"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:a8d3e528b37c4f6f","title":"The subtle art of defection: Understanding uncooperative behaviors in LLM based multi-agent systems","url":"https://www.amazon.science/publications/the-subtle-art-of-defection-understanding-uncooperative-behaviors-in-llm-based-multi-agent-systems","published":"2026","authors":["Devang Kulshreshtha","Wanyu Du","Raghav Jain","Srikanth Doss","Hang Su","Sandesh Swamy","Yanjun (Jane) Qi"],"abstract":"This paper introduces a novel framework for simulating and analyzing how uncooperative behaviors can destabilize or collapse LLM-based multi-agent systems. 
Our framework includes two key components: (1) a game theory-based taxonomy of uncooperative agent behaviors, addressing a notable gap in the existing literature; and (2) a structured, multistage simulation pipeline that dynamically generates and refines Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:44366c6f7ae4332a","title":"Scaling laws meet model architecture: Toward inference-efficient LLMs","url":"https://www.amazon.science/publications/scaling-laws-meet-model-architecture-toward-inference-efficient-llms","published":"2026","authors":["Song Bian","Tao Yu","Shivaram Venkataraman","Youngsuk Park"],"abstract":"Scaling the number of parameters and the size of training data has proven to be an effective strategy for improving large language model (LLM) performance. Yet, as these models grow increasingly powerful and widely deployed, the cost of inference has become a pressing concern. Despite its importance, the tradeoff between model accuracy and inference efficiency remains underexplored. 
In this work, we examine Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","language model","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:6b5c4c0d792405bd","title":"Personalized autocompletion of interactions with LLM-based chatbots","url":"https://www.amazon.science/publications/personalized-autocompletion-of-interactions-with-llm-based-chatbots","published":"2026","authors":["Shani Goren","Oren Kalinsky","Tomer Stav","Nachshon Cohen","Yuri Rapoport","Yaron Fairstein","Ram Yazdi","Alex Libov","Guy Kushilevitz"],"abstract":"Composing messages in chatbot interactions is often time-consuming, making autocompletion an appealing way to reduce user effort. Different users have different preferences and therefore different expectations from autocompletion solutions. We study how personalization can improve the autocompletion process, evaluating four schemes defined along two axes: generation vs. ranking, and prior messages vs. 
external Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","personalized","personalization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:6eaa81b8f25fc8da","title":"MAPRO: Recasting multi-agent prompt optimization as maximum a posteriori inference","url":"https://www.amazon.science/publications/mapro-recasting-multi-agent-prompt-optimization-as-maximum-a-posteriori-inference","published":"2026","authors":["Zheyuan Zhang","Lin Ge","Hongjiang Li","Weicheng Zhu","Chuxu Zhang","Yanfang Ye"],"abstract":"Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, and LLM-based agents further extend these abilities to various practical workflows. 
While recent progress shows that multi-agent systems (MAS) can outperform single agents by coordinating specialized roles, designing effective MAS remains difficult due to prompt sensitivity and the compounded instability MAS creates Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:60bd388562372f4d","title":"Hearing between the lines: Unlocking the reasoning power of LLMs for speech evaluation","url":"https://www.amazon.science/publications/hearing-between-the-lines-unlocking-the-reasoning-power-of-llms-for-speech-evaluation","published":"2026","authors":["Arjun Chandra","Kevin Miller","Venkatesh Ravichandran","Costas Papayiannis","Venkatesh Saligrama"],"abstract":"Large Language Model (LLM) judges exhibit strong reasoning capabilities but are limited to textual content. This leaves current automatic Speech-to-Speech (S2S) evaluation methods reliant on opaque and expensive Audio Language Models (ALMs). 
In this work, we propose TRACE (Textual Reasoning over Audio Cues for Evaluation), a novel framework that enables LLM judges to reason over audio cues to achieve cost-efficient Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","language model","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:f9d883c80d29355c","title":"Finny: A multi-agent system for structured decision-making with LLMs","url":"https://www.amazon.science/publications/finny-a-multi-agent-system-for-structured-decision-making-with-llms","published":"2026","authors":["Harshitha Ravindra","Utkarsh Bajaj","Madhur Mehta"],"abstract":"Finny is a multi-agent system that demonstrates how large language models can perform structured decision-making by applying domain-specific rules to multiple related scenarios. Leveraging foundation models with Retrieval-Augmented Generation (RAG), the system applies Standard Operating Procedures (SOPs) for intelligent forecast refinement at scale. 
Finny employs a two-stage architecture: a knowledge base Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","retrieval","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:538ad9721195ef62","title":"Exploring fine-tuning for in-context retrieval and efficient KV-caching in long-context language models","url":"https://www.amazon.science/publications/exploring-fine-tuning-for-in-context-retrieval-and-efficient-kv-caching-in-long-context-language-models","published":"2026","authors":["Francesco Molfese","Momchil Hardalov","Rexhina Blloshmi","Bill Byrne","Adrià de Gispert"],"abstract":"With context windows of millions of tokens, Long-Context Language Models (LCLMs) can encode entire document collections, offering a strong alternative to conventional retrieval augmented generation (RAG). However, it remains unclear whether fine-tuning strategies can improve long-context performance and translate to greater robustness under KV-cache compression techniques. 
In this work, we investigate which Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/5t3s-9922","openalex_id":"https://openalex.org/W7155432380","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","retrieval","efficient","compression"],"author_affiliations":["Amazon","Amazon (United States)","Sapienza University of Rome"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:1fa451f8329ad351","title":"Domain-specific LLM adaptation: Bridging personalization and efficiency through synthetic data and optimization","url":"https://www.amazon.science/publications/domain-specific-llm-adaptation-bridging-personalization-and-efficiency-through-synthetic-data-and-optimization","published":"2026","authors":["Iman Abbasnejad","Brett Tully","Wei Zhou","Tomal Deb","Sheldon Liu","Xuefeng Liu","Warren Wei"],"abstract":"Large Language Models (LLMs) have demonstrated exceptional capabilities but face two critical deployment challenges: high computational costs and scarcity of personalized domain training data. We address these dual challenges through a comprehensive framework that combines synthetic data generation with inference optimization techniques. 
Our approach employs LLMs for zero-shot and few-shot synthetic dataset Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","personalized","personalization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:44272125eea5e0ee","title":"Confidence-calibrated small-large language model collaboration for cost-efficient reasoning","url":"https://www.amazon.science/publications/confidence-calibrated-small-large-language-model-collaboration-for-cost-efficient-reasoning","published":"2026","authors":["Chuang Zhang","Zizhen Zhu","Yihao Wei","Bing Tian","Junyi Liu","Henan Wang","Xavier Wang","Yaxiao Liu"],"abstract":"Large language models (LLMs) demonstrate superior reasoning capabilities compared to small language models (SLMs), but incur substantially higher costs. We propose COllaborative REAsoner (COREA), a system that cascades an SLM with an LLM to achieve a balance between accuracy and cost in complex reasoning tasks. 
COREA first attempts to answer questions using the SLM, which outputs both an answer and a verbalized Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","language model","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:b58ddbdbc95c0de7","title":"R-WOM: Retrieval-augmented world model for computer-use agents","url":"https://www.amazon.science/publications/r-wom-retrieval-augmented-world-model-for-computer-use-agents","published":"2026","authors":["Kai Mei","Jiang Guo","Shuaichen Chang","Marvin Dong","Dongkyu Lee","Xing Niu","Jiarong Jiang"],"abstract":"Large Language Models (LLMs) can serve as world models to enhance agent decision-making in digital environments by simulating future states and predicting action outcomes, potentially eliminating costly trial-and-error exploration. 
However, this capability is fundamentally limited by LLMs' tendency to hallucination and their reliance on static training knowledge, which could lead to compounding errors that Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Search and information retrieval","retrieval","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:553acf6910037495","title":"MEAV: Model editing with alignment vectors for inference time LLM alignment in single and multidomain preference spectrum","url":"https://www.amazon.science/publications/meav-model-editing-with-alignment-vectors-for-inference-time-llm-alignment-in-single-and-multidomain-preference-spectrum","published":"2026","authors":["Sadat Shahriar","Zheng Qi","Nikolaos Pappas","Srikanth Doss","Kishaloy Halder","Monica Sunkara","Manuel Mager","Yassine Benajiba"],"abstract":"Aligning Large Language Models (LLM) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full re-training when a change is needed and inference-time ones typically require access to the reward model at each inference step. 
We introduce MEAV, an inference-time Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:e1cef5bcd1bb8c31","title":"Knowledge distillation for large language models through residual learning","url":"https://www.amazon.science/publications/knowledge-distillation-for-large-language-models-through-residual-learning","published":"2026","authors":["Thinh On","Hengzhi Pei","Leonard Lausen","George Karypis"],"abstract":"Knowledge distillation has become a crucial technique to transfer the capacities of large language models (LLMs) to smaller, more efficient models for practical deployment. 
While recent work exploits rich information from intermediate states of the teacher model for more effective knowledge transfer, imperfect knowledge from the teacher can also mislead student learning, restricting the student’s generalization Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","efficient","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:d58cbd80adae7d1b","title":"Keyword search is all you need: Achieving RAG-level performance without vector databases using agentic tool use","url":"https://www.amazon.science/publications/keyword-search-is-all-you-need-achieving-rag-level-performance-without-vector-databases-using-agentic-tool-use","published":"2026","authors":["Shreyas Subramanian","Wale Akinfaderin","Yanyan Zhang","Ishan Singh","Chris Pecora","Mani Khanuja","Sandeep Singh","Maira Ladeira Tanke"],"abstract":"While Retrieval-Augmented Generation (RAG) has proven effective for generating accurate, context-based responses based on existing knowledge bases, it presents several challenges including retrieval quality dependencies, integration complexity and cost. Recent advances in agentic-RAG and tool-augmented LLM architectures have introduced alternative approaches to information retrieval and processing. 
We question Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:b93732b3958ce675","title":"Hierarchical tokenization of multimodal music data for generative music retrieval","url":"https://www.amazon.science/publications/hierarchical-tokenization-of-multimodal-music-data-for-generative-music-retrieval","published":"2026","authors":["Wo Jae Lee","Rifat Joyee","Zhonghao Luo","Grace Kochavi","Sudev Mukherjee","Emanuele Coviello"],"abstract":"Recent advances in generative retrieval allow large language models (LLMs) to recommend items by generating their identifiers token by token. This requires each item to be represented by a compact, semantically meaningful sequence of tokens that an LLM can understand. 
We introduce a method to generate multimodal music token (3MToken) that transforms rich metadata from a music database—including audio, credits Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp55912.2026.11461319","openalex_id":"https://openalex.org/W7155055397","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","retrieval"],"author_affiliations":["Amazon","Amazon (United States)","San Francisco Conservatory of Music"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"baidu-ernie:official:709be558c66373de","title":"ERNIE 5.0 Technical Report","url":"https://arxiv.org/abs/2602.04705","published":"2026","authors":["Wang","H"],"abstract":"Official ERNIE/Baidu publication page entry.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["ERNIE","Baidu","technical report"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ERNIE publication page https://ernie.baidu.com/blog/publication/"}},{"id":"official:899b2dbde90e1a82","title":"Accelerating personalization signal learning via synthetic data","url":"https://www.amazon.science/publications/accelerating-personalization-signal-learning-via-synthetic-data","published":"2026","authors":["Daraksha Parveen","Doug Kang","Anwitha Paruchuri","Deep Kayal","Pavan Mallapragada"],"abstract":"Personalized experiences in multimodal assistants 
rely on accurate user understanding, yet large-scale training for personalization remains limited by privacy constraints and data sparsity. We introduce a framework for generating Comprehensive Synthetic Personas (CSPs) and personalized synthetic training data through taxonomy-guided knowledge enrichment, in-context learning, and Chain-of-Thought (CoT) knowledge Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-032-21321-1_17","openalex_id":"https://openalex.org/W7140135506","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","personalized","personalization"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:03bc3d8b8a193add","title":"A modular LLM framework for explainable price outlier detection","url":"https://www.amazon.science/publications/a-modular-llm-framework-for-explainable-price-outlier-detection","published":"2026","authors":["Shadi Sartipi","John Wu","Sina Ghotbi","Nikhita Vedula","Shervin Malmasi"],"abstract":"Detecting product price outliers is important for retail and e-commerce stores as erroneous or unexpectedly high prices adversely affect competitiveness, revenue, and consumer trust. Classical techniques offer simple thresholds while ignoring the rich semantic relationships among product attributes. 
We propose an agentic Large Language Model (LLM) framework that treats outlier price flagging as a reasoning Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:ea3afff9fcf55a6a","title":"When LLMs get significantly worse: A statistical approach to detect model degradations","url":"https://www.amazon.science/publications/when-llms-get-significantly-worse-a-statistical-approach-to-detect-model-degradations","published":"2026","authors":["Jonas Kübler","Kailash Budhathoki","Matthaeus Kleindessner","Xiong Zhou","Junming Yin","Ashish Khetan","George Karypis"],"abstract":"Minimizing the inference cost and latency of foundation models has become a crucial area of research. Optimization approaches include theoretically lossless methods and others without accuracy guarantees like quantization. In all of these cases it is crucial to ensure that the model quality has not degraded. 
However, even at temperature zero, model generations are not necessarily robust even to theoretically Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","quantization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:bf1959da77e771b1","title":"Vision-guided iterative refinement for frontend code generation","url":"https://www.amazon.science/publications/vision-guided-iterative-refinement-for-frontend-code-generation","published":"2026","authors":["Hannah Sansford","Derek Law","Wei Liu","Abhishek Tripathi","Niresh Agarwal","Gerrit van den Burg"],"abstract":"Code generation with large language models often relies on multi-stage human-in-the-loop refinement, which is effective but very costly — particularly in domains such as frontend web development where the solution quality depends on rendered visual output. 
We present a fully automated critic-in-the-loop framework in which a vision-language model serves as a visual critic that provides structured feedback Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:ca2e10816473f5d6","title":"ViG-LLM: Enhancing visual grounding capabilities in closed-box LLMs for document information extraction without OCR dependencies","url":"https://www.amazon.science/publications/vig-llm-enhancing-visual-grounding-capabilities-in-closed-box-llms-for-document-information-extraction-without-ocr-dependencies","published":"2026","authors":["Sudhanshu Bhoi"],"abstract":"Large Language Models (LLMs) have shown remarkable capabilities in document processing, but their inability to provide visual grounding without OCR dependencies poses significant challenges in business-critical applications. Current solutions either require model fine-tuning or rely on external OCR services, introducing additional costs, latency, and limitations in handling derived information. 
This paper Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:fb17f2afd87017b8","title":"VERAFI: Verified agentic financial intelligence through neurosymbolic policy generation","url":"https://www.amazon.science/publications/verafi-verified-agentic-financial-intelligence-through-neurosymbolic-policy-generation","published":"2026","authors":["Wale Akinfaderin","Shreyas Subramanian"],"abstract":"Financial AI systems suffer from a critical blind spot: while Retrieval-Augmented Generation (RAG) excels at finding relevant documents, language models still generate calculation errors and regulatory violations during reasoning, even with perfect retrieval. 
This paper introduces VERAFI (Verified Agentic Financial Intelligence), an agentic framework with neurosymbolic policy generation for verified financial Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:388dbf9e5d7b5eaf","title":"Turn-PPO: Turn-level advantage estimation with PPO for improved multi-turn RL in agentic LLMs","url":"https://www.amazon.science/publications/turn-ppo-turn-level-advantage-estimation-with-ppo-for-improved-multi-turn-rl-in-agentic-llms","published":"2026","authors":["Junbo Li","Peng Zhou","Rui Meng","Meet Vadera","Lihong Li","Laurence (Yang) Li"],"abstract":"Reinforcement learning (RL) has re-emerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the widely used Group Relative Policy Optimization (GRPO) algorithm to multi-turn tasks exposes notable limitations, particularly in scenarios requiring long-horizon reasoning. 
To address these challenges, we investigate more stable and effective advantage Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:3189b1802ae0ecf8","title":"Test-time efficient pretrained model portfolios for time series forecasting","url":"https://www.amazon.science/publications/test-time-efficient-pretrained-model-portfolios-for-time-series-forecasting","published":"2026","authors":["Mert Kayaalp","Caner Turkmen","Oleksandr Shchur","Pedro Mercado","Abdul Fatir Ansari","Michael Bohlke-Schneider","Yuyang (Bernie) Wang"],"abstract":"Is bigger always better for time series foundation models? With the question in mind, we explore an alternative to training a single, large monolithic model: building a portfolio of smaller, pretrained forecasting models. By applying ensembling or model selection over these portfolios, we achieve competitive performance on large-scale benchmarks using much fewer parameters. 
We explore strategies for designing Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:ccf6e5396e263fd8","title":"Small language models for efficient agentic tool calling: Outperforming large models with targeted fine-tuning","url":"https://www.amazon.science/publications/small-language-models-for-efficient-agentic-tool-calling-outperforming-large-models-with-targeted-fine-tuning","published":"2026","authors":["Polaris Jhandi","Owais Kazi","Shreyas Subramanian","Neel Sendas"],"abstract":"As organizations scale adoption of generative AI, model cost optimization and operational efficiency have emerged as critical factors determining sustainability and accessibility. While Large Language Models (LLMs) demonstrate impressive capabilities across diverse tasks, their extensive computational requirements make them cost-prohibitive for routine enterprise use. 
This limitation motivates the exploration Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:50f551985c2dffe0","title":"Self-refining vision language model for robotic failure detection and reasoning","url":"https://www.amazon.science/publications/self-refining-vision-language-model-for-robotic-failure-detection-and-reasoning","published":"2026","authors":["Carl Qi","Xiaojie Wang","Silong Yong","Stephen Sheng","Huitan Mao","Sriram Srinivasan","Mani Nambi","Amy Zhang","Yesh Dattatreya"],"abstract":"Reasoning about failures is crucial for building reliable and trustworthy robotic systems. Prior approaches either treat failure reasoning as a closed-set classification problem or assume access to ample human annotations. Failures in the real world are typically subtle, combinatorial, and difficult to enumerate, whereas rich reasoning labels are expensive to acquire. 
We address this problem by introducing Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Automated reasoning","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:f1b6ece4c46c01f6","title":"Self-aligned reward: Towards effective and efficient reasoners","url":"https://www.amazon.science/publications/self-aligned-reward-towards-effective-and-efficient-reasoners","published":"2026","authors":["Peixuan Han","Adit Krishnan","Gerald Friedland","Jiaxuan You","Chris (Luyang) Kong"],"abstract":"Reinforcement learning with verifiable rewards has significantly advanced reasoning with large language models (LLMs) in domains such as mathematics and logic. However, verifiable signals provide only coarse-grained or binary correctness feedback. This limitation results in inefficiencies like overly verbose or repetitive reasoning. 
Existing length-based solutions (e.g., length penalty) compromise accuracy Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:78ed328b9b98711c","title":"ReflectiveRAG: Rethinking adaptivity in retrieval-augmented generation","url":"https://www.amazon.science/publications/reflectiverag-rethinking-adaptivity-in-retrieval-augmented-generation","published":"2026","authors":["Akshay Verma","Swapnil Gupta","Siddharth Pillai","Prateek Sircar","Deepak Gupta"],"abstract":"Retrieval-Augmented Generation (RAG) systems degrade sharply under extreme noise, where irrelevant or redundant passages dominate. 
Current methods (fixed top-k retrieval, cross-encoder reranking, or policy-based iteration) depend on static heuristics or costly reinforcement learning, failing to assess evidence sufficiency, detect subtle mismatches, or reduce redundancy, leading to hallucinations and poor grounding Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Automated reasoning","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:05c7df2761ef479a","title":"Personality-driven AI agents: Operationalizing OCEAN traits for human-AI collaboration in the coding domain","url":"https://www.amazon.science/publications/personality-driven-ai-agents-operationalizing-ocean-traits-for-human-ai-collaboration-in-the-coding-domain","published":"2026","authors":["Akanksha Garg","Ishaani M","Ray DeLaPena"],"abstract":"As AI agents become collaborative partners in complex tasks, understanding how agent personality affects human-AI interaction becomes critical. While recent work explores personality customization in language models, little is known about how personality affects AI coding agents. 
We conducted the first exploratory study investigating: if OCEAN (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772363.3798372","openalex_id":"https://openalex.org/W7153860869","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","agent"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:9cd6775c3ef043a0","title":"PRECISE: Reducing the bias of LLM evaluations using prediction-powered ranking estimation","url":"https://www.amazon.science/publications/precise-reducing-the-bias-of-llm-evaluations-using-prediction-powered-ranking-estimation","published":"2026","authors":["Abhishek Divekar","Anirban Majumder"],"abstract":"Evaluating the quality of search systems traditionally requires a significant number of human relevance annotations. In recent times, several systems have explored the usage of Large Language Models (LLMs) as automated judges for this task while their inherent biases prevent direct use for metric estimation. 
We present a statistical framework extending Prediction-Powered Inference (PPI) (Angelopoulos, Duchi Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:dae52a8f353c468c","title":"Neural codec language model for controllable timbre transfer in music synthesis","url":"https://www.amazon.science/publications/neural-codec-language-model-for-controllable-timbre-transfer-in-music-synthesis","published":"2026","authors":["Sheldon Liu","Tianyu Liu","Deepak Dalakoti","Adithya Suresh","Yueying Teng","Xuefeng Liu","Atanu Roy","Randeep Bhatia","Daniel Hatadi","Prabhjeet Ghuman"],"abstract":"Neural codec language models have revolutionized speech synthesis but face significant challenges when adapted to music generation, particularly in achieving precise timbre control while preserving melodic content. 
We introduce Neural Code Language Model for Controllable Timbre Transfer (NCLMCTT), a novel architecture that enables zero-shot instrument cloning through direct audio conditioning without explicit Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:a035249e999fd72f","title":"MuonBP: Faster Muon via block-periodic orthogonalization","url":"https://www.amazon.science/publications/muonbp-faster-muon-via-block-periodic-orthogonalization","published":"2026","authors":["Ahmed Khaled","Kaan Ozkara","Tao Yu","Mingyi Hong","Youngsuk Park"],"abstract":"Gradient orthogonalization is a simple strategy that shows great utility in speeding up gradient descent. The Muon optimizer (Jordan et al., 2024b) combines gradient orthogonalization with first-order momentum and achieves significant improvement in data efficiency over Adam/AdamW (Loshchilov & Hutter, 2019a) for language model training. 
However, when using model parallelism, gradient orthogonalization Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:05cfce44349f74e8","title":"Mitigating hallucinations in LLMs for international trade: Introducing the TradeGov evaluation dataset and TradeGuard hallucination mitigation framework for trade Q&A","url":"https://www.amazon.science/publications/mitigating-hallucinations-in-llms-for-international-trade-introducing-the-tradegov-evaluation-dataset-and-tradeguard-hallucination-mitigation-framework-for-trade-q-a","published":"2026","authors":["Kriti Mahajan"],"abstract":"Given the constant flux in the world of geopolitics, staying up to date and compliant with international trade issues is challenging. But exploring if LLMs can aid this task is a frontier hitherto unexplored in the LLM evaluation literature - primarily due to the lack of a dataset for benchmarking the capabilities of LLMs on questions regarding international trade subjects. 
To address this gap, we introduce Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:0f411c9b7950c532","title":"MIRAGE: Metadata-guided image retrieval and answer generation for e-commerce troubleshooting","url":"https://www.amazon.science/publications/mirage-metadata-guided-image-retrieval-and-answer-generation-for-e-commerce-troubleshooting","published":"2026","authors":["Rishav Sahay","Lavanya Tekumalla","Anoop S V K K Saladi"],"abstract":"Existing multimodal systems typically associate text and available images based on embedding similarity or simple co-location, but such approaches often fail to ensure that the linked image accurately depicts the specific product or component mentioned in a troubleshooting instruction. 
We introduce MIRAGE, a metadata-first paradigm that treats structured metadata, (not raw pixels), as a first-class modality Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:eb3893037a78d22d","title":"MEDAL: Multi-modal meta-space distillation and algnment for visual compatibility learning","url":"https://www.amazon.science/publications/medal-multi-modal-meta-space-distillation-and-algnment-for-visual-compatibility-learning","published":"2026","authors":["Dween Rabius Sanny","Vinay Kumar Verma","Prateek Sircar","Deepak Gupta"],"abstract":"Visual compatibility recommendation systems aim to surface compatible items (e.g. pants, shoes) that harmonise with a user-selected product (e.g., shirt). 
Existing methods struggle in three key aspects: they rely on global CNN representations that overlook fine-grained local cues critical for visual pairing; they force all categories into a single latent space, ignoring the fact that compatibility rules Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:d9bf8098c6e76256","title":"Learning compact video representations for efficient long-form video understanding in large multimodal models","url":"https://www.amazon.science/publications/learning-compact-video-representations-for-efficient-long-form-video-understanding-in-large-multimodal-models","published":"2026","authors":["Yuxiao Chen","Jue Wang","Zhikang Zhang","Jingru Yi","Xu Zhang","Yang Zou","Zhaowei Cai","Steve Yuan","Xinyu (Arthur) Li","Hao Yang","Davide Modolo"],"abstract":"With recent advancements in video backbone architectures, combined with the remarkable achievements of large language models (LLMs), the analysis of long-form videos spanning tens of minutes has become both feasible and increasingly prevalent. However, the inherently redundant nature of video sequences poses significant challenges for contemporary state-of-the-art models. 
These challenges stem from two Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:3887e7cf403bcd36","title":"LLMEvalRec: An agentic framework for simulating users to evaluate news recommendation systems","url":"https://www.amazon.science/publications/llmevalrec-an-agentic-framework-for-simulating-users-to-evaluate-news-recommendation-systems","published":"2026","authors":["Yao Ma","Samuel Louvan","Abhishek Tripathi","Wei Liu","Murat Sensoy"],"abstract":"Evaluating news recommendation systems (NRS) presents unique challenges due to their dynamic and interactive nature coupled with evolving user interests. In the early stages of development, when user bases and historical data are scarce, it is difficult to conduct meaningful offline and online evaluations. 
This cold-start evaluation challenge hinders data-driven decision-making for product development and Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","news"],"author_affiliations":["Amazon","Amazon (United Kingdom)","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:9596cc1a86d566c5","title":"Iterative reranking as a compute-scaling method for LLM-based rankers","url":"https://www.amazon.science/publications/iterative-reranking-as-a-compute-scaling-method-for-llm-based-rankers","published":"2026","authors":["Tamara Czinczoll","Dong Liu","Filippo Betello"],"abstract":"E-commerce search faces challenges such as sparse data and poor generalization from issues like multi-attribute resolution, multihop reasoning, and implicit intent. We propose iterative reranking as a compute-scaling strategy for LLM-based rankers, repeatedly applying listwise rankers to refine results by exploiting LLM non-determinism. 
Evaluated on three open datasets with three open-source LLMs, the method Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Automated reasoning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:233ec657886e04c3","title":"How catastrophic is your LLM? Certifying risk in conversation","url":"https://www.amazon.science/publications/how-catastrophic-is-your-llm-certifying-risk-in-conversation","published":"2026","authors":["Chengxiao Wang","Isha Chaudhary","Qian Hu","Weitong Ruan","Rahul Gupta","Gagandeep Singh"],"abstract":"Large Language Models (LLMs) can produce catastrophic responses in conversational settings that pose serious risks to public safety and security. Existing evaluations often fail to fully reveal these vulnerabilities because they rely on fixed attack prompt sequences, lack statistical guarantees, and do not scale to the vast space of multi-turn conversations. 
In this work, we propose C3LLM, a novel, principled Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:07dbfaf7932a073f","title":"GEM: Graph-enhanced mixture-of-experts with ReAct agents for dialogue state tracking","url":"https://www.amazon.science/publications/gem-graph-enhanced-mixture-of-experts-with-react-agents-for-dialogue-state-tracking","published":"2026","authors":["Ziqi Zhu","Adithya Suresh","Tomal Deb","Iman Abbasnejad"],"abstract":"Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task where Large Language Models (LLMs) struggle despite their impressive general capabilities. 
We present GEM (Graph-Enhanced Mixture-of-Experts), a novel framework that combines language models and graph-structured dialogue understanding with ReAct agent-based reasoning for superior DST Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","agent"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:2b6a92483e0189b3","title":"From metrics to meaning: Estimating user feedback using LLM-based evaluation","url":"https://www.amazon.science/publications/from-metrics-to-meaning-estimating-user-feedback-using-llm-based-evaluation","published":"2026","authors":["Wanqun Zhao","Adam Patterson","Cibi Chakravarthy Senthilkumar","Yulong Wang","Anuraag Gupta","Shahriar Sadighi","Naumaan Nayyar"],"abstract":"Large language models (LLMs) are increasingly deployed in real-world applications such as chatbots, writing assistants, and text summarization tools. As these applications become more central to user-facing tasks, robust evaluation of their performance becomes critical, not only for ensuring quality but also for guiding continuous improvement. 
Traditional evaluation approaches rely on intrinsic metrics Category: Economics","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Economics","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:32788eb5ccd780dd","title":"FregeLogic at SemEval 2026 Task 11: A hybrid neuro-symbolic architecture for content-robust syllogistic validity prediction","url":"https://www.amazon.science/publications/fregelogic-at-semeval-2026-task-11-a-hybrid-neuro-symbolic-architecture-for-content-robust-syllogistic-validity-prediction","published":"2026","authors":["Wale Akinfaderin","Nafi Diallo"],"abstract":"We present FregeLogic, a hybrid neurosymbolic system for SemEval-2026 Task 11 (Subtask 1), which addresses syllogistic validity prediction while reducing content effects on predictions. 
Our approach combines an ensemble of five LLM classifiers, spanning three open-weights models (Llama 4 Maverick, Llama 4 Scout, and Qwen3-32B) paired with varied prompting strategies, with a Z3 SMT solver that serves as Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7155245658","cited_by_count":0,"quality_score":60,"matched_keywords":["Automated reasoning","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:8381ebf1d6ae19f6","title":"ELLA: Efficient lifelong learning for adapters in large language models","url":"https://www.amazon.science/publications/ella-efficient-lifelong-learning-for-adapters-in-large-language-models","published":"2026","authors":["Shristi Das Biswas","Yue Zhang","Anwesan Pal","Radhika Bhargava","Kaushik Roychoudhury (Roy)"],"abstract":"Large Language Models (LLMs) suffer from severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. 
Existing approaches are fundamentally limited: replay-based methods are impractical and could potentially violate privacy, while strict orthogonality-based methods collapse under scale: each new task is projected onto an orthogonal complement, progressively Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/q1m1-az83","openalex_id":"https://openalex.org/W7155381940","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon","Amazon (United States)","Purdue University West Lafayette"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:2148b8fc6ebbcdb8","title":"Diffusion language model inference with Monte Carlo Tree Search","url":"https://www.amazon.science/publications/diffusion-language-model-inference-with-monte-carlo-tree-search","published":"2026","authors":["Zheng Huang","Kiran Ramnath","Yueyan Chen","Aosong Feng","Sangmin Woo","Balasubramaniam Srinivasan","Zhichao Xu","Kang Zhou","Shuai Wang","Haibo Ding","Lin Lee Cheong"],"abstract":"Diffusion language models (DLMs) have recently emerged as a compelling alternative to autoregressive generation, offering parallel generation and improved global coherence. During inference, DLMs generate text by iteratively denoising masked sequences in parallel; however, determining which positions to unmask and which tokens to commit forms a large combinatorial search problem. 
Existing inference methods Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/ba4a-sn94","openalex_id":"https://openalex.org/W7155450760","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)","Dartmouth College","Dartmouth Hospital","University of Illinois Urbana-Champaign"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:7660feed85e36ac5","title":"CodeStruct: Code agents over structured action spaces","url":"https://www.amazon.science/publications/codestruct-code-agents-over-structured-action-spaces","published":"2026","authors":["Myeongsoo Kim","Joe Hsu","Dingmin Wang","Shweta Garg","Varun Kumar","Murali Krishna Ramanathan"],"abstract":"LLM-based code agents treat repositories as unstructured text, applying edits through brittle string matching that frequently fails due to formatting drift or ambiguous patterns. We propose reframing the codebase as a structured action space where agents operate on named AST entities rather than text spans. 
Our framework, CODESTRUCT, provides readCode for retrieving complete syntactic units and editCode Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:b88eca1bb3f4fd84","title":"CASPER: Bridging discrete and continuous prompt optimization through feedback-guided gradient descent","url":"https://www.amazon.science/publications/casper-bridging-discrete-and-continuous-prompt-optimization-through-feedback-guided-gradient-descent","published":"2026","authors":["Aryan Jain","Pushpendu Ghosh","Promod Yenigalla"],"abstract":"Workflow automation is critical for reducing manual efforts in industries, yet existing pipelines fail to handle generative tasks like summarization and extraction without pre-built tools, forcing human intervention. While LLM-based agents offer solutions, their creation depends heavily on prompt engineering—a resource-intensive process often yielding sub-optimal results. 
Current automated approaches face Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:6866b1b4746523e0","title":"ByteFlow: Language modeling through adaptive byte compression without a tokenizer","url":"https://www.amazon.science/publications/byteflow-language-modeling-through-adaptive-byte-compression-without-a-tokenizer","published":"2026","authors":["Chunyuan Deng","Sanket Lokegaonkar","Colin Lockard","Besnik Fetahu","Nasser Zalmout","Xian Li"],"abstract":"Modern language models (LMs) still rely on fixed, pre-defined subword tokenizations. Once a tokenizer is trained, the LM can only operate at this fixed level of granularity, which often leads to brittle and counterintuitive behaviors even in otherwise strong reasoning models. 
We introduce ByteFlow Net, a new hierarchical architecture that removes tokenizers entirely and instead enables models to learn their Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","compression"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:85b93dd86f7b740a","title":"Automated cricket scene classification using vision-language model","url":"https://www.amazon.science/publications/automated-cricket-scene-classification-using-vision-language-model","published":"2026","authors":["Karan Sindwani","Debasish Mishra","Yash Shah"],"abstract":"Vision-Language Models (VLMs) have demonstrated impressive capabilities in general- purpose multi-modal tasks, but their adaptation to specialized sports analysis remains relatively unexplored. This paper bridges this gap by investigating VLM's effectiveness for automated cricket scene classification, addressing critical bottlenecks in current workflows that require 45-50 minutes of human intervention. 
Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:cad459a687b0112b","title":"Zodiac — Zero-inflated overshoot controlled dual-head integration for asymmetric cross-domain forecasting","url":"https://www.amazon.science/publications/zodiac-zero-inflated-overshoot-controlled-dual-head-integration-for-asymmetric-cross-domain-forecasting","published":"2026","authors":["Igor Yakushin","Sai Krishna Kiran Beathanabhotla","Dhruv Garg","Mahmudur Rahman"],"abstract":"Foundation models promise zero-shot forecasting across domains, yet their effectiveness for cold-start scenarios with zero-inflated distributions remains underexplored. We study cross-domain demand forecasting, predicting outcomes for items launching in new domains without historical data where a substantial fraction of launches (≈ 30%) yield zero outcomes and overestimation carries asymmetric costs. 
We Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:0c2dc377216adf36","title":"When thoughts meet facts: Reusable reasoning for long-context LMs","url":"https://www.amazon.science/publications/when-thoughts-meet-facts-reusable-reasoning-for-long-context-lms","published":"2026","authors":["Soyeong Jeong","Taehee Jung","Sung Ju Hwang","Joo-Kyung Kim","Dongyeop Kang"],"abstract":"Recent Long-Context Language Models (LCLMs) can process hundreds of thousands of tokens in a single prompt, enabling new opportunities for knowledge-intensive multi-hop reasoning by integrating large sets of retrieved documents or, in some cases, directly all necessary information. However, simply feeding more documents into the context window fails to capture how evidence should be connected. 
We address Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:ce45bee7be1b6eb1","title":"When speed meets intelligence: Scalable conversational NER in an ever-evolving world","url":"https://www.amazon.science/publications/when-speed-meets-intelligence-scalable-conversational-ner-in-an-ever-evolving-world","published":"2026","authors":["Karim Ghonim","Antonio Roberto","Davide Bernardi"],"abstract":"Modern conversational AI systems require sophisticated Named Entity Recognition (NER) capabilities that can handle complex, contextual dialogue patterns. While Large Language Models (LLMs) excel at understanding conversational semantics, their inference latency and inability to efficiently incorporate emerging entities make them impractical for production deployment. 
Moreover, the scarcity of conversational Category: Operations research and optimization","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Operations research and optimization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:5020acf56adb0a2b","title":"When LLMs read tables carelessly: Measuring and reducing data referencing errors","url":"https://www.amazon.science/publications/when-llms-read-tables-carelessly-measuring-and-reducing-data-referencing-errors","published":"2026","authors":["Yuqing Yang","Qi Zhu","Zhen Han","Boran Han","Zhengyuan Shen","Shuai Wang","Vassilis N. Ioannidis","Huzefa Rangwala"],"abstract":"While large language models (LLMs) perform well on table tasks, they still make data referencing errors (DREs), i.e., incorrectly citing or omitting table values, despite understanding the table structure. Beyond final-answer accuracy, DREs directly compromise the correctness and reliability of intermediate reasoning steps. Yet prior studies have only offered limited, small-scale analyses. 
In this work, Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:16fc20a48c043e3f","title":"Visual reasoning through tool-supervised reinforcement learning","url":"https://www.amazon.science/publications/visual-reasoning-through-tool-supervised-reinforcement-learning","published":"2026","authors":["Qihua Dong","Gozde Sahin","Pei Wang","Zhaowei Cai","Robik Shrestha","Hao Yang","Davide Modolo"],"abstract":"In this paper, we investigate the problem of how to effectively master tool-use to solve complex visual reasoning tasks for Multimodal Large Language Models. To achieve that, we propose a novel Tool-supervised Reinforcement Learning (ToolsRL) framework, with direct tool supervision for more effective tool-use learning. 
We focus on a series of simple, native, and interpretable visual tools, including zoom-in Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:12d99faf85d45561","title":"ViLL-E: Video LLM embeddings for retrieval","url":"https://www.amazon.science/publications/vill-e-video-llm-embeddings-for-retrieval","published":"2026","authors":["Rohit Gupta","Jayakrishnan Unnikrishnan","Fan Fei","Sheng Liu"],"abstract":"Video Large Language Models (VideoLLMs) excel at video understanding tasks where outputs are textual, such as Video Question Answering and Video Captioning. However, they underperform specialized embedding-based models in Retrieval tasks, such as Text-to-Video Retrieval and Moment Retrieval. 
We introduce ViLL-E (Video-LLM-Embed), a unified VideoLLM architecture endowed with a novel embedding generation Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:7bfedd3cb248aa1f","title":"Universal guideline-driven image clustering via a hybrid LLM agent","url":"https://www.amazon.science/publications/universal-guideline-driven-image-clustering-via-a-hybrid-llm-agent","published":"2026","authors":["Wenliang Zhong","Rob Barton","Lucas Goncalves","Kushal Kumar","Feng Jiang","Hehuan Ma","Yuzhi Guo","Vidit Bansal","Karim Bouyarmane","Junzhou Huang"],"abstract":"Unifying image clustering across different clustering scenarios remains challenging due to fundamental gaps among tasks. We introduce a Guideline-Driven Image Clustering Agent, the first universal framework that bridges these gaps through textual guidelines. 
To incorporate complex guidelines without task-specific training, we propose Generative Concept Proxy Modeling, which generates guideline-aware embeddings Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:651e32bd19fdf096","title":"Understanding the implicit biases of design choices for time series foundation models","url":"https://www.amazon.science/publications/understanding-the-implicit-biases-of-design-choices-for-time-series-foundation-models","published":"2026","authors":["Annan Yu","Danielle Maddix Robinson","Boran Han","Xiyuan Zhang","Abdul Fatir Ansari","Oleksandr Shchur","Christos Faloutsos","Andrew Gordon Wilson","Michael Mahoney","Yuyang (Bernie) Wang"],"abstract":"Time series foundation models (TSFMs) are a potential class of powerful, general-purpose tools for forecasting and related temporal tasks, but their behavior is strongly shaped by subtle inductive biases in their design. 
Rather than developing a new model and claiming that it is better than existing TSFMs, e.g., by winning on existing benchmarks, our objective is to understand how the various \"knobs\" of Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:40f01987195e7a9a","title":"Training large language models to reason in parallel with global forking tokens","url":"https://www.amazon.science/publications/training-large-language-models-to-reason-in-parallel-with-global-forking-tokens","published":"2026","authors":["SHENG JIA","Xiao Wang","Shiva Kasiviswanathan"],"abstract":"Although LLMs have demonstrated improved performance by scaling parallel test-time compute, doing so relies on generating reasoning paths that are both diverse and accurate. For challenging problems, the forking tokens that trigger diverse yet correct reasoning modes are typically deep in the sampling tree. 
Consequently, common strategies to encourage diversity, such as temperature scaling, encounter a Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:7d6ed1852e2b450b","title":"Towards self-improving error diagnosis in multi-agent systems","url":"https://www.amazon.science/publications/towards-self-improving-error-diagnosis-in-multi-agent-systems","published":"2026","authors":["Jiazheng Li","Emine Yilmaz","Bei Chen","Thu Le"],"abstract":"Large Language Model (LLM)-based Multi-Agent Systems (MAS) enable complex problem-solving but introduce significant debugging challenges, characterized by long interaction traces, inter-agent dependencies, and delayed error manifestation. 
Existing diagnostic approaches often rely on expensive expert annotation or 'LLM-as-a-judge' paradigms, which struggle to pinpoint decisive error steps within extended Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:6a443a065921355d","title":"TaTToo: Tool-augmented thinking PRM for tabular reasoning","url":"https://www.amazon.science/publications/tatto-tool-augmented-thinking-prm-for-tabular-reasoning","published":"2026","authors":["Rubin Zou","Soumya Roy","Vinay Kumar Verma","Ziyi Wang","David Paul Wipf","Pan Lu","Jingrui He","Sumit Negi"],"abstract":"Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. 
Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Automated reasoning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:9f6c1567ad5a3bdb","title":"SQL-Trail: multi-turn reinforcement learning with interleaved feedback for text-to-SQL","url":"https://www.amazon.science/publications/sql-trail-multi-turn-reinforcement-learning-with-interleaved-feedback-for-text-to-sql","published":"2026","authors":["Harper Hua","Zhen Han","Zhengyuan Shen","Jeremy Lee","Patrick Guan","Qi Zhu","Sullam Jeoung","Yueyan Chen","Yunfei Bai","Shuai Wang","Vassilis N. Ioannidis","Huzefa Rangwala"],"abstract":"While large language models (LLMs) have substantially improved Text-to-SQL generation, a pronounced gap remains between AI systems and human experts on challenging benchmarks such as BIRD-SQL. We argue this gap stems largely from the prevailing single-pass paradigm, which lacks the iterative reasoning, schema exploration, and error-correction behaviors that humans naturally employ. 
To address this limitation Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:4907365f064e9eba","title":"SELENE: Selective and evidence-weighted LLM debating for efficient and reliable reasoning","url":"https://www.amazon.science/publications/selene-selective-and-evidence-weighted-llm-debating-for-efficient-and-reliable-reasoning","published":"2026","authors":["Akshay Verma","Swapnil Gupta","Siddharth Pillai","Prateek Sircar","Deepak Gupta"],"abstract":"Multi-Agent Debate (MAD) frameworks improve factual reliability in large language models (LLMs) by allowing agents to critique and refine one another's reasoning. Yet, existing MAD systems are computationally expensive and prone to degradation under prolonged debates due to redundant exchanges and unstable judging. 
We propose a lightweight, industry-deployable alternative that unifies Selective Debate Initiation Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Information and knowledge management"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:93ca41ea1f894346","title":"SALT: Step-level advantage assignment for long-horizon agents via trajectory graph","url":"https://www.amazon.science/publications/salt-step-level-advantage-assignment-for-long-horizon-agents-via-trajectory-graph","published":"2026","authors":["Jiazheng Li","Yawei Wang","David Yan","Yijun Tian","Zhichao Xu","Huan Song","Panpan Xu","Lin Lee Cheong"],"abstract":"Large language models (LLMs) have demonstrated remarkable capabilities, enabling language agents to excel at single-turn tasks. However, their application to complex, multi-step, and long-horizon tasks remains challenging. 
While reinforcement learning (RL) offers a promising avenue for addressing these challenges, mainstream approaches typically rely solely on sparse, outcome-based rewards, a limitation Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/3020-e820","openalex_id":"https://openalex.org/W7155384085","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:ba70db3458e2e91c","title":"Revisiting model stitching in the foundation model era","url":"https://www.amazon.science/publications/revisiting-model-stitching-in-the-foundation-model-era","published":"2026","authors":["Zheda Mai","Ke Zhang","Fu-En Wang","Ken Wang","Albert Chen","Lu Xia","Wei-Lun Chao","Cheng-Hao Kuo"],"abstract":"Model stitching, connecting early layers of one model (source) to later layers of another (target) via a light stitch layer, has served as a probe of representational compatibility. Prior work finds that models trained on the same dataset remain stitchable (negligible accuracy drop) despite different initializations or objectives. 
We revisit stitching for Vision Foundation Models (VFMs) that vary in objectives Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:01f37a9abb3b3894","title":"Rethinking language models for building outline extraction from remote sensing imagery","url":"https://www.amazon.science/publications/rethinking-language-models-for-building-outline-extraction-from-remote-sensing-imagery","published":"2026","authors":["Will Qian","Yang He","Mohamed Moustafa"],"abstract":"Building outline extraction from remote sensing imagery traditionally relies on segmentation or detection followed by post-processing to derive polygonal geometries. Despite advances in sequential prediction methods [2, 20], end-to-end extraction remains challenging, often missing buildings or requiring additional refinement steps. 
In this work, we reformulate building outline extraction as next-coordinate Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:617d96932eef404a","title":"Reinforcing structured chain-of-thought for video understanding","url":"https://www.amazon.science/publications/reinforcing-structured-chain-of-thought-for-video-understanding","published":"2026","authors":["Peiyao Wang","Haotian Xu","Sol Vesdapunt","Rui Hou","Jingyi Zhang","Haibin Ling","Oleksandr Obiednikov","Ning Zhou","Kah Kuen Fu"],"abstract":"Multi-modal Large Language Models (MLLMs) show promise in video understanding. However, their reasoning often suffers from thinking drift and weak temporal comprehension, even when enhanced by Reinforcement Learning (RL) techniques like Group Relative Policy Optimization (GRPO). 
Moreover, existing RL methods usually depend on Supervised Fine-Tuning (SFT), which requires costly Chain-of-Thought (CoT) annotation Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:07f53e80d0f1a91d","title":"RMIR: A benchmark dataset for reasoning-intensive multimodal image retrieval","url":"https://www.amazon.science/publications/rmir-a-benchmark-dataset-for-reasoning-intensive-multimodal-image-retrieval","published":"2026","authors":["Yijiang Li","Kunal Kotian","Ali Marjaninejad","Meir Friedenberg","Kaushik Pavani","Sunny Dasgupta"],"abstract":"Current multimodal image retrieval benchmarks focus on relatively simple queries where target images are either described directly or by simple composition with an input image. When retrieval requires complex reasoning to determine the target image, the task becomes significantly more challenging, yet standardized benchmarks for this setting do not exist. 
To fill this gap, we introduce RMIR, a benchmark Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Search and information retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:d9b829eadea5abbc","title":"Pattern discovery with wide-lens analysis and sharp-focus validation","url":"https://www.amazon.science/publications/pattern-discovery-with-wide-lens-analysis-and-sharp-focus-validation","published":"2026","authors":["Li Liu","Omar Alonso","Giorgio Ballardin"],"abstract":"Given an unfamiliar dataset without ground truth annotations or established taxonomies, how do we systematically discover meaningful patterns? Even with large language models providing initial categorization suggestions, it remains challenging to capture patterns and standardize them into consistent representations across unstructured data. 
This persistent challenge highlights the need for systematic discovery Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Information and knowledge management"],"author_affiliations":["Amazon","Amazon (United States)","University of California, Santa Cruz"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=5"}},{"id":"official:eb6e871d7350804e","title":"PGGA: A plan-grounded GUI agent for automated device support","url":"https://www.amazon.science/publications/pgga-a-plan-grounded-gui-agent-for-automated-device-support","published":"2026","authors":["Lei Hsiung","Zhiyu Chen","Seonhoon Kim","Qun Liu"],"abstract":"Current GUI agents struggle with multi-step digital device support. We investigate whether this failure is partly caused by a procedural knowledge deficit: agents often rely on zero-shot visual exploration instead of executing verified instructions. 
To address this, we introduce the Plan-Grounded GUI Agent (PGGA), framing interface navigation as a knowledge-execution problem by conditioning low-level actions Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:5fab23c0224501a1","title":"Not-a-bandit: Provably no-regret drafter selection in speculative decoding for LLMs","url":"https://www.amazon.science/publications/not-a-bandit-provably-no-regret-drafter-selection-in-speculative-decoding-for-llms","published":"2026","authors":["Hongyi Liu","Jiaji Huang","Zhen Jia","Youngsuk Park","Yu-Xiang Wang"],"abstract":"Speculative decoding is widely used in accelerating large language model (LLM) inference. In this work, we focus on the online draft model selection problem in speculative decoding. We design an algorithm that provably competes with the best draft model in hindsight for each query in terms of either the token acceptance probability or expected acceptance length. 
In particular, we show that we can accurately Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:8c9762afa119d558","title":"MTSQL-R1: Towards long-horizon multi-turn text-to-SQL via agentic training","url":"https://www.amazon.science/publications/mtsql-r1-towards-long-horizon-multi-turn-text-to-sql-via-agentic-training","published":"2026","authors":["Taicheng Guo","Hai Wang","Chaochun Liu","Mohsen Golalikhani","Xin Chen","Xiangliang Zhang","Chandan Reddy"],"abstract":"Multi-turn Text-to-SQL aims to translate a user's conversational utterances into executable SQL while preserving dialogue coherence and grounding to the target schema. 
However, most existing systems only regard this task as a simple text translation task and follow a short-horizon paradigm, generating a query per turn without execution, explicit verification, and refinement, which leads to non-executable Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:4f2d08e2fe732898","title":"LocRegen: Cost-efficient redundancy removal in multilingual e-commerce titles with small language models","url":"https://www.amazon.science/publications/locregen-cost-efficient-redundancy-removal-in-multilingual-e-commerce-titles-with-small-language-models","published":"2026","authors":["Bryan Zhang","Stephan Walter","Luca Lomanto","Merve Arinik"],"abstract":"E-commerce product titles often include redundant information that negatively impacts the user experience. Removing repeated words through restructuring and paraphrasing can make titles more concise and improve readability. While large language models can optimize titles, their computational cost makes them impractical for large-scale applications. 
In this paper, we first analyze the sources of repetition Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:43acc176054d44e1","title":"LinguaMAP: Which layers of LLMs speak your language and how to tune them?","url":"https://www.amazon.science/publications/linguamap-which-layers-of-llms-speak-your-language-and-how-to-tune-them","published":"2026","authors":["Ben Tamo","Daniel Carlander Reuterfelt Gallo","Jonathan Rubin","Oleg Poliannikov","Dezhi Hong","Mingxian Wang"],"abstract":"Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly in language control — the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). 
To systematically Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:99c41a28307fdccd","title":"Learning to staff: Offline reinforcement learning and fine-tuned LLMs for warehouse staffing optimization","url":"https://www.amazon.science/publications/learning-to-staff-offline-reinforcement-learning-and-fine-tuned-llms-for-warehouse-staffing-optimization","published":"2026","authors":["Kalle Kujanpää","Yuying Zhu","Kristina Klinkner","Shervin Malmasi"],"abstract":"We investigate machine learning approaches for optimizing real-time staffing decisions in semi-automated warehouse sortation systems. Operational decision-making can be supported at different levels of abstraction, with different tradeoffs. We evaluate two approaches, each in a matching simulation environment. 
First, we train custom Transformer-based policies using offline reinforcement learning on detailed Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:76a7fd30213ce513","title":"KG-CRAFT: Knowledge graph-based contrastive reasoning with LLMs for enhancing automated fact-checking","url":"https://www.amazon.science/publications/kg-craft-knowledge-graph-based-contrastive-reasoning-with-llms-for-enhancing-automated-fact-checking","published":"2026","authors":["Vítor Lourenço","Aline Paes","Tillman Weyde","Audrey Depeige","Mohnish Dubey"],"abstract":"Claim verification is a core component of automated fact-checking systems, aimed at determining the truthfulness of a statement by assessing it against reliable evidence sources such as documents or knowledge bases. 
This work presents KG-CRAFT, a method that improves automatic claim verification by leveraging large language models (LLMs) augmented with contrastive questions grounded in a knowledge graph Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Information and knowledge management"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:5ceba5d04c63899b","title":"Journey before destination: On the importance of visual faithfulness in slow thinking","url":"https://www.amazon.science/publications/journey-before-destination-on-the-importance-of-visual-faithfulness-in-slow-thinking","published":"2026","authors":["Rheeya Uppaal","Phu Mon Htut","Min Bai","Nikolaos Pappas","Zheng Qi","Sandesh Swamy"],"abstract":"Reasoning-augmented vision language models (VLMs) generate explicit chains of thought that promise greater capability and transparency but also introduce new failure modes: models may reach correct answers via visually unfaithful intermediate steps, or reason faithfully yet fail on the final prediction. Standard evaluations that only measure final-answer accuracy cannot distinguish these behaviors. 
We introduce Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/zwmm-5d11","openalex_id":"https://openalex.org/W7155400654","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)","University of Wisconsin–Madison"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:0a3aa3daafa3bbce","title":"Investigating equation-only reasoning in large language models","url":"https://www.amazon.science/publications/investigating-equation-only-reasoning-in-large-language-models","published":"2026","authors":["Jonathan Chung","Ramya Toshniwal"],"abstract":"While Large Language Models excel at mathematical reasoning with Chain-of-Thought prompting, their ability to perform systematic arithmetic reasoning without natural language scaffolding remains poorly understood. We investigate equation-only supervision, where LLMs map natural language problems directly to symbolic equation sequences without intermediate explanations. 
This approach separates reasoning Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:a87b94d47a132297","title":"Incentivizing consistent, effective and scalable reasoning capability in audio LLMs via reasoning process rewards","url":"https://www.amazon.science/publications/incentivizing-consistent-effective-and-scalable-reasoning-capability-in-audio-llms-via-reasoning-process-rewards","published":"2026","authors":["Jiajun Fan","Roger Ren","Jingyuan Li","Rahul Pandey","Prashanth Gurunath Shivakumar","Ivan Bulyko","Ankur Gandhe","Ge Liu","Yi Gu"],"abstract":"The role of reasoning in Audio Large Language Models remains widely underexplored, as introducing a reasoning process often degrades rather than improves performance during inference, a phenomenon we term test-time inverse scaling, where longer reasoning chains yield progressively worse results. 
We demonstrate that this stems not from fundamental limitations of reasoning itself, but from inadequate training Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:21d294e6b78b0bf8","title":"ImageRAGTurbo: Towards one-step text-to-image generation with retrieval-augmented diffusion models","url":"https://www.amazon.science/publications/imageragturbo-towards-one-step-text-to-image-generation-with-retrieval-augmented-diffusion-models","published":"2026","authors":["Peijie Qiu","Hariharan Ramshankar","Arnau Ramisa","Amit Kumar K C","Rene Vidal","Vamsi Salaka","Rahul Bhagat"],"abstract":"Diffusion models have emerged as the leading approach for text-to-image generation. However, their iterative sampling process, which gradually morphs random noise into coherent images, introduces significant latency that limits their applicability. 
While recent few-step diffusion models reduce the number of sampling steps to as few as one to four steps, they often compromise image quality and prompt alignment Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:7b854ea2d4a7a1bd","title":"IDP Accelerator: Agentic document intelligence from extraction to compliance validation","url":"https://www.amazon.science/publications/idp-accelerator-agentic-document-intelligence-from-extraction-to-compliance-validation","published":"2026","authors":["Mofijul Islam","Sirajus Salekin","Joe King","Priyashree Roy","Vamsi Thilak Gudi","Spencer Romo","Akhil Nooney","Boyi Xie","Bob Strahan","Diego Socolinsky"],"abstract":"Understanding and extracting structured insights from unstructured documents remains a foundational challenge in industrial NLP. While Large Language Models (LLMs) enable zero-shot extraction, traditional pipelines often fail to handle multi-document packets, complex reasoning, and strict compliance requirements. 
We present IDP (Intelligent Document Processing) Accelerator, a framework enabling agentic Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:0e75268475b0780f","title":"Hindsight-anchored policy optimization: Turning failure into feedback in sparse reward settings","url":"https://www.amazon.science/publications/hindsight-anchored-policy-optimization-turning-failure-into-feedback-in-sparse-reward-settings","published":"2026","authors":["Yuning Wu","Ke Wang","Devin Chen","Kai Wei"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for post-training reasoning models. 
However, group-based methods such as Group Relative Policy Optimization (GRPO) face a critical dilemma in sparse-reward settings: pure Reinforcement Learning (RL) suffers from advantage collapse and high-variance gradient estimation, while mixed-policy optimization introduces persistent Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:829841d624327409","title":"Graph-based nearest neighbors with dynamic updates via random walk","url":"https://www.amazon.science/publications/graph-based-nearest-neighbors-with-dynamic-updates-via-random-walk","published":"2026","authors":["Nina Mishra","Yonatan Naamad","Tal Wagner","Lichen Zhang"],"abstract":"Approximate nearest neighbor search (ANN) is a common way to retrieve relevant search results, especially now in the context of large language models and retrieval augmented generation. One of the most widely used algorithms for ANN is based on constructing a multi-layer graph over the dataset, called the Hierarchical Navigable Small World (HNSW). 
While this algorithm supports insertion of new data, it Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:cabbc0ad73bdc067","title":"From narrow unlearning to emergent misalignment: Causes, consequences, and containment in LLMs","url":"https://www.amazon.science/publications/from-narrow-unlearning-to-emergent-misalignment-causes-consequences-and-containment-in-llms","published":"2026","authors":["Erum Mushtaq","Anil Ramakrishna","Satyapriya Krishna","Sattvik Sahai","Prasoon Goyal","Kai-Wei Chang","Tao Zhang","Rahul Gupta"],"abstract":"Recent work has shown that fine-tuning on insecure code data can trigger an emergent misalignment (EMA) phenomenon, where models generate malicious responses even to prompts unrelated to the original insecure code-writing task. Such cross-domain generalization of harmful behavior underscores the need for a deeper understanding of the algorithms, tasks, and datasets that induce emergent misalignment. 
In Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:8c1027ada370a8e6","title":"Feedback-aware prompt optimization framework for generating job postings","url":"https://www.amazon.science/publications/feedback-aware-prompt-optimization-framework-for-generating-job-postings","published":"2026","authors":["Suraj Maharjan","Ainur Yessenalina","Srinivasan Sengamedu","\"SHS\""],"abstract":"Job postings are critical for recruitment, yet large enterprises struggle with standardization and consistency, requiring significant time and effort from hiring managers and recruiters. We present a feedback-aware prompt optimization framework that automates high-quality job posting generation through iterative human-in-the-loop refinement. 
Our system integrates multiple data sources: job metadata, competencies Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=6"}},{"id":"official:20cb39b01c435023","title":"Exectune: Effective steering of black-box LLMs with guide models","url":"https://www.amazon.science/publications/exectune-effective-steering-of-black-box-llms-with-guide-models","published":"2026","authors":["Vijay Lingam","Aditya Golatkar","Anwesan Pal","Ben Vo","Narayanan Sadagopan","Alessandro Achille","Jun Huan","Anoop Deoras","Stefano Soatto"],"abstract":"For large language models deployed through black-box APIs, recurring inference costs often dominate one-time training costs, motivating composed agentic systems that amortize expensive reasoning into reusable intermediate representations. 
We study a broad class of such systems, termed Guide–Core Policies (GCOP), in which a guide model generates a structured strategy that is executed by a black-box core Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:d45f184fc4b26116","title":"Encoding domain expertise in agents: Lessons from NFL Fantasy AI","url":"https://www.amazon.science/publications/encoding-domain-expertise-in-agents-lessons-from-nfl-fantasy-ai","published":"2026","authors":["Michael Butler","Henry Wang","Jake Lee","Kenton Blacut","Dan Volk","Mike Band","Diego Socolinsky"],"abstract":"Agentic AI systems can access vast data but struggle to apply domain expertise, namely the contextual understanding of how to use specialized information. This paper presents a practical framework for encoding such expertise, demonstrated with the National Football League (NFL) through NFL Fantasy AI, a production system delivering analyst-grade fantasy football advice, as assessed by NFL Pro analysts. 
Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:746d2b1c053fc2d4","title":"Enabling user agency in scalable content recommendations with large language models","url":"https://www.amazon.science/publications/enabling-user-agency-in-scalable-content-recommendations-with-large-language-models","published":"2026","authors":["Yucheng Li","Gerrit van den Burg","Wei Liu","Zhunxuan Wang","Abhishek Tripathi","Murat Sensoy"],"abstract":"Existing content recommender systems usually depend on centrally stored interaction histories, creating vendor lock-in and disadvantaging newer providers who lack sufficient user data. They also limit users' ability to understand, control, or edit how their preferences are represented, since profiles are learned as opaque latent vectors within provider-controlled models. 
We propose a user-centric alternative Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United Kingdom)","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:36fed1eff7f7515a","title":"DualVision: RGB-infrared multimodal large language models for robust visual reasoning","url":"https://www.amazon.science/publications/dualvision-rgb-infrared-multimodal-large-language-models-for-robust-visual-reasoning","published":"2026","authors":["Abrar Majeedi","Ryan (Zhiyuan) Ruan","Ziyi Zhao","Hongcheng Wang","Jianglin Lu","Yin Li"],"abstract":"Multimodal large language models (MLLMs) have achieved impressive performance on visual perception and reasoning tasks with RGB imagery, yet they remain fragile under common degradations, such as fog, blur, or low-light conditions. Infrared (IR) imaging, a well-established complement to RGB, offers inherent robustness in these conditions, but its integration into MLLMs remains underexplored. 
To bridge this Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:bab5b24f46698907","title":"Do VLMs read or rewrite?","url":"https://www.amazon.science/publications/do-vlms-read-or-rewrite","published":"2026","authors":["Gwang Gook Lee","Jay Mohta","Kenan Emir Ak","Dimitris Dimitriadis","Yan Xu"],"abstract":"Vision Language Models (VLMs) are increasingly adopted for document understanding tasks, often replacing traditional OCR systems. However, VLMs exhibit a fundamental difference: they frequently correct or rewrite imperfect text rather than transcribe it literally, a behavior that remains largely underexplored. 
We present a systematic investigation through controlled experiments with intentionally perturbed Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:b68bc24d85ad9913","title":"Detecting hallucinations in SpeechLLMs at inference time using attention maps","url":"https://www.amazon.science/publications/detecting-hallucinations-in-speechllms-at-inference-time-using-attention-maps","published":"2026","authors":["Jonas Waldendorf","Bashar Awwad Shiekh Hasan","Evgenii Tsymbalov"],"abstract":"Hallucinations in Speech Large Language Models (SpeechLLMs) pose significant risks, yet existing detection methods typically rely on goldstandard outputs that are costly or impractical to obtain. Moreover, hallucination detection methods developed for text-based LLMs do not directly capture audio-specific signals. 
We investigate four attention-derived metrics: AUDIORATIO, AUDIOCONSISTENCY, AUDIOENTROPY, Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:bf7333ae4ee5e6b8","title":"Delta debugging for LLM-integrated systems","url":"https://www.amazon.science/publications/delta-debugging-for-llm-integrated-systems","published":"2026","authors":["Hao-Nan Zhu","Muhammad Numair Mansur","Martin Schaef","Zeya Chen","Tancrède Lepoint","Willem Visser"],"abstract":"Large Language Models (LLMs) are increasingly integrated into software systems as automated decision-making components. These systems rely on instruction prompts written in natural language to encode complex workflows. However, debugging these prompts when LLMs produce undesired outputs remains challenging due to their black-box nature and the impracticality of manually inspecting large, complex inputs. 
Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Automated reasoning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:aef8abc3c29b87c7","title":"DQA: Diagnostic question answering for IT support","url":"https://www.amazon.science/publications/dqa-diagnostic-question-answering-for-it-support","published":"2026","authors":["Vishaal Kapoor","Mariam Dundua","Evren YORTUCBOYLU","Sarthak Ahuja","Neda Kordjazi","Yiming Li","Vaibhavi Padala","Derek Ho","Jennifer Whitted","Rebecca Steinert"],"abstract":"Enterprise IT support interactions are fundamentally diagnostic: effective resolution requires iterative evidence gathering from ambiguous user reports to identify an underlying root cause. 
While retrieval-augmented generation (RAG) provides grounding through historical cases, standard multi-turn RAG systems lack explicit diagnostic state and therefore struggle to accumulate evidence and resolve competing Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Automated reasoning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:50b79064272afd19","title":"Correct, concise and complete: Multi-stage training for adaptive reasoning","url":"https://www.amazon.science/publications/correct-concise-and-complete-multi-stage-training-for-adaptive-reasoning","published":"2026","authors":["Carraz Rakotonirina","Ren Pang","Neha Anna John","Michael Bohlke-Schneider","Momchil Hardalov"],"abstract":"The reasoning capabilities of large language models (LLMs) have improved substantially through increased test-time computation, typically in the form of intermediate tokens known as chain-of-thought (CoT). However, CoT often becomes unnecessarily long, increasing computation costs without improving accuracy and sometimes even degrading performance, a phenomenon known as 'overthinking'. 
We propose a multi-stage Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:832f8021e556360c","title":"CompAgent: An agentic framework for visual compliance verification","url":"https://www.amazon.science/publications/compagent-an-agentic-framework-for-visual-compliance-verification","published":"2026","authors":["Rahul Ghosh","baishali chaudhury","Hari Prasanna Das","Meghana Ashok","Ryan Razkenari","Long Chen","Sungmin Hong","Chun-Hao Liu"],"abstract":"Visual compliance verification is a critical yet underexplored problem in computer vision, especially in domains such as media, entertainment, and advertising where content must adhere to complex and evolving policy rules. Existing methods often rely on task-specific deep learning models trained on manually labeled datasets, which are costly to build and limited in generalizability. 
While recent Multimodal Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:5c6be89b1bca422b","title":"CodeV: Code with images for faithful visual reasoning via tool-aware policy optimization","url":"https://www.amazon.science/publications/codev-code-with-images-for-faithful-visual-reasoning-via-tool-aware-policy-optimization","published":"2026","authors":["Xinhai Hou","Shaoyuan Xu","Manan Biyani","Moyan Li","Jia (Kevin) Liu","Todd C. Hollon","Bryan Wang"],"abstract":"Agentic vision–language models are increasingly trained to 'think with images' by calling image operations. However, we show that high final-answer accuracy often hides unfaithful visual reasoning: models may invoke tools on irrelevant regions or ignore tool outputs entirely, yet still guess the correct answer. 
In this work, we first propose a faithfulness evaluation protocol that measures whether intermediate Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=3"}},{"id":"official:97b4f1f5dcb0d69e","title":"CausalFusion: Integrating LLMs and graph falsification for causal discovery","url":"https://www.amazon.science/publications/causalfusion-integrating-LLMs-and-graph-falsification-for-causal-discovery","published":"2026","authors":["Alessandro Casadei","Sreyoshi Bhaduri","Pavan Mullapudi","Ankush Pole","Raj Ratan","Rohit Malshe"],"abstract":"Causal discovery is central to enable causal models for tasks such as effect estimation, counterfactual reasoning, and root cause attribution. Yet existing approaches face trade-offs: purely statistical methods (e.g., PC, LiNGAM) often return structures that overlook domain knowledge, while expert-designed DAGs are difficult to scale and time-consuming to construct. 
We propose CausalFusion, a hybrid framework Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:7d8ce4057b925917","title":"Beyond statistical changepoint detection: Semantic interpretation of time series via large language models","url":"https://www.amazon.science/publications/beyond-statistical-changepoint-detection-semantic-interpretation-of-time-series-via-large-language-models","published":"2026","authors":["Hong Kiat Tan","Trilokya Akula","Akash Tonne","Tom Blake"],"abstract":"Changepoint detection algorithms identify where structural breaks occur but are conventionally used under a one-to-one mapping between detected breaks and real-world events. 
We show this mapping assumption is undermined by a fundamental ambiguity: the confidence interval for a detected break widens as the slope jump shrinks, so a wide interval may indicate either a mild genuine break or an approximation Category: Economics","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Economics"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:6694f5b1a0c3ccce","title":"Beyond grey-box assumptions: Uncertainty-guided example selection for black-box language models","url":"https://www.amazon.science/publications/beyond-grey-box-assumptions-uncertainty-guided-example-selection-for-black-box-language-models","published":"2026","authors":["Egor Krasheninnikov","Zainab Afolabi","Giuseppe Mascellaro","Salvatore Radosta"],"abstract":"In-context learning (ICL) with Large Language Models has been historically effective, but performance depends heavily on demonstration quality while annotation budgets remain constrained. 
Existing uncertainty-based selection methods like Cover-ICL achieve strong performance through logit-based uncertainty estimation, but most production LLMs operate as black-box APIs where internal states are inaccessible Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:6f7af97f75b98323","title":"Attribute-aware controlled product generation with LLMs for e-commerce","url":"https://www.amazon.science/publications/attribute-aware-controlled-product-generation-with-llms-for-e-commerce","published":"2026","authors":["Virginia Negri","Victor Martinez Gomez","Sergio Alvarez Balanya","Subbu Rajaram"],"abstract":"Product information extraction is crucial for e-commerce services, but obtaining high-quality labeled datasets remains challenging. 
We present a systematic approach for generating synthetic e-commerce product data using Large Language Models (LLMs), introducing a controlled modification framework with three strategies: attribute-preserving modification, controlled negative example generation, and systematic Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:17aa7308bc306999","title":"Align to structure: Aligning large language models with structural information","url":"https://www.amazon.science/publications/align-to-structure-aligning-large-language-models-with-structural-information","published":"2026","authors":["Zae Kim","Anand Ramachandran","Farideh Tavazoee","JK Kim","Oleg Rokhlenko","Dongyeop Kang"],"abstract":"Generating long, coherent text remains a challenge for large language models (LLMs), as they lack hierarchical planning and structured organization in discourse generation. We introduce Structural Alignment, a novel method that aligns LLMs with human-like discourse structures to enhance long-form text generation. 
By integrating linguistically grounded discourse frameworks into reinforcement learning, our Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=4"}},{"id":"official:4f1c3d55f77e835c","title":"Agentic simulacra for synthetic construction management data generation","url":"https://www.amazon.science/publications/agentic-simulacra-for-synthetic-construction-management-data-generation","published":"2026","authors":["Vincil Bishop","Nivedha Balakrishnan","Saeideh Shahrokh Esfahani"],"abstract":"Construction management systems require realistic test data capturing complex stakeholder interactions and temporal dependencies, yet accessing real project data remains challenging due to privacy constraints and proprietary information protection. 
This research addresses a critical systems engineering challenge by introducing agentic simulacra patterns that leverage multi-agent coordination to generate Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Automated reasoning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:350356d1d61eb7c9","title":"AccelOpt: A self-improving LLM agentic system for AI accelerator kernel optimization","url":"https://www.amazon.science/publications/accelopt-a-self-improving-llm-agentic-system-for-ai-accelerator-kernel-optimization","published":"2026","authors":["Genghan Zhang","Shaowei Zhu","Anjiang Wei","Zhenyu Song","Allen Nie","Zhen Jia","Nandita Vijaykumar","Yida Wang","Kunle Olukotun"],"abstract":"We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI acclerators, eliminating the need for expert-provided hardware-specific optimization knowledge. 
AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:6734884ee103a55e","title":"ARES: Adaptive red-teaming and end-to-end repair of policy-reward system","url":"https://www.amazon.science/publications/ares-adaptive-red-teaming-and-end-to-end-repair-of-policy-reward-system","published":"2026","authors":["Jiacheng Liang","Yao Ma","Tharindu Kumarage","Satyapriya Krishna","Rahul Gupta","Kai-Wei Chang","Aram Galstyan","Charith Peris"],"abstract":"Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe behaviors. 
While existing red-teaming approaches primarily target policy-level weaknesses, they overlook what we term systemic weaknesses cases where both the Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications"}},{"id":"official:8c5e72675bca3079","title":"A functionality-grounded benchmark for evaluating web agents in e-commerce domains","url":"https://www.amazon.science/publications/a-functionality-grounded-benchmark-for-evaluating-web-agents-in-e-commerce-domains","published":"2026","authors":["Xianren Zhang","Shreyas Prasad","Di Wang","Qiuhai Zeng","Suhang Wang","Wenbo Yan","Mat Hans"],"abstract":"Web agents have shown great promise in performing many tasks on e-commerce websites. To assess their capabilities, several benchmarks have been introduced. However, current benchmarks in the e-commerce domain face two major problems. 
First, they primarily focus on product search tasks (e.g., 'Find an Apple Watch'), failing to capture the broader range of functionalities offered by real-world e-commerce Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:be61dd00b4a411a5","title":"A framework for prompt optimization and translation across foundation models","url":"https://www.amazon.science/publications/a-framework-for-prompt-optimization-and-translation-across-foundation-models","published":"2026","authors":["Abhinav Shankaranarayanan Venkataraman","Thanos Nikolakopoulos","Vishwanath Kumaraswamy","Tao Zhang","Sarath Chander","Rohit Saboo","Suleiman Khan"],"abstract":"Foundation-model upgrades frequently break deployed prompt-based systems: target models differ in chat-template conventions, multimodal interfaces, context limits, and structured-output reliability. 
We study cross-model prompt adaptation: given a prompt program validated on a source model, produce a target-model prompt that preserves a semantic contract and an interface contract under bounded regression Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Automated reasoning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=2"}},{"id":"official:df7b312f45107f97","title":"mHC: Manifold-Constrained Hyper-Connections","url":"https://huggingface.co/papers/2512.24880","published":"2025-12-31","authors":["Zhenda Xie","Yixuan Wei","Huanqi Cao","Chenggang Zhao","Chengqi Deng","Jiashi Li","Damai Dai","Huazuo Gao","Jiang Chang","Liang Zhao","Shangyan Zhou","Zhean Xu"],"abstract":"Manifold-Constrained Hyper-Connections stabilize and scale residual connection architectures by restoring identity mapping properties through manifold projection and infrastructure optimization, suggesting directions for the evolution of foundational language models.","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"arxiv:2512.24834","title":"GenZ: Foundational models as latent variable generators within traditional statistical 
models","url":"http://arxiv.org/abs/2512.24834","published":"2025-12-31","authors":["Marko Jojic","Nebojša Jojić"],"abstract":"We present GenZ, a hybrid model that bridges foundational models and statistical modeling through interpretable semantic features. While large language models possess broad domain knowledge, they often fail to capture dataset-specific patterns critical for prediction tasks. Our approach addresses this by discovering semantic feature descriptions through an iterative process that contrasts groups of items identified via statistical modeling errors, rather than relying solely on the foundational model's domain understanding. We formulate this as a generalized EM algorithm that jointly optimizes semantic feature descriptors and statistical model parameters. The method prompts a frozen foundational model to classify items based on discovered features, treating these judgments as noisy observations of latent binary features that predict real-valued targets through learned statistical relation...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7118052403","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Arizona State University","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7045000195503235},{"id":"https://openalex.org/C114289077","display_name":"Statistical model","score":0.5691999793052673},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5544000267982483},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5529000163078308},{"id":"https://openalex.org/C170133592","display_name":"Latent semantic 
analysis","score":0.5504999756813049},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5166000127792358},{"id":"https://openalex.org/C2781122975","display_name":"Semantic feature","score":0.4440000057220459},{"id":"https://openalex.org/C101814296","display_name":"Feature model","score":0.4438999891281128}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7117727641","title":"Position Paper: Artificial Intelligence in Medical Image Analysis: Advances, Clinical Translation, and Emerging Frontiers","url":"https://doi.org/10.1109/jbhi.2025.3649496","published":"2025-12-31","authors":["A. S. Panayides","H. Chen","N. D. Filipovic","Tijana Geroski","J. Hou","Karim Lekadir","K. Marias","G. K. Matsopoulos","G. Papanastasiou","Pinaki Sarder","Georgia D. Tourassi","S. A. Tsaftaris"],"abstract":"Over the past five years, artificial intelligence (AI) has introduced new models and methods for addressing the challenges associated with the broader adoption of AI models and systems in medicine. This paper reviews recent advances in AI for medical image and video analysis, outlines emerging paradigms, highlights pathways for successful clinical translation, and provides recommendations for future work. Hybrid Convolutional Neural Network (CNN) Transformer architectures now deliver state-of-the-art results in segmentation, classification, reconstruction, synthesis, and registration. Foundation and generative AI models enable the use of transfer learning to smaller datasets with limited ground truth. Federated learning supports privacy-preserving collaboration across institutions. 
Explainable and trustworthy AI approaches have become essential to foster clinician trust, ensure regulator...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jbhi.2025.3649496","openalex_id":"https://openalex.org/W7117727641","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Abu Dhabi University","Academy of Athens","Athena Research and Innovation Center In Information Communication & Knowledge Technologies","Cyprus University of Technology","Foundation for Research and Technology Hellas","Hong Kong University of Science and Technology","IBM Research - Almaden","Institució Catalana de Recerca i Estudis Avançats","Institute of High Performance Computing","Mediterranean University","National Technical University of Athens","New York University Abu Dhabi","Oak Ridge National Laboratory","Stony Brook University","Technical University of Crete","Tencent (China)","University of Cyprus","University of Edinburgh","University of Florida Health","University of Ioannina","University of Kragujevac","University of Louisville","University of New Mexico"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7121000289916992},{"id":"https://openalex.org/C78780964","display_name":"Position paper","score":0.6514999866485596},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6279000043869019},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.5842999815940857},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.5170999765396118},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.44519999623298645},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial 
intelligence","score":0.42980000376701355},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.391400009393692}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7117726994","title":"An Evidence-Grounded Research Assistant for Functional Genomics and Drug Target Assessment","url":"https://doi.org/10.64898/2025.12.30.697073","published":"2025-12-31","authors":["Ksenia Sokolova","Dmitri Kosenkov","Keerthana Nallamotu","Sanketh Vedula","Daniil Sokolov","Guillermo Sapiro","Olga G. Troyanskaya"],"abstract":"The growing availability of biological data resources has transformed research, yet their effective use remains challenging: selecting appropriate sources requires domain knowledge, data are fragmented across databases, and synthesizing results into reliable conclusions is labor-intensive. Although large language models promise to address these barriers, their impact in biomedicine has been limited by unsupported statements, incorrect claims, and lack of provenance. We introduce Alvessa, an evidence-grounded agentic research assistant designed around verifiability. Alvessa integrates entity recognition, orchestration of pre-validated biological tools, and data-constrained answer generation with statement-level verification against retrieved records, explicitly flagging unsupported claims and guiding revision when reliability criteria are not met. 
We evaluate Alvessa on dbQA from LAB-Benc...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.64898/2025.12.30.697073","openalex_id":"https://openalex.org/W7117726994","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Apple (United States)","Princeton University","Simons Foundation"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7289999723434448},{"id":"https://openalex.org/C2777548347","display_name":"Flagging","score":0.6151999831199646},{"id":"https://openalex.org/C66782513","display_name":"Biomedicine","score":0.5580000281333923},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5271999835968018},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.5139999985694885},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.4794999957084656},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.421999990940094},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4032999873161316}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2512.24618","title":"Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language 
Models","url":"https://huggingface.co/papers/2512.24618","published":"2025-12-30","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2512.24615","title":"Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization","url":"https://huggingface.co/papers/2512.24615","published":"2025-12-30","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","agent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:baidu:2512.24077","title":"LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize 
Paradigm","url":"https://huggingface.co/papers/2512.24077","published":"2025-12-30","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"hf-org-paper:tencent:2512.23959","title":"Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling","url":"https://huggingface.co/papers/2512.23959","published":"2025-12-29","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","memory"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2512.23273","title":"YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time 
Detection","url":"https://huggingface.co/papers/2512.23273","published":"2025-12-29","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2512.23236","title":"KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta","url":"https://huggingface.co/papers/2512.23236","published":"2025-12-29","authors":["Gang Liao","Hongsen Qin","Ying Wang","Alicia Golden","Michael Kuchnik","Yavuz Yetim","Jia Jiunn Ang","Chunli Fu","Yihan He","Samuel Hsia","Zewei Jiang","Dianshi Li"],"abstract":"Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, this presents three key system challenges - model architecture diversity, kernel primitive diversity, and hardware generation and architecture heterogeneity. This paper presents KernelEvolve-an agentic kernel coding framework-to tackle heterogeneity at-scale for DLRM. KernelEvolve is designed to take kernel specifications as input and automate the process of kernel generation and optimization for recommendation model across heterogeneous hardware architectures. KernelEvolve does so by operating at multiple programming abstractions, from Triton and CuTe DSL to low-level hardware agnostic languages, spanning the full hardware-software optimization stack. 
The kernel optimization process is described as graph-based search with selection policy, universal operator, fitness functio...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["retrieval","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:Tencent-Hunyuan:2512.22955","title":"Diversity or Precision? A Deep Dive into Next Token Prediction","url":"https://huggingface.co/papers/2512.22955","published":"2025-12-28","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cyberthreat-eval-can-large-language-models-automate-real-world-threat-research","title":"CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?","url":"https://www.microsoft.com/en-us/research/publication/cyberthreat-eval-can-large-language-models-automate-real-world-threat-research/","published":"2025-12-26","authors":["Xiangsen Chen","Xuan Feng (xuafeng)","Shuo Chen","Sudipto Rakshit","Diana Duvieilh","Ashley Picone","Nan Tang"],"abstract":"Analyzing Open Source Intelligence (OSINT) from large 
volumes of data is critical for drafting and publishing comprehensive CTI reports. This process usually follows a three-stage workflow---triage, deep search and TI drafting. While Large Language Models (LLMs) offer a promising route toward automation, existing benchmarks still have limitations. These benchmarks often consist of tasks that do not reflect real-world analyst workflows. For example, human analysts rarely receive tasks in the form of multiple-choice questions. Also, existing benchmarks often rely on model-centric metrics that emphasize lexical overlap rather than actionable, detailed insights essential for security analysts. Moreover, they typically fail to cover the complete three-stage workflow. To address these issues, we introduce CyberThreat-Eval, which is collected from the daily CTI workflow of a world-leading compa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Security, privacy, and cryptography","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2512.22322","title":"SmartSnap: Proactive Evidence Seeking for Self-Verifying 
Agents","url":"https://huggingface.co/papers/2512.22322","published":"2025-12-26","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7117320003","title":"DynLLM: When Large Language Models Meet Dynamic Graph-based Recommendation","url":"https://doi.org/10.1145/3786601","published":"2025-12-26","authors":["Ziwei Zhao","Fake Lin","Xi Zhu","Zhi Zheng","Tong Xu","Shitian Shen","Xueying Li","Zikai Yin","Enhong Chen Enhong Chen"],"abstract":"Recommendation systems have become ubiquitous tools in online platforms, providing personalized suggestions based on user–item interactions. To capture the dynamic higher-order connections between users and items, recommendation approaches based on dynamic graphs have garnered significant attention from researchers. However, existing recommendation methods based on dynamic graphs are often limited by data sparsity, which prevents them from achieving satisfactory performance. Fortunately, the rapid development of large language models (LLMs) with powerful text generation capabilities and extensive domain knowledge has offered new possibilities for addressing this challenge. However, how to effectively integrate LLMs with dynamic graphs remains unexplored. 
To bridge this gap, in this article, we propose a novel framework, that is, DynLLM, for applying LLMs to dynamic graph-based recommenda...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3786601","openalex_id":"https://openalex.org/W7117320003","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","personalized"],"author_affiliations":["Alibaba Group (China)","Rutgers, The State University of New Jersey","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8510000109672546},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.644599974155426},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4943000078201294},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.47909998893737793},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.44769999384880066},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3982999920845032},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38499999046325684},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.37880000472068787}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7117308532","title":"Toward Dataset Copyright Evasion Attack Against Personalized Text-to-Image Diffusion Models","url":"https://doi.org/10.1109/tifs.2025.3648660","published":"2025-12-26","authors":["Kuofeng Gao","Yufei Zhu","Yiming Li","Jiawang Bai","Yong Yang","Zhifeng Li","Shu-Tao Xia"],"abstract":"Text-to-image (T2I) diffusion models enable 
high-quality image generation conditioned on textual prompts. However, fine-tuning these pre-trained models for personalization raises concerns about unauthorized dataset usage. To address this issue, dataset ownership verification (DOV) has recently been proposed, which embeds watermarks into fine-tuning datasets via backdoor techniques. These watermarks remain dormant on benign samples but produce owner-specified outputs when triggered. Despite its promise, the robustness of DOV against copyright evasion attacks (CEA) remains unexplored. In this paper, we investigate how adversaries can circumvent these mechanisms, enabling models trained on watermarked datasets to bypass ownership verification. We begin by analyzing the limitations of potential attacks achieved by backdoor removal, including TPD and T2IShield. In practice, TPD suffers from i...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tifs.2025.3648660","openalex_id":"https://openalex.org/W7117308532","cited_by_count":0,"quality_score":45,"matched_keywords":["personalized","personalization"],"author_affiliations":["Nanyang Technological University","Shenzhen University","Tencent (China)","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8773999810218811},{"id":"https://openalex.org/C2781045450","display_name":"Backdoor","score":0.8709999918937683},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6880000233650208},{"id":"https://openalex.org/C150817343","display_name":"Digital watermarking","score":0.6820999979972839},{"id":"https://openalex.org/C2781251061","display_name":"Evasion (ethics)","score":0.5885999798774719},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.47839999198913574},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4691999852657318},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4383000135421753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7140294741","title":"Research on a Mental Health Service System Based on an Improved RAG Algorithm","url":"https://doi.org/10.1109/iceace67491.2025.11439659","published":"2025-12-26","authors":["Pingping Chen","Huan Liu","Yong Zhang","Yuhao Yan","Yuehao Tang","Dingying Tan"],"abstract":"Mental health service systems face significant challenges, including high costs, limited accessibility, and a scarcity of expert resources. This paper presents an AI-driven dual-layer service system optimized by the RetrievalAugmented Generation (RAG) algorithm. By incorporating a hybrid retrieval and dynamic re-ranking mechanism, the system addresses issues such as knowledge hallucination, low recall rates, and inaccurate intervention suggestions commonly encountered in general large language models for psychological counseling. The system deploys AI digital agents (first-line) in collaboration with human experts (second-line) to provide 24/7 mental health support. Experimental results show that the optimized RAG system achieves a Top-5 knowledge recall accuracy of 99.35%. 
After deployment, the system serves over 30,000 users monthly, increases expert efficiency by over 200 %, and reduc...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iceace67491.2025.11439659","openalex_id":"https://openalex.org/W7140294741","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Guangzhou University of Chinese Medicine","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6007000207901001},{"id":"https://openalex.org/C134362201","display_name":"Mental health","score":0.5134000182151794},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.46810001134872437},{"id":"https://openalex.org/C3019351904","display_name":"Mental health service","score":0.426800012588501},{"id":"https://openalex.org/C2780378061","display_name":"Service (business)","score":0.4083999991416931},{"id":"https://openalex.org/C15587899","display_name":"Service system","score":0.398499995470047},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.35120001435279846},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3116999864578247}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7117315152","title":"Pan-Arctic Permafrost Landform and Human-Built Infrastructure Feature Detection With Vision Transformers and Location Embeddings","url":"https://doi.org/10.1109/jstars.2025.3648673","published":"2025-12-26","authors":["Amal Shehan Perera","David Fernandez","Chandi Witharana","Elias Manos","Michael Pimenta","Anna K. 
Liljedahl","Ingmar Nitze","Yili Yang","Todd Nicholson","Chia-Yu Hsu","Wenwen Li","Guido Grosse"],"abstract":"Accurate mapping of permafrost landforms, thaw disturbances, and human-built infrastructure at pan-Arctic scale using sub-meter resolution satellite imagery is increasingly critical. Handling petabyte-scale image data requires high performance computing and robust feature detection models. While convolutional neural network (CNN)-based deep learning approaches are widely used for remote sensing (RS), Vision Transformers (ViTs) offer advantages in capturing long-range dependencies and global context via attention mechanisms, similar to the success in transformer-based large language models. ViTs support pretraining via self-supervised learning, addressing the common limitation of labeled data in Arctic feature detection, and outperform CNNs on benchmark datasets. The Arctic domain also poses challenges for model generalization, especially when features with the same semantic class exhibit...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jstars.2025.3648673","openalex_id":"https://openalex.org/W7117315152","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung","Arizona State University","Google (United States)","National Center for Supercomputing Applications","University of Connecticut","University of Illinois Urbana-Champaign","Woodwell Climate Research Center"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7368000149726868},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.6050999760627747},{"id":"https://openalex.org/C9770341","display_name":"Geospatial 
analysis","score":0.5831000208854675},{"id":"https://openalex.org/C15098985","display_name":"Permafrost","score":0.5687999725341797},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5602999925613403},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5530999898910522},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.5286999940872192},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.4683000147342682}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/see-less-see-right-bi-directional-perceptual-shaping-for-multimodal-reasoning","title":"See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning","url":"https://www.microsoft.com/en-us/research/publication/see-less-see-right-bi-directional-perceptual-shaping-for-multimodal-reasoning/","published":"2025-12-25","authors":["Shuoshuo Zhang","Yizhen Zhang","Jingjing Fu","Lei Song","Jiang Bian","Jiang Bian","Yujiu Yang","Rui Wang"],"abstract":"Large vision-language models (VLMs) often benefit from intermediate visual cues, either injected via external tools or generated as latent visual tokens during reasoning, but these mechanisms still overlook fine-grained visual evidence (e.g., polylines in charts), generalize poorly across domains, and incur high inference-time cost. In this paper, we propose Bi-directional Perceptual Shaping (BiPS), which transforms question-conditioned masked views into bidirectional where-to-look signals that shape perception during training. BiPS first applies a KL-consistency constraint between the original image and an evidence-preserving view that keeps only question-relevant regions, encouraging coarse but complete coverage of supporting pixels. 
It then applies a KL-separation constraint between the original and an evidence-ablated view where critical pixels are masked so the image no longer suppo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Vision-language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7117124506","title":"Explaining categorical feature interactions using graph covariance and LLMs","url":"https://doi.org/10.1007/s41109-025-00770-3","published":"2025-12-24","authors":["Cencheng Shen","Darren Edge","Jonathan Larson","Carey E. Priebe"],"abstract":"Modern datasets often consist of numerous samples with abundant features and associated timestamps. Analyzing such datasets to uncover underlying events typically requires complex statistical methods and substantial domain expertise. A notable example, and the primary data focus of this paper, is the global synthetic dataset from the Counter Trafficking Data Collaborative (CTDC)—a global hub of human trafficking data containing over 200,000 anonymized records spanning from 2002 to 2022, with numerous categorical features for each record. In this paper, we propose a fast and scalable method for analyzing and extracting significant categorical feature interactions, and querying large language models (LLMs) to generate data-driven insights that explain these interactions. 
Our approach begins with a binarization step for categorical features using one-hot encoding, followed by the computatio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s41109-025-00770-3","openalex_id":"https://openalex.org/W7117124506","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Johns Hopkins University","Microsoft (United States)","Microsoft Research (United Kingdom)","University of Delaware"],"concepts":[{"id":"https://openalex.org/C5274069","display_name":"Categorical variable","score":0.8870999813079834},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6535999774932861},{"id":"https://openalex.org/C178650346","display_name":"Covariance","score":0.616100013256073},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6090999841690063},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5163999795913696},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4781000018119812},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.45419999957084656},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.45089998841285706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2512.20856","title":"NVIDIA Nemotron 3: Efficient and Open Intelligence","url":"https://huggingface.co/papers/2512.20856","published":"2025-12-24","authors":["NVIDIA","Aaron Blakeman","Aaron Grattafiori","Aarti Basant","Abhibha Gupta","Abhinav Khattar","Adi Renduchintala","Aditya Vavre","Akanksha Shukla","Akhiad Bercovich","Aleksander Ficek","Aleksandr 
Shaposhnikov"],"abstract":"We introduce the Nemotron 3 family of models - Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra models are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. The two larger models also include MTP layers for faster text generation. All Nemotron 3 models are post-trained using multi-environment reinforcement learning enabling reasoning, multi-step tool use, and support granular reasoning budget control. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automat...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:stepfun-ai:2512.20491","title":"Step-DeepResearch Technical Report","url":"https://huggingface.co/papers/2512.20491","published":"2025-12-23","authors":["StepFun"],"abstract":"As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in intent recognition, long-horizon decision-making, and cross-source verification. To address this, we introduce Step-DeepResearch, a cost-effective, end-to-end agent. 
We propose a Data Synthesis Strategy Based on Atomic Capabilities to reinforce planning and report writing, combined with a progressive training path from agentic mid-training to SFT and RL. Enhanced by a Checklist-style Judger, this approach significantly improves robustness. Furthermore, to bridge the evaluation gap in the Chinese domain, we establish ADR-Bench for realistic deep research scenarios. Experimental results show that Step-DeepResearch (32B) scores 61.4% on Scale AI Research Rubr...","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","stepfun-ai","agent"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"arxiv:2512.20848","title":"Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning","url":"https://huggingface.co/papers/2512.20848","published":"2025-12-23","authors":["NVIDIA","Aaron Blakeman","Aaron Grattafiori","Aarti Basant","Abhibha Gupta","Abhinav Khattar","Adi Renduchintala","Aditya Vavre","Akanksha Shukla","Akhiad Bercovich","Aleksander Ficek","Aleksandr Shaposhnikov"],"abstract":"We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion new unique tokens over Nemotron 2, followed by supervised fine tuning and large-scale RL on diverse environments. Nemotron 3 Nano achieves better accuracy than our previous generation Nemotron 2 Nano while activating less than half of the parameters per forward pass. 
It achieves up to 3.3x higher inference throughput than similarly-sized open models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507, while also being more accurate on popular benchmarks. Nemotron 3 Nano demonstrates enhanced agentic, reasoning, and chat abilities and supports context lengths up to 1M tokens. We release both our pretrained Nemotron 3 Nano 30B-A3B Base and post-trained Nemotron 3 Nano 30B-A3B checkpoints on Hugging Face.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["language model","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generalization-of-rlvr-using-causal-reasoning-as-a-testbed","title":"Generalization of RLVR Using Causal Reasoning as a Testbed","url":"https://www.microsoft.com/en-us/research/publication/generalization-of-rlvr-using-causal-reasoning-as-a-testbed/","published":"2025-12-22","authors":["Brian Lu","Hongyu Zhao","Shuo Sun","Hao Peng","Rui Ding","Hongyuan Mei"],"abstract":"Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising paradigm for post-training large language models (LLMs) on complex reasoning tasks. Yet, the conditions under which RLVR yields robust generalization remain poorly understood. This paper provides an empirical study of RLVR generalization in the setting of probabilistic inference over causal graphical models. 
This setting offers two natural axes along which to examine generalization: (i) the level of the probabilistic query -- associational, interventional, or counterfactual -- and (ii) the structural complexity of the query, measured by the size of its relevant subgraph. We construct datasets of causal graphs and queries spanning these difficulty axes and fine-tune Qwen-2.5-Instruct models using RLVR or supervised fine-tuning (SFT). We vary both the model scale (3B-32B) and the query level included in traini...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lola-long-horizon-latent-action-learning-for-general-robot-manipulation","title":"LoLA: Long Horizon Latent Action Learning for General Robot Manipulation","url":"https://www.microsoft.com/en-us/research/publication/lola-long-horizon-latent-action-learning-for-general-robot-manipulation/","published":"2025-12-22","authors":["Xiaofan Wang","Xingyu Gao","Jianlong Fu","Zuolei Li","Dean Fortier","Galen Mullins","Andrey Kolobov","Baining Guo"],"abstract":"The capability of performing long-horizon, language-guided robotic manipulation tasks critically relies on leveraging historical information and generating coherent action sequences. However, such capabilities are often overlooked by existing Vision-Language-Action (VLA) models. 
To solve this challenge, we propose LoLA (Long Horizon Latent Action Learning), a framework designed for robot manipulation that integrates long-term multi-view observations and robot proprioception to enable multi-step reasoning and action generation. We first employ Vision-Language Models to encode rich contextual features from historical sequences and multi-view observations. We further introduces a key module, State-Aware Latent Re-representation, which transforms visual inputs and language commands into actionable robot motion space. Unlike existing VLA approaches that merely concatenate robot proprioception...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Vision-language models","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7116898200","title":"DiffNMR: diffusion models for nuclear magnetic resonance spectra elucidation","url":"https://doi.org/10.1088/2752-5724/ae301f","published":"2025-12-22","authors":["Qingsong Yang","Binglan Wu","Xuwei Liu","Bo Chen","Wei Li","Gen Long","Xin Chen","Mingjun Xiao"],"abstract":"Abstract Nuclear magnetic resonance (NMR) spectroscopy is a key method for molecular structure elucidation. However, interpreting NMR spectra to deduce molecular structures remains challenging due to the complexity of spectral data and the vastness of the chemical space. 
Here we introduce DiffNMR, a novel end-to-end framework that leverages a conditional discrete diffusion model for de novo molecular structure elucidation from NMR spectra. DiffNMR refines molecular graphs iteratively through a diffusion-based generative process, ensuring global consistency and mitigating error accumulation inherent in autoregressive methods. The framework integrates a two-stage pretraining strategy that aligns spectral and molecular representations via a diffusion autoencoder and contrastive learning. It also incorporates retrieval initialization and similarity filtering during inference. Our experimenta...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1088/2752-5724/ae301f","openalex_id":"https://openalex.org/W7116898200","cited_by_count":1,"quality_score":46,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Baidu (China)","Suzhou Research Institute","The Fifth People’s Hospital of Suzhou","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C114466953","display_name":"Initialization","score":0.6952999830245972},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.5601999759674072},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.513700008392334},{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.4438999891281128},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.43849998712539673},{"id":"https://openalex.org/C66974803","display_name":"Nuclear magnetic resonance spectroscopy","score":0.438400000333786},{"id":"https://openalex.org/C4839761","display_name":"Spectral 
line","score":0.4000000059604645},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3912999927997589}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vibe-reasoning-eliciting-frontier-ai-mathematical-capabilities-a-case-study-on-imo-2025-problem-6","title":"Vibe Reasoning: Eliciting Frontier AI Mathematical Capabilities -- A Case Study on IMO 2025 Problem 6","url":"https://www.microsoft.com/en-us/research/publication/vibe-reasoning-eliciting-frontier-ai-mathematical-capabilities-a-case-study-on-imo-2025-problem-6/","published":"2025-12-21","authors":["Jiaao Wu","Xian Zhang","Fan Yang","Yinpeng Dong"],"abstract":"We introduce Vibe Reasoning, a human-AI collaborative paradigm for solving complex mathematical problems. Our key insight is that frontier AI models already possess the knowledge required to solve challenging problems -- they simply do not know how, what, or when to apply it. Vibe Reasoning transforms AI's latent potential into manifested capability through generic meta-prompts, agentic grounding, and model orchestration. We demonstrate this paradigm through IMO 2025 Problem 6, a combinatorial optimization problem where autonomous AI systems publicly reported failures. Our solution combined GPT-5's exploratory capabilities with Gemini 3 Pro's proof strengths, leveraging agentic workflows with Python code execution and file-based memory, to derive both the correct answer (2112) and a rigorous mathematical proof. 
Through iterative refinement across multiple attempts, we discovered the nece...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Mathematics","Computer science","mathematics","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/peak-a-performance-engineering-ai-assistant-for-gpu-kernels-powered-by-natural-language-transformations","title":"PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations","url":"https://www.microsoft.com/en-us/research/publication/peak-a-performance-engineering-ai-assistant-for-gpu-kernels-powered-by-natural-language-transformations/","published":"2025-12-21","authors":["M. Tariq","Abhinav Jangda","Angelica Moreira","Madan Musuvathi","Tyler Sorensen"],"abstract":"Advancements in large language models (LLMs) are showing promising impact in software development and programming assistance. However, these models struggle when operating on low-level backend code. This challenge is exacerbated in the domain of GPU kernels, where performance-critical details are coupled to rapidly evolving hardware characteristics and available code examples are sparse. In this work, we introduce PEAK, a Performance Engineering AI-Assistant for GPU Kernels powered by natural language transformations. PEAK utilizes the key insight that iterative code transformations (optimizations) can straightforwardly be written in natural language, and then carried out by LLMs. 
Thus, these transformations can be rapidly developed, encoding general portable optimizations, but also easily specialized to specific GPU devices and even kernels. These natural transformations are supported b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adapting-language-models-for-low-resource-programming-languages","title":"Adapting Language Models for Low-Resource Programming Languages","url":"https://www.microsoft.com/en-us/research/publication/adapting-language-models-for-low-resource-programming-languages/","published":"2025-12-20","authors":["Mukul Singh","Hosein Hasanbeig","Ananya Singha","Arjun Radhakrishna","Sumit Gulwani"],"abstract":"Large Language Models (LLMs) have achieved remarkable success in code generation, yet their capabilities remain predominantly concentrated in well-resourced programming languages such as Python and Java. In contrast, low-resource programming languages present a significant challenge due to limited available data and unique syntax features. In this paper, we systematically implement and evaluate four core adaptation techniques (retrieval-augmented generation, agentic architectures, tool calling and feedback guided generation) to understand how these models can be better improved for underrepresented programming languages. 
Our findings reveal that tool calling is particularly effective for low-resource languages, outperforming its performance on high-resource counterparts. Conversely, high-resource languages show a stronger preference for agentic workflows and RAG, likely due to the models...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","preference","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7117157478","title":"Attitudes toward large language model-based Artificial Intelligence systems as an information source for shared decision-making in radiation oncology","url":"https://doi.org/10.1093/oncolo/oyaf414","published":"2025-12-20","authors":["R. Moser","Lena Marie Buchecker","Jana Nano","Nina A. Mayr","S Behzadi","Sophia Kiesl","Sophie Maier","Luisa Allwohn","Jacqueline Lammert","Lisa C. Adams","Max Tschochohei","Stephanie E Combs"],"abstract":"BACKGROUND: Implementing structured shared decision-making (SDM) requires high-quality, reliable patient information. In radiation oncology, patients often have limited knowledge and misconceptions about therapy and side effects, affecting their decision-making. Large language model-based AI systems (LLMs) may help by providing evidence-based information in accessible language, but successful implementation depends on the willingness of patients and health care professionals (HCPs) to adopt these technologies. 
METHODS: A survey was conducted among patients undergoing radiation therapy and HCPs between 03/2024 and 02/2025. Data was collected using structured electronic questionnaires (32 items for patients, 35 for HCPs). The survey assessed sociodemographic characteristics, the status of SDM in oncology, sources of information relevant to SDM, and current and anticipated LLM applications....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/oncolo/oyaf414","openalex_id":"https://openalex.org/W7117157478","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["German Cancer Research Center","Google (United States)","Helmholtz Zentrum München","Michigan State University","TUM Klinikum","Technical University of Munich"],"concepts":[{"id":"https://openalex.org/C2992520072","display_name":"Radiation oncology","score":0.8145999908447266},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49630001187324524},{"id":"https://openalex.org/C18296254","display_name":"Skepticism","score":0.48080000281333923},{"id":"https://openalex.org/C2778095710","display_name":"Information source (mathematics)","score":0.367900013923645},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.34049999713897705},{"id":"https://openalex.org/C180198813","display_name":"Information system","score":0.3375999927520752},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.3255999982357025},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.31859999895095825}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4417519375","title":"A Data-Centric Perspective on 
the Lifecycle of Large Language Models","url":"https://doi.org/10.36227/techrxiv.176620610.03288677/v1","published":"2025-12-20","authors":["Jun Rao","Xuebo Liu","Haotian Yan","Junjie Shen","Haizhen Mo","Yanghaopeng Dong","Zihao Yan","Ziyi Wang","Zepeng Lin","Xiaojun Meng","Zixiong Yu","Liqun Deng"],"abstract":"This survey reframes large language model (LLM) development through a purely data-centric lens, arguing that downstream capability is shaped primarily by the quality and evolution of training data rather than parameter count. We systematically map the","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.36227/techrxiv.176620610.03288677/v1","openalex_id":"https://openalex.org/W4417519375","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Harbin Institute of Technology","Huawei Technologies (China)","Huawei Technologies (Sweden)"],"concepts":[{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.7770000100135803},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6187000274658203},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.555899977684021},{"id":"https://openalex.org/C2776207758","display_name":"Downstream (manufacturing)","score":0.45820000767707825},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4325000047683716},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41040000319480896},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.35089999437332153},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3479999899864197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417523660","title":"CoSwinNet: A conditional Swin Transformer multimodal surrogate model for subsurface multiphase flow","url":"https://doi.org/10.1016/j.fuel.2025.138067","published":"2025-12-20","authors":["Zhao Feng","Zeeshan Tariq","Zhong Zhang","Peilin Zhao","Ruize Zhao","Wenhao Wang","Xinwo Huang","Bicheng Yan","Xianda Shen","Fengshou Zhang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.fuel.2025.138067","openalex_id":"https://openalex.org/W4417523660","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["King Abdullah University of Science and Technology","Tencent (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C2779379648","display_name":"Multiphase flow","score":0.8019000291824341},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6689000129699707},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.6258999705314636},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6039999723434448},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.535099983215332},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5048999786376953},{"id":"https://openalex.org/C131675550","display_name":"Surrogate model","score":0.48100000619888306},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.47589999437332153}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":1}},{"id":"openalex:W4417527082","title":"Leveraging saliency-based pre-trained foundation model representations to uncover breathing patterns in speech","url":"https://doi.org/10.1016/j.csl.2025.101926","published":"2025-12-20","authors":["Vikramjit Mitra","Anirban Chatterjee","Kejie Zhai","Helen Y. Weng","Ayuko Hill","Nicole Hay","C. E. Webb","Jamie Cheng","Erdrin Azemi"],"abstract":"","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.csl.2025.101926","openalex_id":"https://openalex.org/W4417527082","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Apple (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8284000158309937},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.7243000268936157},{"id":"https://openalex.org/C39300077","display_name":"Breathing","score":0.6334999799728394},{"id":"https://openalex.org/C2778263558","display_name":"Microphone","score":0.5877000093460083},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5540000200271606},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.4641000032424927},{"id":"https://openalex.org/C8213797","display_name":"Respiratory rate","score":0.45809999108314514},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.45489999651908875}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:fu5g01v3bbmd6r5jbmx5wm99","title":"BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental 
Design","url":"https://machinelearning.apple.com/research/bed-llm","published":"2025-12-19","authors":["Deepro Choudhury","Sinead Williamson","Adam Goliński","Ning Miao","Freddie Bickford Smith","Michael Kirchhof","Yizhe Zhang","Tom Rainforth"],"abstract":"We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian Experimental Design...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7134199037","title":"FIRE-RAG: Focused Integration and Refinement of Evidence for Retrieval-Augmented Generation","url":"https://doi.org/10.1109/icvrv67992.2025.00107","published":"2025-12-19","authors":["Jiarui Wu","Xia Yang","Tong Wu","Ke Li","Fei Chao"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icvrv67992.2025.00107","openalex_id":"https://openalex.org/W7134199037","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.5062999725341797},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2694000005722046},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.26899999380111694},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.25060001015663147},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.24719999730587006},{"id":"https://openalex.org/C72634772","display_name":"Data integration","score":0.2460000067949295},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.24169999361038208},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.23579999804496765}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417507535","title":"High-speed X-ray tomography for 4D imaging","url":"https://doi.org/10.1073/pnas.2521089122","published":"2025-12-19","authors":["Ivan Grega","WILLIAM F. WHITNEY","V.S. Deshpande"],"abstract":"Capturing high-rate spatiotemporal deformation of materials in three dimensions (3D) remains a significant challenge with current X-ray imaging techniques. We present a methodology that combines advances in neural rendering techniques with volume correlation methods to accurately reconstruct complex, high-rate 3D spatiotemporal structural evolutions. The fidelity and versatility of the method, which requires no pretraining, are demonstrated for a diverse set of intricate 3D-printed microarchitected solids. Using laboratory-based X-ray tomography, we capture the 3D growth of a high-rate crush band on a timescale of less than 100 ms. By broadening this idea to a stereo X-ray concept, we eliminate the need to rotate the image object, thereby extending the technique to significantly faster timescales. 
Our neural rendering framework opens possibilities for 3D observations of viscoelastic resp...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1073/pnas.2521089122","openalex_id":"https://openalex.org/W4417507535","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","University of Cambridge"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.603600025177002},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5561000108718872},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.5375000238418579},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5267000198364258},{"id":"https://openalex.org/C30769735","display_name":"Volume rendering","score":0.5245000123977661},{"id":"https://openalex.org/C115635565","display_name":"Digital image correlation","score":0.350600004196167},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.33570000529289246},{"id":"https://openalex.org/C141379421","display_name":"Iterative reconstruction","score":0.3181999921798706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/aligndp-hybrid-differential-privacy-with-rarity-aware-protection-for-llms","title":"AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs","url":"https://www.microsoft.com/en-us/research/publication/aligndp-hybrid-differential-privacy-with-rarity-aware-protection-for-llms/","published":"2025-12-18","authors":["Madhava 
Gaikwad"],"abstract":"Large language models are exposed to risks of extraction, distillation, and unauthorized fine-tuning. Existing defenses use watermarking or monitoring, but these act after leakage. We design AlignDP, a hybrid privacy lock that blocks knowledge transfer at the data interface. The key idea is to separate rare and non-rare fields. Rare fields are shielded by PAC indistinguishability, giving effective zero-epsilon local DP. Non-rare fields are privatized with RAPPOR, giving unbiased frequency estimates under local DP. A global aggregator enforces composition and budget. This two-tier design hides rare events and adds controlled noise to frequent events. We prove limits of PAC extension to global aggregation, give bounds for RAPPOR estimates, and analyze utility trade-off. A toy simulation confirms feasibility: rare categories remain hidden, frequent categories are recovered with small error.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2512.17220","title":"Mindscape-Aware Retrieval Augmented Generation for Improved Long Context 
Understanding","url":"https://huggingface.co/papers/2512.17220","published":"2025-12-18","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","retrieval"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2512.16561","title":"N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models","url":"https://huggingface.co/papers/2512.16561","published":"2025-12-18","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2512.16767","title":"Make-It-Poseable: Feed-forward Latent Posing Model for 3D Humanoid Character 
Animation","url":"https://huggingface.co/papers/2512.16767","published":"2025-12-18","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"official:9672a782ca55df0f","title":"How Good is Post-Hoc Watermarking With Language Model Rephrasing?","url":"https://ai.meta.com/research/publications/how-good-is-post-hoc-watermarking-with-language-model-rephrasing/","published":"2025-12-18","authors":["Pierre Fernandez","Tom Sander","Hady Elsahar","Hongyan Chang","Tomáš Souček","Sylvestre Rebuffi","Valeriu Lacatusu","Tuan Tran","Alexandre Mourachko"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["NLP","language model"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=2"}},{"id":"official:09728316ddb52146","title":"Addendum to GPT-5.2 System Card: 
GPT-5.2-Codex","url":"https://openai.com/index/gpt-5-2-codex-system-card","published":"2025-12-18","authors":["OpenAI"],"abstract":"","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"official:aa9bb366e2df1276","title":"Project Vend: Phase two","url":"https://www.anthropic.com/research/project-vend-2","published":"2025-12-18","authors":["Anthropic"],"abstract":"In June, we revealed that we’d set up a small shop in our San Francisco office lunchroom, run by an AI shopkeeper. It was part of Project Vend, a free-form experiment exploring how well AIs could do on complex, real-world tasks. 
How has Claude's business been since we last wrote?","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic research page https://www.anthropic.com/research"}},{"id":"openalex:W4417465992","title":"Uncovering inequalities in new knowledge learning by large language models across different languages","url":"https://doi.org/10.1073/pnas.2514626122","published":"2025-12-18","authors":["Chenglong Wang","Haoyu Tang","Xiyuan Yang","Yueqi Xie","Yueqi Xie","Jina Suh","Sunayana Sitaram","Junming Huang","Yu Xie","Yu Xie","Pengjun Zhao","Zhaoya Gong"],"abstract":"As large language models (LLMs) gradually demonstrate their potential to boost productivity and become integral tools for problem-solving in daily life worldwide, understanding the linguistic inequalities they introduce is becoming increasingly important. Prior research has primarily focused on static analyses of disparities in existing knowledge and capabilities of LLMs across languages. However, LLMs are continuously evolving, acquiring new knowledge to provide current, relevant responses and deliver precise, expert-level answers in specific domains. Investigating linguistic inequalities within this dynamic learning process is, therefore, also essential. In this paper, we explore inequalities in new knowledge learning by LLMs across different languages and four key dimensions: effectiveness, transferability, prioritization, and robustness. 
Through extensive experiments in both in-conte...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1073/pnas.2514626122","openalex_id":"https://openalex.org/W4417465992","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Artificial Intelligence in Medicine (Canada)","Center for Social Sciences","China Academy of Urban Planning and Design","China Institutes of Contemporary International Relations","Hong Kong University of Science and Technology","Institute of Contemporary History","Jiangsu Provincial Urban Planning and Design Institute","Microsoft (United States)","Microsoft Research (India)","Microsoft Research (United Kingdom)","Microsoft Research Asia (China)","Ministry of Natural Resources","Peking University","Peking University Shenzhen Hospital","Princeton University","Renmin University of China","Shanghai Guanghua Hospital of Integrated Traditional Chinese and Western Medicine","Shenzhen University","University of Hong Kong","University of Illinois Urbana-Champaign","University of International Relations"],"concepts":[{"id":"https://openalex.org/C45555294","display_name":"Inequality","score":0.5383999943733215},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4577000141143799},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43709999322891235},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.41440001130104065},{"id":"https://openalex.org/C204983608","display_name":"Productivity","score":0.3718999922275543},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3479999899864197},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological 
concept)","score":0.3441999852657318},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.31200000643730164}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4417465597","title":"Evaluating the Social Impact of Generative AI Systems","url":"https://doi.org/10.1093/oxfordhb/9780198940272.013.0025","published":"2025-12-18","authors":["Irene Solaiman","Zeerak Talat","William S. Agnew","Lama Ahmad","Dylan Baker","Su Lin Blodgett","Canyu Chen","Hal Daumé","Jesse Dodge","Isabella Duan","Ellie Evans","F. Friedrich"],"abstract":"Abstract Generative artificial intelligence (AI) systems across modalities, ranging from text, code, image, audio, and video, have broad social impacts, but there is little agreement on which impacts to evaluate or how to evaluate them. In this chapter, we present a guide for evaluating base generative AI systems (i.e. systems without predetermined applications or deployment contexts). We propose a framework of two overarching categories: what can be evaluated in a system independent of context and what requires societal context. For the former, we define seven areas of interest: stereotypes and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. 
For the latter, we present five areas: trustworthiness and autonomy; inequality, marginalization, a...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1093/oxfordhb/9780198940272.013.0025","openalex_id":"https://openalex.org/W4417465597","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Allen Institute","Artificial Intelligence in Medicine (Canada)","Carnegie Mellon University","Coherent (United States)","FACE Foundation","Film Independent","German Research Centre for Artificial Intelligence","Hugging Face","Illinois Institute of Technology","Internet Society","Iowa State University","Language Science (South Korea)","Lovelace Clinic Foundation Research","Microsoft Research (United Kingdom)","Mila - Quebec Artificial Intelligence Institute","National Institute of Standards and Technology","OpenAI (United States)","Simon Fraser University","Stanford University","The University of Texas at Austin","University of Amsterdam","University of California, Berkeley","University of California, Los Angeles","University of Chicago","University of Edinburgh","University of Illinois Chicago","University of Maryland, College Park","University of Oxford"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6062999963760376},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.600600004196167},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5523999929428101},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5304999947547913},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4562000036239624},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.4424999952316284},{"id":"https://openalex.org/C93225998","display_name":"Moderation","score":0.4083999991416931},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4036000072956085}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"hf-org-paper:stepfun-ai:2512.15431","title":"Step-GUI Technical Report","url":"https://huggingface.co/papers/2512.15431","published":"2025-12-17","authors":["StepFun"],"abstract":"Recent advances in multimodal large language models unlock unprecedented opportunities for GUI automation. However, a fundamental challenge remains: how to efficiently acquire high-quality training data while maintaining annotation reliability? We introduce a self-evolving training pipeline powered by the Calibrated Step Reward System, which converts model-generated trajectories into reliable training signals through trajectory-level calibration, achieving >90% annotation accuracy with 10-100x lower cost. Leveraging this pipeline, we introduce Step-GUI, a family of models (4B/8B) that achieves state-of-the-art GUI performance (8B: 80.2% AndroidWorld, 48.5% OSWorld, 62.6% ScreenShot-Pro) while maintaining robust general capabilities. As GUI agent capabilities improve, practical deployment demands standardized interfaces across heterogeneous devices while protecting user privacy. 
To this e...","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","stepfun-ai","agent"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flashportrait-6x-faster-infinite-portrait-animation-with-adaptive-latent-prediction","title":"FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction","url":"https://www.microsoft.com/en-us/research/publication/flashportrait-6x-faster-infinite-portrait-animation-with-adaptive-latent-prediction/","published":"2025-12-17","authors":["Shuyuan Tu","Yueming Pan","Yinming Huang","Xintong Han","Zhen Xing","Qi Dai","Kai Qiu","Chong Luo","Zuxuan Wu"],"abstract":"Current diffusion-based acceleration methods for long-portrait animation struggle to ensure identity (ID) consistency. This paper presents FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while achieving up to 6x acceleration in inference speed. In particular, FlashPortrait begins by computing the identity-agnostic facial expression features with an off-the-shelf extractor. It then introduces a Normalized Facial Expression Block to align facial features with diffusion latents by normalizing them with their respective means and variances, thereby improving identity stability in facial modeling. During inference, FlashPortrait adopts a dynamic sliding-window scheme with weighted blending in overlapping areas, ensuring smooth transitions and ID consistency in long animations. 
In each context window, based on the latent va...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2512.15687","title":"Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning","url":"https://huggingface.co/papers/2512.15687","published":"2025-12-17","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"official:49d9c2816c1a2ba7","title":"Gemini 3 Flash Model Card","url":"https://deepmind.google/models/model-cards/gemini-3-flash/","published":"2025-12-17","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 3 Flash"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"apple:ml4nr4n9txipra0b59jocb1i","title":"AgREE: Agentic Reasoning for Knowledge Graph Completion on Emerging Entities","url":"https://machinelearning.apple.com/research/agentic-reasoning","published":"2025-12-17","authors":["Ruocheng Zhao","Simone Conia","Eric Peng","Min Li","Saloni Potdar"],"abstract":"Open-domain Knowledge Graph Completion (KGC) faces significant challenges in an ever-changing world, especially when considering the continual emergence of new entities in daily news. Existing approaches for KGC mainly rely on pretrained language models' parametric knowledge, pre-constructed queries, or single-step retrieval, typically requiring substantial supervision and training data. 
Even so, they often fail to capture comprehensive and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["retrieval","news"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7131853120","title":"LLMCache: Layer-Wise Caching Strategies for Accelerated Reuse in Transformer Inference","url":"https://doi.org/10.1109/ised67359.2025.11405274","published":"2025-12-17","authors":["H. Bansal"],"abstract":"Transformer-based language models have achieved remarkable performance across a wide range of tasks, yet their high inference latency poses a significant challenge for real-time and large-scale deployment. While existing caching mechanisms, such as token-level key-value caches, offer speedups in autoregressive decoding, they are limited in scope and applicability. In this paper, we present LLMCache, a novel layer-wise caching framework that accelerates transformer inference by reusing intermediate activations based on semantic similarity of input sequences. Unlike prior work, LLMCache is model-agnostic, operates across both encoder and decoder architectures, and supports caching at arbitrary transformer layers. We introduce a lightweight fingerprinting mechanism for matching semantically similar inputs and propose adaptive eviction strategies to manage cache staleness. 
Experiments on BER...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ised67359.2025.11405274","openalex_id":"https://openalex.org/W7131853120","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.821399986743927},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.777400016784668},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.5789999961853027},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.550000011920929},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.43860000371932983},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.4047999978065491},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.35569998621940613},{"id":"https://openalex.org/C79403827","display_name":"Real-time computing","score":0.3285999894142151}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7155498585","title":"Beyond the Ticker: Graph-Fused Stock Forecasting with Candlestick chart, Temporal, and Relational Intelligence","url":"https://doi.org/10.1145/3799830.3799872","published":"2025-12-17","authors":["Manali Patel","Shreya Goyal","Krupa Jariwala","Chiranjoy Chattopadhyay"],"abstract":"The objective of this work is to predict the future price movement of the stocks listed in the NIFTY-50 index of the Indian economy. The dependency of price movements on multiple factors makes an accurate prediction an inherently complex task. 
Existing methodologies leverage multiple sources of information to improve prediction efficiency. However, these multimodal approaches consider coarse-grained information and also their fusion techniques often lack interpretability and scalability. To fill this gap, we propose an end-to-end framework, the Multimodal Market Movement Prediction Network M3PNet, that harnesses the power of deep learning to extract information from multiple sources and fuse them in a non-linear manner. We have considered geometric features from candlestick charts, temporal features, and the interconnections between stocks belonging to the same sector. A novel graph-base...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3799830.3799872","openalex_id":"https://openalex.org/W7155498585","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Flame University","Sardar Vallabhbhai National Institute of Technology Surat"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.845300018787384},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6876000165939331},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6018999814987183},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5613999962806702},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4641000032424927},{"id":"https://openalex.org/C2780299701","display_name":"Stock market","score":0.4255000054836273},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.38029998540878296},{"id":"https://openalex.org/C2778136018","display_name":"Predictive 
power","score":0.37529999017715454}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/synthseg-agents-multi-agent-synthetic-data-generation-for-zero-shot-weakly-supervised-semantic-segmentation","title":"SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation","url":"https://www.microsoft.com/en-us/research/publication/synthseg-agents-multi-agent-synthetic-data-generation-for-zero-shot-weakly-supervised-semantic-segmentation/","published":"2025-12-16","authors":["Wangyu Wu","Zhenhong Chen","Xiaowei Huang","Fei Ma","Jimin Xiao"],"abstract":"Weakly Supervised Semantic Segmentation (WSSS) with image level labels aims to produce pixel level predictions without requiring dense annotations. While recent approaches have leveraged generative models to augment existing data, they remain dependent on real world training samples. In this paper, we introduce a novel direction, Zero Shot Weakly Supervised Semantic Segmentation (ZSWSSS), and propose SynthSeg Agents, a multi agent framework driven by Large Language Models (LLMs) to generate synthetic training data entirely without real images. SynthSeg Agents comprises two key modules, a Self Refine Prompt Agent and an Image Generation Agent. The Self Refine Prompt Agent autonomously crafts diverse and semantically rich image prompts via iterative refinement, memory mechanisms, and prompt space exploration, guided by CLIP based similarity and nearest neighbor diversity filtering. 
These p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Computer vision","Computer science","LLM","memory","efficient","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dynamic-rebatching-for-efficient-early-exit-inference-with-drex","title":"Dynamic Rebatching for Efficient Early-Exit Inference with DREX","url":"https://www.microsoft.com/en-us/research/publication/dynamic-rebatching-for-efficient-early-exit-inference-with-drex/","published":"2025-12-16","authors":["Xuting Liu","Daniel Alexander","Siva Kesava Reddy Kakarla","Behnaz Arzani","Vincent Liu"],"abstract":"Early-Exit (EE) is a Large Language Model (LLM) architecture that accelerates inference by allowing easier tokens to be generated using only a subset of the model's layers. However, traditional batching frameworks are ill-suited for EE LLMs, as not all requests in a batch may be ready to exit at the same time. Existing solutions either force a uniform decision on the batch, which overlooks EE opportunities, or degrade output quality by forcing premature exits. We propose Dynamic Rebatching, a solution where we dynamically reorganize the batch at each early-exit point. Requests that meet the exit criteria are immediately processed, while those that continue are held in a buffer, re-grouped into a new batch, and forwarded to deeper layers. 
We introduce DREX, an early-exit inference system that implements Dynamic Rebatching with two key optimizations: 1) a copy-free rebatching buffer that a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM","language model","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/spatia-video-generation-with-updatable-spatial-memory","title":"Spatia: Video Generation with Updatable Spatial Memory","url":"https://www.microsoft.com/en-us/research/publication/spatia-video-generation-with-updatable-spatial-memory/","published":"2025-12-16","authors":["Jinjing Zhao","Fangyun Wei","Zhening Liu","Hongyang Zhang","Chang Xu","Yan Lu"],"abstract":"Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates it through visual SLAM. This dynamic-static disentanglement design enhances spatial consistency throughout the generation process while preserving the model's ability to produce realistic dynamic entities. 
Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","memory","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/city-navigation-in-the-wild-exploring-emergent-navigation-from-web-scale-knowledge-in-mllms","title":"City Navigation in the Wild: Exploring Emergent Navigation from Web-Scale Knowledge in MLLMs","url":"https://www.microsoft.com/en-us/research/publication/city-navigation-in-the-wild-exploring-emergent-navigation-from-web-scale-knowledge-in-mllms/","published":"2025-12-16","authors":["Dwip Dalal","Utkarsh Mishra","Narendra Ahuja","Nebojsa Jojic"],"abstract":"Leveraging multimodal large language models (MLLMs) to develop embodied agents offers significant promise for addressing complex real-world tasks. However, current evaluation benchmarks remain predominantly language-centric or heavily reliant on simulated environments, rarely probing the nuanced, knowledge-intensive reasoning essential for practical, real-world scenarios. 
To bridge this critical gap, we introduce the task of Sparsely Grounded Visual Navigation, explicitly designed to evaluate the sequential decision-making abilities of MLLMs in challenging, knowledge-intensive real-world environments. We operationalize this task with CityNav, a comprehensive benchmark encompassing four diverse global cities, specifically constructed to assess raw MLLM-driven agents in city navigation. Agents are required to rely solely on visual inputs and internal multimodal reasoning to sequentially na...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/stepwise-think-critique-a-unified-framework-for-robust-and-interpretable-llm-reasoning","title":"Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning","url":"https://www.microsoft.com/en-us/research/publication/stepwise-think-critique-a-unified-framework-for-robust-and-interpretable-llm-reasoning/","published":"2025-12-16","authors":["Jiaqi Xu","Cuiling Lan","Xuejin Chen","Yan Lu"],"abstract":"Human beings solve complex problems through critical thinking, where reasoning and evaluation are intertwined to converge toward correct solutions. 
However, most existing large language models (LLMs) decouple reasoning from verification: they either generate reasoning without explicit self-checking or rely on external verifiers to detect errors post hoc. The former lacks immediate feedback, while the latter increases system complexity and hinders synchronized learning. Motivated by human critical thinking, we propose Stepwise Think-Critique (STC), a unified framework that interleaves reasoning and self-critique at each step within a single model. STC is trained with a hybrid reinforcement learning objective combining reasoning rewards and critique-consistency rewards to jointly optimize reasoning quality and self-evaluation. Experiments on mathematical reasoning benchmarks show that STC....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:huawei-noah:2512.14531","title":"VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse","url":"https://huggingface.co/papers/2512.14531","published":"2025-12-16","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","huawei-noah"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"official:11630da1b8a1b75c","title":"Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning","url":"https://ai.meta.com/research/publications/pushing-the-frontier-of-audiovisual-perception-with-large-scale-multimodal-correspondence-learning/","published":"2025-12-16","authors":["Apoorv Vyas","Heng-Jui Chang","Cheng-Fu Yang","Bernie Huang","Luya Gao","Julius Richter","Sanyuan Chen","Matt Le","Piotr Dollar","Christoph Feichtenhofer","Ann Lee","Wei-Ning Hsu"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Speech & Audio","Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=2"}},{"id":"apple:qcxufdg9frh3nmnf12md50nk","title":"UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning","url":"https://machinelearning.apple.com/research/unigen-1.5","published":"2025-12-16","authors":["Rui Tian","Mingfei Gao§","Haiming Gang","Jiasen Lu","Zhe Gan","Yinfei Yang","Zuxuan Wu§","Afshin Dehghan"],"abstract":"We present UniGen-1.5, a unified multimodal large language model (MLLM) for advanced image understanding, generation and editing. 
Building upon UniGen, we comprehensively enhance the model architecture and training pipeline to strengthen the image understanding and generation capabilities while unlocking strong image editing ability. Especially, we propose a unified Reinforcement Learning (RL) strategy that improves both image generation and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ujua2a0qa48rks0ykqhy5kks","title":"Synthetic Bootstrapped Pretraining","url":"https://machinelearning.apple.com/research/bootstrapped","published":"2025-12-16","authors":["Zitong Yang","Aonan Zhang","Hong Liu","Tatsunori Hashimoto","Emmanuel Candès","Chong Wang","Ruoming Pang"],"abstract":"We introduce Synthetic Bootstrapped Pretraining (SBP), a language model (LM) pretraining procedure that first learns a model of relations between documents from the pretraining dataset and then leverages it to synthesize a vast new corpus for joint training. 
While the standard pretraining teaches LMs to learn causal correlations among tokens within a single document, it is not designed to efficiently model the rich, learnable inter-document...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:k82gdcggd2ogcptr6iwim3ux","title":"Score Distillation of Flow Matching Models","url":"https://machinelearning.apple.com/research/score-distillation","published":"2025-12-16","authors":["Mingyuan Zhou","Yi Gu","Huangjie Zheng","Liangchen Song","Guande He","Yizhe Zhang","Wenze Hu","Yinfei Yang"],"abstract":"Diffusion models achieve high-quality image generation but are limited by slow iterative sampling. Distillation methods alleviate this by enabling one- or few-step generation. Flow matching, originally introduced as a distinct framework, has since been shown to be theoretically equivalent to diffusion under Gaussian assumptions, raising the question of whether distillation techniques such as score distillation transfer directly. 
We provide a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:x5zx8tr3bhvwk8k3zu90k0te","title":"Unified Open-World Segmentation with Multi-Modal Prompts","url":"https://machinelearning.apple.com/research/unified-open","published":"2025-12-16","authors":["Yang Liu","Yufei Yin","Chenchen Jing§","Muzhi Zhu","Hao Chen","Yuling Xi","Bo Feng","Hao Wang","Shiyu Li","Chunhua Shen"],"abstract":"Recent years have witnessed the rapid development of open-world image segmentation, including open-vocabulary segmentation and in-context segmentation. Nonetheless, existing methods are limited to a single modality prompt, which lacks the flexibility and accuracy needed for complex object-aware prompting. 
In this work, we present COSINE, a unified open-world segmentation model that Consolidates Open-vocabulary Segmentation and IN-context...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:p2mby3yfgtraulkuc33traqh","title":"Data-Centric Lessons To Improve Speech-Language Pretraining","url":"https://machinelearning.apple.com/research/data-centric-lessons","published":"2025-12-16","authors":["Vishaal Udandarao","Zhiyun Lu","Xuankai Chang","Yongqiang Wang","Violet Z. Yao","Albin Madapally Jose","Fartash Faghri","Josh Gardner","Chung-Cheng Chiu"],"abstract":"Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. 
However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4417415121","title":"Research on Human-Robot Interaction Technology Based on Gesture Recognition","url":"https://doi.org/10.26689/jera.v9i6.13197","published":"2025-12-16","authors":["Ming Hu"],"abstract":"With the growing application of intelligent robots in service, manufacturing, and medical fields, efficient and natural interaction between humans and robots has become key to improving collaboration efficiency and user experience. Gesture recognition, as an intuitive and contactless interaction method, can overcome the limitations of traditional interfaces and enable real-time control and feedback of robot movements and behaviors. This study first reviews mainstream gesture recognition algorithms and their application on different sensing platforms (RGB cameras, depth cameras, and inertial measurement units). It then proposes a gesture recognition method based on multimodal feature fusion and a lightweight deep neural network that balances recognition accuracy with computational efficiency. 
At system level, a modular human-robot interaction architecture is constructed, comprising percep...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.26689/jera.v9i6.13197","openalex_id":"https://openalex.org/W4417415121","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7404999732971191},{"id":"https://openalex.org/C159437735","display_name":"Gesture recognition","score":0.7149999737739563},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.7034000158309937},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.574400007724762},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5669000148773193},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5303999781608582},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.5073999762535095},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4726000130176544}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evicpress-joint-kv-cache-compression-and-eviction-for-efficient-llm-serving","title":"EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving","url":"https://www.microsoft.com/en-us/research/publication/evicpress-joint-kv-cache-compression-and-eviction-for-efficient-llm-serving/","published":"2025-12-15","authors":["Shaoting Feng","Yuhan Liu","Hanchen Li","Xiaokun Chen","Samuel Shen","Kuntai Du","Zhuohan Gu","Rui 
Zhang","Yuyang Huang","Yihua Cheng","Jiayi Yao","Qizheng Zhang"],"abstract":"Reusing KV cache is essential for high efficiency of Large Language Model (LLM) inference systems. With more LLM users, the KV cache footprint can easily exceed GPU memory capacity, so prior work has proposed to either evict KV cache to lower-tier storage devices, or compress KV cache so that more KV cache can be fit in the fast memory. However, prior work misses an important opportunity: jointly optimizing the eviction and compression decisions across all KV caches to minimize average generation latency without hurting quality. We propose EVICPRESS, a KV-cache management system that applies lossy compression and adaptive eviction to KV cache across multiple storage tiers. Specifically, for each KV cache of a context, EVICPRESS considers the effect of compression and eviction of the KV cache on the average generation quality and delay across all contexts as a whole. To achieve this, EVIC...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","Operating Systems","LLM","language model","memory","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/native-and-compact-structured-latents-for-3d-generation","title":"Native and Compact Structured Latents for 3D 
Generation","url":"https://www.microsoft.com/en-us/research/publication/native-and-compact-structured-latents-for-3d-generation/","published":"2025-12-15","authors":["Jianfeng Xiang","Xiaoxue Chen","Sicheng Xu","Ruicheng Wang","Zelong Lv","Yu Deng","Hongyuan Zhu","Yue Dong","Hao Zhao","Nicholas Jing Yuan","Jiaolong Yang"],"abstract":"Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with complex topologies and detailed appearance. This paper present an approach for learning a structured latent representation from native 3D data to address this challenge. At its core is a new sparse voxel structure called O-Voxel, an omni-voxel representation that encodes both geometry and appearance. O-Voxel can robustly model arbitrary topology, including open, non-manifold, and fully-enclosed surfaces, while capturing comprehensive surface attributes beyond texture color, such as physically-based rendering parameters. Based on O-Voxel, we design a Sparse Compression VAE which provides a high spatial compression rate and a compact latent space. 
We train large-scale flow-matching models compris...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Computer vision","3D generative modeling","Computer science","Computer Vision and Pattern Recognition","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/effect-of-document-packing-on-the-latent-multi-hop-reasoning-capabilities-of-large-language-models","title":"Effect of Document Packing on the Latent Multi-Hop Reasoning Capabilities of Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/effect-of-document-packing-on-the-latent-multi-hop-reasoning-capabilities-of-large-language-models/","published":"2025-12-15","authors":["Gabriele Prato","Shagun Sodhani","Alessandro Sordoni","Sarath Chandar","Alessandro Sordoni"],"abstract":"The standard practice for training large language models involves packing multiple documents together to optimize computational efficiency. However, the impact of this process on the models'capabilities remains largely unexplored. To address this gap, we investigate how different document-packing strategies influence the latent multi-hop reasoning abilities of LLMs. Our findings indicate that packing can improve model performance compared to training on individual documents, at the expense of more compute. To further understand the underlying mechanisms, we conduct an ablation study, identifying key factors that explain the advantages of packing. 
Ultimately, our research deepens the understanding of LLM training dynamics and provides practical insights for optimizing model development.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-scientific-reasoning-model-for-organic-synthesis-procedure-generation","title":"A Scientific Reasoning Model for Organic Synthesis Procedure Generation","url":"https://www.microsoft.com/en-us/research/publication/a-scientific-reasoning-model-for-organic-synthesis-procedure-generation/","published":"2025-12-15","authors":["Guoqing Liu","Junren Li","Zihan Zhao","Eray Inanc","Krzysztof Maziarz","Jose Garrido Torres","Victor Garcia Satorras","Shoko Ueda","Christopher Bishop","Marwin Segler"],"abstract":"Solving computer-aided synthesis planning is essential for enabling fully automated, robot-assisted synthesis workflows and improving the efficiency of drug discovery. A key challenge, however, is bridging the gap between computational route design and practical laboratory execution, particularly the accurate prediction of viable experimental procedures for each synthesis step. In this work, we present QFANG, a scientific reasoning language model capable of generating precise, structured experimental procedures directly from reaction equations, with explicit chain-of-thought reasoning. 
To develop QFANG, we curated a high-quality dataset comprising 905,990 chemical reactions paired with structured action sequences, extracted and processed from patent literature using large language models. We introduce a Chemistry-Guided Reasoning (CGR) framework that produces chain-of-thought data ground...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","LLM","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:MiniMaxAI:2512.13687","title":"Towards Scalable Pre-training of Visual Tokenizers for Generation","url":"https://huggingface.co/papers/2512.13687","published":"2025-12-15","authors":["MiniMax"],"abstract":"The quality of the latent space in visual tokenizers (e.g., VAEs) is crucial for modern generative models. However, the standard reconstruction-based training paradigm produces a latent space that is biased towards low-level information, leading to a foundation flaw: better pixel-level accuracy does not lead to higher-quality generation. This implies that pouring extensive compute into visual tokenizer pre-training translates poorly to improved performance in generation. We identify this as the ``pre-training scaling problem`` and suggest a necessary shift: to be effective for generation, a latent space must concisely represent high-level semantics. We present VTP, a unified visual tokenizer pre-training framework, pioneering the joint optimization of image-text contrastive, self-supervised, and reconstruction losses. 
Our large-scale study reveals two principal findings: (1) understandin...","companies":["MiniMax"],"matched_orgs":["MiniMax"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","MiniMaxAI","distillation"],"author_affiliations":["MiniMax"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/MiniMaxAI/papers"}},{"id":"bytedance-seed:1323","title":"Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model","url":"https://seed.bytedance.com/en/research/seedance-1-5-pro-a-native-audio-visual-joint-generation-foundation-model","published":"2025-12-15","authors":["Seed Vision Team"],"abstract":"Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practicalutility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning(SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by over 10×. 
Seedance 1.5 pro distinguishes itself through precisemultilingual and dialect lip-syncing, dynamic cinematic ca...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W7126067746","title":"Pangenome-Informed Language Models for Synthetic Genome Sequence Generation","url":"https://doi.org/10.1109/bibm66473.2025.11356310","published":"2025-12-15","authors":["Pengzhi Huang","François Charton","Jan-Niklas Schmelzle","Shelby S. Darnell","Pjotr Prins","Erik Garrison","G. Edward Suh"],"abstract":"Language Models (LM) have been extensively utilized for learning DNA sequence patterns and generating synthetic sequences. In this paper, we present a novel approach for the generation of synthetic DNA data using pangenomes in combination with LM. We introduce three innovative pangenome-based tokenization schemes that enhance DNA sequence generation. 
Our experimental results demonstrate the superiority of pangenome-based tokenization over classical methods in generating high-utility synthetic DNA sequences, highlighting significant improvements in training efficiency and sequence quality.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bibm66473.2025.11356310","openalex_id":"https://openalex.org/W7126067746","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Cornell University","Milieux environnementaux, transferts et interactions dans les hydrosystèmes et les sols","Nvidia (United States)","University of Tennessee Health Science Center"],"concepts":[{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.6819999814033508},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6615999937057495},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.5680000185966492},{"id":"https://openalex.org/C51679486","display_name":"DNA sequencing","score":0.5613999962806702},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.484499990940094},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.48339998722076416},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4260999858379364},{"id":"https://openalex.org/C176982825","display_name":"Lexical analysis","score":0.42579999566078186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7126120739","title":"Harnessing Large Models, Distilling to Small: Localized Deployment for Accurate Medical Prescription Diagnostic 
Inference","url":"https://doi.org/10.1109/bibm66473.2025.11357089","published":"2025-12-15","authors":["X H Guo","R. Huang F. L. Zhou","Pujun Feng","Yang Liu","Yuxue Qi","Tian Yang","Bin Cu"],"abstract":"Diagnostic errors impose substantial healthcare costs. To address this, we propose BELL, a framework that leverages LLMs for data augmentation and distills knowledge into compact BERT models for efficient deployment. Our twostage framework first standardizes non-uniform clinical terms using fine-tuned BERT models, followed by multi-label disease prediction incorporating prescription data. Experiments on realworld anonymized data demonstrate BELL achieves 94.27 % standardization accuracy and improves diagnostic F1-score from 0.45 to 0.73, with 0.678 s average inference time.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bibm66473.2025.11357089","openalex_id":"https://openalex.org/W7126120739","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Academy of State Administration of Grain","Baidu (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C188087704","display_name":"Standardization","score":0.7159000039100647},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6862000226974487},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6765000224113464},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5975000262260437},{"id":"https://openalex.org/C2426938","display_name":"Medical prescription","score":0.476500004529953},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43970000743865967},{"id":"https://openalex.org/C124101348","display_name":"Data 
mining","score":0.43790000677108765},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.39489999413490295}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7126015404","title":"Exploratory Analysis of the Regulation of Long Non-Coding RNA Transcription with Nucleotide Large Language Models","url":"https://doi.org/10.1109/bibm66473.2025.11357206","published":"2025-12-15","authors":["Wei Wang","Zhichao Hou","Tianfu Matt Wu","Dage Liu","Xinxia Peng"],"abstract":"Large language models (LLMs) have emerged as powerful tools for biological sequence analysis. However, their applicability to the transcriptional regulation of long non-coding RNAs (IncRNAs) remains underexplored due to the complexity and diversity of IncRNA sequences, combined with limited knowledge of their regulatory mechanisms and functional characteristics. In this study, we systematically evaluate both singletask and multi-task fine-tuning strategies of genome foundation models across four tasks designed to capture increasing biological complexity. By fine-tuning genome foundation models on a series of progressively complex tasks, each designed to closely mimic the complexities of IncRNA classification, we explore how task complexity impacts model performance and biological interpretability. 
Our findings reveal that while foundation models capture promoter-specific signals, task co...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bibm66473.2025.11357206","openalex_id":"https://openalex.org/W7126015404","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Meta (United States)","North Carolina State University"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.7243000268936157},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.5706999897956848},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.4584999978542328},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4431000053882599},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43209999799728394},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.4268999993801117},{"id":"https://openalex.org/C179926584","display_name":"Transcription (linguistics)","score":0.41280001401901245},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.4043000042438507}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7115166209","title":"Connecting the Impact of Silent Data Corruption With Different Training Characteristics: An Empirical Study","url":"https://doi.org/10.1109/mm.2025.3642709","published":"2025-12-15","authors":["Hengzhi Pei","Leonard Lausen","George Karypis"],"abstract":"Despite the non-negligible occurrence of Silent Data Corruption (SDC) during large-scale training of Large Language Models (LLMs), SDC impact on training lacks 
systematic understanding. This article empirically analyzes the connections between different training characteristics and the impact of SDC on LLM training. Using deterministic training workloads on real-world SDC-affected hardware, we quantify SDC impact by measuring the difference from the baselines on healthy hardware and provide insights into training robustness against SDC by systematically controlled experiments. We find that SDC impact correlates strongly with training stability and loss landscape regions, with Not-a-Number (NaN) occurring during training larger models. We further study if setting elementwise gradient bounds can mitigate SDC impact considering that SDC can change gradients by large magnitudes. Our results....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mm.2025.3642709","openalex_id":"https://openalex.org/W7115166209","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.8009999990463257},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7835000157356262},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.7454000115394592},{"id":"https://openalex.org/C46355384","display_name":"Compromise","score":0.597100019454956},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.5512999892234802},{"id":"https://openalex.org/C120936955","display_name":"Empirical research","score":0.531000018119812},{"id":"https://openalex.org/C2780027415","display_name":"Language change","score":0.5210999846458435},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning 
theory)","score":0.5030999779701233}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417337751","title":"AnyoneCue: Gloss-Prompted Fine-Grained and Personalized Cued Speech Video Generation","url":"https://doi.org/10.1109/taslpro.2025.3641284","published":"2025-12-15","authors":["Li Liu","Wentao Lei","Jun Wang","Wenwu Wang"],"abstract":"Cued Speech (CS) is a visual coding system, which combines lip-reading with several specific hand codings to help hearing-impaired people to communicate effectively. Generating CS videos from audio speech and text can significantly improve accessibility and communication for individuals with hearing impairments. However, existing video generation methods pri marily concentrate on general gestures, such as human walking, and hence are not directly suitable for generating CS videos. Moreover, current approaches struggle to produce realistic, fine grained, personalized videos adhering to specific CS coding rules. To address these challenges, firstly, we propose a Gloss-based Diffusion Pose Generation Model (GlossDiff), where the gloss is a novel CS motion parsing prompt to integrate additional linguistic rules knowledge into the CS pose generation model. 
The glosses are automatically genera...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3641284","openalex_id":"https://openalex.org/W4417337751","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Jingdong (China)","Tencent (China)","University of Surrey"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8300999999046326},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5892000198364258},{"id":"https://openalex.org/C83195618","display_name":"Cued speech","score":0.5493000149726868},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.515999972820282},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.5127999782562256},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.49239999055862427},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.48840001225471497},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.45210000872612}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7115582381","title":"A Contrastive Feedback Loops-Based Unified Framework For Generating Self-Improving LLM Agents Using PL-RL","url":"https://doi.org/10.36227/techrxiv.176583871.16158474/v1","published":"2025-12-15","authors":["Ravi Kiran Vadlamani","DPU","Mahesh Reddy Konatham"],"abstract":"LLM has made significant progress in text summarization. Existing works did not focus on fine-grained credit attribution mechanisms in contrastive feedback loops. 
Therefore, a unified framework based on contrastive feedback loops using PL-RL is proposed for self-improved LLM. Initially, the texts are pre-processed, followed by topic modeling and feature extraction. Now, based on the features, the LLM model is trained to showcase its stability. During training, the learning module effectively learns the input data with contrastive feedback using PL-RL. Then, to support sequential learning, G-LORA is introduced. Now, DEMA-DDM is applied to monitor the model drift while selecting optimal hyperparameters using TM-SSO. If instability is detected, then the learning module is triggered through contrastive feedback until the model regains stability. Finally, FSLGPT-2 produces summarized texts wi...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.36227/techrxiv.176583871.16158474/v1","openalex_id":"https://openalex.org/W7115582381","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","PayPal (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7753000259399414},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6265000104904175},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6078000068664551},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5831000208854675},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.45559999346733093},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.40630000829696655},{"id":"https://openalex.org/C8642999","display_name":"Hyperparameter","score":0.3758000135421753},{"id":"https://openalex.org/C45493050","display_name":"Unified 
Model","score":0.3725000023841858}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7124179532","title":"HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference","url":"https://doi.org/10.1109/icpads67057.2025.11322882","published":"2025-12-14","authors":["Haoran Lin","Xianzhi Yu","Kang Zhao","Han Bao","Zongyuan Zhan","Ting Hu","Wulong Liu","Zekun Yin","Xin Li","Weiguo Liu"],"abstract":"Current inference systems for Mixture-of-Experts (MoE) models primarily employ static parallelization strategies. However, these static approaches cannot consistently achieve optimal performance across different inference scenarios, as they lack the flexibility to adapt to varying computational requirements. In this work, we propose HAP (Hybrid Adaptive Parallelism), a novel method that dynamically selects hybrid parallel strategies to enhance MoE inference efficiency. The fundamental innovation of HAP lies in hierarchically decomposing MoE architectures into two distinct computational modules: the Attention module and the Expert module, each augmented with a specialized inference latency simulation model. This decomposition promotes the construction of a comprehensive search space for seeking model parallel strategies. 
By leveraging Integer Linear Programming (ILP), HAP could solve the....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icpads67057.2025.11322882","openalex_id":"https://openalex.org/W7124179532","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Shandong University"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.8328999876976013},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7793999910354614},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5551000237464905},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5188000202178955},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.4442000091075897},{"id":"https://openalex.org/C124681953","display_name":"Decomposition","score":0.4043000042438507},{"id":"https://openalex.org/C2781172179","display_name":"Parallelism (grammar)","score":0.37689998745918274},{"id":"https://openalex.org/C66024118","display_name":"Computational model","score":0.337799996137619}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417299246","title":"MuMu-LLaMA: Multi-modal music understanding and generation via large language models","url":"https://doi.org/10.1016/j.eswa.2025.130688","published":"2025-12-13","authors":["Shansong Liu","Q. M. 
Jonathan Wu","Atin Sakkeer Hussain","Ying Shan","Chenshuo Sun"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.eswa.2025.130688","openalex_id":"https://openalex.org/W4417299246","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["National University of Singapore","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7861999869346619},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6881999969482422},{"id":"https://openalex.org/C73520026","display_name":"Pop music automation","score":0.5875999927520752},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5098000168800354},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.4902999997138977},{"id":"https://openalex.org/C2777946086","display_name":"Music information retrieval","score":0.46380001306533813},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4140999913215637},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4124000072479248}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"apple:k508j615p1iieqbpwa8pz1y3","title":"Reusing Pre-Training Data at Test Time is a Compute Multiplier","url":"https://machinelearning.apple.com/research/compute-multiplier","published":"2025-12-12","authors":["Alex Fang","Thomas Voice","Ruoming Pang","Ludwig Schmidt","Tom Gunter"],"abstract":"Large language models learn from their vast pre-training corpora, gaining the ability to solve an ever increasing variety of tasks; yet although researchers work to improve these 
datasets, there is little effort to understand how efficient the pre-training apparatus is at extracting ideas and knowledge from the data. In this work, we use retrieval augmented generation along with test-time compute as a way to quantify how much dataset value was...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:gcmn5p2bqlqiaelemjki92cj","title":"IMPACT: Inflectional Morphology Probes Across Complex Typologies","url":"https://machinelearning.apple.com/research/inflectional-morphology-probes","published":"2025-12-12","authors":["Mohammed J. Saeed","Tommi Vehvilainen","Evgeny Fedoseev","Sevil Caliskan","Tatiana Vodolazova"],"abstract":"Large Language Models (LLMs) have shown significant progress on various multilingual benchmarks and are increasingly used to generate and evaluate text in non-English languages. However, while they may produce fluent outputs, it remains unclear to what extent these models truly grasp the underlying linguistic complexity of those languages, particularly in morphology. 
To investigate this, we introduce IMPACT, a synthetically generated evaluation...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7135062144","title":"Weighted Spatial Encoding Grids for Digital Twin City Applications Using Implicit Neural Representations of Environmental Factors and Geometric Complexity","url":"https://doi.org/10.1109/cait68620.2025.11424839","published":"2025-12-12","authors":["Pengfei Xu","Bin Zhang","Xuren Deng","Su Zhou","Ranyu Chen","Huiming Luo"],"abstract":"With the rapid development of digital twin city platforms, urban multimodal data have grown explosively, making efficient spatial querying and computation a critical challenge. Conventional grid-based encoding methods hierarchically partition 3D space but struggle to capture nonlinear geometric features and exhibit low query efficiency in large-scale scenarios. This paper proposes a weighted spatial grid partitioning method that integrates geometric complexity with environmental factors. A neural network is used to learn a continuous mapping from spatial coordinates to environmental factor intensities, enabling interpolation at arbitrary positions. These outputs, combined with geometric complexity, drive a weighting model that adaptively subdivides high-importance grids. 
Considered factors include airflow, illumination, magnetic field, and GPS signals, with support for incremental extens...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cait68620.2025.11424839","openalex_id":"https://openalex.org/W7135062144","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Shenzhen Technology University","Shenzhen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6690000295639038},{"id":"https://openalex.org/C187691185","display_name":"Grid","score":0.6383000016212463},{"id":"https://openalex.org/C42812","display_name":"Partition (number theory)","score":0.6309999823570251},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.5860000252723694},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.5726000070571899},{"id":"https://openalex.org/C183115368","display_name":"Weighting","score":0.5701000094413757},{"id":"https://openalex.org/C137800194","display_name":"Interpolation (computer graphics)","score":0.5249999761581421},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.5216000080108643}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7114894699","title":"ThinkBox: Integrating generative artificial intelligence into graduate studies in administration: impacts, ethical dilemmas, and methodological paths","url":"https://doi.org/10.1108/rege-10-2025-218","published":"2025-12-12","authors":["Emílio José Montero Arruda Filho","Ricardo Limongi","Mark Michael Lennon"],"abstract":"The rapid integration of artificial intelligence 
(AI), particularly Generative AI (GenAI), into graduate programs in business administration is transforming teaching and research. Tools such as ChatGPT, Gemini, Claude and other text-generation models now assist with tasks once entirely manual, from literature reviews to statistical analyses. This progress raises profound ethical and methodological dilemmas: How can academic integrity and originality be preserved when algorithms contribute to the production of knowledge? Graduate programs worldwide are debating policies for the responsible use of GenAI, but the pace of discussion varies across contexts. Globally, the adoption of AGI in higher education is expanding rapidly, with a 43% increase in applications between 2018 and 2022 (Zawacki-Richter, Marín, Bond and Gouverneur, 2019). Since the release of ChatGPT, growth has accelerated fur...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1108/rege-10-2025-218","openalex_id":"https://openalex.org/W7114894699","cited_by_count":0,"quality_score":41,"matched_keywords":["long-term"],"author_affiliations":["Amazon (United States)","Amazon Research Foundation","Clarion University","Universidade Federal de Goiás","Universidade Federal do Pará"],"concepts":[{"id":"https://openalex.org/C55587333","display_name":"Engineering ethics","score":0.6241000294685364},{"id":"https://openalex.org/C2777526511","display_name":"Pace","score":0.6126000285148621},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6104000210762024},{"id":"https://openalex.org/C158154518","display_name":"Relevance 
(law)","score":0.5967000126838684},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.45910000801086426},{"id":"https://openalex.org/C2776950860","display_name":"Originality","score":0.3937000036239624},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.39309999346733093},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.35839998722076416}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:fwgxlktt1rdud6uhneys8gnr","title":"COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization","url":"https://machinelearning.apple.com/research/multi-turn-benchmark","published":"2025-12-11","authors":["Tian Qin","Felix Bai","Ting-Yao Hu","Raviteja Vemulapalli","Hema Swetha Koppula","Zhiyang Xu","Bowen Jin§","Mert Cemri¶","Jiarui Lu","Zirui Wang","Meng Cao"],"abstract":"Real-world large language model (LLM) agents must master strategic tool use and user preference optimization through multi-turn interactions to assist users with complex planning tasks. We introduce COMPASS (Constrained Optimization through Multi-turn Planning and Strategic Solutions), a benchmark that evaluates agents on realistic travel-planning scenarios. 
We cast travel planning as a constrained preference optimization problem, where agents...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","language model","preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:f616ab3708b30e3d","title":"Update to GPT-5 System Card: GPT-5.2","url":"https://openai.com/index/gpt-5-system-card-update-gpt-5-2","published":"2025-12-11","authors":["OpenAI"],"abstract":"GPT-5.2 is the latest model family in the GPT-5 series. The comprehensive safety mitigation approach for these models is largely the same as that described in the GPT-5 System Card and GPT-5.1 System Card. 
Like OpenAI’s other models, the GPT-5.2 models were trained on diverse datasets, including information that is publicly available on the internet, information that we partner with third parties to access, and information that our users or human trainers and researchers provide or generate.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:zgad5w5boz5jsgq9deboel7y","title":"Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference","url":"https://machinelearning.apple.com/research/mirror","published":"2025-12-11","authors":["Nikhil Bhendawade","Kumari Nishu","Arnav Kundu","Chris Bartels","Minsik Cho","Irina Belousova"],"abstract":"Speculative decoding accelerates LLM inference by using a draft model to look ahead, but gains are capped by the cost of autoregressive draft generation: increasing draft size elevates acceptance rates but introduces additional latency overhead exacerbating the speed-accuracy tradeoff. Prior methods (Medusa, Hydra, EAGLE) partially reduce draft cost but either degrade acceptance or introduce overheads that limit scaling. 
We present Mirror...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:k862q4u6ca2hbprz5s7ufbxj","title":"GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning","url":"https://machinelearning.apple.com/research/grace","published":"2025-12-11","authors":["Silvia Sapora","Devon Hjelm","Alexander Toshev","Omar Attia","Bogdan Mazoure"],"abstract":"Inverse Reinforcement Learning aims to recover reward models from expert demonstrations, but traditional methods yield \"black-box\" models that are difficult to interpret and debug. In this work, we introduce GRACE (Generating Rewards As CodE), a method for using Large Language Models within an evolutionary search to reverse-engineer an interpretable, code-based reward function directly from expert trajectories. 
The resulting reward function is...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:h8kq35d6mcq7gfa8tg5wlnrw","title":"MoE-PHDS: One MoE Checkpoint for Flexible Runtime Sparsity","url":"https://machinelearning.apple.com/research/moe-phds","published":"2025-12-11","authors":["Lauren A. Hannah","Soheil Zibakhsh","Kumari Nishu","Arnav Kundu","Mohammad Samragh Razlighi","Mehrdad Farajtabar","Minsik Cho"],"abstract":"Sparse Mixtures of Experts (MoEs) are typically trained to operate at a fixed sparsity level, e.g. k in a top-k gating function. This global sparsity level determines an operating point on the accuracy/latency curve; currently, meeting multiple efficiency targets means training and maintaining multiple models. 
This practice complicates serving, increases training and maintenance costs, and limits flexibility in meeting diverse latency,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ht7oaieob3qa922ims99vkbn","title":"DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation","url":"https://machinelearning.apple.com/research/dit-air","published":"2025-12-11","authors":["Chen Chen","Rui Qian","Wenze Hu","Tsu-Jui Fu","Jialing Tong","Xinze Wang","Lezhi Li","Bowen Zhang","Alex Schwing","Wei Liu","Yinfei Yang"],"abstract":"In this work, we empirically study Diffusion Transformers (DiTs) for text-to-image generation, focusing on architectural choices, text-conditioning strategies, and training protocols. We evaluate a range of DiT-based architectures--including PixArt-style and MMDiT variants--and compare them with a standard DiT variant which directly processes concatenated text and noise inputs. 
Surprisingly, our findings reveal that the performance of standard...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:twkhgn9gvp8wvg38hhcgsnhw","title":"Assessing the Role of Data Quality in Training Bilingual Language Models","url":"https://machinelearning.apple.com/research/data-quality-bilingual-lms","published":"2025-12-11","authors":["Skyler Seto","Maartje ter Hoeve","Maureen de Seyssel","David Grangier"],"abstract":"Bilingual and multilingual language models offer a promising path toward scaling NLP systems across diverse languages and users. However, their performance often varies wildly between languages as prior works show that adding more languages can degrade performance for some languages (such as English), while improving others (typically more data constrained languages). 
In this work, we investigate causes of these inconsistencies by comparing...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4417241857","title":"SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips","url":"https://doi.org/10.1145/3760250.3762217","published":"2025-12-11","authors":["Xinyu Lian","Masahiro Tanaka","Olatunji Ruwase","Minjia Zhang"],"abstract":"The emergence of Superchips represents a significant advancement in next-generation AI hardware. These Superchips employ a tightly coupled heterogeneous architecture that integrates GPU and CPU on the same package, which offers unprecedented computational power. However, there has been scant research investigating how LLM training benefits from this new architecture. In this work, for the first time, we study LLM training solutions based on offloading for Superchips. We observe important differences between Superchips and traditional loosely-coupled GPU-CPU architecture, which necessitate revisiting prevailing assumptions about offloading. Based on that, we present SuperOffload, a Superchip-centric offloading system that simultaneously uses Hopper GPU, Grace CPU, and NVLink-C2C interconnect more efficiently. 
SuperOffload accomplishes this via a combination of techniques, such as adaptive...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3760250.3762217","openalex_id":"https://openalex.org/W4417241857","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Bellevue Hospital Center","Microsoft (United States)","University of Illinois System"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.7807000279426575},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7432000041007996},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.5295000076293945},{"id":"https://openalex.org/C163258240","display_name":"Power (physics)","score":0.5228000283241272},{"id":"https://openalex.org/C2781172179","display_name":"Parallelism (grammar)","score":0.510699987411499},{"id":"https://openalex.org/C157764524","display_name":"Throughput","score":0.477400004863739},{"id":"https://openalex.org/C118524514","display_name":"Computer architecture","score":0.4146000146865845},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4034000039100647}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4417239216","title":"Large-scale generative tumor synthesis in computed tomography images for improving tumor recognition","url":"https://doi.org/10.1038/s41467-025-66071-6","published":"2025-12-11","authors":["Linshan Wu","Jiaxin Zhuang","Yanning Zhou","Sunan He","Jiabo Ma","Luyang Luo","Xi Wang","Xuefeng Ni","Xiaoling Zhong","Mingxiang Wu","Yinghua Zhao","Xiaohui Duan"],"abstract":"AI-driven tumor recognition unlocks new possibilities for 
precise tumor screening and diagnosis. However, the progress is heavily hampered by the scarcity of annotated datasets, demanding extensive efforts by radiologists. To this end, we introduce FreeTumor, a Generative AI framework to enable large-scale tumor synthesis for mitigating data scarcity. Specifically, FreeTumor effectively leverages limited labeled data and large-scale unlabeled data for training. Unleashing the power of large-scale data, FreeTumor is capable of synthesizing a large number of realistic tumors for augmenting training datasets. We curate a large-scale dataset comprising 161,310 Computed Tomography (CT) volumes for tumor synthesis and recognition, with only 2.3% containing annotated tumors. 13 board-certified radiologists are engaged to discern between synthetic and real tumors, rigorously validating the quali...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-025-66071-6","openalex_id":"https://openalex.org/W4417239216","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","City University of Hong Kong, Shenzhen Research Institute","Harvard University","Hong Kong University of Science and Technology","Institut de Recherche et d’Innovation","ShenZhen People’s Hospital","Sun Yat-sen Memorial Hospital","Sun Yat-sen University","Tencent (China)","Third Affiliated Hospital of Southern Medical University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7317000031471252},{"id":"https://openalex.org/C2989087649","display_name":"Image synthesis","score":0.6606000065803528},{"id":"https://openalex.org/C544519230","display_name":"Computed 
tomography","score":0.650600016117096},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6165000200271606},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5979999899864197},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.47760000824928284},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4300000071525574},{"id":"https://openalex.org/C3020616263","display_name":"Tumor cells","score":0.37470000982284546}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4417248995","title":"An open-source bio-logger for studying cetacean behavior and communication","url":"https://doi.org/10.1371/journal.pone.0337093","published":"2025-12-11","authors":["Daniel M. Vogt","Joseph DelPreto","Michael Salino-Hugg","Matthew R. Cummings","Michael A. Bell","Aidan Kenny","Peter Malkin","Alyssa M. Hernandez","A. J. WRIGHT","Molly A. Duncan","Matthew R. Davidsen","K.K. Grewal"],"abstract":"Over the past decade, bioacoustics associated with diverse marine life has become the focus of increasing research. While fixed acoustic devices play important roles in characterizing localized soundscapes, animal-worn devices that record audio alongside physiological metrics provide richer portals to understanding cetacean communication and characterizing sounds in their environment. To facilitate scaling the collection of such multimodal datasets for deep learning applications and to encourage rapid prototyping for new recording capabilities, we present an open-source non-invasive bio-logger that can be deployed on marine animals to record high-quality audio synchronized with an extensible suite of behavioral and environmental sensors. The current implementation is tailored to investigating sperm whale communication and biology. 
It features four suction cups, three high-bandwidth synch...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1371/journal.pone.0337093","openalex_id":"https://openalex.org/W4417248995","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Baruch College","Carleton University","Dominican College of Blauvelt","Google (United States)","Harvard University","Kingston Technology (United States)","Massachusetts Institute of Technology","The Graduate Center, CUNY","University of Haifa","University of Zagreb"],"concepts":[{"id":"https://openalex.org/C34951282","display_name":"Bioacoustics","score":0.6998000144958496},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6480000019073486},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.5404000282287598},{"id":"https://openalex.org/C2988419192","display_name":"Animal behavior","score":0.46309998631477356},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.445499986410141},{"id":"https://openalex.org/C2776123653","display_name":"Sperm whale","score":0.4074999988079071},{"id":"https://openalex.org/C2777704720","display_name":"Whale","score":0.39739999175071716},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3952000141143799}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7131652959","title":"COSTAR: Cloud-Observed Safety and Trust-Aware Agentic Reasoning for Enterprise Workflows","url":"https://doi.org/10.1109/icaides67265.2025.11404085","published":"2025-12-11","authors":["Sowjanya Pandruju"],"abstract":"Agentic AI systems are moving from lab demos to production use, yet 
reproducibility, observability, and policy conformance remain open problems. I present COSTAR, a cloud-observed, safety- and trust-aware agentic orchestration layer that couples a deliberative planner with a verified executor, instrumented end-to-end with standard traces and an immutable Action Ledger. COSTAR integrates with Amazon Bedrock Agents for tool-use and knowledge grounding while adding a dual-loop controller: a Deliberation Loop for plan generation and reflection, and an Execution Loop with policy checks, risk budgets, and rollback. I evaluate COSTAR on GAIA, AgentBench, and SWE-bench style tasks, plus enterprise workflows (ticket triage, CRM updates, cloud change requests). COSTAR improves task success under policy constraints while reducing unsafe actions and cost variance versus strong agentic baselines.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icaides67265.2025.11404085","openalex_id":"https://openalex.org/W7131652959","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.608299970626831},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5934000015258789},{"id":"https://openalex.org/C2776946740","display_name":"Deliberation","score":0.5640000104904175},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.49880000948905945},{"id":"https://openalex.org/C199168358","display_name":"Orchestration","score":0.4968000054359436},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.460999995470047},{"id":"https://openalex.org/C2780791683","display_name":"Action 
(physics)","score":0.4580000042915344},{"id":"https://openalex.org/C2780154230","display_name":"Undo","score":0.4259999990463257}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7114897931","title":"Agentic AI in Research: Regional Analysis of Community Priorities","url":"https://doi.org/10.15497/RDA00146","published":"2025-12-11","authors":["Clare, Connie","Hanahoe, Hilary","Sharma, Curtis J M","Collinson, Marcy"],"abstract":"<p><span>Regional breakdown of survey responses (<em>n</em>=83) </span>from a global community consultation on agentic AI in research conducted by the Research Data Alliance (RDA), in collaboration with Microsoft Research, in November 2025. This dataset presents weighted usefulness scores and rankings for eleven proposed AI agents across continents.</p>","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"report","doi":"https://doi.org/10.15497/rda00146","openalex_id":"https://openalex.org/W7114897931","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C2778431023","display_name":"Alliance","score":0.5669999718666077},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.3840999901294708},{"id":"https://openalex.org/C39549134","display_name":"Public relations","score":0.375},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.3531000018119812},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3495999872684479},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3472999930381775},{"id":"https://openalex.org/C133462117","display_name":"Data 
collection","score":0.34310001134872437},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.32190001010894775}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7114929454","title":"Agentic AI in Research: Claude.ai Prompts for Qualitative Data Analysis","url":"https://doi.org/10.15497/RDA00148","published":"2025-12-11","authors":["Clare, Connie","Hanahoe, Hilary","Sharma, Curtis J M","Collinson, Marcy","Payton, Ryan"],"abstract":"<p><span>Documentation of Claude.ai prompts used for thematic analysis and clustering of qualitative free-text survey responses from </span>a global community consultation on agentic AI in research conducted by the Research Data Alliance (RDA), in collaboration with Microsoft Research, in November 2025. This file provides transparency in the AI-assisted analytical methodology, including prompts for sentiment analysis, theme extraction, and response categorisation of community feedback on proposed agentic AI agents.</p>","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.15497/rda00148","openalex_id":"https://openalex.org/W7114929454","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C74196892","display_name":"Thematic analysis","score":0.5968000292778015},{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.5087000131607056},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5081999897956848},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.5004000067710876},{"id":"https://openalex.org/C33566652","display_name":"Theme 
(computing)","score":0.487199991941452},{"id":"https://openalex.org/C190248442","display_name":"Qualitative research","score":0.4779999852180481},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.43299999833106995},{"id":"https://openalex.org/C87156501","display_name":"Qualitative property","score":0.42419999837875366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7114915269","title":"AI in Research: Information Session Interactive Polling Results","url":"https://doi.org/10.15497/RDA00149","published":"2025-12-11","authors":["Clare, Connie","Hanahoe, Hilary","Sharma, Curtis J M","Collinson, Marcy","Payton, Ryan"],"abstract":"<p><span>Mentimeter polling results from four online information sessions held in November 2025 as part of the RDA-Microsoft global community consultation on agentic AI in research. This dataset captures real-time participant responses to questions about the future of agentic AI in research, including opportunities, concerns, and live prioritisation of proposed AI agents. 
Data collected from sessions on November 11, 12, 13, and 27, 2025.</span></p>","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.15497/rda00149","openalex_id":"https://openalex.org/W7114915269","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C204854418","display_name":"Polling","score":0.8659999966621399},{"id":"https://openalex.org/C2779182362","display_name":"Session (web analytics)","score":0.7515000104904175},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6270999908447266},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4246000051498413},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.36000001430511475},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32260000705718994},{"id":"https://openalex.org/C87190583","display_name":"Polling system","score":0.32179999351501465},{"id":"https://openalex.org/C3019144022","display_name":"Questions and answers","score":0.27469998598098755}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2512.10675","title":"Evaluating Gemini Robotics Policies in a Veo World Simulator","url":"https://huggingface.co/papers/2512.10675","published":"2025-12-11","authors":["Gemini Robotics Team","Coline Devin","Yilun Du","Debidatta Dwibedi","Ruiqi Gao","Abhishek Jindal","Thomas Kipf","Sean Kirmani","Fangchen Liu","Anirudha Majumdar","Andrew Marmon","Carolina Parada"],"abstract":"Generative world models hold significant potential for simulating interactions with visuomotor policies in varied environments. 
Frontier video models can enable generation of realistic observations and environment interactions in a scalable and general manner. However, the use of video models in robotics has been limited primarily to in-distribution evaluations, i.e., scenarios that are similar to ones used to train the policy or fine-tune the base video model. In this report, we demonstrate that video models can be used for the entire spectrum of policy evaluation use cases in robotics: from assessing nominal performance to out-of-distribution (OOD) generalization, and probing physical and semantic safety. We introduce a generative evaluation system built upon a frontier video foundation model (Veo). The system is optimized to support robot action conditioning and multi-view consistency...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-the-dynamics-of-multi-agent-llm-communities-driven-by-value-diversity","title":"On the Dynamics of Multi-Agent LLM Communities Driven by Value Diversity","url":"https://www.microsoft.com/en-us/research/publication/on-the-dynamics-of-multi-agent-llm-communities-driven-by-value-diversity/","published":"2025-12-10","authors":["Muhua Huang","Qinlin Zhao","Xiaoyuan Yi","Xing Xie"],"abstract":"As Large Language Models (LLM) based multi-agent systems become increasingly prevalent, the collective behaviors, e.g., collective intelligence, of such artificial communities have drawn growing attention. 
This work aims to answer a fundamental question: How does diversity of values shape the collective behavior of AI communities? Using naturalistic value elicitation grounded in the prevalent Schwartz's Theory of Basic Human Values, we constructed multi-agent simulations where communities with varying numbers of agents engaged in open-ended interactions and constitution formation. The results show that value diversity enhances value stability, fosters emergent behaviors, and brings more creative principles developed by the agents themselves without external guidance. However, these effects also show diminishing returns: extreme heterogeneity induces instability. This work positions value...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cosplan-corrective-sequential-planning-via-scene-graph-incremental-updates","title":"CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates","url":"https://www.microsoft.com/en-us/research/publication/cosplan-corrective-sequential-planning-via-scene-graph-incremental-updates/","published":"2025-12-10","authors":["Shresth Grover","P. Pathak","Akash Kumar","Vibhav Vineet","Y. S. 
Rawat"],"abstract":"Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities but remain largely unexplored in visual sequential planning, i.e., executing multi-step actions towards a goal. Additionally, practical sequential planning often involves non-optimal (erroneous) steps, challenging VLMs to detect and correct such steps. We propose Corrective Sequential Planning Benchmark (CoSPlan) to evaluate VLMs in error-prone, vision-based sequential planning tasks across 4 domains: maze navigation, block rearrangement, image reconstruction,and object reorganization. CoSPlan assesses two key abilities: Error Detection (identifying non-optimal action) and Step Completion (correcting and completing action sequences to reach the goal). Despite using state-of-the-art reasoning techniques such as Chain-of-Thought and Scene Graphs, VLMs (e.g. Intern-VLM and Qwen2) struggle on CoSPlan,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","Vision-language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/causal-reasoning-favors-encoders-on-the-limits-of-decoder-only-models","title":"Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models","url":"https://www.microsoft.com/en-us/research/publication/causal-reasoning-favors-encoders-on-the-limits-of-decoder-only-models/","published":"2025-12-10","authors":["Amartya 
Roy","Iit Sire","Delhi Robert","India Bosch GmbH","Kripabandhu Ghosh","Iiser Kolkata","P. Kumaraguru","Adrian de Wynter"],"abstract":"In context learning (ICL) underpins recent advances in large language models (LLMs), although its role and performance in causal reasoning remains unclear. Causal reasoning demands multihop composition and strict conjunctive control, and reliance on spurious lexical relations of the input could provide misleading results. We hypothesize that, due to their ability to project the input into a latent space, encoder and encoder decoder architectures are better suited for said multihop conjunctive reasoning versus decoder only models. To do this, we compare fine-tuned versions of all the aforementioned architectures with zero and few shot ICL in both natural language and non natural language scenarios. We find that ICL alone is insufficient for reliable causal reasoning, often overfocusing on irrelevant input features. In particular, decoder only models are noticeably brittle to distributiona...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:su40sx3ed0jhcuv8i84ob1s7","title":"ChipChat: Low-Latency Cascaded Conversational Agent in MLX","url":"https://machinelearning.apple.com/research/chipchat","published":"2025-12-10","authors":["Tatiana Likhomanenko§","Luke Carlson§","Richard He Bai§","Zijin Gu§","Han Tran§","Zakaria Aldeneh§","Yizhe 
Zhang","Ruixiang Zhang","Huangjie Zheng","Navdeep Jaitly§"],"abstract":"The emergence of large language models (LLMs) has transformed spoken dialog systems, yet the optimal architecture for real-time on-device voice agents remains an open question. While end-to-end approaches promise theoretical advantages, cascaded systems (CSs) continue to outperform them in language understanding tasks, despite being constrained by sequential processing latency. In this work, we introduce ChipChat, a novel low-latency CS that...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:g19tm3rqdrfvfoyqlc9zl4sa","title":"Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling","url":"https://machinelearning.apple.com/research/continuously-augmented","published":"2025-12-10","authors":["Huangjie Zheng","Shansan Gong","Ruixiang Zhang","Tianrong Chen","Jiatao Gu","Mingyuan Zhou","Navdeep Jaitly","Yizhe Zhang"],"abstract":"Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an 'information void' where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion (CADD), a framework that augments the discrete state space with a paired diffusion in a continuous latent space. 
This yields graded,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2512.09892","title":"Provably Learning from Modern Language Models via Low Logit Rank","url":"http://arxiv.org/abs/2512.09892","published":"2025-12-10","authors":["Golowich, Noah","Liu, Allen","Shetty, Abhishek"],"abstract":"While modern language models and their inner workings are incredibly complex, recent work (Golowich, Liu & Shetty; 2025) has proposed a simple and potentially tractable abstraction for them through the observation that empirically, these language models all seem to have approximately low logit rank. Roughly, this means that a matrix formed by the model's log probabilities of various tokens conditioned on certain sequences of tokens is well approximated by a low rank matrix. In this paper, our focus is on understanding how this structure can be exploited algorithmically for obtaining provable learning guarantees. Since low logit rank models can encode hard-to-learn distributions such as noisy parities, we study a query learning model with logit queries that reflects the access model for common APIs. 
Our main result is an efficient algorithm for learning any approximately low logit rank mo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7114932582","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.6793000102043152},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.647599995136261},{"id":"https://openalex.org/C124304363","display_name":"Abstraction","score":0.5837000012397766},{"id":"https://openalex.org/C140331021","display_name":"Logit","score":0.574999988079071},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.5184000134468079},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5008000135421753},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4941999912261963},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.4715999960899353}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417201555","title":"TIME-VAD: Text-Informed Magnitude Enhancement Feature Learning for Vehicle Accident Detection and Anticipation","url":"https://doi.org/10.1109/tits.2025.3637912","published":"2025-12-10","authors":["Sumit Mishra","Medhavi Mishra","Pranjay Shyam","Dongsoo Har"],"abstract":"Vehicular accidents pose a substantial risk to drivers, underscoring the persistent and vital need for heightening safety measures. 
Early accident anticipation mechanisms are imperative for proactive measures, while detection accuracy is pivotal for prompt response and effective post-accident mitigation. Accurate and early anticipation of accidents for automated driving assistance systems in vehicles or CCTV in cities remains a complex task due to the intricate spatial-temporal interactions within traffic videos. This study presents text-informed magnitude enhancement in contrastive multiple-instance feature learning for vehicle accident detection and anticipation (TIME-VAD). Text is a better representative of concepts when compared to images in video, thus multi-modal learning is suitable. Also, the traditional assumption about feature magnitude of accidents and normal frames in magnitu...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tits.2025.3637912","openalex_id":"https://openalex.org/W4417201555","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Korea Advanced Institute of Science and Technology"],"concepts":[{"id":"https://openalex.org/C176777502","display_name":"Anticipation (artificial intelligence)","score":0.7957000136375427},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6638000011444092},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6614999771118164},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.582099974155426},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5546000003814697},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.48890000581741333},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.4884999990463257},{"id":"https://openalex.org/C126691448","display_name":"Magnitude (astronomy)","score":0.41440001130104065}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multimodal-ai-generates-virtual-population-for-tumor-microenvironment-modeling","title":"Multimodal AI generates virtual population for tumor microenvironment modeling","url":"https://www.microsoft.com/en-us/research/publication/multimodal-ai-generates-virtual-population-for-tumor-microenvironment-modeling/","published":"2025-12-09","authors":["Jeya Maria Jose Valanarasu","Hanwen Xu","Naoto Usuyama","Chanwoo Kim","Cliff Wong","Peniel Argaw","Racheli Ben Shimol","Angela Crabtree","Kevin Matlock","Alexandra Q. Bartlett","Jaspreet Bagga","Yu Gu"],"abstract":"The tumor immune microenvironment (TIME) critically impacts cancer progression and immunotherapy response. Multiplex immunofluorescence (mIF) is a powerful imaging modality for deciphering TIME, but its applicability is limited by high cost and low throughput. We propose GigaTIME , a multimodal AI framework for population-scale TIME modeling by bridging cell morphology and states. GigaTIME learns a cross-modal translator to generate virtual mIF images from hematoxylin and eosin (H&E) slides by training on 40 million cells with paired H&E and mIF data across 21 proteins. We applied GigaTIME to 14,256 patients from 51 hospitals and over 1,000 clinics across seven US states in Providence Health, generating 299,376 virtual mIF slides spanning 24 cancer types and 306 subtypes. 
This virtual population uncovered 1,234 statistically significant associations linking proteins, biomarkers, staging,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1016/j.cell.2025.11.016","openalex_id":"https://openalex.org/W4417164932","cited_by_count":15,"quality_score":83,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","Healthcare"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)","Providence College","Providence Portland Medical Center","Renton Technical College","Research Network (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exqutor-extended-query-optimizer-for-vector-augmented-analytical-queries","title":"Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries","url":"https://www.microsoft.com/en-us/research/publication/exqutor-extended-query-optimizer-for-vector-augmented-analytical-queries/","published":"2025-12-09","authors":["Hyunjoon Kim","Chaerim Lim","Hyeonjun An","Rathijit Sen","Kwanghyun Park"],"abstract":"Vector similarity search is becoming increasingly important for data science pipelines, particularly in Retrieval-Augmented Generation (RAG), where it enhances large language model inference by enabling efficient retrieval of relevant external knowledge. As RAG expands with table-augmented generation to incorporate structured data, workloads integrating table and vector search are becoming more prevalent. 
However, efficiently executing such queries remains challenging due to inaccurate cardinality estimation for vector search components, leading to suboptimal query plans. In this paper, we propose Exqutor, an extended query optimizer for vector-augmented analytical queries. Exqutor is a pluggable cardinality estimation framework designed to address this issue, leveraging exact cardinality query optimization techniques to enhance estimation accuracy when vector indexes (e.g., HNSW, IVF) a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Retrieval-Augmented Generation","language model","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:tgvlljs92wwcefpaxutlcu5v","title":"Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding","url":"https://machinelearning.apple.com/research/semantic-mastery","published":"2025-12-09","authors":["Mohanakrishnan Hariharan"],"abstract":"Large language models (LLMs) have greatly improved their capability in performing NLP tasks. However, deeper semantic understanding, contextual coherence, and more subtle reasoning are still difficult to obtain. The paper discusses state-of-the-art methodologies that advance LLMs with more advanced NLU techniques, such as semantic parsing, knowledge integration, and contextual reinforcement learning. 
We analyze the use of structured knowledge...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:v1evcs124mxa35pp2ff57pke","title":"Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring","url":"https://machinelearning.apple.com/research/reinforcement-learning-integrated","published":"2025-12-09","authors":["Mohanakrishnan Hariharan"],"abstract":"This paper introduces a framework that integrates reinforcement learning (RL) with autonomous agents to enable continuous improvement in the automated process of software test cases authoring from business requirement documents within Quality Engineering (QE) workflows. 
Conventional systems employing Large Language Models (LLMs) generate test cases from static knowledge bases, which fundamentally limits their capacity to enhance performance over...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icaic67076.2026.11395683","openalex_id":"https://openalex.org/W7131081022","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple","Apple (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7140145207","title":"Efficient Ads Ranking Using Distilled Large Language Models and Transformer Architectures","url":"https://doi.org/10.1109/caisais68078.2025.11440886","published":"2025-12-09","authors":["Arpita Vasant Shah","Sargam Menghani","Shuxian Yu","Patrick R. 
Jordan"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/caisais68078.2025.11440886","openalex_id":"https://openalex.org/W7140145207","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Microsoft (Finland)","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6438999772071838},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5968000292778015},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4239000082015991},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3831999897956848},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.320499986410141},{"id":"https://openalex.org/C79403827","display_name":"Real-time computing","score":0.31529998779296875},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.29899999499320984},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.289000004529953}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417169478","title":"Synergistic Fusion of Sentinel-1 and Sentinel-2 for Global LULC Mapping: The Multimodal Network LULC-Former and Dynamic World+ Dataset","url":"https://doi.org/10.1109/jstars.2025.3641788","published":"2025-12-09","authors":["Hao Yu","Gen Li","Haoyu Liu","Songyan Zhu","Jian Xu","Wenquan Dong","Changjian Li","Jiancheng Shi"],"abstract":"Accurate, high-resolution global land use and land cover (LULC) mapping is crucial for environmental monitoring, but remains challenging 
when relying solely on multispectral data. Most existing global LULC mapping studies rely exclusively on multispectral observations, and even those that incorporate Synthetic Aperture Radar (SAR) data often fail to fully exploit the information it provides. SAR provides an all-weather sensing capability and is uniquely sensitive to surface structure, texture, and moisture—critical information for LULC classes that are often spectrally ambiguous. To address this data gap, we introduce the <italic xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">Dynamic World+</i> dataset, a new global benchmark that expands the authoritative Dynamic World by aligning it with Sentinel-1 SAR data. Additionally, to facilitate the com...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jstars.2025.3641788","openalex_id":"https://openalex.org/W4417169478","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Lund University","National Space Science Center","Tencent (China)","University of Edinburgh","University of Southampton"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7964000105857849},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.7318999767303467},{"id":"https://openalex.org/C173163844","display_name":"Multispectral image","score":0.6736000180244446},{"id":"https://openalex.org/C87360688","display_name":"Synthetic aperture radar","score":0.6510000228881836},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4657000005245209},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45910000801086426},{"id":"https://openalex.org/C2780648208","display_name":"Land 
cover","score":0.4302000105381012},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.41290000081062317}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2512.09106","title":"Learning Unmasking Policies for Diffusion Language Models","url":"https://huggingface.co/papers/2512.09106","published":"2025-12-09","authors":["Metod Jazbec","Theo X. Olausson","Louis Béthune","Pierre Ablin","Michael Kirchhof","Joao Monterio","Victor Turrisi","Jason Ramapuram","Marco Cuturi"],"abstract":"Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One particularly successful variant is masked discrete diffusion, in which a buffer filled with special mask tokens is progressively replaced with tokens sampled from the model's vocabulary. Efficiency can be gained by unmasking several tokens in parallel, but doing too many at once risks degrading the generation quality. Thus, one critical design aspect of dLLMs is the sampling procedure that selects, at each step of the diffusion process, which tokens to replace. Indeed, recent work has found that heuristic strategies such as confidence thresholding lead to both higher quality and token throughput compared to random unmasking. 
However, such heuristics have downsides: they require manual tun...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:tencent:2512.07778","title":"Distribution Matching Variational AutoEncoder","url":"https://huggingface.co/papers/2512.07778","published":"2025-12-08","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4417136827","title":"Memory Fabric for Conversational AI Agents: Enabling Shared and Persistent Memory Across Users","url":"https://doi.org/10.36227/techrxiv.176523350.08289935/v1","published":"2025-12-08","authors":["Anjikya Tiwari","Vibhuti Gupta"],"abstract":"Conversational AI is now the most widely adopted platform for interfacing with LLMs. Alongside LLMs these AI systems rely on contexts derived from past conversations and preferences to provide accurate and the most relevant responses to users. The knowledge base and past experiences contribute to long-term memory, while processing ongoing conversations generates short-term memory. 
Both long-term and short-term memories together provide a comprehensive and coherent context to the user. While most architectures focus on a single user context, there is an emerging need in conversational AI to provide a system to generate context from multiple individuals and/or agents. Building on this foundation, we introduce memory fabric, a framework that allows conversational AI to leverage context drawn from multiple users to generate coherent responses in a multiuser setting. This review is a synthesi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.36227/techrxiv.176523350.08289935/v1","openalex_id":"https://openalex.org/W4417136827","cited_by_count":0,"quality_score":53,"matched_keywords":["memory","long-term","agent","multi-agent"],"author_affiliations":["Data Management (Italy)","Microsoft (Finland)","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7936000227928162},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4993000030517578},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.4918000102043152},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.47530001401901245},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.39640000462532043},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3864000141620636},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.38609999418258667},{"id":"https://openalex.org/C2781355261","display_name":"Organizational memory","score":0.36980000138282776}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4417125393","title":"AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation","url":"https://doi.org/10.1145/3757377.3764009","published":"2025-12-08","authors":["Haoyuan Shi","Yunxin Li","Xinyu Chen","Longyue Wang","Baotian Hu","Min Zhang"],"abstract":"Despite rapid advancements in video generation models, generating coherent, long-form storytelling videos that span multiple scenes and characters remains challenging. Current methods often rigidly convert pre-generated keyframes into fixed-length clips, resulting in disjointed narratives and pacing issues. Furthermore, the inherent instability of video generation models means that even a single low-quality clip can significantly degrade the entire output animation’s logical coherence and visual continuity. To overcome these obstacles, we introduce AniMaker, a multi-agent framework enabling efficient multi-candidate clip generation and storytelling-aware clip selection, thus creating globally consistent and story-coherent animation solely from text input. 
The framework is structured around specialized agents, including the Director Agent for storyboard generation, the Photography Agent f...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757377.3764009","openalex_id":"https://openalex.org/W4417125393","cited_by_count":2,"quality_score":51,"matched_keywords":["efficient","agent","multi-agent"],"author_affiliations":["Alibaba Group (China)","Harbin Institute of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8144999742507935},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.6942999958992004},{"id":"https://openalex.org/C2776538412","display_name":"Storytelling","score":0.6668000221252441},{"id":"https://openalex.org/C2777080924","display_name":"Storyboard","score":0.6498000025749207},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6001999974250793},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4668999910354614},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4171000123023987},{"id":"https://openalex.org/C69369342","display_name":"Computer animation","score":0.4115999937057495}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2509.19937","title":"GS-RoadPatching: Inpainting Gaussians via 3D Searching and Placing for Driving Scenes","url":"http://arxiv.org/abs/2509.19937","published":"2025-12-08","authors":["Guo Chen","Jiarun Liu","Sicong Du","Chenming Wu","Deqi Li","Shi-Sheng Huang","Guofeng Zhang","Sheng Yang"],"abstract":"This paper presents GS-RoadPatching, an inpainting method for driving scene completion by referring to completely 
reconstructed regions, which are represented by 3D Gaussian Splatting (3DGS). Unlike existing 3DGS inpainting methods that perform generative completion relying on 2D perspective-view-based diffusion or GAN models to predict limited appearance or depth cues for missing regions, our approach enables substitutional scene inpainting and editing directly through the 3DGS modality, extricating it from requiring spatial-temporal consistency of 2D cross-modals and eliminating the need for time-intensive retraining of Gaussians. Our key insight is that the highly repetitive patterns in driving scenes often share multi-modal similarities within the implicit 3DGS feature space and are particularly suitable for structural matching to enable effective 3DGS-based substitutional inpainting...","companies":["Alibaba/Qwen","Baidu"],"matched_orgs":["Alibaba/Qwen","Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757377.3763892","openalex_id":"https://openalex.org/W4417125566","cited_by_count":1,"quality_score":50,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Baidu (China)","Beijing Normal University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C11727466","display_name":"Inpainting","score":0.9427000284194946},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7457000017166138},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7242000102996826},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5662000179290771},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5605999827384949},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5141000151634216},{"id":"https://openalex.org/C165064840","display_name":"Matching 
(statistics)","score":0.5063999891281128},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.44909998774528503}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7110216084","title":"LLM-Primitives: Large Language Model for 3D Reconstruction with Primitives","url":"https://doi.org/10.1145/3757377.3763857","published":"2025-12-08","authors":["Kuan Tian","Zhihao Hu","Yonghang Guan","Jun Zhang"],"abstract":"We present LLM-Primitives: Large Language Model for 3D Reconstruction with Primitives, a novel approach to shape abstraction. By incorporating multi-modal conditional inputs, our method enables LLMs to reconstruct high-quality 3D primitives using only a modest amount of training data (tens of thousands of samples). This work marks a significant milestone in applying large language models to 3D primitive-based reconstruction, demonstrating both their feasibility and effectiveness in this domain. Specifically, we leverage the point clouds of existing 3D models as conditional inputs to the LLM via a multi-modal connector. Instead of directly estimating primitive parameters, we introduce a center-to-surface vector representation, ensuring deterministic outputs and avoiding the ambiguity often associated with primitive parameterization. 
Experimental results show that LLM-Primitives surpass st...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757377.3763857","openalex_id":"https://openalex.org/W7110216084","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7293000221252441},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6790000200271606},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.6608999967575073},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5968999862670898},{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.5911999940872192},{"id":"https://openalex.org/C109950114","display_name":"3D reconstruction","score":0.53329998254776},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5160999894142151},{"id":"https://openalex.org/C3019007443","display_name":"3d model","score":0.5016000270843506}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7138949575","title":"WMAS: A Multi-Agent System Towards Intelligent and Customized Wireless Networks","url":"https://doi.org/10.1109/globecom59602.2025.11432052","published":"2025-12-08","authors":["Jingchen Peng","Dingli Yuan","Boxiang Ren","Jie Fan","Zhigang Wu","Lu Yang"],"abstract":"The fast development of Artificial Intelligence (AI) agents provides a promising way for the realization of intelligent and customized wireless networks. 
In this paper, we propose a Wireless Multi-Agent System (WMAS), which can provide intelligent and customized services for different user equipment (UEs). Note that orchestrating multiple agents carries the risk of malfunction, and multi-agent conversations may fall into infinite loops. It is thus crucial to design a conversation topology for WMAS that enables agents to complete UE task requests with high accuracy and low conversation overhead. To address this issue, we model the multi-agent conversation topology as a directed acyclic graph and propose a reinforcement learning- based algorithm to optimize the adjacency matrix of this graph. As such, WMAS is capable of generating and self-optimizing multi-agent conversation topologies, en...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/globecom59602.2025.11432052","openalex_id":"https://openalex.org/W7138949575","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2777200299","display_name":"Conversation","score":0.8363999724388123},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7371000051498413},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5307000279426575},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.5212000012397766},{"id":"https://openalex.org/C555944384","display_name":"Wireless","score":0.5130000114440918},{"id":"https://openalex.org/C108037233","display_name":"Wireless network","score":0.46799999475479126},{"id":"https://openalex.org/C2779960059","display_name":"Overhead 
(engineering)","score":0.46050000190734863},{"id":"https://openalex.org/C199845137","display_name":"Network topology","score":0.4388999938964844}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417132154","title":"Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-Based Large Language Models","url":"https://doi.org/10.26599/cvm.2025.9450516","published":"2025-12-08","authors":["Munan Ning","Bin Zhu","Yujia Xie","Bin Lin","Jiaxi Cui","Yuan Lu","Dongdong Chen","Li Yuan"],"abstract":"Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries. In pursuit of the ultimate goal of achieving artificial general intelligence, a truly intelligent Video-LLM model should not only see and understand the surroundings, but also possess human-level commonsense, and make well-informed decisions for users. To guide the development of such a model, the establishment of a robust and comprehensive evaluation system becomes crucial. To this end, this paper proposes Video-Bench, a new comprehensive benchmark along with a toolkit specifically designed for evaluating Video-LLMs. 
The benchmark comprises 10 meticulously crafted tasks, evaluating the capabilities of Video-LLMs across three distinct levels: video-exclusive understanding, prior knowledge-based questi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.26599/cvm.2025.9450516","openalex_id":"https://openalex.org/W4417132154","cited_by_count":4,"quality_score":45,"matched_keywords":["LLM"],"author_affiliations":["Meta (United Kingdom)","Microsoft (United States)","Peking University Shenzhen Hospital","Peng Cheng Laboratory","Shenzhen University","Standards, Productivity and Innovation Board"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.8795999884605408},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8230000138282776},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.6442000269889832},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.6279000043869019},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47940000891685486},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.460999995470047},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.40639999508857727},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.40560001134872437}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W7134162714","title":"SAMAG: Structure-Aware Multi-Agent Graph Generation with Large Language Models","url":"https://doi.org/10.1109/bigdata66926.2025.11402080","published":"2025-12-08","authors":["Jingcheng Cen","Jiarui Ji","Zhen 
Wang","Zhewei Wei","Yaliang Li","Bolin Ding"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bigdata66926.2025.11402080","openalex_id":"https://openalex.org/W7134162714","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Alibaba Group (China)","Renmin University of China","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.598800003528595},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.45399999618530273},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37860000133514404},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3479999899864197},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.30869999527931213},{"id":"https://openalex.org/C88230418","display_name":"Graph theory","score":0.28529998660087585},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.27639999985694885},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.2736999988555908}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417124633","title":"HOMA: Towards Generic Human-Object Interaction in Multimodal Driven Human Animation with Weak Conditions","url":"https://doi.org/10.1145/3757377.3763861","published":"2025-12-08","authors":["Ziyao Huang","Zixiang Zhou","Juan Cao","Yifeng Ma","Yi Chen","Z. 
Rao","Zhiyong Xu","Hongmei Wang","Qin Lin","Yuan Zhou","Qinglin Lu","Fan Tang"],"abstract":"While recent advances in human-object interaction (HOI) video generation showcase promising capabilities for synthesizing coordinated human-object dynamics, existing methods remain constrained by their reliance on meticulously curated motion sequences and actor-specific data, thereby limiting practical scalability and user accessibility. Furthermore, generalization to novel object appearances and interaction scenarios remains understudied. To address these limitations, we propose HOMA, a weakly conditioned multimodal-driven HOI video generation framework that introduces sparse, decoupled motion guidance to enhance controllability and reduce dependency on stringent input conditions. Our approach encodes appearance and motion signals into the dual input space of a multimodal diffusion transformer (MMDiT), fusing them within a shared context space to enable temporally consistent and physica...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757377.3763861","openalex_id":"https://openalex.org/W4417124633","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8008000254631042},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.5631999969482422},{"id":"https://openalex.org/C48209547","display_name":"Controllability","score":0.5612000226974487},{"id":"https://openalex.org/C134537474","display_name":"Naturalness","score":0.5178999900817871},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5069000124931335},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.4846999943256378},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.46860000491142273},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.43220001459121704}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2512.12534","title":"Animus3D: Text-driven 3D Animation via Motion Score Distillation","url":"http://arxiv.org/abs/2512.12534","published":"2025-12-08","authors":["Qi Sun","Can Wang","Jiaxiang Shang","Wensen Feng","Jing Liao"],"abstract":"We present Animus3D , a text-driven 3D animation framework that generates motion field given a static 3D asset and text prompt. Previous methods mostly leverage the vanilla Score Distillation Sampling (SDS) objective to distill motion from pretrained text-to-video diffusion, leading to animations with minimal movement or noticeable jitter. To address this, our approach introduces a novel SDS alternative, Motion Score Distillation (MSD). Specifically, we introduce a LoRA-enhanced video diffusion model that defines a static source distribution rather than pure noise as in SDS, while another inversion-based noise estimation technique ensures appearance preservation when guiding motion. To further improve motion fidelity, we incorporate explicit temporal and spatial regularization terms that mitigate geometric distortions across time and space. 
Additionally, we propose a motion refinement mo...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3757377.3763916","openalex_id":"https://openalex.org/W4417124964","cited_by_count":1,"quality_score":42,"matched_keywords":["distillation"],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7634999752044678},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.7307000160217285},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6274999976158142},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6080999970436096},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.4779999852180481},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.4745999872684479},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.4652000069618225},{"id":"https://openalex.org/C50637493","display_name":"Morphing","score":0.46219998598098755}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4417125575","title":"Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization","url":"https://doi.org/10.1145/3757377.3763891","published":"2025-12-08","authors":["Yang You","Mikaela Angelina Uy","Jiaqi Han","Rahul Thomas","H.S. 
Zhang","Yi Du","Hansheng Chen","Francis Engelmann","Suya You","Leonidas Guibas"],"abstract":"Reverse engineering 3D computer-aided design (CAD) models from images is an important task for many downstream applications including interactive editing, manufacturing, architecture, robotics, etc. The difficulty of the task lies in vast representational disparities between the CAD output and the image input. CAD models are precise, programmatic constructs that involves sequential operations combining discrete command structure with continuous attributes – making it challenging to learn and optimize in an end-to-end fashion. Concurrently, input images introduce inherent challenges such as photometric variability and sensor noise, complicating the reverse engineering process. In this work, we introduce a novel approach that conditionally factorizes the task into two sub-problems. First, we leverage vision-language foundation models (VLMs), a finetuned Llama3.2, to predict the global disc...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757377.3763891","openalex_id":"https://openalex.org/W4417125575","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["DEVCOM Army Research Laboratory","Nvidia (United Kingdom)","Nvidia (United States)","Peking University","Stanford University","United States Army Combat Capabilities Development Command"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8084999918937683},{"id":"https://openalex.org/C194789388","display_name":"CAD","score":0.6883000135421753},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6851999759674072},{"id":"https://openalex.org/C207850805","display_name":"Reverse 
engineering","score":0.6025999784469604},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5715000033378601},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49149999022483826},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.4851999878883362},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.39879998564720154}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W7110151566","title":"CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion","url":"https://doi.org/10.1145/3757377.3763990","published":"2025-12-08","authors":["Chenhao Ji","Chaohui Yu","Junyao Gao","Fan Wang","Cairong Zhao"],"abstract":"Recently, camera-controlled video generation has seen rapid development, offering more precise control over video generation. However, existing methods predominantly focus on camera control in perspective projection video generation, while geometrically consistent panoramic video generation remains challenging. This limitation is primarily due to the inherent complexities in panoramic pose representation and spherical projection. To address this issue, we propose CamPVG, the first diffusion-based framework for panoramic video generation guided by precise camera poses. We achieve camera position encoding for panoramic images and cross-view feature aggregation based on spherical projection. Specifically, we propose a panoramic Plücker embedding that encodes camera extrinsic parameters through spherical coordinate transformation. 
This pose encoder effectively captures panoramic geometry, ov...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757377.3763990","openalex_id":"https://openalex.org/W7110151566","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (Cayman Islands)","Alibaba Group (China)","Alibaba Group (United States)","Tongji University"],"concepts":[{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.8452000021934509},{"id":"https://openalex.org/C23379248","display_name":"Epipolar geometry","score":0.8371999859809875},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7922000288963318},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7039999961853027},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5188000202178955},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.459199994802475},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.44830000400543213},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.43700000643730164}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7109971032","title":"Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation","url":"https://doi.org/10.1145/3757377.3763842","published":"2025-12-08","authors":["Chenjie Cao","Jingkai Zhou","Shikai Li","Jingyun Liang","Chaohui Yu","Fan Wang","Xiangyang Xue","Yanwei Fu"],"abstract":"Camera and human motion controls have been extensively studied for video generation, but existing approaches typically 
address them separately, suffering from limited data with high-quality annotations for both aspects. To overcome this, we present Uni3C, a unified 3D-enhanced framework for precise control of both camera and human motion in video generation. Uni3C includes two key contributions. First, we propose a plug-and-play control module trained with a frozen video generative backbone, PCDController, which utilizes unprojected point clouds from monocular depth to achieve accurate camera control. By leveraging the strong 3D priors of point clouds and the powerful capacities of video foundational models, PCDController shows impressive generalization, performing well regardless of whether the inference backbone is frozen or fine-tuned. This flexibility enables different modules of Uni...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757377.3763842","openalex_id":"https://openalex.org/W7109971032","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Fudan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7932000160217285},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7663999795913696},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7613999843597412},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.536899983882904},{"id":"https://openalex.org/C94816000","display_name":"Camera auto-calibration","score":0.4677000045776367},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.42010000348091125},{"id":"https://openalex.org/C104114177","display_name":"Motion 
(physics)","score":0.41260001063346863},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.3930000066757202}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7110201020","title":"Shape-for-Motion: Precise and Consistent Video Editing With 3D Proxy","url":"https://doi.org/10.1145/3757377.3763816","published":"2025-12-08","authors":["Yuhao Liu","Tengfei Wang","Fang Liu","Zhenwei Wang","Rynson W.H. Lau"],"abstract":"Recent advances in deep generative modeling have unlocked unprecedented opportunities for video synthesis. In real-world applications, however, users often seek tools to faithfully realize their creative editing intentions with precise and consistent control. Despite the progress achieved by existing methods, ensuring fine-grained alignment with user intentions remains an open and challenging problem. In this work, we present Shape-for-Motion, a novel framework that incorporates a 3D proxy for precise and consistent video editing. Shape-for-Motion achieves this by converting the target object in the input video to a time-consistent mesh, i.e., a 3D proxy, allowing edits to be performed directly on the proxy and then inferred back to the video frames. 
To simplify the editing process, we design a novel Dual-Propagation Strategy that allows users to perform edits on the 3D mesh of a single....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757377.3763816","openalex_id":"https://openalex.org/W7110201020","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8560000061988831},{"id":"https://openalex.org/C2780310081","display_name":"Video editing","score":0.7560999989509583},{"id":"https://openalex.org/C31487907","display_name":"Polygon mesh","score":0.5608999729156494},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.49309998750686646},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.47859999537467957},{"id":"https://openalex.org/C2780148112","display_name":"Proxy (statistics)","score":0.4499000012874603},{"id":"https://openalex.org/C137402728","display_name":"Non-linear editing system","score":0.4408000111579895},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42179998755455017}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4417115068","title":"Generative AI for All and for Humanity: From Zero to Hero with Open Data, Space Computing, and Sustainable Smart Cities","url":"https://doi.org/10.1145/3757372.3771874","published":"2025-12-08","authors":["Dongping Liu","Xiaomeng Li","Mengqian Lu","Shiqi Wang","Luyao Zhang"],"abstract":"Generative artificial intelligence is reshaping creativity, learning, and problem solving—a renaissance for 
all and for humanity. This one-hour workshop guides educators and learners \"from zero to hero\" by linking generative AI with open data, space computing, and sustainable smart-city applications. Participants will curate and publish 3D urban datasets with clear documentation, transform them into interactive models, and simulate energy-aware scenarios such as drone routing. The workshop emphasizes responsible data governance and ethical practice, highlighting principles that call for data to be easy to find and reuse as well as respectful of community rights and benefits. Outcomes are aligned with the United Nations Sustainable Development Goals to connect technical practice with global challenges. Participants will leave with openly licensed teaching resources, including a rubric, wo...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757372.3771874","openalex_id":"https://openalex.org/W4417115068","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","City University of Hong Kong","Duke Kunshan University","Hong Kong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5372999906539917},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4611999988555908},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.43959999084472656},{"id":"https://openalex.org/C47177190","display_name":"Curriculum","score":0.4081999957561493},{"id":"https://openalex.org/C59519942","display_name":"Drone","score":0.37209999561309814},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.34220001101493835},{"id":"https://openalex.org/C39389867","display_name":"Corporate 
governance","score":0.33730000257492065},{"id":"https://openalex.org/C2129575","display_name":"Semantic Web","score":0.3310999870300293}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134181059","title":"CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments","url":"https://doi.org/10.1109/bigdata66926.2025.11402424","published":"2025-12-08","authors":["Nitish Jaipuria","Lorenzo Gatto","Zijun Kan","Shankey Poddar","Bill Cheung","Diksha Bansal","Ramanan Balakrishnan","Aviral Suri","Jose Estevez"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bigdata66926.2025.11402424","openalex_id":"https://openalex.org/W7134181059","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C145097563","display_name":"Payment","score":0.5472000241279602},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4837000072002411},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.44530001282691956},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.39820000529289246},{"id":"https://openalex.org/C108827166","display_name":"Internet privacy","score":0.3537999987602234},{"id":"https://openalex.org/C108170787","display_name":"Agency (philosophy)","score":0.303600013256073},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.30160000920295715},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.29829999804496765}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sit-graph-state-integrated-tool-graph-for-multi-turn-agents","title":"SIT-Graph: State Integrated Tool Graph for Multi-Turn Agents","url":"https://www.microsoft.com/en-us/research/publication/sit-graph-state-integrated-tool-graph-for-multi-turn-agents/","published":"2025-12-07","authors":["Sijia Li","Yuchen Huang","Zifan Liu","Zijian Li","Jingjing Fu","Lei Song","Jiang Bian","Jun Zhang","Rui Wang"],"abstract":"Despite impressive advances in agent systems, multi-turn tool-use scenarios remain challenging. It is mainly because intent is clarified progressively and the environment evolves with each tool call. While reusing past experience is natural, current LLM agents either treat entire trajectories or pre-defined subtasks as indivisible units, or solely exploit tool-to-tool dependencies, hindering adaptation as states and information evolve across turns. In this paper, we propose a State Integrated Tool Graph (SIT-Graph), which enhances multi-turn tool use by exploiting partially overlapping experience. Inspired by human decision-making that integrates episodic and procedural memory, SIT-Graph captures both compact state representations (episodic-like fragments) and tool-to-tool dependencies (procedural-like routines) from historical trajectories. 
Specifically, we first build a tool graph from...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","LLM","memory","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dover-intervention-driven-auto-debugging-for-llm-multi-agent-systems","title":"DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems","url":"https://www.microsoft.com/en-us/research/publication/dover-intervention-driven-auto-debugging-for-llm-multi-agent-systems/","published":"2025-12-06","authors":["Ming-Jie Ma","Jue Zhang","Fangkai Yang","Yu Kang","Qingwei Lin","S. Rajmohan","Dongmei Zhang"],"abstract":"Large language model (LLM)-based multi-agent systems are challenging to debug because failures often arise from long, branching interaction traces. The prevailing practice is to leverage LLMs for log-based failure localization, attributing errors to a specific agent and step. However, this paradigm has two key limitations: (i) log-only debugging lacks validation, producing untested hypotheses, and (ii) single-step or single-agent attribution is often ill-posed, as we find that multiple distinct interventions can independently repair the failed task. To address the first limitation, we introduce DoVer, an intervention-driven debugging framework, which augments hypothesis generation with active verification through targeted interventions (e.g., editing messages, altering plans). 
For the second limitation, rather than evaluating on attribution accuracy, we focus on measuring whether the sys...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7148575771","title":"Efficient Scaling for LLM-based ASR","url":"https://doi.org/10.1109/asru65441.2025.11434774","published":"2025-12-06","authors":["Bingshen Mu","Yiwen Shao","Kun Wei","Dong Yu","Lei Xie"],"abstract":"Large language model (LLM)-based automatic speech recognition (ASR) achieves strong performance but often incurs high computational costs. This work investigates how to obtain the best LLM-ASR performance efficiently. Through comprehensive and controlled experiments, we find that pretraining the speech encoder before integrating it with the LLM leads to significantly better scaling efficiency than the standard practice of joint post-training of LLM-ASR. Based on this insight, we propose a new multi-stage LLM-ASR training strategy, EFIN: Encoder First Integration. Among all training strategies evaluated, EFIN consistently delivers better performance (relative to 21.1 % CERR) with significantly lower computation budgets ($49.9 \\%$ FLOPs). 
Furthermore, we derive a scaling law that approximates ASR error rates as a computation function, providing practical guidance for LLM-ASR scaling.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/asru65441.2025.11434774","openalex_id":"https://openalex.org/W7148575771","cited_by_count":2,"quality_score":51,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7511000037193298},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.7419000267982483},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.7386000156402588},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.6001999974250793},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5389000177383423},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.4885999858379364},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4327000081539154},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4235999882221222}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7148604914","title":"FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities","url":"https://doi.org/10.1109/asru65441.2025.11434636","published":"2025-12-06","authors":["Lilit Grigoryan","Vladimir Bataev","Nikolay Karpov","Andrei Andrusenko","Vitaly Lavrukhin","Boris Ginsburg"],"abstract":"While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often 
sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present a novel open-source FlexCTC toolkit for fully GPU-based beam decoding, designed for Connectionist Temporal Classification (CTC) models. Developed entirely in Python and PyTorch, it offers a fast, user-friendly, and extensible alternative to traditional C++, CUDA, or WFST-based decoders. The toolkit features a high-performance, fully batched GPU implementation with eliminated CPU-GPU synchronization and minimized kernel launch overhead via CUDA Graphs. It also supports advanced contextualization techniques, including GPU-powered N-gram language model fusion and phrase-level boosting. These features enable accurate and efficient decoding, making them suitable for both research and productio...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/asru65441.2025.11434636","openalex_id":"https://openalex.org/W7148604914","cited_by_count":1,"quality_score":46,"matched_keywords":["language model","efficient"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8112999796867371},{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.5322999954223633},{"id":"https://openalex.org/C26713055","display_name":"Implementation","score":0.5285999774932861},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5011000037193298},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.499099999666214},{"id":"https://openalex.org/C19889080","display_name":"Beam search","score":0.4318000078201294},{"id":"https://openalex.org/C57273362","display_name":"Decoding 
methods","score":0.4309000074863434},{"id":"https://openalex.org/C2778119891","display_name":"CUDA","score":0.42489999532699585}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7148463928","title":"Customizing Speech Recognition Model with Large Language Model Feedback","url":"https://doi.org/10.1109/asru65441.2025.11434757","published":"2025-12-06","authors":["Shaoshi Ling","Guoli Ye"],"abstract":"Automatic speech recognition (ASR) systems have achieved strong performance on general transcription tasks. However, they continue to struggle with recognizing rare named entities and adapting to domain mismatches. In contrast, large language models (LLMs), trained on massive internet-scale datasets, are often more effective across a wide range of domains. In this work, we propose a reinforcement learning based approach for unsupervised domain adaptation, leveraging unlabeled data to enhance transcription quality—particularly the named entities affected by domain mismatch—through feedback from a LLM. Given contextual information, our framework employs a LLM as the reward model to score the hypotheses from the ASR model. These scores serve as reward signals to fine-tune the ASR model via reinforcement learning. 
Our method achieves a 21% improvement on entity word error rate over conventio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/asru65441.2025.11434757","openalex_id":"https://openalex.org/W7148463928","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8116000294685364},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.7235000133514404},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.710099995136261},{"id":"https://openalex.org/C179926584","display_name":"Transcription (linguistics)","score":0.657800018787384},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6450999975204468},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5580000281333923},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5164999961853027},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5103999972343445}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7148570079","title":"Efficient Deployment of Large Speech Recognition Models on GPU","url":"https://doi.org/10.1109/asru65441.2025.11434664","published":"2025-12-06","authors":["Yuekai Zhang","Shuang Yu","Junjie Lai"],"abstract":"Large automatic speech recognition (ASR) models have achieved remarkable progress in recent years, but their deployment in production faces significant challenges due to large model sizes and autoregressive decoding 
methods. This paper presents comprehensive solutions for efficiently deploying large ASR models on GPUs using NVIDIA Triton Inference Server and TensorRT-LLM. Our deployment framework supports both encoder-decoder architectures and speech LLMs. With a modular design based on NVIDIA Triton, the framework can be easily extended to other model architectures. We implement optimized TensorRT-LLM engines for Whisper models and speech LLMs. Compared to existing implementations, our Whisper TensorRTLLM solution achieves more than $\\mathbf{5 0 \\%}$ throughput improvement. The complete deployment solutions are open-sourced and provide one-click deployment through docker-compose, facili...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/asru65441.2025.11434664","openalex_id":"https://openalex.org/W7148570079","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7175999879837036},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5047000050544739},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.4959999918937683},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4499000012874603},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3441999852657318},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.31929999589920044},{"id":"https://openalex.org/C104267543","display_name":"Signal processing","score":0.2980000078678131},{"id":"https://openalex.org/C204201278","display_name":"Voice activity 
detection","score":0.29100000858306885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7148490330","title":"Continual Pre-training for Codec-Based Speech LLMs: Balancing Understanding and Generation","url":"https://doi.org/10.1109/asru65441.2025.11434591","published":"2025-12-06","authors":["Jiatong Shi","Chunlei Zhang","Jinchuan Tian","Junrui Ni","Hao Zhang","Shinji Watanabe","Dong Yu"],"abstract":"Recent advances in speech language models (LLMs) have extended textual LLMs to the speech domain, but balancing speech understanding and generation remains challenging, especially with codec-based representations. We propose a continual pre-training (CPT) framework that adapts a textual LLM to handle codec-discretized speech, mitigating modality mismatch and preserving linguistic reasoning. Our unified model supports both understanding and generation, achieving strong results across ASR, TTS, S2T-Trans, and S2S-Trans. Notably, we present the first end-to-end, single-pass S2S-Trans system using only neural codec tokens, without intermediate transcriptions, translations, or semantic tokens. 
CPT proves essential for crossmodal alignment and task generalization, making it a powerful tool for building robust, unified speech LLMs.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/asru65441.2025.11434591","openalex_id":"https://openalex.org/W7148490330","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["International University of the Caribbean","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6240000128746033},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3783000111579895},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3531999886035919},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.2793000042438507},{"id":"https://openalex.org/C2778348673","display_name":"Production (economics)","score":0.2711000144481659},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.2590999901294708},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.258899986743927},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.24240000545978546}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417073344","title":"EliGen: Entity-Level Controlled Image Generation with Regional Attention","url":"https://doi.org/10.1145/3743093.3771013","published":"2025-12-06","authors":["Hong Zhang","Zhongjie Duan","Xingjun Wang","Yingda Chen","Yu Zhang"],"abstract":"Recent advancements in diffusion models have significantly advanced text-to-image generation, yet global text prompts 
alone remain insufficient for achieving fine-grained control over individual entities within an image. To address this limitation, we present EliGen, a novel framework for Entity-level controlled image Generation. Firstly, we put forward regional attention, a mechanism for diffusion transformers that requires no additional structures, seamlessly integrating entity prompts and arbitrary-shaped spatial masks. By contributing a high-quality dataset with fine-grained spatial and semantic entity-level annotations, we train EliGen to achieve robust and accurate entity-level manipulation, surpassing existing methods in both spatial precision and image quality. Additionally, we propose an inpainting fusion pipeline, extending EliGen’s capabilities to multi-entity image inpainting...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3743093.3771013","openalex_id":"https://openalex.org/W4417073344","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Communication University of Zhejiang","Zhejiang Lab","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7797999978065491},{"id":"https://openalex.org/C11727466","display_name":"Inpainting","score":0.6923999786376953},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6202999949455261},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.6047000288963318},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5228000283241272},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4927999973297119},{"id":"https://openalex.org/C63479239","display_name":"Robustness 
(evolution)","score":0.4796000123023987},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.31520000100135803}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W7148294870","title":"mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks","url":"https://doi.org/10.1109/asru65441.2025.11434707","published":"2025-12-06","authors":["Luel Hagos Beyene","Vivek Verma","Min Ma","Jesujoba O. Alabi","Fabian David Schmidt","Joyce Nakatumba-Nabende","David Ifeoluwa Adelani"],"abstract":"Large Language models (LLMs) have demonstrated impressive performance on a wide range of tasks, including in multimodal settings such as speech. However, their evaluation is often limited to English and a few high-resource languages. For low-resource languages, there is no standardized evaluation benchmark. In this paper, we address this gap by introducing mSTEB, a new benchmark to evaluate the performance of LLMs on a wide range of tasks covering language identification, text classification, question answering, and translation tasks on both speech and text modalities. We evaluated the performance of leading LLMs such as Gemini 2.0 Flash and GPT-4o and state-of-the-art open models such as Qwen 2 Audio and Gemma 327 B. Our evaluation shows a wide gap in performance between high-resource and low-resource languages, especially for languages spoken in Africa and Americas/Oceania. 
Our finding...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/asru65441.2025.11434707","openalex_id":"https://openalex.org/W7148294870","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Makerere University","Mila - Quebec Artificial Intelligence Institute","Riverbank Local Redevelopment Authority","Saarland University","University of Würzburg"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6601999998092651},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5011000037193298},{"id":"https://openalex.org/C2776230583","display_name":"Spoken language","score":0.4934000074863434},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39660000801086426},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.36230000853538513},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3587000072002411},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.31940001249313354},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.2671000063419342}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7148376623","title":"MBENet: Bone-conduction and Air-conduction Fusion Network for Target Speaker Extraction","url":"https://doi.org/10.1109/asru65441.2025.11434606","published":"2025-12-06","authors":["Chen Zhang","Linfeng Feng","Zhi Liu","Xi Zhang","Xiao Li"],"abstract":"Target speaker extraction (TSE) aims to isolate a target speaker’s 
voice from mixed speech using additional cues. Most TSE models rely on air-conduction (AC) signals, which are easily affected by interfering speakers and background noise. In contrast, bone-conduction (BC) speech is naturally resistant to ambient noise and captures only the target speaker’s voice. Existing BC-AC fusion methods have only been applied in conventional speech enhancement, lacking exploration in TSE applications. To leverage the advantages of BC signals, we propose the Multi-modal Bone-conduction Enhancement Network (MBENet), the first BC-AC fusion model designed for the TSE task. To further enhance performance, we introduce multi-task learning by incorporating the bandwidth extension of the BC channel as an auxiliary task. Experimental results show that our casual model outperforms existing approaches in chal...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/asru65441.2025.11434606","openalex_id":"https://openalex.org/W7148376623","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["China Telecom","China Telecom (China)","Huawei Technologies (China)","Northwestern Polytechnical University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6313999891281128},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5853000283241272},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4223000109195709},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.3659999966621399},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.35670000314712524},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.34850001335144043},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.3314000070095062},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.3296999931335449}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/simsort-a-data-driven-framework-for-spike-sorting-by-large-scale-electrophysiology-simulation","title":"SimSort: A Data-Driven Framework for Spike Sorting by Large-Scale Electrophysiology Simulation","url":"https://www.microsoft.com/en-us/research/publication/simsort-a-data-driven-framework-for-spike-sorting-by-large-scale-electrophysiology-simulation/","published":"2025-12-05","authors":["Yimu Zhang","Dongqi Han","Yansen Wang","Zhenning Lv","Yu Gu","Dongsheng Li"],"abstract":"Spike sorting is an essential process in neural recording, which identifies and separates electrical signals from individual neurons recorded by electrodes in the brain, enabling researchers to study how specific neurons communicate and process information. Although there exist a number of spike sorting methods which have contributed to significant neuroscientific breakthroughs, many are heuristically designed, making it challenging to verify their correctness due to the difficulty of obtaining ground truth labels from real-world neural recordings. In this work, we explore a data-driven, deep learning-based approach. We begin by creating a large-scale dataset through electrophysiology simulations using biologically realistic computational models. We then present SimSort, a pretraining framework for spike sorting. 
Trained solely on simulated data, SimSort demonstrates zero-shot generaliza...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:zai-org:2512.05905","title":"SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations","url":"https://huggingface.co/papers/2512.05905","published":"2025-12-05","authors":["Z.ai/Zhipu"],"abstract":"","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","zai-org"],"author_affiliations":["Z.ai/Zhipu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/zai-org/papers"}},{"id":"openalex:W4417043428","title":"Towards On-device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model","url":"https://doi.org/10.1145/3779452","published":"2025-12-05","authors":["Zhaofeng Zhong","Wei Yuan","Zhaojun Li","Tong Chen","Hao Wang","Xiangyu Zhao","Hongzhi Yin"],"abstract":"With the advancement of large language models (LLMs), significant progress has been achieved in various natural language processing (NLP) tasks. 
However, existing LLMs still face two major challenges that hinder their broader adoption: (1) their responses tend to be generic and lack personalization tailored to individual users, and (2) they rely heavily on cloud infrastructure due to intensive computational requirements, leading to stable network dependency and response delay. Recent research has predominantly focused on either developing cloud-based personalized LLMs or exploring the on-device deployment of general-purpose LLMs. However, few studies have addressed both limitations simultaneously by investigating personalized on-device language models (LMs). To bridge this gap, we propose CDCDA-PLM, a framework for deploying personalized on-device LMs on user devices with support from a....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3779452","openalex_id":"https://openalex.org/W4417043428","cited_by_count":1,"quality_score":58,"matched_keywords":["LLM","language model","personalized","personalization","efficient"],"author_affiliations":["Alibaba Group (China)","City University of Hong Kong","The University of Queensland"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8948000073432922},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.8335999846458435},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6629999876022339},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5491999983787537},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.5475000143051147},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.47510001063346863},{"id":"https://openalex.org/C19768560","display_name":"Dependency 
(UML)","score":0.45239999890327454},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.43130001425743103}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"apple:ljjwknoop8ypikapgd502tp6","title":"SO-Bench: A Structural Output Evaluation of Multimodal LLMs","url":"https://machinelearning.apple.com/research/so-bench","published":"2025-12-05","authors":["Di Feng","Kaixin Ma","Feng Nan","Haofeng Chen","Bohan Zhai","David Griffiths","Mingfei Gao","Zhe Gan","Eshan Verma","Yinfei Yang","Zhifeng Chen","Afshin Dehghan"],"abstract":"Multimodal large language models (MLLMs) are increasingly deployed in real-world, agentic settings where outputs must not only be correct, but also conform to predefined data schemas. Despite recent progress in structured generation in textual domain, there is still no benchmark that systematically evaluates schema-grounded information extraction and reasoning over visual inputs. 
In this work, we conduct a comprehensive study of visual structural...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4417026094","title":"Peer-aided repairer: empowering large language models to repair advanced student assignments","url":"https://doi.org/10.1007/s10664-025-10716-z","published":"2025-12-05","authors":["Qianhui Zhao","Li Zhang","Fang Liu","Yang Liu","Yan Zhen","Zhenghao Chen","Yufei Zhou","Jing Jiang","Ge Li","Zian Sun","Zhongqi Li","Yuchi Ma"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10664-025-10716-z","openalex_id":"https://openalex.org/W4417026094","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Beihang University","Cloud Computing Center","Huawei Technologies (China)","Huawei Technologies (United States)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7915999889373779},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.602400004863739},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.5266000032424927},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.48910000920295715},{"id":"https://openalex.org/C34165917","display_name":"Programming 
paradigm","score":0.42250001430511475},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41819998621940613},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.40119999647140503},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3589000105857849}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/spacecontrol-introducing-test-time-spatial-control-to-3d-generative-modeling","title":"SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling","url":"https://www.microsoft.com/en-us/research/publication/spacecontrol-introducing-test-time-spatial-control-to-3d-generative-modeling/","published":"2025-12-04","authors":["Elisabetta Fedele","Francis Engelmann","Ian Huang","O. Litany","Marc Pollefeys","Leonidas J. Guibas"],"abstract":"Generative methods for 3D assets have recently achieved remarkable progress, yet providing intuitive and precise control over the object geometry remains a key challenge. Existing approaches predominantly rely on text or image prompts, which often fall short in geometric specificity: language can be ambiguous, and images are cumbersome to edit. In this work, we introduce SpaceControl, a training-free test-time method for explicit spatial control of 3D generation. Our approach accepts a wide range of geometric inputs, from coarse primitives to detailed meshes, and integrates seamlessly with modern pre-trained generative models without requiring any additional training. A controllable parameter lets users trade off between geometric fidelity and output realism. 
Extensive quantitative evaluation and user studies demonstrate that SpaceControl outperforms both training-based and optimization-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4417070084","title":"Can Large Language Models Be Query Optimizer for Relational Databases?","url":"https://doi.org/10.1145/3769771","published":"2025-12-04","authors":["Jie Tan","Kangfei Zhao","Rui Li","Jeffrey Xu Yu","Chengzhi Piao","Hong Cheng","Helen Meng","Deli Zhao","Yu Rong"],"abstract":"Query optimization is a complex planning and decision-making problem within the exponentially growing plan space in database management systems (DBMS). Traditional optimization techniques have been extensively studied over decades, leaving limited room for further improvement along this track. Recent developments of Large Language Models (LLMs) have demonstrated their potential in solving complex planning and decision-making problems, such as arithmetic and programmatic tasks. In this paper, we try to explore the potential of LLMs in handling query optimization and propose a tentative LLM-based query optimizer dubbed LLM-QO, established on PostgreSQL's execution engine. In LLM-QO, we formulate query optimization in an autoregressive fashion which directly generates the execution plan without explicit plan enumeration. 
To investigate the essential input of LLM-QO, we design a customized d...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3769771","openalex_id":"https://openalex.org/W4417070084","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","preference"],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong","Guangzhou University","Hong Kong Baptist University","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C157692150","display_name":"Query optimization","score":0.8598999977111816},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8245000243186951},{"id":"https://openalex.org/C192939062","display_name":"Sargable","score":0.7699000239372253},{"id":"https://openalex.org/C2779729312","display_name":"Query plan","score":0.7279000282287598},{"id":"https://openalex.org/C192028432","display_name":"Query language","score":0.6638000011444092},{"id":"https://openalex.org/C52723943","display_name":"Serialization","score":0.5891000032424927},{"id":"https://openalex.org/C96956885","display_name":"RDF query language","score":0.5855000019073486},{"id":"https://openalex.org/C99016210","display_name":"Query expansion","score":0.5410000085830688}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4417004072","title":"PerTTS: Personalized and Controllable Zero-Shot Spontaneous Style Text-to-Speech Synthesis","url":"https://doi.org/10.1109/taslpro.2025.3639814","published":"2025-12-04","authors":["Weiqin Li","Qian Chen","Dan Luo","Tianjiao Du","Yafeng Chen","Zhiyong Wu","Xixin Wu","Helen Meng"],"abstract":"In spoken scenarios, achieving personalized and controllable zero-shot spontaneous style 
speech synthesis is highly significant, particularly in generating natural and expressive speech for unseen speakers under data-limited conditions. Traditional methods typically achieve this by fine-tuning pre-trained multi-speaker speech synthesis models or adopting zero-shot adaptation techniques. However, these methods exhibit limitations in voice cloning and style modeling, struggling to capture fine-grained voice characteristics and complex speaking styles of target speakers. In this paper, we propose PerTTS, a personalized and controllable zero-shot spontaneous speech synthesis method. This approach introduces a personalized speaking style encoder that utilizes pre-trained models and a local prosody encoder to extract semantic, duration, timbre and prosody information from multiple reference u...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3639814","openalex_id":"https://openalex.org/W4417004072","cited_by_count":0,"quality_score":45,"matched_keywords":["personalized","distillation"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Chinese University of Hong Kong","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute","University Town of Shenzhen","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C134537474","display_name":"Naturalness","score":0.9067999720573425},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7921000123023987},{"id":"https://openalex.org/C542774811","display_name":"Prosody","score":0.7644000053405762},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.6748999953269958},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.595300018787384},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.474700003862381},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.4162999987602234},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4090999960899353}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multimodal-reinforcement-learning-with-agentic-verifier-for-ai-agents","title":"Multimodal Reinforcement Learning with Agentic Verifier for AI Agents","url":"https://www.microsoft.com/en-us/research/publication/multimodal-reinforcement-learning-with-agentic-verifier-for-ai-agents/","published":"2025-12-03","authors":["Reuben Tan","Baolin Peng","Zhengyuan Yang","Hao Cheng","Oier Mees","Theodore Zhao","Andrea Tupini","Isar Meijer","Qianhui Wu","Yuncong Yang","Lars Liden","Yu Gu"],"abstract":"Agentic reasoning models trained with multimodal reinforcement learning (MMRL) have become increasingly capable, yet they are almost universally optimized using sparse, outcome-based rewards computed based on the final answers. Richer rewards computed from the reasoning tokens can improve learning significantly by providing more fine-grained guidance. However, it is challenging to compute more informative rewards in MMRL beyond those based on outcomes since different samples may require different scoring functions and teacher models may provide noisy reward signals too. In this paper, we introduce the Argos (Agentic Reward for Grounded & Objective Scoring), a principled reward agent to train multimodal reasoning models for agentic tasks. 
For each sample, Argos selects from a pool of teacher-model derived and rule-based scoring functions to simultaneously evaluate: (i) final response accu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Reinforcement learning","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2512.03794","title":"AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition","url":"https://huggingface.co/papers/2512.03794","published":"2025-12-03","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","efficient"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"apple:hspekc2dywt3n6sx02m67ypx","title":"Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language","url":"https://machinelearning.apple.com/research/semantic-regexes","published":"2025-12-03","authors":["Angie Boggust","Donghao Ren","Yannick Assogba","Dominik Moritz","Arvind Satyanarayan","Fred Hohman"],"abstract":"Automated interpretability aims to translate large 
language model (LLM) features into human understandable descriptions. However, these natural language feature descriptions are often vague, inconsistent, and require manual relabeling. In response, we introduce semantic regexes, structured language descriptions of LLM features. By combining primitives that capture linguistic and semantic feature patterns with modifiers for contextualization,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ry1m2uj8gqurazlmtacwvznq","title":"PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories","url":"https://machinelearning.apple.com/research/predict","published":"2025-12-03","authors":["Stéphane Aroca-Ouellette","Natalie Mackraz","Barry-John Theobald","Katherine Metcalf"],"abstract":"Accommodating human preferences is essential for creating AI agents that deliver personalized and effective interactions. Recent work has shown the potential for LLMs to infer preferences from user interactions, but they often produce broad and generic preferences, failing to capture the unique and individualized nature of human preferences. 
This paper introduces PREDICT, a method designed to enhance the precision and adaptability of inferring...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["personalized","preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7108329187","title":"Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation","url":"https://doi.org/10.1145/3767695.3769496","published":"2025-12-03","authors":["Hengran Zhang","Keping Bi","Jiafeng Guo","Jiaming Zhang","Shuaiqiang Wang","Dawei Yin","Xueqi Cheng"],"abstract":"Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating retrieved information. Standard retrieval process prioritized relevance, focusing on topical alignment between queries and passages. In contrast, in RAG, the emphasis has shifted to utility, which considers the usefulness of passages for generating accurate answers. Despite empirical evidence showing the benefits of utility-based retrieval in RAG, the high computational cost of using LLMs for utility judgments limits the number of passages evaluated. This restriction is problematic for complex queries requiring extensive information. To address this, we propose a method to distill the utility judgment capabilities of LLMs into smaller, more efficient models. 
Our approach focuses on utility-based selection rather than ranking, enabling dynamic passage selection tailored to specific queries without....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3767695.3769496","openalex_id":"https://openalex.org/W7108329187","cited_by_count":0,"quality_score":49,"matched_keywords":["retrieval","efficient","distillation"],"author_affiliations":["Baidu (China)","Institute of Computing Technology","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.892300009727478},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.838699996471405},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.7932000160217285},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7422999739646912},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5929999947547913},{"id":"https://openalex.org/C2778751112","display_name":"Window (computing)","score":0.49889999628067017},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4986000061035156},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4348999857902527}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7108315616","title":"CogPlanner: Unveiling the Potential of Agentic Multimodal Retrieval Augmented Generation with Planning","url":"https://doi.org/10.1145/3767695.3769486","published":"2025-12-03","authors":["Xiaohan Yu","Zhihan Yang","Chong Chen"],"abstract":"Multimodal Retrieval Augmented Generation (MRAG) systems 
have shown promise in enhancing the generation capabilities of multimodal large language models (MLLMs). However, existing MRAG frameworks primarily adhere to rigid, single-step retrieval strategies that fail to address real-world challenges of information acquisition and query reformulation. In this work, we introduce the task of Multimodal Retrieval Augmented Generation Planning (MRAG Planning) that aims at effective information seeking and integration while minimizing computational overhead. Specifically, we propose CogPlanner, an agentic plug-and-play framework inspired by human cognitive processes, which iteratively determines query reformulation and retrieval strategies to generate accurate and contextually relevant responses. CogPlanner supports parallel and sequential modeling paradigms. Furthermore, we introduce CogBench,....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3767695.3769486","openalex_id":"https://openalex.org/W7108315616","cited_by_count":1,"quality_score":46,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8230000138282776},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6959999799728394},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6542999744415283},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5192999839782715},{"id":"https://openalex.org/C21025794","display_name":"Cognitive models of information retrieval","score":0.4433000087738037},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.3928000032901764},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.38190001249313354},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.3513000011444092}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7108331514","title":"Trustworthy Information Retrieval in the LLM Era: Bias, Unfairness, and Hallucination","url":"https://doi.org/10.1145/3767695.3769670","published":"2025-12-03","authors":["Sunhao Dai","Chen Xu","Shicheng Xu","Zhongxiang Sun","Liang Pang","Zhenhua Dong","Jun Xu"],"abstract":"The rapid progress of large language models (LLMs) has fundamentally reshaped information retrieval (IR) systems, including search engines and recommender systems, by enabling new capabilities and interaction paradigms. However, the integration of LLMs into IR pipelines also brings pressing challenges to trustworthiness, particularly in the form of bias, unfairness, and hallucination, which can significantly disrupt the information ecosystem. This tutorial provides a comprehensive overview of these challenges and their emerging mitigation strategies. We begin by presenting a unified perspective that frames bias, unfairness, and hallucination as manifestations of distribution mismatch, with mitigation strategies broadly conceptualized under distribution alignment. 
Building on this framework, we examine how these issues arise across three critical stages of LLM-integrated IR systems: data....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3767695.3769670","openalex_id":"https://openalex.org/W7108331514","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Beijing Academy of Artificial Intelligence","Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Computing Technology","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6807000041007996},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.6419000029563904},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.607699990272522},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.499099999666214},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.48989999294281006},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.3560999929904938},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3276999890804291},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.3255000114440918}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7108338987","title":"On the Diminishing Returns of Complex Robust RAG Training in the Era of Powerful LLMs","url":"https://doi.org/10.1145/3767695.3769518","published":"2025-12-03","authors":["Hanxing Ding","Shuchang Tao","Liang Pang","Zihao Wei","Liwei Chen","Kun Xu","Huawei 
Shen","Xueqi Cheng"],"abstract":"Retrieval-augmented generation (RAG) systems traditionally employ sophisticated training strategies to enhance robustness against retrieval noise. In this work, we investigate a critical question: does the benefit of these complex robust training methods diminish as language models become more powerful? Through systematic evaluation across multiple model scales and question-answering datasets, our analysis reveals a consistent trend: the marginal robustness benefit of sophisticated training strategies decreases substantially as model capacity increases. While smaller models show significant performance improvements from complex document selection and adversarial objectives, more capable models achieve comparable or even superior performance with simpler training approaches. Further investigation demonstrates that stronger models naturally exhibit better confidence calibration, cross-data...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3767695.3769518","openalex_id":"https://openalex.org/W7108338987","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","Institute of Computing Technology","Kuaishou (China)"],"concepts":[{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7473000288009644},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5529999732971191},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.5516999959945679},{"id":"https://openalex.org/C2777211547","display_name":"Training 
(meteorology)","score":0.5281999707221985},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.44600000977516174},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.40950000286102295},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.40049999952316284},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.376800000667572}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2504.07343","title":"Code Generation with Small Language Models: A Codeforces-Based Study","url":"http://arxiv.org/abs/2504.07343","published":"2025-12-03","authors":["Débora Souza","Rohit Gheyi","Lucas Albuquerque","Gustavo Soares","Márcio Ribeiro"],"abstract":"Large Language Models (LLMs) demonstrate capabilities in code generation, potentially boosting developer productivity. However, their adoption remains limited by high computational costs, among other factors. Small Language Models (SLMs) present a lightweight alternative. While LLMs have been evaluated on competitive programming tasks, prior work often emphasizes metrics like Elo or pass rates, neglecting failure analysis. The potential of SLMs in this space remains underexplored. In this study, we benchmark three open SLMs—Llama-3.2-3B, Gemma-3-12B, and Phi-4-14B—across 280 Codeforces problems spanning Elo ratings from 800 to 2100 and covering 36 distinct topics. All models were tasked with generating Python solutions. Phi-4-14B achieved the best SLM performance with a pass@3 of 63.6%, nearing o3-mini-high (86.8%). 
Combining Python and C++ outputs increased Phi-4-14B’s pass@6 to 73.6%.....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icmla66185.2025.00085","openalex_id":"https://openalex.org/W4415249770","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Universidade Federal de Alagoas","Universidade Federal de Campina Grande"],"concepts":[{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.8001999855041504},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.717199981212616},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6861000061035156},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.558899998664856},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.3878999948501587},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.38589999079704285},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.37610000371932983},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.3758000135421753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/metrorlhf-enabling-memory-effective-training-for-on-policy-rlhf-via-adaptive-sequence-streaming","title":"MetroRLHF: Enabling Memory-Effective Training for On-Policy RLHF via Adaptive Sequence 
Streaming","url":"https://www.microsoft.com/en-us/research/publication/metrorlhf-enabling-memory-effective-training-for-on-policy-rlhf-via-adaptive-sequence-streaming/","published":"2025-12-02","authors":["Wei Cui","Peng Cheng"],"abstract":"Reinforcement learning from human feedback (RLHF) has become the standard post-training technique for endowing large language models (LLMs) with helpful, harmless, and intent-consistent behavior. In practice, however, its adoption is hampered by prohibitive memory consumption during the phase of the policy-model update, especially when training on long-form generation tasks. In this paper, we propose MetroRLHF, a memory-efficient, on-policy RLHF approach that exploits the inference-time computations to reduce the training-time memory budget and to skip unnecessary work. By re-using the inference-phase materialized K,V context, the inter-token dependencies are freely removed that normally force the entire sequence to train in parallel. Building upon fine-grained subsequence streaming, RLHF can train the productive tokens in an effective manner. 
This yields a training pipeline that matches the exact...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Algorithms","Systems and networking","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lost-in-transmission-when-and-why-llms-fail-to-reason-globally","title":"Lost in Transmission: When and Why LLMs Fail to Reason Globally","url":"https://www.microsoft.com/en-us/research/publication/lost-in-transmission-when-and-why-llms-fail-to-reason-globally/","published":"2025-12-02","authors":["Tobias Schnabel","Kiran Tomlinson","Adith Swaminathan","Jennifer Neville"],"abstract":"Despite their many successes, transformer-based large language models (LLMs) continue to struggle with tasks that require complex reasoning over large parts of their input. We argue that these failures arise due to capacity limits on the accurate flow of information within LLMs. To formalize this issue, we introduce the bounded attention prefix oracle (BAPO) model, a new computational framework that models bandwidth constraints on attention heads, the mechanism for internal communication in LLMs. We show that several important reasoning problems like graph reachability require high communication bandwidth for BAPOs to solve; we call these problems BAPO-hard. 
Our experiments corroborate our theoretical predictions: GPT-4, Claude, and Gemini succeed on BAPO-easy tasks and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another benefit of chain of thought (CoT): we prove th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Computational complexity theory","Machine learning","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reviving-dsp-for-advanced-theorem-proving-in-the-era-of-reasoning-models","title":"Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models","url":"https://www.microsoft.com/en-us/research/publication/reviving-dsp-for-advanced-theorem-proving-in-the-era-of-reasoning-models/","published":"2025-12-02","authors":["Chenrui Cao","Liangcheng Song","Zenan Li","Xinyi Le","Xian Zhang","Hui Xue","Fan Yang"],"abstract":"Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend in leveraging reinforcement learning (RL)-based large-scale training for automated theorem proving. Surprisingly, we discover that even without any training, careful neuro-symbolic coordination of existing off-the-shelf reasoning models and tactic step provers can achieve comparable performance. 
This paper introduces DSP+, an improved version of the Draft, Sketch, and Prove framework, featuring a fine-grained and integrated neuro-symbolic enhancement for each phase: (1) In the draft phase, we prompt reasoning models to generate concise natural-language subgoals to benefit the sketch phase, removing thinking tokens and references to human-written proofs; (2) In the sketch phase, subgoals are autoformalized with hypotheses to benefit the proving phase, and sket...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2512.02556","title":"DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models","url":"https://huggingface.co/papers/2512.02556","published":"2025-12-02","authors":["DeepSeek"],"abstract":"We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2) Scalable Reinforcement Learning Framework: By implementing a robust reinforcement learning protocol and scaling post-training compute, DeepSeek-V3.2 performs comparably to GPT-5. 
Notably, our high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). (3) Large-Scale Agentic Task Synthesis...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","deepseek-ai","efficient","agent"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"hf-org-paper:tencent:2512.02631","title":"SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization","url":"https://huggingface.co/papers/2512.02631","published":"2025-12-02","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","agent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2604.02211","title":"Multi-Agent Video Recommenders: Evolution, Patterns, and Open Challenges","url":"http://arxiv.org/abs/2604.02211","published":"2025-12-02","authors":["Srikanth Ranganathan","Abhishek Dharmaratnakar","Anushree Sinha","Debanshu 
Das"],"abstract":"Video recommender systems are among the most popular and impactful applications of AI, shaping content consumption and influencing culture for billions of users. Traditional single-model recommenders, which optimize static engagement metrics, are increasingly limited in addressing the dynamic requirements of modern platforms. In response, multiagent architectures are redefining how video recommender systems serve, learn, and adapt to both users and datasets. These agent-based systems coordinate specialized agents responsible for video understanding, reasoning, memory, and feedback, to provide precise, explainable recommendations. In this survey, we trace the evolution of multi-agent video recommendation systems (MAVRS). We combine ideas from multi-agent recommender systems, foundation models, and conversational AI, culminating in the emerging field of large language model (LLM)-powered M...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.36227/techrxiv.176471435.56211583/v1","openalex_id":"https://openalex.org/W4416928998","cited_by_count":0,"quality_score":61,"matched_keywords":["LLM","language model","personalization","memory","agent","multi-agent"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.796500027179718},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7638000249862671},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.7310000061988831},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4925999939441681},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4742000102996826},{"id":"https://openalex.org/C9652623","display_name":"Field 
(mathematics)","score":0.4722999930381775},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.4438000023365021},{"id":"https://openalex.org/C75291252","display_name":"TRACE (psycholinguistics)","score":0.4171000123023987}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2512.02472","title":"Guided Self-Evolving LLMs with Minimal Human Supervision","url":"https://huggingface.co/papers/2512.02472","published":"2025-12-02","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"bytedance-seed:875","title":"GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation","url":"https://seed.bytedance.com/en/research/gr-rl-going-dexterous-and-precise-for-long-horizon-robotic-manipulation","published":"2025-12-02","authors":["Yunfei Li","Xiao Ma","Jiafeng Xu","Yu Cui","Zhongren Cui","Zhigang Han","Liqun Huang","Tao Kong","Yuxiao Liu","Hao Niu","Wanli Peng","Jingchao Qiao"],"abstract":"We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Assuming the optimality of human demonstrations is core to existing VLA policies. 
However, we claim that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations by reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress, filters the demonstration trajectories, and only keeps the transitions that contribute positively to the progress. Specifically, we show that by directly applying offline RL with sparse reward, the resulting Q-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Robotics","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4416926807","title":"From Embeddings to Accuracy: Comparing Foundation Models for Radiographic Classification","url":"https://doi.org/10.1007/s10278-025-01747-5","published":"2025-12-02","authors":["Xue Li","Jameson Merkow","Noel Codella","Alberto Santamaría-Pang","Naiteek Sangani","Alexander Ersoy","Christopher Burt","John W. Garrett","Richard J. Bruce","Joshua Warner","Tyler Bradshaw","Ivan Tarapov"],"abstract":"Foundation models, pre-trained on extensive datasets, have significantly advanced machine learning by providing robust and transferable embeddings applicable to various domains, including medical imaging diagnostics. 
This study evaluates the utility of embeddings derived from both general-purpose and medical domain-specific foundation models for training lightweight adapter models in multi-class radiography classification, focusing specifically on tube placement assessment and related findings, with comparison to the end-to-end training of an established convolutional neural network. A dataset comprising 8842 radiographs classified into seven distinct categories was employed to extract embeddings using seven foundation models: DenseNet121, BiomedCLIP, Med-Flamingo, MedImageInsight, MedSigLIP, Rad-DINO, and CXR-Foundation. Adapter models were subsequently trained using classical machine l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10278-025-01747-5","openalex_id":"https://openalex.org/W4416926807","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Microsoft (United States)","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.679099977016449},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.6144999861717224},{"id":"https://openalex.org/C12267149","display_name":"Support vector machine","score":0.6053000092506409},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5891000032424927},{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.5641000270843506},{"id":"https://openalex.org/C127808970","display_name":"Bonferroni correction","score":0.5605999827384949},{"id":"https://openalex.org/C206041023","display_name":"Wilcoxon signed-rank test","score":0.5418999791145325},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.5127999782562256}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4416926265","title":"AutoLife: Automatic Life Journaling with Smartphones and LLMs","url":"https://doi.org/10.1145/3770683","published":"2025-12-02","authors":["Huatao Xu","Zilin Zeng","Panrong Tong","Mo Li","Mani Srivastava"],"abstract":"This paper introduces a novel mobile sensing application - life journaling - designed to generate semantic descriptions of users' daily lives. We present AutoLife, an automatic life journaling system based on commercial smartphones. AutoLife only inputs low-cost sensor data (without photos or audio) from smartphones and can automatically generate comprehensive life journals for users. To achieve this, we first derive time, motion, and location contexts from multimodal sensor data, and harness the zero-shot capabilities of Large Language Models (LLMs), enriched with commonsense knowledge about human lives, to interpret diverse contexts and generate life journals. To manage the task complexity and long sensing duration, a multilayer framework is proposed, which decomposes tasks and seamlessly integrates LLMs with other techniques for life journaling. 
This study establishes a real-life data...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3770683","openalex_id":"https://openalex.org/W4416926265","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Hong Kong University of Science and Technology","University of California, Los Angeles"],"concepts":[{"id":"https://openalex.org/C2225880","display_name":"Journaling file system","score":0.9717000126838684},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.736299991607666},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6636999845504761},{"id":"https://openalex.org/C186967261","display_name":"Mobile device","score":0.522599995136261},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.49900001287460327},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4945000112056732},{"id":"https://openalex.org/C2988145974","display_name":"Mobile apps","score":0.45910000801086426},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.4505000114440918}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7131207497","title":"Multi-view Leaderboard: Towards Evaluating the Code Intelligence of LLMs From Multiple Views","url":"https://doi.org/10.1109/apsec66846.2025.00119","published":"2025-12-02","authors":["Mengyuan Liu","Zexun Zhan","Cuiyun Gao","Yujia Chen","Xu Gao","Chun Yong Chong","Shan Gao","Xin Xia"],"abstract":"Large Language Models (LLMs) have shown remarkable performance in code intelligence tasks, prompting the development of various benchmarks 
and leaderboards to assess their effectiveness across diverse programming scenarios. However, existing leaderboards often rely on coarse-grained metrics and overlook performance variations across different types of tasks. In this paper, we introduce Multi-view Leaderboard, a comprehensive evaluation framework designed to assess the coding capabilities of LLMs from multiple views. Our leaderboard partitions widely-used datasets such as HumanEval, MBPP, and ComplexCodeEval into subsets based on factors like prompt length, problem complexity, and task type. It supports four popular code intelligence tasks including code generation, code completion, test case generation, and API recommendation. Additionally, our leaderboard presents results using ranking....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/apsec66846.2025.00119","openalex_id":"https://openalex.org/W7131207497","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Huawei Technologies (China)","Monash University Malaysia","Sichuan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7293000221252441},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5895000100135803},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.5411999821662903},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5252000093460083},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.484499990940094},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45339998602867126},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.3824000060558319},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3479999899864197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/new-future-of-work-report-2025","title":"New Future of Work Report 2025","url":"https://www.microsoft.com/en-us/research/publication/new-future-of-work-report-2025/","published":"2025-12-01","authors":["Jenna Butler","Sonia Jaffe","Rebecca Janssen","Nancy Baym","Jake Hofman","Brent Hecht","Sean Rintel","Bahar Sarrafzadeh","Abigail Sellen","Mihaela Vorvoreanu","Jaime Teevan","Mohammed Alsobay"],"abstract":"Note from Chief Scientist and editor Jaime Teevan: As you sit down to read the 2025 New Future of Work report, it’s worth pausing to consider the thread that ties the past five years of reports together. The inaugural New Future of Work report, published in 2021, focused on new ways people could work without relying on colocation as a key productivity tool. The second, in 2022, centered on the reintroduction of physical offices and the emergence of hybrid work. In 2023, we explored how large language models could reshape everyday work, and, in 2024, how those advances moved from promise to real‑world impact. Each year, as I’ve written this introduction, I’ve found myself saying that the previous year marked a once-in-a-lifetime generational shift. But after five years, it’s clear that the reports aren’t capturing a series of separate revolutions. 
Rather, they are chapters in a single....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Tech Report","Artificial intelligence","Data platforms and analytics","Economics","Human-computer interaction","Programming languages and software engineering","Social sciences","AI and society","Generative AI","Human-AI Collaboration"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/latent-zoning-network-a-unified-principle-for-generative-modeling-representation-learning-and-classification","title":"Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification","url":"https://www.microsoft.com/en-us/research/publication/latent-zoning-network-a-unified-principle-for-generative-modeling-representation-learning-and-classification/","published":"2025-12-01","authors":["Zinan Lin","Enshu Liu","Xuefei Ning","Junyi Zhu","Wenyu Wang","Sergey Yekhanin"],"abstract":"Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across tasks. We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. 
Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones, and a decoder that maps latents back to data. ML tasks are expressed as compositions of these encoders and decoders: for example, label-conditional image generation uses a label encoder and image decoder; image embedding uses an image encoder; classification uses an image encoder...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Diffusion models","Generative model","Machine learning","mathematics","Representation learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/meshagent-enabling-reliable-network-management-with-large-language-models","title":"MeshAgent: Enabling Reliable Network Management with Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/meshagent-enabling-reliable-network-management-with-large-language-models/","published":"2025-12-01","authors":["Yajie Zhou","Kevin Hsieh","Sathiya Kumaran Mani","Srikanth Kandula","Zaoxing Liu"],"abstract":"The emergence of large language models (LLMs) offers great promise for building domain-specific agents, but adapting them for network management remains challenging. 
To understand why, we conduct a case study on network management tasks and find that state-of-the-art specialization techniques rely heavily on extensive, high-quality task-specific data to produce precise solutions. However, real-world network queries are often diverse and unpredictable, making such techniques difficult to scale. Motivated by this gap, we propose MeshAgent, a new workflow that improves precision by extracting domain-specific invariants from sample queries and encoding them as constraints. These constraints guide LLM’s generation and validation process, narrowing the search space and enabling low-effort adaptation. We evaluate our method across three network management applications and a user study involving...","companies":["Microsoft","Amazon"],"matched_orgs":["Microsoft","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3771567","openalex_id":"https://openalex.org/W4416927413","cited_by_count":1,"quality_score":85,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01","LLM"],"author_affiliations":["Microsoft","Amazon (United States)","Microsoft (United States)","University of Maryland, College Park"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/synthesize-privacy-preserving-high-resolution-images-via-private-textual-intermediaries","title":"Synthesize Privacy-Preserving High-Resolution Images via Private Textual 
Intermediaries","url":"https://www.microsoft.com/en-us/research/publication/synthesize-privacy-preserving-high-resolution-images-via-private-textual-intermediaries/","published":"2025-12-01","authors":["Haoxiang Wang","Zinan Lin","Da Yu","Huishuai Zhang"],"abstract":"Generating high fidelity, differentially private (DP) synthetic images offers a promising route to share and analyze sensitive visual data without compromising individual privacy. However, existing DP image synthesis methods struggle to produce high resolution outputs that faithfully capture the structure of the original data. In this paper, we introduce a novel method, referred to as Synthesis via Private Textual Intermediaries (SPTI), that can generate high resolution DP images with easy adoption. The key idea is to shift the challenge of DP image synthesis from the image domain to the text domain by leveraging state of the art DP text generation methods. SPTI first summarizes each private image into a concise textual description using image to text models, then applies a modified Private Evolution algorithm to generate DP text, and finally reconstructs images using text to image model...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Differential privacy","Image generation","Synthetic data","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/struct-bench-a-benchmark-for-differentially-private-structured-text-generation","title":"Struct-Bench: A Benchmark for Differentially Private Structured Text Generation","url":"https://www.microsoft.com/en-us/research/publication/struct-bench-a-benchmark-for-differentially-private-structured-text-generation/","published":"2025-12-01","authors":["Shuaiqi Wang","Vikas Raunak","Arturs Backurs","Victor Reis","Pei Zhou","Sihao Chen","Longqi Yang","Zinan Lin","Sergey Yekhanin","Giulia Fanti"],"abstract":"Differentially private (DP) synthetic data generation is a promising technique for utilizing private datasets that otherwise cannot be exposed for model training or other analytics. While much research literature has focused on generating private unstructured text and image data, in enterprise settings, structured data (e.g., tabular) is more common, often including natural language fields or components. Existing synthetic data evaluation techniques (e.g., FID) struggle to capture the structural properties and correlations of such datasets. In this work, we propose Struct-Bench, a framework and benchmark for evaluating synthetic datasets derived from structured datasets that contain natural language data. The Struct-Bench framework requires users to provide a representation of their dataset structure as a Context-Free Grammar (CFG). 
Our benchmark comprises 5 real-world and 2 syntheticall...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","Benchmarking","Differential privacy","Synthetic data","Text generation","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/verusage-a-study-of-agent-based-verification-for-rust-systems","title":"VeruSAGE: A Study of Agent-Based Verification for Rust Systems","url":"https://www.microsoft.com/en-us/research/publication/verusage-a-study-of-agent-based-verification-for-rust-systems/","published":"2025-12-01","authors":["Chenyuan Yang","Natalie Neamtu","Chris Hawblitzel","Jay Lorch","Shan Lu"],"abstract":"Large language models (LLMs) have shown impressive capability to understand and develop code. However, their capability to rigorously reason about and prove code correctness remains in question. This paper offers a comprehensive study of LLMs' capability to develop correctness proofs for system software written in Rust. We curate a new system-verification benchmark suite, VeruSAGE-Bench, which consists of 849 proof tasks extracted from eight open-source Verus-verified Rust systems. Furthermore, we design different agent systems to match the strengths and weaknesses of different LLMs (o4-mini, GPT-5, Sonnet 4, and Sonnet 4.5). 
Our study shows that different tools and agent settings are needed to stimulate the system-verification capability of different types of LLMs. The best LLM-agent combination in our study completes over 80% of system-verification tasks in VeruSAGE-Bench. It also comp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Systems and networking","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nova-an-agentic-framework-for-automated-histopathology-analysis-and-discovery","title":"NOVA: An Agentic Framework for Automated Histopathology Analysis and Discovery","url":"https://www.microsoft.com/en-us/research/publication/nova-an-agentic-framework-for-automated-histopathology-analysis-and-discovery/","published":"2025-12-01","authors":["Anurag Vaidya","Felix Meissen","Daniel Coelho de Castro","Shruthi Bannur","Tristan Lazard","Drew F. K. Williamson","Faisal Mahmood","Javier Alvarez-Valle","Stephanie Hyland","Kenza Bouzid"],"abstract":"Digitized histopathology analysis involves complex, time-intensive workflows and specialized expertise, limiting its accessibility. We introduce NOVA, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. 
NOVA integrates 49 domain-specific tools (e.g., nuclei segmentation, whole-slide encoding) built on open-source software, and can also create new tools ad hoc. To evaluate such systems, we present SlideQuest, a 90-question benchmark -- verified by pathologists and biomedical scientists -- spanning data processing, quantitative analysis, and hypothesis testing. Unlike prior biomedical benchmarks focused on knowledge recall or diagnostic QA, SlideQuest demands multi-step reasoning, iterative coding, and computational problem solving. Quantitative evaluation shows NOVA outperforms coding-agent baselines...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mechanism-design-for-llm-fine-tuning-with-multiple-reward-models","title":"Mechanism Design for LLM Fine-tuning with Multiple Reward Models","url":"https://www.microsoft.com/en-us/research/publication/mechanism-design-for-llm-fine-tuning-with-multiple-reward-models/","published":"2025-12-01","authors":["Haoran Sun","Yurong Chen","Siwei Wang","Xu Chu","Wei Chen","Xiaotie Deng"],"abstract":"Recent research on fine-tuning large language models (LLMs) through the aggregation of multiple preferences has attracted considerable attention. 
However, the existing literature predominantly focuses on the empirical performance of aggregation algorithms while neglecting the underlying motivation for agents to misreport their preferences. In this paper, we formalize this as a multi-parameter mechanism design problem, where an LLM provider designs training and payment rules to achieve specific objectives and promote the truthful reporting of preferences. Firstly, we claim the necessity of a payment scheme by demonstrating that without payments, truth-telling is a strictly dominated strategy under a wide range of training rules. Then, we introduce the affine maximizer payment scheme for the social welfare maximizing training rules, which ensures both dominant-strategyincentive compatibili...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Economics","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:ac43b4d243a06351","title":"Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following","url":"https://ai.meta.com/research/publications/rubric-based-benchmarking-and-reinforcement-learning-for-advancing-llm-instruction-following/","published":"2025-12-01","authors":["Yun He","Wenzhe Li","Hejia Zhang","Vincent Li","Karishma Mandyam","Sopan Khosla","Yuanhao Xiong","Nanshu Wang","Selina Xiaoliang Peng","Shengjie Bi","Shishir G. 
Patil","Qi Qi"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","Reinforcement Learning","LLM"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=2"}},{"id":"hf-org-paper:Qwen:2512.01374","title":"Stabilizing Reinforcement Learning with LLMs: Formulation and Practices","url":"https://huggingface.co/papers/2512.01374","published":"2025-12-01","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"apple:lw6h2ldhj3ewrvjrspvgg3n8","title":"Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures","url":"https://machinelearning.apple.com/research/sample-and-map","published":"2025-12-01","authors":["Nina Vesseron","Louis Béthune","Marco Cuturi"],"abstract":"The canonical approach in generative modeling is to split model fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). 
We explore in this work an alternative route that ties sampling and mapping. We find inspiration in moment measures, a result that states that for any measure ρ, there exists a unique convex potential u such that ρ = ∇u♯e-u. While this does...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4417147545","title":"Efficient multimodal large language models: a survey","url":"https://doi.org/10.1007/s44267-025-00099-6","published":"2025-12-01","authors":["Yizhang Jin","Jian Li","Tianjun Gu","Yexin Liu","Bo Zhao","Jinxiang Lai","Zhenye Gan","Yabiao Wang","Chengjie Wang","Xin Tan","Lizhuang Ma"],"abstract":"Abstract In the past years, multimodal large language models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering and visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, especially in edge computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, this survey summarizes the timeline of representative efficient MLLMs, the current state of research in structures and strategies, and the applications. 
Finally, the limitations of current efficient MLLM research and promising future directions are discussed.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s44267-025-00099-6","openalex_id":"https://openalex.org/W4417147545","cited_by_count":11,"quality_score":52,"matched_keywords":["efficient"],"author_affiliations":["Beijing Academy of Artificial Intelligence","East China Normal University","Hong Kong University of Science and Technology","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7742000222206116},{"id":"https://openalex.org/C4438859","display_name":"Timeline","score":0.6498000025749207},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5310999751091003},{"id":"https://openalex.org/C48103436","display_name":"State (computer science)","score":0.4812999963760376},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.41609999537467957},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41589999198913574},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.40709999203681946},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.3937999904155731}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W7133499090","title":"LLM-Based Long-Term Life Task Planning to Reduce Human Uncertainty","url":"https://doi.org/10.1145/3799914.3799938","published":"2025-12-01","authors":["Ben Wang"],"abstract":"In long-term life tasks, people often face challenges from uncertainty in tasks and information-seeking, which can 
create difficulties in decision-making and task completion. Recent advancements in Artificial Intelligence (AI), especially in Large Language Models (LLMs), offer transformative capabilities in domain-specific task planning and problem-solving. Despite these innovations, there is limited understanding of how such technologies can be applied to assist humans in long-term life tasks. This dissertation work seeks to address this gap by exploring how human-AI collaboration, mediated through LLM-based agents, can improve long-term life task planning and uncertainty management. To achieve this, this dissertation first proposes the long-term life task type and investigates how people may use AI tools to assist them in planning long-term life tasks and cope with uncertainty. Secondl...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3799914.3799938","openalex_id":"https://openalex.org/W7133499090","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","long-term"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7293000221252441},{"id":"https://openalex.org/C64543145","display_name":"Intersection (aeronautics)","score":0.5551000237464905},{"id":"https://openalex.org/C202033279","display_name":"Scenario planning","score":0.48249998688697815},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4781000018119812},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.43560001254081726},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.4074999988079071},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.4016000032424927},{"id":"https://openalex.org/C70587473","display_name":"Transformative 
learning","score":0.3984000086784363}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7130403900","title":"Item-Language Model: Improving Large Language Model for Recommendation via Item-Language Representation Learning","url":"https://doi.org/10.48448/ydk1-n832","published":"2025-12-01","authors":["Association for Computational Linguistics 2025","Vikram Aggarwal","Fuli Feng","J. Li","Dongfang Liu","Reza Mirghaderi","Hardik Patel","Yanwei Song","Anushya Subbiah","Qifan Wang","Zenglin Xu","Li Yang"],"abstract":"Large Language Models (LLMs) have recently made significant advancements in tackling complex tasks, such as retrieving hard-to-find information and solving intricate problems. Consequently, various approaches have been proposed to integrate LLMs into recommender systems, primarily by embedding them within existing architectures or training them on the recommendation data. However, most existing methods fail to effectively incorporate user-item interaction signals into pretrained LLMs due to the modality gap between interaction data and the LLM’s internal knowledge. To address this challenge, we propose the Item-Language Model (ILM) to enhance LLMs for recommendation. ILM consists of two main components: An item-language representation learning module, where an ILM encoder is pretrained to generate text-aligned item representations. 
And an item-language co-training module, where the ILM e...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/ydk1-n832","openalex_id":"https://openalex.org/W7130403900","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Fudan University","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7807000279426575},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6014000177383423},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.5726000070571899},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.5496000051498413},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5392000079154968},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5063999891281128},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.49399998784065247},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4860999882221222}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416999604","title":"Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation","url":"https://doi.org/10.1145/3763330","published":"2025-12-01","authors":["Tianyu Huang","Wangguandong Zheng","Tengfei Wang","Yuhao Liu","Zhenwei Wang","Junta Wu","Jie Jiang","Hui Li","Rynson W. H. 
Lau","Wangmeng Zuo","Chunchao Guo"],"abstract":"Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present Voyager , a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion : A unified architecture that jointly generates aligned...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3763330","openalex_id":"https://openalex.org/W4416999604","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["City University of Hong Kong","Harbin Institute of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8187999725341797},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7075999975204468},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6658999919891357},{"id":"https://openalex.org/C2776449333","display_name":"View synthesis","score":0.4797999858856201},{"id":"https://openalex.org/C109950114","display_name":"3D 
reconstruction","score":0.43950000405311584},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4214000105857849},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.420199990272522},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.40610000491142273}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7117552639","title":"Sky-Drive: a Distributed Multiagent Simulation Platform for Human-AI Collaborative and Socially Aware Future Transportation","url":"https://doi.org/10.26599/jicv.2026.9210070","published":"2025-12-01","authors":["Zilin Huang","Zihao Sheng","Zhengyang Wan","Yansong Qu","Yan Luo","Boyue Wang","Pei Li","Yen‐Jung Chen","Jiancong Chen","Keke Long","Jiayi Meng","Yue Leng"],"abstract":"Recent advances in autonomous system simulation platforms have significantly enhanced the safe and scalable testing of driving policies. Although existing simulators have greatly accelerated development by providing controlled testing environments, they face limitations in addressing the evolving needs of future transportation research, particularly in enabling effective human-artificial intelligence (human-AI) collaboration and modeling socially aware driving agents. 
This study introduces Sky-Drive, a novel distributed multiagent simulation platform that addresses these limitations through four key innovations: (1) a distributed architecture for synchronized simulation across multiple terminals; (2) a multimodal human-in-the-loop framework that integrates diverse sensors to collect rich behavioral data; (3) a human-AI collaboration mechanism that supports continuous and adaptive knowled...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.26599/jicv.2026.9210070","openalex_id":"https://openalex.org/W7117552639","cited_by_count":2,"quality_score":43,"matched_keywords":["personalized"],"author_affiliations":["Google (United States)","Purdue University West Lafayette","The University of Texas at Arlington","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6575999855995178},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.6128000020980835},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5551000237464905},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.47049999237060547},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.46230000257492065},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.424699991941452},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.42160001397132874},{"id":"https://openalex.org/C47796450","display_name":"Intelligent transportation system","score":0.37049999833106995}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"openalex:W7117486739","title":"Seeing clearly with artificial intelligence: Brand and video measurement in focus","url":"https://doi.org/10.69554/eoqc6008","published":"2025-12-01","authors":["Suraj Rajdev"],"abstract":"This paper traces the evolution of video marketing measurement from the pre-internet era to the digital age, highlighting the shift from simplistic models to complex, data-rich environments. It then delves into the artificial intelligence (AI) era, where probabilistic measurement challenges traditional frameworks such as media mix modelling, attribution and experimentation. The paper proposes branded search volume as a realtime ‘conversion’ metric for brand measurement that also strongly correlates with sales. Ultimately, it explores how cutting-edge AI capabilities, including large language models interpretability and advanced attention measurement, provide revolutionary ways to understand brand impact and drive marketing effectiveness in today’s dynamic landscape. 
This article is also included in The Business & Management Collection which can be accessed at https://hstalks.com/business...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.69554/eoqc6008","openalex_id":"https://openalex.org/W7117486739","cited_by_count":0,"quality_score":41,"matched_keywords":["media"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.7652000188827515},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.6098999977111816},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.574400007724762},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5002999901771545},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.48590001463890076},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41659998893737793},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.4027000069618225},{"id":"https://openalex.org/C98495876","display_name":"Digital marketing","score":0.3944999873638153}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7129538909","title":"Digital Twin Enabled Deep Learning System for Predictive Monitoring of Cardiovascular Health","url":"https://doi.org/10.1109/ic2nc67409.2025.11376464","published":"2025-12-01","authors":["V. Srinivasan","Kalyan Kondisetty","Srikanth Gorle","C.R. Durga Devi","Manas Ranjan Panda","Mohan Vamsi Musunuru"],"abstract":"Cardiovascular diseases (CVDs) are a major cause of death globally, that require timely and personalized monitoring. 
This research introduces a Digital Twin Enabled Deep Learning (DT-DL) framework by incorporating an open source cardiovascular model and data driven deep learning in predictive and interpretable health monitoring. The proposed system integrates a physiological digital twin, multimodal sensor integration layer, deep residual predictor, and uncertainty quantification module into a closed loop real time feedback. The digital twin models hemodynamic dynamics with a time dependent elastance model, whilst a Temporal Convolutional Network learns residual corrections and delivers uncertainty informed predictions. Model parameters are adapted in real time through Unscented Kalman Filtering such that individual patient physiology is tracked. Empirical studies on multimodal datasets....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ic2nc67409.2025.11376464","openalex_id":"https://openalex.org/W7129538909","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["AbbVie (United States)","Amazon (United States)","Chicago Department of Public Health","Dr. 
Hari Singh Gour University","Grammar School","TD Bank","Ta Solutions (China)"],"concepts":[{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.8091999888420105},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7427999973297119},{"id":"https://openalex.org/C155512373","display_name":"Residual","score":0.6175000071525574},{"id":"https://openalex.org/C3018284874","display_name":"Cardiovascular health","score":0.5796999931335449},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5651000142097473},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5602999925613403},{"id":"https://openalex.org/C157286648","display_name":"Kalman filter","score":0.492000013589859},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.4650999903678894}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7116116926","title":"ChatHSI: Reliable LLM-Powered Human-Swarm Interaction Framework","url":"https://doi.org/10.1145/3769534.3769608","published":"2025-12-01","authors":["Yiheng Zhang","Shen Bohan","Le Liu","Shizhou Zhang","Peng Wang","Lingyun Yu","Di Xu"],"abstract":"Human-swarm interaction (HSI) is critical for scalable control of UAV swarm systems. Traditional interfaces struggle with generalization and user workload, especially in immersive environments. Hence, we present ChatHSI, a framework leveraging large language models (LLMs) for swarm task planning. ChatHSI integrates prompt engineering, action validation, and a human-in-the-loop mechanism to improve planning feasibility and executability. We implement ChatHSI in an immersive simulation to improve users’ spatial and situational awareness. 
Our method shows improved task efficiency, reduced workload, and higher usability in user studies. Ablation study proves the effectiveness of prompt context and action validation. The results show the feasibility of LLM-driven interaction for immersive swarm control and point toward adaptive, intuitive, and scalable HSI systems.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3769534.3769608","openalex_id":"https://openalex.org/W7116116926","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Cloud Computing Center","Huawei Technologies (China)","Northwestern Polytechnical University","Xi’an Jiaotong-Liverpool University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7904999852180481},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6855000257492065},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6660000085830688},{"id":"https://openalex.org/C170130773","display_name":"Usability","score":0.6621999740600586},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6061000227928162},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.567300021648407},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.48829999566078186},{"id":"https://openalex.org/C181335050","display_name":"Swarm behaviour","score":0.4765999913215637}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416958227","title":"Leveraging automated machine learning (AutoML) for urban climate 
emulation","url":"https://doi.org/10.1016/j.bdes.2025.100040","published":"2025-12-01","authors":["Junjie Yu","Zhonghua Zheng","Sarah Lindley","Lei Zhao","Chi Wang","Qingyun Wu","Lingcheng Li","David Topping","John S. Schreck","David John Gagne","Keith W. Oleson"],"abstract":"• Location-independent urban climate emulators are developed using AutoML. • A feature importance analysis framework is proposed for AutoML models. • Location and urban surface parameters improve the emulation performance. • Forcing and location are more important than urban surface parameters in emulation. Urban climate models are critical for understanding and addressing the impacts of urban climate change and for supporting the development of sustainable cities. Yet, process-based urban climate models face limitations of high-entry barriers and substantial computing resource consumption, prompting the development of data-driven methods. In this study, we develop location-independent machine learning emulators for the daily maximum canyon air temperature. 
To overcome the complexities associated with model selection and hyperparameter optimization in machine learning, we apply automated...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.bdes.2025.100040","openalex_id":"https://openalex.org/W4416958227","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","NSF NCAR Climate and Global Dynamics Laboratory","NSF National Center for Atmospheric Research","National Center for Supercomputing Applications","Pacific Northwest National Laboratory","Pennsylvania State University","University of Illinois Urbana-Champaign","University of Manchester"],"concepts":[{"id":"https://openalex.org/C149810388","display_name":"Emulation","score":0.8589000105857849},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6309999823570251},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5450999736785889},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5041999816894531},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4555000066757202},{"id":"https://openalex.org/C168754636","display_name":"Climate model","score":0.44179999828338623},{"id":"https://openalex.org/C148483581","display_name":"Feature selection","score":0.42179998755455017},{"id":"https://openalex.org/C132651083","display_name":"Climate change","score":0.40849998593330383}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7130424934","title":"On Memorization of Large Language Models in Logical 
Reasoning","url":"https://doi.org/10.48448/tayp-9m19","published":"2025-12-01","authors":["Association for Computational Linguistics 2025","Xinyun Chen","Badih Ghazi","Yangsibo Huang","Ravi Kumar","Bo Li","Yuchen Lin","Chulin Xie","Da Yu","Chiyuan Zhang"],"abstract":"Large language models (LLMs) achieve good performance on challenging reasoning benchmarks, yet could also make basic reasoning mistakes. This contrasting behavior is puzzling when it comes to understanding the mechanisms behind LLMs' reasoning capabilities. One hypothesis is that the increasingly high and nearly saturated performance on common reasoning benchmarks could be due to the memorization of similar problems. In this paper, we systematically investigate this hypothesis with a quantitative measurement of memorization in reasoning tasks, using two dynamically generated logical reasoning benchmarks based on Knights and Knaves (K&K) puzzles and Zebra puzzles (DynamicZebra). We find that LLMs could interpolate and memorize the training puzzles (achieving near-perfect accuracy) after fine-tuning, yet they struggle with slight variations of these puzzles. 
On the other hand, we show that...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/tayp-9m19","openalex_id":"https://openalex.org/W7130424934","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","International University of the Caribbean"],"concepts":[{"id":"https://openalex.org/C30038468","display_name":"Memorization","score":0.8960999846458435},{"id":"https://openalex.org/C43971567","display_name":"Logical reasoning","score":0.7228999733924866},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.573199987411499},{"id":"https://openalex.org/C97364631","display_name":"Deductive reasoning","score":0.5274999737739563},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4869999885559082},{"id":"https://openalex.org/C36964233","display_name":"Verbal reasoning","score":0.4844000041484833},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.4602999985218048},{"id":"https://openalex.org/C2985612853","display_name":"Analogical reasoning","score":0.4514999985694885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417000471","title":"Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation","url":"https://doi.org/10.1145/3763353","published":"2025-12-01","authors":["X. Zhu","Xu Huang","Qinghongbing Xie","Zhi Deng","Junsheng Yu","Y. Guan","Z. Liu","Lin Zhu","Qijun Zhao","Ligang Liu","Long Zeng"],"abstract":"Generating artistic and coherent 3D scene layouts is crucial in digital content creation. 
Traditional optimization-based methods are often constrained by cumbersome manual rules, while deep generative models face challenges in producing content with richness and diversity. Furthermore, approaches that utilize large language models frequently lack robustness and fail to accurately capture complex spatial relationships. To address these challenges, this paper presents a novel vision-guided 3D layout generation system. We first construct a high-quality asset library containing 2,037 scene assets and 147 3D scene layouts. Subsequently, we employ an image generation model to expand prompt representations into images, fine-tuning it to align with our asset library. We then develop a robust image parsing module to recover the 3D layout of scenes based on visual semantics and geometric informati...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3763353","openalex_id":"https://openalex.org/W4417000471","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Southeast University","Tencent (China)","Tsinghua University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8179000020027161},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.6151000261306763},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6140999794006348},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5552999973297119},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.4918000102043152},{"id":"https://openalex.org/C179372163","display_name":"Scene graph","score":0.48489999771118164},{"id":"https://openalex.org/C39890363","display_name":"Generative 
grammar","score":0.459199994802475},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.43810001015663147}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:0ec405c80bb0dd96","title":"ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning","url":"https://research.nvidia.com/publication/2025-12_thinkact-vision-language-action-reasoning-reinforced-visual-latent-planning","published":"2025-12","authors":["Chi-Pin Huang","Yueh-Hua Wu","Min-Hung Chen","Frank Wang","Fred Yang"],"abstract":"Official NVIDIA Research publication. NeurIPS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NeurIPS"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=0"}},{"id":"official:e6fdbce9164e85a2","title":"Policy Optimized Text-to-Image Pipeline Design","url":"https://research.nvidia.com/publication/2025-12_policy-optimized-text-image-pipeline-design","published":"2025-12","authors":["Uri Gadot","Rinon Gal","Yftah Zisser","Gal Chechik","Shie Mannor"],"abstract":"Official NVIDIA Research publication. 
NeurIPS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NeurIPS"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=0"}},{"id":"openalex:W4416893777","title":"Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks","url":"https://doi.org/10.1613/jair.1.17469","published":"2025-11-30","authors":["Paul Smolensky","R. García Fernández","Zhenghao Zhou","Mattia Opper","Adam Davies","Jianfeng Gao"],"abstract":"Large Language Models (LLMs) have demonstrated impressive abilities in symbol processing through in-context learning (ICL). This success flies in the face of decades of critiques asserting that artificial neural networks cannot master abstract symbol manipulation. We seek to understand the mechanisms that can enable robust symbol processing in transformer networks, illuminating both the unanticipated success, and the significant limitations, of transformers in symbol processing. Borrowing insights from symbolic AI and cognitive science on the power of Production System architectures, we develop a high-level Production System Language, PSL, that allows us to write symbolic programs to do complex, abstract symbol processing, and create compilers that precisely implement PSL programs in transformer networks which are, by construction, 100% mechanistically interpretable. 
The work is driven b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1613/jair.1.17469","openalex_id":"https://openalex.org/W4416893777","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","University of Edinburgh","Yale University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7501999735832214},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.7217000126838684},{"id":"https://openalex.org/C134400042","display_name":"Symbol (formal)","score":0.60589998960495},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42489999532699585},{"id":"https://openalex.org/C169590947","display_name":"Compiler","score":0.3666999936103821},{"id":"https://openalex.org/C9870796","display_name":"Turing","score":0.3628000020980835},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.3564999997615814},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.3188999891281128}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:110c1ae4f11317ad","title":"DeepSeek-V3.2: Efficient Reasoning & Agentic 
AI","url":"https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale/blob/main/assets/paper.pdf","published":"2025-11-28","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_repository_scan"],"source":"official_repository_scan","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace repo deepseek-ai/DeepSeek-V3.2-Speciale"}},{"id":"openalex:W7143522068","title":"Generative AI Models for Simulating Brain Lesion Impacts","url":"https://doi.org/10.1109/emergin67762.2025.11450766","published":"2025-11-28","authors":["Nitin Rakesh","Monali Gulhane","Sana Raj","M. RUBAN ANTONY","Garima Shukla","Aarthi Sivasankaran"],"abstract":"Brain damage, which can happen from a number of things like an accident, a stroke, or a neurodegenerative disease, can significantly change how neurones work. To improve the accuracy of diagnosis, treatment planning, and therapy approaches, it is important to be able to accurately simulate and predict the effects of brain injuries. Statistical models and simple computer methods are often used in traditional ways to try to figure out how brain injuries affect people. These models don't fully show, though, the complicated, nonlinear links between brain damage and how it affects function. This article looks into how generative artificial intelligence (AI) models, especially deep learning and generative adversarial networks (GANs), might be able to simulate the effects of brain injuries. 
We look into how well these models can create accurate brain lesion simulations and guess how those simul...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/emergin67762.2025.11450766","openalex_id":"https://openalex.org/W7143522068","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Krishna Institute of Medical Sciences","Krishna Institute of Medical Sciences Deemed University","Symbiosis International University"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5497000217437744},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5291000008583069},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.352400004863739},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3467999994754791},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.34610000252723694},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.30300000309944153},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.2822999954223633},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.2425999939441681}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:deepseek-ai:2511.22570","title":"DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning","url":"https://huggingface.co/papers/2511.22570","published":"2025-11-27","authors":["DeepSeek"],"abstract":"Large language models have made significant progress in mathematical reasoning, 
which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year. However, this approach faces fundamental limitations. Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable. To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly i...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","deepseek-ai","LLM"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"hf-org-paper:stepfun-ai:2511.22625","title":"REASONEDIT: Towards Reasoning-Enhanced Image Editing Models","url":"https://huggingface.co/papers/2511.22625","published":"2025-11-27","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"arxiv:2511.21989","title":"Selecting User Histories to Generate LLM Users for Cold-Start Item Recommendation","url":"http://arxiv.org/abs/2511.21989","published":"2025-11-27","authors":["Nachiket Subbaraman","Jaskinder Sarai","Aniruddh Nath","Lichan Hong","Lukasz Heldt","Li Wei","Zhe Zhao"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning, generalization, and simulating human-like behavior across a wide range of tasks. These strengths present new opportunities to enhance traditional recommendation systems (RS), especially in the cold-start item scenario where newly introduced items lack interactions. Existing works have used LLMs to address cold-start issues in traditional RS through data augmentation, but they have limitations. One recent work directly addresses this issue by prompting LLMs to generate augmented interaction data between randomly sampled users and cold-start items. Then, they train the traditional RS with augmented data, incorporating collaborative signals for cold-start items. 
Although they use LLMs to provide cold-start items with feedback, they use partial user histories, which does not allow the LLM to fully emulate th...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2511.21989","openalex_id":"https://openalex.org/W4416942881","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","University of California, Davis"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7689999938011169},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7064999938011169},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.4666999876499176},{"id":"https://openalex.org/C2779436431","display_name":"Policy learning","score":0.43560001254081726},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.4309999942779541},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.3919000029563904},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.3702999949455261},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33309999108314514}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:Qwen:2511.21631","title":"Qwen3-VL Technical Report","url":"https://huggingface.co/papers/2511.21631","published":"2025-11-26","authors":["Alibaba/Qwen"],"abstract":"We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal 
benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family includes both dense (2B/4B/8B/32B) and mixture-of-experts (30B-A3B/235B-A22B) variants to accommodate diverse latency-quality trade-offs. Qwen3-VL delivers three core pillars: (i) markedly stronger pure-text understanding, surpassing comparable text-only backbones in several cases; (ii) robust long-context comprehension with a native 256K-token window for both text and interleaved multimodal inputs, enabling faithful retention, retrieval, and cross-referencing across long documents and videos; and (iii) advanced multimodal reasoning across single-image, multi-image, and video tasks, demonstrating...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","Qwen","language model","retrieval"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2511.21541","title":"Video Generation Models Are Good Latent Reward Models","url":"https://huggingface.co/papers/2511.21541","published":"2025-11-26","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2511.21579","title":"Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy","url":"https://huggingface.co/papers/2511.21579","published":"2025-11-26","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"openalex:W4416707357","title":"HybridMoE: LoRA-Based LLMs Fine-Tune With Hybrid Mixture of Experts","url":"https://doi.org/10.1109/taslpro.2025.3637555","published":"2025-11-26","authors":["Song Lin","Yufei Ma","Sen Liu","Junhua Shi","Linbo Jin","Dehong Gao","Shanqing Yu","Qi Xuan","Xiaoyan Cai","Libin Yang"],"abstract":"Parameter-Efficient Fine-Tuning (PEFT) of Large Language Models (LLMs) in multitask scenarios is cutting-edge 
research currently. This paper introduces an innovative PEFT approach, named HybridMoE, by...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3637555","openalex_id":"https://openalex.org/W4416707357","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","Northwestern Polytechnical University","Sir Run Run Shaw Hospital","Zhejiang Lab","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7311999797821045},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6751999855041504},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5454000234603882},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5289000272750854},{"id":"https://openalex.org/C124681953","display_name":"Decomposition","score":0.5128999948501587},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.5},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.4472000002861023},{"id":"https://openalex.org/C42355184","display_name":"Matrix decomposition","score":0.4180000126361847}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128781818","title":"Generative AI-based Framework for Fraud Detection and Prevention in Online 
Payment Systems","url":"https://doi.org/10.1109/icuis67429.2025.11380758","published":"2025-11-26","authors":["R. V. S. Praveen","Satya Subrahmanya Sai Ram Gopal","Harikrishna Vemuri","S. Sista","RaviTeja Aida","Srinikhil Saisatya Vemuri"],"abstract":"The escalating menace of financial fraud has emerged as a significant issue in the contemporary digitally interconnected landscape, particularly with the proliferation of e-commerce and online payment mechanisms. Fraudulent activities, especially in credit and payment card transactions, have increased significantly, leading both public and private sectors to invest substantially in research and development aimed at fraud detection and prevention in online transactions. This paper presents a comprehensive strategy that initiates with a rigorous data preprocessing pipeline to manage missing values, eliminate outliers, and normalise the dataset through undersampling approaches. Fraud indicators are derived using dimensionality reduction, and the Improved Red Piranha Optimisation (IRPO) algorithm is utilised to optimise feature selection. 
A hybrid spatial-temporal feature space is created by...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icuis67429.2025.11380758","openalex_id":"https://openalex.org/W7128781818","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Accenture (Switzerland)","Amazon (United States)","Biological E (India)","Digital Science (United States)","ET Enterprises (United Kingdom)","Fort Bend County Libraries","Saudi Arabia Basic Industries (United States)"],"concepts":[{"id":"https://openalex.org/C145097563","display_name":"Payment","score":0.6377000212669373},{"id":"https://openalex.org/C2780747020","display_name":"Credit card fraud","score":0.6200000047683716},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6111999750137329},{"id":"https://openalex.org/C136536468","display_name":"Undersampling","score":0.6061999797821045},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5813000202178955},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5598000288009644},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.46000000834465027},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45829999446868896}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7128803382","title":"Next-Generation Clinical Documentation: Ambient AI and Automated Workflows with DAX Copilot","url":"https://doi.org/10.1109/icuis67429.2025.11380516","published":"2025-11-26","authors":["Tharun Kumar Nallamothu"],"abstract":"Business Intelligence (BI) and healthcare systems are increasingly reliant on complicated 
data processing, yet they face significant challenges. Generating accurate Data Analysis Expressions (DAX) queries within Power BI is a technical skill that involves an appropriate level of difficulty and creates barriers to non-technical users. DAX Copilot is an Artificial Intelligence (AI)-powered assistant that improves DAX query generation and automates healthcare documentation. DAX Copilot is an Assistant application using communicative and generative AI models (e.g., GPT-4), designed to read Natural Language Queries (NLQ), build optimized DAX statements, engage in interactive improvement, and ensure syntax validity. DAX Copilot integrates seamlessly with Power BI Desktop, Power BI Service, and Microsoft Fabric for users of any skill level to perform advanced analytics easily. AX Copilot, throu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icuis67429.2025.11380516","openalex_id":"https://openalex.org/W7128803382","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7613000273704529},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7208999991416931},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.541700005531311},{"id":"https://openalex.org/C56666940","display_name":"Documentation","score":0.5246999859809875},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.4961000084877014},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.42320001125335693},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.41749998927116394},{"id":"https://openalex.org/C115903868","display_name":"Software 
engineering","score":0.4120999872684479}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-accuracy-realistic-and-diagnostic-evaluation-of-code-generation-models","title":"Beyond Accuracy: Realistic and Diagnostic Evaluation of Code Generation Models","url":"https://www.microsoft.com/en-us/research/publication/beyond-accuracy-realistic-and-diagnostic-evaluation-of-code-generation-models/","published":"2025-11-25","authors":["Pareesa Ameneh Golnari","Xiaoyu Liu (lixiaoyu)","Gabriel Ryan (ryangabriel)","Shengyu Fu (shengyfu)"],"abstract":"DevBench is a telemetry-driven benchmark designed to evaluate Large Language Models (LLMs) on realistic code completion tasks. It includes 1,800 evaluation instances across six programming languages and six task categories derived from real developer telemetry, such as API usage and code purpose understanding. Unlike prior benchmarks, it emphasizes ecological validity, avoids training data contamination, and enables detailed diagnostics. The evaluation combines functional correctness, similarity-based metrics, and LLM-judge assessments focused on usefulness and contextual relevance. 11 state-of-the-art models were assessed, revealing differences in syntactic precision, semantic reasoning, and practical utility. 
Our benchmark provides actionable insights to guide model selection and improvement—detail that is often missing from other benchmarks but is essential for both practical deployme...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Generative AI","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-membership-limitations-of-add-remove-adjacency-in-differential-privacy","title":"Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy","url":"https://www.microsoft.com/en-us/research/publication/beyond-membership-limitations-of-add-remove-adjacency-in-differential-privacy/","published":"2025-11-25","authors":["Gauri Pradhan","Joonas Jälkö","Santiago Zanella-Béguelin","Antti Honkela"],"abstract":"Training machine learning models with differential privacy (DP) limits an adversary's ability to infer sensitive information about the training data. It can be interpreted as a bound on adversary's capability to distinguish two adjacent datasets according to chosen adjacency relation. In practice, most DP implementations use the add/remove adjacency relation, where two datasets are adjacent if one can be obtained from the other by adding or removing a single record, thereby protecting membership. In many ML applications, however, the goal is to protect attributes of individual records (e.g., labels used in supervised fine-tuning). 
We show that privacy accounting under add/remove overstates attribute privacy compared to accounting under the substitute adjacency relation, which permits substituting one record. To demonstrate this gap, we develop novel attacks to audit DP under substitute a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2511.20549","title":"Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning","url":"https://huggingface.co/papers/2511.20549","published":"2025-11-25","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","tencent","efficient","distillation"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:stepfun-ai:2511.20635","title":"iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image 
Generation","url":"https://huggingface.co/papers/2511.20635","published":"2025-11-25","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"hf-org-paper:Qwen:2511.20347","title":"Soft Adaptive Policy Optimization","url":"https://huggingface.co/papers/2511.20347","published":"2025-11-25","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:huawei-noah:2511.20626","title":"ROOT: Robust Orthogonalized Optimizer for Neural Network Training","url":"https://huggingface.co/papers/2511.20626","published":"2025-11-25","authors":["Huawei/Noah"],"abstract":"The optimization of large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to algorithmic imprecision and training instability. 
Recent advances in optimizers have improved convergence efficiency through momentum orthogonalization, but suffer from two key robustness limitations: dimensional fragility in orthogonalization precision and vulnerability to outlier-induced noise. To address these robustness challenges, we introduce ROOT, a Robust Orthogonalized Optimizer that enhances training stability through dual robustness mechanisms. First, we develop a dimension-robust orthogonalization scheme using adaptive Newton iterations with fine-grained coefficients tailored to specific matrix sizes, ensuring consistent precision across diverse architectural configurations. Second, we introduce an optimization-robust framework via proximal o...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","huawei-noah"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"openalex:W7130589880","title":"Optimizing Federated Learning in the Era of LLMs: Message Quantization and Streaming","url":"https://doi.org/10.1109/fllm67465.2025.11390924","published":"2025-11-25","authors":["Ziyue Xu","Zhihong Zhang","H Roth","Chester Chen","Yan Cheng","Andrew Feng"],"abstract":"Federated Learning (FL) offers a promising solution for training machine learning models across distributed data sources while preserving data privacy. However, FL faces critical challenges related to communication overhead and local resource constraints, especially in the era of Large Language Models (LLMs) with billions of parameters. 
The sheer size of these models exacerbates both memory and communication constraints, making efficient transmission and processing essential for practical deployment. NVIDIA FLARE, an open-source SDK for federated learning, addresses these challenges by introducing advanced communication capabilities. Building upon existing solutions for large object streaming, we enhance FL workflows for LLMs through two key techniques: message quantization and container/file streaming. Quantization reduces message size, while streaming enables efficient memory managemen...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/fllm67465.2025.11390924","openalex_id":"https://openalex.org/W7130589880","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","efficient","quantization"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8623999953269958},{"id":"https://openalex.org/C2992525071","display_name":"Federated learning","score":0.8379999995231628},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6748999953269958},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6545000076293945},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5692999958992004},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.5160999894142151},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.5077000260353088},{"id":"https://openalex.org/C557945733","display_name":"Data transmission","score":0.38280001282691956}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7130605841","title":"Leveraging Large Language Models for Hybrid Workplace Recommendation","url":"https://doi.org/10.1109/fllm67465.2025.11390885","published":"2025-11-25","authors":["Yujin Kim","Chin-Chia Hsu"],"abstract":"In hybrid work environments, Large Language Models (LLMs) can assist employees in planning where to work by offering personalized workspace recommendations and explanations. This paper presents a workspace recommendation system that leverages LLM’s reasoning skills to support decision-making in hybrid settings. Through a user study, we evaluated how LLM-generated suggestions influence workers’ choices and examined the effectiveness of the system. We find that LLMs can reason beyond prompt constraints, balance competing workspace needs, and influence user decisions. Participants in our study found the system convenient and helpful, even without explanations. Our findings highlight the potential of LLM-driven tools to enhance workspace planning in a hybrid workplace.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/fllm67465.2025.11390885","openalex_id":"https://openalex.org/W7130605841","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","personalized"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C58581272","display_name":"Workspace","score":0.9563000202178955},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6779999732971191},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.593999981880188},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5479000210762024},{"id":"https://openalex.org/C168031717","display_name":"Balance 
(ability)","score":0.3937999904155731},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.37630000710487366},{"id":"https://openalex.org/C195094911","display_name":"Process management","score":0.3310000002384186},{"id":"https://openalex.org/C20162079","display_name":"Case-based reasoning","score":0.31369999051094055}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416655211","title":"Computing Transformation: From Large Language Model to Agentic AI","url":"https://doi.org/10.1145/3765515.3771732","published":"2025-11-25","authors":["Kun Tan"],"abstract":"This talk examines the computing transformation during the era of modern AI revolution. Driven by high computing and bandwidth demand, scale-out datacenter architecture has shifted to scale-up super AI computers. And the evolution of AI models, from dense transformers, to sparse mixture-of-experts, to future agentic AI systems, continuously bring new types of workloads and cast new requirements to computing. 
Taken together, these render the next-generation accelerated and parallel techniques as well as distributed software designs.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3765515.3771732","openalex_id":"https://openalex.org/W4416655211","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7587000131607056},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49300000071525574},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.44440001249313354},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.3882000148296356},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3206999897956848},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.320499986410141},{"id":"https://openalex.org/C204241405","display_name":"Transformation (genetics)","score":0.31679999828338623},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.30809998512268066}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/effects-of-llm-use-and-note-taking-on-reading-comprehension-and-memory-a-randomised-experiment-in-secondary-schools","title":"Effects of LLM Use and Note-Taking On Reading Comprehension and Memory: A Randomised Experiment in Secondary 
Schools","url":"https://www.microsoft.com/en-us/research/publication/effects-of-llm-use-and-note-taking-on-reading-comprehension-and-memory-a-randomised-experiment-in-secondary-schools/","published":"2025-11-24","authors":["Pia Kreijkes","Viktor Kewenig","Martina Kuvalja","Mina Lee","Jake Hofman","Sylvia Vitello","Abigail Sellen","Sean Rintel","Daniel G. Goldstein","David Rothschild","Lev Tankelevitch","Tim Oates"],"abstract":"Students’ rapid uptake of Generative Artificial Intelligence tools, particularly large language models (LLMs), raises urgent questions about their effects on learning. We compared the impact of LLM use to that of traditional note-taking, or a combination of both, on secondary school students’ reading comprehension and retention. We conducted a pre-registered, randomised controlled experiment with within- and between-participant design elements in schools in England. 405 students, aged 14-15 years, studied two text passages and completed comprehension and retention tests three days later. Quantitative results demonstrated that both note-taking alone and combined with LLM use had significant positive effects on retention and comprehension compared to using the LLM alone. Yet, most students preferred using the LLM over note-taking, and perceived it as more helpful. 
Qualitative results revea...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Social sciences","1970-01-01","LLM","memory","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2511.19575","title":"HunyuanOCR Technical Report","url":"https://huggingface.co/papers/2511.19575","published":"2025-11-24","authors":["Tencent/Hunyuan"],"abstract":"This paper presents HunyuanOCR, a commercial-grade, open-source, and lightweight (1B parameters) Vision-Language Model (VLM) dedicated to OCR tasks. The architecture comprises a Native Vision Transformer (ViT) and a lightweight LLM connected via an MLP adapter. HunyuanOCR demonstrates superior performance, outperforming commercial APIs, traditional pipelines, and larger models (e.g., Qwen3-VL-4B). Specifically, it surpasses current public solutions in perception tasks (Text Spotting, Parsing) and excels in semantic tasks (IE, Text Image Translation), securing first place in the ICDAR 2025 DIMT Challenge (Small Model Track). Furthermore, it achieves state-of-the-art (SOTA) results on OCRBench among VLMs with fewer than 3B parameters. 
HunyuanOCR achieves breakthroughs in three key aspects: 1) Unifying Versatility and Efficiency: We implement comprehensive support for core capabilities incl...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","LLM","language model"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/fara-7b-an-efficient-agentic-model-for-computer-use","title":"Fara-7B: An Efficient Agentic Model for Computer Use","url":"https://www.microsoft.com/en-us/research/publication/fara-7b-an-efficient-agentic-model-for-computer-use/","published":"2025-11-24","authors":["Ahmed Awadallah","Yash Lara","Raghav Magazine","Hussein Mozannar","Akshay Nambi","Yash Pandya","Aravind Rajeswaran","Corby Rosset","Alexey Taymanov","Vibhav Vineet","Spencer Whitehead","Andrew Zhao"],"abstract":"Progress in computer use agents (CUAs) has been constrained by the absence of large and high-quality datasets that capture how humans interact with a computer. While LLMs have thrived on abundant textual data, no comparable corpus exists for CUA trajectories. To address these gaps, we introduce FaraGen, a novel synthetic data generation system for multi-step web tasks. FaraGen can propose diverse tasks from frequently used websites, generate multiple solution attempts, and filter successful trajectories using multiple verifiers. It achieves high throughput, yield, and diversity for multi-step web tasks, producing verified trajectories at approximately $1 each. 
We use this data to train Fara-7B, a native CUA model that perceives the computer using only screenshots, executes actions via predicted coordinates, and is small enough to run on-device. We find that Fara-7B outperforms other CUA models of...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Tech Report","Artificial intelligence","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:3de5b0b765a554ce","title":"HunyuanVideo 1.5 Technical Report","url":"https://huggingface.co/papers/2511.18870","published":"2025-11-24","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W7106483534","title":"Embedding vs Image-Based AI: A Comparative Fairness Study in Chest X-ray Analysis","url":"https://doi.org/10.1609/aaaiss.v7i1.36920","published":"2025-11-23","authors":["Gebreyowhans H. Bahre","Hassan Hamidi","Andrew B. 
Sellergren","Leo Anthony Celi","Francesco Calimeri","Laleh Seyyed-Kalantari"],"abstract":"AI has shown remarkable potential in healthcare, but faces accessibility challenges due to high computational and expertise demands, especially in medical image analysis. Vector embeddings, compact representations of medical images achieved from foundation models in zero-shot inference, offer a potential solution. Recently, an equivalent vector embeddings dataset of existing large publicly available medical images has been released, for which training an AI model requires significantly lower computing infrastructure and storage needs. Such data sets provide greater accessibility to AI in medical imaging for those who do not have access to large computing resources. The burning question remains: What is the gain or loss in using vector embedding to replace medical images, particularly from a fairness and utility point of view? In this work, we compare AI models trained in vector embedding...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaaiss.v7i1.36920","openalex_id":"https://openalex.org/W7106483534","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Computational Physics (United States)","Google (United States)","University of Calabria","Vector Institute"],"concepts":[{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7239999771118164},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6693000197410583},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.5422000288963318},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5273000001907349},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4950999915599823},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.48570001125335693},{"id":"https://openalex.org/C534262118","display_name":"Medical diagnosis","score":0.4609000086784363},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.42890000343322754}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106238051","title":"EvoP: Robust LLM Inference via Evolutionary Pruning","url":"https://doi.org/10.1007/978-981-95-3343-5_36","published":"2025-11-22","authors":["Shangyu Wu","Hongchao Du","Ying Xiong","Shuai Chen","Tei-Wei Kuo","Nan Guan","Chun Jason Xue"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-3343-5_36","openalex_id":"https://openalex.org/W7106238051","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","City University of Hong Kong","Mohamed bin Zayed University of Artificial Intelligence","National Taiwan University"],"concepts":[{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.9172000288963318},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8626000285148621},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.6503000259399414},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6326000094413757},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6220999956130981},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6111000180244446},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5730000138282776},{"id":"https://openalex.org/C159149176","display_name":"Evolutionary algorithm","score":0.4846999943256378}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416532613","title":"RAG-Targeted SFT Improves RAG-Enhanced Math Reasoning","url":"https://doi.org/10.1007/978-981-95-3346-6_24","published":"2025-11-22","authors":["Haiye Lin","Ruobing Xie","Hao Zhang","Wenjie Liang","Jin Xu","Ding Zhang","Jiale Wang","Haitao Zheng","Yanfeng Chen","Saiyong Yang","Xingwu Sun","Zhanhui Kang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-3346-6_24","openalex_id":"https://openalex.org/W4416532613","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.815500020980835},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7544999718666077},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6146000027656555},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5138000249862671},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.5067999958992004},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.5009999871253967},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.3776000142097473},{"id":"https://openalex.org/C86827895","display_name":"Opportunistic 
reasoning","score":0.37059998512268066}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416532805","title":"Latent Feature Activation Steering for Enhancing Semantic Consistency in Large Language Models","url":"https://doi.org/10.1007/978-981-95-3346-6_13","published":"2025-11-22","authors":["Jingyuan Yang","Rongjun Li","Weixuan Wang","Ziyu Zhou","Zhiyong Feng","Wei Peng"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-3346-6_13","openalex_id":"https://openalex.org/W4416532805","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","RMIT University","Tianjin University","University of Edinburgh"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8622000217437744},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5796999931335449},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5508999824523926},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5443000197410583},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.5394999980926514},{"id":"https://openalex.org/C66746571","display_name":"ENCODE","score":0.5037999749183655},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5006999969482422},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.4975999891757965}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4416532562","title":"Gradient Co-occurrence Analysis for Detecting Unsafe Prompts in Large Language Models","url":"https://doi.org/10.1007/978-981-95-3346-6_14","published":"2025-11-22","authors":["Jingyuan Yang","Bowen Yan","Rongling Li","Ziyu Zhou","Xin Chen","Zhiyong Feng","Wei Peng"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-3346-6_14","openalex_id":"https://openalex.org/W4416532562","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Academy of Artificial Intelligence","Beijing University of Posts and Telecommunications","Huawei Technologies (China)","RMIT University","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8984000086784363},{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.8657000064849854},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6671000123023987},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.5249999761581421},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.5184000134468079},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.48739999532699585},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4738999903202057},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4350999891757965}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-competence-shrinking-reasoning-cognitive-signatures-in-language-model-learning","title":"Scaling Competence, Shrinking Reasoning: Cognitive Signatures in Language Model Learning","url":"https://www.microsoft.com/en-us/research/publication/scaling-competence-shrinking-reasoning-cognitive-signatures-in-language-model-learning/","published":"2025-11-21","authors":["Mukul Singh","Ananya Singha","Arjun Radhakrishna","Sumit Gulwani"],"abstract":"We analyze reasoning in language models during task-specific fine-tuning and draws parallel between reasoning tokens--intermediate steps generated while solving problem and the human working memory. Drawing from cognitive science, we align training dynamics with the Four Stages of Competence: models initially produce incorrect outputs without reasoning, then begin reasoning (but still fail), eventually reason effectively, and finally solve tasks without explicit reasoning. We find that reasoning token length expands as performance improves, peaks at the stage of conscious competence, then declines as the model internalizes the task. Notably, after training, models retain performance even when reasoning is removed--suggesting it scaffolded learning but is no longer needed. 
This progression offers actionable insights: reasoning token dynamics can serve as a signal for diagnosing training s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/training-emergent-joint-associations-a-reinforcement-learning-approach-to-creative-thinking-in-language-models","title":"Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models","url":"https://www.microsoft.com/en-us/research/publication/training-emergent-joint-associations-a-reinforcement-learning-approach-to-creative-thinking-in-language-models/","published":"2025-11-21","authors":["Mukul Singh","Ananya Singha","Aishni Parab","Pronita Mehrotra","Sumit Gulwani"],"abstract":"Associative thinking--the ability to connect seemingly unrelated ideas--is a foundational element of human creativity and problem-solving. This paper explores whether reinforcement learning (RL) guided by associative thinking principles can enhance a model's performance across diverse generative tasks, including story writing, code generation, and chart creation. We introduce a reinforcement learning framework that uses a prompt-based evaluation mechanism, incorporating established divergent thinking metrics from creativity research. 
A base language model is fine-tuned using this framework to reward outputs demonstrating higher novelty through higher degrees of conceptual connectivity. Interestingly, the experimental results suggest that RL-based associative thinking-trained models not only generate more original and coherent stories but also exhibit improved abstraction and flexibility....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/closing-the-performance-gap-between-ai-and-radiologists-in-chest-x-ray-reporting","title":"Closing the Performance Gap Between AI and Radiologists in Chest X-Ray Reporting","url":"https://www.microsoft.com/en-us/research/publication/closing-the-performance-gap-between-ai-and-radiologists-in-chest-x-ray-reporting/","published":"2025-11-21","authors":["Harshita Sharma","Maxwell C. Reynolds","Valentina Salvatelli","Anne-Marie G. Sykes","Kelly K. Horst","Anton Schwaighofer","Maximilian Ilse","Olesya Melnichenko","Sam Bond-Taylor","Fernando Pérez-García","Vamshi K. Mugu","Alex Chan"],"abstract":"AI-assisted report generation offers the opportunity to reduce radiologists' workload stemming from expanded screening guidelines, complex cases and workforce shortages, while maintaining diagnostic accuracy. 
In addition to describing pathological findings in chest X-ray reports, interpreting lines and tubes (L&T) is demanding and repetitive for radiologists, especially with high patient volumes. We introduce MAIRA-X, a clinically evaluated multimodal AI model for longitudinal chest X-ray (CXR) report generation, that encompasses both clinical findings and L&T reporting. Developed using a large-scale, multi-site, longitudinal dataset of 3.1 million studies (comprising 6 million images from 806k patients) from Mayo Clinic, MAIRA-X was evaluated on three holdout datasets and the public MIMIC-CXR dataset, where it significantly improved AI-generated reports over the state of the art on lexic...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Medical Imaging"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7106230846","title":"SageCopilot: An LLM-Empowered Autonomous Agent for Data Science as a Service","url":"https://doi.org/10.1109/tsc.2025.3635384","published":"2025-11-21","authors":["Yuan Liao","Jiang Bian","Yuhui Yun","Shuo Wang","Yubo Zhang","Jiaming Chu","Tao Wang","Yuchen Li","Xuhong Li","Shilei Ji","Haoyi Xiong"],"abstract":"While the field of natural language to SQL(NL2SQL) has made significant advancements in translating natural language instructions into executable SQL scripts for data querying and processing, achieving full automation within the broader data science pipeline–encompassing data querying, analysis, 
visualization, and reporting–remains a complex challenge. This study introduces SageCopilot, an advanced, industry-grade system that automates the data science pipeline by integrating Large Language Models (LLMs), Autonomous Agents (AutoAgents), and Language User Interfaces (LUIs). Designed with a two-phase architecture, SageCopilot uses an offline phase to generate high-quality demonstrations supporting In-Context Learning (ICL), which powers the online phase to transform user inputs into executable scripts for database queries, analysis, and visualization tasks. Leveraging specialized component...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tsc.2025.3635384","openalex_id":"https://openalex.org/W7106230846","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Baidu (China)","Beihang University","Beijing University of Posts and Telecommunications"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8950999975204468},{"id":"https://openalex.org/C160145156","display_name":"Executable","score":0.7559999823570251},{"id":"https://openalex.org/C61423126","display_name":"Scripting language","score":0.7092000246047974},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.583299994468689},{"id":"https://openalex.org/C510870499","display_name":"SQL","score":0.5309000015258789},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.5252000093460083},{"id":"https://openalex.org/C56288433","display_name":"Data manipulation language","score":0.4880000054836273},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4575999975204468}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7116834104","title":"Orchestrating Human-AI Teams: The Manager Agent as a Unifying Research Challenge","url":"https://doi.org/10.1145/3772429.3772439","published":"2025-11-21","authors":["Charlie Masters","Advaith Vellanki","Jianhua Shangguan","Bart Kultys","Jonathan Gilmore","Alastair Moore","Stefano V. Albrecht"],"abstract":"While agentic AI has advanced in automating individual tasks, managing complex multi-agent workflows remains a challenging problem. This paper presents a research vision for autonomous agentic systems that orchestrate collaboration within dynamic human-AI teams. We propose the Autonomous Manager Agent as a core challenge: an agent that decomposes complex goals into task graphs, allocates tasks to human and AI workers, monitors progress, adapts to changing conditions, and maintains transparent stakeholder communication. We formalize workflow management as a Partially Observable Stochastic Game and identify four foundational challenges: (1) compositional reasoning for hierarchical decomposition, (2) multi-objective optimization under shifting preferences, (3) coordination and planning in ad hoc teams, and (4) governance and compliance by design. 
To advance this agenda, we release MA-Gym, a...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772429.3772439","openalex_id":"https://openalex.org/W7116834104","cited_by_count":0,"quality_score":45,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Google DeepMind (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.8418999910354614},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6891999840736389},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5813999772071838},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.5235999822616577},{"id":"https://openalex.org/C140824633","display_name":"Workflow management system","score":0.5019000172615051},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.4180999994277954},{"id":"https://openalex.org/C195094911","display_name":"Process management","score":0.4056999981403351},{"id":"https://openalex.org/C2780021488","display_name":"Task management","score":0.4023999869823456}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416466501","title":"A Question Answering Dataset for Temporal-Sensitive Retrieval-Augmented Generation","url":"https://doi.org/10.1038/s41597-025-06098-y","published":"2025-11-21","authors":["Ziyang Chen","Erxue Min","Xiang Zhao","Yunxin Li","Xin Jia","Jinzhi Liao","Jichao Li","Shuaiqiang Wang","Baotian Hu","Dawei Yin"],"abstract":"We introduce ChronoQA, a benchmark dataset for Chinese question answering focused on evaluating temporal reasoning in Retrieval-Augmented Generation (RAG) systems. 
Built from over 300,000 news articles published between 2019 and 2024, ChronoQA contains 5,176 questions covering absolute, aggregate, and relative temporal types, with both explicit and implicit time expressions. The dataset features both single- and multi-document scenarios, reflecting real-world requirements for temporal alignment and logical consistency. By providing structured evaluation across a wide range of temporal tasks, ChronoQA offers a dynamic, reliable, and scalable resource for benchmarking RAG systems in evolving knowledge environments.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41597-025-06098-y","openalex_id":"https://openalex.org/W4416466501","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","news"],"author_affiliations":["Baidu (China)","Harbin Institute of Technology","National University of Defense Technology"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.8149999976158142},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7940999865531921},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.7322999835014343},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6409000158309937},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5655999779701233},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5414000153541565},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.4699999988079071},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.44279998540878296}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":0}},{"id":"openalex:W4416464008","title":"SDSP: Scalable and Diverse Synthetic Pairwise Text Generation from Web Corpus Using Large Language Model","url":"https://doi.org/10.1007/978-981-95-4367-0_1","published":"2025-11-21","authors":["Xiaoxu Wu","Xi Li","Wentao Wu","Aleksei Timofeev","Yinfei Yang","Meng Cao","Ping Huang","Si Li","Jiulong Shan"],"abstract":"","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-4367-0_1","openalex_id":"https://openalex.org/W4416464008","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Apple (Israel)","Apple (United States)"],"concepts":[{"id":"https://openalex.org/C184898388","display_name":"Pairwise comparison","score":0.9096999764442444},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8921999931335449},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6920999884605408},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.6225000023841858},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5986999869346619},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5504000186920166},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5493000149726868},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.5453000068664551}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416513233","title":"PiCCL: Data-Driven Composition of Bespoke Pictorial 
Charts","url":"https://doi.org/10.1109/tvcg.2025.3634264","published":"2025-11-21","authors":["Haoyan Shi","Yunhai Wang","Junhao Chen","Chenglong Wang","Bongshin Lee"],"abstract":"We present PiCCL (Pictorial Chart Composition Language), a new language that enables users to easily create pictorial charts using a set of simple operators. To support systematic construction while addressing the main challenge of expressive pictorial chart authoring-manual composition and fine-tuning of visual properties-PiCCL introduces a parametric representation that integrates data-driven chart generation with graphical composition. It also employs a lazy data-binding mechanism that automatically synthesizes charts. PiCCL is grounded in a comprehensive analysis of real-world pictorial chart examples. We describe PiCCL's design and its implementation as piccl.js, a JavaScript-based library. To evaluate PiCCL, we showcase a gallery that demonstrates its expressiveness and report findings from a user study assessing the usability of piccl.js. 
We conclude with a discussion of PiCCL's l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2025.3634264","openalex_id":"https://openalex.org/W4416513233","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Renmin University of China","Shandong University","Yonsei University"],"concepts":[{"id":"https://openalex.org/C44210515","display_name":"Bespoke","score":0.8589000105857849},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8245000243186951},{"id":"https://openalex.org/C170130773","display_name":"Usability","score":0.6859999895095825},{"id":"https://openalex.org/C190812933","display_name":"Chart","score":0.6741999983787537},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5203999876976013},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.503600001335144},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4745999872684479},{"id":"https://openalex.org/C205208641","display_name":"Pie chart","score":0.4603999853134155}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416466342","title":"Introduction to the Special Issue on Large Language Models for Recommender Systems","url":"https://doi.org/10.1145/3721299","published":"2025-11-21","authors":["Yongfeng Zhang","Lei Li","Luyang Kong"],"abstract":"Recommender systems have become pivotal in today’s digital landscape, shaping user experiences across diverse online platforms. 
Recent advances in Large Language Models (LLMs) such as T5, GPT, LLaMA, and their variants have introduced transformative possibilities for recommender systems. LLMs excel in processing and generating natural language text, offering a unique opportunity to reshape the design and elevate the effectiveness of recommendation algorithms. The main topic of this special issue is to explore the integration of Large Language Models and Recommender Systems, encompassing various facets, including model architectures, recommendation algorithms, evaluation methods, and real-world applications. It provides a dedicated platform for researchers and practitioners to share their insights, innovations, and empirical findings in the realm of LLMs for recommender systems, which hel...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3721299","openalex_id":"https://openalex.org/W4416466342","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Hong Kong Baptist University","Rutgers Sexual and Reproductive Health and Rights","Rutgers, The State University of New Jersey"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8194000124931335},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7742999792098999},{"id":"https://openalex.org/C2778757428","display_name":"Realm","score":0.5393999814987183},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.5044999718666077},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4375999867916107},{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.4345000088214874},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.4163999855518341},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.36239999532699585}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:Tencent-Hunyuan:2511.16317","title":"NaTex: Seamless Texture Generation as Latent Color Diffusion","url":"https://huggingface.co/papers/2511.16317","published":"2025-11-20","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"official:32cc109f7e72f7b8","title":"Gemini 3 Pro Image Model Card","url":"https://deepmind.google/models/model-cards/gemini-3-pro-image/","published":"2025-11-20","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 3 Pro Image"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"apple:mlupuqttz4bc5o94kl8skka4","title":"Using LLMs for Late Multimodal 
Sensor Fusion for Activity Recognition","url":"https://machinelearning.apple.com/research/multimodal-sensor-fusion","published":"2025-11-20","authors":["Ilker Demirel","Karan Ketankumar Thakkar","Benjamin Elizalde","Miquel Espi Marques","Shirley Ren","Jaya Narain"],"abstract":"This paper was accepted at the Learning from Time Series for Health workshop at NeurIPS 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:e0b8l9l58spx5skkall7l6g9","title":"Speech Foundation Models Generalize to Time Series Tasks from Wearable Sensor Data","url":"https://machinelearning.apple.com/research/speech-foundation","published":"2025-11-20","authors":["Jaya Narain","Zakaria Aldeneh","Shirley Ren"],"abstract":"This paper was accepted at the Learning from Time Series for Health workshop at NeurIPS 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:lwm1hm142z8ceyhhva2na4gv","title":"Learning the Relative Composition of EEG Signals Using Pairwise Relative Shift 
Pretraining","url":"https://machinelearning.apple.com/research/relative-composition-eeg","published":"2025-11-20","authors":["Christopher Sandino","Sayeri Lala","Geeling Chau","Melika Ayoughi","Behrooz Mahasseni","Ellen Zippi","Ali Moin","Erdrin Azemi","Hanlin Goh"],"abstract":"This paper was accepted at the Foundation Models for the Brain and Body workshop at NeurIPS 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4416410015","title":"NeuroDiff3D: a 3D generation method optimizing viewpoint consistency through diffusion modeling","url":"https://doi.org/10.1038/s41598-025-24916-6","published":"2025-11-20","authors":["Kai Lu","Qiao Sui","Xi Chen","Zihao Wang"],"abstract":"Converting 2D images into accurate 3D models is one of the core tasks in computer vision and graphics. However, existing methods still face issues in multi-view generation tasks, such as poor geometric consistency, insufficient detail recovery, and inaccurate texture mapping. This is particularly evident in complex objects or multi-view environments, where the generated 3D models often fail to maintain consistency. To address these challenges, this paper proposes the NeuroDiff3D model, which combines 3D diffusion modeling with multimodal information fusion techniques. NeuroDiff3D integrates structural, texture, and semantic information and is divided into two main components: the 3D Prior Pipeline and the Model Training Pipeline. 
In the 3D Prior Pipeline, a rough 3D object representation is generated using the 3D diffusion model, gradually recovering the object's geometric shape, texture...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-025-24916-6","openalex_id":"https://openalex.org/W4416410015","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Ningbo University of Technology","Quzhou University","Wuxi Taihu Hospital"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8141999840736389},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6233999729156494},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.598800003528595},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5486999750137329},{"id":"https://openalex.org/C2777897806","display_name":"3D modeling","score":0.5331000089645386},{"id":"https://openalex.org/C3019007443","display_name":"3d model","score":0.5238000154495239},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4729999899864197},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4726000130176544}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7130396739","title":"Demonstration of MechStyle: Augmenting Generative AI with Mechanical Simulation to Create Stylized and Structurally Viable 3D Models","url":"https://doi.org/10.1145/3774746.3779236","published":"2025-11-20","authors":["Faraz Faruqi","Amira Abdel-Rahman","Leandra Tejedor","Martin Nisser","Jiaji Li","Vrushank 
Phadnis","Varun Jampani","Neil Gershenfeld","Megan Hofmann","Stefanie Mueller"],"abstract":"Recent developments in Generative AI enable creators to stylize 3D models based on text and image prompts. These methods change the 3D model geometry, which can compromise the model’s structural integrity once fabricated. We present MechStyle, a system that enables creators to stylize 3D printable models while preserving their structural integrity. MechStyle accomplishes this by augmenting the Generative AI-based stylization process with feedback from a Finite Element Analysis (FEA) simulation. As the stylization process modifies the geometry to approximate the desired style, feedback from the FEA simulation reduces modifications to regions with increased stress. In this demonstration, attendees can interact with MechStyle’s prompt-based UI, observe simulation-informed stylization results, and explore a curated collection of 3D printed objects generated using MechStyle. These artifacts s...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3774746.3779236","openalex_id":"https://openalex.org/W7130396739","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Disability Practice Institute","Google (United States)","MIT-Harvard Center for Ultracold Atoms","Northeastern University","University of Washington"],"concepts":[{"id":"https://openalex.org/C38935604","display_name":"Stylized fact","score":0.8330000042915344},{"id":"https://openalex.org/C184408114","display_name":"Generative Design","score":0.8330000042915344},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7700999975204468},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7035999894142151},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.6686000227928162},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.616599977016449},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5307000279426575},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.4765999913215637}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-efficient-large-multimodal-model-serving","title":"ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving","url":"https://www.microsoft.com/en-us/research/publication/towards-efficient-large-multimodal-model-serving/","published":"2025-11-19","authors":["Haoran Qiu","Anish Biswas","Zihan Zhao","Jayashree Mohan","Alind Khare","Esha Choukse","Íñigo Goiri","Zeyu Zhang","Haiying Shen","Chetan Bansal","Ramachandran Ramjee","Rodrigo Fonseca"],"abstract":"Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. However, efficiently serving LMMs in production environments poses significant challenges due to their complex architectures and heterogeneous characteristics across their multi-stage inference pipelines. We present the first comprehensive systems analysis of two prominent LMM architectures, decoder-only and cross-attention, across six representative open-source models, revealing key systems design implications. We also present an in-depth analysis of production LMM inference traces, uncovering unique workload characteristics, including variable, heavy-tailed request distributions and bursty traffic patterns. Based on these insights, we propose ModServe, a modular LMM serving system that decouples stages for independent optimization and adaptive scaling. 
ModServe dynam...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","large language models","Machine learning","Multimodal Large Language Models","systems","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/train-short-infer-long-speech-llm-enables-zero-shot-streamable-joint-asr-and-diarization-on-long-audio","title":"Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio","url":"https://www.microsoft.com/en-us/research/publication/train-short-infer-long-speech-llm-enables-zero-shot-streamable-joint-asr-and-diarization-on-long-audio/","published":"2025-11-19","authors":["Mohan Shi","Xiong Xiao","Ruchao Fan","Shaoshi Ling","Jinyu Li"],"abstract":"Joint automatic speech recognition (ASR) and speaker diarization aim to answer the question \"who spoke what\" in multi-speaker scenarios. In this paper, we present an end-to-end speech large language model (Speech-LLM) for Joint strEamable DIarization and aSr (JEDIS-LLM). The model is trained only on short audio under 20s but is capable of streamable inference on long-form audio without additional training. This is achieved by introducing a Speaker Prompt Cache (SPC) with an on-the-fly update mechanism during chunk-wise streaming inference, inspired by the autoregressive nature of LLMs. 
The SPC also allows the seamless use of pre-enrolled speaker profiles which is common in many scenarios like meeting transcription. To further enhance diarization capability, we incorporate word-level speaker supervision into the speech encoder during training. Experimental results demonstrate that our syste...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp55912.2026.11464726","openalex_id":"https://openalex.org/W4416550218","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Audio and Acoustics","Engineering","Speech recognition","LLM","language model"],"author_affiliations":["Microsoft","Microsoft (United States)","University of California, Los Angeles"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:stepfun-ai:2511.15848","title":"Step-Audio-R1 Technical Report","url":"https://huggingface.co/papers/2511.15848","published":"2025-11-19","authors":["StepFun"],"abstract":"Recent advances in reasoning models have demonstrated remarkable success in text and vision domains through extended chain-of-thought deliberation. However, a perplexing phenomenon persists in audio language models: they consistently perform better with minimal or no reasoning, raising a fundamental question - can audio intelligence truly benefit from deliberate thinking? We introduce Step-Audio-R1, the first audio reasoning model that successfully unlocks reasoning capabilities in the audio domain. 
Through our proposed Modality-Grounded Reasoning Distillation (MGRD) framework, Step-Audio-R1 learns to generate audio-relevant reasoning chains that genuinely ground themselves in acoustic features rather than hallucinating disconnected deliberations. Our model exhibits strong audio reasoning capabilities, surpassing Gemini 2.5 Pro and achieving performance comparable to the state-of-the-art...","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","stepfun-ai","distillation"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"hf-org-paper:tencent:2511.15248","title":"EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control","url":"https://huggingface.co/papers/2511.15248","published":"2025-11-19","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","long-term"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2511.15705","title":"GeoVista: Web-Augmented Agentic Visual Reasoning for 
Geolocalization","url":"https://huggingface.co/papers/2511.15705","published":"2025-11-19","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"official:c833edaa5850c74b","title":"GPT-5.1-Codex-Max System Card","url":"https://openai.com/index/gpt-5-1-codex-max-system-card","published":"2025-11-19","authors":["OpenAI"],"abstract":"This system card outlines the comprehensive safety measures implemented for GPT‑5.1-CodexMax. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Publication","agent"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W7124033644","title":"DyOrc: Efficient Serving of Dynamic Machine Learning Workflows","url":"https://doi.org/10.1145/3772052.3772218","published":"2025-11-19","authors":["Shiwei Zhang","Lansong Diao","Zisheng Meng","Siyu Wang","Wei Lin","Chuan Wu"],"abstract":"The landscape of machine learning 
applications has shifted from monolithic end-to-end models to compositions of pretrained large foundation models. For instance, multi-modal chatbots are often built by composition of a large language model and modality-specific encoder models. Such applications often feature dynamic workflows, with models conditionally evoked according to different inputs and intermediate processing results. Conditional model execution prevents conventional request batching and hinders efficient hardware utilization, due to dynamic, diverging execution paths across requests. Separately deploying models as dedicated services and invoking them on the go during dynamic workflow executions can potentially allow service-wise request batching, boosting resource efficiency. However, generic workflow orchestrators are proven inefficient for machine learning applications, due to....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772052.3772218","openalex_id":"https://openalex.org/W7124033644","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","efficient"],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.8378999829292297},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8201000094413757},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.7922000288963318},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.6053000092506409},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5105999708175659},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.5001000165939331},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.4936000108718872},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47760000824928284}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416549410","title":"Cascading Adversarial Bias from Injection to Distillation in Language Models","url":"https://doi.org/10.1145/3719027.3765122","published":"2025-11-19","authors":["Harsh Chaudhari","Jamie Hayes","Matthew Jagielski","Ilia Shumailov","Milad Nasr","Alina Oprea"],"abstract":"Model distillation has become essential for creating deployable language models, but their widespread deployment raises concerns about about their resilience to adversarial manipulation. This paper investigates how adversaries can inject subtle biases into teacher models through minimal data poisoning during training, which propagates to a smaller distilled student model and becomes significantly amplified. We identify two propagation modes: Untargeted (affecting multiple tasks) and Targeted (focusing on specific task while maintaining normal behavior elsewhere). With only 25 poisoned samples (0.25% poisoning rate), student models generate biased responses 76.9% of the time in targeted scenarios versus 69.4% in teachers, while untargeted propagation shows 5.7X-29.2X higher adversarial bias rate in students on unseen tasks. 
We validate across six bias types (targeted advertisement, phishi...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3719027.3765122","openalex_id":"https://openalex.org/W4416549410","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","distillation"],"author_affiliations":["Google (United Kingdom)","Google (United States)","Google DeepMind (United Kingdom)","Northeastern University"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.8345000147819519},{"id":"https://openalex.org/C100279451","display_name":"Perplexity","score":0.698199987411499},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6761000156402588},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6437000036239624},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4616999924182892},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4341000020503998},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.42500001192092896},{"id":"https://openalex.org/C2779585090","display_name":"Resilience (materials science)","score":0.3993000090122223}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416385048","title":"The cost of thinking is similar between large reasoning models and humans","url":"https://doi.org/10.1073/pnas.2520077122","published":"2025-11-19","authors":["Andrea Gregor de Varda","Ferdinando Pio D'Elia","Hope Kean","Andrew K. Lampinen","Evelina Fedorenko"],"abstract":"Do neural network models capture the cognitive demands of human reasoning? 
Across seven reasoning tasks, we show that the length of the chain-of-thought generated by large reasoning models predicts human reaction times both within tasks-tracking item-level difficulty-and across tasks-capturing broader differences in cognitive demands. This model-to-human alignment shows that out-of-the-box reasoning models reflect core features underlying problem and task complexity in human cognition, without requiring any built-in symbolic mechanisms.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1073/pnas.2520077122","openalex_id":"https://openalex.org/W4416385048","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Google (United States)","IT University of Copenhagen","Institute of Cognitive and Brain Sciences","Massachusetts Institute of Technology","University College Copenhagen","University of Copenhagen"],"concepts":[{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.6363000273704529},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.574999988079071},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5625},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5260999798774719},{"id":"https://openalex.org/C86827895","display_name":"Opportunistic reasoning","score":0.4336000084877014},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.40849998593330383},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.39739999175071716},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.3968999981880188}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":7}},{"id":"openalex:W7123919272","title":"ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving","url":"https://doi.org/10.1145/3772052.3772254","published":"2025-11-19","authors":["Haoran Qiu","Anish Biswas","Z.W. Zhao","Jayashree Mohan","Alind Khare","Esha Choukse","Íñigo Goiri","Zeyu Zhang","Haiying Shen","Chetan Bansal","Ram Ramjee","Rodrigo Fonseca"],"abstract":"Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. However, efficiently serving LMMs in production environments poses significant challenges due to their complex model architectures and heterogeneous characteristics across their multi-stage inference pipelines and modalities.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772052.3772254","openalex_id":"https://openalex.org/W7123919272","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (India)","University of Virginia"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7372999787330627},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6446999907493591},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.531000018119812},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.44190001487731934},{"id":"https://openalex.org/C2778348673","display_name":"Production (economics)","score":0.3865000009536743},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3817000091075897},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.33180001378059387},{"id":"https://openalex.org/C2992770021","display_name":"Production model","score":0.3228999972343445}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7123622058","title":"Understanding Diffusion Model Serving in Production: A Top-Down Analysis of Workload, Scheduling, and Resource Efficiency","url":"https://doi.org/10.1145/3772052.3772206","published":"2025-11-19","authors":["Yanying Lin","Shuaipeng Wu","Shutian Luo","Hong Xu","Haiying Shen","Chong Ma","Min Shen","Le Chen","Chengzhong Xu","Lin Qu","Kejiang Ye"],"abstract":"This paper presents a comprehensive analysis of diffusion model serving challenges in production cloud environments. We examine the unique computational patterns and resource requirements that distinguish diffusion model serving from traditional ML workloads, revealing fundamental systemlevel challenges from their multi-stage pipeline architectures. 
Our analysis is based on a dataset collected from a commercial image generation service processing 3.5 million requests across 300+ GPUs of production operation.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3772052.3772206","openalex_id":"https://openalex.org/W7123622058","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong","Shenzhen Institutes of Advanced Technology","Southern University of Science and Technology","University of Macau","University of Virginia"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6557999849319458},{"id":"https://openalex.org/C2778348673","display_name":"Production (economics)","score":0.6383000016212463},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6057999730110168},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5952000021934509},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.5310999751091003},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5228000283241272},{"id":"https://openalex.org/C2777958785","display_name":"Resource efficiency","score":0.5085999965667725},{"id":"https://openalex.org/C2780378061","display_name":"Service (business)","score":0.4147000014781952}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-causal-perspective-on-measuring-explaining-and-mitigating-smells-in-llm-generated-code","title":"A Causal Perspective on Measuring, Explaining and Mitigating Smells in LLM-Generated 
Code","url":"https://www.microsoft.com/en-us/research/publication/a-causal-perspective-on-measuring-explaining-and-mitigating-smells-in-llm-generated-code/","published":"2025-11-18","authors":["Alejandro Velasco","Daniel Rodríguez-Cárdenas","Dipin Khati","David N. Palacio","Luftar Rahman Alif","Denys Poshyvanyk"],"abstract":"Recent advances in large language models (LLMs) have accelerated their adoption in software engineering contexts. However, concerns persist about the structural quality of the code they produce. In particular, LLMs often replicate poor coding practices, introducing code smells (i.e., patterns that hinder readability, maintainability, or design integrity). Although prior research has examined the detection or repair of smells, we still lack a clear understanding of how and when these issues emerge in generated code.This paper addresses this gap by systematically measuring , explaining and mitigating smell propensity in LLM-generated code. We build on the Propensity Smelly Score (PSC), a probabilistic metric that estimates the likelihood of generating particular smell types, and establish its robustness as a signal of structural quality. 
Using PSC as an instrument for causal analysis, we i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:e64b4922fbda62b6","title":"Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance","url":"https://ai.meta.com/research/publications/souper-model-how-simple-arithmetic-unlocks-state-of-the-art-llm-performance/","published":"2025-11-18","authors":["Shalini Maiti","Amar Budhiraja","Bhavul Gauri","Gaurav Chaurasia","Anton Protopopov","Alexis Audran-Reiss","Michael Slater","Despoina Magka","Tatiana Shavrina","Roberta Raileanu","Yoram Bachrach"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Core Machine Learning","LLM"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=3"}},{"id":"official:f666c6433fde6600","title":"Gemini 3 Pro Model 
Card","url":"https://deepmind.google/models/model-cards/gemini-3-pro/","published":"2025-11-18","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 3 Pro"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"arxiv:2511.14410","title":"TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation","url":"http://arxiv.org/abs/2511.14410","published":"2025-11-18","authors":["Liu, Wei","Li, Jiahong","Shao, Yiwen","Yu, Dong"],"abstract":"Speech-LLM models have demonstrated great performance in multi-modal and multi-task speech understanding. A typical speech-LLM paradigm is integrating speech modality with a large language model (LLM). While the Whisper encoder was frequently adopted in previous studies for speech input, it shows limitations regarding input format, model scale, and semantic performance. To this end, we propose a lightweight TTA model specialized in speech semantics for more effective LLM integration. With large-scale training of 358k hours of speech data on multilingual speech recognition (ASR), speech translation (ST) and speech-text alignment tasks, TTA is capable of producing robust cross-lingual speech representations. Extensive evaluations across diverse benchmarks, including ASR/ST, speech retrieval, and ASR-LLM performance assessments, demonstrate TTA's superiority over Whisper. 
Furthermore, we ri...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7106206878","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8253999948501587},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.7084000110626221},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5914999842643738},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5267999768257141},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5049999952316284},{"id":"https://openalex.org/C2780366754","display_name":"Speech translation","score":0.49779999256134033},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49160000681877136},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.45669999718666077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416334285","title":"ERNIE-RNA: an RNA language model with structure-enhanced representations","url":"https://doi.org/10.1038/s41467-025-64972-0","published":"2025-11-18","authors":["Weijie Yin","Zhaoyu Zhang","Shuo Zhang","Liang He","Ruiyang Zhang","Rui Jiang","Gan Liu","Jingyi Wang","Xuegong Zhang","Tao Qin","Zhen Xie"],"abstract":"Existing RNA language models (RLMs) largely overlook structural information in RNA sequences, leading to incomplete feature extraction and suboptimal performance on downstream tasks. 
In this study, we present ERNIE-RNA (Enhanced Representations with Base-Pairing Restriction for RNA Modeling), an RNA pre-trained language model based on a modified BERT (Bidirectional Encoder Representations from Transformers). Notably, ERNIE-RNA's attention maps exhibit superior ability to capture RNA structural features through zero-shot prediction, outperforming conventional methods like RNAfold and RNAstructure, suggesting that ERNIE-RNA naturally develops comprehensive representations of RNA architecture during pre-training. Moreover, after fine-tuning, ERNIE-RNA achieves state-of-the-art (SOTA) performance across various downstream tasks, including RNA structure and function predictions. In summary, E...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-025-64972-0","openalex_id":"https://openalex.org/W4416334285","cited_by_count":7,"quality_score":48,"matched_keywords":["language model"],"author_affiliations":["Beijing Haidian Hospital","Beijing Tongren Hospital","Capital Medical University","Microsoft (United States)","Microsoft Research Asia (China)","Qingdao Center of Resource Chemistry and New Materials","Syngenta (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7120000123977661},{"id":"https://openalex.org/C67705224","display_name":"RNA","score":0.6643999814987183},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6114000082015991},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5062000155448914},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5001000165939331},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.4925000071525574},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.46810001134872437},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.40720000863075256}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4416333173","title":"Understanding wetland park feature influence through cross-regional multimodal analysis and interpretable modeling","url":"https://doi.org/10.1038/s41598-025-24399-5","published":"2025-11-18","authors":["Xiaojuan Zheng","Yan Huang","Zhiming Xie","Aimin Zheng"],"abstract":"Wetland parks serve both ecological conservation and social service functions, and the mechanisms through which their spatial characteristics influence public perception have attracted wide attention. Given the ecological and socio-economic differences between China and the United States and the availability of social media data, this study focuses on 147 wetland parks in both countries. It collects image-text reviews and ratings, builds a labeling system covering ecological environment, infrastructure, and user experience, and develops a unified feature framework based on multimodal data. The study extracts text, image, and fused features for predicting sentiment values and scores, and applies the SHAP method to explain feature contributions and their interactions. On this basis, high-contribution features identified by SHAP analysis are selected as optimization variables. 
A multi-objec...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-025-24399-5","openalex_id":"https://openalex.org/W4416333173","cited_by_count":2,"quality_score":43,"matched_keywords":["media"],"author_affiliations":["Huaiyin Normal University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.7031000256538391},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6535000205039978},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.6247000098228455},{"id":"https://openalex.org/C67715294","display_name":"Wetland","score":0.570900022983551},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.5242000222206116},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.49799999594688416},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.476500004529953},{"id":"https://openalex.org/C2780378061","display_name":"Service (business)","score":0.40849998593330383}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4416417137","title":"Large multimodal models evaluation: a survey","url":"https://doi.org/10.1007/s11432-025-4676-4","published":"2025-11-18","authors":["Zicheng Zhang","Junying Wang","Farong Wen","Yijin Guo","Xiangyu Zhao","Xinyu Fang","Shengyuan Ding","Ziheng Jia","Jiahao Xiao","Ye Shen","Yushuo Zheng","Xiaorong 
Zhu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11432-025-4676-4","openalex_id":"https://openalex.org/W4416417137","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Beijing Academy of Artificial Intelligence","Berkeley College","Cardiff University","China University of Mining and Technology","Chinese University of Hong Kong","Delft University of Technology","East China Normal University","Fudan University","Harvard University","Huawei Technologies (China)","Monash University","Nantes Université","Nanyang Technological University","Peking University","Peng Cheng Laboratory","Shanghai Artificial Intelligence Laboratory","Shanghai Jiao Tong University","Tsinghua University","University Town of Shenzhen","University of British Columbia","University of California, Berkeley","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6061000227928162},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.49880000948905945},{"id":"https://openalex.org/C100521375","display_name":"Competence (human resources)","score":0.41929998993873596},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.41510000824928284},{"id":"https://openalex.org/C133462117","display_name":"Data collection","score":0.2921999990940094},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.26339998841285706},{"id":"https://openalex.org/C3018395757","display_name":"Evaluation methods","score":0.24230000376701355},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.23250000178813934}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":6}},{"id":"openalex:W7136141188","title":"MS-VLMDet: Multi-Scale Feature Enhanced Vision-Language Model for Pedestrian Detection","url":"https://doi.org/10.1109/itsc60802.2025.11423248","published":"2025-11-18","authors":["Zekai Dai","Xingyuan Dai","Yisheng L.V.","Xin Pei","Xu Wang","Xiaoyan Gong","Yuliang Liu","Wuling Huang"],"abstract":"Pedestrian detection, as a critical component of Intelligent Transportation Systems, encounters numerous challenges. Pedestrians are often situated against complex backgrounds, appear as small targets in images, and adopt diverse poses, all of which make detection challenging. This research proposes MS-VLMDet (Multi-Scale Feature-Enhanced Vision-Language Model for Pedestrian Detection), which fuses multi-scale features by integrating multi-level visual representations from a Feature Pyramid Network into large vision-language models to overcome the models' limitations in precisely localizing small objects. MS-VLMDet contains three key modules: a multi-scale feature extraction module that captures pedestrian features at different resolutions; a feature fusion module that integrates the extracted features with the original image and text prompts; and a feature enhanced vision-language model...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/itsc60802.2025.11423248","openalex_id":"https://openalex.org/W7136141188","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Beijing Academy of Artificial Intelligence","China XD Group (China)","Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Automation","Shanghai Artificial Intelligence Laboratory","Tsinghua University","University of Chinese Academy of 
Sciences"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6320000290870667},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5637999773025513},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5410000085830688},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4837000072002411},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4025999903678894},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.3125999867916107},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.30329999327659607},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.29420000314712524}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416332410","title":"Aligning brains into a shared space improves their alignment with large language models","url":"https://doi.org/10.1038/s43588-025-00900-y","published":"2025-11-18","authors":["Arnab Bhattacharjee","Zaid Zada","Haocheng Wang","Bobbi Aubrey","Werner Doyle","Patricia Dugan","Daniel Friedman","Orrin Devinsky","Adeen Flinker","Peter J. 
Ramadge","Uri Hasson","Ariel Goldstein"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s43588-025-00900-y","openalex_id":"https://openalex.org/W4416332410","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","Hebrew University of Jerusalem","New York University","Princeton University","University of Southern California"],"concepts":[{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.7660999894142151},{"id":"https://openalex.org/C2780117969","display_name":"Electrocorticography","score":0.7635999917984009},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6970999836921692},{"id":"https://openalex.org/C177291462","display_name":"Active listening","score":0.49799999594688416},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.46459999680519104},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.46209999918937683},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45910000801086426},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.45879998803138733}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7137314088","title":"Multimodal HD Mapping for Intersections by Intelligent Roadside Units","url":"https://doi.org/10.1109/itsc60802.2025.11423769","published":"2025-11-18","authors":["Zhongzhang Chen","Miao Fan","Sugang Xu","Mengmeng Yang","Kun Jiang","Xiaojing Liu","Haoyi 
Xiong"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/itsc60802.2025.11423769","openalex_id":"https://openalex.org/W7137314088","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Autodesk (United States)","Baidu (China)","Beijing Institute of Graphic Communication","Tsinghua University","United States Department of the Navy","Xidian University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5408999919891357},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4392000138759613},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4131999909877777},{"id":"https://openalex.org/C64543145","display_name":"Intersection (aeronautics)","score":0.3869999945163727},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.3578000068664551},{"id":"https://openalex.org/C47796450","display_name":"Intelligent transportation system","score":0.2842000126838684},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.2702000141143799},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.27000001072883606}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:Tencent-Hunyuan:2511.13647","title":"Part-X-MLLM: Part-aware 3D Multimodal Large Language Model","url":"https://huggingface.co/papers/2511.13647","published":"2025-11-17","authors":["Tencent/Hunyuan"],"abstract":"We introduce Part-X-MLLM, a native 3D multimodal large language model that unifies diverse 3D tasks by formulating them as programs in a structured, executable grammar. 
Given an RGB point cloud and a natural language prompt, our model autoregressively generates a single, coherent token sequence encoding part-level bounding boxes, semantic descriptions, and edit commands. This structured output serves as a versatile interface to drive downstream geometry-aware modules for part-based generation and editing. By decoupling the symbolic planning from the geometric synthesis, our approach allows any compatible geometry engine to be controlled through a single, language-native frontend. We pre-train a dual-encoder architecture to disentangle structure from semantics and instruction-tune the model on a large-scale, part-centric dataset. Experiments demonstrate that our model excels at producing....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan","language model"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"openalex:W4416290701","title":"SoMORE: Social Context-Aware MLLM for Video Character Search","url":"https://doi.org/10.1007/978-981-95-3052-6_31","published":"2025-11-17","authors":["Kou Xin","Wenjun Peng","Tong Xu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-3052-6_31","openalex_id":"https://openalex.org/W4416290701","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","University of Science and 
Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8884999752044678},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5565999746322632},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5516999959945679},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5472000241279602},{"id":"https://openalex.org/C2778355321","display_name":"Identity (music)","score":0.4837000072002411},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4657999873161316},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.40700000524520874},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.39430001378059387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125969228","title":"VERT: Polyglot Verified Equivalent Rust Transpilation with Large Language Models","url":"https://doi.org/10.1109/ase63991.2025.00123","published":"2025-11-16","authors":["Aidan Z.H. Yang","Yoshiki Takashima","Brandon Paulsen","Josiah Dodds","Daniel Kroening"],"abstract":"Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust’s growing popularity has prompted research on correct and idiomatic transpiling of existing code-bases to Rust. Existing work falls into two categories: rule-based and large language model (LLM)-based. While rule-based approaches are theoretically sound, they often yield unidiomatic and unsafe Rust code, and are limited to few source languages, which hinders maintainability and industrial application. 
By contrast, LLM-based approaches, while providing no guarantees, are polyglot and typically produce more idiomatic and safe Rust code. In this work, we present VERT, a formally correct, polyglot Rust translator with more idiomatic outputs. VERT supports any language that compiles to Web Assembly. Using the Web....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00123","openalex_id":"https://openalex.org/W7125969228","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","memory"],"author_affiliations":["Amazon (United States)","Yale University"],"concepts":[{"id":"https://openalex.org/C2780239667","display_name":"Polyglot","score":0.9054999947547913},{"id":"https://openalex.org/C197781089","display_name":"Rust (programming language)","score":0.817300021648407},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7031999826431274},{"id":"https://openalex.org/C55166926","display_name":"Oracle","score":0.595300018787384},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49799999594688416},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49079999327659607},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.48899999260902405},{"id":"https://openalex.org/C171078966","display_name":"Root (linguistics)","score":0.4300999939441681}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125900629","title":"AlertGuardian: Intelligent Alert Life-Cycle Management for Large-scale Cloud Systems","url":"https://doi.org/10.1109/ase63991.2025.00009","published":"2025-11-16","authors":["Guangba Yu","Genting Mai","Rui 
Wang","Ruipeng Li","Pengfei Chen","Long Pan","Ruijie Xu"],"abstract":"Alerts are critical for detecting anomalies in large-scale cloud systems, ensuring reliability and user experience. However, current systems generate overwhelming volumes of alerts, degrading operational efficiency due to ineffective alert life-cycle management. This paper details the efforts of Company-X to optimize alert life-cycle management, addressing alert fatigue in cloud systems. We propose AlertGuardian, a framework collaborating large language models (LLMs) and lightweight graph models to optimize the alert life-cycle through three phases: Alert Denoise uses graph learning model with virtual noise to filter noise, Alert Summary employs Retrieval Augmented Generation (RAG) with LLMs to create actionable summary, and Alert Rule Refinement leverages multi-agent iterative feedbacks to improve alert rule quality. Evaluated on four real-world datasets from Company-X’s services, Alert...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00009","openalex_id":"https://openalex.org/W7125900629","cited_by_count":0,"quality_score":49,"matched_keywords":["retrieval","agent","multi-agent"],"author_affiliations":["Sun Yat-sen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7773000001907349},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.6945000290870667},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6283000111579895},{"id":"https://openalex.org/C43214815","display_name":"Reliability 
(semiconductor)","score":0.5156000256538391},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.39739999175071716},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.3822000026702881},{"id":"https://openalex.org/C108074857","display_name":"Fault management","score":0.3693000078201294},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.35089999437332153}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125901809","title":"iKnow: an Intent-Guided Chatbot for Cloud Operations with Retrieval-Augmented Generation","url":"https://doi.org/10.1109/ase63991.2025.00084","published":"2025-11-16","authors":["Junjie Huang","Yuedong Zhong","Guangba Yu","Zhihan Jiang","Minzhi Yan","Wenfei Luan","Tianyu Yang","Rui Ren","Michael R. Lyu"],"abstract":"Managing complex cloud services requires standard operational documentation, but its sheer volume often hinders cloud engineers from efficient knowledge acquisition. Retrieval-Augmented Generation (RAG) can streamline this process by retrieving relevant knowledge and generating concise, referenced answers. However, deploying a reliable RAG-based chatbot for cloud operation remains a challenge. In this experience paper, we analyze the development and deployment of RAG-based chatbots for operational question answering (OpsQA) at a large-scale cloud vendor. Through an empirical study of 2,000 real-world queries across three operational teams, we identify five unique OpsQA intent types (e.g., symptom analysis and terminology explanation) and their corresponding requirements for a satisfactory answer, which differ from general software engineering queries. 
Our analysis further uncovers six ro...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00084","openalex_id":"https://openalex.org/W7125901809","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C2779041454","display_name":"Chatbot","score":0.8119000196456909},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.800000011920929},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.7695000171661377},{"id":"https://openalex.org/C547195049","display_name":"Terminology","score":0.5776000022888184},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4634999930858612},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.44780001044273376},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.44699999690055847},{"id":"https://openalex.org/C26713055","display_name":"Implementation","score":0.4325000047683716}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125977933","title":"SE-Jury: An LLM-as-Ensemble-Judge Metric for Narrowing the Gap with Human Evaluation in SE","url":"https://doi.org/10.1109/ase63991.2025.00214","published":"2025-11-16","authors":["Xin Zhou","Kisub Kim","Ting Zhang","Martin Weyssow","Luís F. 
Gomes","Guang Yang","Kui Liu","Xin Xia","David Lo"],"abstract":"Large Language Models (LLMs) and other automated techniques have been increasingly used to support software developers by generating software artifacts such as code snippets, patches, and comments. However, accurately assessing the correctness of these generated artifacts remains a significant challenge. On one hand, human evaluation provides high accuracy but is labor-intensive and lacks scalability. On the other hand, many automatic evaluation metrics are scalable and require minimal human effort, but they often fail to accurately reflect the actual correctness of generated software artifacts.In this paper, we present SE-Jury, the first evaluation metric for LLM-as-Ensemble-Judge specifically designed to accurately assess the correctness of generated software artifacts. SE-Jury first defines five distinct evaluation strategies, each implemented as an independent judge. A dynamic team s...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00214","openalex_id":"https://openalex.org/W7125977933","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Daegu Gyeongbuk Institute of Science and Technology","Huawei Technologies (China)","Huawei Technologies (United States)","Monash University","Nanjing University of Aeronautics and Astronautics","Singapore Management University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.8855000138282776},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7756999731063843},{"id":"https://openalex.org/C176217482","display_name":"Metric 
(unit)","score":0.6690999865531921},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.5781000256538391},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5586000084877014},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5327000021934509},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5268999934196472},{"id":"https://openalex.org/C82214349","display_name":"Software metric","score":0.4902999997138977}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416257993","title":"IRSC: A Zero-Shot Evaluation Benchmark for Information Retrieval Based on Semantic Comprehension in Retrieval-Augmented Generation Scenarios","url":"https://doi.org/10.1007/978-981-95-3352-7_20","published":"2025-11-16","authors":["Hai Lin","Shaoxiong Zhan","Jian Su","Hai-Tao Zheng","Hui Wang","Xin Su","Ruitong Liu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-3352-7_20","openalex_id":"https://openalex.org/W4416257993","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Peng Cheng Laboratory","Southern University of Science and Technology","Tencent (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8981999754905701},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6793000102043152},{"id":"https://openalex.org/C174348530","display_name":"Bridging 
(networking)","score":0.6671000123023987},{"id":"https://openalex.org/C197947376","display_name":"Comparability","score":0.6258000135421753},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6165000200271606},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.542900025844574},{"id":"https://openalex.org/C86034646","display_name":"Semantic gap","score":0.4553999900817871},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4544000029563904}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7125943867","title":"Clarifying Semantics of In-Context Examples for Unit Test Generation","url":"https://doi.org/10.1109/ase63991.2025.00250","published":"2025-11-16","authors":["Chen Yang","Lin Yang","Ziqi Wang","Dong Wang","Jianyi Zhou","Junjie Chen"],"abstract":"Recent advances in large language models (LLMs) have enabled promising performance in unit test generation through in-context learning (ICL). However, the quality of in-context examples significantly influences the effectiveness of generated tests—poorly structured or semantically unclear test examples often lead to suboptimal outputs. In this paper, we propose CLAST, a novel technique that systematically refines unit tests to improve their semantic clarity, thereby enhancing their utility as in-context examples. The approach decomposes complex tests into logically clearer ones and improves semantic clarity through a combination of program analysis and LLM-based rewriting. We evaluated CLAST on four open-source and three industrial projects. 
The results demonstrate that CLAST largely outperforms UTgen, the state-of-the-art refinement technique, in both preserving test effectiveness and e...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00250","openalex_id":"https://openalex.org/W7125943867","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Cloud Computing Center","Huawei Technologies (China)","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7483000159263611},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.7037000060081482},{"id":"https://openalex.org/C2777146004","display_name":"CLARITY","score":0.6887000203132629},{"id":"https://openalex.org/C148027188","display_name":"Unit testing","score":0.6234999895095825},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5849000215530396},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.47200000286102295},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47099998593330383},{"id":"https://openalex.org/C128942645","display_name":"Test case","score":0.4465000033378601}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7125908675","title":"Vul-R2: A Reasoning LLM for Automated Vulnerability Repair","url":"https://doi.org/10.1109/ase63991.2025.00011","published":"2025-11-16","authors":["Xin-Cheng Wen","Zirui Lin","Yijun Yang","Cuiyun Gao","Deheng Ye"],"abstract":"The exponential increase in software vulnerabilities has created an urgent need for automatic vulnerability repair (AVR) solutions. 
Recent research has formulated AVR as a sequence generation problem and has leveraged large language models (LLMs) to address this problem. Typically, these approaches prompt or fine-tune LLMs to generate repairs for vulnerabilities directly. Although these methods show state-of-the-art performance, they face the following challenges: (1) Lack of high-quality, vulnerability-related reasoning data. Current approaches primarily rely on foundation models that mainly encode general programming knowledge. Without vulnerability-related reasoning data, they tend to fail to capture the diverse vulnerability repair patterns. (2) Hard to verify the intermediate vulnerability repair process during LLM training. Existing reinforcement learning methods often leverage int...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00011","openalex_id":"https://openalex.org/W7125908675","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Chinese University of Hong Kong","City University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7208999991416931},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6165000200271606},{"id":"https://openalex.org/C95713431","display_name":"Vulnerability (computing)","score":0.571399986743927},{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.565500020980835},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5095000267028809},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5088000297546387},{"id":"https://openalex.org/C9616225","display_name":"Semantic 
reasoner","score":0.47999998927116394},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42149999737739563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125915424","title":"MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution","url":"https://doi.org/10.1109/ase63991.2025.00154","published":"2025-11-16","authors":["Yibo Wang","Zhihao Peng","Ying Wang","Zhao Wei","Hai Yu","Zhiliang Zhu"],"abstract":"LLMs demonstrate strong performance in automated software engineering, particularly for code generation and issue resolution. While proprietary models like GPT-4o achieve high benchmarks scores on SWE-bench, their API dependence, cost, and privacy concerns limit adoption. Open-source alternatives offer transparency but underperform in complex tasks, especially sub-100B parameter models. Although quality Chain-of-Thought (CoT) data can enhance reasoning, current methods face two critical flaws: (1) weak rejection sampling reduces data quality, and (2) inadequate step validation causes error accumulation. 
These limitations lead to flawed reasoning chains that impair LLMs’ ability to learn reliable issue resolution.The paper proposes MCTS-REFINE, an enhanced Monte Carlo Tree Search (MCTS)-based algorithm that dynamically validates and optimizes intermediate reasoning steps through a rigorou...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00154","openalex_id":"https://openalex.org/W7125915424","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Northeastern University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8061000108718872},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.7822999954223633},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.5855000019073486},{"id":"https://openalex.org/C43711488","display_name":"Skew","score":0.5414000153541565},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5248000025749207},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.48750001192092896},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.47429999709129333},{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.4512999951839447}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125900418","title":"Element-Aware Fine-Tuning of Vision-Language Models for Cost-Efficient GUI Testing in an Industrial Setting","url":"https://doi.org/10.1109/ase63991.2025.00296","published":"2025-11-16","authors":["Mengzhou Wu","Y. 
Guo","Yuan Cao","Haochuan Lu","Hengyu Zhang","Xia Zeng","Liangchao Yao","Yuetang Deng","Dezhi Ran","Wei Yang","Tao Xie"],"abstract":"User Interface (UI) testing is crucial for quality assurance of industrial mobile applications, and yet it remains labor-intensive and challenging to automate effectively. Recent advances in Vision-Language Models (VLMs) present a promising solution for automating GUI testing by mapping natural language instructions to pixel-level actions, significantly reducing the manual effort required for writing test scripts and even designing test cases. While numerous VLMs have been proposed and evaluated for GUI testing, they often fail to meet two critical industrial requirements: (1) effectiveness when handling complex, multi-step workflows in industrial applications, and (2) efficiency for large-scale, high-frequency testing environments typical in industrial settings. Toward addressing the preceding industrial requirements, in this paper, we report our experiences in developing and deploying....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00296","openalex_id":"https://openalex.org/W7125900418","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Beijing Jiaotong University","Peking University","Tencent (China)","The University of Texas at Dallas","University of North Texas at Dallas"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7512000203132629},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.675000011920929},{"id":"https://openalex.org/C61423126","display_name":"Scripting language","score":0.6471999883651733},{"id":"https://openalex.org/C115903868","display_name":"Software 
engineering","score":0.5356000065803528},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.424699991941452},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.4050999879837036},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.3993000090122223},{"id":"https://openalex.org/C169168650","display_name":"Keyword-driven testing","score":0.3896999955177307}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125939562","title":"AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion","url":"https://doi.org/10.1109/ase63991.2025.00085","published":"2025-11-16","authors":["Tianyue Jiang","Y. F. Wang","Yanlin Wang","Daya Guo","Ensheng Shi","Yuchi Ma","Ting Chen","Zibin Zheng"],"abstract":"Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between the query and the target code in the retrieval process, and the inability of existing retrieval methods to effectively utilize the inference information. To address these challenges, we propose AlignCoder, a repository-level code completion framework that introduces a query enhancement mechanism and a reinforcement learning based retriever training method. 
Our approach generates multiple candidate completions to construct an enhanced query that bridges the semantic gap between the initial query and...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00085","openalex_id":"https://openalex.org/W7125939562","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8449000120162964},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6784999966621399},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5875999927520752},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5831999778747559},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5285000205039978},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5231000185012817},{"id":"https://openalex.org/C99016210","display_name":"Query expansion","score":0.48739999532699585},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44760000705718994}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125942778","title":"PALM: Synergizing Program Analysis and LLMs to Enhance Rust Unit Test Coverage","url":"https://doi.org/10.1109/ase63991.2025.00223","published":"2025-11-16","authors":["Bei Chu","Yang Feng","Kui Liu","Hange Shi","Zifan Nan","Zhaoqiang Guo","Baowen Xu"],"abstract":"Unit testing is essential for ensuring software reliability and correctness. 
Classic Search-Based Software Testing (SBST) methods and concolic execution-based approaches for generating unit tests often fail to achieve high coverage due to difficulties in handling complex program units, such as branching conditions and external dependencies. Recent work has increasingly utilized large language models (LLMs) to generate test cases, improving the quality of test generation by providing better context and correcting errors in the model’s output. However, these methods rely on fixed prompts, resulting in relatively low compilation success rates and coverage.This paper presents PALM, an approach that leverages large language models (LLMs) to enhance the generation of high-coverage unit tests. PALM performs program analysis to identify branching conditions within functions, which are then combi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00223","openalex_id":"https://openalex.org/W7125942778","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Nanjing University"],"concepts":[{"id":"https://openalex.org/C148027188","display_name":"Unit testing","score":0.7660999894142151},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5633000135421753},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5389999747276306},{"id":"https://openalex.org/C122637931","display_name":"Unit (ring theory)","score":0.4699999988079071},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.44119998812675476},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.4269999861717224},{"id":"https://openalex.org/C2777267654","display_name":"Test 
(biology)","score":0.4262000024318695},{"id":"https://openalex.org/C188598960","display_name":"Test strategy","score":0.4244999885559082}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125894792","title":"Data Dependency-Aware Code Generation from Enhanced UML Sequence Diagrams","url":"https://doi.org/10.1109/ase63991.2025.00282","published":"2025-11-16","authors":["Wenxin Mao","Zhitao Wang","Long Wang","Sirong Chen","Cuiyun Gao","Luyang Cao","Ziming Liu","Qiming Zhang","Jun Zhou","Zhi Jin"],"abstract":"Large language models (LLMs) excel at generating code from natural language (NL) descriptions. However, the plain textual descriptions are inherently ambiguous and often fail to capture complex requirements like intricate system behaviors, conditional logic, and architectural constraints; implicit data dependencies in service-oriented architectures are difficult to infer and handle correctly.To bridge this gap, we propose a novel step-by-step code generation framework named UML2Dep by leveraging unambiguous formal specifications of complex requirements. First, we introduce an enhanced Unified Modeling Language (UML) sequence diagram tailored for service-oriented architectures. 
This diagram extends traditional visual syntax by integrating decision tables and API specifications, explicitly formalizing structural relationships and business logic flows in service interactions to rigorously e...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00282","openalex_id":"https://openalex.org/W7125894792","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8605999946594238},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6416000127792358},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.567799985408783},{"id":"https://openalex.org/C16311509","display_name":"Dependency graph","score":0.43220001459121704},{"id":"https://openalex.org/C19768560","display_name":"Dependency (UML)","score":0.42260000109672546},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3910999894142151},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.36410000920295715},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.3366999924182892}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7125949895","title":"Automated Prompt Generation for Code Intelligence: An Empirical study and Experience in WeChat","url":"https://doi.org/10.1109/ase63991.2025.00285","published":"2025-11-16","authors":["Kexing Ji","Shiyun Fu","Cuiyun Gao","Yujia Chen","Zezhou Yang","C J Wang","Yuetang 
Deng"],"abstract":"Large Code Models (LCMs) have demonstrated potential in advancing various code intelligence tasks. However, their effectiveness can be greatly influenced by the quality of the prompts. Current prompt design strategies in code intelligence studies are mostly manually generated, which could be time-consuming and extremely rely on the base LCMs and tasks. Although automated prompt generation (APG) has been investigated in the natural language processing field, it has not attracted sufficient attention and been well explored in the code intelligence tasks. Considering the various tasks and black-box nature of LCMs faced by developers in practice, it is essential to automate the prompt generation process.To mitigate the gap, we empirically investigate the two important parts in APG, including Instruction Generation (IG) and Muti-Step Reasoning (MSR). The instruction generation part aims at pr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ase63991.2025.00285","openalex_id":"https://openalex.org/W7125949895","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7894999980926514},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.660099983215332},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.6406999826431274},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6226000189781189},{"id":"https://openalex.org/C150292731","display_name":"Code review","score":0.5134999752044678},{"id":"https://openalex.org/C120936955","display_name":"Empirical 
research","score":0.5123000144958496},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.4999000132083893},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4348999857902527}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416236520","title":"Improving LLM-Based Document-Level MT with Multi-Knowledge Fusion","url":"https://doi.org/10.1007/978-981-95-3349-7_14","published":"2025-11-15","authors":["Bin Liu","Xinglin Lyu","Junhui Li","Daimeng Wei","M. Zhang","Shimin Tao","Hao Yang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-3349-7_14","openalex_id":"https://openalex.org/W4416236520","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Soochow University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8841999769210815},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.864799976348877},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.6309999823570251},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.6032999753952026},{"id":"https://openalex.org/C120012220","display_name":"Source text","score":0.5928999781608582},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.57669997215271},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5529999732971191},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.5336999893188477}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2511.11373","title":"MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism","url":"https://huggingface.co/papers/2511.11373","published":"2025-11-14","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","tencent","agent","multi-agent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7127184736","title":"STAR-Shield: Self-Tuning Adaptive Rules for Web Application Firewall-as-a-Service via Multiple Large Language Models","url":"https://doi.org/10.1109/trustcom66490.2025.00228","published":"2025-11-14","authors":["Letian Sha","Lei Xue","Nan Yi","Fu Xiao"],"abstract":"As the primary entry point to modern digital services, Web applications are now subjected to the fastest-evolving threat landscape on the Internet. Consequently, ML-based Web Application Firewall (WAF) exhibits degraded accuracy when exposed to novel attack patterns, while regex-driven solution remains bottlenecked by manual rule crafting, impeding agile response to emergent threats. Large Language Models (LLMs) bring to bear capabilities such as real-time Internet-scale intelligence gathering, symbolic code reasoning, and targeted analytic generation. 
We introduce STAR-Shield, a LLM-powered adaptive rule-evolution framework engineered for Web Application Firewall-as-a-service (FWaaS). Through a multi-agent choreography, STAR-Shield automates the full cycle: harvesting and analyzing new threats and vulnerabilities, reconstructing attack payloads, synthesizing and refining regular-express...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/trustcom66490.2025.00228","openalex_id":"https://openalex.org/W7127184736","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Nanjing University of Posts and Telecommunications","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7623999714851379},{"id":"https://openalex.org/C118643609","display_name":"Web application","score":0.5863000154495239},{"id":"https://openalex.org/C77714075","display_name":"Firewall (physics)","score":0.5706999897956848},{"id":"https://openalex.org/C14185376","display_name":"Agile software development","score":0.5406000018119812},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.40400001406669617},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4009999930858612},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.3684000074863434},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.34459999203681946}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416198711","title":"FactorMAD: A Multi-Agent Debate Framework Based on Large Language Models for Interpretable Stock Alpha Factor 
Mining","url":"https://doi.org/10.1145/3768292.3770377","published":"2025-11-14","authors":["Yongkang Duan","C. Zhang","Jian Li"],"abstract":"In quantitative investment, alpha factor mining plays a crucial role in predicting stock returns. Traditional approaches rely on human experts to design factors based on financial intuition. To enhance the efficiency of factor mining, recent machine learning (ML) methods have driven a shift toward automated factor mining. However, these ML-based approaches often suffer from a lack of interpretability or are restricted by predefined mathematical operators, limiting their ability to mine novel and effective factors. In this paper, we propose FactorMAD, a multi-agent debate framework based on large language models (LLMs) for interpretable alpha factor mining. Unlike existing methods, our framework employs two specialized LLM agents that iteratively refine factors through structured debate, leveraging diverse prior perspectives and critiques. To enhance factor flexibility and expressiveness,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3768292.3770377","openalex_id":"https://openalex.org/W4416198711","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Microsoft (United States)","Microsoft Research Asia (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.8367999792098999},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6305999755859375},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5759999752044678},{"id":"https://openalex.org/C2781039887","display_name":"Factor (programming 
language)","score":0.5611000061035156},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4966000020503998},{"id":"https://openalex.org/C2780299701","display_name":"Stock market","score":0.484499990940094},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.4595000147819519},{"id":"https://openalex.org/C204036174","display_name":"Stock (firearms)","score":0.41429999470710754}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.20571","title":"MechStyle: Augmenting Generative AI with Mechanical Simulation to Create Stylized and Structurally Viable 3D Models","url":"http://arxiv.org/abs/2509.20571","published":"2025-11-14","authors":["Faraz Faruqi","Amira Abdel-Rahman","Leandra Tejedor","Martin Nisser","Jiaji Li","Vrushank Phadnis","Varun Jampani","Neil Gershenfeld","Megan Hofmann","Stefanie Mueller"],"abstract":"Recent developments in Generative AI enable creators to stylize 3D models based on text prompts. These methods change the 3D model geometry, which can compromise the model’s structural integrity once fabricated. We present MechStyle, a system that enables creators to stylize 3D printable models while preserving their structural integrity. MechStyle accomplishes this by augmenting the Generative AI-based stylization process with feedback from a Finite Element Analysis (FEA) simulation. As the stylization process modifies the geometry to approximate the desired style, feedback from the FEA simulation reduces modifications to regions with increased stress. We evaluate the effectiveness of FEA simulation feedback in the augmented stylization process by comparing three stylization control strategies. 
We also investigate the time efficiency of our approach by comparing three adaptive schedulin...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3745778.3766655","openalex_id":"https://openalex.org/W4414787997","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Disability Practice Institute","Google (United States)","MIT-Harvard Center for Ultracold Atoms","Northeastern University","University of Washington"],"concepts":[{"id":"https://openalex.org/C38935604","display_name":"Stylized fact","score":0.8675000071525574},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7258999943733215},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7074999809265137},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.6287999749183655},{"id":"https://openalex.org/C184408114","display_name":"Generative Design","score":0.6273000240325928},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.6197999715805054},{"id":"https://openalex.org/C135628077","display_name":"Finite element method","score":0.5242000222206116},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49619999527931213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W7127077112","title":"Leveraging large language models for SQL behavior-based database intrusion detection","url":"https://doi.org/10.1109/trustcom66490.2025.00035","published":"2025-11-14","authors":["Meital Shlezinger","Shay Akirav","Lei Zhou","Liang Guo","Avi Kessel","Guoliang Li"],"abstract":"Database systems are extensively used to store critical data 
across various domains. However, the frequency of abnormal database access behaviors, such as database intrusion by internal and external attacks, continues to rise. Internal masqueraders often have greater organizational knowledge, making it easier to mimic employee behavior effectively. In contrast, external masqueraders may behave differently due to their lack of familiarity with the organization. Current approaches lack the granularity needed to detect anomalies at the operational level, frequently misclassifying entire sequences of operations as anomalies, even though most operations are likely to represent normal behavior. On the other hand, some anomalous behaviors often resemble normal activities, making them difficult for existing detection methods to identify. This paper introduces a two-tiered anomaly detection appro...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/trustcom66490.2025.00035","openalex_id":"https://openalex.org/W7127077112","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Tel Aviv University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7958999872207642},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6963000297546387},{"id":"https://openalex.org/C137524506","display_name":"Anomaly-based intrusion detection system","score":0.6029999852180481},{"id":"https://openalex.org/C35525427","display_name":"Intrusion detection system","score":0.57669997215271},{"id":"https://openalex.org/C124101348","display_name":"Data 
mining","score":0.5652999877929688},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.5145000219345093},{"id":"https://openalex.org/C8038995","display_name":"Unsupervised learning","score":0.48559999465942383},{"id":"https://openalex.org/C55596503","display_name":"Data definition language","score":0.474700003862381}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416209966","title":"Author Correction: Foundation model for efficient biological discovery in single-molecule time traces","url":"https://doi.org/10.1038/s41592-025-02977-9","published":"2025-11-14","authors":["Jieming Li","Leyou Zhang","Alexander Johnson‐Buck","Nils G. Walter"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41592-025-02977-9","openalex_id":"https://openalex.org/W4416209966","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Analysis Group (United States)","Bristol-Myers Squibb (Germany)","Bristol-Myers Squibb (Ireland)","Google (United States)","The Bristol-Myers Squibb Children's Hospital","University of Michigan"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6100999712944031},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5924000144004822},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.39399999380111694},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.3384000062942505},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.27720001339912415},{"id":"https://openalex.org/C115903868","display_name":"Software 
engineering","score":0.27480000257492065},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.26339998841285706},{"id":"https://openalex.org/C2984917352","display_name":"Scientific discovery","score":0.2395000010728836}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7127139164","title":"A Semantic-Aware Network Intelligence Framework for Anomaly Detection using Large Language Models","url":"https://doi.org/10.1109/trustcom66490.2025.00389","published":"2025-11-14","authors":["Wei Li","Jianjun Li","Hui Shao","Junjie Li","Wei Zhang"],"abstract":"As enterprise networks grow in complexity, detecting sophisticated and covert anomalous traffic has become a critical challenge. This paper presents SENTINEL, a framework that integrates Large Language Models (LLMs) into anomaly detection by converting traffic logs into natural language and generating semantic embeddings. These embeddings are fused with traditional numerical features to form enriched representations, processed through a Denoising Autoencoder and a Transformer to capture temporal patterns. 
Experiments on a large enterprise dataset (ZE-CDT) show that SENTINEL achieves notable performance, with ablation studies confirming the clear benefits of LLM-based semantic augmentation for more context-aware intrusion detection.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/trustcom66490.2025.00389","openalex_id":"https://openalex.org/W7127139164","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","China Tobacco"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7297999858856201},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6132000088691711},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5906000137329102},{"id":"https://openalex.org/C35525427","display_name":"Intrusion detection system","score":0.5773000121116638},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.5152999758720398},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.43309998512268066},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4277999997138977},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.39719998836517334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2511.17462","title":"Scaling Conditional Autoencoders for Portfolio Optimization via Uncertainty-Aware Factor Selection","url":"http://arxiv.org/abs/2511.17462","published":"2025-11-14","authors":["Ronald E. 
Engel","Yu Chen","Paweł Polak","Ioana Boier"],"abstract":"Conditional Autoencoders (CAEs) offer a flexible, interpretable approach for estimating latent asset-pricing factors from firm characteristics. However, existing studies usually limit the latent factor dimension to around K = 5 due to concerns that larger K can degrade performance. To overcome this challenge, we propose a scalable framework that couples a high-dimensional CAE with an uncertainty-aware factor selection procedure. We employ three models for quantile prediction: zero-shot Chronos, a pretrained time-series foundation model (ZS-Chronos), gradient-boosted quantile regression trees using XGBoost and RAPIDS (Q-Boost), and an I.I.D bootstrap-based sample mean model (IID-BS). For each model, we rank factors by forecast uncertainty and retain the top-κ most predictable factors for portfolio construction, where κ denotes the selected subset of factors. This pruning strategy delivers...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3768292.3770415","openalex_id":"https://openalex.org/W4416197284","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","Stony Brook University"],"concepts":[{"id":"https://openalex.org/C10879293","display_name":"Factor analysis","score":0.6047999858856201},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.5849000215530396},{"id":"https://openalex.org/C118671147","display_name":"Quantile","score":0.5461999773979187},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5339000225067139},{"id":"https://openalex.org/C2780821815","display_name":"Portfolio","score":0.5101000070571899},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.501800000667572},{"id":"https://openalex.org/C33676613","display_name":"Dimension (graph theory)","score":0.5016999840736389},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4794999957084656}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7105610646","title":"Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-Training","url":"https://doi.org/10.1109/tkde.2025.3632394","published":"2025-11-13","authors":["Yufei He","Zhenyu Hou","Yukuo Cen","Jun Hu","Feng He","Xu Cheng","Jie Tang","Bryan Hooi"],"abstract":"Graph pre-training has been concentrated on graph-level tasks involving small graphs (e.g., molecular graphs) or learning node representations on a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen new nodes and even new graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called (Pre-trained Graph Transformer). Based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other for reconstructing local structures. 
Unlike the original autoencoder architecture where the pre-trained decoder is discarded, we propose a novel strategy that....","companies":["Z.ai/Zhipu","Tencent/Hunyuan"],"matched_orgs":["Z.ai/Zhipu","Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2025.3632394","openalex_id":"https://openalex.org/W7105610646","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["National University of Singapore","Tencent (China)","Tsinghua University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7937999963760376},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7451000213623047},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.6338000297546387},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.5741000175476074},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.487199991941452},{"id":"https://openalex.org/C157406716","display_name":"Topological graph theory","score":0.38940000534057617},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.3797000050544739},{"id":"https://openalex.org/C62611344","display_name":"Node (physics)","score":0.3296000063419342}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7105605013","title":"A Survey on Deep Generative Models for Robot Learning From Multimodal Demonstrations","url":"https://doi.org/10.1109/tro.2025.3631816","published":"2025-11-13","authors":["Julen Urain","Ajay Mandlekar","Yilun Du","Nur Muhammad “Mahi” Shafiullah","Danfei Xu","Katerina Fragkiadaki","Georgia Chalvatzaki","Jan Peters"],"abstract":"Learning from 
Demonstrations, the field that proposes to learn robot behavior models from data, is gaining popularity with the emergence of deep generative models. Although the problem has been studied for years under names such as Imitation Learning, Behavioral Cloning, or Inverse Reinforcement Learning, classical methods have relied on models that don't capture complex data distributions well or don't scale well to large numbers of demonstrations. In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets. In this survey, we aim to provide a unified and comprehensive review of the last year's progress in the use of deep generative models in robotics. We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or gen...","companies":["Meta/FAIR","NVIDIA"],"matched_orgs":["Meta/FAIR","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tro.2025.3631816","openalex_id":"https://openalex.org/W7105605013","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["Berkeley College","Carnegie Mellon University","Georgia Institute of Technology","Harvard University Press","Meta (United States)","Nvidia (United States)","Technische Universität Darmstadt"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7170000076293945},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.703499972820282},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6754000186920166},{"id":"https://openalex.org/C108583219","display_name":"Deep 
learning","score":0.605400025844574},{"id":"https://openalex.org/C171268870","display_name":"GRASP","score":0.560699999332428},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.553600013256073},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5353999733924866},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.450300008058548}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131619982","title":"Self-Disentangling Domain-Specific and Domain-Agnostic Representations Across Multiple Sources for Data-To-Text Generation","url":"https://doi.org/10.1109/ickg66886.2025.00016","published":"2025-11-13","authors":["Mingxuan Du","Jingbo Zhou","Fuzhen Zhuang","Yuhong Zhang"],"abstract":"Recent years have witnessed the increasing research attention on the topic of data-to-text generation(it is also called table-to-text generation), due to its wide range of applications, such as generating textual summaries from structured knowledge graphs. Existing methods usually require a large amount of labeled data to achieve satisfying performance. However, it is very expensive and time consuming to collect enough labeled data in a specific domain to train a model. Though it is possible to pre-train a general model through multiple source domain data and then use the target domain data to fine-tune it, there may exist the gap between each other. To this end, we propose a novel data-to-text generation model, named SMSTL, with utilizing multiple sources in a transfer learning setting. 
The core of SMSTL is a special designed self-disentangling mechanism to disentangle the domain-specif...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ickg66886.2025.00016","openalex_id":"https://openalex.org/W7131619982","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beihang University","Hefei University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7483999729156494},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.6481000185012817},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5156000256538391},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.47929999232292175},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.42410001158714294},{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.38280001282691956},{"id":"https://openalex.org/C2776145971","display_name":"Labeled data","score":0.376800000667572},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3725999891757965}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416774073","title":"AI-discovered tuning laws explain neuronal population code geometry","url":"https://doi.org/10.1101/2025.11.12.688086","published":"2025-11-13","authors":["Reilly Tilbury","Dabin Kwon","Ali Haydaroğlu","Jacob M Ratliff","Valentin Schmutz","Matteo Carandini","Kevin J Miller","Kim Stachenfeld","Kenneth D. 
Harris"],"abstract":"The activity of visual cortical neurons forms a population code representing image stimuli. There is, however, a discrepancy between our understanding of this code at the single-cell and population levels: direct measurements indicate the population code is high-dimensional, but established models of single-cell tuning give rise to low-dimensional codes. We reconciled this discrepancy by developing an AI science system to find a new parsimonious, interpretable equation for visual cortical orientation tuning. Candidate equations were expressed as short computer programs and evolved by Large Language Models (LLMs) using graphical diagnostics. The resulting equation not only improved single-cell fits, but also accurately modelled the population code’s high-dimensional geometry. A novel parameter of the AI-discovered equation, which controls single-cell tuning smoothness, gives rise to high-...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.11.12.688086","openalex_id":"https://openalex.org/W4416774073","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","National Hospital for Neurology and Neurosurgery","University College London"],"concepts":[{"id":"https://openalex.org/C2908647359","display_name":"Population","score":0.7023000121116638},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5404999852180481},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.5382000207901001},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.5174999833106995},{"id":"https://openalex.org/C2776760102","display_name":"Code (set 
theory)","score":0.49239999055862427},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4805999994277954},{"id":"https://openalex.org/C77637269","display_name":"Neural coding","score":0.4661000072956085},{"id":"https://openalex.org/C102634674","display_name":"Smoothness","score":0.451200008392334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:7df3217d44a23e89","title":"GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum","url":"https://openai.com/index/gpt-5-system-card-addendum-gpt-5-1","published":"2025-11-12","authors":["OpenAI"],"abstract":"This GPT-5 system card addendum provides updated safety metrics for GPT-5.1 Instant and Thinking, including new evaluations for mental health and emotional reliance.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W7131434547","title":"Bi-NAS: Towards Effective and Personalized Explanation for Recommender Systems via Bi-Level Neural Architecture Search","url":"https://doi.org/10.1109/icdm65498.2025.00177","published":"2025-11-12","authors":["Longfeng Wu","Yao Zhou","Tong Zeng","Zhimin Peng","Bhanu Pratap Singh Rawat","Lecheng Zheng","Giovanni Seni","Dawei Zhou"],"abstract":"Recommender systems are vital in helping users navigate vast amounts of information, offering personalized suggestions and effective explanations for these recommendations. 
While previous efforts have attempted to provide such explanations, evaluating their effectiveness across various scenarios remains a challenge. Enhancing these explanations is essential for improving user engagement, trust, and decision-making. To facilitate effective explanations within the recommender system, we propose a Bi-level Neural Architecture Search (Bi-NAS) frame-work to optimize explanations. This approach simultaneously refines cross-attention mechanisms and feature interaction functions by exploring both intra-layer and inter-layer design spaces. Furthermore, we integrate Large Language Models (LLMs) to enhance explanation generation, leveraging zero-shot prompting to produce more effective and personal...","companies":["Google/DeepMind","Amazon"],"matched_orgs":["Google/DeepMind","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icdm65498.2025.00177","openalex_id":"https://openalex.org/W7131434547","cited_by_count":0,"quality_score":53,"matched_keywords":["personalized"],"author_affiliations":["Amazon (United States)","Google (United States)","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8407999873161316},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.796999990940094},{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.5565000176429749},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.491100013256073},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.47440001368522644},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4722000062465668},{"id":"https://openalex.org/C2776760102","display_name":"Code (set 
theory)","score":0.4480000138282776},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.4458000063896179}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:lp79vl03yx7vc3tq3wkp0rtf","title":"CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching","url":"https://machinelearning.apple.com/research/car-flow","published":"2025-11-12","authors":["Chen Chen","Pengsheng Guo","Liangchen Song","Jiasen Lu","Rui Qian","Xinze Wang","Tsu-Jui Fu","Wei Liu","Yinfei Yang","Alex Schwing"],"abstract":"Conditional generative modeling aims to learn a conditional data distribution from samples containing data-condition pairs. For this, diffusion and flow-based methods have attained compelling results. These methods use a learned (flow) model to transport an initial standard Gaussian noise that ignores the condition to the conditional data distribution. The model is hence required to learn both mass transport and conditional injection. 
To ease the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7133311432","title":"PRvL: Quantifying the Capabilities and Risks of Large Language Models for PII Redaction","url":"https://doi.org/10.1109/tps-isa67132.2025.00025","published":"2025-11-12","authors":["Leon Garza","Anantaa Kotal","Aritran Piplai","Lavanya Elluri","Prajit Kumar Das","Aman Chadha"],"abstract":"Redacting Personally Identifiable Information (PII) from unstructured text is critical for ensuring data privacy in regulated domains. While earlier approaches have relied on rulebased systems and domain-specific Named Entity Recognition (NER) models, these methods fail to generalize across formats and contexts. Recent advances in Large Language Models (LLMs) offer a promising alternative, yet the effect of architectural and training choices on redaction performance remains underexplored. LLMs have demonstrated strong performance in tasks that require contextual language understanding, including the redaction of PII in free-form text. Prior work suggests that with appropriate adaptation, LLMs can become effective contextual privacy learners. However, the consequences of architectural and training choices for PII Redaction remain underexplored. 
In this work, we present a comprehensive ana...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tps-isa67132.2025.00025","openalex_id":"https://openalex.org/W7133311432","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Amazon (United States)","Cisco Systems (United States)","Texas A&M University – Central Texas","The University of Texas at El Paso"],"concepts":[{"id":"https://openalex.org/C2776795254","display_name":"Redaction","score":0.9128999710083008},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7487999796867371},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5360999703407288},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5073000192642212},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.41920000314712524},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3727000057697296},{"id":"https://openalex.org/C41458344","display_name":"Publication","score":0.3422999978065491},{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.3257000148296356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7131393896","title":"Efficient Sequential Recommendation for Long Term User Interest Via Personalization","url":"https://doi.org/10.1109/icdm65498.2025.00099","published":"2025-11-12","authors":["Qiang Zhang","Hanchao Yu","Ivan Ji","Chen Yuan","Yi Zhang","Chihuang Liu","Xiaolong Wang","Christopher E. 
Lambert","Ren Chen","Chen Kovacs","Xinzhu Bei","Renqin Cai"],"abstract":"Recent years have witnessed success of sequential modeling, generative recommender, and large language model for recommendation. Though the scaling law has been validated for sequential models, it showed inefficiency in computational capacity when considering real-world applications like recommendation, due to the non-linear(quadratic) increasing nature of the transformer model. To improve the efficiency of the sequential model, we introduced a novel approach to sequential recommendation that leverages personalization techniques to enhance efficiency and performance. Our method compresses long user interaction histories into learnable tokens, which are then combined with recent interactions to generate recommendations. This approach significantly reduces computational costs while maintaining high recommendation accuracy. Our method could be applied to existing transformer based recommend...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icdm65498.2025.00099","openalex_id":"https://openalex.org/W7131393896","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","personalization","efficient"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8183000087738037},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.8007000088691711},{"id":"https://openalex.org/C2778869765","display_name":"Inefficiency","score":0.6158000230789185},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4977000057697296},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.47929999232292175},{"id":"https://openalex.org/C67712803","display_name":"User 
modeling","score":0.45570001006126404},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.4081000089645386},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3978999853134155}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416148780","title":"MobilityGPT: Enhanced Human Mobility Modeling With a GPT Model","url":"https://doi.org/10.1109/tits.2025.3626357","published":"2025-11-12","authors":["Ammar Haydari","Dongjie Chen","Zhengfeng Lai","Michael Zhang","Chen‐Nee Chuah"],"abstract":"Generative models have shown promising results in capturing human mobility characteristics and generating synthetic trajectories. However, it remains challenging to ensure that the generated geospatial mobility data is semantically realistic, including consistent location sequences, and reflects real-world characteristics, such as constraining on geospatial limits. We reformat human mobility modeling as an autoregressive generation task to address these issues, leveraging the Generative Pre-trained Transformer (GPT) architecture. To ensure its controllable generation to alleviate the above challenges, we propose a geospatially-aware generative model, MobilityGPT. We propose a gravity-based sampling method to train a transformer for semantic sequence similarity. 
Then, we constrained the training process via a road connectivity matrix that provides the connectivity of sequences in trajecto...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tits.2025.3626357","openalex_id":"https://openalex.org/W4416148780","cited_by_count":3,"quality_score":44,"matched_keywords":["preference"],"author_affiliations":["Apple (United States)","University of California, Davis"],"concepts":[{"id":"https://openalex.org/C9770341","display_name":"Geospatial analysis","score":0.758899986743927},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7491999864578247},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5220999717712402},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.5192000269889832},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4684999883174896},{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.46560001373291016},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.45350000262260437},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4375999867916107}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2511.09057","title":"PAN: A World Model for General, Interactable, and Long-Horizon World Simulation","url":"https://huggingface.co/papers/2511.09057","published":"2025-11-12","authors":["PAN Team","Jiannan Xiang","Yi Gu","Zihan Liu","Zeyu Feng","Qiyue Gao","Yiyan Hu","Benhao Huang","Guangyi Liu","Yichi Yang","Kun Zhou","Davit Abrahamyan"],"abstract":"A world model enables an intelligent agent to imagine, predict, and reason about how 
the world evolves in response to its actions, and accordingly to plan and strategize. While recent video generation models produce realistic visual sequences, they typically operate in the prompt-to-full-video manner without causal control, interactivity, or long-horizon consistency required for purposeful reasoning. Existing world modeling efforts, on the other hand, often focus on restricted domains (e.g., physical, game, or 3D-scene dynamics) with limited depth and controllability, and struggle to generalize across diverse environments and interaction formats. In this work, we introduce PAN, a general, interactable, and long-horizon world model that predicts future world states through high-quality video simulation conditioned on history and natural language actions. PAN employs the Generative Latent....","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":43,"matched_keywords":["LLM","language model","long-term","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4416204877","title":"Aligning machine and human visual representations across abstraction levels","url":"https://doi.org/10.1038/s41586-025-09631-6","published":"2025-11-12","authors":["Lukas Muttenthaler","Klaus Greff","Frieda Born","Bernhard Spitzer","Simon Kornblith","Michael C. Mozer","K. Müller","Thomas Unterthiner","Andrew K. Lampinen"],"abstract":"), model representations do not accurately capture all these levels of abstraction. 
To address this misalignment, we first train a teacher model to imitate human judgements, then transfer human-aligned structure from its representations to refine the representations of pretrained state-of-the-art vision foundation models via fine-tuning. These human-aligned models more accurately approximate human behaviour and uncertainty across a wide range of similarity tasks, including a dataset of human judgements spanning multiple levels of semantic abstractions. They also perform better on a diverse set of machine learning tasks, increasing generalization and out-of-distribution robustness. Thus, infusing neural networks with additional human knowledge yields a best-of-both-worlds representation that is both more consistent with human cognitive judgements and more practically useful, paving the wa...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41586-025-09631-6","openalex_id":"https://openalex.org/W4416204877","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Berlin Institute for the Foundations of Learning and Data","Berlin Institute of Health at Charité - Universitätsmedizin Berlin","Google (United States)","Max Planck Institute for Human Cognitive and Brain Sciences","Max Planck Institute for Human Development","Max Planck Institute for Informatics","Oral Roberts University","Robert Koch Institute","Technische Universität Berlin","Technische Universität Dresden"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7353000044822693},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.724399983882904},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.7099000215530396},{"id":"https://openalex.org/C124304363","display_name":"Abstraction","score":0.6711000204086304},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6365000009536743},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.626800000667572},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.5877000093460083},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5730999708175659}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W7134961330","title":"Scalable Multilingual PII Annotation for Responsible AI in LLMs","url":"https://doi.org/10.1109/icdmw69685.2025.00049","published":"2025-11-12","authors":["Bharti Meena","Joanna Skubisz","Harshit Rajgarhia","Nand Dave","Kiran Ganesh","Shivali Dalmia","Abhishek Mukherji","Vasudevan Sundarababu","Olga Pospelova"],"abstract":"As Large Language Models (LLMs) gain wider adoption, ensuring their reliable handling of Personally Identifiable Information (PII) across diverse regulatory contexts has become essential. This work introduces a scalable multilingual data curation framework designed for high-quality PII annotation across 13 underrepresented locales (Table I), covering approximately 336 locale-specific PII types. Our phased, human-in-the-loop annotation methodology combines linguistic expertise with rigorous quality assurance, leading to substantial improvements in recall and false positive rates from pilot, training, and production phases. 
By leveraging inter-annotator agreement metrics and root-cause analysis, the framework systematically uncovers and resolves annotation inconsistencies, resulting in high-fidelity datasets suitable for supervised LLM fine-tuning. Beyond reporting empirical gains, we high...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icdmw69685.2025.00049","openalex_id":"https://openalex.org/W7134961330","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","GGz centraal"],"concepts":[{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.6902999877929688},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.617900013923645},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.41830000281333923},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.39890000224113464},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3749000132083893},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3743000030517578},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.2955000102519989},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.258899986743927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7134890737","title":"A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content","url":"https://doi.org/10.1109/icdmw69685.2025.00170","published":"2025-11-12","authors":["Lele Cao"],"abstract":"Advances in AI-generated content have led to wide adoption of large language models, 
diffusion-based visual generators, and synthetic audio tools. However, these developments raise critical concerns about misinformation, copyright infringement, security threats, and the erosion of public trust. In this paper, we explore an extensive range of methods designed to detect and mitigate AI-generated textual, visual, and audio content. We begin by discussing motivations and potential impacts associated with AI-based content generation, including real-world risks and ethical dilemmas. We then outline detection techniques spanning observation-based strategies, linguistic and statistical analysis, model-based pipelines, watermarking and fingerprinting, as well as emergent ensemble approaches. We also present new perspectives on robustness, adaptation to rapidly improving generative architectures,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icdmw69685.2025.00170","openalex_id":"https://openalex.org/W7134890737","cited_by_count":1,"quality_score":42,"matched_keywords":["media"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6159999966621399},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4259999990463257},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.391400009393692},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3158000111579895},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.29170000553131104},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.28850001096725464},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.2773999869823456},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.26930001378059387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7134985824","title":"OTTER: Open-Tagging via Text-Image Representation for Multi-Modal Understanding","url":"https://doi.org/10.1109/icdmw69685.2025.00054","published":"2025-11-12","authors":["Jieer Ouyang","Xiaoneng Xiang","Zheng Wang","Yuting Ding"],"abstract":"We introduce OTTER, a unified open-set multi-label tagging framework that harmonizes the stability of a curated, predefined category set with the adaptability of user-driven open tags. OTTER is built upon a large-scale, hierarchically organized multi-modal dataset, collected from diverse online repositories and annotated through a hybrid pipeline combining automated vision-language labeling with human refinement. By leveraging a multi-head attention architecture, OTTER jointly aligns visual and textual representations with both fixed and open-set label embeddings, enabling dynamic and semantically consistent tagging. OTTER consistently outperforms competitive baselines on two benchmark datasets: it achieves an overall F1 score of 0.81 on Otter and 0.75 on Favorite, surpassing the nextbest results by margins of 0.10 and 0.02, respectively. 
OTTER attains near-perfect performance on open-se...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icdmw69685.2025.00054","openalex_id":"https://openalex.org/W7134985824","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei German Research Center","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5616999864578247},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4675000011920929},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43220001459121704},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.28279998898506165},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.27489998936653137},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.2728999853134155},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.27129998803138733},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.27079999446868896}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416151572","title":"Catalysts of Transformation: Deep Dive on Transformer Architecture the Tech Behind Large Language Models and Generative AI","url":"https://doi.org/10.1007/978-981-95-1746-6_87","published":"2025-11-12","authors":["Praneet Amul Akash Cherukuri","Shivani Yadao","Vijender Kumar 
Solanki"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-95-1746-6_87","openalex_id":"https://openalex.org/W4416151572","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University College for Women"],"concepts":[{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.871399998664856},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7875000238418579},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.5773000121116638},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5157999992370605},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5127000212669373},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.37299999594688416},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35690000653266907},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.3443000018596649}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7134923597","title":"Augmenting Question Answering with A Hybrid RAG Approach","url":"https://doi.org/10.1109/cogmi67134.2025.00051","published":"2025-11-12","authors":["Tianyi Yang","Nashrah Haque","Vaishnave Jonnalagadda","Yuya Jeremy Ong","Zhehui Chen","Yanzhao Wu","Lei Yu","Divyesh Jadav","Wenqi 
Wei"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cogmi67134.2025.00051","openalex_id":"https://openalex.org/W7134923597","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Florida International University","Fordham University","Google (United States)","Plastic Electronic (Austria)","Plastic Surgery Hospital","Rensselaer Polytechnic Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5995000004768372},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.5839999914169312},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.448199987411499},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3499000072479248},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2915000021457672},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.25999999046325684},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.2460000067949295},{"id":"https://openalex.org/C113336015","display_name":"Complete information","score":0.23669999837875366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416132599","title":"Tackling Non-Stationarity in HVAC Control with TimeGPT-Enhanced Deep Reinforcement Learning","url":"https://doi.org/10.1145/3736425.3772354","published":"2025-11-11","authors":["Jiatong Li","Bokai Ji","Guangxia Li","Peilin Zhao","Liu Liu","Z. Z. 
Ren"],"abstract":"For the heating, ventilation and air conditioning (HVAC) control problem, the varying environmental factors like weather conditions and occupant activities cause the system dynamics non-stationary, making it challenging for classical Markov decision process (MDP) approaches that typically assume stationarity. We show that augmenting the state space with forecasts of varying environmental factors mitigates the impact of non-stationarity, leading to more stable and accelerated policy convergence. We propose a deep reinforcement learning (DRL) based HVAC control pipeline which uses a time-series foundation model known as TimeGPT to perform rolling forecasts of critical environmental factors such as outdoor temperature, solar irradiance, and occupant numbers. Owing to pretraining on massive datasets, TimeGPT can generate accurate predictions for unseen time-series without any task-specific t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3736425.3772354","openalex_id":"https://openalex.org/W4416132599","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)","Wisdom Health (United States)","Xidian University"],"concepts":[{"id":"https://openalex.org/C122346748","display_name":"HVAC","score":0.71670001745224},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6725000143051147},{"id":"https://openalex.org/C106189395","display_name":"Markov decision process","score":0.6053000092506409},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5597000122070312},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5404000282287598},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.511900007724762},{"id":"https://openalex.org/C72434380","display_name":"State space","score":0.460099995136261},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.4503999948501587}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416078617","title":"Exploring Reasoning-Infused Text Embedding with Large Language Models for Zero-Shot Dense Retrieval","url":"https://doi.org/10.1145/3746252.3760855","published":"2025-11-10","authors":["Yuxiang Liu","Tian Wang","Gautam Kundu","Tianyu Cao","Guang Cheng","Zhen Ge","Jianshu Chen","Qingjun Cui","Trishul Chilimbi"],"abstract":"Transformer-based models such as BERT and E5 have significantly advanced text embedding by capturing rich contextual representations. However, many complex real-world queries require sophisticated reasoning to retrieve relevant documents beyond surface-level lexical matching, where encoder-only retrievers often fall short. Decoder-only large language models (LLMs), known for their strong reasoning capabilities, offer a promising alternative. Despite this potential, existing LLM-based embedding methods primarily focus on contextual representation and do not fully exploit the reasoning strength of LLMs. To bridge this gap, we propose Reasoning-Infused Text Embedding (RITE), a simple but effective approach that integrates logical reasoning into the text embedding process using generative LLMs. 
RITE builds upon existing language model embedding techniques by generating intermediate reasoning...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3760855","openalex_id":"https://openalex.org/W4416078617","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Amazon (United States)","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.8127999901771545},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7806000113487244},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6725000143051147},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6184999942779541},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.6096000075340271},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5324000120162964},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4975999891757965},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.42649999260902405}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2511.06719","title":"MobileLLM-Pro Technical Report","url":"https://huggingface.co/papers/2511.06719","published":"2025-11-10","authors":["Patrick Huber","Ernie Chang","Wei Wen","Igor Fedorov","Tarek Elgamal","Hanxian Huang","Naveen Suda","Chinnadhurai Sankar","Vish Vogeti","Yanghan Wang","Alex Gladkov","Kai Sheng Tai"],"abstract":"Efficient on-device language models around 1 billion parameters are essential for powering 
low-latency AI applications on mobile and wearable devices. However, achieving strong performance in this model class, while supporting long context windows and practical deployment remains a significant challenge. We introduce MobileLLM-Pro, a 1-billion-parameter language model optimized for on-device deployment. MobileLLM-Pro achieves state-of-the-art results across 11 standard benchmarks, significantly outperforming both Gemma 3-1B and Llama 3.2-1B, while supporting context windows of up to 128,000 tokens and showing only minor performance regressions at 4-bit quantization. These improvements are enabled by four core innovations: (1) implicit positional distillation, a novel technique that effectively instills long-context capabilities through knowledge distillation; (2) a specialist model mergi...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":43,"matched_keywords":["language model","efficient","quantization","distillation"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2503.18065","title":"Unseen From Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation","url":"http://arxiv.org/abs/2503.18065","published":"2025-11-10","authors":["Ziming Wei","Bingqian Lin","Yunshuang Nie","Jiaqi Chen","Shikui Ma","Hang Xu","Xiaodan Liang"],"abstract":"Data scarcity is a long-standing challenge in the vision-language navigation (VLN) field, which extremely hinders the generalization of agents to unseen environments. Previous works primarily rely on additional simulator data or web-collected images/videos to improve the generalization. 
However, the simulator environments still face limited diversity, and the web-collected data often require extensive labor to remove the noise. In this article, we propose a Rewriting-driven AugMentation (RAM) paradigm for VLN, which directly creates the unseen observation-instruction pairs via rewriting human-annotated training data. Benefiting from our rewriting mechanism, new observation-instruction pairs can be obtained in both simulator-free and labor-saving manners to promote generalization. Specifically, we first introduce object-enriched observation rewriting, where we combine vision-language mode...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tnnls.2025.3624691","openalex_id":"https://openalex.org/W4416140579","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Centre for Artificial Intelligence and Robotics","Huawei Technologies (China)","Shanghai Jiao Tong University","Sun Yat-sen University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.8422999978065491},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7559999823570251},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.745199978351593},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.5494999885559082},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5231000185012817},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.46549999713897705},{"id":"https://openalex.org/C2776359362","display_name":"Representation 
(politics)","score":0.37770000100135803},{"id":"https://openalex.org/C109747225","display_name":"Scarcity","score":0.36410000920295715}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416078076","title":"ArMA: Mitigating Catastrophic Forgetting using Attention-Regularized Model Averaging in Continual Fine-tuning Large Language Models","url":"https://doi.org/10.1109/tai.2025.3630623","published":"2025-11-10","authors":["Xihe Qiu","Leijun Cheng","Teqi Hao","Xiaoyu Tan"],"abstract":"Recent advancements in continual fine-tuning have aimed to enhance the instruction-following capabilities of large language models (LLMs) within domain-specific contexts. However, these models often suffer from catastrophic forgetting, manifesting as a significant decline in performance on general domain tasks. This presents a substantial challenge for developers who seek to improve performance on a specific domain without compromising efficacy across previously established tasks. To mitigate the negative impact of fine-tuning on generalization and enable the model to adapt to new tasks while preserving its ability on general-domain tasks, we propose a novel framework, ARMA. This framework addresses catastrophic forgetting in the continual fine-tuning of LLMs through attention-regularized model averaging. 
Unlike the typical model average, which utilizes various metrics to measure and bal...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tai.2025.3630623","openalex_id":"https://openalex.org/W4416078076","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Shanghai University of Engineering Science","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.7225000262260437},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6739000082015991},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.6044999957084656},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.552299976348877},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.475600004196167},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.45899999141693115},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4499000012874603},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.43880000710487366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2511.06307","title":"DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code 
Generation","url":"https://huggingface.co/papers/2511.06307","published":"2025-11-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"arxiv:2508.18812","title":"STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning","url":"http://arxiv.org/abs/2508.18812","published":"2025-11-08","authors":["Chenghao Wu","Ruiyang Ren","Junjie Zhang","Ruirui Wang","Zhongrui Ma","Qi Ye","Wayne Xin Zhao"],"abstract":"While modern recommender systems are instrumental in navigating information abundance, they remain fundamentally limited by static user modeling and reactive decision-making paradigms. Current large language model (LLM)-based agents inherit these shortcomings through their overreliance on heuristic pattern matching, yielding recommendations prone to shallow correlation bias, limited causal inference, and brittleness in sparse-data scenarios. We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. Each user is modeled as an agent with parallel cognitions: fast response for immediate interactions and slow reasoning that performs chain-of-thought rationales. 
To cultivate intrinsic slow thinking, we develop anchored reinforcement training-a two-stage paradigm combining structured knowledge distillatio...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3760995","openalex_id":"https://openalex.org/W4416018117","cited_by_count":1,"quality_score":62,"matched_keywords":["LLM","language model","preference","efficient","distillation","agent"],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8008000254631042},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.6947000026702881},{"id":"https://openalex.org/C2776156558","display_name":"MovieLens","score":0.6147000193595886},{"id":"https://openalex.org/C173801870","display_name":"Heuristic","score":0.6025000214576721},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5350000262260437},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.4869000017642975},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.4742000102996826},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3824000060558319}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416017642","title":"Querier-Aware LLM: Generating Personalized Responses to the Same Query from Different Queriers","url":"https://doi.org/10.1145/3746252.3761389","published":"2025-11-08","authors":["Hang Zeng","Chaoyue Niu","Fan Wu","Chengfei Lv","Guihai Chen"],"abstract":"Existing work on large language model (LLM) 
personalization assigned different responding roles to LLMs, but overlooked the diversity of queriers. In this work, we propose a new form of querier-aware LLM personalization, generating different responses even for the same query from different queriers. We design a dual-tower model architecture with a cross-querier general encoder and a querier-specific encoder. We further apply contrastive learning with multi-view augmentation, pulling close the dialogue representations of the same querier, while pulling apart those of different queriers. To mitigate the impact of query diversity on querier-contrastive learning, we cluster the dialogues based on query similarity and restrict the scope of contrastive learning within each cluster. To address the lack of datasets designed for querier-aware personalization, we also build a multi-querier dataset...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761389","openalex_id":"https://openalex.org/W4416017642","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","personalized","personalization"],"author_affiliations":["Alibaba Group (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8149999976158142},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.6280999779701233},{"id":"https://openalex.org/C2778012447","display_name":"Scope (computer science)","score":0.5541999936103821},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.4991999864578247},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.4860000014305115},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.47909998893737793},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.46810001134872437},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4498000144958496}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017570","title":"Autonomous Reasoning-Retrieval for Large Language Model Based Recommendation","url":"https://doi.org/10.1145/3746252.3761384","published":"2025-11-08","authors":["Bowen Zheng","Xiaolei Wang","Enze Liu","Xi Wang","Hongyu Lu","Yu Chen","Wayne Xin Zhao","Ji-Rong Wen"],"abstract":"Recently, large language models (LLMs) have been introduced into recommender systems (RSs) as recommendation backbones or to enhance traditional recommendation models (TRMs). However, existing LLM-based RSs fail to fully leverage the complementary strengths of LLMs (e.g., world knowledge and reasoning capabilities) and TRMs (e.g., recommendation-specific knowledge and computational efficiency), resulting in shallow exploration of the item space. To address this limitation, we propose DeepRec, a novel LLM-based RS approach that facilitates autonomous multi-turn interactions between LLMs and TRMs for deep item space exploration. In each interaction turn, LLMs reason over user preferences and collaborate with TRMs to retrieve candidate items. After multi-turn interaction, LLMs rank the aggregated candidates to generate the final recommendations. 
We utilize reinforcement learning (RL) for op...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761384","openalex_id":"https://openalex.org/W4416017570","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","preference","retrieval"],"author_affiliations":["Beijing Institute of Technology","Renmin University of China","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.70660001039505},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6729000210762024},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.670799970626831},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.618399977684021},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5794000029563904},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.4921000003814697},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4875999987125397},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4250999987125397}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416018031","title":"Improving Rare and Common ICD Coding via a Multi-Agent LLM-Based Approach","url":"https://doi.org/10.1145/3746252.3760894","published":"2025-11-08","authors":["Rumeng Li","Xun Wang","Hong Yu"],"abstract":"Large Language Models (LLMs) have shown strong performance in tasks such as zero- and few-shot information extraction from clinical text without domain-specific training. 
However, in the ICD coding task, LLMs often hallucinate key details and produce high-recall but low-precision outputs due to the high-dimensional and imbalanced nature of ICD code distributions. Existing LLM-based approaches typically fail to capture the complex, dynamic interactions among human agents involved in real-world coding workflows-such as patients, physicians, and coders-and often lack interpretability and reliability. To address these challenges, we propose a novel multi-agent framework for ICD coding that simulates the real-world process using five role-specific LLM agents-patient, physician, coder, reviewer, and adjuster-and integrates the Subjective, Objective, Assessment, and Plan (SOAP) structure from E...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3760894","openalex_id":"https://openalex.org/W4416018031","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Microsoft (United States)","University of Massachusetts Amherst","University of Massachusetts Lowell"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.9266999959945679},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7013000249862671},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.6424000263214111},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5113000273704529},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4805000126361847},{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.45419999957084656},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.38429999351501465},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37130001187324524}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017326","title":"Dense Retrieval for Aggregated Search","url":"https://doi.org/10.1145/3746252.3761197","published":"2025-11-08","authors":["Lang Mei","Sijie Liu","Ziyuan Zhao","Qiang Yan","Jiaxin Mao","Ji-Rong Wen"],"abstract":"To satisfy users' diverse information needs, the aggregated search systems need to integrate heterogeneous results, with rich but different structural information, from a variety of verticals, such as news search, video search, and product search. A key challenge in aggregated search is to effectively and efficiently retrieve the most relevant results among a large number of heterogeneous information from different verticals. With the development of deep learning and pre-trained language models (PLMs), many researchers resort to Dense Retrieval (DR) models for a unified, efficient embedding-based retrieval and a better retrieval performance. However, existing dense retrieval models have limitations in: 1) capturing the structural information of search results ; and 2) generalizing across different vertical domains where the search results have different or even unseen structures. 
In this...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761197","openalex_id":"https://openalex.org/W4416017326","cited_by_count":0,"quality_score":49,"matched_keywords":["retrieval","news","efficient"],"author_affiliations":["Renmin University of China","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.801800012588501},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5403000116348267},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4553999900817871},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40790000557899475},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.38040000200271606},{"id":"https://openalex.org/C19889080","display_name":"Beam search","score":0.36640000343322754},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.3571999967098236},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.35179999470710754}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416016553","title":"Prompt Tuning as User Inherent Profile Inference Machine","url":"https://doi.org/10.1145/3746252.3761574","published":"2025-11-08","authors":["Yusheng Lu","Zhaocheng Du","Xiangyang Li","Pengyue Jia","Yejing Wang","Weiwen Liu","Yichao Wang","Huifeng Guo","Ruiming Tang","Zhenhua Dong","Yongrui Duan","Xiangyu Zhao"],"abstract":"Large Language Models (LLMs) have exhibited significant promise in recommender systems by empowering user profiles with their extensive world knowledge and superior reasoning 
capabilities. However, LLMs face challenges like unstable instruction compliance, modality gaps, and high inference latency, leading to textual noise and limiting their effectiveness in recommender systems. To address these challenges, we propose UserIP-Tuning, which uses prompt-tuning to infer user profiles. It integrates the causal relationship between user profiles and behavior sequences into LLMs' prompts. It employs Expectation Maximization (EM) to infer the embedded latent profile, minimizing textual noise by fixing the prompt template. Furthermore, a profile quantization codebook bridges the modality gap by categorizing profile embeddings into collaborative IDs pre-stored for online deployment. This improves....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761574","openalex_id":"https://openalex.org/W4416016553","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","quantization"],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)","Shanghai Jiao Tong University","Tongji University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7870000004768372},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6873000264167786},{"id":"https://openalex.org/C127759330","display_name":"Codebook","score":0.6510000228881836},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5946000218391418},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5321999788284302},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.4708999991416931},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4616999924182892},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.4174000024795532}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017928","title":"Personalized Multi Modal Alignment Encoding for CTR-Recommendation in WeChat","url":"https://doi.org/10.1145/3746252.3761525","published":"2025-11-08","authors":["Jiawei Zheng","Hao Gu","Lingling Yi","Jie Wen","Chuan Chen"],"abstract":"In recent years, with the significant evolution of multi-modal large models, many recommender researchers realized the potential of multi-modal information for user interest modeling. In industry recommendation system, a wide-used modeling architecture is to first pre-train a multi-modal model to provide omnipotent representations and then encode to discrete semantic IDs for online model. Although such a paradigm achieves remarkable improvements, there still exist two problems that limit model performance: (1) Modalities Mapping Independence: Each modal representation is independently mapped to semantic spaces and then get the specific code, which ignores the consistency and complementarity of different modalities of the same item. 
(2) User-irrelevant Clustering Assignment: For the specific item, most of existing quantization methods assume that all users share the same cluster assignmen...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761525","openalex_id":"https://openalex.org/W4416017928","cited_by_count":0,"quality_score":45,"matched_keywords":["personalized","quantization"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7997000217437744},{"id":"https://openalex.org/C66746571","display_name":"ENCODE","score":0.6399999856948853},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5602999925613403},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5540000200271606},{"id":"https://openalex.org/C202269582","display_name":"Complementarity (molecular biology)","score":0.5450999736785889},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.4708000123500824},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4632999897003174},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4406999945640564}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017485","title":"From Anchors to Answers: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models","url":"https://doi.org/10.1145/3746252.3761167","published":"2025-11-08","authors":["Yanbiao Ji","Chang Liu","Xin Chen","Dan Luo","Mei Li","Yue Ding","Wenqing Lin","Hongtao Lu"],"abstract":"Enabling large language models (LLMs) to effectively process and reason with 
graph-structured data remains a significant challenge despite their remarkable success in natural language tasks. Current approaches either convert graph structures into verbose textual descriptions, consuming substantial computational resources, or employ complex graph neural networks as tokenizers, which introduce significant training overhead. To bridge this gap, we present NT-LLM, a novel framework with an anchor-based positional encoding scheme for graph representation. Our approach strategically selects reference nodes as anchors and encodes each node's position relative to these anchors, capturing essential topological information without the computational burden of existing methods. Notably, we identify and address a fundamental issue: the inherent misalignment between discrete hop-based distances in gra...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761167","openalex_id":"https://openalex.org/W4416017485","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Chinese University of Hong Kong","Lehigh University","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8198000192642212},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.5985999703407288},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5728999972343445},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5627999901771545},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.531000018119812},{"id":"https://openalex.org/C195324797","display_name":"Natural 
language","score":0.46320000290870667},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41940000653266907},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4023999869823456}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017467","title":"Empowering Large Language Model for Sequential Recommendation via Multimodal Embeddings and Semantic IDs","url":"https://doi.org/10.1145/3746252.3761169","published":"2025-11-08","authors":["Yuhao Wang","Junwei Pan","Xinhang Li","Maolin Wang","Yuan Wang","Yue Liu","Dapeng Liu","Jie Jiang","Xiangyu Zhao"],"abstract":"Sequential recommendation (SR) aims to capture users' dynamic interests and sequential patterns based on their historical interactions. Recently, the powerful capabilities of large language models (LLMs) have driven their adoption in SR. However, we identify two critical challenges in existing LLM-based SR methods: 1) embedding collapse when incorporating pre-trained collaborative embeddings and 2) catastrophic forgetting of quantized embeddings when utilizing semantic IDs. These issues dampen the model scalability and lead to suboptimal recommendation performance. Therefore, based on LLMs like Llama3-8B-instruct, we introduce a novel SR framework named MME-SID, which integrates multimodal embeddings and quantized embeddings to mitigate embedding collapse. 
Additionally, we propose a Multimodal Residual Quantized Variational Autoencoder (MM-RQ-VAE) with maximum mean discrepancy as the rec...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761169","openalex_id":"https://openalex.org/W4416017467","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["City University of Hong Kong","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7997999787330627},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.7425000071525574},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7247999906539917},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7049000263214111},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5098000168800354},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5087000131607056},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5073000192642212},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.4505000114440918}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017378","title":"Towards Understanding Bias in Synthetic Data for Evaluation","url":"https://doi.org/10.1145/3746252.3760908","published":"2025-11-08","authors":["Hossein A. Rahmani","Varsha Ramineni","Emine Yilmaz","Nick Craswell","Bhaskar Mitra"],"abstract":"Test collections are crucial for evaluating Information Retrieval (IR) systems. 
Creating a diverse set of user queries for these collections can be challenging, and obtaining relevance judgments, which indicate how well retrieved documents match a query, is often costly and resource-intensive. Recently, generating synthetic datasets using Large Language Models (LLMs) has gained attention in various applications. While previous work has used LLMs to generate synthetic queries or documents to improve ranking models, using LLMs to create synthetic test collections is still relatively unexplored. Previous work showed that synthetic test collections have the potential to be used for system evaluation, however, more analysis is needed to validate this claim. In this paper, we thoroughly investigate the reliability of synthetic test collections constructed using LLMs, where LLMs are used to gen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3760908","openalex_id":"https://openalex.org/W4416017378","cited_by_count":2,"quality_score":43,"matched_keywords":["retrieval"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Seattle University","The Alan Turing Institute","University College London"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.791100025177002},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7059999704360962},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6880999803543091},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.6315000057220459},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5841000080108643},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data 
type)","score":0.5579000115394592},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5072000026702881},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.46149998903274536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4416017417","title":"Sparse Autoencoders in Collaborative Filtering Enhanced LLM-based Recommender Systems","url":"https://doi.org/10.1145/3746252.3760957","published":"2025-11-08","authors":["Xinyu He","Jose Sepulveda","Fei Wang","Hanghang Tong"],"abstract":"Large language models (LLM) have demonstrated remarkable capability in recommendation tasks. Recently, efforts have been made to further enhance LLM performance with collaborative knowledge learned from traditional recommender systems. One approach is to inject learned embeddings into LLM prompts through a trainable projector, yet these embeddings could carry noisy or irrelevant information. In this paper, we propose using sparse autoencoders to improve input prompts. We show that sparse autoencoders can learn highly interpretable embeddings and extract key collaborative features in the case of recommender systems. With the help of sparse autoencoders, we are able to extract collaborative features to augment input prompts. 
By capturing TopK features of each item, we mitigate noisy information from item embeddings, therefore sparse autoencoders can also help with denoising embeddings in p...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3760957","openalex_id":"https://openalex.org/W4416017417","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8284000158309937},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.8208000063896179},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7365000247955322},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6327999830245972},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5455999970436096},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5419999957084656},{"id":"https://openalex.org/C56372850","display_name":"Sparse matrix","score":0.42320001125335693},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4198000133037567}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017947","title":"ROI Scan: LLM-powered Object-level Similarity Search for Google Ads Content Moderation","url":"https://doi.org/10.1145/3746252.3761443","published":"2025-11-08","authors":["Enming Luo","Yintao Liu","Dongjin Kwon","R Muñoz","Wei Qiao","Nic Trieu","Eric Xiao","Jimin Li","Laurel Graham","Ariel 
Fuxman"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761443","openalex_id":"https://openalex.org/W4416017947","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5967000126838684},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47380000352859497},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4325999915599823},{"id":"https://openalex.org/C25343380","display_name":"Relation (database)","score":0.37290000915527344},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.32989999651908875},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.30399999022483826},{"id":"https://openalex.org/C75165309","display_name":"Search engine indexing","score":0.26660001277923584},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.25850000977516174}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416016568","title":"Neighbor-enhanced Graph Pre-training and Prompt Learning Framework for Fraud Detection","url":"https://doi.org/10.1145/3746252.3761588","published":"2025-11-08","authors":["Ziyang Cheng","Jie Yang","Yixin Song","Dawei Cheng","Guang Yang","Bo Wang"],"abstract":"Nowadays, as more users turn to WeChat Pay and other e-commerce platforms for transactions, an increasing number of fraudsters are being attracted to these platforms to conduct fraudulent activities, thereby stealing money. 
To address this issue, Graph Neural Networks (GNNs) have been widely adopted and have shown great success. However, with the rise of various transaction methods, users are increasingly engaging in multiple transaction networks, which creates a new scenario that requires models to detect fraud across these diverse networks. Unfortunately, current GNN-based fraud detection strategies often exhibit suboptimal performance and high time complexity in this evolving scenario, as they typically can handle only one transaction network at a time. Recently, advancements in graph prompt learning have demonstrated great success in managing various types of graph data and improving...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761588","openalex_id":"https://openalex.org/W4416016568","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8233000040054321},{"id":"https://openalex.org/C75949130","display_name":"Database transaction","score":0.6945000290870667},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5454000234603882},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4634000062942505},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4025999903678894},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38679999113082886},{"id":"https://openalex.org/C127722929","display_name":"Transaction data","score":0.38260000944137573},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.3804999887943268}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416018042","title":"MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model","url":"https://doi.org/10.1145/3746252.3761547","published":"2025-11-08","authors":["Yu Li","Zulong Chen","Wenjian Xu","Hong Wen","Yipeng Yu","Man Lung Yiu","Yuyu Yin"],"abstract":"To maintain the company's talent pool, recruiters need to continuously search for resumes from third-party websites (e.g., LinkedIn, Indeed). However, fetched resumes are often incomplete and inaccurate. To improve the quality of third-party resumes and enrich the company's talent pool, it is essential to conduct duplication detection between the fetched resumes and those already in the company's talent pool. Such duplication detection is challenging due to the semantic complexity, structural heterogeneity, and information incompleteness of resume texts. To this end, we propose MHSNet, an multi-level identity verification framework that fine-tunes BGE-M3 using contrastive learning. With the fine-tuned BGE-M3, MHSNet generates multi-level sparse and dense representations for resumes, enabling the computation of corresponding multi-level semantic similarities. 
Moreover, the state-aware Mix...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761547","openalex_id":"https://openalex.org/W4416018042","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Hangzhou Dianzi University","Hong Kong Polytechnic University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8197000026702881},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5611000061035156},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5138000249862671},{"id":"https://openalex.org/C2778355321","display_name":"Identity (music)","score":0.4706999957561493},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4657999873161316},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4641999900341034},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.46369999647140503},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.4278999865055084}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017152","title":"GraFS: An Integrated GNN-LLM Approach for Inferring Best Functional Substitute Products","url":"https://doi.org/10.1145/3746252.3760961","published":"2025-11-08","authors":["Favour Nerrise","Edward W Huang","Xiaonan Ji","Karthik Subbian","Danai 
Koutra"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3760961","openalex_id":"https://openalex.org/W4416017152","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Stanford University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5490000247955322},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3732999861240387},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.2890999913215637},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.26919999718666077},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.2549000084400177},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.2500999867916107},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.24469999969005585},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.22540000081062317}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017866","title":"Google Ads Content Moderation with RAG","url":"https://doi.org/10.1145/3746252.3761435","published":"2025-11-08","authors":["Yuan Wang","Wei Qiao","Jingxiang Li","Tiantian Fang","Eric Xiao","Megan Oftelie","Zhimin Wang","Yintao Liu","Jimin Li","Yi-Ting Chen","Zhongli Ding","Enming Luo"],"abstract":"Keeping ad content policy classifiers up to date while maintaining the high quality bar is a significant challenge, especially with new threats emerging constantly. 
This paper introduces a new application to apply RAG-inspired in-context learning to accelerate content policy enforcement, especially when mitigating new emerging violations. Our application leverages RAG-based LLM inference for classification tasks and incorporates augmented reasoning information for better performance. We also developed a practical framework to enforce new violation patterns in O(1) days demonstrating improved memorization and generalization capabilities compared to traditional parametric and non-parametric models.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761435","openalex_id":"https://openalex.org/W4416017866","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.676800012588501},{"id":"https://openalex.org/C30038468","display_name":"Memorization","score":0.6445000171661377},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5770000219345093},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5702999830245972},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.505299985408783},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.3887999951839447},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.351500004529953},{"id":"https://openalex.org/C93225998","display_name":"Moderation","score":0.33160001039505005}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2508.14493","title":"Global-Distribution Aware 
Scenario-Specific Variational Representation Learning Framework","url":"http://arxiv.org/abs/2508.14493","published":"2025-11-08","authors":["Moyu Zhang","Yujun Jin","Jinxin Hu","Yu Zhang"],"abstract":"Current recommendation methods typically use a unified framework to offer personalized recommendations for different scenarios provided by commercial platforms. However, they often employ shared bottom representations, which partially hinders the model's capacity to capture scenario uniqueness. Ideally, users and items should exhibit specific characteristics in different scenarios, prompting the need to learn scenario-specific representations to differentiate scenarios. Yet, variations in user and item interactions across scenarios lead to data sparsity issues, impeding the acquisition of scenario-specific representations. To learn robust scenario-specific representations, we introduce a Global-Distribution Aware Scenario-Specific Variational Representation Learning Framework (GSVR) that can be directly applied to existing multi-scenario methods. 
Specifically, considering the uncertainty...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3746252.3760866","openalex_id":"https://openalex.org/W4415239846","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7329000234603882},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6880999803543091},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6079000234603882},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.5623000264167786},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5608999729156494},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.5145999789237976},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5038999915122986},{"id":"https://openalex.org/C192065140","display_name":"Multinomial distribution","score":0.4449000060558319}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017552","title":"Evolving Graph-Based Context Modeling for Multi-Turn Conversational Retrieval-Augmented Generation","url":"https://doi.org/10.1145/3746252.3761355","published":"2025-11-08","authors":["Yiruo Cheng","Hongjin Qian","Fengran Mo","Yongkang Wu","Zhonghua Li","Qi Ye","Ji-Rong Wen","Zhicheng Dou"],"abstract":"Conversational Retrieval-Augmented Generation (RAG) systems enhance user interactions by integrating large language models (LLMs) with external knowledge retrieval. 
However, multi-turn conversations present significant challenges, including implicit user intent and noisy context, which hinder accurate retrieval and response generation. Existing approaches often struggle with the unstructured conversational context and fail to model explicit relations among conversational turns. Moreover, they do not leverage historically relevant passages effectively. To overcome these limitations, we propose EvoRAG, a novel framework that maintains an evolving knowledge graph aligned with the unstructured conversational context. This graph explicitly captures relations among user queries, system responses, and relevant passages across conversational turns, serving as a structured representation of the c...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761355","openalex_id":"https://openalex.org/W4416017552","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Beijing Academy of Artificial Intelligence","Huawei Technologies (China)","Renmin University of China","Université de Montréal"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8289999961853027},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6643000245094299},{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.5965999960899353},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5047000050544739},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4431999921798706},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4309000074863434},{"id":"https://openalex.org/C2779343474","display_name":"Context 
(archaeology)","score":0.42570000886917114},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.40790000557899475}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416016498","title":"Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants","url":"https://doi.org/10.1145/3746252.3761322","published":"2025-11-08","authors":["Jiuding Yang","Hui Liu","Weidong Guo","Xu Yu","Di Niu"],"abstract":"Aligning Large Language Models (LLMs) with nuanced user instructions is critical for their effective deployment in real-world applications. While prior methods focus on enhancing data diversity and complexity, they often overlook models' sensitivity to fine-grained variations in semantically similar instructions. To address this, we introduce DeMoRecon, a data augmentation framework that decomposes complex instructions into sub-components, modifies individual elements, and reconstructs them into instruction variants. This method preserves contextual integrity while injecting targeted variability essential for fine-grained instruction-following. Based on DeMoRecon, we construct the FGIV dataset, comprising over 1,700 seed instructions and thousands of nuanced variants designed for both supervised fine-tuning and preference-based alignment. 
Experimental results show that LLMs trained with....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761322","openalex_id":"https://openalex.org/W4416016498","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Tencent (China)","University of Alberta"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6970999836921692},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.671999990940094},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.550599992275238},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4697999954223633},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44670000672340393},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.44269999861717224},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.43849998712539673},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.39469999074935913}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017454","title":"Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge","url":"https://doi.org/10.1145/3746252.3761189","published":"2025-11-08","authors":["Yuhe Ji","Yilun Liu","Feiyu Yao","Minggui He","Shimin Tao","Xiaofeng Zhao","Chang Su","Xinhua Yang","Weibin Meng","Yuming Xie","Boxing Chen","Shenglin Zhang"],"abstract":"Log analysis represents a critical sub-domain within AI applications that facilitates automatic approaches to fault and 
error management of large-scaled software systems, saving labors of traditional manual methods. While existing solutions using large language models (LLMs) show promise, they are limited by a significant domain gap between natural and log languages (the latter contains rich domain-specific tokens such as status codes, IP addresses, resource pathes), which restricts their effectiveness in real-world applications. However, directly adapting general-purpose LLMs to log analysis using raw logs may degrade their performance due to inconsistent token distribution. In this paper, we present a domain adaptation approach that addresses these limitations by integrating interpretable domain knowledge into open-source LLMs through continual pre-training (CPT), which bridges this do...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761189","openalex_id":"https://openalex.org/W4416017454","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (Canada)","Huawei Technologies (China)","Nankai University","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6880999803543091},{"id":"https://openalex.org/C207685749","display_name":"Domain knowledge","score":0.5752000212669373},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5702999830245972},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5184000134468079},{"id":"https://openalex.org/C132964779","display_name":"Raw data","score":0.5126000046730042},{"id":"https://openalex.org/C2776434776","display_name":"Domain adaptation","score":0.4593000113964081},{"id":"https://openalex.org/C48145219","display_name":"Security 
token","score":0.4259999990463257},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4156000018119812}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4416017491","title":"SELF: Surrogate-light Feature Selection with Large Language Models in Deep Recommender Systems","url":"https://doi.org/10.1145/3746252.3761378","published":"2025-11-08","authors":["Pengyue Jia","Zhaocheng Du","Yichao Wang","Xiangyu Zhao","Xiaopeng Li","Yuhao Wang","Qidong Liu","Huifeng Guo","Ruiming Tang"],"abstract":"Feature selection is crucial in recommender systems for improving model efficiency and predictive performance. Conventional approaches typically employ surrogate models-such as decision trees or neural networks-to estimate feature importance. However, their effectiveness is inherently constrained, as these models may struggle under suboptimal training conditions, including feature collinearity, high-dimensional sparsity, and insufficient data. In this paper, we propose SELF, a SurrogatE-Light Feature selection method for deep recommender systems. SELF integrates semantic reasoning from Large Language Models (LLMs) with task-specific learning from surrogate models, enabling an automated and lightweight feature selection process. 
Specifically, LLMs first produce a semantically informed ranking of feature importance, which is subsequently refined by a surrogate model, effectively integratin...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761378","openalex_id":"https://openalex.org/W4416017491","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8345000147819519},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.76910001039505},{"id":"https://openalex.org/C148483581","display_name":"Feature selection","score":0.7243000268936157},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7214999794960022},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6567000150680542},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6425999999046326},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.5357999801635742},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5149999856948853}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416016591","title":"PRECISE: Pre-training and Fine-tuning Sequential Recommenders with Collaborative and Semantic Information","url":"https://doi.org/10.1145/3746252.3761584","published":"2025-11-08","authors":["Chonggang Song","Chunxu Shen","Hao Gu","Yaoming Wu","Lingling Yi","Jie Wen","Chuan Chen"],"abstract":"Recommendation platforms commonly offer 
diverse content scenarios for users to interact with. Pre-training models are the most commonly used approach in recommendation systems to capture users' full-domain interests. Traditional ID-based pre-training models mainly capture user interests by leveraging collaborative signals. However, a prevalent drawback of those systems is the incapacity to handle cold-start scenarios. With the recent advent of large language models, there has been a significant increase in research efforts exploiting LLMs to extract semantic information for items. However, text-based recommendations highly rely on elaborate feature engineering and often fail to capture collaborative similarities.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761584","openalex_id":"https://openalex.org/W4416016591","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7960000038146973},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5410000085830688},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5060999989509583},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.41679999232292175},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.4147999882698059},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4142000079154968},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.36469998955726624},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.34369999170303345}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017983","title":"LLM4CD: Leveraging Large Language Models for Open-World Knowledge Augmented Cognitive Diagnosis","url":"https://doi.org/10.1145/3746252.3761321","published":"2025-11-08","authors":["Weiming Zhang","Lingyue Fu","Qingyao Li","Kounianhua Du","Jianghao Lin","Jingwei Yu","Wei Xia","Weinan Zhang","Ruiming Tang","Yong Yu"],"abstract":"Cognitive diagnosis (CD) plays a crucial role in intelligent education, evaluating students' comprehension of knowledge concepts based on their test histories. However, current CD methods often model students, exercises, and knowledge concepts solely on their ID relationships, neglecting the abundant semantic relationships present within the educational data space. Furthermore, contemporary intelligent tutoring systems (ITS) frequently involve the addition of new students and exercises, creating cold-start scenarios that ID-based methods find challenging to manage effectively. The advent of large language models (LLMs) offers the potential for overcoming this challenge with open-world knowledge. In this paper, we propose LLM4CD, which Leverages Large Language Models for open-world knowledge Augmented Cognitive Diagnosis. 
Our method utilizes the open-world knowledge of LLMs to construct c...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761321","openalex_id":"https://openalex.org/W4416017983","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["GTx (United States)","Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7714999914169312},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.6421999931335449},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.5924999713897705},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.501800000667572},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4803999960422516},{"id":"https://openalex.org/C161407221","display_name":"Cognitive model","score":0.4609000086784363},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4408999979496002},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.4009000062942505}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416018090","title":"GSTBench: A Benchmark Study on the Transferability of Graph Self-Supervised Learning","url":"https://doi.org/10.1145/3746252.3761422","published":"2025-11-08","authors":["Yu Song","Zhigang Hua","Yan Xie","Jingzhe Liu","Bo Long","Hui Liu"],"abstract":"Self-supervised learning (SSL) has shown great promise in graph representation learning. 
However, most existing graph SSL methods are developed and evaluated under a single-dataset setting, leaving their cross-dataset transferability largely unexplored and limiting their ability to leverage knowledge transfer and large-scale pretraining, factors that are critical for developing generalized intelligence beyond fitting training data. To address this gap and advance foundation model research for graphs, we present GSTBench, the first systematic benchmark for evaluating the transferability of graph SSL methods. We conduct large-scale pretraining on ogbn-papers100M and evaluate five representative SSL methods across a diverse set of target graphs. Our standardized experimental setup decouples confounding factors such as model architecture, dataset characteristics, and adaptation protocols, en...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761422","openalex_id":"https://openalex.org/W4416018090","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Aqua Metrology Systems (United States)","Meta (United States)","Michigan State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7056999802589417},{"id":"https://openalex.org/C61272859","display_name":"Transferability","score":0.6776999831199646},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5350000262260437},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5309000015258789},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.511900007724762},{"id":"https://openalex.org/C184898388","display_name":"Pairwise comparison","score":0.4603999853134155},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.43689998984336853},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.4239000082015991}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017583","title":"Enhancing Dual-Target Cross-Domain Recommendation via Similar User Bridging","url":"https://doi.org/10.1145/3746252.3761356","published":"2025-11-08","authors":["Qi Zhou","Xi Chen","Chuyu Fang","Jianji Wang","Chuan Qin","Fuzhen Zhuang"],"abstract":"Dual-target cross-domain recommendation aims to mitigate data sparsity and enables mutual enhancement via bidirectional knowledge transfer. Most existing methods rely on overlapping users to build cross-domain connections. However, in many real-world scenarios, overlapping data is extremely limited-or even entirely absent-significantly diminishing the effectiveness of these methods. To address this challenge, we propose SUBCDR, a novel framework that leverages large language models (LLMs) to bridge similar users across domains, thereby enhancing dual-target cross-domain recommendation. Specifically, we introduce a Multi-Interests-Aware Prompt Learning mechanism that enables LLMs to generate comprehensive user profiles, disentangling domain-invariant interest points while capturing fine-grained preferences. 
Then, we construct intra-domain bipartite graphs from user-item interactions and a...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761356","openalex_id":"https://openalex.org/W4416017583","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beijing Academy of Artificial Intelligence","Computer Network Information Center","University of Chinese Academy of Sciences","University of Science and Technology of China","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8475000262260437},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.6330000162124634},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.4471000134944916},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.42730000615119934},{"id":"https://openalex.org/C197657726","display_name":"Bipartite graph","score":0.39959999918937683},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.3871999979019165},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.38519999384880066},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.3594000041484833}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017202","title":"ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph","url":"https://doi.org/10.1145/3746252.3761613","published":"2025-11-08","authors":["Langming Liu","Haibin Chen","Yuhao Wang","Yujin Yuan","Shilei Liu","Wenbo Su","Xiangyu Zhao","Bo Zheng"],"abstract":"Large 
language models (LLMs) have demonstrated their capabilities across various natural language processing (NLP) tasks. Their potential in e-commerce is also substantial, evidenced by existing implementations in scenarios such as platform search and recommender systems. One obstinate concern associated with LLMs is the factuality issue (e.g., hallucination), which is urgent in e-commerce due to its significant impact on user experience and revenue. While some methods aim to evaluate the factuality of LLMs, issues such as lack of objectivity, high consumption, and lack of domain expertise arise. To this end, leveraging a collected knowledge graph (KG) as a reliable source, we propose ECKGBench, a question-answering dataset to assess LLMs' capacity in e-commerce. Specifically, each question is automatically generated based on one KG triple through a standardized pipeline, guaranteeing ev...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761613","openalex_id":"https://openalex.org/W4416017202","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","City University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8353000283241272},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8021000027656555},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.6035000085830688},{"id":"https://openalex.org/C26713055","display_name":"Implementation","score":0.5385000109672546},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5263000130653381},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical 
analysis)","score":0.5182999968528748},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.439300000667572},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.430400013923645}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2508.14485","title":"Distribution-Guided Auto-Encoder for User Multimodal Interest Cross Fusion","url":"http://arxiv.org/abs/2508.14485","published":"2025-11-08","authors":["Moyu Zhang","Yongxiang Tang","Yujun Jin","Jinxin Hu","Yu Zhang"],"abstract":"Traditional recommendation methods model a user's interest in a target item by correlating its embedding with the embeddings of items from the user's interaction history, thereby capturing implicit collaborative filtering signals. Consequently, traditional ID-based methods often encounter data sparsity problems stemming from the sparse nature of ID features. To mitigate this issue, recommendation models incorporate multimodal item information to enhance recommendation accuracy. However, existing multimodal recommendation methods typically rely on early fusion approaches, which focus primarily on combining text and image features, while neglecting the dynamic context provided by user behavior sequences. 
This oversight precludes the dynamic adaptation of multimodal interest representations to behavioral patterns, thereby hindering the model's ability to effectively capture user multimodal....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3746252.3761367","openalex_id":"https://openalex.org/W4416017679","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Jingdong (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.791700005531311},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.605400025844574},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.5849000215530396},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5719000101089478},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5703999996185303},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5126000046730042},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.49790000915527344},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.4747999906539917}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416017405","title":"CLUE: Using Large Language Models for Judging Document Usefulness in Web Search Evaluation","url":"https://doi.org/10.1145/3746252.3761158","published":"2025-11-08","authors":["Xingzhu Wang","Erhan Zhang","Yiqun Chen","Jinghan Xuan","Yucheng Hou","Yitong Xu","Ying Nie","Shuaiqiang Wang","Dawei Yin","Jiaxin Mao"],"abstract":"The widely adopted 
Cranfield paradigm fails to adequately capture user satisfaction due to a weak relevance-satisfaction correlation. Additionally, constructing test collections incurs high relevance annotation costs. To address these two limitations, we aim to explore the use of large language models (LLMs) to generate multilevel usefulness labels. We propose CLUE, a user-centric evaluation method that explicitly incorporates users' search context and behavior information into LLMs. Inspired by ordinal regression, it employs a cascade structure tailored for multilevel usefulness judgments. Our study shows that using CLUE, LLMs can effectively assess usefulness when provided with search context and behavior, outperforming third-party labeling methods. We also conduct ablation studies to explore the impact of each component in CLUE. Finally, we utilize the usefulness labels generated by C...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761158","openalex_id":"https://openalex.org/W4416017405","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8009999990463257},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6509000062942505},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5795999765396118},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5174000263214111},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.499099999666214},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4846000075340271},{"id":"https://openalex.org/C168167062","display_name":"Component 
(thermodynamics)","score":0.4514000117778778},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.40869998931884766}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416018126","title":"C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation","url":"https://doi.org/10.1145/3746252.3761604","published":"2025-11-08","authors":["Xu Zhang","Zhifei Liu","Jiahao Wang","Huixuan Zhang","Fan Xu","Junzhe Zhang","Xiaojun Wan"],"abstract":"Despite the rapid advancement of large language models, they remain highly susceptible to generating hallucinations, which significantly hinders their widespread application. Hallucination research requires dynamic and fine-grained evaluation. However, most existing hallucination benchmarks (especially in Chinese language) rely on human annotations, making automatical and cost-effective hallucination evaluation challenging. To address this, we introduce HaluAgent, an agentic framework that automatically constructs fine-grained question-answering (QA) dataset based on some knowledge documents. Our experiments demonstrate that the manually designed rules and prompt optimization can improve the quality of generated data. Using HaluAgent, we construct C-FAITH, a Chinese QA hallucination benchmark created from 1,399 knowledge documents obtained from web scraping, totaling 60,702 entries. 
We c...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761604","openalex_id":"https://openalex.org/W4416018126","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing University of Posts and Telecommunications","Huawei Technologies (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7688000202178955},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7056999802589417},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6215000152587891},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5813000202178955},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5579000115394592},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.47749999165534973},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4359999895095825},{"id":"https://openalex.org/C2777617010","display_name":"Mainstream","score":0.4223000109195709}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adapting-web-agents-with-synthetic-supervision","title":"Adapting Web Agents with Synthetic Supervision","url":"https://www.microsoft.com/en-us/research/publication/adapting-web-agents-with-synthetic-supervision/","published":"2025-11-07","authors":["Zhaoyang Wang","Yiming Liang","Xuchao Zhang","Qianhui Wu","Siwei Han","Anson Bastos","Rujia Wang","Chetan Bansal","Baolin Peng","Jianfeng Gao","Saravan Rajmohan","Huaxiu 
Yao"],"abstract":"Web agents struggle to adapt to new websites due to the scarcity of environment specific tasks and demonstrations. Recent works have explored synthetic data generation to address this challenge, however, they suffer from data quality issues where synthesized tasks contain hallucinations that cannot be executed, and collected trajectories are noisy with redundant or misaligned actions. In this paper, we propose SynthAgent, a fully synthetic supervision framework that aims at improving synthetic data quality via dual refinement of both tasks and trajectories. Our approach begins by synthesizing diverse tasks through categorized exploration of web elements, ensuring efficient coverage of the target environment. During trajectory collection, we refine tasks when conflicts with actual observations are detected, mitigating hallucinations while maintaining task consistency. After collection, we...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7131311689","title":"V-RAG: Competitive Tree Reranking and Static Distillation for Answer-Source Alignment","url":"https://doi.org/10.1109/ic-nidc67200.2025.11390266","published":"2025-11-07","authors":["Hanyan Zhao","Wenhui Lin","Shijie Cai","Fan Duo","Jie Yang"],"abstract":"Retrieval-Augmented Generation (RAG) has become a mainstream paradigm for enhancing the factual consistency of large language models (LLMs). 
However, existing RAG frameworks still face two major challenges: improving retrieval recall to provide more relevant context for generation, and reducing hallucinations in LLM-generated answers. In this paper, we propose V-RAG, a novel RAG framework augmented by a validation mechanism. In the reranking stage, V-RAG employs a multi-round filtering process based on a competition tree structure. An unfinetuned LLM is used to eliminate redundant documents from the initially retrieved set, enhancing contextual relevance while reducing computational overhead. In the generation stage, V-RAG introduces a three-way answer evaluation module that classifies responses into acceptable, hallucinated, or irrelevantly cited. It also dynamically updates prompt stra...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ic-nidc67200.2025.11390266","openalex_id":"https://openalex.org/W7131311689","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","distillation"],"author_affiliations":["Aisino (China)","Beijing University of Posts and Telecommunications","Meta (United States)","Xiamen University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8241999745368958},{"id":"https://openalex.org/C35292069","display_name":"Validator","score":0.6263999938964844},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5526999831199646},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5367000102996826},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5076000094413757},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4632999897003174},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4593000113964081},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.42080000042915344}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131262538","title":"SBR-RAG: Efficient Subgraph-Based RAG with Lightweight Filter","url":"https://doi.org/10.1109/ic-nidc67200.2025.11390490","published":"2025-11-07","authors":["Xinda Chu","Ma Lan","Jincheng Bao","Fan Duo","Yuanyuan Qiao"],"abstract":"Contemporary artificial intelligence (AI), particularly Large Language Models (LLMs), focuses on emulating human capabilities for knowledge acquisition and utilization. Relying solely on pre-training data is insufficient for up-to-date knowledge access, yet continuously retraining or fine-tuning LLMs with internal data is impractical. Many existing Retrieval-Augmented Generation (RAG) [1], [2], [3], [4] methods significantly enhance LLMs' access to external knowledge through vector-based or graph-structured retrieval from knowledge bases. However, the complexity and diversity of real-world knowledge bases hinder RAG systems from effectively retrieving complex multi-hop data and abstractly related information; moreover, increased volumes of erroneous retrieved data readily induce LLM hallucinations. 
To address these limitations, we propose SBR-RAG, a knowledge retrieval and filtering fram...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ic-nidc67200.2025.11390490","openalex_id":"https://openalex.org/W7131262538","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","efficient"],"author_affiliations":["Aisino (China)","Beijing University of Posts and Telecommunications","Meta (United States)","Xiamen University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7689999938011169},{"id":"https://openalex.org/C2779803651","display_name":"Discriminator","score":0.7103999853134155},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.5593000054359436},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5360000133514404},{"id":"https://openalex.org/C2778712577","display_name":"Retraining","score":0.5242000222206116},{"id":"https://openalex.org/C81669768","display_name":"Precision and recall","score":0.4851999878883362},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.40549999475479126},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4018000066280365}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416016849","title":"ClariLM: Enhancing Open-domain Clarification Ability for Large Language Models","url":"https://doi.org/10.1145/3746252.3761068","published":"2025-11-07","authors":["Ziliang Zhao","Haonan Chen","Shiren Song","Jian Xie","Zhicheng Dou"],"abstract":"Active understanding and clarification of user intent is crucial for information-seeking systems 
based on Large Language Models (LLMs), as it enhances search efficiency and improves user experience for human-LLM interaction. While existing systems rely on domain-specific resources to generate clarifying questions, they face challenges when extended to open-domain scenarios due to the lack of human-LLM clarification data. In this paper, we propose ClariLM to synthesize large-scale clarification data and enhance the LLMs' clarification capability. Specifically, we design two key stages to prepare data: first, given a user question, the Clarification Facet Detection (CFD) stage employs a facet mining model learned from human-LLM conversation logs to predict realistic potential clarification candidates. Additionally, it incorporates direct predictions from powerful LLMs as supplements to gua...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761068","openalex_id":"https://openalex.org/W4416016849","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","preference"],"author_affiliations":["Baidu (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C43122875","display_name":"Facet (psychology)","score":0.8367999792098999},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7397000193595886},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.6134999990463257},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5613999962806702},{"id":"https://openalex.org/C2777200299","display_name":"Conversation","score":0.5601000189781189},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5501000285148621},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic 
algorithm)","score":0.5063999891281128},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.48010000586509705}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415993980","title":"The AI revolution: how multimodal intelligence will reshape the oncology ecosystem","url":"https://doi.org/10.1038/s44387-025-00044-4","published":"2025-11-07","authors":["David Dellamonica","David Ruau","Ben Griffiths","Greg Rossi","Bob T. Li","Pedram Razavi","Tommasa Maio","T. González","Amanda Remorino","Jorge S. Reis‐Filho","Philippe Menu","Thorsten Gutjahr"],"abstract":"Abstract Multimodal artificial intelligence (MMAI) is redefining oncology by integrating heterogeneous datasets from diagnostic modalities into cohesive analytical frameworks for more accurate and personalized cancer care. We highlight MMAI applications across the patient journey and clinical research, discuss outstanding challenges, and the need for guidelines and regulatory frameworks. 
By converting multimodal complexity into clinically actionable insights, MMAI is poised to improve patient outcomes while reshaping the economics of global cancer care.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s44387-025-00044-4","openalex_id":"https://openalex.org/W4415993980","cited_by_count":3,"quality_score":44,"matched_keywords":["personalized"],"author_affiliations":["AstraZeneca (Spain)","AstraZeneca (Switzerland)","AstraZeneca (United Kingdom)","AstraZeneca (United States)","Memorial Sloan Kettering Cancer Center","Nvidia (United States)","Sophia Genetics (Switzerland)"],"concepts":[{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6604999899864197},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43389999866485596},{"id":"https://openalex.org/C32220436","display_name":"Personalized medicine","score":0.39959999918937683},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37380000948905945},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.3709000051021576},{"id":"https://openalex.org/C2779473830","display_name":"MEDLINE","score":0.35409998893737793},{"id":"https://openalex.org/C163763905","display_name":"Precision medicine","score":0.35190001130104065},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3440000116825104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4416003748","title":"Img2ST-Net: efficient high-resolution spatial omics prediction from whole-slide histology images via fully convolutional image-to-image 
learning","url":"https://doi.org/10.1117/1.jmi.12.6.061410","published":"2025-11-07","authors":["Junchao Zhu","Ruining Deng","Junlin Guo","Tianyuan Yao","Juming Xiong","Chongyu Qu","Mengmeng Yin","Yu Wang","Shilin Zhao","Haichun Yang","Daguang Xu","Yucheng Tang"],"abstract":"Purpose: or finer-introduces significant computational and modeling challenges. Conventional spot-by-spot sequential regression frameworks become inefficient and unstable at this scale, whereas the inherent extreme sparsity and low expression levels of high-resolution ST further complicate both prediction and evaluation. Approach: To address these limitations, we propose Img2ST-Net, a high-definition (HD) histology-to-ST generation framework for efficient and parallel high-resolution ST prediction. Unlike conventional spot-by-spot inference methods, Img2ST-Net employs a fully convolutional architecture to generate dense, HD gene expression maps in a parallelized manner. By modeling HD ST data as super-pixel representations, the task is reformulated from image-to-omics inference into a super-content image generation problem with hundreds or thousands of output channels. 
This design not on...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/1.jmi.12.6.061410","openalex_id":"https://openalex.org/W4416003748","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Cornell University","Nvidia (United States)","Vanderbilt University","Vanderbilt University Medical Center","Weill Cornell Medicine"],"concepts":[{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.7437000274658203},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6854000091552734},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6777999997138977},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.5766000151634216},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4560000002384186},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.41589999198913574},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.36090001463890076},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.2676999866962433}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.17705","title":"Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models","url":"http://arxiv.org/abs/2510.17705","published":"2025-11-07","authors":["Dawei Pan","Zhaoyang Fu","Jingyuan Wang","Xiao Han","Yue Zhu","Xiangyu Zhao"],"abstract":"Large Language Models (LLMs) possess remarkable generalization capabilities but struggle with multi-task adaptation, particularly in balancing knowledge retention with 
task-specific specialization. Conventional fine-tuning methods suffer from catastrophic forgetting and substantial resource consumption, while existing parameter-efficient methods perform suboptimally in complex multi-task scenarios. To address this, we propose Contextual Attention Modulation (CAM), a novel mechanism that dynamically modulates the representations of self-attention modules in LLMs. CAM enhances task-specific features while preserving general knowledge, thereby facilitating more effective and efficient adaptation. For effective multi-task adaptation, CAM is integrated into our Hybrid Contextual Attention Modulation (HyCAM) framework, which combines a shared, full-parameter CAM module with multiple specialize...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3746252.3761289","openalex_id":"https://openalex.org/W4415966292","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Beihang University","City University of Hong Kong","Huawei Technologies (China)","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7986000180244446},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.5855000019073486},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48750001192092896},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.46650001406669617},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.45350000262260437},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.453000009059906},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.42329999804496765},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.40619999170303345}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416016804","title":"Advancing Temporal Sensitive Question Answering through Progressive Multi-Step Reflection","url":"https://doi.org/10.1145/3746252.3761292","published":"2025-11-07","authors":["Ziyang Chen","Erxue Min","Xiang Zhao","Yunxin Li","Xin Jia","Jinzhi Liao","Shuaiqiang Wang","Baotian Hu","Dawei Yin"],"abstract":"Retrieval-augmented generation (RAG) has demonstrated strong potential in enhancing large language models (LLMs) for complex, real-world question answering. However, existing RAG frameworks remain inadequate for temporal scenarios, primarily due to their inability to jointly model temporal constraints in both retrieval and reasoning. On the retrieval side, traditional approaches focus on semantic similarity, often returning outdated or temporally misaligned evidence. On the generation side, these systems frequently produce factually incorrect or hallucinated answers when confronted with incomplete or temporally inconsistent information. Motivated by the observed limitations, we propose ChronoReflect+, a temporal logic-aware RAG framework that incorporates hybrid temporal-aware retrieval and progressive multi-step reflection. 
Our method iteratively refines both retrieval and reasoning, id...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746252.3761292","openalex_id":"https://openalex.org/W4416016804","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","Harbin Institute of Technology","National University of Defense Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7314000129699707},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.7141000032424927},{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.6647999882698059},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.6444000005722046},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.5871999859809875},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5577999949455261},{"id":"https://openalex.org/C65682993","display_name":"Reflection (computer programming)","score":0.4887999892234802},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47690001130104065}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131237658","title":"A List-Aware Re-Ranking Model with Multi-Granularity Relevance Feature Fusion","url":"https://doi.org/10.1109/ic-nidc67200.2025.11390124","published":"2025-11-07","authors":["Rui Wang","Z. A. Liu","Zhigang Wang","Fan Duo","Jie Yang","Yuanyuan Qiao"],"abstract":"In the retrieval stage of Retrieval-Augmented Generation (RAG), a subset of documents or text chunks is typically first retrieved from the database. 
These retrieved chunks are then reranked, and the top-ranked ones are selected as the final output. Most existing ranking methods focus solely on the similarity between the query and individual relevant documents while neglecting the interrelationships among the relevant documents. In this study, we propose a list-level reranking model based on multi-granularity relevance features fusion (MGRF), which explicitly incorporates the relationships among retrieved text chunks within the candidate list to improve ranking. Our model incorporates multi-granularity relevance features between the query and texts, as well as among the text chunks themselves, to more effectively model their interrelationships. Our model is extensively evaluated on public...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ic-nidc67200.2025.11390124","openalex_id":"https://openalex.org/W7131237658","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Aisino (China)","Beijing University of Posts and Telecommunications","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7757999897003174},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.7235999703407288},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.636900007724762},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6014999747276306},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5819000005722046},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.5259000062942505},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.5206000208854675},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5022000074386597}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/twinvla-data-efficient-bimanual-manipulation-with-twin-single-arm-vision-language-action-models","title":"TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models","url":"https://www.microsoft.com/en-us/research/publication/twinvla-data-efficient-bimanual-manipulation-with-twin-single-arm-vision-language-action-models/","published":"2025-11-06","authors":["Hokyun Im","Euijin Jeong","Andrey Kolobov","Jianlong Fu","Youngwoon Lee"],"abstract":"Vision-language-action models (VLAs) trained on large-scale robotic datasets have demonstrated strong performance on manipulation tasks, including bimanual tasks. However, because most public datasets focus on single-arm demonstrations, adapting VLAs for bimanual tasks typically requires substantial additional bimanual data and fine-tuning. To address this challenge, we introduce TwinVLA, a modular framework that composes two copies of a pretrained single-arm VLA into a coordinated bimanual VLA. Unlike monolithic cross-embodiment models trained on mixtures of single-arm and bimanual data, TwinVLA improves both data efficiency and performance by composing pretrained single-arm policies. Across diverse bimanual tasks in real-world and simulation settings, TwinVLA outperforms a comparably-sized monolithic RDT-1B model without requiring any bimanual pretraining. 
Furthermore, it narrows the g...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2511.04962","title":"Too Good to be Bad: On the Failure of LLMs to Role-Play Villains","url":"https://huggingface.co/papers/2511.04962","published":"2025-11-06","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"apple:qn6qa6yraz9j6cpohauokok3","title":"PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech","url":"https://machinelearning.apple.com/research/polynorm","published":"2025-11-06","authors":["Michel Wong","Ali Alshehri","Sophia Kao","Haotian He"],"abstract":"Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. 
Traditional TN systems can exhibit high accuracy, but involve substantial engineering effort, are difficult to scale, and pose challenges to language coverage, particularly in low-resource settings. We propose PolyNorm, a prompt-based approach to TN using Large Language Models (LLMs), aiming to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4417250785","title":"A Mixed Deep Neural Network for sMRI and fMR Features Fusion in AD Detection","url":"https://doi.org/10.1109/bibe66822.2025.00042","published":"2025-11-06","authors":["Yanteng Zhang","Yuxiang Wei","Yizhuo He","Anees Abrol","Vince D. Calhoun"],"abstract":"MRI(Magnetic Resonance Imaging), as a non-invasive imaging technology, provides rich information at both the structural and functional levels of brain, offering significant support for the screening of Alzheimer's disease (AD). However, due to the large heterogeneity in data format and spatial characteristics between sMRI and fMRI, achieving effective fusion of these two modalities remains a major challenge. To address this issue, we first designed a Transformer attention module incorporating 3D positional encoding to effectively encode 3D sMRI features. Next, we constructed a cascaded transformer module to address the feature encoding of fMRI and the multimodal feature fusion of MRI images from different spatial domains, thereby enhancing the feature representation of both modalities. 
Additionally, we adopted a multi-layer fused feature integration strategy to enhance the robustness of....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bibe66822.2025.00042","openalex_id":"https://openalex.org/W4417250785","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Center for Translational Research in Neuroimaging and Data Science","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7164999842643738},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7081999778747559},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6636000275611877},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5708000063896179},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5230000019073486},{"id":"https://openalex.org/C66746571","display_name":"ENCODE","score":0.5182999968528748},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5120999813079834},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.4587000012397766}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2511.03929","title":"NVIDIA Nemotron Nano V2 VL","url":"https://huggingface.co/papers/2511.03929","published":"2025-11-06","authors":["NVIDIA","Amala Sanjay Deshmukh","Kateryna Chumachenko","Tuomas Rintamaki","Matthieu Le","Tyler Poon","Danial Mohseni Taheri","Ilia Karmanov","Guilin Liu","Jarno Seppanen","Guo Chen","Karan Sapra"],"abstract":"We introduce Nemotron Nano V2 VL, the latest model of 
the Nemotron vision-language series designed for strong real-world document understanding, long video comprehension, and reasoning tasks. Nemotron Nano V2 VL delivers significant improvements over our previous model, Llama-3.1-Nemotron-Nano-VL-8B, across all vision and text domains through major enhancements in model architecture, datasets, and training recipes. Nemotron Nano V2 VL builds on Nemotron Nano V2, a hybrid Mamba-Transformer LLM, and innovative token reduction techniques to achieve higher inference throughput in long document and video scenarios. We are releasing model checkpoints in BF16, FP8, and FP4 formats and sharing large parts of our datasets, recipes and training code.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:stepfun-ai:2511.03601","title":"Step-Audio-EditX Technical Report","url":"https://huggingface.co/papers/2511.03601","published":"2025-11-05","authors":["StepFun"],"abstract":"We present Step-Audio-EditX, the first open-source LLM-based audio model excelling at expressive and iterative audio editing encompassing emotion, speaking style, and paralinguistics alongside robust zero-shot text-to-speech (TTS) capabilities.Our core innovation lies in leveraging only large-margin synthetic data, which circumvents the need for embedding-based priors or auxiliary modules. This large-margin learning approach enables both iterative control and high expressivity across voices, and represents a fundamental pivot from the conventional focus on representation-level disentanglement. 
Evaluation results demonstrate that Step-Audio-EditX surpasses both MiniMax-2.6-hd and Doubao-Seed-TTS-2.0 in emotion editing and other fine-grained control tasks.","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","stepfun-ai","LLM"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"official:1242510474413672","title":"Grok 2 Open-Weights Model Card","url":"https://huggingface.co/xai-org/grok-2","published":"2025-11-05","authors":["xAI"],"abstract":"Official xAI model card for the Grok 2 open-weights release. The card describes the released model, SGLang serving instructions, hardware requirements, and community license.","companies":["xAI"],"matched_orgs":["xAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_report"],"source":"official_report","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["xAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/twt-thinking-without-tokens-by-habitual-reasoning-distillation-with-multi-teachers-guidance","title":"TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance","url":"https://www.microsoft.com/en-us/research/publication/twt-thinking-without-tokens-by-habitual-reasoning-distillation-with-multi-teachers-guidance/","published":"2025-11-04","authors":["Jingxian Xu","Mengyu Zhou","Weichang Liu","Hanbing Liu","Shi 
Han","Dongmei Zhang"],"abstract":"Large Language Models (LLMs) have made significant strides in problem-solving by incorporating reasoning processes. However, this enhanced reasoning capability results in an increased number of output tokens during inference, leading to higher computational costs. To address this challenge, we propose TwT (Thinking without Tokens), a method that reduces inference-time costs through habitual reasoning distillation with multi-teachers' guidance, while maintaining high performance. Our approach introduces a Habitual Reasoning Distillation method, which internalizes explicit reasoning into the model’s habitual behavior through a Teacher-Guided compression strategy inspired by human cognition. Additionally, we propose Dual-Criteria Rejection Sampling (DCRS), a technique that generates a high-quality and diverse distillation dataset using multiple teacher models, making our method suitable for...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Human language technologies","1970-01-01","LLM","efficient","compression","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-models-to-operators-rethinking-autoscaling-granularity-for-large-generative-models","title":"From Models to Operators: Rethinking Autoscaling Granularity for Large Generative 
Models","url":"https://www.microsoft.com/en-us/research/publication/from-models-to-operators-rethinking-autoscaling-granularity-for-large-generative-models/","published":"2025-11-04","authors":["Xingqi Cui","Chieh-Jan Mike Liang","Jiarong Xing","Haoran Qiu"],"abstract":"Serving large generative models such as LLMs and multi-modal transformers requires balancing user-facing SLOs (e.g., time-to-first-token, time-between-tokens) with provider goals of efficiency and cost reduction. Existing solutions rely on static provisioning or model-level autoscaling, both of which treat the model as a monolith. This coarse-grained resource management leads to degraded performance or significant resource underutilization due to poor adaptability to dynamic inference traffic that is common online. The root cause of this inefficiency lies in the internal structure of generative models: they are executed as graphs of interconnected operators. Through detailed characterization and systematic analysis, we find that operators are heterogeneous in their compute and memory footprints and exhibit diverse sensitivity to workload and resource factors such as batch size, sequence l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Systems and networking","Cloud systems","LLMs Inference","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/whisper-leak-a-side-channel-attack-on-large-language-models","title":"Whisper Leak: a 
side-channel attack on Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/whisper-leak-a-side-channel-attack-on-large-language-models/","published":"2025-11-04","authors":["Geoff McDonald","Jonathan Bar Or"],"abstract":"Large Language Models (LLMs) are increasingly deployed in sensitive domains including healthcare, legal services, and confidential communications, where privacy is paramount. This paper introduces Whisper Leak, a side-channel attack that infers user prompt topics from encrypted LLM traffic by analyzing packet size and timing patterns in streaming responses. Despite TLS encryption protecting content, these metadata patterns leak sufficient information to enable topic classification. We demonstrate the attack across 28 popular LLMs from major providers, achieving near-perfect classification (often >98% AUPRC) and high precision even at extreme class imbalance (10,000:1 noise-to-target ratio). For many models, we achieve 100% precision in identifying sensitive topics like \"money laundering\" while recovering 5-20% of target conversations. 
This industry-wide vulnerability poses significant risks....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Security, privacy, and cryptography","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmtu-a-massive-multi-task-table-understanding-and-reasoning-benchmark","title":"MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark","url":"https://www.microsoft.com/en-us/research/publication/mmtu-a-massive-multi-task-table-understanding-and-reasoning-benchmark/","published":"2025-11-04","authors":["Junjie Xing","Yeye He","Mengyu Zhou","Haoyu Dong","Shi Han","Lingjiao Chen","Dongmei Zhang","Surajit Chaudhuri","H. V. Jagadish"],"abstract":"Tables and table-based use cases play a crucial role in many important real-world applications, such as spreadsheets, databases, and computational notebooks, which traditionally require expert-level users like data engineers, data analysts, and database administrators to operate. Although LLMs have shown remarkable progress in working with tables (e.g., in spreadsheet and database copilot scenarios), comprehensive benchmarking of such capabilities remains limited. In contrast to an extensive and growing list of NLP benchmarks, evaluations of table-related tasks are scarce, and narrowly focus on tasks like NL-to-SQL and Table-QA, overlooking the broader spectrum of real-world tasks that professional users face. 
This gap limits our understanding and model progress in this important area. In this work, we introduce MMTU, a large-scale benchmark with over 30K questions across 25 real-world ta...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Human language technologies","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sigmacollab-an-application-driven-dataset-for-physically-situated-collaboration","title":"SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration","url":"https://www.microsoft.com/en-us/research/publication/sigmacollab-an-application-driven-dataset-for-physically-situated-collaboration/","published":"2025-11-04","authors":["Dan Bohus","Sean Andrist","Ann Paradiso","Nick Saw","Tim Schoonbeek","Maia Stiber"],"abstract":"We introduce SigmaCollab, a dataset enabling research on physically situated human-AI collaboration. The dataset consists of a set of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performing procedural tasks in the physical world. SigmaCollab includes a set of rich, multimodal data streams, such as the participant and system audio, egocentric camera views from the head-mounted device, depth maps, head, hand and gaze tracking information, as well as additional annotations performed post-hoc. 
While the dataset is relatively small in size (~ 14 hours), its application-driven and interactive nature brings to the fore novel research challenges for human-AI collaboration, and provides more realistic testing grounds for various AI models operating in this space. In future work, we plan to use the dataset to construct a set of benchmarks for phy...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial intelligence","Computer vision","Human-computer interaction","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2511.02347","title":"LTD-Bench: Evaluating Large Language Models by Letting Them Draw","url":"https://huggingface.co/papers/2511.02347","published":"2025-11-04","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"official:df91499f2a443e40","title":"To Mask or to Mirror: Human-AI Alignment in Collective 
Reasoning","url":"https://deepmind.google/research/publications/180362/","published":"2025-11-04","authors":["Google/DeepMind"],"abstract":"As LLMs are increasingly used to simulate and augment collective decision-making, it is critical to examine how they align with human social reasoning. The key novel contribution of this paper is a method and study of alignment for collective outcomes (as opposed to the body of work on individual behavior alignment). We adapt a classical social psychology task, Lost at Sea to study how identity cues affect group leader election in a large-scale human experiment (N=748); we also simulate the participants with the Gemini, GPT, and Claude Large Language Models (LLMs). This reveals a critical insight: the tension between alignment for simulation vs alignment for an idealized outcome. Some models mirror people, where others mask our collective biases. Moreover, when identity cues are hidden, contrary to our human study, some models use identity to compensate for male-associated dialogue resul...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/3cw2-ry52","openalex_id":"https://openalex.org/W7106853032","cited_by_count":0,"quality_score":56,"matched_keywords":["election"],"author_affiliations":["Google/DeepMind","DeepMind (United Kingdom)","Google (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind publications page https://deepmind.google/research/publications/"}},{"id":"apple:m07q6pylz3x1xh5vuer02hro","title":"Adapting Self-Supervised Representations as a Latent Space for Efficient Generation","url":"https://machinelearning.apple.com/research/self-supervised-representations","published":"2025-11-04","authors":["Ming Gui","Johannes Schusterbauer","Timy 
Phan","Felix Krause","Josh Susskind","Miguel Angel Bautista","Björn Ommer"],"abstract":"We introduce Representation Tokenizer (RepTok), a generative modeling framework that represents an image using a single continuous latent token obtained from self-supervised vision transformers. Building on a pre-trained SSL encoder, we fine-tune only the semantic token embedding and pair it with a generative decoder trained jointly using a standard flow matching objective. This adaptation enriches the token with low-level,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2511.02415","title":"ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension","url":"http://arxiv.org/abs/2511.02415","published":"2025-11-04","authors":["Xu, Duo","Cheng, Hao","Lin, Xin","Xie, Zhen","Wang, Hao"],"abstract":"Complex chart understanding tasks demand advanced visual recognition and reasoning capabilities from multimodal large language models (MLLMs). However, current research provides limited coverage of complex chart scenarios and computation-intensive reasoning tasks prevalent in real-world applications. This study proposes an automated multi-stage code-driven pipeline for systematically generating visual reasoning datasets to address these limitations. 
The pipeline integrates retrieval-augmented generation (RAG) to retrieve professional chart templates and employs chain-of-thought (CoT) strategies to generate reasoning codes that simulate real data distributions, thereby driving chart rendering and question-related statistical computations. Through model-based evaluation, the pipeline enhances chart diversity and data quality. Using this framework, we construct ChartM$^3$, a multi-dimension...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W7104182893","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7957000136375427},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.7215999960899353},{"id":"https://openalex.org/C190812933","display_name":"Chart","score":0.67330002784729},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6309999823570251},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5827000141143799},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.515999972820282},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.5115000009536743},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4090999960899353}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vcode-a-multimodal-coding-benchmark-with-svg-as-symbolic-visual-representation","title":"VCode: a Multimodal 
Coding Benchmark with SVG as Symbolic Visual Representation","url":"https://www.microsoft.com/en-us/research/publication/vcode-a-multimodal-coding-benchmark-with-svg-as-symbolic-visual-representation/","published":"2025-11-03","authors":["Kevin Qinghong Lin","Yuhao Zheng","Hangyu Ran","Dantong Zhu","Dongxing Mao","Linjie Li","Philip H. S. Torr","Alex Jinpeng Wang"],"abstract":"Code has emerged as a precise and executable medium for reasoning and action in the agent era. Yet, progress has largely focused on language-centric tasks such as program synthesis and debugging, leaving visual-centric coding underexplored. Inspired by how humans reason over sketches, we advocate SVG code as a compact, interpretable, and executable visual representation. We introduce VCode, a benchmark that reframes multimodal understanding as code generation: given an image, a model must produce SVG that preserves symbolic meaning for downstream reasoning. VCode covers three domains - general commonsense (MM-Vet), professional disciplines (MMMU), and visual-centric perception (CV-Bench). To assess symbolic fidelity, we propose CodeVQA, a novel evaluation protocol in which a policy model answers questions over rendered SVGs; correct answers indicate faithful symbolic preservation. 
Empiri...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Computer vision","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:vi6hvs9e1goumi2doeqpetc6","title":"Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors","url":"https://machinelearning.apple.com/research/policy-maps","published":"2025-11-03","authors":["Michelle S. Lam","Jeffrey P. Bigham","Fred Hohman","Dominik Moritz","Kenneth Holstein","Mary Beth Kery"],"abstract":"AI policy sets boundaries on acceptable behavior for AI models, but this is challenging in the context of large language models (LLMs): how do you ensure coverage over a vast behavior space? We introduce policy maps, an approach to AI policy design inspired by the practice of physical mapmaking. 
Instead of aiming for full coverage, policy maps aid effective navigation through intentional design choices about which aspects to capture and which to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3746059.3747680","openalex_id":"https://openalex.org/W4403794964","cited_by_count":1,"quality_score":57,"matched_keywords":["LLM"],"author_affiliations":["Apple","Apple (United States)","Carnegie Mellon University","Stanford University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415821357","title":"HAT: Hybrid Attention Transformer for Image Restoration","url":"https://doi.org/10.1109/tpami.2025.3628275","published":"2025-11-03","authors":["Xiangyu Chen","Xintao Wang","Wenlong Zhang","Xiangtao Kong","Yu Qiao","Jiantao Zhou","Chao Dong"],"abstract":"Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising. However, we find that these networks can only utilize a limited spatial range of input information through attribution analysis. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better restoration, we propose a new Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus making use of their complementary advantages. Moreover, to better aggregate the cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. 
In the training stage, we additionally adopt a same-task pre-training strategy to further exploit the potential of the mo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3628275","openalex_id":"https://openalex.org/W4415821357","cited_by_count":13,"quality_score":54,"matched_keywords":["compression"],"author_affiliations":["Beijing Academy of Artificial Intelligence","City University of Macau","Shanghai Artificial Intelligence Laboratory","Shenzhen Institutes of Advanced Technology","Tencent (China)","University of Macau"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.733299970626831},{"id":"https://openalex.org/C106430172","display_name":"Image restoration","score":0.7330999970436096},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6754999756813049},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.5958999991416931},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.5766000151634216},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5723000168800354},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.492900013923645},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.41200000047683716}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"arxiv:2512.02502","title":"AskNearby: An LLM-Based Application for Neighborhood Information Retrieval and Personalized Cognitive-Map Recommendations","url":"http://arxiv.org/abs/2512.02502","published":"2025-11-03","authors":["Luyao Niu","Zhicheng Deng","Boyang Li","Nuoxian Huang","Ruiqi 
Liu","Wenjia Zhang"],"abstract":"The \"15-minute city\" envisions neighborhoods where residents can meet daily needs via a short walk or bike ride. Realizing this vision requires not only physical proximity but also efficient and reliable access to information about nearby places, services, and events. Existing location-based systems, however, focus mainly on city-level tasks and neglect the spatial, temporal, and cognitive factors that shape localized decision-making. We conceptualize this gap as the Local Life Information Accessibility (LLIA) problem and introduce AskNearby, an AI-driven community application that unifies retrieval and recommendation within the 15-minute life circle. AskNearby integrates (i) a three-layer Retrieval-Augmented Generation (RAG) pipeline that synergizes graph-based, semantic-vector, and geographic retrieval with (ii) a cognitive-map model that encodes each user's neighborhood familiarity an...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3764912.3770813","openalex_id":"https://openalex.org/W4417006745","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","personalized","retrieval","efficient"],"author_affiliations":["Imperial College London","New York University","Peking University Shenzhen Hospital","Shenzhen Metro (China)","Tencent (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7555000185966492},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.6039999723434448},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5144000053405762},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5115000009536743},{"id":"https://openalex.org/C2776505523","display_name":"Plan 
(archaeology)","score":0.43849998712539673},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.39239999651908875},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38029998540878296},{"id":"https://openalex.org/C21025794","display_name":"Cognitive models of information retrieval","score":0.362199991941452}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417283652","title":"Query-Aware Route Enrichment for Handling Complex Direction Queries and Grounding Large Language Models","url":"https://doi.org/10.1145/3748636.3762802","published":"2025-11-03","authors":["Antonios Karatzoglou","Michael Snider","Varun Kakkar","Michael R. Evans","Dragomir Yankov","Goran Predović"],"abstract":"Modern direction services must evolve to meet the growing complexity of user queries, which increasingly resemble natural language and include nuanced constraints and preferences. This paper introduces a dynamic, query-aware route enrichment framework designed to enhance routing services by integrating real-time contextual data—such as weather, events, and POIs. The system comprises five key components: query intent understanding, hint point selection along the route, data sourcing, language model-based response generation, and caching for performance optimization. A hybrid approach combining lightweight language models and a rule-based system is used to interpret user intent, while adaptive hint logic minimizes redundant API calls. The enriched route responses can be consumed directly by user interfaces or serve as grounding data for LLM-based assistants. 
A demo application illustrates....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3748636.3762802","openalex_id":"https://openalex.org/W4417283652","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8201000094413757},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5271000266075134},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.49639999866485596},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.4772000014781952},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4507000148296356},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4147999882698059},{"id":"https://openalex.org/C192028432","display_name":"Query language","score":0.39410001039505005},{"id":"https://openalex.org/C74172769","display_name":"Routing (electronic design automation)","score":0.3386000096797943}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417339177","title":"Model Training with Sparsity Loss for Compressed Attention","url":"https://doi.org/10.1109/ictai66417.2025.00144","published":"2025-11-03","authors":["Eli Sason","Darya Frolova","Boris Nazarov","Felix Goldberg"],"abstract":"Long context fed into Large Language Models (LLMs) becomes one of the main limiting factors for their successful deployment in industry, which is related to the high deployment costs and dedicated hardware with large memory onboard. 
The computational overhead related to the long context stems directly from the attention layer computation. In this work, we aim to accelerate the attention layer computation, while minimizing performance degradation by exploiting theoretical framework of attention sparsity in LLMs. Based on its findings, we propose a customized loss function which is designed to enforce sparsity of the attention layer by restricting the energy of a predefined number of top elements in the attention matrix. We perform experiments with GPT-2 language model to show the efficacy of our sparsification approach. The attention matrices of the models trained with the proposed loss r...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ictai66417.2025.00144","openalex_id":"https://openalex.org/W4417339177","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","memory"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7965999841690063},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6521999835968018},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6420000195503235},{"id":"https://openalex.org/C2779960059","display_name":"Overhead (engineering)","score":0.5756999850273132},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.5504999756813049},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.5383999943733215},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.5315999984741211},{"id":"https://openalex.org/C183322885","display_name":"Context 
model","score":0.4399999976158142}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415777346","title":"Agentic Framework for Intelligent Surrogate Modeling of Simulator-Driven Workflows","url":"https://doi.org/10.2118/229629-ms","published":"2025-11-03","authors":["Pradeep Kumar Shetty","Antonio Abinader","Prasham Sheth","Sreekrishnan Ramachandran","T.J.G.M. Lam"],"abstract":"Abstract We introduce a novel large-language-model (LLM) driven agentic framework to automate end-to-end surrogate modeling of simulator-driven workflows in the oil and gas domain. The autonomous agent orchestrates critical steps—initial sampling, simulator execution, adaptive retraining, and strategy switching—to minimize expensive simulator calls while meeting accuracy targets with minimal subject matter expert (SME) intervention. Applied to well-network flow and gas processing simulations, the agent dynamically switches sampling strategies to achieve user-specified accuracy efficiently. In each iteration, the LLM agent generates candidate input sets, evaluates them via the simulator, and fine-tunes the surrogate model on the expanded dataset. 
Uncertainty estimates from Monte Carlo (MC) dropout guide an acquisition function that scores candidates (e.g., by residual error or predicted v...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2118/229629-ms","openalex_id":"https://openalex.org/W4415777346","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7400000095367432},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6829000115394592},{"id":"https://openalex.org/C2781395549","display_name":"Adaptive sampling","score":0.5838000178337097},{"id":"https://openalex.org/C131675550","display_name":"Surrogate model","score":0.5710999965667725},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.5644000172615051},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5437999963760376},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5390999913215637},{"id":"https://openalex.org/C155512373","display_name":"Residual","score":0.4648999869823456}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415821320","title":"Refine, Control and Distill: A Text-to-Image Framework for Faithful Image Generation","url":"https://doi.org/10.1109/tpami.2025.3628109","published":"2025-11-03","authors":["Peng Xing","Ning Wang","Yanpeng Sun","Jinhui Tang","Zechao Li"],"abstract":"While text-to-image diffusion models exhibit outstanding results, they struggle to faithfully generate key subjects with corresponding attributes in prompts, challenges known as 
catastrophic neglect and attribute binding. Previous works typically utilize attention adjustments to solve the above problems, whereas we observe that they may still generate unfaithful images. In this paper, we carefully analyze the text-to-image process and pinpoint three pivotal bottlenecks that hinder image faithful generation: (1) unequal responses of neglected subjects in text embedding, (2) competition and entanglement between subjects' attention, and (3) suboptimal quality of intermediate features from U-Net. Based on the aforementioned observations, we propose a Refine, Control, and Distill (RCD) framework built upon the stable diffusion model to alleviate the negative effects raised by the bottlenecks....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3628109","openalex_id":"https://openalex.org/W4415821320","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Huawei Technologies (China)","Nanjing University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.777400016784668},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6202999949455261},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6047999858856201},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5935999751091003},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5928000211715698},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5394999980926514},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.48750001192092896},{"id":"https://openalex.org/C2775924081","display_name":"Control 
(management)","score":0.4771000146865845}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417339337","title":"Multi-Step Adaptive Attack Agent: A Dynamic Approach for Jailbreaking Large Language Models","url":"https://doi.org/10.1109/ictai66417.2025.00026","published":"2025-11-03","authors":["Huiyun Jing","Jincheng Wei","Wei Wei","Yingshui Tan","Boren Zheng","Qingsong Yao"],"abstract":"Large Language Models (LLMs) have showcased remarkable potential across various domains, especially in text generation. However, their vulnerability to jailbreak attacks presents considerable challenges to secure deployment, as attackers can use carefully crafted prompts to bypass safety measures and generate harmful content. Current jailbreak methods generally suffer from two significant limitations: a restricted strategy space for generating adversarial prompts and insufficient optimization of prompts based on feedback from LLMs. To overcome these challenges, we present Multistep Adaptive Attack Agent (MATA), an approach that employs a game-theoretic interaction between attack model and target model to adaptively execute jailbreak attacks on LLMs. 
This method enables iterative attempts based on reflection, gradually identifying the optimal jailbreak attack strategy within a complex str...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ictai66417.2025.00026","openalex_id":"https://openalex.org/W4417339337","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Alibaba Group (China)","China Academy of Information and Communications Technology","Chinese Academy of Sciences","Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7954000234603882},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.6665999889373779},{"id":"https://openalex.org/C95713431","display_name":"Vulnerability (computing)","score":0.5572999715805054},{"id":"https://openalex.org/C65856478","display_name":"Attack model","score":0.4490000009536743},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4278999865055084},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.3986999988555908},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3736000061035156},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.365200012922287}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417283726","title":"Toward Foundation Models for Mobility Enriched Geospatially Embedded Objects","url":"https://doi.org/10.1145/3748636.3760459","published":"2025-11-03","authors":["Maria Despoina Siampou","Shang-Ling Hsu","Shushman Choudhury","Neha Arora","Cyrus Shahabi"],"abstract":"Recent advances in large 
foundation models (FMs) have enabled learning general-purpose representations in natural language, vision, and audio. Yet geospatial artificial intelligence (GeoAI) still lacks widely adopted foundation models that generalize across tasks that require joint reasoning over geospatial objects and human mobility. Such tasks are crucial as mobility, along with satellite imagery, street view, and text, is a core modality for understanding the physical world. We argue that a key bottleneck is the absence of unified, general-purpose, and transferable representations for geospatially embedded objects (GEOs). Such objects include points, polylines, and polygons in geographic space, enriched with semantic context and critical for geospatial reasoning. Much current GeoAI research compares GEOs to tokens in language models, where patterns of human movement and spatiotemporal...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3748636.3760459","openalex_id":"https://openalex.org/W4417283726","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","University of Southern California"],"concepts":[{"id":"https://openalex.org/C9770341","display_name":"Geospatial analysis","score":0.8608999848365784},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6703000068664551},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5428000092506409},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.48420000076293945},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4765999913215637},{"id":"https://openalex.org/C196031653","display_name":"Cartographic 
generalization","score":0.45579999685287476},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.448199987411499},{"id":"https://openalex.org/C41856607","display_name":"Geographic information system","score":0.4480000138282776}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7124880808","title":"Heart Rate Monitoring Through ANC Headphones in Unconstrained Environments","url":"https://doi.org/10.1109/bsn66969.2025.11337355","published":"2025-11-03","authors":["Zhenyu Wu","Maanya Shanker","Tao Chen","Xiaoran Fan","Longfei Shangguan"],"abstract":"This paper introduces CLEAR-APG, a novel acoustic sensing approach that enables reliable heart rate monitoring in unconstrained environments using off-the-shelf active noise cancellation (ANC) headphones. By emitting ultrasonic signals into the user's ear canal via the headphone speaker and analyzing their echoes, which can detect the frequency of a pulsating vein along the canal wall. However, everyday activities such as exercising, speaking, or eating cause jaw movements that deform the ear canal, overwhelming the subtle deformation caused by blood flowing. To overcome this challenge, we employ the ANC headphone's built-in gyroscope to capture body motion and identify how various motion patterns influence the heartbeat waveform. 
Building on this insight, we propose a multi-modal method that effectively denoises the heartbeat waveform measurements and further accurately extracts heart r...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bsn66969.2025.11337355","openalex_id":"https://openalex.org/W7124880808","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Research!America (United States)","Samsung (United States)","University of Pittsburgh"],"concepts":[{"id":"https://openalex.org/C13852961","display_name":"Heartbeat","score":0.8514999747276306},{"id":"https://openalex.org/C2781258422","display_name":"Headphones","score":0.7228000164031982},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.557699978351593},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.5067999958992004},{"id":"https://openalex.org/C197424946","display_name":"Waveform","score":0.5006999969482422},{"id":"https://openalex.org/C24890656","display_name":"Acoustics","score":0.4650000035762787},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44769999384880066},{"id":"https://openalex.org/C2982892191","display_name":"Heart beat","score":0.44029998779296875}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417335751","title":"Few-shot Vision-language Prompt Tuning of VLMs for On-road Object Detection","url":"https://doi.org/10.1145/3764919.3770873","published":"2025-11-03","authors":["Minsoo Choi","Ravi Garg","Mohamed Moustafa","Tarun Bhatia","Amber Roy 
Chowdhury"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3764919.3770873","openalex_id":"https://openalex.org/W4417335751","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Bellevue Hospital Center"],"concepts":[{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6258000135421753},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6022999882698059},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5652999877929688},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.5054000020027161},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4311000108718872},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.37439998984336853},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3199000060558319},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3010999858379364}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415777467","title":"Architecting Asset Specific Time Series Foundation Model and its Applications for Asset Performance Management","url":"https://doi.org/10.2118/229309-ms","published":"2025-11-03","authors":["Pradeep Kumar Shetty","T.J.G.M. 
Lam","Praprut Songchitruksa","Abhinav Kohar","Salma Benslimane","Indranil Roychoudhury","Naveen Gupta","Antonio Abinader","José Celaya"],"abstract":"Abstract This paper presents a time series foundation model (TSFM) designed for asset performance management (APM) of industrial equipment such as pumps and compressors. Leveraging transformer-based architecture and a discrete tokenization scheme for multivariate sensor data, the TSFM is pretrained on a broad corpus of operational and simulated time series data. It is then fine-tuned on asset-specific data to adapt to local conditions with minimal effort. The model supports multiple tasks (e.g., forecasting and anomaly detection) using task-specific heads atop a shared representation. Experimental results on electric submersible pump (ESP) data demonstrate improved performance over traditional LSTM and threshold-based models, with a 10% reduction in RMSE and 8% reduction in MAPE for forecasting, and an F1-score improvement from 0.72 to 0.80 for anomaly detection. 
A field case study illus...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2118/229309-ms","openalex_id":"https://openalex.org/W4415777467","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C2776517139","display_name":"Asset management","score":0.6951000094413757},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6554999947547913},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.5598999857902527},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5163000226020813},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.4948999881744385},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.476500004529953},{"id":"https://openalex.org/C111335779","display_name":"Reduction (mathematics)","score":0.44359999895095825},{"id":"https://openalex.org/C76178495","display_name":"Asset (computer security)","score":0.4284000098705292}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/align-to-misalign-automatic-llm-jailbreak-with-meta-optimized-llm-judges","title":"Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges","url":"https://www.microsoft.com/en-us/research/publication/align-to-misalign-automatic-llm-jailbreak-with-meta-optimized-llm-judges/","published":"2025-11-02","authors":["Hamin Koo","Minseon Kim","Jaehyung Kim"],"abstract":"Identifying the vulnerabilities of large language models (LLMs) is crucial for improving their safety by addressing inherent 
weaknesses. Jailbreaks, in which adversaries bypass safeguards with crafted input prompts, play a central role in red-teaming by probing LLMs to elicit unintended or unsafe behaviors. Recent optimization-based jailbreak approaches iteratively refine attack prompts by leveraging LLMs. However, they often rely heavily on either binary attack success rate (ASR) signals, which are sparse, or manually crafted scoring templates, which introduce human bias and uncertainty in the scoring outcomes. To address these limitations, we introduce AMIS (Align to MISalign), a meta-optimization framework that jointly evolves jailbreak prompts and scoring templates through a bi-level structure. In the inner loop, prompts are refined using fine-grained and dense feedback using a fixed...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bioagents-bridging-the-gap-in-bioinformatics-analysis-with-multi-agent-systems","title":"BioAgents: Bridging the gap in bioinformatics analysis with multi-agent systems","url":"https://www.microsoft.com/en-us/research/publication/bioagents-bridging-the-gap-in-bioinformatics-analysis-with-multi-agent-systems/","published":"2025-11-01","authors":["Nikita Mehandru","Amanda K. 
Hall","Olesya Melnichenko","Yulia Dubinina","Daniel Tsirulnikov","David Bamman","Ahmed Alaa","Scott Saponas","Venkat S. Malladi"],"abstract":"Developing end-to-end bioinformatics workflows is challenging, demanding deep expertise in both genomics and computational techniques. While large language models (LLMs) provide some assistance, they often lack the nuanced guidance required for complex bioinformatics tasks, and are resource-intensive. We thus propose a multi-agent system built on small language models, fine-tuned on bioinformatics data, and enhanced with retrieval augmented generation (RAG). Our system, BioAgents, enables local operation and personalization using proprietary data. We observe performance comparable to human experts on conceptual genomics tasks, and discuss future work to enhance code generation capabilities. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","Biology","1970-01-01","personalization","retrieval","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mitigate-one-skew-another-tackling-intersectional-biases-in-text-to-image-models","title":"Mitigate One, Skew Another? 
Tackling Intersectional Biases in Text-to-Image Models","url":"https://www.microsoft.com/en-us/research/publication/mitigate-one-skew-another-tackling-intersectional-biases-in-text-to-image-models/","published":"2025-11-01","authors":["Pushkar Shukla","Aditya Chinchure","Emily Diana","Alexander Tolbert","Kartik Hosanagar","Vineeth N Balasubramanian","Leonid Sigal","Matthew Turk"],"abstract":"The biases exhibited by text-to-image (TTI) models are often treated as independent, though in reality, they may be deeply interrelated. Addressing bias along one dimension—such as ethnicity or age—can inadvertently affect another, like gender, either mitigating or exacerbating existing disparities. Understanding these interdependencies is crucial for designing fairer generative models, yet measuring such effects quantitatively remains a challenge. To address this, we introduce BiasConnect, a novel tool for analyzing and quantifying bias interactions in TTI models. BiasConnect uses counterfactual interventions along different bias axes to reveal the underlying structure of these interactions and estimates the effect of mitigating one bias axis on another. 
These estimates show strong correlation (+0.65) with observed post-mitigation outcomes. Building on BiasConnect, we propose InterMit, a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Multimodal Large Language Models","Natural language processing","Vision-language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/table-specialist-language-model-specialists-for-tables-using-iterative-fine-tuning","title":"Table-Specialist: Language Model Specialists for Tables using Iterative Fine-tuning","url":"https://www.microsoft.com/en-us/research/publication/table-specialist-language-model-specialists-for-tables-using-iterative-fine-tuning/","published":"2025-11-01","authors":["Junjie Xing","Yeye He","Mengyu Zhou","Haoyu Dong","Shi Han","Dongmei Zhang","Surajit Chaudhuri"],"abstract":"Language models such as GPT and Llama have shown remarkable ability on diverse natural language tasks, yet their performance on complex table tasks (e.g., NL-to-Code, data cleaning, etc.) continue to be sub-optimal. To improve their performance, task-specific fine-tuning is often needed, which however require expensive human labeling, and is prone to over-fitting. In this work, we propose Table-Specialist, a new self-trained fine-tuning paradigm specifically designed for table tasks. 
Our insight is that for each table task, there often exist two dual versions of the same task, one generative and one classification in nature. Leveraging their duality, we propose a Generator-Validator paradigm, to iteratively generate-then-validate training data from language-models, to fine-tune stronger Table-Specialist models that can specialize in a given task, without using manually-labeled data. Extens...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","large language models","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/slim-sc-thought-pruning-for-efficient-scaling-with-self-consistency","title":"Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency","url":"https://www.microsoft.com/en-us/research/publication/slim-sc-thought-pruning-for-efficient-scaling-with-self-consistency/","published":"2025-11-01","authors":["Colin Hong Fung Heng","Xu Guo","Anand Chaanan Singh","Esha Choukse","Dmitrii Ustiugov"],"abstract":"Recently, Test-Time Scaling (TTS) has gained increasing attention for improving LLM reasoning performance at test time without retraining the model. A notable TTS technique is Self-Consistency (SC), which generates multiple reasoning chains in parallel and selects the final answer via majority voting. While effective, the order-of-magnitude computational overhead limits its broad deployment. 
Prior attempts to accelerate SC mainly rely on model-based confidence scores or heuristics with limited empirical support. For the first time, we theoretically and empirically analyze the inefficiencies of SC and reveal actionable opportunities for improvement. Building on these insights, we propose Slim-SC, a (thinking) step-wise pruning strategy that removes redundant chains using inter-chain similarity at the thought level. Experiments on three mathematical reasoning datasets and two recent LLM ar...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","NLP","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-llm-test-time-compute-with-mobile-npu-on-smartphones","title":"Scaling LLM Test-Time Compute with Mobile NPU on Smartphones","url":"https://www.microsoft.com/en-us/research/publication/scaling-llm-test-time-compute-with-mobile-npu-on-smartphones/","published":"2025-11-01","authors":["Zixu Hao","Jianyu Wei","Tuowei Wang","Minxing Huang","Huiqiang Jiang","Shiqi Jiang","Ting Cao","Ju Ren"],"abstract":"Deploying Large Language Models (LLMs) on mobile devices faces the challenge of insufficient performance in smaller models and excessive resource consumption in larger ones. This paper highlights that mobile Neural Processing Units (NPUs) have underutilized computational resources, particularly their matrix multiplication units, during typical LLM inference. 
To leverage this idle compute capacity, we proposes applying test-time scaling techniques on mobile NPUs to enhance the performance of smaller LLMs. However, this approach confronts inherent NPU challenges, such as inadequate hardware support for fine-grained quantization and low efficiency in general-purpose computations. We address these by designing and implementing an end-to-end LLM inference system for Qualcomm Hexagon NPUs. This system incorporates hardware-aware, fine-grained tile group quantization, weight rearrangement and q...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01","LLM","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/linear-differential-vision-transformer-learning-visual-contrasts-via-pairwise-differentials","title":"Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials","url":"https://www.microsoft.com/en-us/research/publication/linear-differential-vision-transformer-learning-visual-contrasts-via-pairwise-differentials/","published":"2025-11-01","authors":["Yifan Pu","Jixuan Ying","Tianzhu Ye","Dongchen Han","Ziyi Wang","Qixiu Li","shao xinyu","Xiaochen Wang","Gao Huang","Xiu Li"],"abstract":"Vision Transformers (ViTs) have become a universal backbone for both image recognition and image generation. 
Yet their Multi–Head Self–Attention (MHSA) layer still performs a quadratic query–key interaction for \\emph{every} token pair, spending the bulk of computation on visually weak or redundant correlations. We introduce \\emph{Visual–Contrast Attention} (VCA), a drop-in replacement for MHSA that injects an explicit notion of discrimination while reducing the theoretical complexity from to with . VCA first distils each head’s dense query field into a handful of spatially pooled \\emph{visual–contrast tokens}, then splits them into a learnable \\emph{positive} and \\emph{negative} stream whose differential interaction highlights what truly separates one region from another. The module adds fewer than \\,M parameters to a DeiT-Tiny backbone, requires no extra FLOPs, and is wholly architectur...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Computer Vision and Pattern Recognition","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/jailbreak-distillation-renewable-safety-benchmarking","title":"Jailbreak Distillation: Renewable Safety Benchmarking","url":"https://www.microsoft.com/en-us/research/publication/jailbreak-distillation-renewable-safety-benchmarking/","published":"2025-11-01","authors":["Jingyu Zhang","Ahmed Elgohary","Xiawei Wang","A S M Iftekhar","Ahmed Magooda","Ben Van Durme","Daniel Khashabi","Kyle Jackson"],"abstract":"Large language 
models (LLMs) are rapidly deployed in critical applications, raising urgent needs for robust safety benchmarking. We propose Jailbreak Distillation (JBDistill), a novel benchmark construction framework that \"distills\" jailbreak attacks into high-quality and easily-updatable safety benchmarks. JBDistill utilizes a small set of development models and existing jailbreak attack algorithms to create a candidate prompt pool, then employs prompt selection algorithms to identify an effective subset of prompts as safety benchmarks. JBDistill addresses challenges in existing safety evaluation: the use of consistent evaluation prompts across models ensures fair comparisons and reproducibility. It requires minimal human effort to rerun the JBDistill pipeline and produce updated benchmarks, alleviating concerns on saturation and contamination. Extensive experiments demonstrate our benchm...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Generative AI","1970-01-01","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sherlock-reliable-and-efficient-workflow-execution","title":"Sherlock: Reliable and efficient workflow execution","url":"https://www.microsoft.com/en-us/research/publication/sherlock-reliable-and-efficient-workflow-execution/","published":"2025-11-01","authors":["Yeonju Ro","Haoran Qiu","Íñigo Goiri","Rodrigo Fonseca","Ricardo Bianchini","Aditya 
Akella","Zhangyang Wang","Mattan Erez","Esha Choukse"],"abstract":"With the increasing adoption of large language models (LLM), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional applications. However, such workflows are inherently error-prone: incorrect or partially correct output at one step can propagate or even amplify through subsequent stages, compounding the impact on the final output. Recent work proposes integrating verifiers that validate LLM output or actions, such as self-reflection, debate, or LLM-as-a-judge mechanisms. Yet, verifying every step introduces significant latency and cost overheads. In this work, we seek to answer three key questions: which nodes in a workflow are most error-prone and thus deserve costly verification, how to select the most appropriate verifier for each node, and how to use verification with minimal impact to latency? Our soluti...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","LLM","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rad-phi4-vision-cxr-a-compact-multimodal-assistant-for-versatile-radiology-workflows","title":"Rad-Phi4-Vision-CXR: A Compact Multimodal Assistant for Versatile Radiology 
Workflows","url":"https://www.microsoft.com/en-us/research/publication/rad-phi4-vision-cxr-a-compact-multimodal-assistant-for-versatile-radiology-workflows/","published":"2025-11-01","authors":["Mercy Ranjit","Tanuja Ganu"],"abstract":"The integration of artificial intelligence into radiology underscores the need for efficient models capable of supporting a wide range of clinical tasks. We introduce Rad-Phi4-Vision-CXR , a compact multimodal vision-language model designed to seamlessly integrate into radiology workflows for chest X-rays. It supports radiology report generation, fine-grained visual question answering (VQA) for abnormalities and tubes/lines (including presence and placement), and grounding capabilities for anatomies, pathologies, and medical devices. Beyond these tasks, we propose a capability for findings generation with causal exploration of radiology findings and differential diagnosis, enabling the model to affirm findings or rule out conditions, thereby enhancing its utility in clinical decision-making. 
Rad-Phi4-Vision CXR achieves state-of-the-art performance on multiple benchmarks for report gener...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Medical, health and genomics","1970-01-01","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/image-as-a-world-generating-interactive-world-from-single-image-via-panoramic-video-generation","title":"Image as a World: Generating Interactive World from Single Image via Panoramic Video Generation","url":"https://www.microsoft.com/en-us/research/publication/image-as-a-world-generating-interactive-world-from-single-image-via-panoramic-video-generation/","published":"2025-11-01","authors":["Dongnan Gui","Xun Guo","Wengang Zhou","Yan Lu"],"abstract":"Generating an interactive visual world from a single image is both challenging and practically valuable, as single-view inputs are easy to acquire and align well with prompt-driven applications such as gaming and virtual reality. This paper introduces a novel unified framework, Image as a World ( IaaW ), which synthesizes high-quality 360-degree videos from a single image that are both controllable and temporally continuable. 
Our framework consists of three stages: world initialization, which jointly synthesizes spatially complete and temporally dynamic scenes from a single view; world exploration, which supports user-specified viewpoint rotation; and world continuation, which extends the generated scene forward in time with temporal consistency. To support this pipeline, we design a visual world model based on generative diffusion models modulated with spherical 3D positional encoding a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer Vision and Pattern Recognition","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-caching-for-structurally-similar-prompts-and-responses","title":"Generative Caching for Structurally Similar Prompts and Responses","url":"https://www.microsoft.com/en-us/research/publication/generative-caching-for-structurally-similar-prompts-and-responses/","published":"2025-11-01","authors":["Sarthak Chakraborty","Suman Nath","Xuchao Zhang","Chetan Bansal","Indranil Gupta"],"abstract":"Large Language Models (LLMs) are increasingly being used to plan, reason, and execute tasks across various scenarios. Use cases like repeatable workflows, chatbots, and AI agents often involve recurring tasks and tend to reuse similar prompts when interacting with the LLM. This opens up opportunities for caching. 
With structurally similar prompts that differ in subtle yet important ways, which are also reflected in their corresponding responses, exact prompt matching fails, while semantic caching techniques may return cached responses that are incorrect since they ignore these variations. To address this, we introduce GenCache, a generative cache that produces variation-aware responses for structurally similar prompts. It identifies and reuses the pattern in which responses are generated for structurally similar prompts for new requests. We show that GenCache achieves an 83% cache hit ra...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/privacy-in-action-towards-realistic-privacy-mitigation-and-evaluation-for-llm-powered-agents","title":"Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents","url":"https://www.microsoft.com/en-us/research/publication/privacy-in-action-towards-realistic-privacy-mitigation-and-evaluation-for-llm-powered-agents/","published":"2025-11-01","authors":["Shouju Wang","Fenglin Yu","Xirui Liu","Xiaoting Qin","Jue Zhang","Qingwei Lin 林庆维","Dongmei Zhang","Saravan Rajmohan"],"abstract":"The increasing autonomy of LLM agents in handling sensitive communications, accelerated by Model Context Protocol (MCP) and Agent-to-Agent (A2A) frameworks, creates 
urgent privacy challenges. While recent work reveals significant gaps between LLMs’ privacy Q&A performance and their agent behavior, existing benchmarks remain limited to static, simplified scenarios. We present PrivacyChecker, a model-agnostic, contextual integrity based mitigation approach that effectively reduces privacy leakage from 36.08% to 7.30% on DeepSeek-R1 and from 33.06% to 8.32% on GPT-4o, all while preserving task helpfulness. We also introduce PrivacyLens-Live, transforming static benchmarks into dynamic MCP and A2A environments that reveal substantially higher privacy risks in practical. Our modular mitigation approach integrates seamlessly into agent protocols through three deployment strategies, providing p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/out-of-sight-not-out-of-context-egocentric-spatial-reasoning-in-vlms-across-disjoint-frames","title":"Out of sight, not out of context? Egocentric spatial reasoning in VLMs across disjoint frames","url":"https://www.microsoft.com/en-us/research/publication/out-of-sight-not-out-of-context-egocentric-spatial-reasoning-in-vlms-across-disjoint-frames/","published":"2025-11-01","authors":["Sahithya Ravi","Gabriel Herbert Sarch","Vibhav Vineet","Andrew D. 
Wilson","Balasaravanan Thoravi Kumaravel"],"abstract":"An embodied AI assistant operating on egocentric video must integrate spatial cues across time-for instance, determining where an object A, glimpsed a few moments ago lies relative to an object B encountered later. We introduce Disjoint-3DQA, a generative QA benchmark that evaluates this ability of VLMs by posing questions about object pairs that are not co-visible in the same frame. We evaluated seven state-of-the-art VLMs and found that models lag behind human performance by 28%, with steeper declines in accuracy (60%→ 30%) as the temporal gap widens. Our analysis further reveals that providing trajectories or bird’s-eye-view projections to VLMs results in only marginal improvements, whereas providing oracle 3D coordinates leads to a substantial 20% performance increase. This highlights a core bottleneck of multi-frame VLMs in constructing and maintaining 3D scene representations over....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/meeting-delegate-benchmarking-llms-on-attending-meetings-on-our-behalf","title":"MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our 
Behalf","url":"https://www.microsoft.com/en-us/research/publication/meeting-delegate-benchmarking-llms-on-attending-meetings-on-our-behalf/","published":"2025-11-01","authors":["Lingxiang Hu","Shurun Yuan","Xiaoting Qin","Jue Zhang","Qingwei Lin 林庆维","Dongmei Zhang","Saravan Rajmohan","Qi Zhang"],"abstract":"In contemporary workplaces, meetings are essential for exchanging ideas and ensuring team alignment but often face challenges such as time consumption, scheduling conflicts, and inefficient participation. Recent advancements in Large Language Models (LLMs) have demonstrated their strong capabilities in natural language generation and reasoning, prompting the question: can LLMs effectively delegate participants in meetings? To explore this, we develop a prototype LLM-powered meeting delegate system and create a comprehensive benchmark using real meeting transcripts. Our evaluation reveals that GPT-4/4o maintain balanced performance between active and cautious engagement strategies. In contrast, Gemini 1.5 Pro tends to be more cautious, while Gemini 1.5 Flash and Llama3-8B/70B display more active tendencies. 
Overall, about 60\\% of responses address at least one key point from the ground-tr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-measurement-to-expertise-empathetic-expert-adapters-for-context-based-empathy-in-conversational-ai-agents","title":"From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents","url":"https://www.microsoft.com/en-us/research/publication/from-measurement-to-expertise-empathetic-expert-adapters-for-context-based-empathy-in-conversational-ai-agents/","published":"2025-11-01","authors":["Erfan Shayegani","Jina Suh","Andrew D. Wilson","Nagu Rangan","Javier Hernandez"],"abstract":"Empathy is a critical factor in fostering positive user experiences in conversational AI. While models can display empathy, it is often generic rather than tailored to specific tasks and contexts. In this work, we introduce a novel framework for developing and evaluating context-specific empathetic large language models (LLMs). We first analyze a real-world conversational dataset consisting of 672 multi-turn conversations across 8 tasks, revealing significant differences in terms of expected and experienced empathy before and after the conversations, respectively. 
To help minimize this gap, we develop a synthetic multi-turn conversational generation pipeline and steer responses toward our defined empathy patterns based on the context that more closely matches users' expectations. We then train empathetic expert adapters for context-specific empathy that specialize in varying empathy leve...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/less-is-more-generating-time-series-with-llama-style-autoregression-in-simple-factorized-latent-spaces","title":"Less Is More: Generating Time Series with LLaMA-Style Autoregression in Simple Factorized Latent Spaces","url":"https://www.microsoft.com/en-us/research/publication/less-is-more-generating-time-series-with-llama-style-autoregression-in-simple-factorized-latent-spaces/","published":"2025-11-01","authors":["Siyuan Li","Yifan Sun","Lei Cheng","Lewen Wang","Yang Liu","Weiqing Liu","Jianlong Li","Jiang Bian","Shikai Fang"],"abstract":"Generative models for multivariate time series are essential for data augmentation, simulation, and privacy preservation, yet current state-of-the-art diffusion-based approaches are slow and limited to fixed-length windows. We propose FAR-TS, a simple yet effective framework that combines disentangled factorization with an autoregressive Transformer over a discrete, quantized latent space to generate time series. 
Each time series is decomposed into a data-adaptive basis that captures static cross-channel correlations and temporal coefficients that are vector-quantized into discrete tokens. A LLaMA-style autoregressive Transformer then models these token sequences, enabling fast and controllable generation of sequences with arbitrary length. Owing to its streamlined design, FAR-TS achieves orders-of-magnitude faster generation than Diffusion-TS while preserving cross-channel correlations....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Unpublished","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4416140781","title":"INNV-35. Artificial intelligence in Neuro-Oncology: Mapping the field","url":"https://doi.org/10.1093/neuonc/noaf201.0924","published":"2025-11-01","authors":["Sebastian Voigtlaender","Thomas Nelson","Philipp Karschnia","Eugene Vaios","Michelle M. Kim","Philipp Lohmann","Norbert Galldiks","Mariella G Filbin","Shekoofeh Azizi","Vivek Natarajan","Michelle Monje","Jörg Dietrich"],"abstract":"Abstract BACKGROUND Artificial intelligence (AI) is reshaping neuro-oncology research and clinical practice. This abstract summarizes key findings from a peer-reviewed review article (accepted, The Lancet Digital Health), which maps AI applications across the neuro-oncological care trajectory and critically examines major opportunities, challenges, and future directions. 
METHODS We searched PubMed, ArXiv, and Google Scholar using comprehensive MeSH term-based search strings (e.g., “glioma,” “machine learning,”, “foundation model,” “omics”) from 1/1/2020–12/7/2024. Article metadata were retrieved via Python wrappers (built around PubMed and ArXiv APIs) or manually (from Google Scholar). Records were screened and deduplicated. Studies were selected based on predefined criteria, including explicit use of machine learning (ML) as a core technology, a multicentric or independent validation co...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/neuonc/noaf201.0924","openalex_id":"https://openalex.org/W4416140781","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Broad Institute","Center for Neuro-Oncology","Dana-Farber/Boston Children's Cancer and Blood Disorders Center","Duke Medical Center","Forschungszentrum Jülich","Google (Canada)","Google (United States)","Howard Hughes Medical Institute","Ludwig-Maximilians-Universität München","Massachusetts General Hospital","Max Planck Institute for Biological Cybernetics","Michigan Medicine","Neurological Surgery","RWTH Aachen University","Stanford University","University Hospital Cologne","University of California, San Francisco","University of Cologne","University of Michigan","Universitätsklinikum Aachen","Virtual High School"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6266000270843506},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.593999981880188},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.5792999863624573},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.5733000040054321},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.45969998836517334},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.45500001311302185},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.40720000863075256},{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.3587999939918518}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416751217","title":"Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction","url":"https://doi.org/10.1109/sips66314.2025.11261248","published":"2025-11-01","authors":["Rithik Sachdev","Zhong-Qiu Wang","Chao-Han Huck Yang"],"abstract":"Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern automatic speech recognition (ASR) systems. One representative approach is to leverage in-context learning to prompt LLMs so that a better hypothesis can be generated by the LLMs based on a carefully-designed prompt and an N-best list of hypotheses produced by ASR systems. However, it is yet unknown whether the existing prompts are the most effective ones for the task of post-ASR error correction. In this context, this paper first explores alternative prompts to identify an initial set of effective prompts, and then proposes to employ an evolutionary prompt optimization algorithm to refine the initial prompts. 
Evaluations results on the CHiME-4 subset of the Task 1 of the SLT 2024 GenSEC challenge show the effe...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/sips66314.2025.11261248","openalex_id":"https://openalex.org/W4416751217","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Carnegie Mellon University","Nvidia (United States)","Southern University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7595999836921692},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6640999913215637},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6581000089645386},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5299000144004822},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5045999884605408},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4918999969959259},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4832000136375427},{"id":"https://openalex.org/C103088060","display_name":"Error detection and correction","score":0.4551999866962433}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7117621275","title":"Does visualization help AI understand data?","url":"https://doi.org/10.1109/vis60296.2025.00016","published":"2025-11-01","authors":["Victoria R. Li","Johnathan L. Sun","Martin Wattenberg"],"abstract":"Charts and graphs help people analyze data, but can they also be useful to AI systems? 
To investigate this question, we perform a series of experiments with two commercial vision-language models: GPT 4.1 and Claude 3.5. Across three representative analysis tasks, the two systems describe synthetic datasets more precisely and accurately when raw data is accompanied by a scatterplot, especially as datasets grow in complexity. Comparison with two baselines— providing a blank chart and a chart with mismatched data—shows that the improved performance is due to the content of the charts. Our results are initial evidence that AI systems, like humans, can benefit from visualization.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/vis60296.2025.00016","openalex_id":"https://openalex.org/W7117621275","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Harvard University Press"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7585999965667725},{"id":"https://openalex.org/C190812933","display_name":"Chart","score":0.6599000096321106},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5898000001907349},{"id":"https://openalex.org/C2778089247","display_name":"Blank","score":0.5740000009536743},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5195000171661377},{"id":"https://openalex.org/C172367668","display_name":"Data visualization","score":0.4959999918937683},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.486299991607666},{"id":"https://openalex.org/C132964779","display_name":"Raw data","score":0.4837000072002411}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"official:a9b1bda8136cbf75","title":"Claude Opus 4.5 System Card","url":"https://www-cdn.anthropic.com/bf10f64990cfda0ba858290be7b8cc6317685f47.pdf","published":"2025-11","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Opus 4.5.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Opus 4.5"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"official:8beeec40249a9c44","title":"Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification","url":"https://research.nvidia.com/publication/2025-11_do-what-you-say-steering-vision-language-action-models-runtime-reasoning-action","published":"2025-11","authors":["Yilin Wu","Anqi Li","Tucker Hermans","Fabio Ramos","Andrea Bajcs","Claudia Pérez D’Arpino"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=0"}},{"id":"official:349c06a94777cb5a","title":"Data-Driven Loss Functions for Inference-Time Optimization in 
Text-to-Image","url":"https://research.nvidia.com/publication/2025-11_data-driven-loss-functions-inference-time-optimization-text-image","published":"2025-11","authors":["Sapir Yflah","Yuval Atzmon","Gal Chechik"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=0"}},{"id":"hf-org-paper:tencent:2510.27688","title":"Continuous Autoregressive Language Models","url":"https://huggingface.co/papers/2510.27688","published":"2025-10-31","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4415743526","title":"Chat2Layout: Interactive 3D Furniture Layout With a Multimodal LLM","url":"https://doi.org/10.1109/tvcg.2025.3626731","published":"2025-10-31","authors":["Can Wang","Hongliang Zhong","Menglei Chai","Mingming He","Dongdong Chen","Jing Liao"],"abstract":"Automatic furniture layout is long desired for convenient interior design. 
Leveraging the remarkable visual reasoning capabilities of multimodal large language models (MLLMs), recent methods address layout generation in a static manner, lacking the feedback-driven refinement essential for interactive user engagement. We introduce Chat2Layout, a novel interactive furniture layout generation system that extends the functionality of MLLMs into the realm of interactive layout design. To achieve this, we establish a unified vision-question paradigm for in-context learning, enabling seamless communication with MLLMs to steer their behavior without altering model weights. Within this framework, we present a novel training-free visual prompting mechanism. This involves a visual-text prompting technique that assist MLLMs in reasoning about plausible layout plans, followed by an Offline-to-Online....","companies":["Google/DeepMind","Microsoft"],"matched_orgs":["Google/DeepMind","Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2025.3626731","openalex_id":"https://openalex.org/W4415743526","cited_by_count":3,"quality_score":60,"matched_keywords":["LLM","agent"],"author_affiliations":["City University of Hong Kong","Google (United States)","Microsoft (United States)","Netflix (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8697999715805054},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6496999859809875},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5893999934196472},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.4661000072956085},{"id":"https://openalex.org/C194969405","display_name":"Virtual 
reality","score":0.3935000002384186},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.3919000029563904},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3474999964237213},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.3109000027179718}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4415745796","title":"LLM-AGR: Large Language Model Augmented Graph Representation Learning for Recommendation","url":"https://doi.org/10.1016/j.knosys.2025.114791","published":"2025-10-31","authors":["Xinji Zha","Yumin Dong","Haomiao Jiang","Zihui Xu","Chao Wang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.knosys.2025.114791","openalex_id":"https://openalex.org/W4415745796","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Chongqing Normal University","Dalian University","Dalian University of Foreign Languages","Dalian University of Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8208000063896179},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.6498000025749207},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5552999973297119},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4738999903202057},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.4456999897956848},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4433000087738037},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.4162999987602234},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.3725000023841858}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415736045","title":"Retrieval-Augmented Sign Language Translation","url":"https://doi.org/10.1145/3771277","published":"2025-10-31","authors":["Huijie Yao","Wengang Zhou","Hao Zhou","Hezhen Hu","Houqiang Li"],"abstract":"In this work, we present a framework named Retrieval-Augmented Sign Language Translation (RASLT). Since human translators can provide more accurate answers when they have access to similar translation samples proofread by experts, it is generally believed that similar references should be beneficial for the translation process. To augment existing approaches with extra references beyond input sign language video, our RASLT utilizes a cross-modal query expansion mechanism to enhance the input of existing sign language translation systems. Technically, our RASLT performs sign language translation in two stages, i.e., video retrieval and text generation. The video retriever first searches for the extra text descriptions from a sign language database based on the similarities between the sign language videos. 
Then, the retrieval-augmented translator takes the retrieved text descriptions as a...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3771277","openalex_id":"https://openalex.org/W4415736045","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9110999703407288},{"id":"https://openalex.org/C522192633","display_name":"Sign language","score":0.7710999846458435},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.670199990272522},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.5658000111579895},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5566999912261963},{"id":"https://openalex.org/C139676723","display_name":"Sign (mathematics)","score":0.536899983882904},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5123999714851379},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.5115000009536743}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/magentic-marketplace-an-open-source-environment-for-studying-agentic-markets","title":"Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets","url":"https://www.microsoft.com/en-us/research/publication/magentic-marketplace-an-open-source-environment-for-studying-agentic-markets/","published":"2025-10-30","authors":["Gagan Bansal","Wenyue Hua","Zezhou 
Huang","Adam Fourney","Amanda Swearngin","Will Epperson","Tyler Payne","Jake Hofman","Brendan Lucier","Chinmay Singh","Markus Mobius","Akshay Nambi"],"abstract":"As LLM agents advance, they are increasingly mediating economic decisions, ranging from prod-uct discovery to transactions, on behalf of users. Such applications promise benefits but also raisemany questions about agent accountability and value for users. Addressing these questions requiresunderstanding how agents behave in realistic market conditions. However, previous research haslargely evaluated agents in constrained settings, such as single-task marketplaces (e.g., negotiation)or structured two-agent interactions. Real-world markets are fundamentally different: they requireagents to handle diverse economic activities and coordinate within large, dynamic ecosystems wheremultiple agents with opaque behaviors may engage in open-ended dialogues. To bridge this gap, weinvestigate two-sided agentic marketplaces where Assistant agents represent consumers and Serviceagents represent competi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Tech Report","Artificial intelligence","Economics","LLM","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:moonshotai:2510.26692","title":"Kimi Linear: An Expressive, Efficient Attention Architecture","url":"https://huggingface.co/papers/2510.26692","published":"2025-10-30","authors":["Moonshot/Kimi"],"abstract":"We introduce Kimi Linear, a hybrid linear attention architecture that, for 
the first time, outperforms full attention under fair comparisons across various scenarios -- including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core lies Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mechanism, enabling more effective use of limited finite-state RNN memory. Our bespoke chunkwise algorithm achieves high hardware efficiency through a specialized variant of the Diagonal-Plus-Low-Rank (DPLR) transition matrices, which substantially reduces computation compared to the general DPLR formulation while remaining more consistent with the classical delta rule. We pretrain a Kimi Linear model with 3B activated parameters and 48B total parameters, based on a layerwise hybrid of KDA and Multi...","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","moonshotai","memory","efficient"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"hf-org-paper:tencent:2510.26697","title":"The End of Manual Decoding: Towards Truly End-to-End Language Models","url":"https://huggingface.co/papers/2510.26697","published":"2025-10-30","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7128492094","title":"Enhancing Customer Journey Intelligence: A Comprehensive Framework for 360 - Degree Analytics Using Generative AI","url":"https://doi.org/10.53469/jrse.2025.07(10).06","published":"2025-10-30","authors":["Cibaca Khandelwal"],"abstract":"The marketing analytics landscape is being transformed by the convergence of Generative AI and Advanced Attribution Models. Generative AI enables the creation of unique, personalized content, revolutionizing customer engagement and campaign optimization [3 - 6]. Advanced Attribution Models provide unprecedented insights into the complex customer journey, tracking the impact of touchpoints and channels on conversion rates [7 - 10]. This article explores the integration of these cutting - edge technologies, demonstrating how organizations can harness their synergies to drive measurable improvements in marketing performance [1 - 2]. Through case studies and analysis, the study examines the practical applications, challenges, and strategic implications of this transformative approach [11 - 12]. 
The findings offer valuable insights for practitioners, data scientists, and leaders, providing a....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.53469/jrse.2025.07(10).06","openalex_id":"https://openalex.org/W7128492094","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6894000172615051},{"id":"https://openalex.org/C2776915394","display_name":"Customer engagement","score":0.6884999871253967},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6715999841690063},{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.629800021648407},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5989000201225281},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.5867000222206116},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5062999725341797},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5030999779701233}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415682262","title":"Benchmarking large and small MLLMs","url":"https://doi.org/10.1007/s00138-025-01762-0","published":"2025-10-30","authors":["Xuelu Feng","Y. Li","Dongdong Chen","Mei Gao","Mengchen Liu","Junsong Yuan","Chunming Qiao"],"abstract":"Abstract Large multimodal language models (MLLMs) such as GPT-4V and GPT-4o have achieved remarkable advancements in understanding and generating multimodal content, showcasing superior quality and capabilities across diverse tasks. 
However, their deployment faces significant challenges, including slow inference, high computational cost, and impracticality for on-device applications. In contrast, the emergence of small MLLMs, exemplified by the LLava-series models and Phi-3-Vision, offers promising alternatives with faster inference, reduced deployment costs, and the ability to handle domain-specific scenarios. Despite their growing presence, the capability boundaries between large and small MLLMs remain underexplored. In this work, we conduct a systematic and comprehensive evaluation to benchmark both small and large MLLMs, spanning general capabilities such as object recognition, tempo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s00138-025-01762-0","openalex_id":"https://openalex.org/W4415682262","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University at Buffalo, State University of New York"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8147000074386597},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.744700014591217},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6190000176429749},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5789999961853027},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5663999915122986},{"id":"https://openalex.org/C170130773","display_name":"Usability","score":0.5360999703407288},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.47440001368522644},{"id":"https://openalex.org/C115903868","display_name":"Software 
engineering","score":0.40939998626708984}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2511.00088","title":"Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail","url":"https://huggingface.co/papers/2511.00088","published":"2025-10-30","authors":["NVIDIA","Yan Wang","Wenjie Luo","Junjie Bai","Yulong Cao","Tong Che","Ke Chen","Yuxiao Chen","Jenna Diamond","Yifan Ding","Wenhao Ding","Liang Feng"],"abstract":"End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. To address this, we introduce Alpamayo-R1 (AR1), a vision-language-action model (VLA) that integrates Chain of Causation reasoning with trajectory planning to enhance decision-making in complex driving scenarios. 
Our approach features three key innovations: (1) the Chain of Causation (CoC) dataset, built through a hybrid auto-labeling and human-in-the-loop pipeline producing decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular VLA architecture combining Cosmos-Reason, a Vision-Language Model pre-trained for Physical AI applications, with a diffusion-based trajectory decoder that generates dy...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["language model"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/promediate-a-socio-cognitive-framework-for-evaluating-proactive-agents-in-multi-party-negotiation","title":"ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation","url":"https://www.microsoft.com/en-us/research/publication/promediate-a-socio-cognitive-framework-for-evaluating-proactive-agents-in-multi-party-negotiation/","published":"2025-10-29","authors":["Ziyi Liu","Bahar Sarrafzadeh","Pei Zhou","Longqi Yang","Jieyu Zhao","Ashish Sharma"],"abstract":"While Large Language Models (LLMs) are increasingly used in agentic frameworks to assist individual users, there is a growing need for agents that can proactively manage complex, multi-party collaboration. Systematic evaluation methods for such proactive agents remain scarce, limiting progress in developing AI that can effectively support multiple people together. 
Negotiation offers a demanding testbed for this challenge, requiring socio-cognitive intelligence to navigate conflicting interests between multiple participants and multiple topics and build consensus. Here, we present ProMediate, the first framework for evaluating proactive AI mediator agents in complex, multi-topic, multi-party negotiations. ProMediate consists of two core components: (i) a simulation testbed based on realistic negotiation cases and theory-driven difficulty levels (ProMediate-Easy, ProMediate-Medium, and Pro...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Unpublished","Artificial intelligence","Human language technologies","Computer science","Natural language processing","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gistify-codebase-level-understanding-via-runtime-execution","title":"Gistify! Codebase-Level Understanding via Runtime Execution","url":"https://www.microsoft.com/en-us/research/publication/gistify-codebase-level-understanding-via-runtime-execution/","published":"2025-10-29","authors":["Hyunji Lee","Minseon Kim","Chinmay Singh","Matheus Pereira","Atharv Sonwane","Isadora White","Elias Stengel-Eskin","Mohit Bansal","Zhengyan Shi","Alessandro Sordoni","Marc-Alexandre Côté","Xingdi Yuan"],"abstract":"As coding agents are increasingly deployed in large codebases, the need to automatically design challenging, codebase-level evaluation is central. 
We propose Gistify, a task where a coding LLM must create a single, minimal, self-contained file that can reproduce a specific functionality of a codebase. The coding LLM is given full access to a codebase along with a specific entrypoint (e.g., a python command), and the generated file must replicate the output of the same command ran under the full codebase, while containing only the essential components necessary to execute the provided command. Success on Gistify requires both structural understanding of the codebase, accurate modeling of its execution flow as well as the ability to produce potentially large code patches. Our findings show that current state-of-the-art models struggle to reliably solve Gistify tasks, especially ones with l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/measuring-ai-diffusion-a-population-normalized-metric-for-tracking-global-ai-usage","title":"Measuring AI Diffusion: A Population-Normalized Metric for Tracking Global AI Usage","url":"https://www.microsoft.com/en-us/research/publication/measuring-ai-diffusion-a-population-normalized-metric-for-tracking-global-ai-usage/","published":"2025-10-29","authors":["Amit Misra","Jane Wang","Scott McCullers","Kevin White","Juan M. 
Lavista Ferres"],"abstract":"Measuring global AI diffusion remains challenging due to a lack of population-normalized, crosscountry usage data. We introduce AI User Share, a novel indicator that estimates the share of each country’s working-age population actively using AI tools. Built from anonymized Microsoft telemetry and adjusted for device access and mobile scaling, this metric spans 148 economies and provides consistent, real-time insight into global AI diffusion. We find wide variation in adoption, with a strong correlation between AI User Share and GDP. High uptake is concentrated in developed economies, though usage among internet-connected populations in lower-income countries reveals substantial latent demand. We also detect sharp increases in usage following major product launches, such as DeepSeek in early 2025. While the metric’s reliance solely on Microsoft telemetry introduces potential biases relate...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Unpublished","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-diffusion-in-low-resource-language-countries","title":"AI Diffusion in Low Resource Language Countries","url":"https://www.microsoft.com/en-us/research/publication/ai-diffusion-in-low-resource-language-countries/","published":"2025-10-29","authors":["Amit Misra","Syed Waqas Zamir","Wassim Hamidouche","Inbal Becker-Reshef","Juan M. 
Lavista Ferres"],"abstract":"Artificial intelligence (AI) is diffusing globally at unprecedented speed, but adoption remains uneven. Frontier Large Language Models (LLMs) are known to perform poorly on low-resource languages due to data scarcity. We hypothesize that this performance deficit reduces the utility of AI, thereby slowing adoption in Low-Resource Language Countries (LRLCs). To test this, we use a weighted regression model to isolate the language effect from socioeconomic and demographic factors, finding that LRLCs have a share of AI users that is approximately 20% lower relative to their baseline. These results indicate that linguistic accessibility is a significant, independent barrier to equitable AI diffusion.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Unpublished","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:cdf9443d8637120a","title":"gpt-oss-safeguard technical report","url":"https://openai.com/index/gpt-oss-safeguard-technical-report","published":"2025-10-29","authors":["OpenAI"],"abstract":"gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard’s capabilities and provide our baseline safety evaluations on the gpt-oss-safeguard models, using the underlying gpt-oss models as a baseline. 
For more information about the development and architecture of the underlying gpt-oss models, see the original gpt-oss model model card⁠.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:t1imu5ake04b3hcg2z0kcfmo","title":"Reasoning’s Razor: Reasoning Improves Accuracy but Can Hurt Recall at Critical Operating Points in Safety and Hallucination Detection","url":"https://machinelearning.apple.com/research/reasoning-razor","published":"2025-10-29","authors":["Atoosa Chegini","Hamid Kazemi","Garrett Souza","Maria Safi","Yang Song","Samy Bengio","Sinead Williamson","Mehrdad Farajtabar"],"abstract":"Reasoning has become a central paradigm for large language models (LLMs), consistently boosting accuracy across diverse benchmarks. Yet its suitability for precision-sensitive tasks remains unclear. We present the first systematic study of reasoning for classification tasks under strict low false positive rate (FPR) regimes. 
Our analysis covers two tasks—safety detection and hallucination detection—evaluated in both fine-tuned and zero-shot...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:lukgbgormmmp2tz69rbmhqm6","title":"RL for Reasoning by Adaptively Revealing Rationales","url":"https://machinelearning.apple.com/research/rl-for-reasoning","published":"2025-10-29","authors":["Mohammad Hossein Amani","Aryo Lotfi","Nicolas Mario Baldwin","Samy Bengio","Mehrdad Farajtabar","Emmanuel Abbé","Robert West"],"abstract":"We propose that reinforcement learning (RL) from partial expert demonstrations is not merely a training heuristic, but a promising framework for solving complex sequence generation tasks. Supervised fine-tuning (SFT) relies on dense ground-truth labels, which become increasingly costly as sequence length grows. RL, on the other hand, struggles with sparse rewards and a combinatorially large output space. 
We address this by introducing adaptive...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415654010","title":"Toward Community-Led Evaluations of Text-to-Image AI Representations of Disability, Health, and Accessibility","url":"https://doi.org/10.1145/3757887.3763012","published":"2025-10-29","authors":["Cynthia L. Bennett","Shaun K. Kane","Christina Harrington"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757887.3763012","openalex_id":"https://openalex.org/W4415654010","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5332000255584717},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43160000443458557},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3082999885082245},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2793000042438507},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.27869999408721924},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.27799999713897705},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.2766000032424927},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.27070000767707825}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4415686202","title":"Language Models as Ontology Encoders","url":"https://doi.org/10.1007/978-3-032-09527-5_24","published":"2025-10-29","authors":["Hui Yang","Jiaoyan Chen","Yuan He","Yongsheng Gao","Ian Horrocks"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-032-09527-5_24","openalex_id":"https://openalex.org/W4415686202","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Manchester","University of Oxford"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8788999915122986},{"id":"https://openalex.org/C25810664","display_name":"Ontology","score":0.7024999856948853},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5620999932289124},{"id":"https://openalex.org/C101230327","display_name":"Web Ontology Language","score":0.5486000180244446},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5472999811172485},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5364000201225281},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5299999713897705},{"id":"https://openalex.org/C167729594","display_name":"Axiom","score":0.5013999938964844}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-length-quantifying-long-range-information-for-long-context-llm-pretraining-data","title":"Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data","url":"https://www.microsoft.com/en-us/research/publication/beyond-length-quantifying-long-range-information-for-long-context-llm-pretraining-data/","published":"2025-10-28","authors":["Haoran Deng","Yingyu Lin","Zhenghao Lin","Xiao Liu","Yizhou Sun","Yian Ma","Yeyun Gong"],"abstract":"Long-context language models unlock advanced capabilities in reasoning, code generation, and document summarization by leveraging dependencies across extended spans of text. However, a significant portion of readily available long-text data lacks meaningful long-distance dependencies; most spans can be predicted using only local context. Training on such data is inefficient, making careful data selection crucial. Therefore, we introduce LongFilter, a framework for curating training data tailored to long-context pretraining. LongFilter measures the information gain provided by extended context by contrasting model predictions under long-context versus short-context settings, thereby identifying samples where long-range dependencies are essential. 
Experiments with LLaMA-3-8B, extending its context length from 8K to 64K, show that LongFilter efficiently selects high-quality data and yields....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:wzqr35ixfoq51da84i3meyhs","title":"Improving Language Model Personas via Rationalization with Psychological Scaffolds","url":"https://machinelearning.apple.com/research/psychological-scaffolds","published":"2025-10-28","authors":["Brihi Joshi","Xiang Ren","Swabha Swayamdipta","Rik Koncel-Kedziorski","Tim Paek"],"abstract":"Language models prompted with a user description or persona are being used to predict the user's preferences and opinions. However, existing approaches to building personas mostly rely on a user's demographic attributes and/or prior judgments, but not on any underlying reasoning behind a user's judgments. 
We introduce PB&J (Psychology of Behavior and Judgments), a framework that improves LM personas by incorporating potential rationales for why...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415622515","title":"Open Spatio-Temporal Foundation Models for Traffic Prediction","url":"https://doi.org/10.1145/3773912","published":"2025-10-28","authors":["Zhonghang Li","Long Xia","Lei Shi","Yong Xu","Dawei Yin","Chao Huang"],"abstract":"Accurate traffic forecasting is crucial for effective urban planning and transportation management, enabling efficient resource allocation and enhanced travel experiences. However, existing models often face limitations in generalization, struggling with zero-shot prediction on unseen regions and cities, as well as diminished long-term accuracy. This is primarily due to the inherent challenges in handling the spatial and temporal heterogeneity of traffic data, coupled with the significant distribution shift across time and space. In this work, we aim to unlock new possibilities for building versatile, resilient and adaptive spatio-temporal foundation models for traffic prediction. We introduce OpenCity, a foundation model that captures underlying spatio-temporal patterns from diverse data, facilitating zero-shot generalization across urban environments. 
OpenCity integrates Transformers w...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3773912","openalex_id":"https://openalex.org/W4415622515","cited_by_count":7,"quality_score":52,"matched_keywords":["long-term","efficient"],"author_affiliations":["Baidu (China)","South China University of Technology","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8445000052452087},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.47119998931884766},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3961000144481659},{"id":"https://openalex.org/C2779888511","display_name":"Traffic congestion","score":0.38089999556541443},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.37119999527931213},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.35339999198913574},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.3488999903202057},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.34369999170303345}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4415620301","title":"Assessing the effectiveness of recent closed-source large language models in fault localization and automated program repair","url":"https://doi.org/10.1007/s10515-025-00549-x","published":"2025-10-28","authors":["Bo Wang","Ming Deng","Mingda Chen","Youfang Lin","Jianyi Zhou","Jie M. 
Zhang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10515-025-00549-x","openalex_id":"https://openalex.org/W4415620301","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Beijing Academy of Artificial Intelligence","Beijing Jiaotong University","Cloud Computing Center","Huawei Technologies (China)","King's College London"],"concepts":[{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5023000240325928},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4927999973297119},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4893999993801117},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4424000084400177},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.4242999851703644},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.3950999975204468},{"id":"https://openalex.org/C110875604","display_name":"The Internet","score":0.38679999113082886},{"id":"https://openalex.org/C548217200","display_name":"Java","score":0.37049999833106995}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4416501801","title":"Hello, GenAI? 
Dissecting Human to Generative AI Calling","url":"https://doi.org/10.1145/3730567.3764441","published":"2025-10-28","authors":["Ruizhi Cheng","Surendra Pathak","Guowu Xie","Matteo Varvello","Songqing Chen","Bo Han"],"abstract":"The rise of generative artificial intelligence (GenAI), powered by large language models, has led to the emergence of real-time, voice-based conversational applications that enable dynamic, multi-modal interactions for everyday tasks such as checking the weather or planning a trip. These human-to-GenAI calling applications blend speech processing, generative intelligence, and real-time communication, presenting new challenges in latency optimization, network infrastructure design, and resilience under load. Despite their growing popularity, little is known about the operational characteristics and performance of these applications. This paper conducts an empirical measurement of six human-to-GenAI calling applications from Google, Meta, Microsoft, and OpenAI, focusing on their input/output modalities, network behavior, latency metrics, and robustness. 
Our findings reveal key design choic...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3730567.3764441","openalex_id":"https://openalex.org/W4416501801","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["George Mason University","Meta (United States)","Nokia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7055000066757202},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6678000092506409},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.6039999723434448},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.49889999628067017},{"id":"https://openalex.org/C2779585090","display_name":"Resilience (materials science)","score":0.39989998936653137},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.3734000027179718},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34940001368522644},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.2847000062465668}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.24701","title":"Tongyi DeepResearch Technical Report","url":"https://huggingface.co/papers/2510.24701","published":"2025-10-28","authors":["Tongyi DeepResearch Team","Baixuan Li","Bo Zhang","Dingchu Zhang","Fei Huang","Guangyu Li","Guoxin Chen","Huifeng Yin","Jialong Wu","Jingren Zhou","Kuan Li","Liangcai Su"],"abstract":"We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. 
To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Human...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["language model"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/empowering-agentic-video-analytics-systems-with-video-language-models","title":"AVA: Towards Agentic Video Analytics with Vision Language Models","url":"https://www.microsoft.com/en-us/research/publication/empowering-agentic-video-analytics-systems-with-video-language-models/","published":"2025-10-27","authors":["Yuxuan Yan","Shiqi Jiang","Ting Cao","Yifan Yang","Qianqian Yang","Yuanchao Shu","Yuqing Yang","Lili Qiu"],"abstract":"AI-driven video analytics has become increasingly important across diverse domains. However, existing systems are often constrained to specific, predefined tasks, limiting their adaptability in open-ended analytical scenarios. 
The recent emergence of Vision Language Models (VLMs) as transformative technologies offers significant potential for enabling open-ended video understanding, reasoning, and analytics. Nevertheless, their limited context windows present challenges when processing ultra-long video content, which is prevalent in real-world applications. To address this, we introduce AVA, a VLM-powered system designed for open-ended, advanced video analytics. AVA incorporates two key innovations: (1) the near real-time construction of Event Knowledge Graphs (EKGs) for efficient indexing of long or continuous video streams, and (2) an agentic retrieval-generation mechanism that leverag...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sequences-of-logits-reveal-the-low-rank-structure-of-language-models","title":"Sequences of Logits Reveal the Low Rank Structure of Language Models","url":"https://www.microsoft.com/en-us/research/publication/sequences-of-logits-reveal-the-low-rank-structure-of-language-models/","published":"2025-10-27","authors":["Noah Golowich","Allen Liu","Abhishek Shetty"],"abstract":"A major problem in the study of large language models is to understand their inherent low-dimensional structure. 
We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from the model's logits for varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation -- in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated, or even nonsensical prompts. On the theoretical front, we observe that studying the approximate rank of language models in the sense discussed above yields a simple universal abstraction whose theoretical predic...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2510.23095","title":"Revisiting Multimodal Positional Encoding in Vision-Language 
Models","url":"https://huggingface.co/papers/2510.23095","published":"2025-10-27","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"apple:e00yxpamvewkld61v1sef3s3","title":"Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices","url":"https://machinelearning.apple.com/research/memory-efficient-backpropagation","published":"2025-10-27","authors":["Congzheng Song","Xinyu Tang"],"abstract":"Fine-tuning large language models (LLMs) with backpropagation — even for a subset of parameters such as LoRA — can be much more memory-consuming than inference and is often deemed impractical for resource-constrained mobile devices. Alternative methods, such as zeroth-order optimization (ZO), can greatly reduce the memory footprint but come at the cost of significantly slower model convergence (10× to 100× more steps than backpropagation). 
We...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["memory","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:go61vx5omweqjr46kqzw1vf0","title":"PrimeX: A Dataset of Worldview, Opinion, and Explanation","url":"https://machinelearning.apple.com/research/primex","published":"2025-10-27","authors":["Rik Koncel-Kedziorski","Brihi Joshi","Tim Paek"],"abstract":"As the adoption of language models advances, so does the need to better represent individual users to the model. Are there aspects of an individual's belief system that a language model can utilize for improved alignment? Following prior research, we investigate this question in the domain of opinion prediction by developing PrimeX, a dataset of public opinion survey data from 858 US residents with two additional sources of belief information:...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:tiizhmxlvvh70nlb40cwe07f","title":"Leveraging the Power of Large Language Models in Entity Linking via Adaptive Routing and Targeted 
Reasoning","url":"https://machinelearning.apple.com/research/leveraging-power","published":"2025-10-27","authors":["Yajie Li","Albert Galimov","Mitra Datta Ganapaneni","Pujitha Thejaswi","De Meng","Priyanshu Kumar","Saloni Potdar"],"abstract":"Entity Linking (EL) has traditionally relied on large annotated datasets and extensive model fine-tuning. While recent few-shot methods leverage large language models (LLMs) through prompting to reduce training requirements, they often suffer from inefficiencies due to expensive LLM-based reasoning. ARTER (Adaptive Routing and Targeted Entity Reasoning) presents a structured pipeline that achieves high performance without deep fine-tuning by...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:mmctzjs87jr65jk0y2ielvnt","title":"Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?","url":"https://machinelearning.apple.com/research/breaking-down","published":"2025-10-27","authors":["Bo Feng°","Zhengfeng Lai°","Shiyu Li","Zizhen Wang°","Simon Wang","Ping Huang","Meng Cao"],"abstract":"This paper was accepted at the Evaluating the Evolving LLM Lifecycle Workshop at NeurIPS 
2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:b988ed48eef397fa","title":"Addendum to GPT-5 System Card: Sensitive conversations","url":"https://openai.com/index/gpt-5-system-card-sensitive-conversations","published":"2025-10-27","authors":["OpenAI"],"abstract":"This system card details GPT-5’s improvements in handling sensitive conversations, including new benchmarks for emotional reliance, mental health, and jailbreak resistance.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:ykr45ej80i68p2hpxecxsm5s","title":"Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing","url":"https://machinelearning.apple.com/research/pico-banana","published":"2025-10-27","authors":["Yusu Qian","Eli Bocek-Rivele","Liangchen Song","Jialing Tong","Yinfei Yang","Jiasen Lu","Wenze Hu","Zhe Gan"],"abstract":"Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. 
However, the research community’s progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. Our dataset...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:npegb5u25ecyycm5htfizqht","title":"ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs","url":"https://machinelearning.apple.com/research/odke","published":"2025-10-27","authors":["Samira Khorshidi","Azadeh Nikfarjam","Suprita Shankar","Yisi Sang","Yash Govind","Hyun Jang","Ali Kasgari","Alexis McClimans","Mohamed Soliman","Vishnu Konda","Ahmed Fakhry","Xiaoguang Qi"],"abstract":"Knowledge graphs (KGs) are foundational to many AI applications, but maintaining their freshness and completeness remains costly. We present ODKE+, a production-grade system that automatically extracts and ingests millions of open-domain facts from web sources with high precision. 
ODKE+ combines modular components into a scalable pipeline: (1) the Extraction Initiator detects missing or stale facts, (2) the Evidence Retriever collects supporting...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:m92847xngwampdoqoqerezcw","title":"Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection","url":"https://machinelearning.apple.com/research/hallucination-detection","published":"2025-10-27","authors":["Atharva Kulkarni","Yuan Zhang","Joel Ruben Antony Moniz","Xiou Ge","Bo-Hsiang Tseng","Dhivya Piraviperumal","Swabha Swayamdipta","Hong Yu"],"abstract":"Hallucinations pose a significant obstacle to the reliability and widespread adoption of language models, yet their accurate measurement remains a persistent challenge. While many task- and domain-specific metrics have been proposed to assess faithfulness and factuality concerns, the robustness and generalization of these metrics are still untested. 
In this paper, we conduct a large-scale empirical evaluation of 6 diverse sets of hallucination...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415594438","title":"SemanticLog: Towards Effective and Efficient Large-Scale Semantic Log Parsing","url":"https://doi.org/10.1109/tse.2025.3625121","published":"2025-10-27","authors":["Chenbo Zhang","Wenying Xu","Jinbu Liu","Lu Zhang","Guiyang Liu","Jihong Guan","Qi Zhou","Shuigeng Zhou"],"abstract":"Logs of large-scale cloud systems record diverse system events, ranging from routine statuses to critical errors. As the fundamental step of automated log analysis, log parsing is to transform unstructured logs into structured data for easier management and analysis. However, existing syntax-based and deep learning-based parsers struggle with complex real-world logs. Recent parsers based on large language models (LLMs) achieve higher accuracy, but they typically rely on online APIs (e.g., ChatGPT), raising privacy concerns and suffering from network latency. Moreover, with the rise of artificial intelligence for IT operations (AIOps), traditional parsers that focus on syntax-level templates fail to capture the semantics of dynamic log parameters, limiting their usefulness for downstream tasks. 
These challenges highlight the need for semantic log parsing that goes beyond template extracti...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tse.2025.3625121","openalex_id":"https://openalex.org/W4415594438","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","Fudan University","Tongji University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9085000157356262},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.8440999984741211},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5726000070571899},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4756999909877777},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.42010000348091125},{"id":"https://openalex.org/C2781466058","display_name":"Parse tree","score":0.4169999957084656},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.3982999920845032},{"id":"https://openalex.org/C113174947","display_name":"Tree (set theory)","score":0.39250001311302185}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415595662","title":"Leveraging historical information to boost retrieval-augmented generation in conversations","url":"https://doi.org/10.1016/j.ipm.2025.104449","published":"2025-10-27","authors":["Fengran Mo","Yifan Gao","Zhuofeng Wu","Xin Liu","P. S. 
Chen","Zheng Li","Zhengyang Wang","Xian Li","Meng Jiang","Jian‐Yun Nie"],"abstract":"Multi-turn interactions between users and information-seeking systems have become a popular paradigm to satisfy complex information needs via a flexible interface and context understanding capacity. However, existing methods primarily adapt single-turn retrieval-augmented generation (RAG) pipelines to conversational settings without effectively incorporating historical information, such as previous search results, turn dependency, and historical evidence grounding. To effectively manage and utilize the information in conversations, we explore the feasibility of boosting response generation by leveraging historical information and propose several strategies to incorporate this information individually or in combination. We conduct experiments on three widely used conversational search benchmarks, each containing thousands of samples. Our method consistently outperforms previous strong bas...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.ipm.2025.104449","openalex_id":"https://openalex.org/W4415595662","cited_by_count":2,"quality_score":43,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)","University of Notre Dame","Université de Montréal"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8087000250816345},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.5817999839782715},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.4781000018119812},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4196999967098236},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.32510000467300415},{"id":"https://openalex.org/C2985684807","display_name":"Text generation","score":0.3095000088214874},{"id":"https://openalex.org/C180198813","display_name":"Information system","score":0.2962999939918518},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2897999882698059}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4415596557","title":"Integrating Genomics into Multimodal EHR Foundation Models","url":"https://doi.org/10.1101/2025.10.26.684668","published":"2025-10-27","authors":["Jonathan Amar","Enduo Liu","Alessandra Breschi","Liangliang Zhang","Pouya Kheradpour","Shujun Li","Lisa Soleymani Lehmann","Alessandro Giulianelli","Matthew Edwards","Yugang Jia","David Nola","Raghav Mani"],"abstract":"ABSTRACT This paper introduces an innovative Electronic Health Record (EHR) foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality, moving beyond traditional EHR-only approaches to build more holistic health profiles. Leveraging the extensive and diverse data from the All of Us (AoU) Research Program, this multimodal framework aims to learn complex relationships between clinical data and genetic predispositions. The methodology extends advancements in generative AI to the EHR foundation model space, enhancing predictive capabilities and interpretability. Evaluation on AoU data demonstrates the model’s predictive value for the onset of various conditions, particularly Type 2 Diabetes (T2D), and illustrates the interplay between PRS and EHR data. 
The work also explores transfer learning for custom classification tasks, showcasing the architecture’s ver...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.10.26.684668","openalex_id":"https://openalex.org/W4415596557","cited_by_count":2,"quality_score":43,"matched_keywords":["personalized"],"author_affiliations":["Google (United States)","Nvidia (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6425999999046326},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5514000058174133},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5252000093460083},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.46369999647140503},{"id":"https://openalex.org/C189206191","display_name":"Genomics","score":0.43790000677108765},{"id":"https://openalex.org/C32220436","display_name":"Personalized medicine","score":0.40230000019073486},{"id":"https://openalex.org/C2776291640","display_name":"Value (mathematics)","score":0.3767000138759613},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.3765000104904175}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4415593644","title":"Evaluating the Adversarial Robustness of Vision-Language Models via Internal Feature Perturbations","url":"https://doi.org/10.1109/tcsvt.2025.3625396","published":"2025-10-27","authors":["Chaohu Liu","Yubo Wang","Haoyu Cao","Bing Liu","Deqiang Jiang"],"abstract":"Vision-language models (VLMs), such as BLIP-2 and LLaVA, have significantly advanced multimodal understanding but exhibit critical vulnerabilities 
to visual adversarial perturbations. The high efficacy of untargeted attacks, in particular, poses significant concerns for their operational robustness. Conventional attack methods generate adversarial examples by backpropagating the language modeling loss from the final output to the input image. However, the deep architecture of the integrated large language models (LLMs) often diminishes this gradient flow, limiting the attack’s effectiveness in perturbing the visual domain. To address this limitation, we introduce a novel untargeted attack method based on Maximizing Information Entropy (MIE). Our approach enhances attack efficacy not only by maximizing the information entropy of the model’s final output but also by directly inducing uncer...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3625396","openalex_id":"https://openalex.org/W4415593644","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7760999798774719},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.7414000034332275},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7239000201225281},{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of time)","score":0.6186000108718872},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5019999742507935},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.4871000051498413},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4366999864578247},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.40709999203681946}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/aescoder-code-aesthetics-with-agentic-reward-feedback","title":"AESCoder: Code Aesthetics with Agentic Reward Feedback","url":"https://www.microsoft.com/en-us/research/publication/aescoder-code-aesthetics-with-agentic-reward-feedback/","published":"2025-10-26","authors":["Lingjie Jiang","Bang Xiao","Shaohan Huang","Tengchao Lv","Yupan Huang","Xun Wu","Lei Cui","Furu Wei"],"abstract":"Large Language Models (LLMs) have become valuable assistants for developers in code-related tasks. While LLMs excel at traditional programming tasks such as code generation and bug fixing, they struggle with visually-oriented coding tasks, often producing suboptimal aesthetics. In this paper, we introduce a new pipeline to enhance the aesthetic quality of LLM-generated code. We first construct AesCode-358K, a large-scale instruction-tuning dataset focused on code aesthetics. Next, we propose agentic reward feedback, a multi-agent system that evaluates executability, static aesthetics, and interactive aesthetics. Building on this, we develop GRPO-AR, which integrates these signals into the GRPO algorithm for joint optimization of functionality and code aesthetics. Finally, we develop OpenDesign, a benchmark for assessing code aesthetics. 
Experimental results show that combining supervised...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7140863123","title":"Multimodal Learning with Proxy Tokens","url":"https://doi.org/10.1109/ieeeconf67917.2025.11443840","published":"2025-10-26","authors":["M. Reza","Ameya D. Patil","Mashhour Solh","M. Salman Asif"],"abstract":"Multimodal models remain highly sensitive to missing inputs, leading to unstable predictions. We address this common real-world issue with a lightweight method that enables stable predictions even when some modalities are absent. Our approach introduces proxy tokens that act as stand-ins for the missing modality. Instead of adding extra networks for feature generation, each proxy token is optimized to approximate the class token of the missing modality from the available one. We train these proxy tokens efficiently using low-rank adapters added to frozen unimodal encoders with an alignment loss that encourages good feature approximation. Experiments show that our method outperforms prior techniques under both missing and complete modality settings. 
Overall, it provides an efficient and flexible way to build multimodal systems that remain reliable even when some modalities are missing.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ieeeconf67917.2025.11443840","openalex_id":"https://openalex.org/W7140863123","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","University of California, Riverside"],"concepts":[{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.840499997138977},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8082000017166138},{"id":"https://openalex.org/C2780148112","display_name":"Proxy (statistics)","score":0.7332000136375427},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6255999803543091},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6007999777793884},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5958999991416931},{"id":"https://openalex.org/C9357733","display_name":"Missing data","score":0.5687999725341797},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.51910001039505}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416429599","title":"LLM4Verilog: Building Large-Scale, High-Quality Data Infrastructure for Verilog Code Generation via Community Efforts","url":"https://doi.org/10.1109/iccad66269.2025.11240644","published":"2025-10-26","authors":["Zhongzhi Yu","Chaojian Li","Yongan Zhang","Mingjie Liu","Nathaniel Pinckney","Wenfei Zhou","Rongjian Liang","Haoyu Yang","Haoxing Ren","Yingyan Lin"],"abstract":"Despite recent advancements 
in code generation with large language models (LLMs), generating hardware code such as Verilog remains a significant challenge due to the scarcity of large-scale, high-quality datasets in the hardware domain. Existing approaches, including scraping open-source repositories and relying on manually curated datasets, often suffer from limited diversity, quality, and scalability. To address these limitations, we introduce LLM4Verilog, an exploratory, collaborative initiative aimed at constructing a large-scale, high-quality, open-source Verilog dataset. Our initiative integrates a community-driven data collection pipeline with a two-stage data filtering technique to ensure high dataset quality. The first stage removes duplicates and low-quality samples, resulting in a large-scale dataset called LLM4Verilog-complete. The second stage applies an LLM-driven quality s...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccad66269.2025.11240644","openalex_id":"https://openalex.org/W4416429599","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Georgia Institute of Technology","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7893999814987183},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5957000255584717},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.578499972820282},{"id":"https://openalex.org/C2779030575","display_name":"Verilog","score":0.5230000019073486},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.48030000925064087},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.45210000872612},{"id":"https://openalex.org/C124101348","display_name":"Data 
mining","score":0.37689998745918274},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.3617999851703644}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415556596","title":"A community‐driven vision for a new knowledge resource for AI","url":"https://doi.org/10.1002/aaai.70035","published":"2025-10-26","authors":["Vinay K. Chaudhri","Chaitan Baru","Brandon Bennett","Mehul Bhatt","D. G. Cassel","Anthony G. Cohn","Rina Dechter","Esra Erdem","Dave Ferrucci","Ken Forbus","Gregory Gelfond","Michael Genesereth"],"abstract":"Abstract The long‐standing goal of creating a comprehensive, multi‐purpose knowledge resource, reminiscent of the 1984 Cyc project, still persists in AI. Despite the success of knowledge resources like WordNet, ConceptNet, Wolfram|Alpha and other commercial knowledge graphs, verifiable, general‐purpose, widely available sources of knowledge remain a critical deficiency in AI infrastructure. Large language models struggle due to knowledge gaps; robotic planning lacks necessary world knowledge; and the detection of factually false information relies heavily on human expertise. What kind of knowledge resource is most needed in AI today? How can modern technology shape its development and evaluation? A recent AAAI workshop gathered over 50 researchers to explore these questions. This paper synthesizes our findings and outlines a community‐driven vision for a new knowledge infrastructure. 
In....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/aaai.70035","openalex_id":"https://openalex.org/W4415556596","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","American Enterprise Institute","Carnegie Mellon University","Creative Technologies (United States)","Cycorp (United States)","Cytel (United States)","Defense Advanced Research Projects Agency","Educational Testing Service","Irvine University","Kansas State University","Knowledge Based Systems (United States)","NorthShore University HealthSystem","Northwestern University","Oracle (United States)","Philadelphia University","Rensselaer Polytechnic Institute","Robert Bosch (United States)","SRI International","Sabancı Üniversitesi","Stanford University","The Alan Turing Institute","The University of Texas at Austin","The University of Texas at Dallas","Third Way","Turing Institute","U.S. 
National Science Foundation","University of Auckland","University of California, Irvine","University of California, San Francisco","University of Dallas","University of Dayton","University of Leeds","University of Maryland, Baltimore County","University of Nebraska at Omaha","University of North Texas at Dallas","University of Pennsylvania","Wikimedia Foundation","Wright State University","Örebro University"],"concepts":[{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.7300000190734863},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6829000115394592},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.5942000150680542},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.560699999332428},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.5253000259399414},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5209000110626221},{"id":"https://openalex.org/C84685590","display_name":"Knowledge engineering","score":0.47360000014305115},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4562000036239624}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415537706","title":"MuCodec: Ultra Low-Bitrate Music Codec for Music Generation","url":"https://doi.org/10.1145/3746027.3755710","published":"2025-10-25","authors":["Yaoxun Xu","Hangting Chen","Jianwei Yu","Wei Tan","Shun Lei","Zhiwei Lin","Rongzhi Gu","Zhiyong Wu"],"abstract":"Music generation is pivotal in multimedia, aiding creation and lowering the creative threshold. It focuses on generating music with clear vocals and harmonious accompaniment based on lyrics, combining high artistic creativity with technical challenges. 
The music codec is an important bridging component in large language model-based music generation, connecting language models with the generated music. However, existing neural codecs typically require token rates exceeding 50 Hz to achieve acceptable music quality, resulting in a context length that surpasses 12,000 tokens for a 4-minute song-a scale that is computationally demanding. This highlights the need for high-compression, high-fidelity music codecs that can reconstruct both vocals and accompaniment with high quality at low frame rates and bitrates, thereby better assisting music generation. To address this, we introduce MuCodec,....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755710","openalex_id":"https://openalex.org/W4415537706","cited_by_count":1,"quality_score":54,"matched_keywords":["language model","efficient","compression","quantization"],"author_affiliations":["Tencent (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7694000005722046},{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.755299985408783},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5723999738693237},{"id":"https://openalex.org/C73520026","display_name":"Pop music automation","score":0.5253000259399414},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.38690000772476196},{"id":"https://openalex.org/C199833920","display_name":"Vector quantization","score":0.35670000314712524},{"id":"https://openalex.org/C167310288","display_name":"Sound quality","score":0.35100001096725464},{"id":"https://openalex.org/C2779343474","display_name":"Context 
(archaeology)","score":0.34850001335144043}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"apple:n68cch8oviulhfoodlduf8tj","title":"Bias after Prompting: Persistent Discrimination in Large Language Models","url":"https://machinelearning.apple.com/research/persistent-discrimination","published":"2025-10-25","authors":["Nivedha Sivakumar","Natalie Mackraz","Samira Khorshidi","Krishna Patel","Barry-John Theobald","Luca Zappella","Nicholas Apostoloff"],"abstract":"A dangerous assumption that can be made from prior work on the bias transfer hypothesis (BTH) is that biases do not transfer from pre-trained large language models (LLMs) to adapted models. We invalidate this assumption by studying the BTH in causal models under prompt adaptations, as prompting is an extremely popular and accessible adaptation strategy used in real-world applications. In contrast to prior work, we find that biases can transfer...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415538124","title":"MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind","url":"https://doi.org/10.1145/3746027.3755752","published":"2025-10-25","authors":["Zheng Zhang","N. Y. Xiao","Qi Chai","Deheng Ye","Hao Wang"],"abstract":"Large Language Model (LLM) agents have demonstrated impressive capabilities in social deduction games (SDGs) like Werewolf, where strategic reasoning and social deception are essential. 
However, current approaches remain limited to textual information, ignoring crucial multimodal cues such as facial expressions and tone of voice that humans naturally use to communicate. Moreover, existing SDG agents primarily focus on inferring other players' identities without modeling how others perceive themselves or fellow players. To address these limitations, we use One Night Ultimate Werewolf (ONUW) as a testbed and present MultiMind, the first framework integrating multimodal information into SDG agents. MultiMind processes facial expressions and vocal tones alongside verbal content, while employing a Theory of Mind (ToM) model to represent each player's suspicion levels toward others. By combini...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755752","openalex_id":"https://openalex.org/W4415538124","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6816999912261963},{"id":"https://openalex.org/C2779267917","display_name":"Deception","score":0.6266000270843506},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5424000024795532},{"id":"https://openalex.org/C2779560602","display_name":"Theory of mind","score":0.527899980545044},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43529999256134033},{"id":"https://openalex.org/C195704467","display_name":"Facial expression","score":0.3691999912261963},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3564999997615814},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer 
science)","score":0.35519999265670776}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415536809","title":"Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition","url":"https://doi.org/10.1145/3746027.3755423","published":"2025-10-25","authors":["Guanjie Huang","Danny H. K. Tsang","Shan Yang","Guangzhi Lei","Li Liu"],"abstract":"Cued Speech (CS) is a visual communication system that combines lip-reading with hand coding to facilitate communication for individuals with hearing impairments. Automatic CS Recognition (ACSR) aims to convert CS hand gestures and lip movements into text via AI-driven methods. Traditionally, the temporal asynchrony between hand and lip movements requires the design of complex modules to facilitate effective multimodal fusion. However, constrained by limited data availability, current methods demonstrate insufficient capacity for adequately training these fusion mechanisms, resulting in suboptimal performance. Recently, multi-agent systems have shown promising capabilities in handling complex tasks with limited data availability. To this end, we propose the first collaborative multi-agent system for ACSR, named Cued-Agent. 
It integrates four specialized sub-agents: a Multimodal Large Lan...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755423","openalex_id":"https://openalex.org/W4415536809","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","agent","multi-agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8101999759674072},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6349999904632568},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.5335999727249146},{"id":"https://openalex.org/C83195618","display_name":"Cued speech","score":0.5282999873161316},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.47760000824928284},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.4408999979496002},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42910000681877136},{"id":"https://openalex.org/C2779019669","display_name":"Asynchrony (computer programming)","score":0.40610000491142273}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2508.16932","title":"Align 3D Representation and Text Embedding for 3D Content Personalization","url":"http://arxiv.org/abs/2508.16932","published":"2025-10-25","authors":["Qi Song","Ziyuan Luo","Ka Chun Cheung","Simon See","Renjie Wan"],"abstract":"Recent advances in NeRF and 3DGS have significantly enhanced the efficiency and quality of 3D content synthesis. However, efficient personalization of generated 3D content remains a critical challenge. 
Current 3D personalization approaches predominantly rely on knowledge distillation-based methods, which require computationally expensive retraining procedures. To address this challenge, we propose Invert3D, a novel framework for convenient 3D content personalization. Nowadays, vision-language models such as CLIP enable direct image personalization through aligned vision-text embedding spaces. However, the inherent structural differences between 3D content and 2D images preclude direct application of these techniques to 3D personalization. Our approach bridges this gap by establishing alignment between 3D representations and text embedding spaces. Specifically, we develop a camera-conditi...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758145","openalex_id":"https://openalex.org/W4415536638","cited_by_count":0,"quality_score":49,"matched_keywords":["personalization","efficient","distillation"],"author_affiliations":["Hong Kong Baptist University","Nvidia (United Kingdom)","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.8695999979972839},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7570000290870667},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7530999779701233},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5541999936103821},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.47290000319480896},{"id":"https://openalex.org/C2778712577","display_name":"Retraining","score":0.4318000078201294},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.42899999022483826},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.41350001096725464}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415538812","title":"MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks","url":"https://doi.org/10.1145/3746027.3758225","published":"2025-10-25","authors":["Lei Zhang","Xin Zhou","Chaoyue He","Di Wang","Yi Wu","Hong Xu","Wei Liu","Chunyan Miao"],"abstract":"Environmental, Social, and Governance (ESG) reports are essential for assessing sustainability, regulatory compliance, and financial transparency. However, these documents are typically long, multimodal, and structurally complex, combining dense text, tables, figures, and layout-sensitive semantics. Existing AI systems often struggle to perform reliable document-level reasoning in such settings, and no dedicated benchmark currently exists in ESG domain. To fill the gap, we introduce MMESGBench, a first-of-its-kind benchmark dataset targeted to evaluate multimodal understanding and reasoning across multi-source ESG documents. This dataset is constructed via a human-AI collaborative, multi-stage pipeline. First, a multimodal LLM generates candidate question-answer (QA) pairs by jointly interpreting textual, tabular, and visual information from layout-aware document pages. 
Second, an LLM ve...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758225","openalex_id":"https://openalex.org/W4415538812","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Alibaba Group (China)","Nanyang Technological University","University College London"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.8575000166893005},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7674999833106995},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5823000073432922},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5616999864578247},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5382999777793884},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.46160000562667847},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.42329999804496765},{"id":"https://openalex.org/C20162079","display_name":"Case-based reasoning","score":0.35519999265670776}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4415537389","title":"SaP-Bot: A Multimodal Large-Language Model for End-to-End Same-Product Identification","url":"https://doi.org/10.1145/3746027.3754835","published":"2025-10-25","authors":["Yixuan Zhou","Yulu Tian","Wenliang Zhong","Xingbin Yu","Heng Tao Shen","Xing Xu"],"abstract":"Same-product identification serves as a critical infrastructure in e-commerce systems, enabling accurate product matching across heterogeneous marketing 
representations for key applications such as price comparison and personalized recommendation. Conventional approaches typically depend on manual feature engineering and extensive rule tuning, which limits their adaptability to varying identification criteria across different product categories and inconsistent business scenarios. To overcome these challenges, we propose an end-to-end same-product identification model powered by multimodal large language models (MLLMs) that inherently support multimodal alignment and exhibit strong generalization across diverse real-world settings. We first introduce a novel group-wise annotation pipeline to construct a high-quality dataset, consisting of diverse product pairs with multimodal presentatio...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754835","openalex_id":"https://openalex.org/W4415537389","cited_by_count":1,"quality_score":46,"matched_keywords":["language model","personalized"],"author_affiliations":["Alibaba Group (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7150999903678894},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.6949999928474426},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6751000285148621},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.6061999797821045},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5694000124931335},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.5568000078201294},{"id":"https://openalex.org/C43521106","display_name":"Pipeline 
(software)","score":0.5511000156402588},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5493999719619751}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415536393","title":"TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP","url":"https://doi.org/10.1145/3746027.3755836","published":"2025-10-25","authors":["Fan Li","Zanyi Wang","Zeyi Huang","Guang Dai","Jingdong Wang","Mengmeng Wang"],"abstract":"3D visual grounding allows an embodied agent to understand visual information in real-world 3D environments based on human instructions, which is crucial for embodied intelligence. Existing 3D visual grounding methods typically rely on separate encoders for different modalities (e.g., RGB images, text, and 3D point clouds), resulting in large and complex models that are inefficient to train. While some approaches use pre-trained 2D multi-modal models like CLIP for 3D tasks, they still struggle with aligning point cloud data to 2D encoders. As a result, these methods continue to depend on 3D encoders for feature extraction, further increasing model complexity and training inefficiency. In this paper, we propose a unified 2D pre-trained multi-modal network to process all three modalities (RGB images, text, and point clouds), significantly simplifying the architecture. 
By leveraging a 2D CL...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755836","openalex_id":"https://openalex.org/W4415536393","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","agent"],"author_affiliations":["Hefei University of Technology","Huawei Technologies (China)","State Grid Corporation of China (China)","Xi'an Jiaotong University","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7682999968528748},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6869000196456909},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.5986999869346619},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5720999836921692},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5232999920845032},{"id":"https://openalex.org/C141353440","display_name":"Fuse (electrical)","score":0.5199999809265137},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4878000020980835},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.4844000041484833}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415538187","title":"LVLM-MIR: Large Vision-Language Model with Parameter-Efficient Fine-Tuning for Multimodal Interleaved Reasoning","url":"https://doi.org/10.1145/3746027.3762002","published":"2025-10-25","authors":["Jun Yu","Xilong Lu","Cong Wang","Qiang Ling"],"abstract":"Multimodal interleaved reasoning, which requires models to understand interleaved image-text sequences and multiple images, is a 
critical challenge in contemporary AI. This paper proposes a parameter-efficient fine-tuning framework based on Large Vision-Language Models, with Qwen2.5-VL as the backbone and Low-Rank Adaptation for task-specific adaptation. The framework integrates four stages: multimodal input preprocessing to align with pre-training distributions, visual feature extraction via a modified Vision Transformer, cross-modal fusion via attention mechanisms, and response generation via an autoregressive decoder. By freezing pre-trained weights and fine-tuning low-rank adapters in both visual and language modules, it balances preserving general multimodal knowledge with optimizing target tasks, achieving high performance with low computational overhead. On the MIRAGE Challenge Tr...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3762002","openalex_id":"https://openalex.org/W4415538187","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","efficient"],"author_affiliations":["Huawei Technologies (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7976999878883362},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5982999801635742},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5503000020980835},{"id":"https://openalex.org/C34736171","display_name":"Preprocessor","score":0.5320000052452087},{"id":"https://openalex.org/C4679612","display_name":"Aggregate (composite)","score":0.5291000008583069},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.4925999939441681},{"id":"https://openalex.org/C18555067","display_name":"Joint 
(building)","score":0.4165000021457672},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.36809998750686646}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540312","title":"Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models","url":"https://doi.org/10.1145/3746027.3755220","published":"2025-10-25","authors":["Zejian Li","Yize Li","Chenye Meng","Zhongni Liu","L. Yang","Shengyuan Zhang","Guang Yang","Changyuan Yang","Zhiyuan Yang","Lingyun Sun"],"abstract":"Recent advancements in diffusion models (DMs) have been propelled by alignment methods that post-train models to better conform to human preferences. However, these approaches typically require computation-intensive training of a base model and a reward model, which not only incurs substantial computational overhead but may also compromise model accuracy and training efficiency. To address these limitations, we propose Inversion-DPO, a novel alignment framework that circumvents reward modeling by reformulating Direct Preference Optimization (DPO) with DDIM inversion for DMs. Our method conducts intractable posterior sampling in Diffusion-DPO with the deterministic inversion from winning and losing samples to noise and thus derive a new post-training paradigm. 
This paradigm eliminates the need for auxiliary reward models or inaccurate appromixation, significantly enhancing both precision....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755220","openalex_id":"https://openalex.org/W4415540312","cited_by_count":0,"quality_score":45,"matched_keywords":["preference","efficient"],"author_affiliations":["Alibaba Group (China)","Ningbo University","Peking University","University of Electronic Science and Technology of China","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7592999935150146},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5289000272750854},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5059000253677368},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5027999877929688},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4593999981880188},{"id":"https://openalex.org/C1893757","display_name":"Inversion (geology)","score":0.42590001225471497},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.42239999771118164},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.4194999933242798}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415538210","title":"Harnessing Multimodal Large Language Models for Personalized Product Search with Query-aware Refinement","url":"https://doi.org/10.1145/3746027.3754898","published":"2025-10-25","authors":["Beibei Zhang","Yanan Lu","Ruobing Xie","Zongyi Li","Siyuan Xing","Tongwei Ren","Fen 
Lin"],"abstract":"Personalized product search (PPS) aims to retrieve products relevant to the given query considering user preferences within their purchase histories. Since large language models (LLM) exhibit impressive potential in content understanding and reasoning, current methods explore to leverage LLM to comprehend the complicated relationships among user, query and product to improve the search performance of PPS. Despite the progress, LLM-based PPS solutions merely take textual contents into consideration, neglecting multimodal contents which play a critical role for product search. Motivated by this, we propose a novel framework, HMPPS, for Harnessing Multimodal large language models (MLLM) to deal with Personalized Product Search based on multimodal contents. Nevertheless, the redundancy and noise in PPS input stand for a great challenge to apply MLLM for PPS, which not only misleads MLLM to g...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754898","openalex_id":"https://openalex.org/W4415538210","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","personalized"],"author_affiliations":["Huazhong University of Science and Technology","Nanjing University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8113999962806702},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.729200005531311},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5799999833106995},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5763000249862671},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.5437999963760376},{"id":"https://openalex.org/C152124472","display_name":"Redundancy (engineering)","score":0.5292999744415283},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.515500009059906},{"id":"https://openalex.org/C97854310","display_name":"Search engine","score":0.4235000014305115}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415536586","title":"EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation","url":"https://doi.org/10.1145/3746027.3755083","published":"2025-10-25","authors":["Jingzehua Xu","Kunzhe Huang","Xinyi Zou","Yunkuo Chen","Bo Liu","Mengli Cheng","Jun Huang","Xing Shi"],"abstract":"This paper introduces EasyAnimate, an efficient and high quality video generation framework that leverages diffusion transformers to achieve high-quality video production, encompassing data processing, model training, and end-to-end inference. Despite substantial advancements achieved by video diffusion models, existing video generation models still struggles with slow generation speeds and less-than-ideal video quality. To improve training and inference efficiency without compromising performance, we propose Hybrid Window Attention. We design the multidirectional sliding window attention in Hybrid Window Attention, which provides stronger receptive capabilities in 3D dimensions compared to naive one, while reducing the model's computational complexity as the video sequence length increases. 
To enhance video generation quality, we optimize EasyAnimate using reward backpropagation to bett...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755083","openalex_id":"https://openalex.org/W4415536586","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","efficient"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8452000021934509},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.6647999882698059},{"id":"https://openalex.org/C155032097","display_name":"Backpropagation","score":0.5742999911308289},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5037000179290771},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4542999863624573},{"id":"https://openalex.org/C102392041","display_name":"Sliding window protocol","score":0.41769999265670776},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.41620001196861267},{"id":"https://openalex.org/C103910844","display_name":"Video quality","score":0.3695000112056732}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415539943","title":"Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts","url":"https://doi.org/10.1145/3746027.3758288","published":"2025-10-25","authors":["Xiangnan Chen","Yue Fang","Juncheng Li","Qian Xiao","Jun Lin","Siliang Tang","Yueting Zhuang"],"abstract":"Multimodal Large Language Models (MLLMs) have garnered significant attention for their strong visual-semantic understanding. 
Most existing chart benchmarks evaluate MLLMs' ability to parse information from charts to answer questions. However, they overlook the inherent output biases of MLLMs, where models rely on their parametric memory to answer questions rather than genuinely understanding the chart content. To address this limitation, we introduce a novel Chart Hypothetical Question Answering (HQA) task, which imposes assumptions on the same question to compel models to engage in counterfactual reasoning based on the chart content. Furthermore, we introduce HAI, a human-AI interactive data synthesis approach that leverages the efficient text-editing capabilities of LLMs alongside human expert knowledge to generate diverse and high-quality HQA data at a low cost. Using HAI, we construc...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758288","openalex_id":"https://openalex.org/W4415539943","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","efficient"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7813000082969666},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.7487000226974487},{"id":"https://openalex.org/C51929080","display_name":"Codebase","score":0.6240000128746033},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.609499990940094},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5407000184059143},{"id":"https://openalex.org/C108650721","display_name":"Counterfactual 
thinking","score":0.5356000065803528},{"id":"https://openalex.org/C190812933","display_name":"Chart","score":0.4927999973297119},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48590001463890076}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415541184","title":"Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric","url":"https://doi.org/10.1145/3746027.3754731","published":"2025-10-25","authors":["Zhichao Zhang","Wei Sun","Xinyue Li","Yunhao Li","Qihang Ge","Jun Jia","Zicheng Zhang","Zhongpeng Ji","Fengyu Sun","Shangling Jui","Xiongkuo Min","Guangtao Zhai"],"abstract":"AI-driven video generation techniques have made significant progress in recent years. However, AI-generated videos (AGVs) involving human activities often exhibit substantial visual and semantic distortions, hindering the practical application of video generation technologies in real-world scenarios. To address this challenge, we conduct a pioneering study on human activity AGV quality assessment, focusing on visual quality evaluation and the identification of semantic distortions. First, we construct the AI-Generated Human activity Video Quality Assessment (Human-AGVQA) dataset, consisting of 6,000 AGVs derived from 15 popular text-to-video (T2V) models using 400 text prompts that describe diverse human activities. 
We conduct a subjective study to evaluate the human appearance quality, action continuity quality, and overall video quality of AGVs, and identify semantic issues of human bo...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754731","openalex_id":"https://openalex.org/W4415541184","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["East China Normal University","Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.8539000153541565},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7594000101089478},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.7490000128746033},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7365000247955322},{"id":"https://openalex.org/C103910844","display_name":"Video quality","score":0.554099977016449},{"id":"https://openalex.org/C2779346075","display_name":"Quality Score","score":0.5166000127792358},{"id":"https://openalex.org/C114227958","display_name":"Subjective video quality","score":0.5117999911308289},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49300000071525574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4415540597","title":"Twin Co-Adaptive Dialogue for Progressive Image Generation","url":"https://doi.org/10.1145/3746027.3755141","published":"2025-10-25","authors":["Jianhui Wang","Yangfan He","Yan Zhong","Xinyuan Song","Jiayi Su","Yuheng Feng","Ruoyu Wang","Hongyang He","Wenyu Zhu","Xinhang Yuan","Miao Zhang","Keqin 
Li"],"abstract":"Modern text-to-image generation systems have enabled the creation of remarkably realistic and high-quality visuals, yet they often falter when handling the inherent ambiguities in user prompts. In this work, we present Twin-Co, a framework that leverages synchronized, co-adaptive dialogue to progressively refine image generation. Instead of a static generation process, Twin-Co employs a dynamic, iterative workflow where an intelligent dialogue agent continuously interacts with the user. Initially, a base image is generated from the user's prompt. Then, through a series of synchronized dialogue exchanges, the system adapts and optimizes the image according to evolving user feedback. The co-adaptive process allows the system to progressively narrow down ambiguities and better align with user intent. Experiments demonstrate that Twin-Co not only enhances user experience by reducing trial-an...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755141","openalex_id":"https://openalex.org/W4415540597","cited_by_count":1,"quality_score":42,"matched_keywords":["agent"],"author_affiliations":["Emory University","Google (United States)","Hong Kong Polytechnic University","Peking University","Saint Louis University","Tsinghua University","Twin Cities Orthopedics","University of Electronic Science and Technology of China","University of Minnesota","University of Toronto","University of Warwick","Xiamen University Malaysia"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7965999841690063},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6938999891281128},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.6933000087738037},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5742999911308289},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.567300021648407},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5547000169754028},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.47870001196861267},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4560999870300293}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415540510","title":"The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework","url":"https://doi.org/10.1145/3746027.3755643","published":"2025-10-25","authors":["Feiran Liu","Yuzhe Zhang","Xinyi Huang","Yinan Peng","Xinfeng Li","Lixu Wang","Yutong Shen","Ranjie Duan","Simeng Qin","Xiaojun Jia","Qingsong Wen","Wei Dong"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755643","openalex_id":"https://openalex.org/W4415540510","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Beijing University of Technology","Bellevue Hospital Center","Hengyang Academy of Agricultural Sciences","Nanyang Technological University","Northeastern University","Northwestern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6083999872207642},{"id":"https://openalex.org/C187191949","display_name":"Profiling (computer 
programming)","score":0.586899995803833},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3441999852657318},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3407000005245209},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.31119999289512634},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.2955999970436096},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.2865999937057495},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.2773999869823456}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415538289","title":"ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations","url":"https://doi.org/10.1145/3746027.3762073","published":"2025-10-25","authors":["Shiye Cao","Maia Stiber","Amama Mahmood","Maria Teresa Parreira","Wendy Ju","Micol Spitale","Hatice Güneş","Chien‐Ming Huang"],"abstract":"The integration of large language models (LLMs) into conversational robots has made human-robot conversations more dynamic. Yet, LLM-powered conversational robots remain prone to errors, e.g., misunderstanding user intent, prematurely interrupting users, or failing to respond altogether. Detecting and addressing these failures is critical for preventing conversational breakdowns, avoiding task disruptions, and sustaining user trust. To tackle this problem, the ERR@HRI 2.0 Challenge provides a multimodal dataset of LLM-powered conversational robot failures during human-robot conversations and encourages researchers to benchmark machine learning models designed to detect robot failures. 
The dataset includes 16 hours of dyadic human-robot interactions, incorporating facial, speech, and head movement features. Each interaction is annotated with the presence or absence of robot errors from th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3762073","openalex_id":"https://openalex.org/W4415538289","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Cornell University","Johns Hopkins University","Microsoft (United States)","Politecnico di Milano","University of Cambridge"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7469000220298767},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6589000225067139},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.6575999855995178},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5924999713897705},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5737000107765198},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5580000281333923},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.46059998869895935},{"id":"https://openalex.org/C145460709","display_name":"Human–robot interaction","score":0.4415000081062317}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415538402","title":"Debiasing Multimodal Large Language Models via Penalization of Language Priors","url":"https://doi.org/10.1145/3746027.3755364","published":"2025-10-25","authors":["Yifan Zhang","Yang Shi","Weichen Yu","Qingsong Wen","Xue Wang","Wenjing Yang","Zhang 
Zhang","Liang Wang","Rong Jin"],"abstract":"In the realms of computer vision and natural language processing, Multimodal Large Language Models (MLLMs) have become indispensable tools, proficient in generating textual responses based on visual inputs. Despite their advancements, our investigation reveals a noteworthy bias: the generated content is often driven more by the inherent priors of the underlying Large Language Models (LLMs) than by the input image. Empirical experiments underscore the persistence of this bias, as MLLMs often provide confident answers even in the absence of relevant images or given incongruent visual inputs. To rectify these biases and redirect the model's focus toward visual information, we propose two simple, training-free strategies. First, for tasks such as classification or multi-choice question answering, we introduce a ''Post-Hoc Debias'' method using an affine calibration step to adjust the output....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755364","openalex_id":"https://openalex.org/W4415538402","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Bellevue University","Carnegie Mellon University","Chinese Academy of Sciences","National University of Defense Technology","Peking University","Seattle University","Shandong Institute of Automation"],"concepts":[{"id":"https://openalex.org/C2779458634","display_name":"Debiasing","score":0.7825999855995178},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7358999848365784},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.6575000286102295},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5443999767303467},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.5167999863624573},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.492900013923645},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4221999943256378},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4081000089645386}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415540333","title":"ArtFRD: A Fisher-Rao Mixture Metric for Generative Model Aesthetic Evaluation","url":"https://doi.org/10.1145/3746027.3755259","published":"2025-10-25","authors":["Congxiong Huang","Zexi Jia","Hongyan Fei","Yeshuang Zhu","Zhiqiang Yuan","Jinchao Zhang","Jie Zhou"],"abstract":"Recent advances in generative modeling have enabled the synthesis of high-quality artistic images. Nevertheless, systematic evaluation of generative models from an aesthetic standpoint is still lacking, which hinders progress in artistic image synthesis. Existing evaluation metrics, such as Fréchet Inception Distance (FID) and CMMD, struggle with aesthetic assessment: they rely on pretrained visual features that overlook nuanced artistic attributes and employ distance functions ill-suited for modeling the diverse, multi-modal distribution of artistic styles. To address these limitations, we propose ArtFRD, a metric specifically designed for generative aesthetic evaluation. Grounded in aesthetic theory, ArtFRD extracts visual features along four key aesthetic dimensions-brushstroke, composition, lighting, and color-to capture fine-grained artistic properties. 
To model the multi-modal natu...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755259","openalex_id":"https://openalex.org/W4415540333","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7997999787330627},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.7041000127792358},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.565500020980835},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5631999969482422},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5587999820709229},{"id":"https://openalex.org/C184408114","display_name":"Generative Design","score":0.5393999814987183},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.4900999963283539},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.4681999981403351}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2508.17857","title":"VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference","url":"http://arxiv.org/abs/2508.17857","published":"2025-10-25","authors":["Pengfei Jiang","Hanjun Li","Linglan Zhao","Fei Chao","Ke Yan","Shouhong Ding","Rongrong Ji"],"abstract":"In this study, we introduce a novel method called group-wise VI sual token Selection and Aggregation (VISA) to address the issue of inefficient inference stemming from excessive visual tokens in multimoal 
large language models (MLLMs). Compared with previous token pruning approaches, our method can preserve more visual information while compressing visual tokens. We first propose a graph-based visual token aggregation (VTA) module. VTA treats each visual token as a node, forming a graph based on semantic similarity among visual tokens. It then aggregates information from removed tokens into kept tokens based on this graph, producing a more compact visual token representation. Additionally, we introduce a group-wise token selection strategy (GTS) to divide visual tokens into kept and removed ones, guided by text tokens from the final layers of each group. This strategy progressively aggre...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755792","openalex_id":"https://openalex.org/W4415538176","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.8136000037193298},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7998999953269958},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.6348000168800354},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5928000211715698},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5526000261306763},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5425999760627747},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5410000085830688},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5303999781608582}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415537031","title":"Towards Universal Perception through Language-Guided Open-World Object Detection","url":"https://doi.org/10.1145/3746027.3755017","published":"2025-10-25","authors":["Zihan Wang","Yunhang Shen","Yuan Fang","Zuwei Long","Ke Li","Xing Sun","Jiao Xie","Shaohui Lin"],"abstract":"Open-vocabulary object detection seeks to recognize objects from arbitrary language inputs, extending detection beyond fixed training categories. While recent methods have made progress in detecting unseen categories, they typically require a set of predefined categories during the inference stage, hindering practical deployment in open-world scenarios. To overcome this crucial limitation, we propose UniPerception , a novel universal perception framework based on open-vocabulary object detection. It not only excels at open-vocabulary object detection but is also capable of generating labels for target objects in the absence of predefined vocabularies, and can be adapted to a broad range of vision-language tasks simply by modifying the language instructions. 
UniPerception seamlessly integrates three key innovations: 1) a robust visual detector trained on diverse data sources to capture ri...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755017","openalex_id":"https://openalex.org/W4415537031","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["East China Normal University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7800999879837036},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.7329999804496765},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6952000260353088},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5763999819755554},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5375000238418579},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5339999794960022},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.503600001335144},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.492900013923645}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415538104","title":"Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation","url":"https://doi.org/10.1145/3746027.3754869","published":"2025-10-25","authors":["Zhenghao Zhang","Junchao Liao","Xiangyu Meng","Long Qin","W. Wang"],"abstract":"Recent advances in diffusion transformer models for motion-guided video generation, such as Tora, have shown significant progress. 
In this paper, we present Tora2, an enhanced version of Tora, which introduces several design improvements to expand its capabilities in both appearance and motion customization. Specifically, we introduce a decoupled personalization extractor that generates comprehensive personalization embeddings for multiple open-set entities, better preserving fine-grained visual details compared to previous methods. Building on this, we design a gated self-attention mechanism to integrate trajectory, textual description, and visual information for each entity. This innovation significantly reduces misalignment in multimodal conditioning during training. Moreover, we introduce a contrastive loss that jointly optimizes trajectory dynamics and entity consistency through exp...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754869","openalex_id":"https://openalex.org/W4415538104","cited_by_count":0,"quality_score":41,"matched_keywords":["personalization"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7595000267028809},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.6851999759674072},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5360999703407288},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5098000168800354},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5040000081062317},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4025999903678894},{"id":"https://openalex.org/C145565327","display_name":"Motion control","score":0.3864000141620636},{"id":"https://openalex.org/C10161872","display_name":"Motion 
estimation","score":0.33570000529289246}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415537932","title":"Talk, Imagine, Evolve: A Unified Multimodal Agent for Seamless Visual Generation and Editing","url":"https://doi.org/10.1145/3746027.3754467","published":"2025-10-25","authors":["Zhaofan Qiu","Zijian Gong","Yingwei Pan","Ting Yao","Tao Mei"],"abstract":"This paper demonstrates a pioneering unified multimodal agent that transforms complex visual content creation into an intuitive, conversational experience, allowing users to talk, imagine, and evolve their ideas. Overcoming the limitations of fragmented multimodal technique tools, our system seamlessly integrates text-to-image generation, instruction-based image editing, text/image-to-video generation, and interactive understanding within a single AI interface. Users of all skill levels can perform sophisticated visual tasks using natural language and visual inputs. The system's architecture features a central Coordinator module processing multimodal inputs and directing tasks to Generation or Chat pathways. 
For Generation, a Planner utilizes our state-of-the-art specialized models in image/video generation and image editing, while the Chat function facilitates clarification and collabor...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754467","openalex_id":"https://openalex.org/W4415537932","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8241999745368958},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6492999792098999},{"id":"https://openalex.org/C135641252","display_name":"Multimodal interaction","score":0.5601000189781189},{"id":"https://openalex.org/C2779754051","display_name":"Interactive storytelling","score":0.4941999912261963},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.48730000853538513},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.4521999955177307},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4415000081062317},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.4115999937057495}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415539179","title":"SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models","url":"https://doi.org/10.1145/3746027.3758222","published":"2025-10-25","authors":["Zheng Liu","Hao Liang","Bozhou Li","Wentao Xiong","Chong Chen","Conghui He","Wentao Zhang","Bin Cui"],"abstract":"Vision-Language Models (VLMs) have recently emerged, demonstrating remarkable 
vision-understanding capabilities. However, training these models requires large-scale datasets, which brings challenges related to efficiency, effectiveness, and quality of web data. In this paper, we introduce SynthVLM, a new data synthesis and curation method for generating image-caption pairs. Unlike traditional methods, where captions are generated from images, SynthVLM utilizes advanced diffusion models and high-quality captions to synthesize and select images from text captions, thereby creating precisely aligned image-text pairs. We further introduce SynthVLM-100K, a high-quality dataset consisting of 100K curated and synthesized image-caption pairs. In both model and human evaluations, SynthVLM-100K outperforms traditional real-world datasets. Leveraging this dataset, we develop a new family of multimo...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758222","openalex_id":"https://openalex.org/W4415539179","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Peking University","Shanghai Artificial Intelligence Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8296999931335449},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5741000175476074},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.47440001368522644},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4618000090122223},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.42579999566078186},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4088999927043915},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4065000116825104},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.37860000133514404}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415536855","title":"Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation","url":"https://doi.org/10.1145/3746027.3755816","published":"2025-10-25","authors":["Wenrui Liu","Qian Chen","Wen Wang","Guanrou Yang","Weiqin Li","Minghui Fang","Jialong Zuo","Xiaoda Yang","Tao Jin","Jin Xu","Zemin Liu","Ya‐Feng Chen"],"abstract":"Neural audio codecs, used as speech tokenizers, have demonstrated remarkable potential in the field of speech generation. However, to ensure high-fidelity audio reconstruction, neural audio codecs typically encode audio into long sequences of speech tokens, posing a significant challenge for downstream language models in long-context modeling. We observe that speech token sequences exhibit short-range dependency: due to the monotonic alignment between text and speech in text-to-speech (TTS) tasks, the prediction of the current token primarily relies on its local context, while long-range tokens contribute less to the current token prediction and often contain redundant information. 
Inspired by this observation, we propose a compressed-to-fine language modeling approach to address the challenge of long sequence speech tokens within neural codec language models: (1) Fine-grained Initial an...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755816","openalex_id":"https://openalex.org/W4415536855","cited_by_count":0,"quality_score":41,"matched_keywords":["compression"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Shanghai Jiao Tong University","Tsinghua University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8095999956130981},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.8084999918937683},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6682999730110168},{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.5774999856948853},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5622000098228455},{"id":"https://openalex.org/C13895895","display_name":"Speech coding","score":0.4880000054836273},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.42500001192092896},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.4004000127315521}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540898","title":"Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis","url":"https://doi.org/10.1145/3746027.3754745","published":"2025-10-25","authors":["Yifan Yang","Shujie Liu","Jinyu Li","Yuxuan Hu","Haibin Wu","Hui 
Wang","Jianwei Yu","Lingwei Meng","Haiyang Sun","Yanqing Liu","Yan Lu","Kai Yu"],"abstract":"Recent zero-shot text-to-speech (TTS) systems face a common dilemma: autoregressive (AR) models suffer from slow generation and lack duration controllability, while non-autoregressive (NAR) models lack temporal modeling and typically require complex designs. In this paper, we introduce a novel pseudo-autoregressive (PAR) codec language modeling approach that unifies AR and NAR modeling. Combining explicit temporal modeling from AR with parallel generation from NAR, PAR generates dynamic-length spans at fixed time steps. Building on PAR, we propose PALLE, a two-stage TTS system that leverages PAR for initial generation followed by NAR refinement. In the first stage, PAR progressively generates speech tokens along the time dimension, with each step predicting all positions in parallel but only retaining the left-most span. In the second stage, low-confidence tokens are iteratively refined....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754745","openalex_id":"https://openalex.org/W4415540898","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Microsoft (Canada)","Microsoft (United States)","Microsoft Research Asia (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8240000009536743},{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.6338000297546387},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6291999816894531},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6226999759674072},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data 
type)","score":0.5605999827384949},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5424000024795532},{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.4821999967098236},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42829999327659607}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540115","title":"Media integrity and literacy in the age of GenAI & Deepfakes","url":"https://doi.org/10.1145/3746027.3764187","published":"2025-10-25","authors":["Christoph Bregler"],"abstract":"The rise of generative AI has democratized media creation, bringing huge promise but also possible perils. While this may seem like a new problem, the generation and manipulation of media has a long history that predates the current AI boom. I'll discuss key insights from our multi-year analysis of content that people shared online. Looking at manipulations such as deepfakes and cheapfakes, as well as misleading contextual manipulations, I'll reveal surprising statistics that challenge common assumptions about the most prevalent types of problematic media. I'll then explore mitigation strategies, including ways to improve information literacy tools, the opportunities and limitations of using AI to detect manipulated content, and how provenance methods paired with AI can help address out-of-context manipulations. 
Finally, I'll introduce an AI-based tool that can provide additional context...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3764187","openalex_id":"https://openalex.org/W4415540115","cited_by_count":0,"quality_score":41,"matched_keywords":["media"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6455000042915344},{"id":"https://openalex.org/C547764534","display_name":"Literacy","score":0.6168000102043152},{"id":"https://openalex.org/C97628146","display_name":"Media literacy","score":0.5892000198364258},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5307999849319458},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5180000066757202},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.40689998865127563},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.40470001101493835},{"id":"https://openalex.org/C108827166","display_name":"Internet privacy","score":0.3815999925136566}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2604.16313","title":"MARA: A Multimodal Adaptive Retrieval-Augmented Framework for Document Question Answering","url":"https://arxiv.org/abs/2604.16313","published":"2025-10-25","authors":["Hui Wu","Haoquan Zhai","Yuchen Li","Hengyi Cai","Peirong Zhang","Yidan Zhang","Lei Wang","Chunle Wang","Yingyan Hou","Shuaiqiang Wang","Dawei Yin"],"abstract":"Retrieval-based multimodal document QA aims to identify and integrate relevant information from visually rich documents with complex multimodal structures. 
While retrieval-augmented generation (RAG) has shown strong performance in text-based QA, its extensions to multimodal documents remain underexplored and face significant limitations. Specifically, current approaches rely on query-agnostic document representations that overlook salient content and use static top-k evidence selection, which fails to adapt to the uncertain distribution of relevant information. To address these limitations, we propose the Multimodal Adaptive Retrieval-Augmented (MARA) framework, which introduces query-adaptive mechanisms to both retrieval and generation. MARA consists of two components: a Query-Aligned Region Encoder that builds multi-level document representations and reweights them based on query relev...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755390","openalex_id":"https://openalex.org/W4415539505","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Aerospace Information Research Institute","Baidu (China)","Chinese Academy of Sciences","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8101999759674072},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.7218999862670898},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6373999714851379},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.5952000021934509},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.49810001254081726},{"id":"https://openalex.org/C161156560","display_name":"Document 
retrieval","score":0.4916999936103821},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.47769999504089355},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.46070000529289246}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540378","title":"GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts","url":"https://doi.org/10.1145/3746027.3755289","published":"2025-10-25","authors":["Junwen He","Yifan Wang","Lijun Wang","Huchuan Lu","Chenyang Li","H.B. Chen","Jin-Peng Lan","Jun-Yan He","Bin Luo","Yifeng Geng"],"abstract":"Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, this specific task has received limited attention, often overshadowed by broader layout generation tasks such as document or poster design. In this paper, we propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts by integrating multi-modal inputs with user-defined constraints, enabling more flexible and robust layout generation for real-world applications. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously, without compromising performance. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets. 
In addition to....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755289","openalex_id":"https://openalex.org/W4415540378","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Dalian University of Technology","Kingdee (China)"],"concepts":[{"id":"https://openalex.org/C142816647","display_name":"Glyph (data visualization)","score":0.9649999737739563},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7944999933242798},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.6100999712944031},{"id":"https://openalex.org/C2778720087","display_name":"Logo (programming language)","score":0.5403000116348267},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.529699981212616},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4993000030517578},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43810001015663147},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4228000044822693}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415536280","title":"From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models","url":"https://doi.org/10.1145/3746027.3754988","published":"2025-10-25","authors":["Zhaoxi Mu","Rilin Chen","Andong Li","Meng Yu","Xinyu Yang","Dong Yu"],"abstract":"This paper introduces OmniGSE, a novel general speech enhancement (GSE) framework designed to mitigate the diverse distortions that 
speech signals encounter in real-world scenarios. These distortions include background noise, reverberation, bandwidth limitations, signal clipping, and network packet loss. Existing methods typically focus on optimizing for a single type of distortion, often struggling to effectively handle the simultaneous presence of multiple distortions in complex scenarios. OmniGSE bridges this gap by integrating the strengths of discriminative and generative approaches through a two-stage architecture that enables cross-domain collaborative optimization. In the first stage, continuous features are enhanced using a lightweight channel-split NAC-RoFormer. In the second stage, discrete tokens are generated to reconstruct high-quality speech through language models. Specif...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754988","openalex_id":"https://openalex.org/W4415536280","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Bellevue Hospital Center","Chinese Academy of Sciences","Institute of Acoustics","Tencent (China)","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7990999817848206},{"id":"https://openalex.org/C127759330","display_name":"Codebook","score":0.796999990940094},{"id":"https://openalex.org/C2776182073","display_name":"Speech enhancement","score":0.746999979019165},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.6642000079154968},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5468000173568726},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4781999886035919},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.46059998869895935},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4189000129699707}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415536336","title":"Fighting Fire with Fire (F3): A Training-free and Efficient Visual Adversarial Example Purification Method in LVLMs","url":"https://doi.org/10.1145/3746027.3758152","published":"2025-10-25","authors":["Yudong Zhang","Ruobing Xie","Yiqing Huang","Jiansheng Chen","Xingwu Sun","Zhanhui Kang","Di Wang","Yu Wang"],"abstract":"Recent advances in large vision-language models (LVLMs) have showcased their remarkable capabilities across a wide range of multimodal vision-language tasks. However, these models remain vulnerable to visual adversarial attacks, which can substantially compromise their performance. In this paper, we introduce F3, a novel adversarial purification framework that employs a counterintuitive ''fighting fire with fire'' strategy: intentionally introducing simple perturbations to adversarial examples to mitigate their harmful effects. Specifically, F3 leverages cross-modal attentions derived from randomly perturbed adversary examples as reference targets. By injecting noise into these adversarial examples, F3 effectively refines their attention, resulting in cleaner and more reliable model outputs. 
Remarkably, this seemingly paradoxical approach of employing noise to counteract adversarial atta...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758152","openalex_id":"https://openalex.org/W4415536336","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","Tsinghua University","University of Macau","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.9516000151634216},{"id":"https://openalex.org/C41065033","display_name":"Adversary","score":0.7305999994277954},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6740000247955322},{"id":"https://openalex.org/C101097943","display_name":"Counterintuitive","score":0.5328999757766724},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.4489000141620636},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.4108000099658966},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.39980000257492065},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3926999866962433}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540375","title":"FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis","url":"https://doi.org/10.1145/3746027.3755217","published":"2025-10-25","authors":["Mengchao Wang","Qiang Wang","Fan Jiang","Yaqi Fan","Yunpeng Zhang","Yonggang Qi","Kun Zhao","Mu Xu"],"abstract":"Creating a realistic animatable avatar from a single static portrait remains 
challenging. Existing approaches often struggle to capture subtle facial expressions, the associated global body movements, and the dynamic background. To address these limitations, we propose a novel framework that leverages a pretrained video diffusion Transformer model to generate high-fidelity, coherent talking portraits with controllable motion dynamics. At the core of our work is a dual-stage audio-visual alignment strategy. In the first stage, we employ a clip-level training scheme to establish coherent global motion by aligning audio-driven dynamics across the entire scene, including the reference portrait, contextual objects, and background. In the second stage, we refine lip movements at the frame level using a lip-tracing mask, ensuring precise synchronization with audio signals. To preserve identity....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755217","openalex_id":"https://openalex.org/W4415540375","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beijing University of Posts and Telecommunications"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7664999961853027},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.642799973487854},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5909000039100647},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5572999715805054},{"id":"https://openalex.org/C2778562939","display_name":"Synchronization (alternating current)","score":0.5081999897956848},{"id":"https://openalex.org/C48007421","display_name":"Motion 
capture","score":0.414000004529953},{"id":"https://openalex.org/C2777365542","display_name":"Avatar","score":0.40380001068115234},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.36320000886917114}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4415536424","title":"FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching","url":"https://doi.org/10.1145/3746027.3755494","published":"2025-10-25","authors":["Hui Wang","Shujie Liu","Lingwei Meng","Jinyu Li","Yifan Yang","Shiwan Zhao","Haiyang Sun","Yanqing Liu","Haoqin Sun","Jiaming Zhou","Yan Lu","Yong Qin"],"abstract":"To advance continuous token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching. By leveraging the autoregressive nature of language models and the generative efficacy of flow matching, FELLE effectively predicts continuous-valued tokens (mel-spectrograms). For each continuous-valued token, FELLE modifies the general prior distribution in flow matching by incorporating information from the previous step, improving coherence and stability. Furthermore, to enhance synthesis quality, FELLE introduces a coarse-to-fine flow-matching mechanism, generating continuous-valued tokens hierarchically, conditioned on the language model's output. 
Experimental results demonstrate the potential of incorporating flow-matching techniques in autoregressive mel-spectrogram modeling, leading to significant impr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755494","openalex_id":"https://openalex.org/W4415536424","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Microsoft (Finland)","Microsoft (United States)","Microsoft Research Asia (China)","Nankai University"],"concepts":[{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.885200023651123},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7495999932289124},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5723999738693237},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5110999941825867},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5083000063896179},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.4991999864578247},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.4564000070095062},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43290001153945923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415536914","title":"Consistent and Invariant Generalization Learning for Short-video Misinformation Detection","url":"https://doi.org/10.1145/3746027.3755809","published":"2025-10-25","authors":["Hanghui Guo","Weijie Shi","Mengze Li","Juncheng Li","Hao Chen","Yue Cui","Jiajie Xu","Jia Zhu","Jiawei 
Shen","Zhangze Chen","Sirui Han"],"abstract":"Short-video misinformation detection has attracted wide attention in the multi-modal domain, aiming to accurately identify the misinformation in the video format accompanied by the corresponding audio. Despite significant advancements, current models in this field, trained on particular domains (source domains), often exhibit unsatisfactory performance on unseen domains (target domains) due to domain gaps. To effectively realize such domain generalization on the short-video misinformation detection task, we propose deep insights into the characteristics of different domains: (1) The detection on various domains may mainly rely on different modalities (i.e., mainly focusing on videos or audios). To enhance domain generalization, it is crucial to achieve optimal model performance on all modalities simultaneously. (2) For some domains focusing on cross-modal joint fraud, a comprehensive ana...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755809","openalex_id":"https://openalex.org/W4415536914","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Hong Kong University of Science and Technology","Soochow University","Tencent (China)","Zhejiang Normal University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7533000111579895},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7501999735832214},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.593999981880188},{"id":"https://openalex.org/C2776990098","display_name":"Misinformation","score":0.5867000222206116},{"id":"https://openalex.org/C190470478","display_name":"Invariant 
(physics)","score":0.5181999802589417},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.46810001134872437},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.45489999651908875},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.4422000050544739}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.05586","title":"CalibCLIP: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval","url":"http://arxiv.org/abs/2510.05586","published":"2025-10-25","authors":["Bin Kang","Bin Chen","Junjie Wang","Yulin Li","Junzhi Zhao","Junle Wang","Zhuotao Tian"],"abstract":"Existing Visual Language Models (VLMs) suffer structural limitations where a few low contribution tokens may excessively capture global semantics, dominating the information aggregation process and suppressing the discriminative features in text-driven image retrieval tasks. To address this, we introduce \\textbf{CalibCLIP}, a training-free method designed to calibrate the suppressive effect of dominant tokens. Specifically, in the visual space, we propose the Contrastive Visual Enhancer (CVE), which decouples visual features into target and low information regions. Subsequently, it identifies dominant tokens and dynamically suppresses their representations.In the textual space, we introduce the Discriminative Concept Calibrator (DCC), which aims to differentiate between general and discriminative concepts within the text query. 
By mitigating the challenges posed by generic concepts and i...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3746027.3755765","openalex_id":"https://openalex.org/W4414977866","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Harbin Institute of Technology","Southwest Jiaotong University","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.9459999799728394},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7494000196456909},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6406999826431274},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6297000050544739},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.5181999802589417},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.47209998965263367},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4205999970436096},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4092999994754791}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415538849","title":"Less is More: High-value Data Selection for Visual Instruction Tuning","url":"https://doi.org/10.1145/3746027.3755160","published":"2025-10-25","authors":["Zikang Liu","Kun Zhou","Wayne Xin Zhao","Dawei Gao","Yaliang Li","Ji-Rong Wen"],"abstract":"Visual instruction tuning is the key to building large vision language models (LVLMs), which can 
greatly improve the task solving and generalization capabilities. Previous work mostly collects a mixture of existing visual instruction datasets via heuristic ways for train- ing (even more than a million instructions), which may introduce data redundancy and increase the training cost. To investigate it, we conduct a series of empirical studies, which show that greatly reducing the amount of instructions from several tasks even do not affect the performance, indicating significant redundancy within the visual instruction datasets. Based on the findings, we propose a high-value data selection approach TIVE, to eliminate redundancy within the visual instruction data and reduce the training cost. In TIVE, based on the gradient-based influence functions, we estimate the instance influence score...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755160","openalex_id":"https://openalex.org/W4415538849","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Bellevue Hospital Center","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8044000267982483},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6840000152587891},{"id":"https://openalex.org/C152124472","display_name":"Redundancy (engineering)","score":0.5976999998092651},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5669000148773193},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5149000287055969},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.45339998602867126},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.390500009059906},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.3711000084877014}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4415540927","title":"SHALE: A Scalable Benchmark for Fine-grained Hallucination Evaluation in LVLMs","url":"https://doi.org/10.1145/3746027.3758308","published":"2025-10-25","authors":["Bei Yan","Zhiyuan Chen","Yuecong Min","Jie Zhang","Jiahao Wang","Xiaozhen Wang","Shiguang Shan"],"abstract":"Despite rapid advances, Large Vision-Language Models (LVLMs) still suffer from hallucinations, i.e., generating content inconsistent with input or established world knowledge, which correspond to faithfulness and factuality hallucinations, respectively. Prior studies primarily evaluate faithfulness hallucination at a rather coarse level (e.g., object-level) and lack fine-grained analysis. Additionally, existing benchmarks often rely on costly manual curation or reused public datasets, raising concerns about scalability and data leakage. To address these limitations, we propose an automated data construction pipeline that produces scalable, controllable, and diverse evaluation data. We also design a hierarchical hallucination induction framework with input perturbations to simulate realistic noisy scenarios. 
Integrating these designs, we construct SHALE, a Scalable HALlucination Evaluatio...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758308","openalex_id":"https://openalex.org/W4415540927","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6621999740600586},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6603999733924866},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5985999703407288},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5722000002861023},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5633000135421753},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.531000018119812},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.49790000915527344},{"id":"https://openalex.org/C94124525","display_name":"Categorization","score":0.4950000047683716}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415540974","title":"MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment","url":"https://doi.org/10.1145/3746027.3754824","published":"2025-10-25","authors":["Yanyun Pu","Kehan Li","Zeyi Huang","Zhijie Zhong","Kaixiang Yang"],"abstract":"With the rapid advancement of video generation models such as Sora, video quality assessment (VQA) is becoming increasingly 
crucial for selecting high-quality videos from large-scale datasets used in pre-training. Traditional VQA methods, typically producing single numerical scores, often lack comprehensiveness and interpretability. To address these challenges, we introduce MVQA-68K, a novel multi-dimensional VQA dataset comprising over 68,000 carefully annotated videos, covering seven essential quality dimensions: overall aesthetics, camera movement, dynamic degree, texture detail, composition, visual quality, and factual consistency. Each annotation includes detailed chain-of-thought reasoning to facilitate interpretability and comprehensive understanding. Extensive experiments demonstrate that MVQA-68K significantly enhances the performance of various multimodal large language models....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754824","openalex_id":"https://openalex.org/W4415540974","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.9764999747276306},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7971000075340271},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6015999913215637},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5727999806404114},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5720000267028809},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5454999804496765},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.5297999978065491},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.5297999978065491}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415539341","title":"DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models","url":"https://doi.org/10.1145/3746027.3755118","published":"2025-10-25","authors":["Yudong Zhang","Ruobing Xie","Xingwu Sun","Yiqing Huang","Jiansheng Chen","Zhanhui Kang","Di Wang","Yu Wang"],"abstract":"Large vision-language models (LVLMs) have demonstrated exceptional performance on complex multimodal tasks. However, they continue to suffer from significant hallucination issues, including object, attribute, and relational hallucinations. To accurately detect these hallucinations, we investigated the variations in cross-modal attention patterns between hallucination and non-hallucination states. Leveraging these distinctions, we developed a lightweight detector capable of identifying hallucinations. Our proposed method, Detecting Hallucinations by Cross-modal Attention Patterns (DHCP), is straightforward and does not require additional LVLM training or extra LVLM inference steps. Experimental results show that DHCP achieves remarkable performance in hallucination detection. 
By offering novel insights into the identification and analysis of hallucinations in LVLMs, DHCP contributes to ad...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755118","openalex_id":"https://openalex.org/W4415539341","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tsinghua University","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5755000114440918},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.5519000291824341},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.544700026512146},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5367000102996826},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46239998936653137},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4235000014305115},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.41679999232292175},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.3619999885559082}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415537304","title":"Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs","url":"https://doi.org/10.1145/3746027.3754845","published":"2025-10-25","authors":["Tiancheng Gu","Kaicheng Yang","Ziyong Feng","Xingjun Wang","Yanzhao Zhang","Dingkun Long","Yingda Chen","Weidong Cai","Jiankang 
Deng"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754845","openalex_id":"https://openalex.org/W4415537304","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Bellevue Hospital Center","Imperial College London","The University of Sydney"],"concepts":[{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.635699987411499},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5304999947547913},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4862000048160553},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.4537999927997589},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.3416000008583069},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.32409998774528503},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.28380000591278076},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.2784999907016754}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415546060","title":"User Simulator to Evaluate and Optimize AIGC","url":"https://doi.org/10.1145/3746262.3761972","published":"2025-10-25","authors":["Junfeng He"],"abstract":"In this talk, I will present our recent works on building user models as well as how to apply them to evaluate and improve AIGC. 
In particular, I will talk about how to build an rich human feedback (auto-rater) model to predict raters' rich feedback for generated images (CVPR 2024 paper), which can serve as an interpretable AIGC evaluation and reward model. Moreover, I will show how to improve image generation models via fine-tuning with our auto-rater model predictions, e.g. achieving region-aware fine-tuning for T2I models to fix problematic regions (CVPR 2025 paper), and more. Finally, we will also discuss a rich human behavior model across various kinds of visual content (NeurIPS 2024 paper), and give some example about how to use it to improve visual content for better user experience.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746262.3761972","openalex_id":"https://openalex.org/W4415546060","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7598000168800354},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5006999969482422},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.398499995470047},{"id":"https://openalex.org/C67712803","display_name":"User modeling","score":0.38260000944137573},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37310001254081726},{"id":"https://openalex.org/C3020716817","display_name":"Visual feedback","score":0.3327000141143799},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.3287000060081482},{"id":"https://openalex.org/C2780626000","display_name":"Human-in-the-loop","score":0.3172999918460846}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2508.07766","title":"UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models","url":"http://arxiv.org/abs/2508.07766","published":"2025-10-25","authors":["Jinke Li","Jiarui Yu","Chenxing Wei","Hande Dong","Qiang Lin","L. Yang","Z. H. Wang","Yanbin Hao"],"abstract":"Unlike bitmap images, scalable vector graphics (SVG) maintain quality when scaled, frequently employed in computer vision and artistic design in the representation of SVG code. In this era of proliferating AI-powered systems, enabling AI to understand and generate SVG has become increasingly urgent. However, AI-driven SVG understanding and generation (U&G) remain significant challenges. SVG code, equivalent to a set of curves and lines controlled by floating-point parameters, demands high precision in SVG U&G. Besides, SVG generation operates under diverse conditional constraints, including textual prompts and visual references, which requires powerful multi-modal processing for condition-to-SVG transformation. 
Recently, the rapid growth of Multi-modal Large Language Models (MLLMs) have demonstrated capabilities to process multi-modal inputs and generate complex vector controlling parame...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758269","openalex_id":"https://openalex.org/W4415539825","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Anhui University","Hefei University of Technology","Shenzhen University","Tencent (China)","Zhejiang University-University of Edinburgh Institute"],"concepts":[{"id":"https://openalex.org/C202629362","display_name":"Scalable Vector Graphics","score":0.9959999918937683},{"id":"https://openalex.org/C3115412","display_name":"Bitmap","score":0.7688999772071838},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7621999979019165},{"id":"https://openalex.org/C21442007","display_name":"Graphics","score":0.498199999332428},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.4713999927043915},{"id":"https://openalex.org/C59662460","display_name":"Vector graphics","score":0.45680001378059387},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.43560001254081726},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.42719998955726624}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415536174","title":"Topic Guided Multi-faceted Semantic Disentanglement for CTR prediction","url":"https://doi.org/10.1145/3746027.3754981","published":"2025-10-25","authors":["Fengxin Li","Zhiqian Yin","Hongyan Liu","Jingcai Guo","Jun He","Yi Li","Chao Zhou","Jun Zhang","Haijie 
Gu"],"abstract":"Click-through rate (CTR) prediction lies at the heart of the online advertising ecosystem and recommendation systems, helping to improve user engagement and platform revenue. With the advent of Pretrained Language Models (PLMs), researchers focus on incorporating textual features to enhance semantic understanding in CTR prediction. However, existing methods typically aggregate a wealth of textual features and encode the informative text into a single semantic embedding. This mechanism leads to entangled embedding that fails to capture fine-grained feature interactions, ultimately limiting CTR prediction performance. To address this issue, we propose Multi-faceted Semantic Disentanglement for CTR prediction (MSD-CTR), a novel framework designed to disentangle and leverage multi-faceted textual information. MSD-CTR consists of two key components: Disentangled Semantic Topic Model (DSTopic)...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754981","openalex_id":"https://openalex.org/W4415536174","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Hong Kong Polytechnic University","Renmin University of China","Tencent (China)","Tsinghua University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8180999755859375},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7188000082969666},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5480999946594238},{"id":"https://openalex.org/C4679612","display_name":"Aggregate 
(composite)","score":0.5396000146865845},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5073999762535095},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5013999938964844},{"id":"https://openalex.org/C2781122975","display_name":"Semantic feature","score":0.49320000410079956},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.46299999952316284}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415539266","title":"SpineBench: Benchmarking Multimodal LLMs for Spinal Pathology Analysis","url":"https://doi.org/10.1145/3746027.3758212","published":"2025-10-25","authors":["Chenghanyu Zhang","Zekun Li","Peipei Li","Xing Cui","Shuhan Xia","Weixiang Yan","Yiqiao Zhang","Qianyu Zhuang"],"abstract":"With the increasing integration of Multimodal Large Language Models (MLLMs) into the medical field, comprehensive evaluation of their performance in various medical domains becomes critical. However, existing benchmarks primarily assess general medical tasks, inadequately capturing performance in nuanced areas like the spine, which relies heavily on visual input. To address this, we introduce SpineBench, a comprehensive Visual Question Answering (VQA) benchmark designed for fine-grained analysis and evaluation of MLLMs in the spinal domain. SpineBench comprises 64,878 QA pairs from 40,263 spine images, covering 11 spinal diseases through two critical clinical tasks: spinal disease diagnosis and spinal lesion localization, both in multiple-choice format. 
SpineBench is built by integrating and standardizing image-label pairs from open-source spinal disease datasets, and samples challenging...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758212","openalex_id":"https://openalex.org/W4415539266","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Beijing University of Posts and Telecommunications","Chinese Academy of Medical Sciences & Peking Union Medical College","Peking Union Medical College Hospital","University of California, Santa Barbara"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8276000022888184},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.5746999979019165},{"id":"https://openalex.org/C2781438226","display_name":"Spinal disease","score":0.5703999996185303},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5454999804496765},{"id":"https://openalex.org/C99508421","display_name":"Physical medicine and rehabilitation","score":0.4560999870300293},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4077000021934509},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4016000032424927},{"id":"https://openalex.org/C2779134260","display_name":"Disease","score":0.4016000032424927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2504.19183","title":"Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving","url":"http://arxiv.org/abs/2504.19183","published":"2025-10-25","authors":["Mi Zheng","Guanglei Yang","Zhenhong Huang","Zhenhua Guo","Kevin Wu 
Han","Wangmeng Zuo"],"abstract":"With the emergence of transformer-based architectures and large language models (LLMs), the accuracy of road scene perception has substantially advanced. Nonetheless, current road scene segmentation approaches are predominantly trained on closed-set data, resulting in insufficient detection capabilities for out-of-distribution (OOD) objects. To overcome this limitation, road anomaly detection methods have been proposed. However, existing methods primarily depend on image inpainting and OOD distribution detection techniques, facing two critical issues: (1) inadequate consideration of the objectiveness attributes of anomalous regions, causing incomplete segmentation when anomalous objects share similarities with known classes, and (2) insufficient attention to environmental constraints, leading to the detection of anomalies irrelevant to autonomous driving tasks. In this paper, we propose....","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3746027.3755178","openalex_id":"https://openalex.org/W4415539114","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.7554000020027161},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7486000061035156},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6583999991416931},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5787000060081482},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.550599992275238},{"id":"https://openalex.org/C2777210771","display_name":"Block (permutation group 
theory)","score":0.4943999946117401},{"id":"https://openalex.org/C125308379","display_name":"Market segmentation","score":0.474700003862381},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.45669999718666077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540301","title":"PgM: Partitioner Guided Modal Learning Framework","url":"https://doi.org/10.1145/3746027.3754788","published":"2025-10-25","authors":["Guimin Hu","Yuchen Xin","Lijie Hu","Zhihong Zhu","Hasti Seifi"],"abstract":"Multimodal learning benefits from multiple modal information, and each learned modal representations can be divided into uni-modal that can be learned from uni-modal training and paired-modal features that can be learned from cross-modal interaction. Building on this perspective, we propose a partitioner-guided modal learning framework, PgM, which consists of the modal partitioner, uni-modal learner, paired-modal learner, and uni-paired modal decoder. Modal partitioner segments the learned modal representation into uni-modal and paired-modal features. Modal learner incorporates two dedicated components for uni-modal and paired-modal learning. Uni-paired modal decoder reconstructs modal representation based on uni-modal and paired-modal features. 
PgM offers three key benefits: 1) thorough learning of uni-modal and paired-modal features, 2) flexible distribution adjustment for uni-modal an...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754788","openalex_id":"https://openalex.org/W4415540301","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Arizona State University","Guangdong University of Technology","IT University of Copenhagen","Mohamed bin Zayed University of Artificial Intelligence","Nanjing University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.900600016117096},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6930999755859375},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6488999724388123},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6362000107765198},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5465999841690063},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.48350000381469727},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4361000061035156},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.4262999892234802}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415538363","title":"OCR-Critic: Aligning Multimodal Large Language Models' Perception through Critical Feedback","url":"https://doi.org/10.1145/3746027.3754585","published":"2025-10-25","authors":["Qiuna Tan","Runqi Qiao","Guanting Dong","YiFan Zhang","Minhui 
Wu","Jiapeng Wang","Miaoxuan Zhang","Yida Xu","Chong Sun","Chen Li","Honggang Zhang"],"abstract":"Recent advancements in Large Multimodal Models have demonstrated impressive performance in various tasks. However, their capabilities in error detection and resolution for Optical Character Recognition (OCR) remain underexplored. To address this gap, we construct the first visual instruction tuning dataset specifically for detailed OCR error analysis. Building on this foundation, we develop a universal, plug-and-play OCR-Critic model that incorporates three novel dynamic alignment strategies. These strategies systematically mitigate LMMs' weaknesses in OCR tasks by providing coarse-to-fine error feedback. To comprehensively evaluate these capabilities, we introduce OCR-ERROR, a benchmark designed to assess LMMs' ability to detect and categorize OCR errors, covering two task types, diverse error categories, and 2,400 rigorously validated samples. Experimental results show that OCR-Critic....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754585","openalex_id":"https://openalex.org/W4415538363","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing University of Posts and Telecommunications","Renmin University of China","South China University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8051000237464905},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6152999997138977},{"id":"https://openalex.org/C94124525","display_name":"Categorization","score":0.6025999784469604},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.5819000005722046},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5641999840736389},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5602999925613403},{"id":"https://openalex.org/C103088060","display_name":"Error detection and correction","score":0.5309000015258789},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4869999885559082}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415539120","title":"MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts","url":"https://doi.org/10.1145/3746027.3758240","published":"2025-10-25","authors":["Hao Liang","Linzhuang Sun","zhouminxuan zhouminxuan","Zirong Chen","Meiyi Qiang","Mingan Lin","T. Li","Fan Yang","Zenan Zhou","Wentao Zhang"],"abstract":"With the rapid progress of Multimodal LLMs, evaluating their mathematical reasoning capabilities has become an increasingly important research direction. In particular, visual-textual mathematical reasoning serves as a key indicator of an MLLM's ability to comprehend and solve complex, multi-step quantitative problems. While existing benchmarks such as MathVista and MathVerse have advanced the evaluation of multimodal math proficiency, they primarily rely on digitally rendered content and fall short in capturing the complexity of real-world scenarios. To bridge this gap, we introduce MathScape, a novel benchmark focused on assessing MLLMs' reasoning ability in realistic mathematical contexts. MathScape comprises 1,369 high-quality math problems paired with human-captured real-world images, closely reflecting the challenges encountered in practical educational settings. 
We conduct a thoro...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3758240","openalex_id":"https://openalex.org/W4415539120","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beijing Institute of Technology","Nankai University","Peking University","Peking University Stomatological Hospital","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.7735000252723694},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7028999924659729},{"id":"https://openalex.org/C2776962539","display_name":"Lagging","score":0.6477000117301941},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5975000262260437},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5371000170707703},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5194000005722046},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.48809999227523804},{"id":"https://openalex.org/C76969082","display_name":"Mathematical model","score":0.45159998536109924}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540810","title":"MESH - Understanding Videos Like Human: Measuring Hallucinations in Large Video Models","url":"https://doi.org/10.1145/3746027.3755626","published":"2025-10-25","authors":["Garry Yang","Zizhe Chen","Man Hon Wong","Haoyu Lei","Yongqiang Chen","Zhenguo Li","Kaiwen Zhou","James Cheng"],"abstract":"Large Video Models (LVMs) build on the semantic capabilities of Large Language Models (LLMs) and vision modules by 
integrating temporal information to better understand dynamic video content. Despite their progress, LVMs are prone to hallucinations-producing inaccurate or irrelevant descriptions. Current benchmarks for video hallucination depend heavily on manual categorization of video content, neglecting the perception-based processes through which humans naturally interpret videos. We introduce MESH, a benchmark designed to evaluate hallucinations in LVMs systematically. MESH uses a Question-Answering framework with binary and multi-choice formats incorporating target and trap instances. It follows a bottom-up approach, evaluating basic objects, coarse-to-fine subject features, and subject-action pairs, aligning with human video understanding. We demonstrate that MESH offers an effect...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755626","openalex_id":"https://openalex.org/W4415540810","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7590000033378601},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6158000230789185},{"id":"https://openalex.org/C94124525","display_name":"Categorization","score":0.6074000000953674},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5253999829292297},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4147000014781952},{"id":"https://openalex.org/C2908998935","display_name":"Visual Hallucination","score":0.4113999903202057},{"id":"https://openalex.org/C202474056","display_name":"Video 
tracking","score":0.4036000072956085},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.3637999892234802}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540873","title":"Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations","url":"https://doi.org/10.1145/3746027.3761988","published":"2025-10-25","authors":["Yuji Wang","Moran Li","Xiaobin Hu","Ran Yi","Jiangning Zhang","Han Feng","Weijian Cao","Yabiao Wang","Chengjie Wang","Lizhuang Ma"],"abstract":"Identity-preserving text-to-video (IPT2V) generation, which aims to create high-fidelity videos with consistent human identity, has become crucial for downstream applications. However, current end-to-end frameworks suffer a critical spatial-temporal trade-off: optimizing for spatially coherent layouts of key elements ( e.g., character identity preservation) often compromises instruction-compliant temporal smoothness, while prioritizing dynamic realism risks disrupting the spatial coherence of visual structures. To tackle this issue, we propose a simple yet effective spatial-temporal decoupled framework that decomposes representations into spatial features for layouts and temporal features for motion dynamics. Specifically, our paper proposes a semantic prompt optimization mechanism and stage-wise decoupled generation paradigm. 
The former module decouples the prompt into spatial and tempo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3761988","openalex_id":"https://openalex.org/W4415540873","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8062999844551086},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.5534999966621399},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5486999750137329},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.5400000214576721},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45089998841285706},{"id":"https://openalex.org/C2778355321","display_name":"Identity (music)","score":0.421099990606308},{"id":"https://openalex.org/C2985909886","display_name":"Spatial coherence","score":0.41040000319480896},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.40220001339912415}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540017","title":"HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection","url":"https://doi.org/10.1145/3746027.3755137","published":"2025-10-25","authors":["Jialei Cui","Jianwei Du","Yanzhe Li","Lei Gao","Hui Jiang","Chenfu 
Bao"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755137","openalex_id":"https://openalex.org/W4415540017","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Southeast University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6272000074386597},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6241999864578247},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.5232999920845032},{"id":"https://openalex.org/C204241405","display_name":"Transformation (genetics)","score":0.5065000057220459},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3767000138759613},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.34619998931884766},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.30799999833106995},{"id":"https://openalex.org/C4641261","display_name":"Face detection","score":0.27649998664855957}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415536498","title":"Generating Negative Samples for Multi-Modal Recommendation","url":"https://doi.org/10.1145/3746027.3754977","published":"2025-10-25","authors":["Yanbiao Ji","Dan Luo","Chang Liu","Shaokai Wu","Jing Tong","Qichen He","Deyi Ji","Hongtao Lu","Yue Ding"],"abstract":"Multi-modal recommender systems (MMRS) have gained significant attention due to their ability to leverage information from various modalities to enhance recommendation quality. 
However, existing negative sampling techniques often struggle to effectively utilize the multi-modal data, leading to suboptimal performance. In this paper, we identify two key challenges in negative sampling for MMRS: (1) producing cohesive negative samples contrasting with positive samples and (2) maintaining a balanced influence across different modalities. To address these challenges, we propose NegGen, a novel framework that utilizes multi-modal large language models (MLLMs) to generate balanced and contrastive negative samples. We design three different prompt templates to enable NegGen to analyze and manipulate item attributes across multiple modalities, and then generate negative samples that introduce bet...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754977","openalex_id":"https://openalex.org/W4415536498","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Lehigh University","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7452999949455261},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7448999881744385},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6820999979972839},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.5088000297546387},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4916999936103821},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.49140000343322754},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.47099998593330383},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.4674000144004822}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415539618","title":"FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos","url":"https://doi.org/10.1145/3746027.3755102","published":"2025-10-25","authors":["Rui Chen","Lei Sun","Jing Tang","Geng Li","Xiangxiang Chu"],"abstract":"Recent advances in video generation have posed great challenges in the assessment of AI-generated content, particularly with the emergence of increasingly sophisticated models. The various inconsistencies and defects observed in such videos are inherently complex, making overall scoring notoriously difficult. In this paper, we emphasize the critical importance of integrating fine-grained reasoning into video evaluation. We propose FingER, a novel entity-level reasoning evaluation framework that first automatically generates Fine-grained Entity-level questions, and then answers those questions by a Reasoning model with scores, which can be subsequently weighted summed to an overall score for different applications. 
Specifically, we leverage LLMs to derive entity-level questions across five distinct perspectives, which (i) often focus on some specific entities of the content, thereby makin...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755102","openalex_id":"https://openalex.org/W4415539618","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8098999857902527},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7827000021934509},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5879999995231628},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5400000214576721},{"id":"https://openalex.org/C774472","display_name":"Margin (machine learning)","score":0.5382999777793884},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5198000073432922},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43639999628067017},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3407999873161316}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415539101","title":"DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment","url":"https://doi.org/10.1145/3746027.3755111","published":"2025-10-25","authors":["Xiaofan Li","Chenming Wu","Zhao Yang","Zhihao Xu","Yumeng Zhang","Dingkang Liang","Ji Wan","Jun Wang"],"abstract":"This paper presents DriVerse, a generative model 
for simulating navigation-driven driving scenes from a single image and a future trajectory. Previous autonomous driving world models either directly feed the trajectory or discrete control signals into the generation pipeline, leading to poor alignment between the control inputs and the implicit features of the 2D base generative model, which results in low-fidelity video outputs. Some methods use coarse textual commands or discrete vehicle control signals, which lack the precision to guide fine-grained, trajectory-specific video generation, making them unsuitable for evaluating actual autonomous driving algorithms. DriVerse introduces explicit trajectory guidance in two complementary forms: it tokenizes trajectories into textual prompts using a predefined trend vocabulary for seamless language integration, and converts 3D trajectories in...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755111","openalex_id":"https://openalex.org/W4415539101","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7838000059127808},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.7150999903678894},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6043000221252441},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5803999900817871},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5507000088691711},{"id":"https://openalex.org/C2776937971","display_name":"Heading (navigation)","score":0.5346999764442444},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge 
bases)","score":0.44620001316070557},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.40709999203681946}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415538267","title":"DITL <sup>2</sup> : Dual-Stage Invariance Transfer Learning for Generalizable Document Image Tampering Localization","url":"https://doi.org/10.1145/3746027.3754857","published":"2025-10-25","authors":["Songze Li","Yunfei Guo","S. Chen","Bin Li","Kaiqing Lin","Changsheng Chen","Haodong Li","Taiping Yao","Shouhong Ding"],"abstract":"Document Image Tampering Localization (DITL) advances considerably, yet achieving robust cross-dataset generalization remains a formidable challenge for practical applications. Expanding existing document datasets for training is labor-intensive, making it appealing to incorporate data from non-document domains such as natural scene images. However, domain-specific variations, including differences in color distribution and texture, compromise the performance of joint training. To address this issue, we propose DITL2, a Dual-stage Invariance Transfer Learning framework for Document Image Tampering Localization that consists of Cross-Domain Invariance Pre-training (CDIP) and Frequency Decoupling Parameter Adaptation (FDPA). 
In the pre-training stage, CDIP employs style transfer and texture consistency learning to suppress domain-specific influences from tampered natural scene images, and....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3754857","openalex_id":"https://openalex.org/W4415538267","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Shenzhen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7203999757766724},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6427000164985657},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5950000286102295},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.5800999999046326},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5059000253677368},{"id":"https://openalex.org/C75291252","display_name":"TRACE (psycholinguistics)","score":0.49149999022483826},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.490200012922287},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.44110000133514404}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415546042","title":"Cosmos World Foundation Models for Physical AI","url":"https://doi.org/10.1145/3746262.3761969","published":"2025-10-25","authors":["Jinwei Gu"],"abstract":"NVIDIA Cosmos is a family of World Foundation Models specifically designed for physical AI. It includes three main components: Cosmos-Predict, Cosmos-Transfer, and Cosmos-Reason. 
In this talk, I will give an overview of Cosmos models, its development journey, benchmarking and evaluation, and various types of post-training of Cosmos models specifically targeting applications to robotics and autonomous driving. I will demonstrate the effectiveness of using Cosmos World Foundation Models for Synthetic Data Generation (SDG), embodied AI, and other downstream tasks.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746262.3761969","openalex_id":"https://openalex.org/W4415546042","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C2780307871","display_name":"Cosmos (plant)","score":0.7124000191688538},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6844000220298767},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.5561000108718872},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46389999985694885},{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.40450000762939453},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.40380001068115234},{"id":"https://openalex.org/C59519942","display_name":"Drone","score":0.31130000948905945},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3109000027179718}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540540","title":"CausalCtrl: Causality-Aware Control Framework for Text-Guided Visual Editing","url":"https://doi.org/10.1145/3746027.3755228","published":"2025-10-25","authors":["Haoxiang Cao","Chaoqun Wang","Yongwen Lai","Shaobo 
Min","Xuejin Chen"],"abstract":"Text-guided visual editing aims to modify visual content according to a target prompt while faithfully preserving the structure and identity of the source image or video. However, existing methods ignore confounding effects brought from the pretrained model, i.e., harmful biases learned from the pretraining datasets, leading to spurious correlations during the editing processing. To address this issue, we introduce CausalCtrl, a novel training-free framework that reformulates text-guided visual editing from a causal inference perspective. The core idea is to leverage frontdoor adjustment to estimate the interventional distribution of the output, effectively blocking the influence of hidden confounders introduced by the pretrained model. Specifically, we first design a dual-branch inversion mechanism that disentangles the source content and target semantics into two separate latent embedd...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755228","openalex_id":"https://openalex.org/W4415540540","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["South China Normal University","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.786899983882904},{"id":"https://openalex.org/C97256817","display_name":"Spurious relationship","score":0.6956999897956848},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.6761000156402588},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5881999731063843},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5077000260353088},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4871000051498413},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.40959998965263367},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.36410000920295715}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106822290","title":"Beyond Paraphrasing: Analyzing Summarization Abstractiveness and Reasoning","url":"https://doi.org/10.48448/22xh-d354","published":"2025-10-25","authors":["Association for Computational Linguistics 2025","Cheung, Jackie","Ernst, Ori","Zeweniuk, Nathan"],"abstract":"While there have been many studies analyzing the ability of LLMs to solve problems through reasoning, their application of reasoning in summarization remains largely unexamined. This study explores whether reasoning is essential to summarization by investigating three questions: (1) Do humans frequently use reasoning to generate new summary content? (2) Do summarization models exhibit the same reasoning patterns as humans? (3) Should summarization models integrate more complex reasoning abilities? Our findings reveal that while human summaries often contain reasoning-based information, system-generated summaries rarely contain this same information. This suggests that models struggle to effectively apply reasoning, even when it could improve summary quality. 
We advocate for the development of models that incorporate deeper reasoning and abstractiveness, and we release our annotated data....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/22xh-d354","openalex_id":"https://openalex.org/W7106822290","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Bar-Ilan University"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.9639000296592712},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7078999876976013},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4991999864578247},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37529999017715454},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.34950000047683716},{"id":"https://openalex.org/C20162079","display_name":"Case-based reasoning","score":0.3441999852657318},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.34290000796318054},{"id":"https://openalex.org/C37335422","display_name":"Model-based reasoning","score":0.3151000142097473}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415540914","title":"A Comprehensive Model for Visual Fatigue Assessment in 3D Light Field Displays Based on Eye Movement Data Analysis","url":"https://doi.org/10.1145/3746027.3755675","published":"2025-10-25","authors":["Yu Chen","Binbin Yan","Shuo Chen","Xinzhu Sang"],"abstract":"Three-dimensional (3D) light field displays (LFDs) provide immersive visual experiences and have attracted increasing attention. 
However, visual fatigue remains an important concern when users watch 3D LFDs which limits their development and application. In this paper, we propose a comprehensive methodology that integrates subjective and objective data to establish a robust dataset and employs eye movement data for systematically investigating visual fatigue in 3D LFDs. Firstly, a multimodal dataset is constructed by integrating subjective fatigue scores and objective eye movement data collection. Then, we propose the Deep Correlation Data Analysis Model (DCDAM), which uses Spearman's rank correlation coefficient to analyze correlations between key objective metrics and subjective fatigue curves, validating the effectiveness of these metrics. Furthermore, to comprehensively assess visual...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746027.3755675","openalex_id":"https://openalex.org/W4415540914","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing University of Posts and Telecommunications","Huawei Technologies (China)","Huawei Technologies (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C153050134","display_name":"Eye movement","score":0.6510999798774719},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6136999726295471},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5659999847412109},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5285000205039978},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.47920000553131104},{"id":"https://openalex.org/C2776058522","display_name":"Visual field","score":0.4359999895095825},{"id":"https://openalex.org/C164280684","display_name":"Gaze-contingency 
paradigm","score":0.41990000009536743},{"id":"https://openalex.org/C56461940","display_name":"Eye tracking","score":0.3919000029563904}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interpretable-next-token-prediction-via-the-generalized-induction-head","title":"Interpretable Next-token Prediction via the Generalized Induction Head","url":"https://www.microsoft.com/en-us/research/publication/interpretable-next-token-prediction-via-the-generalized-induction-head/","published":"2025-10-24","authors":["Eunji Kim","Sriya Mantena","Weiwei Yang","Chandan Singh","Sungroh Yoon","Jianfeng Gao"],"abstract":"While large transformer models excel in predictive performance, their lack of interpretability restricts their usefulness in high-stakes domains. To remedy this, we propose the Generalized Induction-Head Model (GIM), an interpretable model for next-token prediction inspired by the observation of\"induction heads\"in LLMs. GIM is a retrieval-based module that identifies similar sequences in the input context by combining exact n-gram matching and fuzzy matching based on a neural similarity metric. We evaluate GIM in two settings: language modeling and fMRI response prediction. In language modeling, GIM improves next-token prediction by up to 25%p over interpretable baselines, significantly narrowing the gap with black-box LLMs. In an fMRI setting, GIM improves neural response prediction by 20% and offers insights into the language selectivity of the brain. 
GIM represents a significant step....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4415517504","title":"CellTok: Early-Fusion Multimodal Large Language Model for Single-Cell Transcriptomics via Tokenization","url":"https://doi.org/10.1101/2025.10.22.684047","published":"2025-10-24","authors":["Chuxi Xiao","Haiyang Bian","Yixin Chen","Lei Wei","Xuegong Zhang"],"abstract":"Abstract Single-cell transcriptomic data provide a high-resolution view of cellular states and functions, offering critical insights into development, disease, and tissue heterogeneity. Existing foundation models for single-cell analysis typically embed cells as continuous vectors, limiting their generative flexibility and hindering integration with the knowledge accumulated in large language models (LLMs). In this work, we present CellTok, a multimodal LLM framework for unified analysis of scRNA-seq data and biological text. Each cell is tokenized into discrete codebook tokens via VQ-VAE and integrated into the LLM’s vocabulary using early fusion. This allows CellTok to process biological and textual inputs autoregressively, leveraging pretrained knowledge within LLMs to analyze cellular states and interactions. 
CellTok demonstrates strong performance in cell-level tasks such as annotat...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.10.22.684047","openalex_id":"https://openalex.org/W4415517504","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","Cloud Computing Center","Fudan University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7574999928474426},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5302000045776367},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.5288000106811523},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5174000263214111},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.5157999992370605},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4887000024318695},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4537999927997589},{"id":"https://openalex.org/C25810664","display_name":"Ontology","score":0.4251999855041504}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415506496","title":"Democratizing protein language model training, sharing and collaboration","url":"https://doi.org/10.1038/s41587-025-02859-7","published":"2025-10-24","authors":["Jin Su","Zhikai Li","Tianli Tao","Chenchen Han","Yan He","Fengyuan Dai","Qingyan Yuan","Yuan Gao","Tong Si","Xuting Zhang","Yuyang Zhou","Junjie 
Shan"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41587-025-02859-7","openalex_id":"https://openalex.org/W4415506496","cited_by_count":3,"quality_score":44,"matched_keywords":["language model"],"author_affiliations":["Centre for Genomic Regulation","Chinese Academy of Sciences","Colorado State University","Columbia University","Duke University","Harvard University","Hefei University of Technology","Helmholtz Zentrum München","Massachusetts Institute of Technology","Microsoft (United States)","Rice University","Seoul National University","Shandong University","Shanghai Jiao Tong University","ShanghaiTech University","Shenzhen Institutes of Advanced Technology","Shenzhen Weiguang Biological Products (China)","Soochow University","South China Agricultural University","Universitat Pompeu Fabra","University of California, San Francisco","University of Oxford","University of Pennsylvania","University of Toronto","University of Wisconsin–Madison","Westlake University","Xihu Institute of Electronic Research","Zhejiang Lab","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7142999768257141},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4593000113964081},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4115999937057495},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.38960000872612},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.37439998984336853},{"id":"https://openalex.org/C47602998","display_name":"Language barrier","score":0.374099999666214},{"id":"https://openalex.org/C51632099","display_name":"Training 
set","score":0.36410000920295715},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.36039999127388}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W7124142114","title":"Multimodal Foundation Model-Driven User Interest Modeling and Behavior Analysis on Short Video Platforms","url":"https://doi.org/10.1109/mlbdbi67855.2025.11331476","published":"2025-10-24","authors":["Y. X. Zhao","Yike Peng","Li Zhang","Qianyi Sun","Zhihui Zhang","Yingying Zhuang"],"abstract":"With the rapid expansion of user bases on short video platforms, personalized recommendation systems are playing an increasingly critical role in enhancing user experience and optimizing content distribution. Traditional interest modeling methods often rely on unimodal data, such as click logs or text labels, which limits their ability to fully capture user preferences in a complex multimodal content environment. To address this challenge, this paper proposes a multimodal foundation model-based framework for user interest modeling and behavior analysis. By integrating video frames, textual descriptions, and background music into a unified semantic space using cross-modal alignment strategies, the framework constructs fine-grained user interest vectors. 
Additionally, we introduce a behavior-driven feature embedding mechanism that incorporates viewing, liking, and commenting sequences to m...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mlbdbi67855.2025.11331476","openalex_id":"https://openalex.org/W7124142114","cited_by_count":2,"quality_score":43,"matched_keywords":["personalized"],"author_affiliations":["Amazon (United States)","Boston University","Columbia University","Johns Hopkins Medicine","Johns Hopkins University","Vanderbilt University","Washington University in St. Louis"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8312000036239624},{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.6470000147819519},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5008999705314636},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.47699999809265137},{"id":"https://openalex.org/C75291252","display_name":"TRACE (psycholinguistics)","score":0.4715000092983246},{"id":"https://openalex.org/C67712803","display_name":"User modeling","score":0.47099998593330383},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4408000111579895},{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.4171999990940094}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W7127629172","title":"Phishing Email Detection Using a Multimodal BERT–XGBoost Framework","url":"https://doi.org/10.1109/gcat66372.2025.11368484","published":"2025-10-24","authors":["Atul Mishra","Yash Gautam","Rupa Shree S","Meda Mounika","Minal Moharir","Namruth Reddy 
S"],"abstract":"Phishing attacks through email correspondence remains one of the most prominent cyber risks in today’s cyber space. The conventional detection methodologies that depend solely on textual content or URL patterns are becoming more ineffective in countering advanced evasion measures taken by cybercriminals. This research provides a comprehensive system for detecting attempts at phishing by leveraging semantic knowledge about text by one fine-tuned Bidirectional Encoder Representations from Transformers along with structural URL feature understanding by extreme Gradient Boosting. Outputs from these complementary models are combined by a Logistic Regression meta-classifier to get final detection scores. Experimental validation on the SpamAssassin email collection demonstrates that our proposed fusion approach yields 98% accuracy and 0.996 AUC and significantly dominates individual modalities....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/gcat66372.2025.11368484","openalex_id":"https://openalex.org/W7127629172","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","R.V. 
College of Engineering"],"concepts":[{"id":"https://openalex.org/C83860907","display_name":"Phishing","score":0.8070999979972839},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7695000171661377},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4375},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41780000925064087},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.4165000021457672},{"id":"https://openalex.org/C2781251061","display_name":"Evasion (ethics)","score":0.39649999141693115},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.38179999589920044},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.37959998846054077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7126376592","title":"An intelligent cockpit voice testing system based on artificial intelligence technology","url":"https://doi.org/10.1145/3783998.3784002","published":"2025-10-24","authors":["Han Zhou","Junxia Ma"],"abstract":"With the rapid development of intelligent cockpit technology, voice interaction has become the core entry point of human-vehicle interaction. This paper proposes an intelligent cockpit voice test system based on artificial intelligence technology, which deeply integrates AI technologies such as automatic speech recognition (ASR), natural language processing (NLP), speech synthesis (TTS), and big data analysis. It builds a full-process test platform that integrates intelligent use case generation, multimodal scenario simulation, automated execution and in-depth analysis. 
The system significantly improves test efficiency and coverage, reduces labor costs, and provides data-driven decision support for the quality optimization of voice interaction systems. It is a key infrastructure for ensuring the quality of voice interaction in intelligent cockpits.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3783998.3784002","openalex_id":"https://openalex.org/W7126376592","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Intelligent Health (United Kingdom)","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C30322324","display_name":"Cockpit","score":0.9042999744415283},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6150000095367432},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5972999930381775},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.46630001068115234},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4580000042915344},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4250999987125397},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.40630000829696655},{"id":"https://openalex.org/C56397880","display_name":"Intelligent decision support system","score":0.4050999879837036}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:baidu:2510.20548","title":"GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement 
Learning","url":"https://huggingface.co/papers/2510.20548","published":"2025-10-23","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"hf-org-paper:tencent:2510.20187","title":"Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values","url":"https://huggingface.co/papers/2510.20187","published":"2025-10-23","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4415446051","title":"ULMGNN: Fragmented layer grouping in GUI designs through graph learning based on multimodal information","url":"https://doi.org/10.1016/j.neucom.2025.131894","published":"2025-10-23","authors":["Yunnong Chen","Shuhong Xiao","Jiazhi Li","Tingting Zhou","Lingyun Sun","Liuqing 
Chen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neucom.2025.131894","openalex_id":"https://openalex.org/W4415446051","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8392000198364258},{"id":"https://openalex.org/C160713754","display_name":"Maintainability","score":0.64410001039505},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.5097000002861023},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5045999884605408},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.5006999969482422},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.49559998512268066},{"id":"https://openalex.org/C170130773","display_name":"Usability","score":0.4740999937057495},{"id":"https://openalex.org/C63584917","display_name":"Bounding overwatch","score":0.4429999887943268}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/language-ranker-a-lightweight-ranking-framework-for-llm-decoding","title":"Language Ranker: A Lightweight Ranking framework for LLM Decoding","url":"https://www.microsoft.com/en-us/research/publication/language-ranker-a-lightweight-ranking-framework-for-llm-decoding/","published":"2025-10-22","authors":["Chenheng Zhang","Tianqi Du","Jizhe Zhang","Mingqing Xiao","Yifei Wang","Yisen Wang","Zhouchen Lin"],"abstract":"Conventional research on large language models (LLMs) has primarily focused on refining 
output distributions, while paying less attention to the decoding process that transforms these distributions into final responses. Recent advances, such as scaling the computation of inference time with reward models, have underscored the importance of decoding, but these methods often suffer from high computational costs and limited applicability. In this paper, we revisit LLM generation through the lens of recommender systems, conceptualizing the decoding process as analogous to the ranking stage in recommendation pipelines. From this perspective, we observe that both traditional decoding methods and reward models exhibit clear limitations such as redundancy. Motivated by this insight, we propose Language Ranker, a novel framework that introduces a lightweight module to rerank candidate responses u...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/efficient-llm-adaptation-using-a-single-gradient-step-on-100-samples","title":"Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples","url":"https://www.microsoft.com/en-us/research/publication/efficient-llm-adaptation-using-a-single-gradient-step-on-100-samples/","published":"2025-10-22","authors":["Shiva Sreeram","Alaa Maalouf","Pratyusha Sharma","Daniela Rus"],"abstract":"Recently, Sharma et 
al. suggested a method called Layer-SElective-Rank reduction (LASER) which demonstrated that pruning high-order components of carefully chosen LLM's weight matrices can boost downstream accuracy -- without any gradient-based fine-tuning. Yet LASER's exhaustive, per-matrix search (each requiring full-dataset forward passes) makes it impractical for rapid deployment. We demonstrate that this overhead can be removed and find that: (i) Only a small, carefully chosen subset of matrices needs to be inspected -- eliminating the layer-by-layer sweep, (ii) The gradient of each matrix's singular values pinpoints which matrices merit reduction, (iii) Increasing the factorization search space by allowing matrices rows to cluster around multiple subspaces and then decomposing each cluster separately further reduces overfitting on the original training data and further lifts accura...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1445","title":"Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets","url":"https://seed.bytedance.com/en/research/seed3d-1-0-from-images-to-high-fidelity-simulation-ready-3d-assets","published":"2025-10-22","authors":["Jiashi Feng","Xiu Li","Jing Lin","Jiahang Liu","Gaohong Liu","Weiqiang Lou","Su Ma","Guang Shi","Qinlong Wang","Jun Wang","Zhongcong Xu","Xuanyu Yi"],"abstract":"Developing embodied AI agents requires 
scalable training environments that balance content diversity with physics accuracy. World simulators provide such environments but face distinct limitations: video-based methods generate diverse content but lack real-time physics feedback for interactive learning, while physics-based engines provide accurate dynamics but face scalability limitations from costly manual asset creation. We present Seed3D 1.0, a foundation model that generates simulation-ready 3D assets from single images, addressing the scalability challenge while maintaining physics rigor. Unlike existing 3D generation models, our system produces assets with accurate geometry, well-aligned textures, and realistic physically-based materials. These assets can be directly integrated into physics engines with minimal configuration, enabling deployment in robotic manipulation and simulati...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Multimodal","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4416798814","title":"I <sup>2</sup> TTS: Image-Indicated Immersive Text-to-Speech Synthesis with Spatial Perception","url":"https://doi.org/10.1109/apsipaasc65261.2025.11249147","published":"2025-10-22","authors":["Jiawei Zhang","Tianhao Zhang","Jun Wang","Jiaran Gao","Ruijie Tao","Xinyuan Qian","Xu-Cheng Yin"],"abstract":"Controlling the spatial and stylistic characteristics of synthesized speech is essential for immersive and personalized applications such as virtual reality, gaming, and human-computer interaction. 
While recent Text-to-speech (TTS) systems have explored multi-modal conditioning, they often suffer from poor reverberation fidelity or degraded audio quality due to reliance on external vocoders. In this paper, we propose Image-indicated Immersive Text-to-speech Synthesis (<tex xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">$\\mathbf{I}^{\\mathbf{2}}$</tex> TTS), an end-to-end multimodal TTS framework that synthesizes high-quality, immersive speech from text and visual scene prompts. Our model leverages a CLIP-based image encoder with an adaptive adapter to extract scene-aware features, a Speech Reverberation Classifier (SRC) for refining acoustic-visu...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/apsipaasc65261.2025.11249147","openalex_id":"https://openalex.org/W4416798814","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["National University of Singapore","Tencent (China)","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.781000018119812},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5957000255584717},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5759000182151794},{"id":"https://openalex.org/C113364801","display_name":"High fidelity","score":0.5629000067710876},{"id":"https://openalex.org/C194969405","display_name":"Virtual reality","score":0.47600001096725464},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.44769999384880066},{"id":"https://openalex.org/C95851461","display_name":"Reverberation","score":0.40450000762939453},{"id":"https://openalex.org/C14999030","display_name":"Speech 
synthesis","score":0.4018999934196472}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/loongrl-reinforcement-learning-for-advanced-reasoning-over-long-contexts","title":"LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts","url":"https://www.microsoft.com/en-us/research/publication/loongrl-reinforcement-learning-for-advanced-reasoning-over-long-contexts/","published":"2025-10-21","authors":["Siyuan Wang","Gaokai Zhang","L. Zhang","Ning Shang","Fan Yang","Dongyao Chen","Mao Yang"],"abstract":"Reasoning over long contexts is essential for large language models. While reinforcement learning (RL) enhances short-context reasoning by inducing\"Aha\"moments in chain-of-thought, the advanced thinking patterns required for long-context reasoning remain largely unexplored, and high-difficulty RL data are scarce. In this paper, we introduce LoongRL, a data-driven RL method for advanced long-context reasoning. Central to LoongRL is KeyChain, a synthesis approach that transforms short multi-hop QA into high-difficulty long-context tasks by inserting UUID chains that hide the true question among large collections of distracting documents. Solving these tasks requires the model to trace the correct chain step-by-step, identify the true question, retrieve relevant facts and reason over them to answer correctly. 
RL training on KeyChain data induces an emergent plan-retrieve-reason-recheck reas...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/seeing-across-views-benchmarking-spatial-reasoning-of-vision-language-models-in-robotic-scenes","title":"Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes","url":"https://www.microsoft.com/en-us/research/publication/seeing-across-views-benchmarking-spatial-reasoning-of-vision-language-models-in-robotic-scenes/","published":"2025-10-21","authors":["Zhiyuan Feng","Zhaolu Kang","Qijie Wang","Zhiying Du","Jiongrui Yan","Shubin Shi","Chengbo Yuan","Huizhi Liang","Yu Deng","Qixiu Li","Rushuai Yang","Arctanx An"],"abstract":"Vision-language models (VLMs) are essential to Embodied AI, enabling robots to perceive, reason, and act in complex environments. They also serve as the foundation for the recent Vision-Language-Action (VLA) models. Yet most evaluations of VLMs focus on single-view settings, leaving their ability to integrate multi-view information underexplored. At the same time, multi-camera setups are increasingly standard in robotic platforms, as they provide complementary perspectives to mitigate occlusion and depth ambiguity. 
Whether VLMs can effectively leverage such multi-view inputs for robotic reasoning therefore remains an open question. To bridge this gap, we introduce MV-RoboBench, a benchmark specifically designed to evaluate the multi-view spatial reasoning capabilities of VLMs in robotic manipulation. MV-RoboBench consists of 1.7k manually curated QA items across eight subtasks, divided i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2510.18455","title":"ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks","url":"https://huggingface.co/papers/2510.18455","published":"2025-10-21","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4415428435","title":"Mutual Information-Driven Rationale Distillation for Hateful Memes 
Detection","url":"https://doi.org/10.3233/faia251159","published":"2025-10-21","authors":["Chuanpeng Yang","Yaxin Liu","Fuqing Zhu"],"abstract":"Hateful memes typically disseminate discriminatory statements directly or indirectly based on race, religion, or other characteristics. Previous research has made enlightening exploration in detecting explicit hateful memes. However, these methods overlook the analysis of implicit hate, which is particularly challenging as it requires comprehensive background knowledge to be accurately identified. To this end, this paper proposes a mutual information-driven rationale distillation model (MIRD) that leverages multimodal reasoning knowledge extracted from MLLM to reveal hateful memes. The model is divided into two stages. The first stage is abductive reasoning, which provides contextual background information to guide the revelation of the implicit meaning of memes. The second stage is rationale distillation, which transfers the background knowledge provided by MLLM to small models, thereby...","companies":["Tencent/Hunyuan","Baidu"],"matched_orgs":["Tencent/Hunyuan","Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.3233/faia251159","openalex_id":"https://openalex.org/W4415428435","cited_by_count":0,"quality_score":53,"matched_keywords":["distillation"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Information Engineering","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7044000029563904},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6450999975204468},{"id":"https://openalex.org/C2780876879","display_name":"Meaning (existential)","score":0.5508999824523926},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5328999757766724},{"id":"https://openalex.org/C101780184","display_name":"Dissemination","score":0.46630001068115234},{"id":"https://openalex.org/C152139883","display_name":"Mutual information","score":0.45980000495910645},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38429999351501465},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.33970001339912415}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415428926","title":"Unlocking the Potential of mLLMs: Enhancing Video-Text Retrieval Through Caption Supplementation and Conical Embedding Optimization","url":"https://doi.org/10.3233/faia250879","published":"2025-10-21","authors":["Baoyao Yang","Junxiang Chen","Wenbin Yao"],"abstract":"The burgeoning field of video-text retrieval has witnessed significant advancements with the advent of deep learning. However, understanding and matching textual descriptions and video data remains a formidable challenge due to the large information gap across textual and video modalities. As observed, the caption of a video is commonly under-described, lacking expressions of minor characters or local details. Some recent advances have attempted to leverage multimodal Large Language Model (mLLM) to bridge the comprehension gap. However, mLLMs’ potential in enhancing video-text retrieval (VTR) is understudied. This paper aims to fill this research vacancy, analyzing the practical significance and model preferences for utilizing mLLMs in VTR enhancement, as well as investigating the effective integration of mLLM-derived information into the retrieval learning. 
Based on our analytical insig...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.3233/faia250879","openalex_id":"https://openalex.org/W4415428926","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Guangdong University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7760999798774719},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5860999822616577},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5857999920845032},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5723000168800354},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5719000101089478},{"id":"https://openalex.org/C2983174267","display_name":"Video retrieval","score":0.43880000710487366},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.4138000011444092},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.41190001368522644}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415428185","title":"VariGen: Controllable Image Generation Via Personalized Diffusion Framework","url":"https://doi.org/10.3233/faia251196","published":"2025-10-21","authors":["Mingxin Cai","Zhuan Qing Huang","Yuchen Li","Linghe Kong","Guihai Chen"],"abstract":"Diffusion models have achieved remarkable progress in text-to-image generation, enabling the rise of personalized models. 
A key challenge in personalized generation is to provide users with precise control while ensuring high fidelity to the content. To address this, we introduce VariGen: a framework that empowers users to achieve fine-grained, layout-controllable personalized image generation. VariGen employs the Variational Detail-Aware Feature Extractor to capture intricate details from reference subjects and the Dual Layout Control Mechanism to integrate layout specifications seamlessly into the generation process. We demonstrate that VariGen achieves superior performance through extensive experimentation, offering unparalleled creative freedom and fidelity. To our knowledge, this is the first work to enable users to “create anything, anywhere” with such precision and flexibility.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.3233/faia251196","openalex_id":"https://openalex.org/W4415428185","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Baidu (China)","Shanghai Jiao Tong University","University of Technology Sydney"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7407000064849854},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5971999764442444},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5332000255584717},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.4925000071525574},{"id":"https://openalex.org/C113364801","display_name":"High fidelity","score":0.47540000081062317},{"id":"https://openalex.org/C2987933465","display_name":"Image manipulation","score":0.46050000190734863},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4456999897956848},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.3887999951839447}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415428244","title":"LexSemBridge: Fine-Grained Dense Representation Enhancement Through Token-Aware Embedding Augmentation","url":"https://doi.org/10.3233/faia251083","published":"2025-10-21","authors":["Shaoxiong Zhan","Hai Lin","Hongming Tan","Xiaodong Cai","Haitao Zheng","Xin Su","Zifei Shan","Ruitong Liu","Hong‐Gee Kim"],"abstract":"As queries in retrieval-augmented generation (RAG) pipelines powered by large language models (LLMs) become increasingly complex and diverse, dense retrieval models have demonstrated strong performance in semantic matching. Nevertheless, they often struggle with fine-grained retrieval tasks, where precise keyword alignment and span-level localization are required, even in cases with high lexical overlap that would intuitively suggest easier retrieval. To systematically evaluate this limitation, we introduce two targeted tasks, keyword retrieval and part-of-passage retrieval, designed to simulate practical fine-grained scenarios. Motivated by these observations, we propose LexSemBridge, a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. 
LexSemBridge constructs latent enhancement vectors from input tokens using three paradigms...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.3233/faia251083","openalex_id":"https://openalex.org/W4415428244","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Peng Cheng Laboratory","Seoul National University","Tencent (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.79339998960495},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.6100999712944031},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5803999900817871},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5770999789237976},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5680000185966492},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.5551000237464905},{"id":"https://openalex.org/C2780767217","display_name":"Generality","score":0.5428000092506409},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5318999886512756}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415428212","title":"Investigating Sparsity of Self-Attention","url":"https://doi.org/10.3233/faia251302","published":"2025-10-21","authors":["Martin Garaj","Kisub Kim","Alexis Stockinger"],"abstract":"Understanding the sparsity patterns of the self-attention mechanism in modern Large Language Models (LLMs) has become increasingly important for improving computational efficiency. 
Motivated by empirical observations, numerous algorithms assume specific sparsity structures within self-attention. In this work, we rigorously examine five common conjectures about self-attention sparsity frequently addressed in recent literature: (1) attention width decreases through network depth, (2) attention heads form distinct behavioral clusters, (3) recent tokens receive high attention, (4) the first token maintains consistent focus, and (5) semantically important tokens persistently attract attention. Our analysis uses over 4 million attention weight vectors from Llama3-8B collected over long-context benchmark LongBench to achieve statistically significant results. Our findings strongly support conje...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.3233/faia251302","openalex_id":"https://openalex.org/W4415428212","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Daegu Gyeongbuk Institute of Science and Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.6876999735832214},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6680999994277954},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5957000255584717},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5807999968528748},{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.5515000224113464},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4810999929904938},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47440001368522644},{"id":"https://openalex.org/C73555534","display_name":"Cluster 
analysis","score":0.4348999857902527}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415428271","title":"A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning","url":"https://doi.org/10.3233/faia251173","published":"2025-10-21","authors":["Yaya Zhao","Kaiqi Zhao","Zixuan Tang","Zhiyuan Liu","Xiaoling Lu","Yalei Du"],"abstract":"Graph-based models have emerged as a powerful paradigm for modeling multimodal urban data and learning region representations for various downstream tasks. However, existing approaches face two major limitations. (1) They typically employ identical graph neural network architectures across all modalities, failing to capture modality-specific structures and characteristics. (2) During the fusion stage, they often neglect spatial heterogeneity by assuming that the aggregation weights of different modalities remain invariant across regions, resulting in suboptimal representations. To address these issues, we propose MTGRR, a modality-tailored graph modeling framework for urban region representation, built upon a multimodal dataset comprising point of interest (POI), taxi mobility, land use, road element, remote sensing, and street view images. 
(1) MTGRR categorizes modalities into two group...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.3233/faia251173","openalex_id":"https://openalex.org/W4415428271","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","CNS Research (United States)","Harbin Institute of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7421000003814697},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6266999840736389},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5634999871253967},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5618000030517578},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.44609999656677246},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4007999897003174},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.3919999897480011},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.3903000056743622}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/kirby-judge-think-once-judge-anywhere","title":"Kirby-Judge: Think Once, Judge Anywhere","url":"https://www.microsoft.com/en-us/research/publication/kirby-judge-think-once-judge-anywhere/","published":"2025-10-20","authors":["Jongwoo Ko","Sungnyun Kim","Sungwoo Cho","SeYoung Yun"],"abstract":"Human-generated reward signals are critical for aligning generative models with human preferences, guiding both training and 
inference-time evaluations. While large language models (LLMs) employed as proxy evaluators, i.e., LLM-as-a-Judge, significantly reduce the costs associated with manual annotations, they typically require extensive modality-specific training data and fail to generalize well across diverse multimodal tasks. In this paper, we propose Kirby-Judge , a reasoning-guided multimodal judge model that leverages minimal textual reasoning data to robustly generalize across multiple modalities and evaluation formats. Our core intuition is that structured textual reasoning explanations inherently encode generalizable decision-making patterns, enabling an effective transfer to multimodal judgments, e.g., with images or videos. Empirical results demonstrate that Kirby-Judge, despi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/counterfactual-reasoning-for-steerable-pluralistic-value-alignment-of-large-language-models","title":"Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/counterfactual-reasoning-for-steerable-pluralistic-value-alignment-of-large-language-models/","published":"2025-10-20","authors":["Hanze Guo","Jing Yao","Xiao Zhou","Xiaoyuan Yi","Xing Xie"],"abstract":"As large 
language models (LLMs) become increasingly integrated into applications serving users across diverse cultures, communities and demographics, it is critical to align LLMs with pluralistic human values beyond average principles (e.g., HHH). In psychological and social value theories such as Schwartz's Value Theory, pluralistic values are represented by multiple value dimensions paired with various priorities. However, existing methods encounter two challenges when aligning with such fine-grained value objectives: 1) they often treat multiple values as independent and equally important, ignoring their interdependence and relative priorities (value complexity); 2) they struggle to precisely control nuanced value priorities, especially those underrepresented ones (value steerability). To handle these challenges, we propose COUPLE, a COUnterfactual reasoning framework for PLuralistic....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cosmocore-affective-dream-replay-reinforcement-learning-for-code-generation","title":"CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation","url":"https://www.microsoft.com/en-us/research/publication/cosmocore-affective-dream-replay-reinforcement-learning-for-code-generation/","published":"2025-10-20","authors":["S. 
Ravindran"],"abstract":"We introduce CosmoCore, a neuroscience-inspired reinforcement learning (RL) architecture that integrates affective signals to enhance code generation in large language models (LLMs). Motivated by human and animal learning where embarrassment from mistakes drives rapid correction, as observed in training a puppy to avoid repeating errors after a single scolding CosmoCore tags code generation trajectories with valence and surprise using a lightweight multi-layer perceptron (MLP). High-negative valence (cringe) episodes, such as buggy code outputs, are prioritized in a Dream Queue for five-fold replay during off-policy updates, while low-surprise successes are pruned to prevent overconfidence and buffer bloat. Evaluated on code generation benchmarks like HumanEval and BigCodeBench, alongside simulations with a custom data pipeline environment, CosmoCore reduces hallucinated code (e.g., synt...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Computer science","Reinforcement learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2510.18234","title":"DeepSeek-OCR: Contexts Optical Compression","url":"https://huggingface.co/papers/2510.18234","published":"2025-10-20","authors":["DeepSeek"],"abstract":"We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. 
DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder. Specifically, DeepEncoder serves as the core engine, designed to maintain low activations under high-resolution input while achieving high compression ratios to ensure an optimal and manageable number of vision tokens. Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10x), the model can achieve decoding (OCR) precision of 97%. Even at a compression ratio of 20x, the OCR accuracy still remains at about 60%. This shows considerable promise for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs. Beyond this, DeepSeek-OCR also demonstrates high practical....","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","deepseek-ai","memory","compression"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/breaking-and-fixing-defenses-against-control-flow-hijacking-in-multi-agent-systems","title":"Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems","url":"https://www.microsoft.com/en-us/research/publication/breaking-and-fixing-defenses-against-control-flow-hijacking-in-multi-agent-systems/","published":"2025-10-19","authors":["Rishi Jha","Harold Triedman","Justin Wagle","Vitaly Shmatikov"],"abstract":"Control-flow hijacking attacks manipulate orchestration mechanisms in multi-agent systems into performing unsafe actions that compromise the 
system and exfiltrate sensitive information. Recently proposed defenses, such as LlamaFirewall, rely on alignment checks of inter-agent communications to ensure that all agent invocations are\"related to\"and\"likely to further\"the original objective. We start by demonstrating control-flow hijacking attacks that evade these defenses even if alignment checks are performed by advanced LLMs. We argue that the safety and functionality objectives of multi-agent systems fundamentally conflict with each other. This conflict is exacerbated by the brittle definitions of\"alignment\"and the checkers'incomplete visibility into the execution context. We then propose, implement, and evaluate ControlValve, a new defense inspired by the principles of control-flow integ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Engineering","large language models","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:cb61aec6fc12f0d0","title":"Enrich and Detect: Video Temporal Grounding with Multimodal LLMs","url":"https://ai.meta.com/research/publications/enrich-and-detect-video-temporal-grounding-with-multimodal-llms/","published":"2025-10-19","authors":["Shraman Pramanick","Effrosyni Mavroudi","Yale Song","Rama Chellappa","Lorenzo Torresani","Triantafyllos Afouras"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=3"}},{"id":"openalex:W7131122436","title":"Align Before You Recommend: Parameter Efficient Personalization via Cross-Attentive Fusion of Hierarchical Language Models","url":"https://doi.org/10.1109/iccvw69036.2025.00632","published":"2025-10-19","authors":["Alicja Kwaśniewska","Marcin Bednarz","Chad Neal"],"abstract":"The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful considerations and potential enhancements. While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs' capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning. 
By introducing targeted alignment components between frozen LLMs, our approach outperforms frozen model performance in popular and long-tail item recommendation tasks...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00632","openalex_id":"https://openalex.org/W7131122436","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","personalization","efficient"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.791100025177002},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.714900016784668},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.614799976348877},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.5539000034332275},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5519999861717224},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.51419997215271},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.47609999775886536},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.46970000863075256}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131098521","title":"SEED-Story: Multimodal Long Story Generation with Large Language Model","url":"https://doi.org/10.1109/iccvw69036.2025.00197","published":"2025-10-19","authors":["Shu Yang","Yuying Ge","Yang Li","Yuehua Chen","Yixiao Ge","Ying Shan","Yuehua Chen"],"abstract":"Advances in image generation and open-form text generation have paved the way for tackling the challenging task of multimodal 
long story generation. In our work, we introduce SEED-Story, a novel approach that extends Multi-modal Large Language Models (MLLMs) to generate coherent, extended narratives composed of both interleaved text and images. By leveraging robust MLLMs, our model predicts text tokens and regresses visual features that are subsequently refined through an adapted de-tokenizer; ensuring that generated images consistently depict recurring characters and maintain a unified visual style. Further-more, we introduce a multimodal attention sink mechanism to overcome the train-short test-long challenge. This mechanism retains recent tokens while preserving critical tokens from both the start and end of image sequences, enabling efficient autoregressive generation of long stories...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00197","openalex_id":"https://openalex.org/W7131098521","cited_by_count":3,"quality_score":48,"matched_keywords":["language model","efficient"],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7985000014305115},{"id":"https://openalex.org/C199033989","display_name":"Narrative","score":0.6323000192642212},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5742999911308289},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5195000171661377},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4814000129699707},{"id":"https://openalex.org/C2985684807","display_name":"Text generation","score":0.4278999865055084},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.4104999899864197},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.39640000462532043}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W7131069438","title":"iDETEX: Empowering MLLMs for Intelligent DETailed EXplainable IQA","url":"https://doi.org/10.1109/iccvw69036.2025.00416","published":"2025-10-19","authors":["Zhaoran Zhao","Xinli Yue","Jianhui Sun","Yuhao Xie","Tao Shao","Liangchao Yao","Fan Xia","Yuetang Deng"],"abstract":"Image Quality Assessment (IQA) has progressed from scalar quality prediction to more interpretable, human-aligned evaluation paradigms. In this work, we address the emerging challenge of detailed and explainable IQA by proposing iDETEX-a unified multimodal large language model (MLLM) capable of simultaneously performing three key tasks: quality grounding, perception, and description. To facilitate efficient and generalizable training across these heterogeneous subtasks, we design a suite of task-specific offline augmentation modules and a data mixing strategy. These are further complemented by online enhancement strategies to fully exploit multi-sourced supervision. We validate our approach on the large-scale ViDA-UGC benchmark, where iDETEX achieves state-of-the-art performance across all subtasks. 
Our model ranks first in the ICCV MIPI 2025 Detailed Image Quality Assessment Challenge,....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00416","openalex_id":"https://openalex.org/W7131069438","cited_by_count":1,"quality_score":46,"matched_keywords":["language model","efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.7746000289916992},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7684000134468079},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7203999757766724},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.6553999781608582},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5105999708175659},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4754999876022339},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46889999508857727},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4636000096797943}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7131105516","title":"Infusing Fine-Grained Visual Knowledge to Vision-Language Models","url":"https://doi.org/10.1109/iccvw69036.2025.00445","published":"2025-10-19","authors":["Nikolaos-Antonios Ypsilantis","Kaifeng Chen","Andre B. Araujo","Ondřej Chum"],"abstract":"Large-scale contrastive pre-training produces powerful Vision-and-Language Models (VLMs) capable of generating representations (embeddings) effective for a wide variety of visual and multimodal tasks. 
However, these pretrained embeddings remain suboptimal for fine-grained open-set visual retrieval, where state-of-the-art results require fine-tuning the vision encoder using annotated domain-specific samples. Naively performing such fine-tuning typically leads to catastrophic forgetting, severely diminishing the model's general-purpose visual and cross-modal capabilities. In this work, we propose a fine-tuning method explicitly designed to achieve optimal balance between fine-grained domain adaptation and retention of the pretrained VLM's broad multimodal knowledge. Drawing inspiration from continual learning literature, we systematically analyze standard regularization techniques aimed at...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00445","openalex_id":"https://openalex.org/W7131105516","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Czech Technical University in Prague","Google (United States)","Google DeepMind (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7662000060081482},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.6370000243186951},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6060000061988831},{"id":"https://openalex.org/C8642999","display_name":"Hyperparameter","score":0.5322999954223633},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5073999762535095},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5056999921798706},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.46230000257492065},{"id":"https://openalex.org/C2776135515","display_name":"Regularization 
(linguistics)","score":0.4223000109195709}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416748155","title":"Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors","url":"https://doi.org/10.1109/iros60139.2025.11246657","published":"2025-10-19","authors":["Hritam Basak","Hamid Tabatabaee","Shreekant Gayaka","Mingfeng Li","Xin Yang","Cheng-Hao Kuo","Arnie Sen","Min Sun","Zhaozheng Yin"],"abstract":"3D object generation from a single unposed RGB image is essential for robotic perception, as reconstructing complete geometry and texture is essential for precise manipulation, grasping, and scene understanding, which is key for autonomous navigation and dexterous interaction. Recent advancements in image-to-3D employ Gaussian Splatting with pre-trained 2D or 3D diffusion models, but a disparity exists: 2D models generate high-fidelity textures yet lack geometric consistency, while 3D models ensure structural coherence but produce overly smooth textures. To address this, we introduce a two-stage frequency-based distillation loss integrated with Gaussian Splatting, leveraging geometric priors from a 3D diffusion model’s low-frequency spectrum for structural consistency and a 2D diffusion model’s high-frequency details for sharper textures. 
Our approach achieves state-of-the-art 3D reconst...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iros60139.2025.11246657","openalex_id":"https://openalex.org/W4416748155","cited_by_count":1,"quality_score":42,"matched_keywords":["distillation"],"author_affiliations":["Amazon (United States)","Stony Brook University"],"concepts":[{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7817000150680542},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.777400016784668},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7282999753952026},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.646399974822998},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.4999000132083893},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4043999910354614},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.4011000096797943},{"id":"https://openalex.org/C109950114","display_name":"3D reconstruction","score":0.3693000078201294}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7131063066","title":"Test-time Prompt Refinement for Text-to-Image Models","url":"https://doi.org/10.1109/iccvw69036.2025.00680","published":"2025-10-19","authors":["Muhammad Abbas Khan","Yash Jain","Siddhartha Bhattacharyya","Vibhav Vineet"],"abstract":"Text-to-image (T2I) generation models have made significant strides but still struggle with prompt sensitivity: even minor changes in prompt wording can yield inconsistent or inaccurate outputs. 
To address this challenge, we introduce a closed-loop, test-time prompt refinement framework that requires no additional training of the underlying T2I model, termed TIR. In our approach, each generation step is followed by a refinement step, where a pretrained multimodal large language model (MLLM) analyzes the output image and the user's prompt. The MLLM detects misalignments (e.g., missing objects, incorrect attributes) and produces a refined and physically grounded prompt for the next round of image generation. By iteratively refining the prompt and verifying alignment between the prompt and the image, TIR corrects errors, mirroring the iterative refinement process of human artists. We demons...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00680","openalex_id":"https://openalex.org/W7131063066","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Florida Institute of Technology","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C189645446","display_name":"Mirroring","score":0.7860000133514404},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7386000156402588},{"id":"https://openalex.org/C2779982483","display_name":"Iterative refinement","score":0.6218000054359436},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5751000046730042},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.5598999857902527},{"id":"https://openalex.org/C143587482","display_name":"Iterative and incremental development","score":0.5274999737739563},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5131000280380249},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.5105999708175659}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131092999","title":"Target Attribute Diffusion Models","url":"https://doi.org/10.1109/iccvw69036.2025.00724","published":"2025-10-19","authors":["William Loh","Yanting Miao","Pascal Poupart","Suraj Kothawade"],"abstract":"Diffusion models have shown notable success in generating images conditioned on textual prompts, enabling users to edit images at a coarse scale with well-aligned text-to-image models. ControlNet [31] enhances these capabilities by allowing diffusion models to edit aspects such as pose, position, and edges according to reference visual motion information in a qualitative manner. However, diffusion models still face challenges in measurable and quantitative applications, such as applying sharpening or color enhancement effects. We call quantities such as brightness and saturation, attributes. In this work, we introduce Target Attribute Diffusion Models (TADM), which enable diffusion models to incorporate additional conditioning on continuous random variables. 
Unlike classifier-guidance methods, which require training an explicit classifier [30], TADM supports real-valued conditional varia...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00724","openalex_id":"https://openalex.org/W7131092999","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Google (United States)","University of Waterloo","Vector Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6786999702453613},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.626800000667572},{"id":"https://openalex.org/C2781137444","display_name":"Sharpening","score":0.6263999938964844},{"id":"https://openalex.org/C125245961","display_name":"Brightness","score":0.5011000037193298},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4700999855995178},{"id":"https://openalex.org/C95623464","display_name":"Classifier (UML)","score":0.45809999108314514},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4535999894142151},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4359000027179718}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416748652","title":"QuietPaw: Learning Quadrupedal Locomotion with Versatile Noise Preference Alignment","url":"https://doi.org/10.1109/iros60139.2025.11246267","published":"2025-10-19","authors":["Yuyou Zhang","Yihang Yao","Shiqi Liu","Yaru Niu","Changyi Lin","Yuxiang Yang","Wenhao Yu","Tingnan Zhang","Jie Tan","Ding Zhao"],"abstract":"When operating at their full capacity, quadrupedal robots 
can produce loud footstep noise, which can be disruptive in human-centered environments like homes, offices, and hospitals. As a result, balancing locomotion performance with noise constraints is crucial for the successful real-world deployment of quadrupedal robots. However, achieving adaptive noise control is challenging due to (a) the trade-off between agility and noise minimization, (b) the need for generalization across diverse deployment conditions, and (c) the difficulty of effectively adjusting policies based on noise requirements. We propose QuietPaw, a framework incorporating our Conditional Noise-Constrained Policy (CNCP), a constrained learning-based algorithm that enables flexible, noise-aware locomotion by conditioning policy behavior on noise-reduction levels. We leverage value representation decomposition in the cr...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iros60139.2025.11246267","openalex_id":"https://openalex.org/W4416748652","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Carnegie Mellon University","Google (United States)","Google DeepMind (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.640999972820282},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6395000219345093},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.47940000891685486},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.47279998660087585},{"id":"https://openalex.org/C116822448","display_name":"Noise control","score":0.4415999948978424},{"id":"https://openalex.org/C153083717","display_name":"Leverage 
(statistics)","score":0.43320000171661377},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4196000099182129},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4185999929904938}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160169662","title":"MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning","url":"https://doi.org/10.1109/iccv51701.2025.00146","published":"2025-10-19","authors":["Tianhong Gao","Yannian Fu","Weiqun Wu","Haixiao Yue","Shanshan Liu","Gang Zhang"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.00146","openalex_id":"https://openalex.org/W7160169662","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6516000032424927},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5590000152587891},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3156999945640564},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.30140000581741333},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.27469998598098755},{"id":"https://openalex.org/C20162079","display_name":"Case-based reasoning","score":0.25519999861717224},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.24799999594688416},{"id":"https://openalex.org/C165064840","display_name":"Matching 
(statistics)","score":0.24009999632835388}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131061914","title":"MFT-VITON: High-Fidelity Virtual Try-On with Minimal Input via a Mask-Free Transformer-Diffusion Model","url":"https://doi.org/10.1109/iccvw69036.2025.00210","published":"2025-10-19","authors":["Zhenchen Wan","Yanwu Xu","Dongting Hu","W. Cheng","Tianxi Chen","Zhaoqing Wang","F. Liu","Tongliang Liu","Mingming Gong"],"abstract":"Recent advancements in Virtual Try-On (VITON) have achieved remarkable realism and texture fidelity, largely attributed to the emergence of text-to-image (T2I) diffusion models. However, prevailing Unet-based T2I backbones are increasingly inadequate for rendering fine-grained garment details, particularly in preserving textual elements and subtle textures. Diffusion Transformer (DiT)-based architectures, with their superior generative capacity, offer a promising alternative, yet their integration into current VITON pipelines is impeded by substantial architectural mismatches. 
To address these challenges, we propose a novel mask-based framework augmented with three key components: a Garment Semantic (GS)-Adapter for enhanced garment-specific representation, a Text Preservation Loss to maintain high-fidelity text rendering, and LLM-driven semantic guidance for improved alignment between t...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00210","openalex_id":"https://openalex.org/W7131061914","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","The University of Melbourne","The University of Sydney"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7813000082969666},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.5841000080108643},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4722999930381775},{"id":"https://openalex.org/C2778738651","display_name":"Novelty","score":0.42829999327659607},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.38199999928474426},{"id":"https://openalex.org/C113364801","display_name":"High fidelity","score":0.3750999867916107},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3749000132083893},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.37439998984336853}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160125469","title":"METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models","url":"https://doi.org/10.1109/iccv51701.2025.01996","published":"2025-10-19","authors":["Liu 
Y","Yaoming Wang","Bowen Shi","Xi Zhang","Wenrui Dai","C. Li","Hongkai Xiong","Qi Tian"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.01996","openalex_id":"https://openalex.org/W7160125469","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7037000060081482},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6093999743461609},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.4616999924182892},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.45010000467300415},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41429999470710754},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.3357999920845032},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3312000036239624},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.3133000135421753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160042910","title":"MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval","url":"https://doi.org/10.1109/iccv51701.2025.01982","published":"2025-10-19","authors":["Jaeseok Byun","Young Kyun Jang","Seokhyeon Jeong","D Kim","Taesup 
Moon"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.01982","openalex_id":"https://openalex.org/W7160042910","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Google (United States)","Korea University","Seoul National University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.631600022315979},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.6097999811172485},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5555999875068665},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44510000944137573},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.44359999895095825},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.41670000553131104},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.40529999136924744},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.34119999408721924}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131120554","title":"Learning by Taking Notes: Memory-Guided Continual Learning for Generative Multimodal Models","url":"https://doi.org/10.1109/iccvw69036.2025.00456","published":"2025-10-19","authors":["Yanhui Guo","Cai-fang Guo","Yan Gao","Yi Sun"],"abstract":"In the era of large models, emerging Generative Vision-Language Models (VLMs) have exhibited impressive zero-shot learning capabilities on generative tasks by leveraging knowledge acquired through pre-training on 
large-scale datasets. However, for specific downstream tasks such as classification and detection, VLMs require either prompt engineering with carefully crafted task-specific instructions or fine-tuning to align with task objectives and suppress hallucinations. These issues are further exacerbated under continual learning (CL) settings. In gradient-free in-context learning, generalization to novel tasks relies heavily on prompt design, which may be suboptimal or unavailable at test time. In contrast, gradient-based sequential fine-tuning across tasks tends to intensify hallucination due to the well-known phenomenon of catastrophic forgetting, a fundamental challenge in CL parad...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00456","openalex_id":"https://openalex.org/W7131120554","cited_by_count":0,"quality_score":41,"matched_keywords":["memory"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7488999962806702},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7233999967575073},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5659999847412109},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5360000133514404},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5077999830245972},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4733000099658966},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4537999927997589},{"id":"https://openalex.org/C40506919","display_name":"Sequence
learning","score":0.33169999718666077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160074091","title":"GestureHYDRA: Semantic Co-Speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation","url":"https://doi.org/10.1109/iccv51701.2025.01172","published":"2025-10-19","authors":["Quanwei Yang","Luying Huang","Kaisiyuan Wang","Jiazhi Guan","Shengyi He","Fengguo Li","Hang Zhou","L J Yu","Yingying Li","Haocheng Feng","Hongtao Xie"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.01172","openalex_id":"https://openalex.org/W7160074091","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6477000117301941},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5475000143051147},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5401999950408936},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.43290001153945923},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.4242999851703644},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.35179999470710754},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.31369999051094055},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.3098999857902527}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160148082","title":"From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors Via LLM-guided Symbolic Reasoning","url":"https://doi.org/10.1109/iccv51701.2025.02260","published":"2025-10-19","authors":["Yuhui Zeng","Haoxiang Wu","Wenjie Nie","Guangyao Chen","Xiawu Zheng","Yunhang Shen","Jun Peng","Yonghong Tian","Rongrong Ji"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.02260","openalex_id":"https://openalex.org/W7160148082","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Peking University","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6481999754905701},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6060000061988831},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6019999980926514},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5013999938964844},{"id":"https://openalex.org/C94915269","display_name":"Detector","score":0.35899999737739563},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.3345000147819519},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.305400013923645},{"id":"https://openalex.org/C2776095079","display_name":"The Symbolic","score":0.28999999165534973}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7131121648","title":"Cross-Lingual Visual Text Stylization and Synthesis Incorporating Text Rendering and Diffusion Model","url":"https://doi.org/10.1109/iccvw69036.2025.00636","published":"2025-10-19","authors":["Minmin Shen","C Y Chen"],"abstract":"Visual Text Stylization and Synthesis aims to generate a text that has the same style as the input text. This task is more challenging if the input and output images are of different languages, and remains an unaddressed issue for the state-of-the-art diffusion-based image generation models. To fulfill the demand for cross-lingual visual text stylization and synthesis for commercial applications, we propose a hybrid approach combining the strengths of two different methods: text rendering and diffusion models to generate visual text with the same style as the reference visual text image. This approach addresses the technical challenges of crosslingual text style transfer and is able to produce high quality visual text with various styles and complex texture. Moreover, our approach is able to handle long text with multi-line layout by incorporating large language models (LLM). 
We evaluate...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00636","openalex_id":"https://openalex.org/W7131121648","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.801800012588501},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.6715999841690063},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5792999863624573},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5206000208854675},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.44190001487731934},{"id":"https://openalex.org/C2780878386","display_name":"Visual language","score":0.40380001068115234},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.3953000009059906},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.33809998631477356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416749283","title":"CLAP: A Closed-Loop Diffusion Transformer Action Foundation Model for Robotic Manipulation","url":"https://doi.org/10.1109/iros60139.2025.11246478","published":"2025-10-19","authors":["Mu Li","Dong Ye","Yang Zhou","Chenguang Yang"],"abstract":"The development of large Vision-Language-Action (VLA) models has enhanced the robot’s ability to manipulate objects in unseen scenarios based on language instructions. 
While existing VLAs have demonstrated promise in various scenarios, they still struggle with effective multi-modal data feature extraction and lack a closed-loop inference framework. In this paper, we propose an advanced VLA model. Unlike previous works that repurpose VLM for action prediction using simple action quantization, we componentized the VLA architecture with a specialized action module conditioned on the model output and a critic module for inference. We demonstrate the performance improvement of diffusion action transformers in modeling continuous temporal actions, with the critic module applied during inference to form a closed-loop model. Extensive experiments on real robots demonstrate that our model signifi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iros60139.2025.11246478","openalex_id":"https://openalex.org/W4416749283","cited_by_count":0,"quality_score":41,"matched_keywords":["quantization"],"author_affiliations":["Huawei Technologies (China)","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6859999895095825},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6851999759674072},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6741999983787537},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6247000098228455},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.5512999892234802},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.49230000376701355},{"id":"https://openalex.org/C133731056","display_name":"Control engineering","score":0.46720001101493835},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.4449999928474426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131126688","title":"StyleBooth: Image Style Editing with Multimodal Instruction","url":"https://doi.org/10.1109/iccvw69036.2025.00206","published":"2025-10-19","authors":["Zhen Han","Chaojie Mao","Zeyinzi Jiang","Yulin Pan","Jingfeng Zhang"],"abstract":"Given an original image, image editing aims to generate an image that align with the provided instruction. The challenges are to accept multimodal inputs as instructions and a scarcity of high-quality training data, including crucial triplets of source/target image pairs and multimodal (text and image) instructions. In this paper, we focus on image style editing and present StyleBooth, a method that proposes a comprehensive framework for image editing and a feasible strategy for building a high-quality style editing dataset. We integrate encoded textual instruction and image exemplar as a unified condition for diffusion model, enabling the editing of original image following multimodal instructions. 
Furthermore, by iterative style-destyle tuning and editing and usability filtering, the StyleBooth dataset provides content-consistent stylized/plain image pairs in various categories of styl...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00206","openalex_id":"https://openalex.org/W7131126688","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.9039999842643738},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7904999852180481},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.7333999872207642},{"id":"https://openalex.org/C170130773","display_name":"Usability","score":0.5978999733924866},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5489000082015991},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5292999744415283},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.4894999861717224},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.4747999906539917}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7131126390","title":"PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models","url":"https://doi.org/10.1109/iccvw69036.2025.00279","published":"2025-10-19","authors":["Qingdong He","Jinlong Peng","Zhengkai Jiang","Xiaobin Hu","Jiangning Zhang"],"abstract":"Recent success of vision foundation models have shown promising 
performance for the 2D perception tasks. However, it is difficult to train a 3D foundation network directly due to the limited dataset and it remains under explored whether existing foundation models can be lifted to 3D space seamlessly. In this paper, we present PointSeg, a novel training-free paradigm that leverages off-the-shelf vision foundation models to address 3D scene perception tasks. PointSeg can segment anything in 3D scene by acquiring accurate 3D prompts to align their corresponding pixels across frames. Concretely, we design a two-branch prompts learning structure to construct the 3D point-box prompts pairs, combining with the bidirectional matching strategy for accurate point and proposal prompts generation. Then, we perform the iterative post-refinement adaptively when cooperated with different vision foundat...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00279","openalex_id":"https://openalex.org/W7131126390","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.8159000277519226},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7286999821662903},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6854000091552734},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6083999872207642},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5982000231742859},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5522000193595886},{"id":"https://openalex.org/C165064840","display_name":"Matching 
(statistics)","score":0.5246999859809875},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.48829999566078186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7131093241","title":"A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics","url":"https://doi.org/10.1109/iccvw69036.2025.00128","published":"2025-10-19","authors":["Rushin H. Gindra","Giovanni Palla","Mathias Nguyen","Sophia Wagner","Manuel Tran","Fabian J Theis","Dieter Saur","Lorin Crawford","Tingying Peng"],"abstract":"Spatial transcriptomics enables simultaneous measurement of gene expression and tissue morphology, offering unprecedented insights into cellular organization and disease mechanisms. However, the field lacks comprehensive benchmarks for evaluating multimodal learning methods that leverage both histology images and gene expression data. Here, we present HESCAPE, a large-scale benchmark for cross-modal contrastive pretraining in spatial transcriptomics, built on a curated pan-organ dataset spanning 6 different gene panels and 54 donors. We systematically evaluated state-of-the-art image and gene expression encoders across multiple pre-training strategies and assessed their effectiveness on two downstream tasks: gene mutation classification and gene expression prediction. 
Our benchmark demonstrates that gene expression encoders are the primary determinant of strong representational alignment...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00128","openalex_id":"https://openalex.org/W7131093241","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Helmholtz Zentrum München","Microsoft (United States)","Technical University of Munich"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6934000253677368},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.607200026512146},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.5677000284194946},{"id":"https://openalex.org/C150194340","display_name":"Gene expression","score":0.5494999885559082},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5432999730110168},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5304999947547913},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.44620001316070557},{"id":"https://openalex.org/C162317418","display_name":"Transcriptome","score":0.4449999928474426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W7160047254","title":"Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images","url":"https://doi.org/10.1109/iccv51701.2025.01186","published":"2025-10-19","authors":["Boyang Deng","Songyou Peng","Kyle Genova","Gordon Wetzstein","Noah Snavely","Leonidas Guibas","Thomas 
Funkhouser"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.01186","openalex_id":"https://openalex.org/W7160047254","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Stanford University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5644000172615051},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.39419999718666077},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.36340001225471497},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.28850001096725464},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.26159998774528503},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.26100000739097595},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.24639999866485596}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160185532","title":"R1-Onevision: Advancing Generalized Multimodal Reasoning Through Cross-Modal Formalization","url":"https://doi.org/10.1109/iccv51701.2025.00229","published":"2025-10-19","authors":["Yi Yang","Xiaoxuan He","Hongkun Pan","Xiyan Jiang","Yan Deng","X Y Yang","Haoyu Lu","Dacheng Yin","Fengyun Rao","Minfeng Zhu","Bo Zhang","Wei 
Chen"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.00229","openalex_id":"https://openalex.org/W7160185532","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Renmin University of China","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.576200008392334},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4699000120162964},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.290800005197525},{"id":"https://openalex.org/C20162079","display_name":"Case-based reasoning","score":0.2867000102996826},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.2687000036239624},{"id":"https://openalex.org/C159032336","display_name":"Non-monotonic logic","score":0.26829999685287476},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.2621000111103058},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.24819999933242798}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160019347","title":"Modaltune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-Task Learning in Digital Pathology","url":"https://doi.org/10.1109/iccv51701.2025.02217","published":"2025-10-19","authors":["Vishwesh Ramanathan","Tony Xu","Pushpak Pati","Faruk Ahmed","Maged Goubran","Anne L. 
Martel"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.02217","openalex_id":"https://openalex.org/W7160019347","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Sunnybrook Health Science Centre","Sunnybrook Hospital"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5296000242233276},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4814999997615814},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.32600000500679016},{"id":"https://openalex.org/C2777522853","display_name":"Digital pathology","score":0.3100000023841858},{"id":"https://openalex.org/C180198813","display_name":"Information system","score":0.3059999942779541},{"id":"https://openalex.org/C55587333","display_name":"Engineering ethics","score":0.26179999113082886},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.26100000739097595},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.2531000077724457}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160166907","title":"MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh","url":"https://doi.org/10.1109/iccv51701.2025.01305","published":"2025-10-19","authors":["Shuangkang Fang","I‐Chao Shen","Yuxing Wang","Yi–Hsuan Tsai","Yi Yang","Shuchang Zhou","Wenrui Ding","Takeo Igarashi","Ming-Hsuan 
Yang"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.01305","openalex_id":"https://openalex.org/W7160166907","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beihang University","Google (United States)","The University of Tokyo","University of California, Merced"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6536999940872192},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36500000953674316},{"id":"https://openalex.org/C179603123","display_name":"Modeling language","score":0.3109999895095825},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.3012000024318695},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.29499998688697815},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.2930999994277954},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.2897000014781952},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2800000011920929}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160137148","title":"Mdp <sup>3</sup> : a Training-Free Approach for List-Wise Frame Selection in Video-Llms","url":"https://doi.org/10.1109/iccv51701.2025.02233","published":"2025-10-19","authors":["Hui Sun","Shiyin Lu","Huanyu Wang","Qing-Guo Chen","Zhao Xu","Weihua Luo","Kaifu Zhang","Moran 
Li"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.02233","openalex_id":"https://openalex.org/W7160137148","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Nanjing University of Information Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.576200008392334},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.41760000586509705},{"id":"https://openalex.org/C126042441","display_name":"Frame (networking)","score":0.4018999934196472},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3865000009536743},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3531999886035919},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.29490000009536743},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2896000146865845},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.263700008392334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160031978","title":"Mamba-3VL: Taming State Space Model for 3D Vision Language Learning","url":"https://doi.org/10.1109/iccv51701.2025.00592","published":"2025-10-19","authors":["Yuan Wang","Yuxin Chen","Zhongang Qi","Lijun Liu","Jile Jiao","Xuetao Feng","Yujia Liang","Ying Shan","Zhipeng 
Zhang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.00592","openalex_id":"https://openalex.org/W7160031978","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Artificial Intelligence in Medicine (Canada)","Tencent (China)","Tsinghua University","University College of Applied Science"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5896999835968018},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5526999831199646},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4034999907016754},{"id":"https://openalex.org/C48103436","display_name":"State (computer science)","score":0.38920000195503235},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.37869998812675476},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.35179999470710754},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33149999380111694},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.30300000309944153}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160183092","title":"LIRA: Reasoning Reconstruction via Multimodal Large Language Models","url":"https://doi.org/10.1109/iccv51701.2025.00172","published":"2025-10-19","authors":["Zhen Zhou","Tong Wang","Yunkai Ma","Xiao Tan","Fengshui 
Jing"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.00172","openalex_id":"https://openalex.org/W7160183092","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Automation"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6362000107765198},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.520799994468689},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4528000056743622},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.33070001006126404},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.31630000472068787},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.31439998745918274},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2628999948501587},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2619999945163727}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160106859","title":"Ideator: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves","url":"https://doi.org/10.1109/iccv51701.2025.00830","published":"2025-10-19","authors":["Ruofan Wang","Juncheng Li","Yue Wang","Bo Wang","Xiaosen Wang","Yan Teng","Yingchun Wang","X L","Yu-Gang 
Jiang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.00830","openalex_id":"https://openalex.org/W7160106859","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Academy of Artificial Intelligence","Fudan University","Huawei Technologies (China)","Shanghai Artificial Intelligence Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5824999809265137},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.3939000070095062},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.33329999446868896},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.265500009059906},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.25940001010894775},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.25},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.24959999322891235},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.24899999797344208}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131064352","title":"Geometric Inductive Priors in Diffusion-Based Optical Flow Estimation","url":"https://doi.org/10.1109/iccvw69036.2025.00073","published":"2025-10-19","authors":["Alberto Pepe","Joan Lasenby","Paulo dos Santos Mendonca"],"abstract":"Diffusion models are ubiquitous in generative modeling and their prevalence in structured prediction tasks is increasing. 
The denoising diffusion vision model (DDVM), for example, achieves state-of-the-art accuracy on tasks such as monocular depth and optical flow estimation. We introduce GA-DDVM, a modified version of DDVM working in Geometric Algebra (GA) that includes a geometric prior to constrain diffusion for faster and more accurate optical flow estimation. We constrain diffusion in two key ways: (i) we restrict the types of objects learned by the pipeline to 2D vector fields, (i.e., optical flows), and (ii) we limit the operations performed by the network layers on these objects to scaling and rotations. GA-DDVM demonstrates substantial improvements over the baseline DDVM that emerge early in training and persist across all checkpoints: at 600k training steps, GA-DDVM reduces the...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00073","openalex_id":"https://openalex.org/W7131064352","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of Cambridge"],"concepts":[{"id":"https://openalex.org/C155542232","display_name":"Optical flow","score":0.699400007724762},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.6657999753952026},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.5461999773979187},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.53329998254776},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.5205000042915344},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5144000053405762},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4993000030517578},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4586000144481659}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160152231","title":"From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition","url":"https://doi.org/10.1109/iccv51701.2025.01733","published":"2025-10-19","authors":["Ling Lo","Kelvin C. K. Chan","Wen-Huang Cheng","Ming-Hsuan Yang"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.01733","openalex_id":"https://openalex.org/W7160152231","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","National Taiwan University","National Yang Ming Chiao Tung University","University of California, Merced"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5519999861717224},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.3625999987125397},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.359499990940094},{"id":"https://openalex.org/C194232998","display_name":"Transition (genetics)","score":0.35350000858306885},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3285999894142151},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3188000023365021},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3165000081062317},{"id":"https://openalex.org/C2779343474","display_name":"Context 
(archaeology)","score":0.2784000039100647}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160134766","title":"From Enhancement to Understanding: Build a Generalized Bridge for Low-Light Vision via Semantically Consistent Unsupervised Fine-Tuning","url":"https://doi.org/10.1109/iccv51701.2025.01281","published":"2025-10-19","authors":["Sen Wang","Shao Zeng","Tianjun Gu","Zhizhong Zhang","Ruixin Zhang","Shouhong Ding","Jingyun Zhang","Jun Wang","Xin Tan","Yuan Xie","Lizhuang Ma"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.01281","openalex_id":"https://openalex.org/W7160134766","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["East China Normal University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6019999980926514},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5271000266075134},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.45660001039505005},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3921000063419342},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3037000000476837},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.29829999804496765},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2928999960422516},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.273499995470047}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7160105111","title":"Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models","url":"https://doi.org/10.1109/iccv51701.2025.00972","published":"2025-10-19","authors":["Li X","Z P Huang","Yan Zhang","Yunhang Shen","Ke Li","Xiawu Zheng","Liujuan Cao","Rongrong Ji"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.00972","openalex_id":"https://openalex.org/W7160105111","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Institute of Technology","Ministry of Education of the People's Republic of China","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5842000246047974},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5670999884605408},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.47929999232292175},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4092999994754791},{"id":"https://openalex.org/C55020928","display_name":"Image quality","score":0.40059998631477356},{"id":"https://openalex.org/C3020001037","display_name":"Quality assessment","score":0.3626999855041504},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.35690000653266907},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.353300005197525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":0}},{"id":"openalex:W7160104589","title":"Enhancing Numerical Prediction of MLLMS With Soft Labeling","url":"https://doi.org/10.1109/iccv51701.2025.00327","published":"2025-10-19","authors":["Pei Wang","Zhaowei Cai","Hao Yang","Davide Modolo","Ashwin Swaminathan"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccv51701.2025.00327","openalex_id":"https://openalex.org/W7160104589","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5116999745368958},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39320001006126404},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3077999949455261},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.30169999599456787},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.274399995803833},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.2623000144958496},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.2599000036716461},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.23929999768733978}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416748992","title":"Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models","url":"https://doi.org/10.1109/iros60139.2025.11247678","published":"2025-10-19","authors":["Jiahao Wang","Zhenpei Yang","Yijing Bai","Yingwei Li","Yuliang Zou","Bo Sun","Abhijit 
Kundu","Jose Lezama","L. Huang","Zehao Zhu","Jyh-Jing Hwang","Dragomir Anguelov"],"abstract":"Recent advances in generative models have sparked exciting new possibilities in the field of autonomous vehicles. Specifically, video generation models are now being explored as controllable virtual testing environments. Simultaneously, end-to-end (E2E) driving models have emerged as a streamlined alternative to conventional modular autonomous driving systems, gaining popularity for their simplicity and scalability. However, the application of these techniques to simulation and planning raises important questions. First, while video generation models can generate increasingly realistic videos, can these videos faithfully adhere to the specified conditions and be realistic enough for E2E autonomous planner evaluation? Second, given that data is crucial for understanding and controlling E2E planners, how can we gain deeper insights into their biases and improve their ability to generalize....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iros60139.2025.11247678","openalex_id":"https://openalex.org/W4416748992","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Johns Hopkins University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7264999747276306},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5383999943733215},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.4805000126361847},{"id":"https://openalex.org/C48209547","display_name":"Controllability","score":0.4699000120162964},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4334999918937683},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4325000047683716},{"id":"https://openalex.org/C2776372474","display_name":"Simplicity","score":0.4027000069618225},{"id":"https://openalex.org/C59519942","display_name":"Drone","score":0.3799000084400177}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131114081","title":"Concat-ID: Towards Universal Identity-Preserving Video Synthesis","url":"https://doi.org/10.1109/iccvw69036.2025.00202","published":"2025-10-19","authors":["Yong Zhong","Zhuoyi Yang","Jiayan Teng","Xiaotao Gu","Chongxuan Li"],"abstract":"We present Concat-ID, a unified framework for identity-preserving video generation. Concat-ID employs variational autoencoders to extract image features, which are then concatenated with video latents along the sequence dimension. It relies exclusively on inherent 3D self-attention mechanisms to incorporate them, eliminating the need for additional parameters or modules. A novel cross-video pairing strategy and a multi-stage training regimen are introduced to balance identity consistency and facial editability while enhancing video naturalness. Extensive experiments demonstrate Concat-ID's superiority over existing methods in both single and multi-identity generation, as well as its seamless scalability to multi-subject scenarios, including virtual try-on and background-controllable generation.
Concat-ID establishes a new benchmark for identity-preserving video synthesis, providing a...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00202","openalex_id":"https://openalex.org/W7131114081","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Renmin University of China","Tsinghua University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7821999788284302},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7024000287055969},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5009999871253967},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4771000146865845},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.4465000033378601},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.42910000681877136},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.39890000224113464},{"id":"https://openalex.org/C2776449333","display_name":"View synthesis","score":0.38359999656677246}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7131097523","title":"Bridge Feature Matching and Cross-Modal Alignment with Mutual-Filtering for Zero-Shot Anomaly Detection","url":"https://doi.org/10.1109/iccvw69036.2025.00374","published":"2025-10-19","authors":["Yuhu Bai","Jiangning Zhang","Yunkang Cao","Guangyuan Lu","Qingdong He","Xiangtai Li","Guanzhong Tian"],"abstract":"With the advent of vision-language models (e.g., CLIP) in zero- and few-shot settings, CLIP has been 
widely applied to zero-shot anomaly detection (ZSAD) in recent research, where the rare classes are essential and expected in many applications. This study introduces FiSeCLIP for ZSAD with training-free CLIP, combining the feature matching with the cross-modal alignment. Testing with the entire dataset is impractical, while batch-based testing better aligns with real industrial needs, and images within a batch can serve as mutual reference points. Accordingly, FiSeCLIP utilizes other images in the same batch as reference information for the current image. However, the lack of labels for these references can introduce ambiguity, we apply text information to filter out noisy features. In addition, we further explore CLIP's inherent potential to restore its local semantic correlation, adapt...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvw69036.2025.00374","openalex_id":"https://openalex.org/W7131097523","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Hunan University","Peking University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.7713000178337097},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6930999755859375},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6640999913215637},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6326000094413757},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5910999774932861},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5654000043869019},{"id":"https://openalex.org/C12997251","display_name":"Anomaly 
(physics)","score":0.5630999803543091},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.5523999929428101}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415336385","title":"Addressing task conflicts in LLMs multi-task fine-tuning with task-specific subnetwork refinement","url":"https://doi.org/10.1007/s10994-025-06885-z","published":"2025-10-19","authors":["Yiqun Wang","Chaoqun Wan","Xiang Tian","Xuesong Liu","Yaowu Chen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10994-025-06885-z","openalex_id":"https://openalex.org/W4415336385","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Embedded Systems (United States)","Zhejiang Sci-Tech University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2780186347","display_name":"Subnetwork","score":0.9685999751091003},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7329000234603882},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7218000292778015},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.4415000081062317},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4397999942302704},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.4357999861240387},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.3601999878883362},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.27129998803138733}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-user-surveys-to-telemetry-driven-agents-exploring-the-potential-of-personalized-productivity-solutions","title":"From User Surveys to Telemetry-Driven Agents: Exploring the Potential of Personalized Productivity Solutions","url":"https://www.microsoft.com/en-us/research/publication/from-user-surveys-to-telemetry-driven-agents-exploring-the-potential-of-personalized-productivity-solutions/","published":"2025-10-18","authors":["Subigya Nepal","Javier Hernandez","Talie Massachi","Kael Rowan","Judith Amores","Jina Suh","Gonzalo Ramos","Brian Houck","Shamsi Iqbal","Mary Czerwinski"],"abstract":"We present a comprehensive, user-centric approach to understand preferences in AI-based productivity agents and develop personalized solutions tailored to users' needs. Utilizing a two-phase method, we first conducted a survey with 363 participants, exploring various aspects of productivity, communication style, agent approach, personality traits, personalization, and privacy. Drawing on the survey insights, we developed a GPT-4 powered personalized productivity agent that utilizes telemetry data gathered via Viva Insights from information workers to provide tailored assistance. We compared its performance with alternative productivity-assistive tools, such as dashboard and narrative, in a study involving 40 participants. Our findings highlight the importance of user-centric design, adaptability, and the balance between personalization and privacy in AI-assisted productivity tools. 
By bu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Human Computer Interaction","Productivity","1970-01-01","personalized","personalization","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vagen-reinforcing-world-model-reasoning-for-multi-turn-vlm-agents","title":"VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents","url":"https://www.microsoft.com/en-us/research/publication/vagen-reinforcing-world-model-reasoning-for-multi-turn-vlm-agents/","published":"2025-10-18","authors":["Kangrui Wang","Pingyue Zhang","Zihan Wang","Yaning Gao","Linjie Li","Qineng Wang","Hanyang Chen","Chi Wan","Yiping Lu","Zhengyuan Yang","Lijuan Wang","Ranjay Krishna"],"abstract":"A key challenge in training Vision-Language Model (VLM) agents, compared to Language Model (LLM) agents, lies in the shift from textual states to complex visual observations. This transition introduces partial observability and demands robust world modeling. We ask: Can VLM agents construct internal world models through explicit visual state reasoning? To address this question, we architecturally enforce and reward the agent's reasoning process via reinforcement learning (RL), formulating it as a Partially Observable Markov Decision Process (POMDP). 
We find that decomposing the agent's reasoning into State Estimation (\"what is the current state?\") and Transition Modeling (\"what comes next?\") is critical for success, as demonstrated through five reasoning strategies. Our investigation into how agents represent internal beliefs reveals that the optimal representation is task-dependent: Nat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:69dc448af4338f99","title":"Controlling Multimodal LLMs via Reward-guided Decoding","url":"https://ai.meta.com/research/publications/controlling-multimodal-llms-via-reward-guided-decoding/","published":"2025-10-18","authors":["Oscar Mañas","Pierluca D'Oro","Koustuv Sinha","Adriana Romero Soriano","Michal Drozdzal","Aishwarya Agrawal"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search 
https://ai.meta.com/results/?content_types%5B0%5D=publication&page=3"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cost-aware-retrieval-augmentation-reasoning-models-with-adaptive-retrieval-depth","title":"Cost-Aware Retrieval-Augmentation Reasoning Models with Adaptive Retrieval Depth","url":"https://www.microsoft.com/en-us/research/publication/cost-aware-retrieval-augmentation-reasoning-models-with-adaptive-retrieval-depth/","published":"2025-10-17","authors":["Helia Hashemi","Victor Ruehle","Saravan Rajmohan"],"abstract":"Reasoning models have gained significant attention due to their strong performance, particularly when enhanced with retrieval augmentation. However, these models often incur high computational costs, as both retrieval and reasoning tokens contribute substantially to the overall resource usage. In this work, we make the following contributions: (1) we propose a retrieval-augmented reasoning model that dynamically adjusts the length of the retrieved document list based on the query and retrieval results; (2) we develop a cost-aware advantage function for training of efficient retrieval-augmented reasoning models through reinforcement learning; and (3) we explore both memory- and latency-bound implementations of the proposed cost-aware framework for both proximal and group relative policy optimization algorithms. 
We evaluate our approach on seven public question answering datasets and demon...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","memory","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4415274689","title":"The Impact of Generative AI on the CSCW Landscape: Insights from HCI Education, Industry Dynamics, and Funding Perspectives","url":"https://doi.org/10.1145/3715070.3748275","published":"2025-10-17","authors":["Guo Freeman","Elizabeth D. 
Mynatt","Cliff Lampe","Heloísa Candello","Kori Inkpen","Nitesh Goyal"],"abstract":"","companies":["Google/DeepMind","Microsoft"],"matched_orgs":["Google/DeepMind","Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715070.3748275","openalex_id":"https://openalex.org/W4415274689","cited_by_count":2,"quality_score":51,"matched_keywords":[],"author_affiliations":["Clemson University","Google (United States)","IBM Research - Brazil","Microsoft (United States)","Northeastern University","University of Michigan"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5343999862670898},{"id":"https://openalex.org/C198439703","display_name":"Computer-supported cooperative work","score":0.5012999773025513},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.44679999351501465},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4246000051498413},{"id":"https://openalex.org/C55587333","display_name":"Engineering ethics","score":0.3637999892234802},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.3625999987125397},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.32829999923706055},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.31529998779296875}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4415297078","title":"Prompt design for medical question answering with Large Language Models","url":"https://doi.org/10.1016/j.mlwa.2025.100758","published":"2025-10-17","authors":["Leonid Kuligin","Jacqueline Lammert","Aleksandr Ostapenko","Keno K. 
Bressem","Martin Boeker","Maximilian Tschochohei"],"abstract":"The combination of prompting technique and the choice of a foundational model determines end-to-end workflow performance on a given task. We aim to provide comprehensive guidance for the best-performing prompting techniques for various LLMs for medical question-answering. We evaluated 15 large LLMs (incl. Claude 3.5 Sonnet, Gemini pro, Llama, Mistral, OpenAI GPT-4o and 4.1) and 6 smaller models (incl. Gemma, Mistral Nemo, Llama 3.1, Gemini flash) across five prompting techniques on neuro-oncology exam questions. Using the established MedQA dataset and a novel neuro-oncology question set, we compared basic prompting, chain-of-thought reasoning, and more complex agent-based methods incorporating external search capabilities. Results showed that the Reas...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.mlwa.2025.100758","openalex_id":"https://openalex.org/W4415297078","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","agent"],"author_affiliations":["Artificial Intelligence in Medicine (Canada)","Deutsches Herzzentrum München","Google (United States)","TUM Klinikum","Technical University of Munich"],"concepts":[{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.6863999962806702},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6743000149726868},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.633899986743927},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6037999987602234},{"id":"https://openalex.org/C154945302","display_name":"Artificial
intelligence","score":0.45899999141693115},{"id":"https://openalex.org/C113174947","display_name":"Tree (set theory)","score":0.40119999647140503},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.39079999923706055},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.35440000891685486}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2510.15522","title":"Latent Reasoning in LLMs as a Vocabulary-Space Superposition","url":"https://huggingface.co/papers/2510.15522","published":"2025-10-17","authors":["Jingcheng Deng","Liang Pang","Zihao Wei","Shichen Xu","Zenghao Duan","Kun Xu","Yang Song","Huawei Shen","Xueqi Cheng"],"abstract":"Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. Our preliminary experiments suggest that this degradation stems from the unstructured latent space, which makes fitting latent tokens difficult. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Based on this idea, we propose Latent-SFT, a two-stage learning framework. 
In the first stage, we design two specialized attention masks to guide the Latent To...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","compression"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"official:3c2aadbe75510e56","title":"PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model","url":"https://ernie.baidu.com/blog/posts/paddleocr-vl/","published":"2025-10-16","authors":["Baidu"],"abstract":"We are excited to release PaddleOCR-VL, a SOTA and resource-efficient model tailored for document parsing.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["ERNIE","Baidu","technical report","language model","efficient"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://ernie.baidu.com/blog/index.xml"}},{"id":"hf-org-paper:Qwen:2510.14276","title":"Qwen3Guard Technical Report","url":"https://huggingface.co/papers/2510.14276","published":"2025-10-16","authors":["Alibaba/Qwen"],"abstract":"As large language models (LLMs) become more capable and widely used, ensuring the safety of their outputs is increasingly critical. 
Existing guardrail models, though useful in static evaluation settings, face two major limitations in real-world applications: (1) they typically output only binary \"safe/unsafe\" labels, which can be interpreted inconsistently across diverse safety policies, rendering them incapable of accommodating varying safety tolerances across domains; and (2) they require complete model outputs before performing safety checks, making them fundamentally incompatible with streaming LLM inference, thereby preventing timely intervention during generation and increasing exposure to harmful partial outputs. To address these challenges, we present Qwen3Guard, a series of multilingual safety guardrail models with two specialized variants: Generative Qwen3Guard, which casts saf...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Qwen","LLM"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"hf-org-paper:stepfun-ai:2510.14975","title":"WithAnyone: Towards Controllable and ID Consistent Image Generation","url":"https://huggingface.co/papers/2510.14975","published":"2025-10-16","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org 
papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2510.14943","title":"LaSeR: Reinforcement Learning with Last-Token Self-Rewarding","url":"https://huggingface.co/papers/2510.14943","published":"2025-10-16","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"apple:u7scn5o3yteupzr0kgfr7jlm","title":"Training Software Engineering Agents and Verifiers with SWE-Gym","url":"https://machinelearning.apple.com/research/training-software","published":"2025-10-16","authors":["Jiayi Pan","Xingyao Wang","Graham Neubig","Navdeep Jaitly","Heng Ji","Alane Suhr","Yizhe Zhang"],"abstract":"We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language model based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets.
We also...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:l70zby544y09nmc0vzn6bwun","title":"CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals","url":"https://machinelearning.apple.com/research/cpep-contrastive","published":"2025-10-16","authors":["Wenhui Cui","Christopher Sandino","Hadi Pouransar","Ran Liu","Juri Minxha","Ellen Zippi","Aman Verma","Anna Sedlackova","Erdrin Azemi","Behrooz Mahasseni"],"abstract":"This paper was accepted at the Foundation Models for the Brain and Body Workshop at NeurIPS 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/holdout-loss-based-data-selection-for-llm-finetuning-via-in-context-learning","title":"Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning","url":"https://www.microsoft.com/en-us/research/publication/holdout-loss-based-data-selection-for-llm-finetuning-via-in-context-learning/","published":"2025-10-15","authors":["Ling Zhang","Xianliang Yang","Juwon Yu","Park Cheonyoung","Lei 
Song","Jiang Bian"],"abstract":"Fine-tuning large pretrained language models is a common approach for aligning them with human preferences, but noisy or off-target examples can dilute supervision. While small, well-chosen datasets often match the performance of much larger ones, systematic and efficient ways to identify high-value training data remain underexplored. Many current methods rely on heuristics or expensive retraining. We present a principled, resource-efficient framework for data selection and reweighting. At its core is an In-Context Approximation (ICA) that estimates the holdout loss a model would incur after training on a candidate example by conditioning on a small, curated holdout set in context. ICA requires no reference model and no additional finetuning. We define the resulting estimate as the ICA score, and derive per-example weights that dynamically reweight gradient updates as model parameters ev...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-coverage-principle-how-pre-training-enables-post-training","title":"The Coverage Principle: How Pre-Training Enables Post-Training","url":"https://www.microsoft.com/en-us/research/publication/the-coverage-principle-how-pre-training-enables-post-training/","published":"2025-10-15","authors":["Fan 
Chen","Audrey Huang","Noah Golowich","Sadhika Malladi","Adam Block","Jordan T. Ash","Akshay Krishnamurthy","Dylan J. Foster"],"abstract":"Language models demonstrate remarkable abilities when pre-trained on large text corpora and fine-tuned for specific tasks, but how and why pre-training shapes the success of the final model remains poorly understood. Notably, although pre-training success is often quantified by cross-entropy loss, cross-entropy can be a poor predictor of downstream performance. Instead, we provide a theoretical perspective on this relationship through the lens of coverage, which quantifies the probability mass the pre-trained model places on high-quality responses and which is necessary and sufficient for post-training and test-time scaling methods such as Best-of-N to succeed. Our main results develop an understanding of the coverage principle, a phenomenon whereby next-token prediction (more generally, maximum likelihood) implicitly optimizes toward a model with good coverage. 
In particul...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/metis-fast-quality-aware-rag-systems-with-configuration-adaptation","title":"METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation","url":"https://www.microsoft.com/en-us/research/publication/metis-fast-quality-aware-rag-systems-with-configuration-adaptation/","published":"2025-10-15","authors":["Siddhant Ray","Rui Pan","Zhuohan Gu","Kuntai Du","Shaoting Feng","Ganesh Ananthanarayanan","Ravi Netravali","Junchen Jiang"],"abstract":"RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge causes higher response delay. Prior work focuses either on reducing the response delay (e.g., better scheduling of RAG queries) or on maximizing quality (e.g., tuning the RAG workflow), but they fall short in systematically balancing the tradeoff between the delay and quality of RAG responses. To balance both quality and response delay, this paper presents METIS, the first RAG system that jointly schedules queries and adapts the key RAG configurations of each query, such as the number of retrieved text chunks and synthesis methods. 
Using four popular RAG-QA datasets, we show that compared to the state-of-the-art RAG optimization schemes, METIS reduces the generation latency by 1.64 − 2.54× without sacrificing generation quality.....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:hk4iw6aqsora70nwiiphvzvk","title":"Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration","url":"https://machinelearning.apple.com/research/hybrid-vector-graph","published":"2025-10-15","authors":["Mohanakrishnan Hariharan","Seshu Babu Barma","Satish Arvapalli","Evangeline Sheela Arulanandam"],"abstract":"We present an approach to software testing automation using Agentic Retrieval-Augmented Generation (RAG) systems for Quality Engineering (QE) artifact creation. We combine autonomous AI agents with hybrid vector-graph knowledge systems to automate test plan, case, and QE metric generation. 
Our approach addresses traditional software testing limitations by leveraging LLMs such as Gemini and Mistral, multi-agent orchestration, and enhanced...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["retrieval","agent","multi-agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"hf-org-paper:Qwen:2511.00010","title":"PlotCraft: Pushing the Limits of LLMs for Complex and Interactive Data Visualization","url":"https://huggingface.co/papers/2511.00010","published":"2025-10-15","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"arxiv:2503.19075","title":"The Case for \"Thick Evaluations\" of Cultural Representation in AI","url":"http://arxiv.org/abs/2503.19075","published":"2025-10-15","authors":["Rida Qadri","Mark Díaz","Ding Wang","Michael Madaio"],"abstract":"Generative AI image models have been increasingly evaluated for their (in)ability to represent non-Western cultures. 
We argue that these evaluations operate through reductive ideals of representation, abstracted from how people define their own representation and neglecting the inherently interpretive and contextual nature of cultural representation. In contrast to these \"thin\" evaluations, we introduce the idea of \"thick evaluations\": a more granular, situated, and discursive measurement framework for evaluating representations of social worlds in AI images, steeped in communities' own understandings of representation. We develop this evaluation framework through workshops in South Asia, by studying the \"thick\" ways in which people interpret and assign meaning to images of their own cultures. We introduce practices for thicker evaluations of representation that expand the understanding....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aies.v8i3.36696","openalex_id":"https://openalex.org/W4415195297","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6730999946594238},{"id":"https://openalex.org/C2780871342","display_name":"Underpinning","score":0.5806000232696533},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49950000643730164},{"id":"https://openalex.org/C2780876879","display_name":"Meaning (existential)","score":0.48750001192092896},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.4781999886035919},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47690001130104065},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.47209998965263367},{"id":"https://openalex.org/C2776502983","display_name":"Contrast 
(vision)","score":0.39410001039505005}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2510.13998","title":"BitNet Distillation","url":"https://huggingface.co/papers/2510.13998","published":"2025-10-15","authors":["Xun Wu","Shaohan Huang","Wenhui Wang","Ting Song","Li Dong","Yan Xia","Furu Wei"],"abstract":"In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost. Specifically, BitDistill incorporates three key techniques: the SubLN module, as introduced in BitNet; multi-head attention distillation, based on MiniLM; and continual pre-training, which serves as a crucial warm-up step to mitigate the scalability issue of the performance gap between finetuned full-precision and 1.58-bit LLMs on specific tasks. Experimental results show that BitDistill achieves performance comparable to the full-precision counterpart models across model size, while enabling up to 10x memory savings and 2.65x faster inference on CPUs. 
Code is available at https://...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["memory","distillation"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evotest-evolutionary-test-time-learning-for-self-improving-agentic-systems","title":"EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems","url":"https://www.microsoft.com/en-us/research/publication/evotest-evolutionary-test-time-learning-for-self-improving-agentic-systems/","published":"2025-10-14","authors":["Yufei He","Juncheng Liu","Yue Liu","Yibo Li","Tri Cao","Zhiyuan Hu","Xinxing Xu","Bryan Hooi"],"abstract":"A fundamental limitation of current AI agents is their inability to learn complex skills on the fly at test time, often behaving like \"clever but clueless interns\" in novel environments. This severely limits their practical utility. To systematically measure and drive progress on this challenge, we first introduce the Jericho Test-Time Learning (J-TTL) benchmark. J-TTL is a new evaluation setup where an agent must play the same game for several consecutive episodes, attempting to improve its performance from one episode to the next. On J-TTL, we find that existing adaptation methods like reflection, memory, or reinforcement learning struggle. To address the challenges posed by our benchmark, we present EvoTest, an evolutionary test-time learning framework that improves an agent without any fine-tuning or gradients, by evolving the entire agentic system after every episode. 
EvoTest has two r...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","AI agents","Computer science","1970-01-01","memory","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/iovalve-leakage-free-i-o-sandbox-for-large-scale-untrusted-data-processing","title":"IOValve: Leakage-Free I/O Sandbox for Large-Scale Untrusted Data Processing","url":"https://www.microsoft.com/en-us/research/publication/iovalve-leakage-free-i-o-sandbox-for-large-scale-untrusted-data-processing/","published":"2025-10-14","authors":["Sangho Lee","Jules Drean","Yue Tan","Marcus Peinado"],"abstract":"The widespread adoption of Large Language Models (LLMs) is driving the rapidly growing demand for large-scale computations like training and fine-tuning models. In many areas, the confidentiality of the underlying data is of critical importance to their corporate or government owners. However, securing data in large-scale computations is challenging. First, its demand for enormous hardware resources typically requires outsourcing (e.g., to the public cloud). Second, the large and rapidly evolving software stack used in LLM training in conjunction with a growing incidence of supply chain attacks and software vulnerabilities makes it all but impossible for data owners to establish trust in the code that processes their highly sensitive data. 
Confidential computing and sandboxing are promising techniques for solving these problems. However, existing sandboxes do not address covert channels....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Security, privacy, and cryptography","Systems and networking","Computer security","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4415180151","title":"Tag-Enriched Multi-Attention With Large Language Models for Cross-Domain Sequential Recommendation","url":"https://doi.org/10.1109/tce.2025.3620527","published":"2025-10-14","authors":["Wangyu Wu","Xuhang Chen","Z H Chen","Jingen Jiang","Kim Fung Tsang","Xiaowei Huang","Fei Ma","Jimin Xiao"],"abstract":"Cross-Domain Sequential Recommendation (CDSR) plays a crucial role in modern consumer electronics and e-commerce platforms, where users interact with diverse services such as books, movies, and online retail products. These systems must accurately capture both domain-specific and cross-domain behavioral patterns to provide personalized and seamless consumer experiences. To address this challenge, we propose TEMA-LLM (Tag-Enriched Multi-Attention with Large Language Models), a practical and effective framework that integrates Large Language Models (LLMs) for semantic tag generation and enrichment. 
Specifically, TEMA-LLM employs LLMs to assign domain-aware prompts and generate descriptive....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tce.2025.3620527","openalex_id":"https://openalex.org/W4415180151","cited_by_count":6,"quality_score":51,"matched_keywords":["LLM","personalized"],"author_affiliations":["Huizhou University","Microsoft (United States)","Shandong University","Shenzhen Institutes of Advanced Technology","University of Liverpool","Xi’an Jiaotong-Liverpool University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8162000179290771},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.6212999820709229},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5501000285148621},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.49959999322891235},{"id":"https://openalex.org/C154504017","display_name":"Identifier","score":0.4896000027656555},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.454800009727478},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4117000102996826},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4032999873161316}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W7125027550","title":"Federated Adaptation of Language Models for On-Device Speech Recognition using Confidence-Aware Training","url":"https://doi.org/10.1109/flta67013.2025.11336736","published":"2025-10-14","authors":["Zhe Liu"],"abstract":"On-device adaptation of automatic speech recognition (ASR) models is 
essential to bridge the gap between server-side proxy training data and the real-world data encountered on users' local devices. In ASR systems, language models (LMs) play a critical role in improving recognition accuracy by modeling the linguistic context and can be adapted to users' speaking styles or domains. Leveraging federated learning (FL), we introduce an efficient approach for continuously adapting on-device LMs with applications on ASR. To mitigate the impact of transcription errors on the training corpus in users' local devices, we conduct empirical studies comparing various strategies that incorporate token-level confidence scores to improve LM quality in FL settings. Experimental results show that our method yields relative word error rate (WER) reductions of 2.6% and 10.8% on two benchmark speech datasets,...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/flta67013.2025.11336736","openalex_id":"https://openalex.org/W7125027550","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8357999920845032},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.692300021648407},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5782999992370605},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5103999972343445},{"id":"https://openalex.org/C40969351","display_name":"Word error rate","score":0.510200023651123},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4918999969959259},{"id":"https://openalex.org/C139807058","display_name":"Adaptation 
(eye)","score":0.4876999855041504},{"id":"https://openalex.org/C2780148112","display_name":"Proxy (statistics)","score":0.4700999855995178}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adaptcache-kv-cache-native-storage-hierarchy-for-low-delay-and-high-quality-language-model-serving","title":"AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving","url":"https://www.microsoft.com/en-us/research/publication/adaptcache-kv-cache-native-storage-hierarchy-for-low-delay-and-high-quality-language-model-serving/","published":"2025-10-13","authors":["Shaoting Feng","Hanchen Li","Kuntai Du","Zhuohan Gu","Yuhan Liu","Jiayi Yao","Siddhant Ray","Samuel Shen","Yihua Cheng","Ganesh Ananthanarayanan","Junchen Jiang"],"abstract":"Large language model (LLM) applications often reuse previously processed context, such as chat history and documents, which introduces significant redundant computation. Existing LLM serving systems address such redundant computation by storing the KV caches of processed context and loading the corresponding KV cache when a new request reuses the context. Further, as these LLM applications scale, the total size of KV caches becomes excessively large and requires both DRAM and SSD for full storage. However, prior work that stores KV caches in DRAM and SSD suffers from high loading delays, as most KV cache hits come from SSD, which is slow to load. To increase the KV cache hit rate on DRAM, we identify lossy KV cache compression as a promising approach. 
We design a lossy compression system that decides the compression algorithm, compression rate and device placement for each KV cache entr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01","LLM","language model","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:zai-org:2510.11683","title":"Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models","url":"https://huggingface.co/papers/2510.11683","published":"2025-10-13","authors":["Z.ai/Zhipu"],"abstract":"","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","zai-org","memory","efficient"],"author_affiliations":["Z.ai/Zhipu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/zai-org/papers"}},{"id":"hf-org-paper:tencent:2510.11498","title":"ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web 
Coding","url":"https://huggingface.co/papers/2510.11498","published":"2025-10-13","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/docreward-a-document-reward-model-for-structuring-and-stylizing","title":"DocReward: A Document Reward Model for Structuring and Stylizing","url":"https://www.microsoft.com/en-us/research/publication/docreward-a-document-reward-model-for-structuring-and-stylizing/","published":"2025-10-13","authors":["Junpeng Liu","Yuzhong Zhao","Bowen Cao","Jiayu Ding","Yilin Jia","Tengchao Lv","Yupan Huang","Shaohan Huang","Nan Yang","Li Dong","Lei Cui","Tao Ge"],"abstract":"Recent advances in agentic workflows have enabled the automation of tasks such as professional document generation. However, they primarily focus on textual quality, neglecting visual structure and style, which are crucial for readability and engagement. This gap arises mainly from the absence of suitable reward models to guide agentic workflows toward producing documents with stronger structural and stylistic quality. To address this, we propose DocReward, a document reward model that evaluates documents based on their structure and style. 
We construct a multi-domain dataset DocPair of 117K paired documents, covering 32 domains and 267 document types, each including a high- and low-professionalism document with identical content but different structure and style. This enables the model to evaluate professionalism comprehensively, and in a textual-quality-agnostic way. DocReward is train...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:069962282cbf43c9","title":"SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models","url":"https://ai.meta.com/research/publications/spg-sandwiched-policy-gradient-for-masked-diffusion-language-models/","published":"2025-10-13","authors":["Chenyu Wang","Paria Rashidinejad","DiJia Su","Song Jiang","Sid Wang","Siyan Zhao","Cai Zhou","Shannon Zejiang Shen","Feiyu Chen","Tommi Jaakkola","Yuandong Tian","Bo Liu"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Reinforcement Learning"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search 
https://ai.meta.com/results/?content_types%5B0%5D=publication&page=3"}},{"id":"apple:ig64ox7hgkr848drbvsod7y3","title":"EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts","url":"https://machinelearning.apple.com/research/encqa-benchmarking","published":"2025-10-13","authors":["Kushin Mukherjee","Donghao Ren","Dominik Moritz","Yannick Assogba"],"abstract":"Multimodal vision-language models (VLMs) continue to achieve ever-improving scores on chart understanding benchmarks. Yet, we find that this progress does not fully capture the breadth of visual reasoning capabilities essential for interpreting charts. We introduce EncQA, a novel benchmark informed by the visualization literature, designed to provide systematic coverage of visual encodings and analytic tasks that are crucial for chart...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/tvcg.2025.3634249","openalex_id":"https://openalex.org/W4416429374","cited_by_count":1,"quality_score":53,"matched_keywords":[],"author_affiliations":["Apple","Apple (United States)","Stanford University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:qus6k4xb517pwbkdwgesmgv4","title":"FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models","url":"https://machinelearning.apple.com/research/fs-dfm","published":"2025-10-13","authors":["Amin Karimi Monsefi","Nikhil Bhendawade","Manuel R. Ciosici","Dominic Culver","Yizhe Zhang","Irina Belousova"],"abstract":"Autoregressive language models (ARMs) deliver strong likelihoods, but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. 
Diffusion Language Models (DLMs) parallelize across positions and thus appear promising for language generation, yet standard discrete diffusion typically needs hundreds to thousands of model evaluations to reach high quality, trading serial depth...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2510.11317","title":"Next Interest Flow: A Generative Pre-training Paradigm for Recommender Systems by Modeling All-domain Movelines","url":"https://arxiv.org/abs/2510.11317","published":"2025-10-13","authors":["Chen Gao","Zixin Zhao","Lv Shao","Tong Liu"],"abstract":"Click-Through Rate (CTR) prediction has long been dominated by discriminative paradigms that optimize local decision boundaries within candidate-specific subspaces. However, these models often fail to capture the global joint distribution and the continuous structural evolution of user intent across all-domain movelines. While generative approaches attempt to model global transition patterns, existing methods suffer from discretization-induced information collapse by remapping nuanced e-commerce signals into discrete linguistic or categorical spaces, failing to preserve the topological fidelity of interest trajectories. To overcome these limitations, we propose a novel generative pre-training paradigm that models user intent as a continuous evolutionary trajectory on a high-dimensional latent interest manifold, termed the Next Interest Flow (NIF). 
We introduce kinematic constraints to go...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3805712.3808482","openalex_id":"https://openalex.org/W7154611076","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.756600022315979},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7358999848365784},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6072999835014343},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5268999934196472},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.5238000154495239},{"id":"https://openalex.org/C5274069","display_name":"Categorical variable","score":0.5220999717712402},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5198000073432922},{"id":"https://openalex.org/C184898388","display_name":"Pairwise comparison","score":0.4296000003814697}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2305.14019","title":"ChipGPT: How far are we from natural language hardware design","url":"https://huggingface.co/papers/2305.14019","published":"2025-10-13","authors":["Kaiyan Chang","Ying Wang","Haimeng Ren","Mengdi Wang","Shengwen Liang","Yinhe Han","Huawei Li","Xiaowei Li"],"abstract":"As large language models (LLMs) like ChatGPT exhibited unprecedented machine intelligence, it also shows great performance in assisting hardware engineers to realize higher-efficiency logic design via natural language interaction. 
To estimate the potential of the hardware design process assisted by LLMs, this work attempts to demonstrate an automated design environment that explores LLMs to generate hardware logic designs from natural language specifications. To realize a more accessible and efficient chip development flow, we present a scalable four-stage zero-code logic design framework based on LLMs without retraining or finetuning. At first, the demo, ChipGPT, begins by generating prompts for the LLM, which then produces initial Verilog programs. Second, an output manager corrects and optimizes these programs before collecting them into the final design space. Eventually, ChipGPT wil...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2510.11020","title":"GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation","url":"https://huggingface.co/papers/2510.11020","published":"2025-10-13","authors":["Shasha Guo","Liang Pang","Xi Wang","Yanling Wang","Huawei Shen","Jing Zhang"],"abstract":"Auxiliary lines are essential for solving complex geometric problems but remain challenging for large vision-language models (LVLMs). Recent attempts construct auxiliary lines via code-driven rendering, a strategy that relies on accurate and executable code generation to produce visual renderings of the auxiliary lines for subsequent reasoning. However, in complex solid geometry settings, such a strong dependence on precise specifications substantially restricts the robustness of this strategy. 
Alternatively, we turn to a simpler and more stable solution, representing auxiliary-line constructions as structured textual descriptions. To bridge the gap between textual descriptions and spatial structure, we propose a reinforcement learning framework that enhances diagram-text alignment. The core is a cross-modal reward model that evaluates how well the generated auxiliary-line description ma...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autoverus-automated-proof-generation-for-rust-code","title":"AutoVerus: Automated Proof Generation for Rust Code","url":"https://www.microsoft.com/en-us/research/publication/autoverus-automated-proof-generation-for-rust-code/","published":"2025-10-12","authors":["Chenyuan Yang","Xuheng Li","Md Rakib Hossain Misu","Jianan Yao","Weidong Cui","Yeyun Gong","Chris Hawblitzel","Shuvendu Lahiri","Jay Lorch","Shuai Lu","Fan Yang","Ziqiao Zhou"],"abstract":"Generative AI has shown its values for many software engineering tasks. Still in its infancy, large language model (LLM)-based proof generation lags behind LLM-based code generation. In this paper, we present AutoVerus. AutoVerus uses LLM to automatically generate correctness proof for Rust code. AutoVerus is designed to match the unique features of Verus, a verification tool that can prove the correctness of Rust code using proofs and specifications also written in Rust.
AutoVerus consists of a network of LLM agents that are crafted and orchestrated to mimic human experts' three phases of proof construction: preliminary proof generation, proof refinement guided by generic tips, and proof debugging guided by verification errors. To thoroughly evaluate AutoVerus and help foster future rese...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Systems and networking","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/representation-based-exploration-for-language-models-from-test-time-to-post-training","title":"Representation-Based Exploration for Language Models: From Test-Time to Post-Training","url":"https://www.microsoft.com/en-us/research/publication/representation-based-exploration-for-language-models-from-test-time-to-post-training/","published":"2025-10-12","authors":["Jens Tuyls","Dylan Foster","Akshay Krishnamurthy","Jordan Ash"],"abstract":"Reinforcement learning (RL) promises to expand the capabilities of language models, but it is unclear if current RL techniques promote the discovery of novel behaviors, or simply sharpen those already present in the base model. 
In this paper, we investigate the value of deliberate exploration -- explicitly incentivizing the model to discover novel and diverse behaviors -- and aim to understand how the knowledge in pre-trained models can guide this search. Our main finding is that exploration with a simple, principled, representation-based bonus derived from the pre-trained language model's hidden states significantly improves diversity and pass@k rates -- both for post-training, and in a novel inference-time scaling setting we introduce. For inference-time, exploration with representation-based diversity improves efficiency, consistently improving pass@k rates across a variety of models....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tracing-the-traces-latent-temporal-signals-for-efficient-and-accurate-reasoning","title":"Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning","url":"https://www.microsoft.com/en-us/research/publication/tracing-the-traces-latent-temporal-signals-for-efficient-and-accurate-reasoning/","published":"2025-10-11","authors":["Martina G. 
Vilas","Safoora Yousefi","Besmira Nushi","Eric Horvitz","Vidhisha Balachandran"],"abstract":"Reasoning models improve their problem-solving ability through inference-time scaling, allocating more compute via longer token budgets. Identifying which reasoning traces are likely to succeed remains a key opportunity: reliably predicting productive paths can substantially reduce wasted computation and improve overall efficiency. We introduce Latent-Trajectory signals that characterize the temporal evolution of a model's internal representations during the generation of intermediate reasoning tokens. By measuring the overall change in latent representations between the start and end of reasoning, the change accumulated across intermediate steps, and the extent to which these changes advance toward the final state, we show that these signals predict solution accuracy more reliably than both cross-layer metrics and output-based confidence measures. When used to guide answer selection acr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4415082886","title":"From image to report: automating lung cancer screening interpretation and reporting with vision-language models","url":"https://doi.org/10.1016/j.jbi.2025.104931","published":"2025-10-11","authors":["Tien-Yu Chang","Qinglin Gou","Leyi Zhao","Tiancheng Zhou","Hongyu Chen","Dong Yang","Huiwen Ju","Kaleb E. 
Smith","Chengkun Sun","Jinqian Pan","Yu Huang","Xing He"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.jbi.2025.104931","openalex_id":"https://openalex.org/W4415082886","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Florida College","Florida Museum of Natural History","Florida State University","Indiana University Bloomington","Indiana University – Purdue University Indianapolis","Nvidia (United States)","Regenstrief Institute"],"concepts":[{"id":"https://openalex.org/C2777405583","display_name":"Lung cancer screening","score":0.84579998254776},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.6262999773025513},{"id":"https://openalex.org/C2776256026","display_name":"Lung cancer","score":0.5803999900817871},{"id":"https://openalex.org/C19527891","display_name":"Medical physics","score":0.5302000045776367},{"id":"https://openalex.org/C126838900","display_name":"Radiology","score":0.5266000032424927},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.506600022315979},{"id":"https://openalex.org/C2776463041","display_name":"Cancer screening","score":0.4788999855518341},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.4253000020980835}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4415077774","title":"E-commerce Sentiment Analysis Using Fine-tuned LLaMA3 Models: A QLoRA - based Approach","url":"https://doi.org/10.63887/jtie.2025.1.4.13","published":"2025-10-11","authors":["Tianran Li","Hanwu Li","Yutong Zhou"],"abstract":"With the rapid expansion of e-commerce platforms and the surge of user-generated content, accurate sentiment analysis of 
consumer reviews has become essential for business intelligence and customer relationship management. Traditional methods struggle with the linguistic complexity and diversity of online reviews. To address this challenge, this study proposes a fine-tuned LLaMA3 model using the QLoRA (Quantized Low-Rank Adaptation) method for e-commerce sentiment analysis. Experiments were conducted on a dataset of 4,846 Amazon reviews annotated with three sentiment categories: positive, negative, and neutral. Results reveal substantial improvements through domain-specific fine-tuning. While the pre-trained LLaMA3 baseline achieved only 0.37 overall accuracy with a strong bias toward neutral classification, the fine-tuned model reached 0.86 accuracy with balanced performance across all....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63887/jtie.2025.1.4.13","openalex_id":"https://openalex.org/W4415077774","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Beijing City University","Carnegie Mellon University"],"concepts":[{"id":"https://openalex.org/C66402592","display_name":"Sentiment analysis","score":0.7682999968528748},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7527999877929688},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5823000073432922},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.5529000163078308},{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.5127999782562256},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.48579999804496765},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.4742000102996826},{"id":"https://openalex.org/C2781316041","display_name":"Diversity (politics)","score":0.4433000087738037}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2508.10916","title":"Multimodal Quantitative Measures for Multiparty Behavior Evaluation","url":"http://arxiv.org/abs/2508.10916","published":"2025-10-11","authors":["Ojas Shirekar","Wim Pouw","Chenxu Hao","Vrushank Phadnis","Thabo Beeler","Chirag Raman"],"abstract":"Digital humans are emerging as autonomous agents in multiparty interactions, yet existing evaluation metrics largely ignore contextual coordination dynamics. We introduce a unified, intervention-driven framework for objective assessment of multiparty social behaviour in skeletal motion data, spanning three complementary dimensions: (1) synchrony via Cross-Recurrence Quantification Analysis, (2) temporal alignment via Multiscale Empirical Mode Decompositionbased Beat Consistency, and (3) structural similarity via Soft Dynamic Time Warping. We validate metric sensitivity through three theory-driven perturbations -- gesture kinematic dampening, uniform speech-gesture delays, and prosodic pitch-variance reduction-applied to $\\approx 145$ 30-second thin slices of group interactions from the DnD dataset. 
Mixed-effects analyses reveal predictable, joint-independent shifts: dampening increases C...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3716553.3750752","openalex_id":"https://openalex.org/W4415065157","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Delft University of Technology","Google (Switzerland)","Google (United States)","Netherlands Standardization Institute","Tilburg University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7355999946594238},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5169000029563904},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.48330000042915344},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.47609999775886536},{"id":"https://openalex.org/C39920418","display_name":"Kinematics","score":0.45649999380111694},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4438999891281128},{"id":"https://openalex.org/C48007421","display_name":"Motion capture","score":0.42969998717308044},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.42480000853538513}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/semantic-visual-anomaly-detection-and-reasoning-in-ai-generated-images","title":"Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images","url":"https://www.microsoft.com/en-us/research/publication/semantic-visual-anomaly-detection-and-reasoning-in-ai-generated-images/","published":"2025-10-10","authors":["Chuangchuang 
Tan","Xiang Ming","Jinglu Wang","Renshuai Tao","Bin Li","Yunchao Wei","Yao Zhao","Yan Lu"],"abstract":"The rapid advancement of AI-generated content (AIGC) has enabled the synthesis of visually convincing images; however, many such outputs exhibit subtle semantic anomalies, including unrealistic object configurations, violations of physical laws, or commonsense inconsistencies, which compromise the overall plausibility of the generated scenes. Detecting these semantic-level anomalies is essential for assessing the trustworthiness of AIGC media, especially in AIGC image analysis, explainable deepfake detection and semantic authenticity assessment. In this paper, we formalize semantic anomaly detection and reasoning for AIGC images and introduce AnomReason, a large-scale benchmark with structured annotations as quadruples (Name, Phenomenon, Reasoning, Severity). Annotations are produced by a modu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","1970-01-01","media","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sample-efficient-online-learning-in-lm-agents-via-hindsight-trajectory-rewriting","title":"Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory
Rewriting","url":"https://www.microsoft.com/en-us/research/publication/sample-efficient-online-learning-in-lm-agents-via-hindsight-trajectory-rewriting/","published":"2025-10-10","authors":["Michael Y. Hu","Ben Van Durme","Jacob Andreas","Harsh Jhamtani"],"abstract":"Language model (LM) agents deployed in novel environments often exhibit poor sample efficiency when learning from sequential interactions. This significantly hinders the usefulness of such agents in environments where interaction is costly (for example, when they interact with humans or reset physical systems). While a number of existing LM agent architectures incorporate various mechanisms for experience storage and reflection, they make limited use of LMs'abilities to directly generate or reason about full counterfactual trajectories. We introduce ECHO (Experience Consolidation via Hindsight Optimization), a prompting framework that adapts hindsight experience replay from reinforcement learning for language model agents. ECHO generates optimized trajectories for alternative goals that could have been achieved during failed attempts, effectively creating synthetic positive examples from...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","language model","memory","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/chain-of-retrieval-augmented-generation","title":"Chain-of-Retrieval Augmented 
Generation","url":"https://www.microsoft.com/en-us/research/publication/chain-of-retrieval-augmented-generation/","published":"2025-10-10","authors":["Liang Wang","Haonan Chen","Nan Yang","Xiaolong Huang","Zhicheng Dou","Furu Wei"],"abstract":"This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually perform a single retrieval step before the generation process, which limits their effectiveness in addressing complex queries due to imperfect retrieval results. In contrast, our proposed method, CoRAG (Chain-of-Retrieval Augmented Generation), allows the model to dynamically reformulate the query based on the evolving state. To train CoRAG effectively, we utilize rejection sampling to automatically generate intermediate retrieval chains, thereby augmenting existing RAG datasets that only provide the correct final answer. At test time, we propose various decoding strategies to scale the model's test-time compute by controlling the length and number of sampled retrieval chains. 
Experimental re...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Search and information retrieval","Computer science","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2510.09913","title":"Don't Throw Away Your Pretrained Model","url":"https://huggingface.co/papers/2510.09913","published":"2025-10-10","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W7106805176","title":"KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation","url":"https://doi.org/10.48448/z7gd-g387","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Jinpeng","Chesi, Graziano","Guan, Ziyi","Hou, Zhijian","Li, Jason Chun Lok","Ma, Wenao","Nguyen, Thanh-Toan","Qin, Shengchao","Wong, Ngai","Wu, Mengyang","Xian, Pengfei"],"abstract":"Despite recent progress, Graphic User Interface (GUI) agents 
powered by Large Language Models (LLMs) struggle with complex mobile tasks due to limited app-specific knowledge. While UI Transition Graphs (UTGs) offer structured navigation representations, they are underutilized due to poor extraction and inefficient integration. We introduce KG-RAG, a Knowledge Graph-driven Retrieval-Augmented Generation framework that transforms fragmented UTGs into structured vector databases for efficient real-time retrieval. By leveraging an intent-guided LLM search method, KG-RAG generates actionable navigation paths, enhancing agent decision-making. Experiments across diverse mobile apps show that KG-RAG outperforms existing methods, achieving a 75.8% success rate (8.9% improvement over AutoDroid), 84.6% decision accuracy (8.1% improvement), and reducing average task steps from 4.5 to 4.1. Additional...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/z7gd-g387","openalex_id":"https://openalex.org/W7106805176","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","retrieval","efficient","agent"],"author_affiliations":["Chinese University of Hong Kong","City University of Hong Kong","Huawei Technologies (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7850000262260437},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6129999756813049},{"id":"https://openalex.org/C113843644","display_name":"Interface (matter)","score":0.4530999958515167},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4456999897956848},{"id":"https://openalex.org/C89505385","display_name":"User interface","score":0.44179999828338623},{"id":"https://openalex.org/C186967261","display_name":"Mobile 
device","score":0.42719998955726624},{"id":"https://openalex.org/C37789001","display_name":"Graphical user interface","score":0.3750999867916107},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.3395000100135803}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106823112","title":"Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards","url":"https://doi.org/10.48448/whbw-5y07","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Lu, Bo","Shen, Dongdong","Wei, Xiaolong","Xia, Long","Yin, Dawei","Zhang, Xingyu","Zhao, Zhejun"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable creative writing capabilities, yet their substantial computational demands hinder widespread use. Enhancing Small Language Models (SLMs) offers a promising alternative, but current methods like Supervised Fine-Tuning (SFT) struggle with novelty, and Reinforcement Learning from Human Feedback (RLHF) is costly. This paper explores two distinct AI-driven reward strategies within a Reinforcement Learning from AI Feedback (RLAIF) framework to ignite the creative writing of a 7B-parameter SLM, specifically for generating Chinese greetings. The first strategy employs a Reward Model (RM) trained on high-quality preference data curated by a novel multi-agent rejection sampling framework designed for creative tasks. 
The second, more novel, strategy utilizes a principle-guided LLM-as-a-Judge, whose reward function is optimized via an adversa...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/whbw-5y07","openalex_id":"https://openalex.org/W7106823112","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","preference","agent","multi-agent"],"author_affiliations":["Baidu (China)","Beihang University"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6776000261306763},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6365000009536743},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.57669997215271},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5012999773025513},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.49390000104904175},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.484499990940094},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.482699990272522},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.47760000824928284}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106830674","title":"TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning","url":"https://doi.org/10.48448/7j32-p549","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","liu, hao","Liu, Fan","Ma, Xinyu","Ni, Hang","Su, Lixin","Wang, Shuaiqiang","xiong, hui","Yin, Dawei"],"abstract":"Large language models (LLMs) have shown promise in 
automating travel planning, yet they often fall short in addressing nuanced spatiotemporal rationality. While existing benchmarks focus on basic plan validity, they neglect critical aspects such as route efficiency, POI appeal, and real-time adaptability. This paper introduces TP-RAG, the first benchmark tailored for retrieval-augmented, spatiotemporal-aware travel planning. Our dataset includes 2,348 real-world travel queries, 85,575 fine-grain annotated POIs, and 18,784 high-quality travel trajectory references sourced from online tourist documents, enabling dynamic and context-aware planning. Through extensive experiments, we reveal that integrating reference trajectories significantly improves spatial efficiency and POI rationality of the travel plan, while challenges persist in universality and robustness due to conflicting referenc...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/7j32-p549","openalex_id":"https://openalex.org/W7106830674","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7390000224113464},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.5734999775886536},{"id":"https://openalex.org/C18918823","display_name":"Tourism","score":0.5454000234603882},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.48179998993873596},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.40790000557899475},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3885999917984009},{"id":"https://openalex.org/C185798385","display_name":"Benchmark 
(surveying)","score":0.3801000118255615},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.37869998812675476}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106807415","title":"Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs","url":"https://doi.org/10.48448/bbf0-bc72","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Gorantla, Sruthi","Hazarika, Devamanyu","Hong, Mingyi","Lin, Kaixiang","Namazifar, Mahdi","Rawal, Aditya"],"abstract":"We introduce a zero-shot merging framework for large language models (LLMs) that consolidates specialized domain experts into a single model without any further training. Our core contribution lies in leveraging relative task vectors—difference representations encoding each expert’s unique traits with respect to a shared base model—to guide a principled and efficient merging process. By dissecting parameters into common dimensions (averaged across experts) and complementary dimensions (unique to each expert), we strike an optimal balance between generalization and specialization. We further devise a compression mechanism for the complementary parameters, retaining only principal components and scalar multipliers per expert, thereby minimizing overhead. A dynamic router then selects the most relevant domain at inference, ensuring that domain-specific precision is preserved. 
Experiments on...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/bbf0-bc72","openalex_id":"https://openalex.org/W7106807415","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","efficient","compression"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8011000156402588},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7394000291824341},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5741000175476074},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5342000126838684},{"id":"https://openalex.org/C144559511","display_name":"Principal (computer security)","score":0.5291000008583069},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5078999996185303},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.4708999991416931},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.423799991607666}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106810224","title":"Recall with Reasoning: Chain-of-Thought Distillation for Mamba’s Long-Context Memory and Extrapolation","url":"https://doi.org/10.48448/cq89-xq48","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Fang, T","Ma, Jun-Yu","Mi, Haitao","Yu, Dong","Zhang, Hongming","Zhang, Zhisong"],"abstract":"Mamba's theoretical infinite-context potential is limited in practice when sequences far exceed training lengths. 
This work explores unlocking Mamba's long-context memory ability by a simple-yet-effective method, Recall with Reasoning (RwR), by distilling chain-of-thought (CoT) summarization from a teacher model. Specifically, RwR prepends these summarization as CoT prompts during fine-tuning, teaching Mamba to actively recall and reason over long contexts. Experiments on LONGMEMEVAL and HELMET show that RwR outperforms existing long-term memory methods on the Mamba model. Furthermore, under similar pre-training conditions, RwR improves the long-context performance of Mamba relative to comparable Transformer/hybrid baselines while preserving short-context capabilities, all without changing the architecture.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/cq89-xq48","openalex_id":"https://openalex.org/W7106810224","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","long-term","distillation"],"author_affiliations":["Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.8896999955177307},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.8411999940872192},{"id":"https://openalex.org/C132459708","display_name":"Extrapolation","score":0.6121000051498413},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5654000043869019},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4503999948501587},{"id":"https://openalex.org/C81669768","display_name":"Precision and recall","score":0.4399000108242035},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.42809998989105225},{"id":"https://openalex.org/C18762648","display_name":"Work 
(physics)","score":0.39250001311302185}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106835222","title":"Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings","url":"https://doi.org/10.48448/cxkb-0y63","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Xi","Chen, Shi-Zhe","Li, Shiyu","Liu, Ruijie","Tang, Yang"],"abstract":"Large language models (LLMs) have recently demonstrated excellent performance in text embedding tasks. Previous work usually use LoRA to fine-tune existing LLMs, which are limited by the data and training gap between LLMs and embedding models. In this work, we introduce Conan-embedding-v2, an LLM-based text embedder trained from scratch. First, we add news data and multilingual pairs for LLM training to bridge the data gap. Based on this, we propose a cross-lingual retrieval dataset that enables the LLM to better integrate embeddings across different languages. Second, LLMs use causal mask with token-level loss while embedding models use bidirectional mask with sentence-level loss, this training gap makes full fine-tuning less effective than LoRA. 
We introduce soft-mask mechanism to gradually transition between these two types of masks, making model to learn more comprehensive representa...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/cxkb-0y63","openalex_id":"https://openalex.org/W7106835222","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","news"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.835099995136261},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7542999982833862},{"id":"https://openalex.org/C2781235140","display_name":"Scratch","score":0.6873999834060669},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6675999760627747},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5947999954223633},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.5852000117301941},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5196999907493591},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.48260000348091125}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106855467","title":"CoRanking: Collaborative Ranking with Small and Large Ranking Agents","url":"https://doi.org/10.48448/8rsb-a165","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Dou, Zhicheng","Liu, Wenhan","Ma, Xinyu","Su, Lixin","Wang, Shuaiqiang","Yin, Dawei","Zhu, Yutao"],"abstract":"Listwise ranking based on Large Language Models (LLMs) has achieved state-of-the-art performance in 
Information Retrieval (IR). However, their effectiveness often depends on LLMs with massive parameter scale (e.g., GPT-4) and computationally expensive sliding window processing, leading to substantial efficiency bottlenecks. In this paper, we propose a Collaborative Ranking framework (CoRanking) for LLM-based listwise ranking. Specifically, we strategically combine an efficient small reranker and an effective large reranker, and jointly optimize with a novel reinforcement learning method (RL). The small reranker performs initial passage ranking, effectively filtering the candidate set to a condensed top-k list (e.g., top-20 passages), and the large reranker (with stronger ranking capability) then reranks only this condensed subset rather than the full list, significantly improving efficie...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/8rsb-a165","openalex_id":"https://openalex.org/W7106855467","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","efficient"],"author_affiliations":["Baidu (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.8320000171661377},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7742000222206116},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5609999895095825},{"id":"https://openalex.org/C124975894","display_name":"Ranking SVM","score":0.5339999794960022},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4821999967098236},{"id":"https://openalex.org/C86037889","display_name":"Learning to rank","score":0.46880000829696655},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.453900009393692},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.435699999332428}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106854658","title":"Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance","url":"https://doi.org/10.48448/77f1-f289","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Casanova, Edresson","Desta, Mikyas T.","Fejgin, Roy","Ghosh, Subhankar","Hussain, Shehzeen Samarah","Li, Jason","Neekhara, Paarth","Valle, Rafael","Yang, Xuesong"],"abstract":"Autoregressive speech token generation models produce speech with remarkable variety and naturalness but often suffer from hallucinations and undesired vocalizations that do not conform to conditioning inputs. To address these challenges, we introduce Koel-TTS, an encoder-decoder transformer model for multilingual TTS that improves contextual adherence of speech generation LLMs through preference alignment and classifier-free guidance (CFG). For preference alignment, we design a reward system that ranks model outputs using automatic metrics derived from speech recognition and speaker verification models, encouraging generations that better match the input text and speaker identity. CFG further allows fine-grained control over the influence of conditioning inputs during inference by interpolating conditional and unconditional logits. 
Notably, applying CFG to a preference-aligned model yie...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/77f1-f289","openalex_id":"https://openalex.org/W7106854658","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","preference"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.7106000185012817},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6965000033378601},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.6765000224113464},{"id":"https://openalex.org/C95623464","display_name":"Classifier (UML)","score":0.566100001335144},{"id":"https://openalex.org/C134537474","display_name":"Naturalness","score":0.5292999744415283},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.47350001335144043},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.4661000072956085},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4562999904155731}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106808041","title":"SLOT: Structuring the Output of Large Language Models","url":"https://doi.org/10.48448/t3tt-de79","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Ding, Haibo","Mishra, Soumya Smruti","Shen, Zhengyuan","Shen, Zhengyuan","Xu, Zhichao","Xu, Zhichao"],"abstract":"Structured outputs are essential for large language models (LLMs) in critical applications like agents and information extraction. 
Despite their capabilities, LLMs often generate outputs that deviate from predefined schemas, significantly hampering reliable application development. We present SLOT (Structured LLM Output Transformer), a model-agnostic approach that transforms unstructured LLM outputs into precise structured formats. While existing solutions predominantly rely on constrained decoding techniques or are tightly coupled with specific models, SLOT employs a fine-tuned lightweight language model as a post-processing layer, achieving flexibility across various LLMs and schema specifications. We introduce SLOTBench, curated by a data synthesis pipeline alongside a formal evaluation methodology that quantifies both schema accuracy and content fidelity. Our results demonstrate that...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/t3tt-de79","openalex_id":"https://openalex.org/W7106808041","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (Germany)","Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7870000004768372},{"id":"https://openalex.org/C52146309","display_name":"Schema (genetic algorithms)","score":0.7419999837875366},{"id":"https://openalex.org/C2775945657","display_name":"Structuring","score":0.7110000252723694},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.6021999716758728},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5070000290870667},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.42750000953674316},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4235999882221222},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.35429999232292175}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106786694","title":"WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback","url":"https://doi.org/10.48448/swgy-qy46","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Fang, T","Hu, Minda","King, Irwin","Ma, Jun-Yu","Mi, Haitao","Yu, Dong","Zhang, Hongming","Zhang, Zhisong","Zhang, Jianshu","Zhou, Jingyan"],"abstract":"Web agents powered by Large Language Models (LLMs) show promise for next-generation AI, but their limited reasoning in uncertain, dynamic web environments hinders robust deployment. In this paper, we identify key reasoning skills essential for effective web agents, i.e., reflection & lookahead, branching, and rollback, and curate trajectory data that exemplifies these abilities by reconstructing the agent's (inference-time) reasoning algorithms into chain-of-thought rationales. We conduct experiments in the agent self-improving benchmark, OpenWebVoyager, and demonstrate that distilling salient reasoning patterns into the backbone LLM via simple fine-tuning can substantially enhance its performance. 
Our approach yields significant improvements across multiple benchmarks, including WebVoyager, Mind2web-live, and SimpleQA (web search), highlighting the potential of targeted reasoning skill....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/swgy-qy46","openalex_id":"https://openalex.org/W7106786694","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7433000206947327},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.7013000249862671},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5934000015258789},{"id":"https://openalex.org/C118643609","display_name":"Web application","score":0.5206000208854675},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.4918000102043152},{"id":"https://openalex.org/C65682993","display_name":"Reflection (computer programming)","score":0.487199991941452},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.43700000643730164},{"id":"https://openalex.org/C103057564","display_name":"Analytic reasoning","score":0.41440001130104065}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106835886","title":"Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation","url":"https://doi.org/10.48448/aqag-yk90","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Bi, Keping","Cheng, Xueqi","Guo, Jiafeng","Liu, Shihao","Shi, 
Daiting","Tang, Minghao","Yin, Dawei","Zhang, Hengran"],"abstract":"This paper explores the use of large language models (LLMs) for annotating document utility in training retrieval and retrieval-augmented generation (RAG) systems, aiming to reduce dependence on costly human annotations. We address the gap between retrieval relevance and generative utility by employing LLMs to annotate document utility. To effectively utilize multiple positive samples per query, we introduce a novel loss that maximizes their summed marginal likelihood. Using the Qwen-2.5-32B model, we annotate utility on the MS MARCO dataset and conduct retrieval experiments on MS MARCO and BEIR, as well as RAG experiments on MS MARCO QA, NQ, and HotpotQA. Our results show that LLM-generated annotations enhance out-of-domain retrieval performance and improve RAG outcomes compared to models trained solely on human annotations or downstream QA metrics. Furthermore, combining LLM annotation...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/aqag-yk90","openalex_id":"https://openalex.org/W7106835886","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Baidu (China)","Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8216000199317932},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.7541000247001648},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6478999853134155},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6468999981880188},{"id":"https://openalex.org/C114466953","display_name":"Initialization","score":0.5640000104904175},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4973999857902527},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.47360000014305115},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4659000039100647}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106816859","title":"T²: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering","url":"https://doi.org/10.48448/g3nv-v807","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Li, Binyang","Liang, Bin","Wang, Huimin","WANG, Zezhong","Wong, Kam-fai","Wu, Xian","Zhang, Shubo","Zhao, Yutian","Zhao, Zhengyi"],"abstract":"Recent advances in large language models have demonstrated remarkable performance on Contextual Question Answering (CQA). However, prior approaches typically employ elaborate reasoning strategies regardless of question complexity, leading to low adaptability. Recent efficient test-time scaling methods introduce budget constraints or early stop mechanisms to avoid overthinking for straightforward questions. But they add human bias to the reasoning process and fail to leverage models' inherent reasoning capabilities. To address these limitations, we present T²: Think-to-Think, a novel framework that dynamically adapts reasoning depth based on question complexity. T² leverages the insight that if an LLM can effectively solve similar questions using specific reasoning strategies, it can apply the same strategy to the original question. 
This insight enables to adoption of concise reasoning fo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/g3nv-v807","openalex_id":"https://openalex.org/W7106816859","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","University of Hong Kong","University of International Relations"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7703999876976013},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7293000221252441},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.711899995803833},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.573199987411499},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.44290000200271606},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.43560001254081726},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4162999987602234},{"id":"https://openalex.org/C193221554","display_name":"Commonsense reasoning","score":0.38999998569488525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106840934","title":"Tagging-Augmented Generation: Assisting Language Models in Finding Intricate Knowledge In Long Contexts","url":"https://doi.org/10.48448/tgs1-5t14","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Guo, Tinghao","Hovsepian, Karen","Kanakaris, Nikos","Mihaila, George","Mihaila, George","Pal, Anwesan","Tripathi, Somendra","Zhao, 
Mengnan"],"abstract":"Recent investigations into effective context lengths of modern flagship large language models (LLMs) have revealed major limitations in effective question answering (QA) and reasoning over long and complex contexts for even the largest and most impressive cadre of models. While approaches like retrieval-augmented generation (RAG) and chunk-based re-ranking attempt to mitigate this issue, they are sensitive to chunking, embedding and retrieval strategies and models, and furthermore, rely on extensive pre-processing, knowledge acquisition and indexing steps. In this paper, we propose Tagging-Augmented Generation (TAG), a lightweight data augmentation strategy that boosts LLM performance in long-context scenarios, without degrading and altering the integrity and composition of retrieved documents. We validate our hypothesis by augmenting two challenging and directly relevant question-answer...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/tgs1-5t14","openalex_id":"https://openalex.org/W7106840934","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","University of North Texas"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8130000233650208},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6337000131607056},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6226000189781189},{"id":"https://openalex.org/C75165309","display_name":"Search engine indexing","score":0.53329998254776},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5268999934196472},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4781999886035919},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.45980000495910645},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4207000136375427}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106815959","title":"SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?","url":"https://doi.org/10.48448/kxgf-rk31","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Adelani, David Ifeoluwa","Ali, Felermino","Briakou, Eleftheria","Cherry, Colin","Deutsch, Daniel","Li, Senyu","Lopes Cardoso, Henrique","Sousa-Silva, Rui","Stenetorp, Pontus","Wang, Jiayi"],"abstract":"Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limited language coverage and poor performance in low-resource settings. While recent efforts, such as AfriCOMET, have addressed some of the issues, they are still constrained by small evaluation sets, a lack of publicly available training data tailored to African languages, and inconsistent performance in extremely low-resource scenarios. In this work, we introduce SSA-MTE, a large-scale human-annotated MT evaluation (MTE) dataset covering 13 African language pairs from the News domain, with over 63,000 sentence-level annotations from a diverse set of MT systems. Based on this data, we develop SSA-COMET and SSA-COMET-QE, improved reference-based and reference-free evaluation metrics. 
We also benchmark prompting-based approaches using st...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/kxgf-rk31","openalex_id":"https://openalex.org/W7106815959","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","news"],"author_affiliations":["Centre Universitaire de Mila","Google (United States)","McGill University","Universidade do Porto","University College London"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6086999773979187},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5723999738693237},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5414999723434448},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5123000144958496},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.49869999289512634},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4936999976634979},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4781000018119812},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.382099986076355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106806929","title":"Real-time Ad Retrieval via LLM-generative Commercial Intention for Sponsored Search Advertising","url":"https://doi.org/10.48448/bq3y-p312","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Liu, Tongtong","Lu, Zenghui","Ma, Shaoping","Qin, Meiyue","Tang, Shaogang","Wang, Zhaohui","Yang, Yuekui"],"abstract":"The integration of Large Language 
Models (LLMs) with retrieval systems has shown promising potential in retrieving documents (docs) or advertisements (ads) for a given query. Existing LLM-based retrieval methods generate numeric or content-based DocIDs to retrieve docs/ads. However, the one-to-few mapping between numeric IDs and docs, along with the time-consuming content extraction, leads to semantic inefficiency and limits the scalability of existing methods on large-scale corpora. In this paper, we propose the Real-time Ad REtrieval (RARE) framework, which leverages LLM-generated text called Commercial Intentions (CIs) as an intermediate semantic representation to directly retrieve ads for queries in real-time. These CIs are generated by a customized LLM injected with commercial knowledge, enhancing its domain relevance. Each CI corresponds to multiple ads, yielding a lightweight and....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/bq3y-p312","openalex_id":"https://openalex.org/W7106806929","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7940999865531921},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.7455999851226807},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7418000102043152},{"id":"https://openalex.org/C2778869765","display_name":"Inefficiency","score":0.5971999764442444},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5871999859809875},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.46239998936653137},{"id":"https://openalex.org/C2776359362","display_name":"Representation 
(politics)","score":0.44369998574256897},{"id":"https://openalex.org/C20556612","display_name":"Volume (thermodynamics)","score":0.3894999921321869}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106787945","title":"RACQC: Advanced Retrieval-Augmented Generation for Chinese Query Correction","url":"https://doi.org/10.48448/2kr6-dn42","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Gao, Lingzhe","Guo, Yuanzhao","Lei, Haojie","Li, Wei","Liu, Shihao","Shi, Daiting","Su, Jinbo","Wang, Ke","Wang, Xinyi","Yin, Dawei"],"abstract":"In web search scenarios, erroneous queries frequently degrade users' experience through irrelevant results, underscoring the pivotal role of Chinese Spelling Check (CSC) systems. Although large language models (LLMs) exhibit remarkable capabilities across many tasks, they face critical challenges in the CSC scenario: (1) poor generalization to rare entities in open-domain searches, and (2) failure to adapt to temporal entity variations due to static parameters, resulting in serious over-correction issues. To tackle this, we present RACQC, a Chinese Query Correction system with Retrieval-Augmented Generation(RAG) and multi-task learning. 
Specifically, our approach (1) integrates dynamic knowledge retrieval through entity-centric RAG to address rare entities and innovatively proposes an entity-title collaborative corpus, and (2) employs contrastive correction tasks to mitigate LLM over-cor...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/2kr6-dn42","openalex_id":"https://openalex.org/W7106787945","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Baidu (China)","Jilin Medical University","Jilin University","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8561000227928162},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7441999912261963},{"id":"https://openalex.org/C99016210","display_name":"Query expansion","score":0.5885000228881836},{"id":"https://openalex.org/C2777801307","display_name":"Spelling","score":0.5257999897003174},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5113000273704529},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.47099998593330383},{"id":"https://openalex.org/C192028432","display_name":"Query language","score":0.46380001306533813},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4180999994277954}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106821698","title":"PRISM: Efficient Long-Range Reasoning With Short-Context LLMs","url":"https://doi.org/10.48448/w63k-6c25","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Gunel, Beliz","Jayalath, Dulhan","Monath, 
Nicholas","Tata, Sandeep","Wendt, James Bradley"],"abstract":"Long-range tasks demand reasoning over long inputs. However, existing solutions are limited, e.g., long-context models require large compute budgets, parameter-efficient fine-tuning (PEFT) needs training data, and retrieval-augmented generation (RAG) entails complex task-specific designs. Though in-context approaches overcome many of these issues, methods with short-context LLMs are inefficient, trading context for processing more tokens. We introduce PRISM, a highly token-efficient in-context method based on structured schemas that outperforms baselines on diverse tasks with 4x shorter contexts. This approach produces concise outputs and efficiently leverages key-value (KV) caches to reduce costs by up to 54%. PRISM scales down to tiny contexts without increasing costs or sacrificing quality, and generalizes to new tasks with minimal effort by generating schemas from task descriptions.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/w63k-6c25","openalex_id":"https://openalex.org/W7106821698","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6966000199317932},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6693000197410583},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6204000115394592},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43050000071525574},{"id":"https://openalex.org/C20162079","display_name":"Case-based 
reasoning","score":0.3625999987125397},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.34709998965263367},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.3334999978542328},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.31850001215934753}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106798379","title":"Multi-Value-Product Retrieval-Augmented Generation for Industrial Product Attribute Value Identification","url":"https://doi.org/10.48448/nb9h-t791","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Jufeng","Han, ChengbaoLian","Huang, Fei","Lian, Chengbao","Su, Yindu","Yang, Haiyang","Yu, Chen","zhang, qingheng","Zheng, Bo","Zou, Huike"],"abstract":"Identifying attribute values from product profiles is a key task for improving product search, recommendation, and business analytics on e-commerce platforms, which we called Product Attribute Value Identification (PAVI) . However, existing PAVI methods face critical challenges, such as cascading errors, inability to handle out-of-distribution (OOD) attribute values, and lack of generalization capability. To address these limitations, we introduce Multi-Value-Product Retrieval-Augmented Generation (MVP-RAG), combining the strengths of retrieval, generation, and classification paradigms. MVP-RAG defines PAVI as a retrieval-generation task, where the product title description serves as the query, and products and attribute values act as the corpus. 
It first retrieves similar products of the same category and candidate attribute values, and then generates the standardized attribute values.....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/nb9h-t791","openalex_id":"https://openalex.org/W7106798379","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7020000219345093},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.6514000296592712},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.6065000295639038},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.5777999758720398},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5455999970436096},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5426999926567078},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5304999947547913},{"id":"https://openalex.org/C2776291640","display_name":"Value (mathematics)","score":0.42879998683929443}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106843049","title":"M-Ped: Multi-Prompt Ensemble Decoding for Large Language Models","url":"https://doi.org/10.48448/zex8-0p42","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Guo, Jiaxin","Li, Zongyao","Luo, Yuanchang","Rao, Zhiqiang","Shang, Hengchao","Wei, Daimeng","Wu, Zhanglin","Yang, Jinlong","Yang, Hao-Yu"],"abstract":"With the 
widespread application of Large Language Models (LLMs) in the field of Natural Language Processing (NLP), enhancing their performance has become a research hotspot. This paper presents a novel multi-prompt ensemble decoding approach designed to bolster the generation quality of LLMs by leveraging the aggregation of outcomes from multiple prompts. Given a unique input X, we submit n variations of prompts with X to LLMs in batch mode to decode and derive probability distributions. For each token prediction, we calculate the ensemble probability by averaging the n probability distributions within the batch, utilizing this aggregated probability to generate the token. This technique is dubbed Inner-Batch Ensemble. To facilitate efficient batch inference, we implement a Left-Padding strategy to maintain uniform input lengths across the n prompts. Through extensive experimentation on....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/zex8-0p42","openalex_id":"https://openalex.org/W7106843049","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.7581999897956848},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7501999735832214},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6385999917984009},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5996999740600586},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5963000059127808},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.48089998960494995},{"id":"https://openalex.org/C119898033","display_name":"Ensemble 
forecasting","score":0.4706999957561493},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4666999876499176}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106848542","title":"Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers","url":"https://doi.org/10.48448/2qaz-cg28","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Hu, Zhe","Shang, Lei","Wang, Shujun","Wang, Juncheng","Xu, Chao","Yu, Cheng","Yu, Guoqi","谢, 昊宇"],"abstract":"While language models (LMs) paired with residual vector quantization (RVQ) tokenizers have shown promise in text-to-audio (T2A) generation, they still lag behind diffusion-based models by a non-trivial margin. We identify a critical dilemma underpinning this gap: incorporating more RVQ layers improves audio reconstruction fidelity but exceeds the generation capacity of conventional LMs. To address this, we first analyze RVQ dynamics and uncover two key limitations: 1) orthogonality of features across RVQ layers hinders effective LMs training, and 2) descending semantic richness in tokens from deeper RVQ layers exacerbates exposure bias during autoregressive decoding. Based on these insights, we propose Siren, a novel LM-based framework that employs multiple isolated transformers with causal conditioning and anti-causal alignment via reinforcement learning. 
Extensive experiments demonstra...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/2qaz-cg28","openalex_id":"https://openalex.org/W7106848542","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","quantization"],"author_affiliations":["Alibaba Group (United States)","Baidu (China)","Hong Kong Polytechnic University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7634999752044678},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6470999717712402},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.5512999892234802},{"id":"https://openalex.org/C155512373","display_name":"Residual","score":0.5491999983787537},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5236999988555908},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4058000147342682},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3767000138759613},{"id":"https://openalex.org/C17137986","display_name":"Orthogonality","score":0.35589998960494995}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106797335","title":"LMUNIT: Fine-grained Evaluation with Natural Language Unit Tests","url":"https://doi.org/10.48448/na3w-4x49","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Berrios, William","Franklin, Matija","Kiela, Douwe","Mehri, Shikib","Pathe Vivek, Rajan","Saad-Falcon, Jon","Shankar Naik, Nandita","Singh, Amanpreet","Vidgen, Bertie"],"abstract":"As language models become integral to critical workflows, assessing their behavior remains a 
fundamental challenge -- human evaluation is costly and noisy, while automated metrics provide only coarse, difficult-to-interpret signals. We introduce natural language unit tests, a paradigm that decomposes response quality into explicit, testable criteria, along with a unified scoring model, LMUnit, which combines multi-objective training across preferences, direct ratings, and natural language rationales. Through controlled human studies, we show this paradigm significantly improves inter-annotator agreement and enables more effective LLM development workflows. LMUnit achieves state-of-the-art performance on evaluation benchmarks (FLASK, BigGenBench) and competitive results on RewardBench. These results validate both our proposed paradigm and scoring model, suggesting a promising pat...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/na3w-4x49","openalex_id":"https://openalex.org/W7106797335","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (United States)","Contextual Change (United States)","The Alan Turing Institute","University College Lahore"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7583000063896179},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.6118999719619751},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.524399995803833},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5174999833106995},{"id":"https://openalex.org/C2779439875","display_name":"Natural language understanding","score":0.45840001106262207},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4196000099182129},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41819998621940613},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4088999927043915}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106822374","title":"ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning","url":"https://doi.org/10.48448/z8z5-x063","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Deng, Yongheng","Meng, Fandong","Qiao, Ziqing","Ren, Ju","Wang, Guanbo","Wang, Dong","Wei, Lai","Zeng, Jiali","Zhang, Yaoxue","Zhou, Jie"],"abstract":"Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs, increasing computational overhead. Existing fine-tuning-based compression methods either operate post-hoc pruning, risking disruption to reasoning coherence, or rely on sampling-based selection, which fails to remove redundant content thoroughly. To address these limitations, this work begins by framing two key patterns of redundant reflection in LRMs—Confidence Deficit, wherein the model reflects on correct intermediate steps, and Termination Delay, where reflection continues after a verified, confident answer—through a confidence-guided perspective. 
Based on this, we introduce ConCISE (Confidence-guided Compression In Step-by-step Efficient Reasoning), a framework designed to generate conci...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/z8z5-x063","openalex_id":"https://openalex.org/W7106822374","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","compression"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7666000127792358},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5385000109672546},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.5266000032424927},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5145000219345093},{"id":"https://openalex.org/C81081738","display_name":"Lossless compression","score":0.5062000155448914},{"id":"https://openalex.org/C65682993","display_name":"Reflection (computer programming)","score":0.504800021648407},{"id":"https://openalex.org/C103057564","display_name":"Analytic reasoning","score":0.48669999837875366},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4648999869823456}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106833258","title":"Answering Narrative-Driven Recommendation Queries via a Retrieve–Rank Paradigm and the OCG-Agent","url":"https://doi.org/10.48448/5cdv-nb11","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Feng, Yue","Shang, Haoning","Shi, Yunxiao","Xu, Wujiang","Xu, Min","Zi, Xing"],"abstract":"Narrative-driven 
recommendation queries are common in question-answering platforms, AI search engines, social forums, and some domain-specific vertical applications. Users typically submit free-form text requests for recommendations, e.g., “Any mind-bending thrillers like Shutter Island you’d recommend?” Such special queries have traditionally been addressed as a generic QA task under the RAG paradigm. This work formally introduces narrative recommendation as a distinct task and contends that the RAG paradigm is inherently ill-suited for it, owing to information loss in LLMs when retrieving information from multiple long and fragmented contexts, and limitations in ranking effectiveness. To overcome these limitations, we propose a novel retrieve-rank paradigm by theoretically demonstrating its superiority over the RAG paradigm. Central to this new paradigm, we specially focus on the inform...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/5cdv-nb11","openalex_id":"https://openalex.org/W7106833258","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","agent"],"author_affiliations":["Baidu (China)","University of Technology Sydney"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.84170001745224},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.6708999872207642},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6601999998092651},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6456000208854675},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.5965999960899353},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.382099986076355},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.3801000118255615},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.37139999866485596}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106844587","title":"VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing","url":"https://doi.org/10.48448/hpzk-2m88","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Bhat, Vimal","Diwan, Anuj","Harwath, David","Huynh, Cong Phuoc","Liu, Zhu","Peng, Puyuan","Sun, Xiaohang","Zheng, Zhisheng"],"abstract":"We introduce VoiceCraft-X, an autoregressive neural codec language model which unifies multilingual speech editing and zero-shot text-to-speech (TTS) synthesis across 11 languages: English, Mandarin, Korean, Japanese, Spanish, French, German, Dutch, Italian, Portuguese, and Polish. VoiceCraft-X utilizes the Qwen3 large language model for phoneme-free cross-lingual text processing and a novel token reordering mechanism with time-aligned text and speech tokens to handle both tasks as a single sequence generation problem. The model generates high-quality, natural-sounding speech, seamlessly creating new audio or editing existing recordings within one framework. 
VoiceCraft-X shows robust performance in diverse linguistic settings, even with limited per-language data, underscoring the power of unified autoregressive approaches for advancing complex, real-world multilingual speech applications...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/hpzk-2m88","openalex_id":"https://openalex.org/W7106844587","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","The University of Texas at Austin"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8447999954223633},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.6926000118255615},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6503999829292297},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5867000222206116},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.5328999757766724},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47350001335144043},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.4449000060558319},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41780000925064087}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106846138","title":"VRoPE: Rotary Position Embedding for Video Large Language Models","url":"https://doi.org/10.48448/fzgb-c179","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Cai, Junxian","Chen, Xi","Guo, Longteng","liu, jing","Liu, Zikang","Liu, 
Qingbin","Ma, Kai","Tang, Yepeng","Yue, Tongtian"],"abstract":"Rotary Position Embedding (RoPE) has shown strong performance in text-based Large Language Models (LLMs), but extending it to video remains a challenge due to the intricate spatiotemporal structure of video frames. Existing adaptations, such as RoPE-3D, attempt to encode spatial and temporal dimensions separately but suffer from two major limitations: positional bias in attention distribution and disruptions in video-text transitions. To overcome these issues, we propose Video Rotary Position Embedding (VRoPE), a novel positional encoding method tailored for Video-LLMs. Specifically, we introduce a more balanced encoding strategy that mitigates attention biases, ensuring a more uniform distribution of spatial focus. Additionally, our approach restructures positional indices to ensure a smooth transition between video and text tokens. Extensive experiments on different models demonstrate....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/fzgb-c179","openalex_id":"https://openalex.org/W7106846138","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Chinese Academy of Sciences","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.784600019454956},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7833999991416931},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.7562000155448914},{"id":"https://openalex.org/C66746571","display_name":"ENCODE","score":0.7017999887466431},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.6898999810218811},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.5498999953269958},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5112000107765198},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.46720001101493835}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106832878","title":"Understanding Subword Compositionality of Large Language Models","url":"https://doi.org/10.48448/eat4-p974","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chai, Yekun","Peng, Qiwei","Søgaard, Anders"],"abstract":"Large language models (LLMs) take sequences of subwords as input, requiring them to effectively compose subword representations into meaningful word-level representations. In this paper, we present a comprehensive set of experiments to probe how LLMs compose subword information, focusing on three key aspects: structural similarity, semantic decomposability, and form retention. Our analysis of the experiments suggests that these five LLM families can be classified into three distinct groups, likely reflecting differences in their underlying composition strategies. 
Specifically, we observe (i) three distinct patterns in the evolution of structural similarity between subword compositions and whole-word representations across layers; (ii) great performance when probing layer by layer their sensitivity to semantic decompositionality; and (iii) three distinct patterns when probing sensitivity to....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/eat4-p974","openalex_id":"https://openalex.org/W7106832878","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","University of Copenhagen"],"concepts":[{"id":"https://openalex.org/C121375916","display_name":"Principle of compositionality","score":0.8460999727249146},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.6582000255584717},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6577000021934509},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6435999870300293},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.5418999791145325},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.5364999771118164},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5253000259399414},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.5022000074386597}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106797945","title":"Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented 
Generation","url":"https://doi.org/10.48448/zyaf-s983","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","An, Kaikai","Cao, Lele","Chang, Baobao","Cheng, Sitao","Li, Liqun","Lin, Qingwei","Lu, Junting","Rajmohan, Saravan","Si, Shuzheng","Wang, Lu","Yang, Fangkai"],"abstract":"Recent advances in retrieval-augmented generation (RAG) have substantially improved question-answering systems, particularly for factoid '5Ws' questions. However, significant challenges remain when addressing '1H' questions, specifically how-to questions, which are integral for decision-making and require dynamic, step-by-step responses. The key limitation lies in the prevalent data organization paradigm, chunk, which commonly divides documents into fixed-size segments, and disrupts the logical coherence and connections within the context. To address this, we propose THREAD, a novel data organization paradigm enabling systems to handle how-to questions more effectively. Specifically, we introduce a new knowledge granularity, 'logic unit' (LU), where large language models transform documents into more structured and loosely interconnected LUs. 
Extensive experiments across both open-domain...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/zyaf-s983","openalex_id":"https://openalex.org/W7106797945","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8377000093460083},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.6432999968528748},{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.5864999890327454},{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.5550000071525574},{"id":"https://openalex.org/C138101251","display_name":"Thread (computing)","score":0.552299976348877},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.4318999946117401},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4262999892234802},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4221000075340271}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106800886","title":"SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks","url":"https://doi.org/10.48448/6sdn-j688","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Bou Ammar, Haitham","Cardenas, Ronald","Christopoulou, Fenia","Lampouras, Gerasimos","Wang, Jun"],"abstract":"Direct alignment algorithms have proven an effective step for aligning language models to human-desired behaviors. 
Current variants of the Direct Preference Optimization objective have focused on a strict setting where all tokens are contributing signals of KL divergence and rewards to the loss function. However, human preference is not affected equally by each word in a sequence but is often dependent on specific words or phrases, e.g. existence of toxic terms leads to non-preferred responses. Based on this observation, we argue that not all tokens should be weighted equally during PO and propose a flexible objective termed SparsePO, that aims to automatically learn to weight the KL divergence and reward corresponding to each token during PO training. We propose two different variants of weight-masks that can either be derived from the reference model itself or learned on the fly. Notab...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/6sdn-j688","openalex_id":"https://openalex.org/W7106800886","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Artificial Intelligence in Medicine (Canada)","Centre for Artificial Intelligence and Robotics","Huawei Technologies (China)","Huawei Technologies (Sweden)","Huawei Technologies (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.7423999905586243},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.7145000100135803},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6833999752998352},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.6388000249862671},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.588100016117096},{"id":"https://openalex.org/C207390915","display_name":"Divergence 
(linguistics)","score":0.5601999759674072},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.49320000410079956},{"id":"https://openalex.org/C90805587","display_name":"Word (group theory)","score":0.46470001339912415}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106826684","title":"Social Bias in Multilingual Language Models: A Survey","url":"https://doi.org/10.48448/k6ax-dp64","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Feng, Yue","Gamboa, Lance Calvin","Lee, Mark G."],"abstract":"Pretrained multilingual models exhibit the same social bias as models processing English texts. This systematic review analyzes emerging research that extends bias evaluation and mitigation approaches into multilingual and non-English contexts. We examine these studies with respect to linguistic diversity, cultural awareness, and their choice of evaluation metrics and mitigation techniques. Our survey illuminates gaps in the field’s dominant methodological design choices (e.g., preference for certain languages, scarcity of multilingual mitigation experiments) while cataloging common issues encountered and solutions implemented in adapting bias benchmarks across languages and cultures. 
Drawing from the implications of our findings, we chart directions for future research that can reinforce the multilingual bias literature’s inclusivity, cross-cultural appropriateness, and alignment with s...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/k6ax-dp64","openalex_id":"https://openalex.org/W7106826684","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Baidu (China)","University of Birmingham"],"concepts":[{"id":"https://openalex.org/C2780035574","display_name":"Multilingualism","score":0.6294999718666077},{"id":"https://openalex.org/C37773902","display_name":"Cultural bias","score":0.5375000238418579},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5185999870300293},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5151000022888184},{"id":"https://openalex.org/C109747225","display_name":"Scarcity","score":0.4754999876022339},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.42570000886917114},{"id":"https://openalex.org/C2983427547","display_name":"Gender bias","score":0.42239999771118164},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3944000005722046}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106787405","title":"Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition","url":"https://doi.org/10.48448/hvtm-sn71","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Berkovitch, Omri","Caduri, Sapir","Cohen, Danielle","Dagan, Ido","Efros, Anatoly","Halpern, Yoni","Kahlon, Noam","Oren, Joel"],"abstract":"Understanding 
user intents from UI interaction trajectories remains a challenging, yet crucial, frontier in intelligent agent development. While massive, datacenter-based, multi-modal large language models (MLLMs) possess greater capacity to handle the complexities of such sequences, smaller models which can run on-device to provide a privacy-preserving, low-cost, and low-latency user experience, struggle with accurate intent inference. We address these limitations by introducing a novel decomposed approach: first, we perform structured interaction summarization, capturing key information from each user action. Second, we perform intent extraction using a fine-tuned model operating on the aggregated summaries. This method improves intent understanding in resource-constrained models, even surpassing the base performance of large MLLMs.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/hvtm-sn71","openalex_id":"https://openalex.org/W7106787405","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Bar-Ilan University","Google (United States)","Robert Bosch (India)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7699000239372253},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6600000262260437},{"id":"https://openalex.org/C124681953","display_name":"Decomposition","score":0.5554999709129333},{"id":"https://openalex.org/C195807954","display_name":"Information extraction","score":0.4740000069141388},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4050999879837036},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.4007999897003174},{"id":"https://openalex.org/C42058472","display_name":"Base 
(topology)","score":0.38830000162124634},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3434999883174896}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106789784","title":"Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts","url":"https://doi.org/10.48448/8rn5-4c06","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Li, Fang","Liang, Xiaolong","Qiao, Lingfeng","Sun, Xing","Wang, Jie","yin, di","Yu, Yifei","Zeng, Chen","Zhang, Qian-Wen","Zheng, Suncong"],"abstract":"Evaluating the ability of large language models (LLMs) to process lengthy contexts is critical, especially for retrieving query-relevant information embedded within them. We introduce Sequential-NIAH, a benchmark specifically designed to evaluate the capability of LLMs to extract sequential information items (known as \\emph{needles}) from long contexts. The benchmark includes three needle generation pipelines: synthetic-temporal, real-temporal, and real-logical orders, with context lengths ranging from 8K to 128K, which comprises 14,000 samples (2,000 for testing). To facilitate the evaluation of this benchmark, we trained an evaluation model that assesses the correctness of LLM responses by comparing their completeness and sequential consistency against the ground truth, which provides a more reliable evaluation metric than GPT-4 or Claude. 
We conducted experiments on six well-known LLM...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/8rn5-4c06","openalex_id":"https://openalex.org/W7106789784","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7932999730110168},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7569000124931335},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.741100013256073},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6708999872207642},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.6406999826431274},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6128000020980835},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5767999887466431},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5519000291824341}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106845376","title":"RLHF Algorithms Ranked: An Extensive Evaluation Across Diverse Tasks, Rewards, and Hyperparameters","url":"https://doi.org/10.48448/338a-5431","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Grabowski, Peter","Grabowski, Peter","Ie, Eugene","Ie, Eugene","Kaushal, Aditi","Pasumarthi, Rama Kumar","Spangher, Lucas","Spangher, Lucas"],"abstract":"Large Language Models (LLMs) have demonstrated impressive text generation capabilities, yet their outputs often misalign with 
human preferences. To address this challenge, Reinforcement Learning from Human Feedback (RLHF) has become an essential component of modern LLM training pipelines. Although Proximal Policy Optimization (PPO) initially emerged as a favored RLHF strategy, its complexity and inefficiency have spurred the investigation of simpler alternatives. This work presents, to the authors' knowledge, the most comprehensive benchmark to date of seventeen state-of-the-art RLHF algorithms. We evaluate these algorithms on two different benchmarks, OpenAI's TL;DR Summarization and Anthropic's Helpfulness / Harmlessness, with two different reward models a Gemma 2B Reward model and a Rules based reward model. We incorporate extensive hyperparameter sweeps for each algorithm. With this....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/338a-5431","openalex_id":"https://openalex.org/W7106845376","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","Kootenay Association for Science & Technology","Korea Advanced Institute of Science and Technology"],"concepts":[{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.7328000068664551},{"id":"https://openalex.org/C8642999","display_name":"Hyperparameter","score":0.6816999912261963},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.6765999794006348},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6674000024795532},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.616599977016449},{"id":"https://openalex.org/C185798385","display_name":"Benchmark 
(surveying)","score":0.5848000049591064},{"id":"https://openalex.org/C2781265381","display_name":"Helpfulness","score":0.5432999730110168},{"id":"https://openalex.org/C2778869765","display_name":"Inefficiency","score":0.46779999136924744}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106785493","title":"On Relation-Specific Neurons in Large Language Models","url":"https://doi.org/10.48448/2c60-zr80","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Runsheng","Hakimi, Ahmad","Hirlimann, Lea","Kargaran, Amir Hossein","Liu, Yihong","Rothe, Sascha","Schuetze, Hinrich","Wang, Mingyang","Yvon, François"],"abstract":"In large language models (LLMs), certain neurons can store distinct pieces of knowledge learned during pretraining. While factual knowledge typically appears as a combination of relations and entities, it remains unclear whether some neurons focus on a relation itself -- independent of any entity. We hypothesize such neurons detect a relation in the input text and guide generation involving such a relation. To investigate this, we study the LLama-2 family on a chosen set of relations, with a statistics-based method. Our experiments demonstrate the existence of relation-specific neurons. We measure the effect of selectively deactivating candidate neurons specific to relation r on the LLM's ability to handle (1) facts involving relation r and (2) facts involving a different relation r' neq r. 
With respect to their capacity for encoding relation information, we give evidence for the followi...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/2c60-zr80","openalex_id":"https://openalex.org/W7106785493","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","LMU Klinikum","Ludwig-Maximilians-Universität München","Technical University of Munich"],"concepts":[{"id":"https://openalex.org/C25343380","display_name":"Relation (database)","score":0.8730000257492065},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.6830000281333923},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.6388000249862671},{"id":"https://openalex.org/C2778794669","display_name":"Neuron","score":0.6018999814987183},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.5946999788284302},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5501999855041504},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5156000256538391},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3971000015735626}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106796977","title":"Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization","url":"https://doi.org/10.48448/1v4t-3s12","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Cao, Min","Huang, Jizhou","Jiu Long, Wu","Shi, Zhengliang","Wang, Shuaiqiang","Yan, Lingyong","Yin, Dawei","Zhang, Min"],"abstract":"Large Visual 
Language Models (LVLMs) have demonstrated impressive capabilities across multiple tasks. However, their trustworthiness is often challenged by hallucinations, which can be attributed to the modality misalignment and the inherent hallucinations of their underlying Large Language Models (LLMs) backbone. Existing preference alignment methods focus on aligning model responses with human preferences while neglecting image-text modality alignment, resulting in over-reliance on LLMs and hallucinations. In this paper, we propose Entity-centric Multimodal Preference Optimization (EMPO), which achieves enhanced modality alignment than existing human preference alignment methods. Besides, to overcome the scarcity of high-quality multimodal preference data, we utilize open-source instruction datasets to automatically construct high-quality preference data across three aspects: image, in...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/1v4t-3s12","openalex_id":"https://openalex.org/W7106796977","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Baidu (China)","Shandong University","Soochow University"],"concepts":[{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.784600019454956},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6811000108718872},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5771999955177307},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5220999717712402},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.4876999855041504},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4611999988555908},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.4578999876976013},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4198000133037567}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106835339","title":"Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge","url":"https://doi.org/10.48448/0vxw-q060","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","E Weston, Jason","Golovneva, Olga","Jiao, Jiantao","Sukhbaatar, Sainbayar","Tian, Yuandong","Wu, Tianhao","Xu, Jing","Yuan, Weizhe"],"abstract":"Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human data, recent self-rewarding mechanisms have shown that LLMs can improve by judging their own responses instead of relying on human labelers. However, existing methods have primarily focused on improving model responses rather than judgment capabilities, resulting in rapid saturation during iterative training. To address this issue, we introduce a novel Meta-Rewarding step to the self-improvement process, where the model judges its own judgements and uses that feedback to refine its judgment skills. Surprisingly, this unsupervised approach improves the model's ability to judge and follow instructions, as demonstrated by a win rate improvement of Llama-3-8B-Instruct from 22.9% to 39.4% on AlpacaEval 2, and 20.6% to 29.1% on Arena-Hard. 
These...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/0vxw-q060","openalex_id":"https://openalex.org/W7106835339","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Berkeley College","Nvidia (United Kingdom)","Nvidia (United States)","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6978999972343445},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5870000123977661},{"id":"https://openalex.org/C2993724205","display_name":"Human language","score":0.5557000041007996},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5038999915122986},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3714999854564667},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.36390000581741333},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.3495999872684479},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.34689998626708984}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106788263","title":"Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents","url":"https://doi.org/10.48448/cyqj-d006","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Feng, Yue","guo, guangfu","Lu, Xiaoqian"],"abstract":"Vision-language models (VLMs) achieve promising results in medical reasoning but struggle with hallucinations, vague descriptions, Inconsistent logic and poor localization. 
To address this, we propose a agent framework named Medical Visual Reasoning Agent (\\textbf{Med-VRAgent}). The approach is based on Visual Guidance and Self-Reward paradigms and Monte Carlo Tree Search (MCTS). By combining the Visual Guidance with tree search, Med-VRAgent improves the medical visual reasoning capabilities of VLMs. We use the trajectories collected by Med-RAgent as feedback to further improve the performance by fine-tuning the VLMs with the proximal policy optimization (PPO) objective. Experiments on multiple medical VQA benchmarks demonstrate that our method outperforms existing approaches.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/cyqj-d006","openalex_id":"https://openalex.org/W7106788263","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7178999781608582},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5810999870300293},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5027999877929688},{"id":"https://openalex.org/C113174947","display_name":"Tree (set theory)","score":0.4645000100135803},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.396699994802475},{"id":"https://openalex.org/C58328972","display_name":"Expert system","score":0.35429999232292175},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.33009999990463257},{"id":"https://openalex.org/C2777055276","display_name":"Visual approach","score":0.3086000084877014}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7106823646","title":"LORAXBENCH: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages","url":"https://doi.org/10.48448/w902-xw16","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Aji, Alham","Cohn, Trevor"],"abstract":"As one of the world's most populous countries, with 700 languages spoken, Indonesia is behind in terms of NLP progress. We introduce LORAXBENCH, a benchmark that focuses on low-resource languages of Indonesia and covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. Our dataset cover 20 languages, with the addition of two formality registers for three languages. We evaluate a diverse set of multilingual and region-focused LLMs and found that this benchmark is challenging. We note a visible discrepancy between performance in Indonesian and other languages, especially the low-resource ones. There is no clear lead when using a region-specific model as opposed to the general multilingual model. 
Lastly, we show that a change in register affects model performance, especially with registers not commonly found in social...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/w902-xw16","openalex_id":"https://openalex.org/W7106823646","cited_by_count":0,"quality_score":41,"matched_keywords":["media"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7703999876976013},{"id":"https://openalex.org/C2777159308","display_name":"Formality","score":0.7559999823570251},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7332000136375427},{"id":"https://openalex.org/C2779207338","display_name":"Indonesian","score":0.7038000226020813},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.5924999713897705},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5460000038146973},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5335000157356262},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.498199999332428}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106856685","title":"Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RL","url":"https://doi.org/10.48448/p7m8-t496","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Ahlheim, Christiane","Beirami, Ahmad","Dixon, Lucas","Hoffmann, Jessica","Jin, Jarvis","Liemt, Erin MacMurray van","Sidahmed, Hakim","Tano, Marie","Thain, Nithum","Walfrand, Aria","Yu, Zac"],"abstract":"The paper shows that 
parameter-efficient reinforcement learning (PE-RL) is a highly effective training regime to improve large language models' (LLMs) ability to answer queries on sensitive topics with a Neutral Point of View (NPOV), i.e. to provide significantly more informative, diverse and impartial answers. This is shown by evaluating PE-RL and multiple strong baselines---including LoRA finetuning (strongest baseline), SFT and RLHF. PE-RL not only improves on overall NPOV quality compared to the strongest baseline (97.06\\% rightarrow 99.08\\%), but also scores much higher on features linguists identify as key to separating good answers from the best answers (60.25\\% rightarrow 85.21\\% for presence of supportive details, 68.74\\% rightarrow 91.43\\% for absence of oversimplification). A qualitative analysis corroborates this. Finally, our evaluation finds no statistical differences betwe...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/p7m8-t496","openalex_id":"https://openalex.org/W7106856685","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6523000001907349},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5796999931335449},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.5720999836921692},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5443000197410583},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5273000001907349},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5109000205993652},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.49050000309944153},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4830999970436096}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106848004","title":"HMoE: Heterogeneous Mixture of Experts for Language Modeling","url":"https://doi.org/10.48448/w1w7-9q81","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Han, Weidong","Kang, Zhanhui","Li, Shuaipeng","Okazaki, Naoaki","Sun, Xingwu","Wang, Di","Wang, An","Xie, Ruobing","Xu, Cheng-Zhong","Yang, Zhen","Zhao, Pinxue"],"abstract":"Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter utilization. In this study, we propose a novel Heterogeneous Mixture of Experts (HMoE) framework, where experts differ in size and thus possess diverse capacities. This heterogeneity allows for more specialized experts to handle varying token complexities more effectively. To address the imbalance in expert activation, we propose a novel training objective that encourages the frequent activation of smaller experts, so as to improve computational efficiency and parameter utilization. 
Extensive experi...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/w1w7-9q81","openalex_id":"https://openalex.org/W7106848004","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","Tokyo Institute of Technology","University of Macau"],"concepts":[{"id":"https://openalex.org/C66882249","display_name":"Homogeneous","score":0.7473999857902527},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7188000082969666},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5029000043869019},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4377000033855438},{"id":"https://openalex.org/C179799912","display_name":"Computational complexity theory","score":0.42419999837875366},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4219000041484833},{"id":"https://openalex.org/C66024118","display_name":"Computational model","score":0.41499999165534973},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.36719998717308044}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106831978","title":"GSID: Generative Semantic Indexing for E-Commerce Product Understanding","url":"https://doi.org/10.48448/59sg-8a60","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Jufeng","Han, ChengbaoLian","Huang, Fei","Lian, Chengbao","Yang, Haiyang","Yu, Chen","zhang, qingheng","Zheng, Bo","Zheng, Bo","Zou, Huike"],"abstract":"Structured representation of product information is a major bottleneck for the efficiency of e-commerce platforms, 
especially in second-hand ecommerce platforms. Currently, most product information are organized based on manually curated product categories and attributes, which often fail to adequately cover long-tail products and do not align well with buyer preference. To address these problems, we propose \\textbf{G}enerative \\textbf{S}emantic \\textbf{I}n\\textbf{D}exings (GSID), a data-driven approach to generate product structured representations. GSID consists of two key components: (1) Pre-training on unstructured product metadata to learn in-domain semantic embeddings, and (2) Generating more effective semantic codes tailored for downstream product-centric applications. Extensive experiments are conducted to validate the effectiveness of GSID, and it has been successfully deployed....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/59sg-8a60","openalex_id":"https://openalex.org/W7106831978","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7865999937057495},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.6266999840736389},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.5586000084877014},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5533999800682068},{"id":"https://openalex.org/C2776207758","display_name":"Downstream (manufacturing)","score":0.5340999960899353},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.5260000228881836},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.5235999822616577},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4666999876499176}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106830887","title":"FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance","url":"https://doi.org/10.48448/pcpg-dy85","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Bannihatti Kumar, Vinayshekhar","Gangadharaiah, Rashmi","Kang, Mintong","Khosla, Sopan","Kumar, Abhishek","Roy, Shamik"],"abstract":"Text-to-image diffusion models often exhibit biases toward specific demographic groups, such as generating more males than females when prompted to generate images of engineers, raising ethical concerns and limiting their adoption. In this paper, we tackle the challenge of mitigating generation bias towards any target attribute value (e.g., \"male\" for \"gender\") in diffusion models while preserving generation quality. We propose FairGen, an adaptive latent guidance mechanism which controls the generation distribution during inference. In FairGen, a latent guidance module dynamically adjusts the diffusion process to enforce specific attributes, while a memory module tracks the generation statistics and steers latent guidance to align with the targeted fair distribution of the attribute values. 
Further, given the limitations of existing datasets in comprehensively assessing bias in diffusio...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/pcpg-dy85","openalex_id":"https://openalex.org/W7106830887","cited_by_count":0,"quality_score":41,"matched_keywords":["memory"],"author_affiliations":["Amazon (Germany)","Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6247000098228455},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5921000242233276},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4226999878883362},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.4156999886035919},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.3862999975681305},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.38350000977516174},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.37790000438690186},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3666999936103821}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106802663","title":"FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline","url":"https://doi.org/10.48448/ep71-3h51","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Bansal, Mohit","Chung, Tagyoung","Gupta, Arpit","Mehta, Kartik","Oraby, Shereen","Peng, Nanyun","Saha, Soumya","Seegmiller, Parker","Tao, Chenyang"],"abstract":"Recent works improving LLM math reasoning with synthetic data have used 
unique setups, making comparison of data synthesis strategies impractical. This leaves many unanswered questions about the roles of different factors in the synthetic data pipeline, such as the impact of filtering low-quality problems. To address this gap, we introduce FLAMES, a Framework for LLM Assessment of Math rEasoning Data Synthesis, and perform a systematic study of 10 existing data synthesis strategies and multiple other factors impacting the performance of synthetic math reasoning data. Our FLAMES experiments provide several valuable insights about the optimal balance of difficulty and diversity of synthetic data. First, data agents designed to increase problem complexity lead to best improvements on most math metrics. Second, with a fixed data generation budget, keeping higher problem coverage is more impo...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/ep71-3h51","openalex_id":"https://openalex.org/W7106802663","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Dartmouth College","Dartmouth Hospital","UCLA Health","University of North Carolina Health Care","University of North Carolina at Chapel Hill"],"concepts":[{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.6396999955177307},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6031000018119812},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5906999707221985},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5825999975204468},{"id":"https://openalex.org/C2776937632","display_name":"Program synthesis","score":0.4456999897956848},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.3853999972343445},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3799999952316284},{"id":"https://openalex.org/C159032336","display_name":"Non-monotonic logic","score":0.33090001344680786}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106801277","title":"Dynamic Evaluation for Oversensitivity in LLMs","url":"https://doi.org/10.48448/wdsf-pe96","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Cheng, Sitao","Pu, Xiao","Wang, William","Wang, Xin Eric"],"abstract":"Oversensitivity—where language models defensively reject benign prompts—not only disrupts user interactions but also obscures the boundaries between harmful and harmless content. Existing benchmarks rely on static datasets that degrade over time as models evolve, leading to data contamination and diminished evaluative power. To address this, we develop a framework that dynamically generates model-specific challenging datasets, capturing emerging defensive patterns and aligning with each model’s unique behavior. Building on this approach, we construct OverBench, a benchmark that aggregates these datasets across diverse LLM families, encompassing 450,000 samples from 26 models. 
OverBench provides a dynamic and evolving perspective on oversensitivity, allowing for continuous monitoring of defensive triggers as models advance, highlighting vulnerabilities that static datasets overlook.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/wdsf-pe96","openalex_id":"https://openalex.org/W7106801277","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","University of California, Santa Barbara"],"concepts":[{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.7580999732017517},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.6801999807357788},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6643999814987183},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.657800018787384},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.4239000082015991},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4196000099182129},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3337000012397766},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.32359999418258667}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106829019","title":"DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning","url":"https://doi.org/10.48448/454x-eq54","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chang, Kai-Wei","Mehrabi, Ninareh","Mehta, 
Kartik","Parekh, Tanmay","Peng, Nanyun"],"abstract":"Zero-shot Event Detection (ED), the task of identifying event mentions in natural language text without any training data, is critical for document understanding in specialized domains. Understanding the complex event ontology, extracting domain-specific triggers from the passage, and structuring them appropriately overloads and limits the utility of Large Language Models (LLMs) for zero-shot ED. To this end, we propose DiCoRe, a divergent-convergent reasoning framework that decouples the task of ED using Dreamer and Grounder. Dreamer encourages divergent reasoning through open-ended event discovery, which helps to boost event coverage. Conversely, Grounder introduces convergent reasoning to align the free-form predictions with the task-specific instructions using finite-state machine guided constrained decoding. Additionally, an LLM-Judge verifies the final outputs to ensure high precis...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/454x-eq54","openalex_id":"https://openalex.org/W7106829019","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","UCLA Health","University of California, Los Angeles"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7827000021934509},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.7508999705314636},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.72079998254776},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7002999782562256},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.5694000124931335},{"id":"https://openalex.org/C2775945657","display_name":"Structuring","score":0.5009999871253967},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4546000063419342},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43320000171661377}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106820008","title":"Debiasing Multilingual LLMs in Cross-lingual Latent Space","url":"https://doi.org/10.48448/0n6h-d019","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chai, Yekun","Hu, Guimin","Peng, Qiwei","Søgaard, Anders"],"abstract":"Debiasing techniques such as SentDebias aim to reduce bias in large language models (LLMs). Previous studies have evaluated their cross-lingual transferability by directly applying these methods to LLM representations, revealing their limited effectiveness across languages. In this work, we therefore propose to perform debiasing in a joint latent space rather than directly on LLM representations. We construct a well-aligned cross-lingual latent space using an autoencoder trained on parallel TED talk scripts. 
Our experiments with Aya-expanse and two debiasing techniques across four languages (English, French, German, Dutch) demonstrate that a) autoencoders effectively construct a well-aligned cross-lingual latent space, and b) applying debiasing techniques in the learned cross-lingual latent space significantly improves both the overall debiasing performance and cross-lingual transferabil...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/0n6h-d019","openalex_id":"https://openalex.org/W7106820008","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","Harbin Institute of Technology","University of Copenhagen"],"concepts":[{"id":"https://openalex.org/C2779458634","display_name":"Debiasing","score":0.9980999827384949},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.652999997138977},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.6051999926567078},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5674999952316284},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.5537999868392944},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4844000041484833},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3465000092983246},{"id":"https://openalex.org/C170133592","display_name":"Latent semantic analysis","score":0.32510000467300415}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106839072","title":"DRBO: Mitigating the Bottleneck Effect via Dynamic Reward Balancing in Multi-reward LLM 
Optimization","url":"https://doi.org/10.48448/hy3p-db21","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Nuo","Gao, Anningzhe","Gao, Yufei","Jin, Yongnan","Wang, Benyou","Yan, Lingyong"],"abstract":"In the current landscape of large language models (LLMs), many evaluation metrics have been developed and used as rewards during training to improve specific metrics. However, balancing these metrics and dynamically adjusting reward weights remains challenging, as current approaches often fail to enhance weaker metrics. To address this, we empirically propose a Dynamic Reward Balancing Optimization framework DRBO to mitigate the \"short-board effect\" by measuring performance, adjusting reward weights to prioritize weaker metrics, and optimizing the model via reinforcement learning. We apply DRBO to both single-task and multi-type task scenarios, validating its effectiveness in generation with citations and online shopping conversation tasks. The results demonstrate improved overall performance and balanced optimization across multiple metrics, effectively overcoming the diversity and comp...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/hy3p-db21","openalex_id":"https://openalex.org/W7106839072","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7810999751091003},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.741100013256073},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5526000261306763},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement 
learning","score":0.5425999760627747},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.3950999975204468},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3873000144958496},{"id":"https://openalex.org/C2777200299","display_name":"Conversation","score":0.3634999990463257},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.31369999051094055}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106813827","title":"CTR-Guided Generative Query Suggestion in Conversational Search","url":"https://doi.org/10.48448/ze89-m287","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Cai, Hengyi","Jia, Xin","Jia, Xin","Min, Erxue","Wang, Shuaiqiang","Wang, Shuaiqiang","Wu, Yunfang","Yang, Min","Yang, Xihong","Yin, Dawei"],"abstract":"Generating effective query suggestions in conversational search requires aligning model outputs with user preferences, which is challenging due to sparse and noisy click signals. We propose \\textbf{GQS}, a generative framework that integrates click modeling and preference optimization to enhance real-world user engagement. GQS consists of three key components: (1) a \\textit{Multi-Source CTR Modeling} module that captures diverse contextual signals to estimate fine-grained click-through rates; (2) a \\textit{Diversity-Aware Preference Alignment} strategy using CTR-weighted Direct Preference Optimization (DPO), which balances relevance and semantic diversity; and (3) a \\textit{CTR-Calibrated Iterative Optimization} process that jointly refines the CTR and generation models across training rounds. 
Experiments on two real-world tasks demonstrate that GQS outperforms strong baselines in CTR, r...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/ze89-m287","openalex_id":"https://openalex.org/W7106813827","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Baidu (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8123000264167786},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6607999801635742},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.6129999756813049},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5882999897003174},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5511999726295471},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5358999967575073},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5091999769210815},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.491100013256073}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106820474","title":"CMedCalc-Bench: A Fine-Grained Benchmark for Chinese Medical Calculations in LLM","url":"https://doi.org/10.48448/8795-fm39","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Wu, Xian","Zhang, Yunyan","Zhu, Zhihong"],"abstract":"Although Large Language Models (LLMs) have demonstrated significant potential in medical diagnostics and clinical decision-making, existing biomedical NLP benchmarks primarily focus on qualitative reasoning tasks, 
lacking rigorous evaluation of quantitative computation capabilities extensively used in clinical settings, particularly for Chinese language scenarios. To address this gap, we introduce CMedCalc-Bench, the first fine-grained benchmark specifically designed for Chinese medical calculation tasks. CMedCalc-Bench consists of 69 typical calculation tasks spanning multiple clinical domains such as cardiology, endocrinology, nephrology, and emergency medicine, featuring over 1,000 real-world Chinese clinical cases. We develop an innovative multi-stage evaluation framework that separately evaluates clinical entity extraction and numerical computation processes, enabling detailed diagn...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/8795-fm39","openalex_id":"https://openalex.org/W7106820474","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7566999793052673},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.7473999857902527},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7210999727249146},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49720001220703125},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4259999990463257},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.40630000829696655},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.37059998512268066},{"id":"https://openalex.org/C534262118","display_name":"Medical 
diagnosis","score":0.3495999872684479}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106801511","title":"Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization","url":"https://doi.org/10.48448/vt56-2c26","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Xi","Li, Jian","Xu, Pengfei","Yin, Shenglin","Zhang, Yujia","Zhao, Alan","Zhou, Xiaohui"],"abstract":"Direct Preference Optimization (DPO) is a widely used reinforcement learning from human feedback (RLHF) method across various domains. The study of token importance has attracted widespread attention in DPO. Researchers have found that token importance is crucial for improving the effectiveness of DPO. It is observed that identical or semantically similar content (defined as ambiguous content) frequently appears within the preference pairs. We hypothesize that the presence of ambiguous content during DPO training may introduce ambiguity, thereby limiting further improvements in alignment. Through mathematical analysis and proof-of-concept experiments, we reveal that ambiguous content may potentially introduce ambiguities, thereby degrading performance. 
To address this issue, we introduce Ambiguity Awareness Optimization (AAO), a simple yet effective approach that automatically re-weights...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/vt56-2c26","openalex_id":"https://openalex.org/W7106801511","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.7989000082015991},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6825000047683716},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.6445000171661377},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.61080002784729},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5475000143051147},{"id":"https://openalex.org/C181204326","display_name":"Preference learning","score":0.5182999968528748},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.49559998512268066},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.48080000281333923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106813934","title":"AesBiasBench: Evaluating Bias and Alignment in Multimodal Language Models for Personalized Image Aesthetic Assessment","url":"https://doi.org/10.48448/v8xt-qe70","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Li, Kun","Liu, Kangcheng","Po, Lai-Man","Xu, Xuyuan","Yang, Hongzheng","Zhao, Yuzhi"],"abstract":"Multimodal Large Language Models (MLLMs) are increasingly 
used in Personalized Image Aesthetic Assessment (PIAA), offering a scalable alternative to expert evaluation. However, their outputs may reflect subtle biases shaped by demographic cues such as gender, age, or education. In this work, we introduce AesBiasBench, a benchmark designed to evaluate MLLMs along two complementary axes: (1) the presence of stereotype bias, measured by how aesthetic evaluations vary across demographic groups; and (2) the alignment between model outputs and real human aesthetic preferences. Our benchmark spans three subtasks, Aesthetic Perception, Assessment, and Empathy, and introduces structured metrics (IFD, NRD, AAS) to quantify both bias and alignment. We evaluate 19 MLLMs, including proprietary models (e.g., GPT-4o, Claude-3.5-Sonnet) and open-source models (e.g., InternVL-2.5, Qwen2.5-VL). Results sh...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/v8xt-qe70","openalex_id":"https://openalex.org/W7106813934","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6664999723434448},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6654000282287598},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5584999918937683},{"id":"https://openalex.org/C2778355321","display_name":"Identity (music)","score":0.5180000066757202},{"id":"https://openalex.org/C168127410","display_name":"Stereotype (UML)","score":0.5037999749183655},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4156000018119812},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4056999981403351},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4032999873161316}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106817497","title":"AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning","url":"https://doi.org/10.48448/gc9a-a697","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chu, Xu","DONG, Xiaoyu","Feng, Yujie","Li, Jian","LU, ZEXIN","Wang, Yasha","Wu, Xiao-Ming","Xu, Pengfei","Zhang, Yujia","Zhao, Alan","Zhou, Xiaohui"],"abstract":"Continual learning (CL) is essential for deploying large language models (LLMs) in dynamic real-world environments without the need for costly retraining. Recent model merging-based methods have attracted significant attention, but they still struggle to effectively manage the trade-off between learning new knowledge and preventing forgetting, a challenge largely stemming from suboptimal number of merges and merging frequency. In this paper, we introduce Adaptive Iterative Model Merging (AimMerging), a novel CL framework that utilizes learning and forgetting signals from the training trajectory to dynamically monitor the model's training status. 
Guided by dynamic monitoring, the training trajectory-guided merge controller adaptively determines the timing and frequency of iterative fusion, while the rehearsal-based knowledge fusion module computes the merging weights and executes the fusi...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/gc9a-a697","openalex_id":"https://openalex.org/W7106817497","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Hong Kong Polytechnic University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.796500027179718},{"id":"https://openalex.org/C117619785","display_name":"Iterative learning control","score":0.7307999730110168},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5626000165939331},{"id":"https://openalex.org/C197129107","display_name":"Merge (version control)","score":0.557200014591217},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5490999817848206},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.47870001196861267},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.4747999906539917},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.4122999906539917}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415030062","title":"LogEval: A comprehensive benchmark suite for LLMs in log analysis","url":"https://doi.org/10.1007/s10664-025-10701-6","published":"2025-10-10","authors":["Tianyu Cui","Shiyu Ma","Ziang Chen","Tong Xiao","Chenyu Zhao","Shimin Tao","Yilun Liu","Shenglin Zhang","Duoming 
Lin","Changchang Liu","Yuzhe Cai","Weibin Meng"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10664-025-10701-6","openalex_id":"https://openalex.org/W4415030062","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Nankai University","Tianjin University","Tianjin haihe hospital","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8747000098228455},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.785099983215332},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.6906999945640564},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6184999942779541},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5019000172615051},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.4699000120162964},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4496000111103058},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.44519999623298645}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W7106817988","title":"VIVA+: Human-Centered Situational Decision-Making","url":"https://doi.org/10.48448/ac0n-ts80","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Hu, Zhe","LI, Jing","Liu, Guanzhong","Ren, Yixiao","Yin, Yu"],"abstract":"Multimodal Large Language Models (MLLMs) show promising results for embodied agents in operating meaningfully in complex, human-centered environments. 
Yet, evaluating their capacity for nuanced, human-like reasoning and decision-making remains challenging. We hence introduce HRDBench, a cognitively grounded benchmark for evaluating Human-centered Embodied Reasoning and Decision-making in MLLMs .HRDBench consists of 1,113 real-world situations paired with 6,126 multiple-choice questions, targeting three core abilities for decision-making: (1) Foundational Situation Comprehension, (2) Context-Driven Action Justification, and (3) Reflective Reasoning. Together, these dimensions provide a holistic framework for assessing a model’s ability to perceive, reason, and act in socially meaningful ways. We evaluate the state-of-the-art commercial and open-source models on \\benchmark, where we reveal...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/ac0n-ts80","openalex_id":"https://openalex.org/W7106817988","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Hong Kong Polytechnic University"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.8163999915122986},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.6132000088691711},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5180000066757202},{"id":"https://openalex.org/C9114305","display_name":"Situational ethics","score":0.5012000203132629},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4731000065803528},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.4318000078201294},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.41200000047683716},{"id":"https://openalex.org/C2164484","display_name":"Core (optical 
fiber)","score":0.3662000000476837}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106792215","title":"The Role of Outgoing Connection Heterogeneity in Feedforward Layers of Large Language Models","url":"https://doi.org/10.48448/k7m8-b754","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Kumar, Shankar","Stahlberg, Felix"],"abstract":"We report on investigations into the characteristics of outgoing connections in feedforward layers of large language models. Our findings show that inner neurons with diverse outgoing connection strengths contribute more significantly to model performance than those with uniform connections. We propose a new fine-tuning loss that takes advantage of this observation by decreasing the outgoing connection entropy in feedforward layers. Using this loss yields gains over standard fine-tuning across two different model families (PaLM-2 and Gemma-2) for downstream tasks in math, coding, and language understanding. To further elucidate the role of outgoing connection heterogeneity, we develop a data-free structured pruning method, which uses entropy to identify and remove neurons. 
This method significantly surpasses the effectiveness of random and even magnitude-based neuron removal.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/k7m8-b754","openalex_id":"https://openalex.org/W7106792215","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C38858127","display_name":"Feed forward","score":0.7226999998092651},{"id":"https://openalex.org/C13355873","display_name":"Connection (principal bundle)","score":0.7128999829292297},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6686000227928162},{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of time)","score":0.5604000091552734},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5041000247001648},{"id":"https://openalex.org/C88626702","display_name":"Continuation","score":0.40310001373291016},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4020000100135803},{"id":"https://openalex.org/C47702885","display_name":"Feedforward neural network","score":0.3935000002384186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106848020","title":"Sparsifying Mamba","url":"https://doi.org/10.48448/d55r-fs94","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Kang, Zhanhui","Li, Shuaipeng","Sun, Xingwu","Wang, An","Xie, Ruobing"],"abstract":"The Transformer architecture has long dominated the development of large language models, but its quadratic complexity in sequence length presents scalability challenges. 
Recent advances in State Space Models, particularly Mamba series, offer a promising alternative with linear-time inference and competitive performance. While scaling model capacity via sparsification, exemplified by Mixture-of-Experts, has proven effective in reducing computation while expanding knowledge capacity, the integration of sparsification with Mamba remains largely unexplored. Existing attempts typically apply naive block-level stacking, failing to leverage Mamba’s internal structure for fine-grained sparsification. In this work, we mainly explore how to sparsify the parameters inside Mamba. We found that the effects of using sparsification strategies on parameters related to various mechanisms inside mamba ar...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/d55r-fs94","openalex_id":"https://openalex.org/W7106848020","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tokyo Institute of Technology"],"concepts":[{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7189000248908997},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.628000020980835},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5414000153541565},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.534500002861023},{"id":"https://openalex.org/C129844170","display_name":"Quadratic 
equation","score":0.5026999711990356},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.4796999990940094},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.36730000376701355},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.3531000018119812}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106840131","title":"Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking","url":"https://doi.org/10.48448/svyq-mp35","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Sha, Lei","Wang, Shuaiqiang","Yan, Lingyong","Yin, Dawei","Zhu, Junda"],"abstract":"Large Reasoning Models (LRMs) have demonstrated impressive performances across diverse domains. However, how safety of Large Language Models (LLMs) benefits from enhanced reasoning capabilities against jailbreak queries remains unexplored. To bridge this gap, in this paper, we propose Reasoning-to-Defend (R2D), a novel training paradigm that integrates a safety-aware reasoning mechanism into LLMs' generation. This enables self-evaluation at each step of the reasoning process, forming safety pivot tokens as indicators of the safety status of responses. Furthermore, in order to improve the accuracy of predicting pivot tokens, we propose Contrastive Pivot Optimization (CPO), which enhances the model's perception of the safety status of given dialogues. 
LLMs dynamically adjust their response strategies during reasoning, significantly enhancing their safety capabilities defending jailbreak at...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/svyq-mp35","openalex_id":"https://openalex.org/W7106840131","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beihang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7087000012397766},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5217000246047974},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5138999819755554},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.4733999967575073},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46709999442100525},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.44429999589920044},{"id":"https://openalex.org/C182306322","display_name":"Order (exchange)","score":0.36340001225471497},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.36160001158714294}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106826014","title":"RareSyn: Health Record Synthesis for Rare Disease Diagnosis","url":"https://doi.org/10.48448/98qs-6w32","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Wang, Huimin","Wu, Xian","Zhao, Yutian","Zheng, Yefeng"],"abstract":"Diagnosis based on Electronic Health Records (EHRs) often struggles with data scarcity and privacy concerns. 
To address these issues, we introduce RareSyn, an innovative data synthesis approach designed to augment and de-identify EHRs, with a focus on rare diseases. The core insight of RareSyn involves using seed EHRs of rare diseases to recall similar records from both common and rare diseases, and then leveraging Large Language Models to substitute the key medical information (e.g., symptoms or examination details) in these records with information from the knowledge graph, thereby generating new EHRs. We first train a transformer Encoder with contrastive learning to integrate various types of medical knowledge. Then, RareSyn engages in iterative processes of recalling similar EHRs, structuring EHRs, revising EHRs, and generating new EHRs until the produced EHRs achieve extensive cover...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/98qs-6w32","openalex_id":"https://openalex.org/W7106826014","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","Westlake University"],"concepts":[{"id":"https://openalex.org/C195910791","display_name":"Medical record","score":0.6080999970436096},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5638999938964844},{"id":"https://openalex.org/C3020144179","display_name":"Electronic health record","score":0.4535999894142151},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.44760000705718994},{"id":"https://openalex.org/C3019952477","display_name":"Health 
records","score":0.4442000091075897},{"id":"https://openalex.org/C2779134260","display_name":"Disease","score":0.4099999964237213},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.4043000042438507},{"id":"https://openalex.org/C2779701055","display_name":"Rare disease","score":0.39980000257492065}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106804296","title":"Planning-Aware Code Infilling via Horizon-Length Prediction","url":"https://doi.org/10.48448/29qp-jb36","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Ding, Hantian","Ding, Yifeng","Kumar, Varun","Sun, Qing","Wang, Zijian","Wang, Shiqi"],"abstract":"Fill-in-the-Middle (FIM), or infilling, has become integral to code language models, enabling generation of missing code given both left and right contexts. However, the current FIM training paradigm which performs next-token prediction (NTP) over reordered sequence often leads to models struggling to generate content that aligns well with the surrounding context. We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens at each step. 
HLP advances FIM with lookahead planning, enabling models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-process...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/29qp-jb36","openalex_id":"https://openalex.org/W7106804296","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","Amazon (United States)","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6822999715805054},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6394000053405762},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6327000260353088},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.5521000027656555},{"id":"https://openalex.org/C2779960059","display_name":"Overhead (engineering)","score":0.508899986743927},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4832000136375427},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43549999594688416},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.41179999709129333}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106794236","title":"OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework","url":"https://doi.org/10.48448/p3yd-bv08","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","., Xianyu","Cao, Yu","Chen, Bin","Chen, Hao","Fang, Wenkai","Hu, Jian","Jiang, Songlin","Liu, YiMing","Liu, Jason","Shen, 
Wei","Wang, Haoran"],"abstract":"Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values and further raise the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility for newcomers. To bridge this gap, we introduce \\textbf{OpenRLHF}, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superio...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/p3yd-bv08","openalex_id":"https://openalex.org/W7106794236","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Tianjin University","Xiaomi (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8036999702453613},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7138000130653381},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.6628999710083008},{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.6575999855995178},{"id":"https://openalex.org/C56666940","display_name":"Documentation","score":0.61080002784729},{"id":"https://openalex.org/C2776760102","display_name":"Code (set 
theory)","score":0.5770999789237976},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.566100001335144},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.5103999972343445}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106824863","title":"Mind the Inclusivity Gap: Multilingual Gender-Neutral Translation Evaluation with mGeNTE","url":"https://doi.org/10.48448/tvfs-y244","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Attanasio, Giuseppe","Bentivogli, Luisa","Cupin, Eleonora","Gkovedarou, Eleni","Hackenbuchner, Janiça","Lauscher, Anne","Negri, Matteo","Piergentili, Andrea","Savoldi, Beatrice","Thind, Manjinder"],"abstract":"Avoiding the propagation of undue (binary) gender inferences and default masculine language remains a key challenge towards inclusive multilingual technologies, particularly when translating into languages with extensive gendered morphology. Gender-neutral translation (GNT) represents a linguistic strategy towards fairer communication across languages. However, research on GNT is limited to a few resources and language pairs. To address this gap, we introduce mGeNTE, an expert-curated resource, and use it to conduct the first systematic multilingual evaluation of inclusive translation with state-of-the-art instruction-following language models (LMs). Experiments on en-es/de/it/el reveal that while models can recognize when neutrality is appropriate, they cannot consistently produce neutral translations, limiting their usability. 
To probe this behavior, we enrich our evaluation with inter...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/tvfs-y244","openalex_id":"https://openalex.org/W7106824863","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Fondazione Bruno Kessler","Ghent University","Instituto de Telecomunicações","Meta (United States)","University College Ghent"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.6881999969482422},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.5900999903678894},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5626000165939331},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.4896000027656555},{"id":"https://openalex.org/C2780035574","display_name":"Multilingualism","score":0.4562999904155731},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4553000032901764},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.44530001282691956},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4196000099182129}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106792722","title":"MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs","url":"https://doi.org/10.48448/wbxq-m159","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Caciularu, Avi","Cohan, Arman","Liu, Gabrielle","Rudner, Tim","szpektor, idan","Yona, Gal"],"abstract":"A critical component in the trustworthiness of LLMs is reliable uncertainty communication, yet LLMs often use assertive language when 
conveying false claims, leading to over-reliance and eroded trust. We present the first systematic study of faithful confidence calibration of LLMs, benchmarking models' ability to use linguistic expressions of uncertainty that faithfully reflect their intrinsic uncertainty, across a comprehensive array of models, datasets, and prompting strategies. Our results demonstrate that LLMs largely fail at this task, and that existing interventions are insufficient: standard prompt approaches provide only marginal gains, and existing, factuality-based calibration techniques can even harm faithful calibration. To address this critical gap, we introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition. We show that MetaFaith robus...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/wbxq-m159","openalex_id":"https://openalex.org/W7106792722","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Yale University"],"concepts":[{"id":"https://openalex.org/C2777363581","display_name":"Harm","score":0.6132000088691711},{"id":"https://openalex.org/C165838908","display_name":"Calibration","score":0.5184999704360962},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5113999843597412},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.44690001010894775},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3982999920845032},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.39730000495910645},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.39430001378059387},{"id":"https://openalex.org/C95713431","display_name":"Vulnerability 
(computing)","score":0.3817000091075897}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106803746","title":"MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models","url":"https://doi.org/10.48448/gz4m-j510","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Li, Binyang","Liang, Bin","Wang, Huimin","WANG, Zezhong","Wong, Kam-fai","Wu, Xian","Zhang, Yifan","Zhang, Shubo","Zhang, Yuxi","Zhao, Yutian","Zhao, Zhengyi"],"abstract":"Memes have emerged as a popular form of multimodal online communication, where their interpretation heavily depends on the specific context in which they appear. Current approaches predominantly focus on isolated meme analysis, either for harmful content detection or standalone interpretation, overlooking a fundamental challenge: the same meme can express different intents depending on its conversational context. This oversight creates an evaluation gap: although humans intuitively recognize how context shapes meme interpretation, Large Vision Language Models (LVLMs) can hardly understand context-dependent meme intent. To address this critical limitation, we introduce MemeReaCon, a novel benchmark specifically designed to evaluate how LVLMs understand memes in their original context. 
We collected memes from five different Reddit communities, keeping each meme's image, the post text, and....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/gz4m-j510","openalex_id":"https://openalex.org/W7106803746","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","University of Hong Kong","University of International Relations"],"concepts":[{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.7102000117301941},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6740000247955322},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6144000291824341},{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.49950000643730164},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.47530001401901245},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3910999894142151},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3817000091075897},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.32249999046325684}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106831971","title":"MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs","url":"https://doi.org/10.48448/1cem-q434","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Cai, Yujun","Ge, Haonan","Wang, Yiwei","Yang, Ming-Hsuan"],"abstract":"Large Vision-Language Models (LVLMs) 
have shown strong performance across multimodal tasks. However, they often produce hallucinations—text that is inconsistent with visual input, due to the limited ability to verify information in different regions of the image. To address this, we propose Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves factual grounding by modeling inter-region consistency. MRFD identifies salient regions using cross-attention, generates initial responses for each, and computes reliability weights based on Jensen-Shannon Divergence (JSD) among the responses. These weights guide a consistency-aware fusion of per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments across multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves response factuality withou...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/1cem-q434","openalex_id":"https://openalex.org/W7106831971","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","University of California, Merced"],"concepts":[{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.8596000075340271},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6532999873161316},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.6209999918937683},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5981000065803528},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.5800999999046326},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5048999786376953},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.4359999895095825},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.421099990606308}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106801005","title":"Graders Should Cheat: Privileged Information Enables Expert-Level Automated Evaluations","url":"https://doi.org/10.48448/zxjy-ee17","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Arnold, Séb","Ding, Nan","Hua, Nan","Q Weinberger, Kilian","Sha, Fei","Zhou, Jin Peng"],"abstract":"Auto-evaluating language models (LMs), i.e., using a grader LM to evaluate the candidate LM, is an appealing way to accelerate the evaluation process and the cost associated with it. But this presents a paradox: how can we trust the grader LM, which is presumably weaker than the candidate LM, to assess problems that are beyond the frontier of the capabilities of either model or both? For instance, today's LMs struggle on graduate-level physics and Olympiad-level math, making them unreliable graders in these domains. We show that providing privileged information -- such as ground-truth solutions or problem-specific guidelines -- improves automated evaluations on such frontier problems. This approach offers two key advantages. First, it expands the range of problems where LMs graders apply. Specifically, weaker models can now rate the predictions of stronger models. 
Second, privileged info...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/zxjy-ee17","openalex_id":"https://openalex.org/W7106801005","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Cornell University","DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6977999806404114},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5688999891281128},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5486999750137329},{"id":"https://openalex.org/C2778571376","display_name":"Frontier","score":0.5181000232696533},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4830000102519989},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.4724000096321106},{"id":"https://openalex.org/C48103436","display_name":"State (computer science)","score":0.39480000734329224},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3873000144958496}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106813724","title":"GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning","url":"https://doi.org/10.48448/0yz7-ba79","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Grover, Rynaa","Pande, Nilay","Tamarapalli, Jayant Sravan","Yerramilli, Sahiti"],"abstract":"This paper introduces GeoChain, a large-scale benchmark for evaluating step-by-step geographic reasoning in multimodal large language models (MLLMs). 
Leveraging 1.46 million Mapillary street-level images, GeoChain pairs each image with a 21-step chain-of-thought (CoT) question sequence (over 30 million Q&A pairs). These sequences guide models from coarse attributes to fine-grained localization across four reasoning categories - visual, spatial, cultural, and precise geolocation - annotated by difficulty. Images are also enriched with semantic segmentation (150 classes) and a visual locatability score. Our benchmarking of contemporary MLLMs (GPT-4.1 variants, Claude 3.7, Gemini 2.5 variants) on a diverse 2,088-image subset reveals consistent challenges: models frequently exhibit weaknesses in visual grounding, display erratic reasoning, and struggle to achieve accurate localization, espec...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/0yz7-ba79","openalex_id":"https://openalex.org/W7106813724","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6640999913215637},{"id":"https://openalex.org/C22041718","display_name":"Geolocation","score":0.6531999707221985},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6189000010490417},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.5289999842643738},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.5288000106811523},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4839000105857849},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.4778999984264374},{"id":"https://openalex.org/C155911833","display_name":"Spatial 
intelligence","score":0.42800000309944153}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106786836","title":"Eliciting Implicit Acoustic Styles from Open-domain Instructions to Facilitate Fine-grained Controllable Generation of Speech","url":"https://doi.org/10.48448/fhrx-0h18","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Wenqing","Li, Chen","Wang, Zhisheng","Yang, Peiji","Yin, Jian","Yu, Jianxing","Zihao, Gou"],"abstract":"This paper focuses on generating speech with the acoustic style that meets users' needs based on their open-domain instructions. To control the style, early work mostly relies on pre-defined rules or templates. The control types and formats are fixed in a closed domain, making it hard to meet diverse needs of users. One solution is to resort to instructions in free text to guide the generation. Current work mainly studies the instructions that clearly specify the acoustic styles, such as low pitch and fast speed. However, the instructions are complex, some even vague and abstract, such as ``Generate a voice of a woman who is heartbroken due to a breakup. It is hard to infer this implicit style by traditional matching-based methods. To address this problem, we propose a new controllable model. 
It first utilizes multimodal LLMs with knowledge-augmented techniques to infer the desired speec...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/fhrx-0h18","openalex_id":"https://openalex.org/W7106786836","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Sun Yat-sen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7709000110626221},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.6780999898910522},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6402000188827515},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.5843999981880188},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.5687999725341797},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.5515999794006348},{"id":"https://openalex.org/C48209547","display_name":"Controllability","score":0.5120999813079834},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4560000002384186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106847187","title":"DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes","url":"https://doi.org/10.48448/0acr-ra67","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Fang, T","Ma, Kaixin","Song, Yangqiu","Tian, Ye","Wang, Zhaowei","Yang, Yue","Yu, Dong","Zhang, Hongming"],"abstract":"Large Vision-Language Models (LVLMs) have achieved significant 
progress in tasks like visual question answering and document understanding. However, their potential to comprehend embodied environments and navigate within them remains underexplored. In this work, we first study the challenge of open-vocabulary object navigation by introducing DivScene, a large-scale dataset with 4,614 houses across 81 scene types and 5,707 kinds of target objects. Our dataset provides a much greater diversity of target objects and scene types than existing datasets, enabling a comprehensive task evaluation. We evaluated various methods with LVLMs and LLMs on our dataset and found that current models still fall short of open-vocab object navigation ability. Then, we fine-tuned LVLMs to predict the next action with CoT explanations. We observe that LVLM's navigation ability can be improved substantially wit...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/0acr-ra67","openalex_id":"https://openalex.org/W7106847187","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7558000087738037},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6998000144958496},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6740999817848206},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6549999713897705},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.5967000126838684},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5167999863624573},{"id":"https://openalex.org/C100609095","display_name":"Embodied 
cognition","score":0.4925000071525574},{"id":"https://openalex.org/C2781316041","display_name":"Diversity (politics)","score":0.3889000117778778}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106790685","title":"DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models","url":"https://doi.org/10.48448/jpe5-qc34","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Liu, Lemao","Ting Chung, Tsz","Yeung, Dit-Yan","Yu, Mo"],"abstract":"Logic reasoning in natural language has been recognized as an important measure of human intelligence for Large Language Models (LLMs). Popular benchmarks may entangle multiple reasoning skills and thus provide unfaithful evaluations on the logic reasoning skill. Meanwhile, existing logic reasoning benchmarks are limited in language diversity and their distributions are deviated from the distribution of an ideal logic reasoning benchmark, which may lead to biased evaluation results. This paper thereby proposes a new classical logic benchmark DivLogicEval, consisting of natural sentences composed of diverse statements in a counterintuitive way. To ensure a more reliable evaluation, we also introduce a new evaluation metric that mitigates the influence of bias and randomness inherent in LLMs. 
Through experiments, we demonstrate the extent to which logical reasoning is required to answer th...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/jpe5-qc34","openalex_id":"https://openalex.org/W7106790685","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6571999788284302},{"id":"https://openalex.org/C195344581","display_name":"Automated reasoning","score":0.6158000230789185},{"id":"https://openalex.org/C134752490","display_name":"Logical consequence","score":0.5141000151634216},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.501800000667572},{"id":"https://openalex.org/C159032336","display_name":"Non-monotonic logic","score":0.47839999198913574},{"id":"https://openalex.org/C102993220","display_name":"Description logic","score":0.44699999690055847},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.41929998993873596},{"id":"https://openalex.org/C43971567","display_name":"Logical reasoning","score":0.41589999198913574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106829907","title":"DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning","url":"https://doi.org/10.48448/j506-px49","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Cai, Yujun","Chen, Hongkai","Liu, Chang","Wang, Yiwei","Wu, Hang","Yang, Ming-Hsuan","Ye, Qingwen"],"abstract":"Grounding natural language queries in graphical user interfaces (GUIs) poses 
unique challenges due to the diversity of visual elements, spatial clutter, and the ambiguity of language. In this paper, we introduce DiMo-GUI, a training-free framework for GUI grounding that leverages two core strategies: modality-wise decomposition and progressive visual zoom-in. Instead of treating the GUI as a monolithic image, our method splits the input into textual elements and iconic elements, allowing the model to reason over each modality independently using general-purpose vision-language models (VLMs). When predictions are ambiguous or incorrect, DiMo-GUI dynamically focuses attention by generating candidate focal regions centered on the model’s initial predictions and incrementally zooms into subregions to refine the grounding result. This hierarchical refinement process helps disambiguate visuall...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/j506-px49","openalex_id":"https://openalex.org/W7106829907","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","University of California, Merced"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7972999811172485},{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.5989999771118164},{"id":"https://openalex.org/C2777508537","display_name":"Visual reasoning","score":0.5536999702453613},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5295000076293945},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.52920001745224},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5109999775886536},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.49129998683929443},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4903999865055084}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106790431","title":"CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards","url":"https://doi.org/10.48448/5j4e-hf52","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chen, Xingyu","Li, Jian","Li, Xiaolong","Liu, Cheng","Ren, Feiliang","Tu, Zhaopeng","Ye, Fanghua","YifeiLu, YifeiLu"],"abstract":"Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). Existing approaches typically rely on prompt engineering or supervised fine-tuning to enable models to imitate character behaviors in specific scenarios, but often neglect the underlying cognitive mechanisms driving these behaviors. Inspired by cognitive psychology, we introduce CogDual, a novel RPLA adopting a cognize-then-respond reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment. To further optimize the performance, we employ reinforcement learning with two general-purpose reward schemes designed for open-domain text generation. 
Extensive experiments on the CoSER benchmark, as well as Cross-MR and LifeChoice, demonstrate that Co...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/5j4e-hf52","openalex_id":"https://openalex.org/W7106790431","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.661300003528595},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.6424999833106995},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6338000297546387},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.5934000015258789},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5026999711990356},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4932999908924103},{"id":"https://openalex.org/C9114305","display_name":"Situational ethics","score":0.4900999963283539},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.4397999942302704}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106827175","title":"CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages","url":"https://doi.org/10.48448/ar8t-kr64","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Chai, Yekun","Yang, Yilun"],"abstract":"Code-mixing, the practice of switching between languages within a conversation, poses unique challenges for traditional NLP. 
Existing benchmarks like LinCE and GLUECoS are limited by their narrow language pairs and tasks, failing to adequately assess large language models' (LLMs) code-mixing abilities. Despite the recognized importance of code-mixing for multilingual users, research on LLMs in this context remains sparse. Additionally, current techniques for synthesizing code-mixed data are underdeveloped to generate code-mixing. In response, we introduce CodeMixBench, a comprehensive benchmark covering eight tasks, including three specific to LLMs and five traditional NLP tasks, and 18 languages from seven language families. We also propose a new method for generating large-scale synthetic code-mixed texts by combining word substitution with GPT-4 prompting. Our evaluation reveals consi...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/ar8t-kr64","openalex_id":"https://openalex.org/W7106827175","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6291000247001648},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6014000177383423},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.46950000524520874},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45730000734329224},{"id":"https://openalex.org/C90805587","display_name":"Word (group theory)","score":0.45179998874664307},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43779999017715454},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.31949999928474426},{"id":"https://openalex.org/C51632099","display_name":"Training 
set","score":0.3091999888420105}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106798318","title":"Annotating Training Data for Conditional Semantic Textual Similarity Measurement using Large Language Models","url":"https://doi.org/10.48448/msrv-yq46","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Bollegala, Danushka","Zhang, Gaifan","Zhou, Yi"],"abstract":"Semantic similarity between two sentences depends on the aspects considered between those sentences. To study this phenomenon, Deshpande et al. (2023) proposed the Conditional Semantic Textual Similarity (C-STS) task and annotated a human-rated similarity dataset containing pairs of sentences compared under two different conditions. However, Tu et al. (2024) found various annotation issues in this dataset and showed that manually re-annotating a small portion of it leads to more accurate C-STS models. Despite these pioneering efforts, the lack of large and accurately annotated C-STS datasets remains a blocker for making progress on this task as evidenced by the subpar performance of the C-STS models. 
To address this training data need, we resort to Large Language Models (LLMs) to correct the condition statements and similarity ratings in the original dataset proposed by Deshpande et al.....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/msrv-yq46","openalex_id":"https://openalex.org/W7106798318","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Cardiff University","University of Liverpool"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8093000054359436},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7214999794960022},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.6991000175476074},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.6837000250816345},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.65420001745224},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.6331999897956848},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6301000118255615},{"id":"https://openalex.org/C130318100","display_name":"Semantic similarity","score":0.6237000226974487}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106814614","title":"AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment","url":"https://doi.org/10.48448/6ms1-w020","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Bu, Mengyu","Feng, Yang","He, Zhongjun","Wu, Hua","Zhang, Shaolei"],"abstract":"Multilingual large language 
models (LLMs) possess impressive multilingual understanding and generation capabilities. However, their performance and cross-lingual alignment often lag for non-dominant languages. A common solution is to fine-tune LLMs on large-scale and more balanced multilingual corpus, but such approaches often lead to imprecise alignment and suboptimal knowledge transfer, struggling with limited improvements across languages. In this paper, we propose AlignX to bridge the multilingual performance gap, which is a two-stage representation-level framework for enhancing multilingual performance of pre-trained LLMs. In the first stage, we align multilingual representations with multilingual semantic alignment and language feature integration. In the second stage, we stimulate the multilingual capability of LLMs via multilingual instruction fine-tuning. Experimental results on...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/6ms1-w020","openalex_id":"https://openalex.org/W7106814614","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7854999899864197},{"id":"https://openalex.org/C2780035574","display_name":"Multilingualism","score":0.6383000016212463},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5932999849319458},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5529999732971191},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5318999886512756},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.4291999936103821},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.42640000581741333},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.35370001196861267}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7106824952","title":"A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI Evaluations","url":"https://doi.org/10.48448/sa3f-nt71","published":"2025-10-10","authors":["Association for Computational Linguistics 2025","Dev, Sunipa","Mostafazadeh Davani, Aida","Pérez-Urbina, Héctor","Prabhakaran, Vinodkumar"],"abstract":"Societal stereotypes are at the center of a myriad of responsible AI interventions targeted at reducing the generation and propagation of potentially harmful outcomes. While these efforts are much needed, they tend to be fragmented and often address different parts of the issue without taking in a unified or holistic approach about social stereotypes and how they impact various parts of the machine learning pipeline. As a result, it fails to capitalize on the underlying mechanisms that are common across different types of stereotypes, and to anchor on particular aspects that are relevant in certain cases. In this paper, we draw on social psychological research, and build on NLP data and methods, to propose a unified framework to operationalize stereotypes in generative AI evaluations. 
Our framework identifies key components of stereotypes that are crucial in AI evaluation, including the....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/sa3f-nt71","openalex_id":"https://openalex.org/W7106824952","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C9354725","display_name":"Operationalization","score":0.8964999914169312},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5759999752044678},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.503600001335144},{"id":"https://openalex.org/C27415008","display_name":"Psychological intervention","score":0.4810999929904938},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.43459999561309814},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.42329999804496765},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4049000144004822},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.3393999934196472}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"bytedance-seed:316","title":"Memory Retrieval and Consolidation in Large Language Models through Function Tokens","url":"https://seed.bytedance.com/en/research/memory-retrieval-and-consolidation-in-large-language-models-through-function-tokens","published":"2025-10-09","authors":["Shaohua Zhang","Yuan Lin","Hang Li"],"abstract":"The remarkable success of large language models (LLMs) stems from their ability to consolidate vast amounts of knowledge into the memory during pre-training and to retrieve 
it from the memory during inference, enabling advanced capabilities such as knowledge memorization, instruction-following and reasoning. However, the mechanisms of memory retrieval and consolidation in LLMs remain poorly understood. In this paper, we propose the function token hypothesis to explain the workings of LLMs: During inference, function tokens activate the most predictive features from context and govern next token prediction (memory retrieval). During pre-training, predicting the next tokens (usually content tokens) that follow function tokens increases the number of learned features of LLMs and updates the model parameters (memory consolidation). Function tokens here roughly correspond to function words in...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computation and Language","Responsible AI","arXiv","memory","retrieval"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dyna-mind-learning-to-simulate-from-experience-for-better-ai-agents","title":"Dyna-Mind: Learning to Simulate from Experience for Better AI Agents","url":"https://www.microsoft.com/en-us/research/publication/dyna-mind-learning-to-simulate-from-experience-for-better-ai-agents/","published":"2025-10-09","authors":["Xiao Yu","Baolin Peng","Michel Galley","Hao Cheng","Qianhui Wu","Janardhan Kulkarni","Suman Nath","Zhou Yu","Jianfeng Gao"],"abstract":"Reasoning models have recently shown remarkable progress in domains such as math and coding. 
However, their expert-level abilities in math and coding contrast sharply with their performance in long-horizon, interactive tasks such as web navigation and computer/phone-use. Inspired by literature on human cognition, we argue that current AI agents need''vicarious trial and error''- the capacity to mentally simulate alternative futures before acting - in order to enhance their understanding and performance in complex interactive environments. We introduce Dyna-Mind, a two-stage training framework that explicitly teaches (V)LM agents to integrate such simulation into their reasoning. In stage 1, we introduce Reasoning with Simulations (ReSim), which trains the agent to generate structured reasoning traces from expanded search trees built from real experience gathered through environment inter...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2510.08191","title":"Training-Free Group Relative Policy 
Optimization","url":"https://huggingface.co/papers/2510.08191","published":"2025-10-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2510.07790","title":"GCPO: When Contrast Fails, Go Gold","url":"https://huggingface.co/papers/2510.07790","published":"2025-10-09","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"apple:abjq36grc8vc4zhkfcgpxuj9","title":"Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks","url":"https://machinelearning.apple.com/research/analyzing-dialectical","published":"2025-10-09","authors":["Eileen Pan","Anna Seo Gyeong Choi","Maartje ter Hoeve","Skyler Seto","Allison Koenecke"],"abstract":"Large language models (LLMs) are ubiquitous in modern day natural language processing. However, previous work has shown degraded LLM performance for under-represented English dialects. 
We analyze the effects of typifying \"standard\" American English language questions as non-\"standard\" dialectal variants on multiple choice question answering tasks and find up to a 20% reduction in accuracy. Additionally, we investigate the grammatical basis of...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415008962","title":"PReMM: LLM-Based Program Repair for Multi-method Bugs via Divide and Conquer","url":"https://doi.org/10.1145/3763097","published":"2025-10-09","authors":["Linna Xie","Zhong Li","Yu Pei","Zhongzhen Wen","Kui Liu","Tian Zhang","Xuandong Li"],"abstract":"Large-language models (LLMs) have been leveraged to enhance the capability of automated program repair techniques in recent research. While existing LLM-based program repair techniques compared favorably to other techniques based on heuristics, constraint-solving, and learning in producing high-quality patches, they mainly target bugs that can be corrected by changing a single faulty method, which greatly limits the effectiveness of such techniques in repairing bugs that demand patches spanning across multiple methods. In this work, we propose the PReMM technique to effectively propose patches changing multiple methods. 
PReMM builds on three core component techniques: the faulty method clustering technique to partition the faulty methods into clusters based on the dependence relationship among them, enabling a divide-and-conquer strategy for the repairing task; the fault context extracti...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3763097","openalex_id":"https://openalex.org/W4415008962","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","agent"],"author_affiliations":["Hong Kong Polytechnic University","Huawei Technologies (China)","Nanjing University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7003999948501587},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.6061999797821045},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5708000063896179},{"id":"https://openalex.org/C71559656","display_name":"Divide and conquer algorithms","score":0.5503000020980835},{"id":"https://openalex.org/C1009929","display_name":"Software bug","score":0.5378999710083008},{"id":"https://openalex.org/C42812","display_name":"Partition (number theory)","score":0.4339999854564667},{"id":"https://openalex.org/C175551986","display_name":"Fault (geology)","score":0.42480000853538513},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.39480000734329224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415007445","title":"The Continuous Tensor Abstraction: Where Indices Are Real","url":"https://doi.org/10.1145/3763146","published":"2025-10-09","authors":["Jaeyeon Won","Willow Ahrens","Teodoro Fields Collin","Joel Emer","Saman 
Amarasinghe"],"abstract":"This paper introduces the continuous tensor abstraction, allowing indices to take real-number values (e.g., A[3.14]). It also presents continuous tensor algebra expressions, such as C x , y = A x , y ∗ B x , y , where indices are defined over a continuous domain. This work expands the traditional tensor model to include continuous tensors. Our implementation supports piecewise-constant tensors, on which infinite domains can be processed in finite time. We also introduce a new tensor format for efficient storage and a code generation technique for automatic kernel generation. For the first time, our abstraction expresses domains like computational geometry and computer graphics in the language of tensor programming. Our approach demonstrates competitive or better performance to hand-optimized kernels in leading libraries across diverse applications. Compared to hand-implemented libraries....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3763146","openalex_id":"https://openalex.org/W4415007445","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Georgia Institute of Technology","Massachusetts Institute of Technology","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C155281189","display_name":"Tensor (intrinsic definition)","score":0.6959999799728394},{"id":"https://openalex.org/C137800194","display_name":"Interpolation (computer graphics)","score":0.6898000240325928},{"id":"https://openalex.org/C124304363","display_name":"Abstraction","score":0.5738000273704529},{"id":"https://openalex.org/C1680195","display_name":"Tensor algebra","score":0.5432999730110168},{"id":"https://openalex.org/C74193536","display_name":"Kernel (algebra)","score":0.5246000289916992},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.49300000071525574},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.4611999988555908},{"id":"https://openalex.org/C21442007","display_name":"Graphics","score":0.45910000801086426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2510.23521","title":"Explicit Memory Through Online 3D Gaussian Splatting Improves Class-Agnostic Video Segmentation","url":"http://arxiv.org/abs/2510.23521","published":"2025-10-09","authors":["Anthony W. Opipari","Aravindhan K Krishnan","Shreekant Gayaka","Min Sun","Cheng-Hao Kuo","Arnie Sen","Odest Chadwicke Jenkins"],"abstract":"Remembering where object segments were predicted in the past is useful for improving the accuracy and consistency of class-agnostic video segmentation algorithms. Existing video segmentation algorithms typically use either no object-level memory (e.g. FastSAM) or they use implicit memories in the form of recurrent neural network features (e.g. SAM2). In this paper, we augment both types of segmentation models using an explicit 3D memory and show that the resulting models have more accurate and consistent predictions. For this, we develop an online 3D Gaussian Splatting (3DGS) technique to store predicted object-level segments generated throughout the duration of a video. Based on this 3DGS representation, a set of fusion techniques are developed, named FastSAM-Splat and SAM2-Splat, that use the explicit 3DGS memory to improve their respective foundation models' predictions. 
Ablation expe...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2025.3619783","openalex_id":"https://openalex.org/W4415003108","cited_by_count":0,"quality_score":41,"matched_keywords":["memory"],"author_affiliations":["Amazon (United States)","University of Michigan"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8289999961853027},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6518999934196472},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6243000030517578},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5038999915122986},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.49390000104904175},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.4593000113964081},{"id":"https://openalex.org/C8642999","display_name":"Hyperparameter","score":0.44290000200271606},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.42899999022483826}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415002728","title":"Ranking Formal Specifications using LLMs","url":"https://doi.org/10.1145/3759425.3763386","published":"2025-10-09","authors":["Mike He","Zhendong Ang","Ankush Desai","Aarti Gupta"],"abstract":"Formal specifications are essential for reasoning about the correctness of complex systems. 
While recent advances have explored automatically learning such specifications, the challenge of distinguishing meaningful, non-trivial specifications from a vast and noisy pool of learned candidates remains largely open. In this position paper, we present an approach for specification ranking, aimed at identifying the most critical specifications that contribute to overall system correctness. To this end, we develop a four-metric rating framework that quantifies the importance of a specification. Our approach leverages the reasoning capabilities of Large Language Models to rank specifications from a set of automatically learned candidates. We evaluate the proposed method on a set of specifications inferred for 11 open-source and 3 proprietary distributed system benchmarks, demonstrating its effec...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3759425.3763386","openalex_id":"https://openalex.org/W4415002728","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","National University of Singapore","Princeton University"],"concepts":[{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.7980999946594238},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7631000280380249},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.6654999852180481},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.6399000287055969},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.6190000176429749},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5059999823570251},{"id":"https://openalex.org/C116253237","display_name":"Formal 
specification","score":0.4733000099658966},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.4214000105857849}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415002696","title":"Improving SAST Detection Capability with LLMs and Enhanced DFA","url":"https://doi.org/10.1145/3759425.3763388","published":"2025-10-09","authors":["Yuan Luo","Zhaojun Chen","Yuxin Dong","Haiquan Zhang","Yi Sun","Fei Xie","Zhiqiang Dong"],"abstract":"Static Application Security Testing (SAST) is a cornerstone of modern vulnerability discovery, enabling tools like GitHub’s CodeQL to identify security flaws in code repositories. However, our large-scale analysis of open-source repositories reveals that SAST’s detection performance is limited by three main factors: (1) incomplete source and sink coverage in built-in propagation rules, (2) failure to recognize sanitization functions, and (3) disruptions in data flow due to insufficient support for certain language features. In this work, we demonstrate how Large Language Models (LLMs) can improve the identification of taint sources and sinks, as well as the recognition of sanitization functions. Using CodeQL as an example, we also introduce the implementation principles of SAST's Data Flow Analysis (DFA). 
Furthermore, we propose enhancing Java thread support to improve the accuracy of DF...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3759425.3763388","openalex_id":"https://openalex.org/W4415002696","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7972000241279602},{"id":"https://openalex.org/C138101251","display_name":"Thread (computing)","score":0.5839999914169312},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.4546999931335449},{"id":"https://openalex.org/C548217200","display_name":"Java","score":0.4507000148296356},{"id":"https://openalex.org/C63116202","display_name":"Taint checking","score":0.4415000081062317},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4068000018596649},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.4016999900341034},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3797999918460846}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415003123","title":"ImitDiff: Transferring Foundation-Model Priors for Distraction-Robust Visuomotor Policy","url":"https://doi.org/10.1109/lra.2025.3619835","published":"2025-10-09","authors":["Yuhang Dong","Haizhou Ge","Yupei Zeng","Jiangning Zhang","Beiwen Tian","Hongrui Zhu","Yufei Jia","Ruixiang Wang","Zhucun Xue","Guyue Zhou","Longhua Ma","Guanzhong Tian"],"abstract":"Visuomotor imitation learning policies enable robots to efficiently acquire manipulation skills from visual demonstrations. 
However, as scene complexity and visual distractions increase, policies that perform well in simple settings often experience substantial performance degradation. To address this challenge, we propose ImitDiff, a diffusion-based imitation learning policy guided by fine-grained semantics within a dual-resolution workflow. Leveraging pretrained priors of vision-language foundation models, our method transforms high-level instructions into pixel-level visual semantic masks. These masks guide a dual-resolution perception pipeline that captures both global context (e.g., overall layout) from low-resolution observation and fine-grained local features (e.g., geometric details) from high-resolution observation, enabling the policy to focus on task-relevant regions. Addition...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2025.3619835","openalex_id":"https://openalex.org/W4415003123","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Ningbo University","Tencent (China)","Tsinghua University","University of Nottingham Ningbo China","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7443000078201294},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7010999917984009},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5200999975204468},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5127000212669373},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.5059999823570251},{"id":"https://openalex.org/C2780791683","display_name":"Action 
(physics)","score":0.4814999997615814},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4611000120639801},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.4490000009536743}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415002714","title":"CG-Bench: Can Language Models Assist Call Graph Construction in the Real World?","url":"https://doi.org/10.1145/3759425.3763379","published":"2025-10-09","authors":["Ting Yuan","Wenrui Zhang","Dong Chen","Jie Wang"],"abstract":"Language models for coding are shifting their focus from function-level to repository-level, with complex function invocations. We introduce CG-Bench, the first manually constructed benchmark that measures the ability to understand call graphs for language models. This benchmark contains 104 call sites and related code snippets associated with call chains from 7 representative open-source C/C++ projects. Language models are tasked with inference the calling targets from them. We evaluated four popular language models on CG-Bench. Surprisingly, all four models with different prompt settings achieve accuracy greater than 50% and Deepseek-6.7b with few-shot prompts reaches 69.70%. 
We further show four findings from a micro study, which demonstrates that using language models for call graph construction is promising and the performance can be improved by prompt hacking, removing irrelevant i...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3759425.3763379","openalex_id":"https://openalex.org/W4415002714","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8034999966621399},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6736000180244446},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6119999885559082},{"id":"https://openalex.org/C102379954","display_name":"Call graph","score":0.5963000059127808},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5853999853134155},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49309998750686646},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.484499990940094},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.484499990940094}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/condabench-interactive-evaluation-of-language-models-for-advanced-data-analysis","title":"ConDABench: Interactive Evaluation of Language Models for Advanced Data 
Analysis","url":"https://www.microsoft.com/en-us/research/publication/condabench-interactive-evaluation-of-language-models-for-advanced-data-analysis/","published":"2025-10-08","authors":["Avik Dutta","Priyanshu Gupta","Hosein Hasanbeig","Rahul Pratap Singh","Harshit Nigam","Sumit Gulwani","Arjun Radhakrishna","Gustavo Soares","Ashish Tiwari"],"abstract":"Real-world data analysis tasks often come with under-specified goals and unclean data. User interaction is necessary to understand and disambiguate a user’s intent, and hence, essential to solving these complex tasks. Existing benchmarks for evaluating LLMs on data analysis tasks do not capture these complexities or provide first-class support for interactivity. We introduce ConDABench, a framework for generating conversational data analysis (ConDA) benchmarks and evaluating external tools on the generated benchmarks. ConDABench consists of (a) a multi-agent workflow for generating realistic benchmarks from articles describing insights gained from public datasets, (b) 1,420 ConDA problems generated using this workflow, and (c) an evaluation harness that, for the first time, makes it possible to systematically evaluate conversational data analysis tools on the generated ConDA problems. 
Ev...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","AI agents","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/all-claims-are-equal-but-some-claims-are-more-equal-than-others-importance-sensitive-factuality-evaluation-of-llm-generations","title":"All claims are equal, but some claims are more equal than others: Importance-sensitive factuality evaluation of LLM generations","url":"https://www.microsoft.com/en-us/research/publication/all-claims-are-equal-but-some-claims-are-more-equal-than-others-importance-sensitive-factuality-evaluation-of-llm-generations/","published":"2025-10-08","authors":["Miriam Wanner","Leif Azzopardi","Paul Thomas","Soham Dan","Ben Van Durme","Nick Craswell"],"abstract":"Existing methods for evaluating the factuality of large language model (LLM) responses treat all claims as equally important. This results in misleading evaluations when vital information is missing or incorrect as it receives the same weight as peripheral details, raising the question: how can we reliably detect such differences when there are errors in key information? Current approaches that measure factuality tend to be insensitive to omitted or false key information. To investigate this lack of sensitivity, we construct VITALERRORS, a benchmark of 6,733 queries with minimally altered LLM responses designed to omit or falsify key information. 
Using this dataset, we demonstrate the insensitivities of existing evaluation metrics to key information errors. To address this gap, we introduce VITAL, a set of metrics that provide greater sensitivity in measuring the factuality of responses....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4416402292","title":"Revisiting put-that-there, context aware window interactions via LLMs","url":"https://doi.org/10.1109/ismar-adjunct68609.2025.00104","published":"2025-10-08","authors":["Riccardo Bovo","Daniele Giunchi","Pasquale Cascarano","Eric J. Gonzalez","Mar González-Franco"],"abstract":"We revisit Bolt’s classic Put-That-There concept for modern head-mounted displays by pairing Large Language Models (LLMs) with XR sensor and tech stack. The agent fuses (i) a semantically segmented 3-D environment, (ii) live application metadata, and (iii) users’ verbal, pointing, and head-gaze cues to issue JSON window-placement actions. As a result, users can manage a panoramic workspace through: (1) explicit commands (\"Place Google Maps on the coffee table\"), (2) deictic speech plus gestures (\"Put that there\"), or (3) high-level goals (\"I need to send a message\"). 
Unlike traditional explicit interfaces, our system supports one-to-many action mappings and goal-centric reasoning, allowing the LLM to dynamically infer relevant applications and layout decisions, including interrelationships across tools. This enables seamless, intent-driven interaction without manual window juggling in im...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ismar-adjunct68609.2025.00104","openalex_id":"https://openalex.org/W4416402292","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","agent"],"author_affiliations":["Google (United States)","Imperial College London","University of Birmingham","University of Bologna"],"concepts":[{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.7555999755859375},{"id":"https://openalex.org/C2780416260","display_name":"JSON","score":0.7459999918937683},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7437000274658203},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6604999899864197},{"id":"https://openalex.org/C58581272","display_name":"Workspace","score":0.6521999835968018},{"id":"https://openalex.org/C13077596","display_name":"Deixis","score":0.6395000219345093},{"id":"https://openalex.org/C2778751112","display_name":"Window (computing)","score":0.5924999713897705},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5881999731063843}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2510.06931","title":"Textual interpretation of transient image classifications from large language 
models","url":"http://arxiv.org/abs/2510.06931","published":"2025-10-08","authors":["Fiorenzo Stoppa","Turan Bulmus","S. Bloemen","S. J. Smartt","P. Groot","P. M. Vreeswijk","K. Smith"],"abstract":"Modern astronomical surveys deliver immense volumes of transient detections, yet distinguishing real astrophysical signals (for example, explosive events) from bogus imaging artefacts remains a challenge. Convolutional neural networks are effectively used for real versus bogus classification; however, their reliance on opaque latent representations hinders interpretability. Here we show that large language models (LLMs) can approach the performance level of a convolutional neural network on three optical transient survey datasets (Pan-STARRS, MeerLICHT and ATLAS) while simultaneously producing direct, human-readable descriptions for every candidate. Using only 15 examples and concise instructions, Google's LLM, Gemini, achieves a 93% average accuracy across datasets that span a range of resolution and pixel scales. 
We also show that a second LLM can assess the coherence of the output of....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41550-025-02670-z","openalex_id":"https://openalex.org/W4414930832","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","Queen's University Belfast","Radboud University Nijmegen","South African Radio Astronomy Observatory","University of Cape Town","University of Oxford"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7982000112533569},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.7749999761581421},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.6133999824523926},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5866000056266785},{"id":"https://openalex.org/C2780799671","display_name":"Transient (computer programming)","score":0.576200008392334},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.5368000268936157},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48590001463890076},{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.4708999991416931}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2510.07318","title":"Artificial Hippocampus Networks for Efficient Long-Context Modeling","url":"https://huggingface.co/papers/2510.07318","published":"2025-10-08","authors":["Yunhao Fang","Weihao Yu","Shu Zhong","Qinghao Ye","Xuehan Xiong","Lai 
Wei"],"abstract":"Long-sequence modeling faces a fundamental trade-off between the efficiency of compressive fixed-size memory in RNN-like models and the fidelity of lossless growing memory in attention-based Transformers. Inspired by the Multi-Store Model in cognitive science, we introduce a memory framework of artificial neural networks. Our method maintains a sliding window of the Transformer's KV cache as lossless short-term memory, while a learnable module termed Artificial Hippocampus Network (AHN) recurrently compresses out-of-window information into a fixed-size compact long-term memory. To validate this framework, we instantiate AHNs using modern RNN-like architectures, including Mamba2, DeltaNet, and Gated DeltaNet. Extensive experiments on long-context benchmarks LV-Eval and InfiniteBench demonstrate that AHN-augmented models consistently outperform sliding window baselines and achieve performa...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["memory","long-term","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4414957745","title":"Beyond Latent Patterns: Reinterpreting AI Model Capabilities","url":"https://doi.org/10.36227/techrxiv.175994134.40680922/v1","published":"2025-10-08","authors":["Siddhant Sukhatankar"],"abstract":"This paper critically examines prevailing claims regarding Artificial General Intelligence, particularly concerning Large Language Models (LLMs). It highlights how correlations observed within latent embeddings, while indicative of learned representations, are often misinterpreted as evidence of humanlike understanding. 
By exploring human cognitive biases in pattern recognition, this work advocates for a refined modeling feedback mechanism in AI evaluation, emphasizing rigorous interpretation of model outputs to avoid unsubstantiated assertions of intelligence.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.36227/techrxiv.175994134.40680922/v1","openalex_id":"https://openalex.org/W4414957745","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5185999870300293},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4986000061035156},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.3075999915599823},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.2973000109195709},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.28839999437332153},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.2702000141143799},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.26820001006126404},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2680000066757202}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.07315","title":"Vibe Checker: Aligning Code Evaluation with Human Preference","url":"https://huggingface.co/papers/2510.07315","published":"2025-10-08","authors":["Ming Zhong","Xiang Zhou","Ting-Yun Chang","Qingze Wang","Nan Xu","Xiance Si","Dan Garrette","Shyam Upadhyay","Jeremiah Liu","Jiawei Han","Benoit 
Schillings","Jiao Sun"],"abstract":"Large Language Models (LLMs) have catalyzed vibe coding, where users leverage LLMs to generate and iteratively refine code through natural language interactions until it passes their vibe check. Vibe check is tied to real-world human preference and goes beyond functionality: the solution should feel right, read cleanly, preserve intent, and remain correct. However, current code evaluation remains anchored to pass@k and captures only functional correctness, overlooking the non-functional instructions that users routinely apply. In this paper, we hypothesize that instruction following is the missing piece underlying vibe check that represents human preference in coding besides functional correctness. To quantify models' code instruction following capabilities with measurable signals, we present VeriCode, a taxonomy of 30 verifiable code instructions together with corresponding deterministi...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["preference"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-markovian-thinker","title":"The Markovian Thinker","url":"https://www.microsoft.com/en-us/research/publication/the-markovian-thinker/","published":"2025-10-07","authors":["Milad Aghajohari","Kamran Chitsaz","Amirhossein Kazemnejad","Sarath Chandar","Alessandro Sordoni","Aaron C. Courville","Siva Reddy"],"abstract":"Reinforcement learning (RL) has recently become a strong recipe for training reasoning LLMs that produce long chains of thought (LongCoT). 
Yet the standard RL\"thinking environment\", where the state is the prompt plus all prior reasoning tokens, makes the state unbounded and forces attention-based policies to pay quadratic compute as thoughts lengthen. We revisit the environment itself. We propose Markovian Thinking, a paradigm in which the policy advances reasoning while conditioning on a constant-size state, decoupling thinking length from context size. As an immediate consequence this yields linear compute with constant memory. We instantiate this idea with Delethink, an RL environment that structures reasoning into fixed-size chunks. Within each chunk, the model thinks as usual; at the boundary, the environment resets the context and reinitializes the prompt with a short carryover. Th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flipping-the-dialogue-training-and-evaluating-user-language-models","title":"Flipping the Dialogue: Training and Evaluating User Language Models","url":"https://www.microsoft.com/en-us/research/publication/flipping-the-dialogue-training-and-evaluating-user-language-models/","published":"2025-10-07","authors":["Tarek Naous","Philippe Laban","Wei Xu","Jennifer Neville"],"abstract":"Conversations with LMs involve two participants: a human user leading the conversation, and an LM assistant 
responding to the user's request. To satisfy this specific role, LMs are post-trained to be helpful assistants -- optimized to produce exhaustive and well-structured responses, free of ambiguity and grammar errors. User utterances, on the other hand, are rarely perfected, with each user phrasing requests in unique ways, sometimes putting in partial effort at each turn and refining on the fly. To evaluate LM performance in realistic settings, prior work simulated users in multi-turn conversations, often prompting an LLM originally trained to be a helpful assistant to act as a user. However, we show that assistant LMs make for poor user simulators, with the surprising finding that better assistants yield worse simulators. Instead, we introduce purpose-built User Language Models (User...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/next-semantic-scale-prediction-via-hierarchical-diffusion-language-models","title":"Next Semantic Scale Prediction via Hierarchical Diffusion Language Models","url":"https://www.microsoft.com/en-us/research/publication/next-semantic-scale-prediction-via-hierarchical-diffusion-language-models/","published":"2025-10-07","authors":["Cai Zhou","Chenyu Wang","Dinghuai Zhang","Shangyuan Tong","Yifei Wang","Stephen Bates","T. 
Jaakkola"],"abstract":"In this paper we introduce Hierarchical Diffusion Language Models (HDLM) -- a novel family of discrete diffusion models for language modeling. HDLM builds on a hierarchical vocabulary where low-level tokens with detailed semantics are surjectively mapped to high-level tokens with coarse-grained meanings. In the forward process, each token is independently perturbed to its higher-level ancestor with more abstract semantics according to the scheduler, while in the reverse process the model progressively predicts the next, more detailed semantics. Taken together, HDLM provides a general time-varying next semantic scale prediction process for language modeling. We derive closed-form expressions for the diffusion Evidence Lower Bound (ELBO), and show that HDLM can be implemented in a flexible manner while including the existing MDLM as a special case. We also propose practical training techni...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Diffusion models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:2ae187b11915ebe1","title":"Gemini 2.5 Computer Use Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Computer-Use-Model-Card.pdf","published":"2025-10-07","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 2.5 Computer Use"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"apple:ogb306ap2z88olbhuumbftwo","title":"Stable Diffusion Models are Secretly Good at Visual In-Context Learning","url":"https://machinelearning.apple.com/research/stable-diffusion","published":"2025-10-07","authors":["Trevine Oorloff","Vishwanath Sindagi","Wele Gedara Chaminda Bandara","Ali Shafahi","Amin Ghiasi","Charan Prakash","Reza Ardekani"],"abstract":"Large language models (LLM) in natural language processing (NLP) have demonstrated great potential for in-context learning (ICL) -- the ability to leverage a few sets of example prompts to adapt to various tasks without having to explicitly update the model weights. ICL has recently been explored for computer vision tasks with promising early outcomes. 
These approaches involve specialized training and/or additional data that complicate the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4414908731","title":"EgoTrigger: Toward Audio-Driven Image Capture for Human Memory Enhancement in All-Day Energy-Efficient Smart Glasses","url":"https://doi.org/10.1109/tvcg.2025.3616866","published":"2025-10-07","authors":["Akshay Paruchuri","Sinan Hersek","Lavisha Aggarwal","Qiao Yang","Xin Liu","Achin Kulshrestha","Andrea Colaço","Henry Fuchs","Ishan Chatterjee"],"abstract":"All-day smart glasses are likely to emerge as platforms capable of continuous contextual sensing, uniquely positioning them for unprecedented assistance in our daily lives. Integrating the multi-modal AI agents required for human memory enhancement while performing continuous sensing, however, presents a major energy efficiency challenge for all-day usage. Achieving this balance requires intelligent, context-aware sensor management. Our approach, EgoTrigger, leverages audio cues from the microphone to selectively activate power-intensive cameras, enabling efficient sensing while preserving substantial utility for human memory enhancement. EgoTrigger uses a lightweight audio model (YAMNet) and a custom classification head to trigger image capture from hand-object interaction (HOI) audio cues, such as the sound of a drawer opening or a medication bottle being opened. 
In addition to evaluat...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2025.3616866","openalex_id":"https://openalex.org/W4414908731","cited_by_count":1,"quality_score":46,"matched_keywords":["memory","efficient"],"author_affiliations":["Google (United States)","University of North Carolina Health Care","University of North Carolina at Chapel Hill"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8503000140190125},{"id":"https://openalex.org/C2778263558","display_name":"Microphone","score":0.6287999749183655},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.4896000027656555},{"id":"https://openalex.org/C186370098","display_name":"Energy (signal processing)","score":0.4763999879360199},{"id":"https://openalex.org/C2985957978","display_name":"Human memory","score":0.46050000190734863},{"id":"https://openalex.org/C121687571","display_name":"Activity recognition","score":0.4404999911785126},{"id":"https://openalex.org/C29794715","display_name":"Smartwatch","score":0.43639999628067017},{"id":"https://openalex.org/C81669768","display_name":"Precision and recall","score":0.41780000925064087}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2510.06217","title":"TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning","url":"https://huggingface.co/papers/2510.06217","published":"2025-10-07","authors":["Jiaru Zou","Soumya Roy","Vinay Kumar Verma","Ziyi Wang","David Wipf","Pan Lu","Sumit Negi","James Zou","Jingrui He"],"abstract":"Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), 
particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising text-only reasoning steps, struggle with table-specific operations such as sub-table retrieval and schema interaction, leading to critical performance bottlenecks. To address this limitation, we propose TaTToo, a novel table-grounded PRM framework that (i) reasons explicitly over tabular reasoning steps and (ii) integrates tool-based verification to provide precise reward supervision. Concretely, we first design a scalable data curation pipeline that constructs over 60k high-quality s...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["retrieval"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/high-fidelity-synthetic-ecg-generation-via-mel-spectrogram-informed-diffusion-training","title":"High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training","url":"https://www.microsoft.com/en-us/research/publication/high-fidelity-synthetic-ecg-generation-via-mel-spectrogram-informed-diffusion-training/","published":"2025-10-06","authors":["Zhuoyi Huang","Nutan Sahoo","Anamika Kumari","Girish Kumar","Kexuan Cai","Shixing Cao","Yue Kang","Tian Xia","Somya Chatterjee","Nick Hausman","Aidan Jay","Eric S. Rosenthal"],"abstract":"The development of machine learning for cardiac care is severely hampered by privacy restrictions on sharing real patient electrocardiogram (ECG) data. 
Although generative AI offers a promising solution, the real-world use of existing model-synthesized ECGs is limited by persistent gaps in trustworthiness and clinical utility. In this work, we address two major shortcomings of current generative ECG methods: insufficient morphological fidelity and the inability to generate personalized, patient-specific physiological signals. To address these gaps, we build on a conditional diffusion-based Structured State Space Model (SSSD-ECG) with two principled innovations: (1) MIDT-ECG (Mel-Spectrogram Informed Diffusion Training), a novel training paradigm with time-frequency domain supervision to enforce physiological structural realism, and (2) multi-modal demographic conditioning to enable patie...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science","Machine learning","personalized","personalization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/eepo-exploration-enhanced-policy-optimization-via-sample-then-forget","title":"EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget","url":"https://www.microsoft.com/en-us/research/publication/eepo-exploration-enhanced-policy-optimization-via-sample-then-forget/","published":"2025-10-06","authors":["Liang Chen","Xueting Han","Qizhou Wang","Bo Han","Jing Bai","Hinrich Schutze","Kam-Fai Wong"],"abstract":"Balancing exploration and exploitation remains a 
central challenge in reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs). Current RLVR methods often overemphasize exploitation, leading to entropy collapse, diminished exploratory capacity, and ultimately limited performance gains. Although techniques that increase policy stochasticity can promote exploration, they frequently fail to escape dominant behavioral modes. This creates a self-reinforcing loop-repeatedly sampling and rewarding dominant modes-that further erodes exploration. We introduce Exploration-Enhanced Policy Optimization (EEPO), a framework that promotes exploration via two-stage rollouts with adaptive unlearning. In the first stage, the model generates half of the trajectories; it then undergoes a lightweight unlearning step to temporarily suppress these sampled responses, forcing the se...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414870983","title":"Multimodal AI agents for capturing and sharing laboratory practice","url":"https://doi.org/10.1101/2025.10.05.680425","published":"2025-10-06","authors":["Patricia Skowronek","Anant Nawalgaria","Matthias Mann"],"abstract":"Abstract We present a multimodal AI laboratory agent that captures and shares tacit experimental practice by linking written instructions with hands-on laboratory work through the analysis of video, speech, and text. 
While current AI tools have proven effective in literature analysis and code generation, they do not address the critical gap between documented knowledge and implicit lab practice. Our framework bridges this divide by integrating protocol generation directly from researcher-recorded videos, systematic detection of experimental errors, and evaluation of instrument readiness by comparing current performance against historical decisions. Evaluated in mass spectrometry-based proteomics, we demonstrate that the agent can capture and share practical expertise beyond conventional documentation and identify common mistakes, although domain-specific and spatial recognition should st...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.10.05.680425","openalex_id":"https://openalex.org/W4414870983","cited_by_count":1,"quality_score":42,"matched_keywords":["agent"],"author_affiliations":["Google (United States)","Max Planck Institute of Biochemistry"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6891000270843506},{"id":"https://openalex.org/C56666940","display_name":"Documentation","score":0.6399000287055969},{"id":"https://openalex.org/C2780385302","display_name":"Protocol (science)","score":0.5109999775886536},{"id":"https://openalex.org/C2779561248","display_name":"Tacit knowledge","score":0.4970000088214874},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.44200000166893005},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3970000147819519},{"id":"https://openalex.org/C184356942","display_name":"Best 
practice","score":0.36640000343322754},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.33309999108314514}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2510.05396","title":"Scalable In-context Ranking with Generative Models","url":"https://huggingface.co/papers/2510.05396","published":"2025-10-06","authors":["Nilesh Gupta","Chong You","Srinadh Bhojanapalli","Sanjiv Kumar","Inderjit Dhillon","Felix Yu"],"abstract":"In-context Ranking (ICR) is an emerging paradigm for Information Retrieval (IR), which leverages contextual understanding of LLMs by directly incorporating the task description, candidate documents, and the query into the model's input prompt and tasking the LLM to identify relevant document(s). While it is effective, efficiency is a significant challenge in this paradigm, especially as the candidate list grows due to quadratic/super-linear scaling of attention operation with context length. 
To this end, this paper first identifies inherent and exploitable structures in the attention of LLMs finetuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: the attention scores from certain query tokens to a document block in middle layers strongly correlate with...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["LLM","retrieval","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agentic-context-engineering-evolving-contexts-for-self-improving-language-models","title":"Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models","url":"https://www.microsoft.com/en-us/research/publication/agentic-context-engineering-evolving-contexts-for-self-improving-language-models/","published":"2025-10-05","authors":["Qizheng Zhang","Changran Hu","Shubhangi Upasani","Boyuan Ma","Fenglu Hong","V. Kamanuru","Jay Rainton","Chen Wu","Mengmeng Ji","Hanchen Li","Urmish Thakker","James Zou"],"abstract":"Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation -- modifying inputs with instructions, strategies, or evidence, rather than weight updates. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights for concise summaries, and from context collapse, where iterative rewriting erodes details over time. 
Building on the adaptive memory introduced by Dynamic Cheatsheet, we introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation. ACE prevents collapse with structured, incremental updates that preserve detailed knowledge and scale with long-context models. Across agent and domain-specific benchmarks, ACE optimizes contexts b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","memory","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/trade-in-minutes-rationality-driven-agentic-system-for-quantitative-financial-trading","title":"Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading","url":"https://www.microsoft.com/en-us/research/publication/trade-in-minutes-rationality-driven-agentic-system-for-quantitative-financial-trading/","published":"2025-10-05","authors":["Zifan Song","Kaitao Song","Guosheng Hu","Ding Qi","Junyao Gao","Xiaohua Wang","Dongsheng Li","Cairong Zhao"],"abstract":"Recent advancements in large language models (LLMs) and agentic systems have shown exceptional decision-making capabilities, revealing significant potential for autonomic finance. 
Current financial trading agents predominantly simulate anthropomorphic roles that inadvertently introduce emotional biases and rely on peripheral information, while being constrained by the necessity for continuous inference during deployment. In this paper, we pioneer the harmonization of strategic depth in agents with the mechanical rationality essential for quantitative trading. Consequently, we present TiMi (Trade in Minutes), a rationality-driven multi-agent system that architecturally decouples strategy development from minute-level deployment. TiMi leverages specialized LLM capabilities of semantic analysis, code programming, and mathematical reasoning within a comprehensive policy-optimization-deployme...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/swireasoning-switch-thinking-in-latent-and-explicit-for-pareto-superior-reasoning-llms","title":"SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs","url":"https://www.microsoft.com/en-us/research/publication/swireasoning-switch-thinking-in-latent-and-explicit-for-pareto-superior-reasoning-llms/","published":"2025-10-05","authors":["Dachuan Shi","Abedelkadir Asi","Keying Li","Xiangchi Yuan","Leyan Pan","Wenke Lee","Wen Xiao"],"abstract":"Recent work shows that, beyond discrete reasoning 
through explicit chain-of-thought steps, which are limited by the boundaries of natural languages, large language models (LLMs) can also reason continuously in latent space, allowing richer information per step and thereby improving token efficiency. Despite this promise, latent reasoning still faces two challenges, especially in training-free settings: 1) purely latent reasoning broadens the search distribution by maintaining multiple implicit paths, which diffuses probability mass, introduces noise, and impedes convergence to a single high-confidence solution, thereby hurting accuracy; and 2) overthinking persists even without explicit text, wasting tokens and degrading efficiency. To address these issues, we introduce SwiReasoning, a training-free framework for LLM reasoning which features two key innovations: 1) SwiReasoning dynamical...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/do-code-models-suffer-from-the-dunning-kruger-effect","title":"Do Code Models Suffer from the Dunning-Kruger Effect?","url":"https://www.microsoft.com/en-us/research/publication/do-code-models-suffer-from-the-dunning-kruger-effect/","published":"2025-10-05","authors":["Mukul Singh","Somya Chatterjee","Arjun Radhakrishna","Sumit Gulwani"],"abstract":"As artificial intelligence systems increasingly collaborate with humans in 
creative and technical domains, questions arise about the cognitive boundaries and biases that shape our shared agency. This paper investigates the Dunning-Kruger Effect (DKE), the tendency for those with limited competence to overestimate their abilities in state-of-the-art LLMs in coding tasks. By analyzing model confidence and performance across a diverse set of programming languages, we reveal that AI models mirror human patterns of overconfidence, especially in unfamiliar or low-resource domains. Our experiments demonstrate that less competent models and those operating in rare programming languages exhibit stronger DKE-like bias, suggesting that the strength of the bias is proportionate to the competence of the models.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Social sciences","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7125942996","title":"On the Taxonomy, Tasks, and Open-Challenges for Multimodal Large Language Models","url":"https://doi.org/10.1109/smc58881.2025.11342688","published":"2025-10-05","authors":["Lecheng Yan","Ruizhe Li","Jiahui Geng","Qing Li","Minghao Wu","Zhanyu Wang","Wenxi Li","Tianbo Ji","Shaochen Jiang","Chenyang Lyu"],"abstract":"In recent years, the field of Artificial Intelligence has witnessed the emergence of Multimodal Large Language Models (MLLMs) that have significantly advanced the state-of-the-art in understanding and generating content across various data modalities. 
These models, capable of processing and integrating information from text, images, audio, and video, have opened new avenues for research and applications. Distinguished by their ability to understand and generation information with diverse modalities, such as text, image, audio and many others, MLLMs mark a significant step towards the final aim of Artificial General Intelligence (AGI). This comprehensive survey provides an in-depth examination of MLLMs, highlighting their evolutionary trajectory, current state-of-the-art developments, and prospective future directions. Specifically, we show taxonomy of MLLMs by their modalities to be proc...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/smc58881.2025.11342688","openalex_id":"https://openalex.org/W7125942996","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Mohamed bin Zayed University of Artificial Intelligence","Monash University","Nantong University","The University of Sydney","Tsinghua University","University of Aberdeen","Xinjiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6661999821662903},{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.6304000020027161},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5336999893188477},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5293999910354614},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.4982999861240387},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.4936000108718872},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy 
(biology)","score":0.4708000123500824},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.4641999900341034}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.04290","title":"ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation","url":"https://huggingface.co/papers/2510.04290","published":"2025-10-05","authors":["Jay Zhangjie Wu","Xuanchi Ren","Tianchang Shen","Tianshi Cao","Kai He","Yifan Lu","Ruiyuan Gao","Enze Xie","Shiyi Lan","Jose M. Alvarez","Jun Gao","Sanja Fidler"],"abstract":"Recent advances in large generative models have significantly advanced image editing and in-context image generation, yet a critical gap remains in ensuring physical consistency, where edited objects must remain coherent. This capability is especially vital for world simulation related tasks. In this paper, we present ChronoEdit, a framework that reframes image editing as a video generation problem. First, ChronoEdit treats the input and edited images as the first and last frames of a video, allowing it to leverage large pretrained video generative models that capture not only object appearance but also the implicit physics of motion and interaction through learned temporal consistency. Second, ChronoEdit introduces a temporal reasoning stage that explicitly performs editing at inference time. 
Under this setting, the target frame is jointly denoised with reasoning tokens to imagine a pla...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:tencent:2510.03222","title":"Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward","url":"https://huggingface.co/papers/2510.03222","published":"2025-10-03","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4414798304","title":"Flow Autoencoders are Effective Protein Tokenizers","url":"https://doi.org/10.1101/2025.10.01.679645","published":"2025-10-03","authors":["Rohit Dilip","Evan Zhang","Ayush Varshney","David Van Valen"],"abstract":"Protein structure tokenizers enable the creation of multimodal models of protein structure, sequence, and function. Current approaches to protein structure tokenization rely on bespoke components that are invariant to spatial symmetries, but that are challenging to optimize and scale. We present Kanzi, a flow-based tokenizer for tokenization and generation of protein structures.
Kanzi consists of a diffusion autoencoder trained with a flow matching loss. We show that this approach simplifies several aspects of protein structure tokenizers: frame-based representations can be replaced with global coordinates, complex losses are replaced with a single flow matching loss, and SE(3)-invariant attention operations can be replaced with standard attention. We find that these changes stabilize the training of parameter-efficient models that outperform existing tokenizers on reconstru...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.10.01.679645","openalex_id":"https://openalex.org/W4414798304","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["California Institute of Technology","OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7506999969482422},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.6740999817848206},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5455999970436096},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.4903999865055084},{"id":"https://openalex.org/C38349280","display_name":"Flow (mathematics)","score":0.4717999994754791},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4059999883174896},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4049000144004822},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.3720000088214874}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.03506","title":"OneFlow:
Concurrent Mixed-Modal and Interleaved Generation with Edit Flows","url":"https://huggingface.co/papers/2510.03506","published":"2025-10-03","authors":["John Nguyen","Marton Havasi","Tariq Berrada","Luke Zettlemoyer","Ricky T. Q. Chen"],"abstract":"We present OneFlow, the first non-autoregressive multimodal model that enables variable-length and concurrent mixed-modal generation. Unlike autoregressive models that enforce rigid causal ordering between text and image generation, OneFlow combines an insertion-based Edit Flow for discrete text tokens with Flow Matching for image latents. OneFlow enables concurrent text-image synthesis with hierarchical sampling that prioritizes content over grammar. Through controlled experiments across model sizes from 1B to 8B, we demonstrate that OneFlow outperforms autoregressive baselines on both generation and understanding tasks while using up to 50% fewer training FLOPs. OneFlow surpasses both autoregressive and diffusion-based approaches while unlocking new capabilities for concurrent generation, iterative refinement, and natural reasoning-like generation.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/taming-imperfect-process-verifiers-a-sampling-perspective-on-backtracking","title":"Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking","url":"https://www.microsoft.com/en-us/research/publication/taming-imperfect-process-verifiers-a-sampling-perspective-on-backtracking/","published":"2025-10-02","authors":["Dhruv Rohatgi","Abhishek Shetty","Donya 
Saless","Yuchen Li","Ankur Moitra","Andrej Risteski","Dylan Foster"],"abstract":"Test-time algorithms that combine the generative power of language models with process verifiers that assess the quality of partial generations offer a promising lever for eliciting new reasoning capabilities, but the algorithmic design space and computational scaling properties of such approaches are still opaque, and their benefits are far from apparent when one accounts for the cost of learning a high-quality verifier. Our starting point is the observation that seemingly benign errors in a learned verifier can lead to catastrophic failures for standard decoding techniques due to error amplification during the course of generation. We then ask: can this be improved with more sophisticated decoding strategies? We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors. VGB i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:ovq7iqt5e4mam65sek5y2gq3","title":"TASER: Translation Assessment via Systematic Evaluation and Reasoning","url":"https://machinelearning.apple.com/research/taser","published":"2025-10-02","authors":["Monishwaran Maheswaran","Marco Carini","Christian Federmann","Tony Diaz"],"abstract":"We introduce TASER (Translation Assessment via Systematic Evaluation and 
Reasoning), a metric that uses Large Reasoning Models (LRMs) for automated translation quality assessment. TASER harnesses the explicit reasoning capabilities of LRMs to conduct systematic, step-by-step evaluation of translation quality. We evaluate TASER on the WMT24 Metrics Shared Task across both reference-based and reference-free scenarios, demonstrating state-of-the-art...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7123348570","title":"Secret Breach Detection in Source Code with Large Language Models","url":"https://doi.org/10.1109/esem64174.2025.00021","published":"2025-10-02","authors":["Md Nafiu Rahman","Sadif Ahmed","Zahin Wahab","S M Sohan","Rifat Shahriyar"],"abstract":"Background: Leaking sensitive information-such as API keys, tokens, and credentials-in source code remains a persistent security threat. Traditional regex and entropy-based tools often generate high false positives due to limited contextual understanding. Aims: This work aims to enhance secret detection in source code using large language models (LLMs), reducing false positives while maintaining high recall. We also evaluate the feasibility of using fine-tuned, smaller models for local deployment. Method: We propose a hybrid approach combining regex-based candidate extraction with LLM-based classification. We evaluate pre-trained and fine-tuned variants of various Large Language Models on a benchmark dataset from 818 GitHub repositories. 
Various prompting strategies and efficient fine-tuning methods are employed for both binary and multiclass classification. Results: The fine-tuned LLaMA...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/esem64174.2025.00021","openalex_id":"https://openalex.org/W7123348570","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Bangladesh University of Engineering and Technology","Google (United States)","University of British Columbia"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8483999967575073},{"id":"https://openalex.org/C64869954","display_name":"False positive paradox","score":0.6685000061988831},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.6588000059127808},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6263999938964844},{"id":"https://openalex.org/C48372109","display_name":"Binary number","score":0.5638999938964844},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5533000230789185},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5245000123977661},{"id":"https://openalex.org/C63435697","display_name":"Binary code","score":0.4562999904155731}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414760390","title":"Zero-Shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials","url":"https://doi.org/10.1021/acsnano.5c09057","published":"2025-10-02","authors":["H. J. Yang","Ruoyan Avery Yin","Chi Jiang","Yuepeng Hu","Xiaokai Zhu","Xingjian Hu","S. Ananda Kumar","Samantha K. 
Holmes","Xinghuan Wang","Xiaohua Zhai","Keran Rong","Yunyue Zhu"],"abstract":", SnSe─regardless of whether they were fabricated via top-down or bottom-up methods. This work represents the implementation of foundation models to achieve autonomous analysis, providing a scalable and data-efficient characterization paradigm that transforms the approach to nanoscale materials research.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1021/acsnano.5c09057","openalex_id":"https://openalex.org/W4414760390","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["Australian National University","Duke University","Google (United States)","Google DeepMind (United Kingdom)","Massachusetts Institute of Technology","National University of Singapore","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C2780841128","display_name":"Characterization (materials science)","score":0.7717999815940857},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.739300012588501},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6635000109672546},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6032000184059143},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4537999927997589},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.4163999855518341},{"id":"https://openalex.org/C171250308","display_name":"Nanotechnology","score":0.3224000036716461},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.3163999915122986}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":3}},{"id":"openalex:W4414763249","title":"The Brain Imaging and Neurophysiology Database: BINDing multimodal neural data into a large-scale repository","url":"https://doi.org/10.1101/2025.10.01.25337054","published":"2025-10-02","authors":["Charlotte Maschke","Peter Hadar","Yi‐Cheng Zhang","Li Jian","Gauri Ganjoo","Andrew Hoopes","Alessandro Guazzo","Aditya Gupta","Manohar Ghanta","Bruce D. Nearing","Christine Tsien Silvers","Bharath Gunapati"],"abstract":"1 The Brain Imaging and Neurophysiology Database (BIND) represents one of the largest multi-institutional, multimodal, clinical neuroimaging repositories, comprising 1.8 million brain scans from 38,945 patients, linked to neurophysiological recordings. This comprehensive dataset addresses critical limitations in neuroimaging research by providing unprecedented scale and diversity across pathologies and health. BIND integrates de-identified data from Massachusetts General Hospital, Brigham and Women's Hospital, and Stanford University, including 1,723,699 MRI scans (1.5 Tesla, 3 Tesla, and 7 Tesla), 54,137 CT scans, 5,093 PET scans, and 526 SPECT scans, converted to standardized NIfTI format following BIDS organization. The database spans the full age spectrum (newborn to 106 years) and encompasses diverse neurological conditions alongside healthy patients. We deployed Bio-Medical Large L...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.10.01.25337054","openalex_id":"https://openalex.org/W4414763249","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Artificial Intelligence in Medicine (Canada)","Athinoula A. 
Martinos Center for Biomedical Imaging","Beth Israel Deaconess Medical Center","Harvard University","Massachusetts General Hospital","Palo Alto University","Stanford University","Yale University"],"concepts":[{"id":"https://openalex.org/C58693492","display_name":"Neuroimaging","score":0.8568999767303467},{"id":"https://openalex.org/C152478114","display_name":"Neurophysiology","score":0.669700026512146},{"id":"https://openalex.org/C522805319","display_name":"Electroencephalography","score":0.5652999877929688},{"id":"https://openalex.org/C205427263","display_name":"Neuroinformatics","score":0.5565000176429749},{"id":"https://openalex.org/C169760540","display_name":"Neuroscience","score":0.527899980545044},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.4343999922275543},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43130001425743103},{"id":"https://openalex.org/C129692064","display_name":"Clinical neurophysiology","score":0.4138000011444092}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414755817","title":"Deploying and Scaling Defect Detection Models","url":"https://doi.org/10.4018/979-8-3373-4460-7.ch003","published":"2025-10-02","authors":["Banani Mohapatra","Bhavnish Walia","Subhani Rath","Sovan Rath","Meetu Malhotra"],"abstract":"The speed of modern software development has amplified the demand for scalable defect detection. This chapter presents a survey of machine learning (ML)-based defect detection models with an emphasis on their deployment and scalability. Three modeling paradigms are covered: supervised learning, unsupervised learning, and LLMs like CodeBERT. Using benchmarking datasets such as NASA's Metrics Data (MDP) and CodeXGLUE, we investigate the model performance and scalability trade-offs. 
Supervised models, like Random Forest classifiers trained on labeled data, have high accuracy. Unsupervised models such as Isolation Forests provide value where there is no ground truth, and LLMs provide semantic comprehension and reasoning for defect detection tasks. This chapter also explains production pipelines necessary to scale up such models. We demonstrate a cloud-native deployment pipeline with CI/CD in...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.4018/979-8-3373-4460-7.ch003","openalex_id":"https://openalex.org/W4414755817","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Anna Needs Neuroblastoma Answers","Ford Motor Company (United States)","Harrisburg University of Science and Technology","Walmart (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6927000284194946},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6840999722480774},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.6753000020980835},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6690000295639038},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6452000141143799},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5117999911308289},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.48840001225471497},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.46209999918937683}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"arxiv:2510.02283","title":"Self-Forcing++: Towards Minute-Scale High-Quality Video Generation","url":"https://huggingface.co/papers/2510.02283","published":"2025-10-02","authors":["Justin Cui","Jie Wu","Ming Li","Tao Yang","Xiaojie Li","Rui Wang","Andrew Bai","Yuanhao Ban","Cho-Jui Hsieh"],"abstract":"Diffusion models have revolutionized image and video generation, achieving unprecedented visual quality. However, their reliance on transformer architectures incurs prohibitively high computational costs, particularly when extending generation to long videos. Recent work has explored autoregressive formulations for long video generation, typically by distilling from short-horizon bidirectional teachers. Nevertheless, given that teacher models cannot synthesize long videos, the extrapolation of student models beyond their training horizon often leads to pronounced quality degradation, arising from the compounding of errors within the continuous latent space. In this paper, we propose a simple yet effective approach to mitigate quality degradation in long-horizon video generation without requiring supervision from long-video teachers or retraining on long video datasets. 
Our approach cente...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/permissive-information-flow-analysis-for-large-language-models","title":"Permissive Information-Flow Analysis for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/permissive-information-flow-analysis-for-large-language-models/","published":"2025-10-01","authors":["Shoaib Ahmed Siddiqui","Radhika Gaonkar","Boris Köpf","David Krueger","Andrew Paverd","Ahmed Salem","Shruti Tople","Lukas Wutschitz","Menglin Xia","Santiago Zanella-Béguelin"],"abstract":"Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. One promising approach is to tackle this problem at the system level via dynamic information flow (aka taint) tracking. Unfortunately, this approach of propagating the most restrictive input label to the output is too conservative for applications where LLMs operate on inputs retrieved from diverse sources. In this paper, we propose a novel, more permissive approach to propagate information flow labels through LLM queries. 
The key idea behind our approach is to propagate only the labels of the samples that were influential in generating the model output....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Article (Journal)","Artificial intelligence","Security, privacy, and cryptography","Computer science","large language models","1970-01-01","LLM","language model","retrieval","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/siraj-diverse-and-efficient-red-teaming-for-llm-agents-via-distilled-structured-reasoning","title":"SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning","url":"https://www.microsoft.com/en-us/research/publication/siraj-diverse-and-efficient-red-teaming-for-llm-agents-via-distilled-structured-reasoning/","published":"2025-10-01","authors":["Kaiwen Zhou","Ahmed Elgohary","A S M Iftekhar","Amin Saied"],"abstract":"The ability of LLM agents to plan and invoke tools exposes them to new safety risks, making a comprehensive red-teaming system crucial for discovering vulnerabilities and ensuring their safe deployment. We present SIRAJ: a generic red-teaming framework for arbitrary black-box LLM agents. We employ a dynamic two-step process that starts with an agent definition and generates diverse seed test cases that cover various risk outcomes, tool-use trajectories, and risk sources. 
Then, it iteratively constructs and refines model-based adversarial attacks based on the execution trajectories of former attempts. To optimize the red-teaming cost, we present a model distillation approach that leverages structured forms of a teacher model's reasoning to train smaller models that are equally effective. Across diverse evaluation agent settings, our seed test case generation approach yields 2 -- 2.5x boos...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Generative AI","LLM","efficient","distillation","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sharp-tools-how-developers-wield-agentic-ai-in-real-software-engineering-tasks","title":"Why AI Agents Still Need You: Findings from Developer-Agent Collaborations in the Wild","url":"https://www.microsoft.com/en-us/research/publication/sharp-tools-how-developers-wield-agentic-ai-in-real-software-engineering-tasks/","published":"2025-10-01","authors":["Aayush Kumar","Yasharth Bajpai","Sumit Gulwani","Gustavo Soares","Emerson Murphy-Hill"],"abstract":"Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to allow interactivity with developers, enabling collaborative problem-solving. 
To understand how developers collaborate with SWE agents and the communication challenges that arise in such interactions, we observed 19 developers using an in-IDE agent to resolve 33 open issues in repositories to which they had previously contributed. Participants successfully resolved about half of these issues, with participants solving issues incrementally having greater success than those using a one-shot approach. Participants who actively collaborated with the agent and iterated on its outputs were also more successful, though they faced challenges in trusting the ag...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Programming languages and software engineering","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pzo-pseudo-zeroth-order-algorithm-for-training-deep-neural-networks","title":"PZO: Pseudo-Zeroth-Order Algorithm for Training Deep Neural Networks","url":"https://www.microsoft.com/en-us/research/publication/pzo-pseudo-zeroth-order-algorithm-for-training-deep-neural-networks/","published":"2025-10-01","authors":["Pengyun Yue","Xuanlin Yang","Mingqing Xiao","Zhouchen Lin"],"abstract":"Zeroth-order Optimization (ZO) has received wide attention in machine learning, especially when computing full gradient is expensive or even impossible. 
Recently, ZO has emerged as an important paradigm for memory-efficient fine-tuning of large language models (LLMs), circumventing the memory overhead of backpropagation. However, existing ZO gradient estimators exhibit dimension-dependent variance scaling as [equation], leading to dimension-dependent convergence rates which is prohibitive for large-scale LLM parameters. To address this problem, we present a Pseudo-Zeroth-Order (PZO) framework for optimizing composite objective functions, especially large-scale models: [equation], where h represents complex, high-dimensional representations and [equation] is a task-specific loss. While existing zeroth-order methods estimate gradients with final loss functions, our PZO algorithm estimate t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Deep neural networks","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/efficient-and-near-optimal-algorithm-for-general-contextual-dueling-bandits-with-offline-regression-oracles","title":"Efficient and Near-Optimal Algorithm for General Contextual Dueling Bandits with Offline Regression Oracles","url":"https://www.microsoft.com/en-us/research/publication/efficient-and-near-optimal-algorithm-for-general-contextual-dueling-bandits-with-offline-regression-oracles/","published":"2025-10-01","authors":["Aadirupa Saha","Robert E. 
Schapire"],"abstract":"We study the contextual dueling bandit problem, where a learner uses contextual information to make two decisions and receives only relative (comparison) feedback. This problem is crucial in reinforcement learning with human feedback (RLHF), widely applied in AI alignment to integrate human preferences into AI models. Unlike prior works, we consider general preference relations and propose the first efficient, near-optimal regret algorithm using an offline regression oracle. Existing approaches rely on online oracles, which are often impractical for complex function classes, leading to poor performance in challenging settings. Our key contribution is analyzing the contextual Best Response regret and developing an [latex]\\tilde{O}(K\\sqrt{T})[/latex] regret algorithm, explicitly incorporating offline oracle performance. We further extend our results to continuous decision spaces, achieving...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","1970-01-01","LLM","preference","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bluecodeagent-a-blue-teaming-agent-enabled-by-automated-red-teaming-for-codegen-ai","title":"BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen 
AI","url":"https://www.microsoft.com/en-us/research/publication/bluecodeagent-a-blue-teaming-agent-enabled-by-automated-red-teaming-for-codegen-ai/","published":"2025-10-01","authors":["Chengquan Guo","Yuzhou Nie","Chulin Xie","Zinan Lin","Wenbo Guo","Bo Li"],"abstract":"As large language models (LLMs) are increasingly used for code generation, concerns over the security risks have grown substantially. Early research has primarily focused on red teaming, which aims to uncover and evaluate vulnerabilities and risks of CodeGen models. However, progress on the blue teaming side remains limited, as developing defense requires effective semantic understanding to differentiate the unsafe from the safe. To fill in this gap, we propose BlueCodeAgent, an end-to-end blue teaming agent enabled by automated red teaming. Our framework integrates both sides: red teaming generates diverse risky instances, while the blue teaming agent leverages these to detect previously seen and unseen risk scenarios through constitution and code analysis with agentic integration for multi-level defense. 
Our evaluation across three representative code-related tasks--bias instruction de...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Unpublished","Artificial intelligence","Security, privacy, and cryptography","AI agents","Computer security","large language models","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-where-it-matters-where-why-and-how-developers-want-ai-support-in-daily-work","title":"AI Where It Matters: Where, Why, and How Developers Want AI Support in Daily Work","url":"https://www.microsoft.com/en-us/research/publication/ai-where-it-matters-where-why-and-how-developers-want-ai-support-in-daily-work/","published":"2025-10-01","authors":["Rudrajit Choudhuri","Carmen Badea","Christian Bird","Jenna Butler","Robert DeLIne","Brian Houck"],"abstract":"Generative AI is reshaping software work, yet we lack clear guidance on where developers most need and want support, and how to design it responsibly. We report a large-scale, mixed-methods study of N=860 developers that examines where, why, and how they seek or limit AI help, providing the first task-aware, empirically validated mapping from developers' perceptions of their tasks to AI adoption patterns and responsible AI priorities. 
Using cognitive appraisal theory, we show that task evaluations predict openness to and use of AI, revealing distinct patterns: strong current use and a desire for improvement in core work (e.g., coding, testing); high demand to reduce toil (e.g., documentation, operations); and clear limits for identity- and relationship-centric work (e.g., mentoring). Priorities for responsible AI support vary by context: reliability and security for systems-facing tasks;...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Programming languages and software engineering","Generative AI","HCI","software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/unlocking-slm-potential-for-data-analysis-code-generation-via-non-parametric-knowledge-distillation","title":"Unlocking SLM Potential for Data Analysis Code Generation via Non-Parametric Knowledge Distillation","url":"https://www.microsoft.com/en-us/research/publication/unlocking-slm-potential-for-data-analysis-code-generation-via-non-parametric-knowledge-distillation/","published":"2025-10-01","authors":["Jinyang Li","Jack Williams","Nick McKenna","Arian Askari","Nicholas Wilson","Reynold Cheng"],"abstract":"Knowledge distillation from Large Language Models (LLMs) to locally hosted Small Language Models (SLMs) provides advantages for Data Analysis Code Generation (DACG) such as privacy protection. 
However, achieving effective distillation without resource-intensive training is challenging. This paper investigates whether LLMs can distill knowledge to SLMs through In-Context Learning (ICL), a training-free method for rapid task adaptation. We present the DarGO: Distillation and Adaptive Reasoning-Guided Orchestration framework, which facilitates automatic knowledge distillation from LLMs to SLMs. DarGO consists of three phases: exploration through an Model Orchestration Interface (MOI), Memory Collection of successful trajectories, and Knowledge-driven Inference. We evaluate DarGO on three challenging DACG benchmarks (WikiTQ, TabMWP, and Bird-SQL), each with in-domain training sets that enabl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","1970-01-01","memory","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/trainverify-equivalence-based-verification-for-distributed-llm-training","title":"TrainVerify: Equivalence-Based Verification for Distributed LLM Training","url":"https://www.microsoft.com/en-us/research/publication/trainverify-equivalence-based-verification-for-distributed-llm-training/","published":"2025-10-01","authors":["Yunchi Lu","Youshan Miao","Cheng Tan","Peng Huang","Yi Zhu","Xian Zhang","Fan Yang"],"abstract":"Training large language models (LLMs) at scale requires parallel execution across thousands of devices, incurring enormous 
computational costs. Yet, these costly distributed trainings are prone to correctness bugs, causing silent errors and potentially wasting millions of GPU hours. These bugs are challenging to expose through testing.We introduce TrainVerify, a system for verifiable distributed training of LLMs to eliminate parallelization bugs. Given a deep learning model’s logical specification as the ground truth, TrainVerify formally verifies that a distributed parallel execution plan is mathematically equivalent to it. Direct verification is notoriously difficult due to the sheer scale of LLMs which often involves billions of variables and highly intricate computation graphs. Therefore, TrainVerify introduces a stage-wise parallel verification algorithm and shape-reduction techniqu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","Systems and networking","Operating system","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/self-distilled-attention-gating-for-efficient-long-context-prefilling","title":"Self-distilled Attention Gating for Efficient Long-context Prefilling","url":"https://www.microsoft.com/en-us/research/publication/self-distilled-attention-gating-for-efficient-long-context-prefilling/","published":"2025-10-01","authors":["Yizhao Gao","Zhichen Zeng","DaYou Du","Shijie Cao","Peiyuan Zhou","Jiaxing Qi","Junjie Lai","Hayden Kwok-Hay So","Ting Cao","Fan 
Yang","Mao Yang"],"abstract":"Attention is the cornerstone of modern Large Language Models (LLMs). Yet its quadratic complexity hinders efficiency and scalability, especially for long-context processing. A promising approach is to leverage sparsity in attention. However, existing sparsity-based solutions predominantly rely on predefined patterns or heuristics at the attention head level, struggling to adapt dynamically to different contexts efficiently. We propose SeerAttention, a simple yet effective attention mechanism that directly learns the block-level attention sparsity from the LLM itself. Inspired by the gating mechanism in Mixture of Experts (MoE), SeerAttention augments the conventional attention with a learnable gate that selectively activates important blocks within the attention map. Specifically, the gate first pools the query (Q) and key (K) tensors along the sequence dimension and processes them throu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/r-kv-redundancy-aware-kv-cache-compression-for-reasoning-models","title":"R-KV: Redundancy-aware KV Cache Compression for Reasoning Models","url":"https://www.microsoft.com/en-us/research/publication/r-kv-redundancy-aware-kv-cache-compression-for-reasoning-models/","published":"2025-10-01","authors":["Zefan Cai","Wen Xiao","Hanshi 
Sun","Cheng Luo","Yikai Zhang","Ke Wan","Yucheng Li","Yeyang Zhou","Li-Wen Chang","Jiuxiang Gu","Zhen Dong","Anima Anandkumar"],"abstract":"Reasoning models have demonstrated impressive performance in self-reflection and chain-of-thought reasoning. However, they often produce excessively long outputs, leading to prohibitively large key-value (KV) caches during inference. While chain-of-thought inference significantly improves performance on complex reasoning tasks, it can also lead to reasoning failures when deployed with existing KV cache compression approaches. To address this, we propose Redundancy-aware KV Cache Compression for Reasoning models (R-KV), a novel method specifically targeting redundant tokens in reasoning models. Our method preserves nearly 100% of the full KV cache performance using only 10% of the KV cache, substantially outperforming existing KV cache baselines, which reach only 60% of the performance. Remarkably, R-KV even achieves 105% of full KV cache performance with 16% of the KV cache. This KV-cach...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","1970-01-01","memory","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/just-do-it-computer-use-agents-exhibit-blind-goal-directedness","title":"Just Do It!? 
Computer-Use Agents Exhibit Blind Goal-Directedness","url":"https://www.microsoft.com/en-us/research/publication/just-do-it-computer-use-agents-exhibit-blind-goal-directedness/","published":"2025-10-01","authors":["Erfan Shayegani","Keegan Hines","Yue Dong","Nael B. Abu-Ghazaleh","Roman Lutz","Spencer Whitehead","Vidhisha Balachandran","Besmira Nushi","Vibhav Vineet"],"abstract":"Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing high average BGD rates (80.8%) across them. 
We show that...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/accelerating-block-coordinate-descent-for-llm-finetuning-via-landscape-expansion","title":"Accelerating Block Coordinate Descent for LLM Finetuning via Landscape Expansion","url":"https://www.microsoft.com/en-us/research/publication/accelerating-block-coordinate-descent-for-llm-finetuning-via-landscape-expansion/","published":"2025-10-01","authors":["Qijun Luo","Yifei Shen","Liangzu Peng","Dongsheng Li","Xiao Li"],"abstract":"Finetuning large language models (LLMs) is a resource-intensive task for researchers in academia, with memory constraints posing a key bottleneck. A classic optimization method, block coordinate descent (BCD), significantly reduces memory cost by segmenting the trainable parameters into multiple blocks and optimizing one active block at a time while freezing the others. However, we identify that blindly applying BCD to train LLMs can be inefficient for two reasons. First, optimizing only the active block requires backpropagating through multiple deeper yet inactive blocks, resulting in wasteful computations. Second, the frozen blocks, when they are not quite close to optimality, can narrow the optimization landscape, potentially misguiding the training of the active block. 
To address these issues simultaneously, we propose integrating BCD with landscape expansion , which unfreezes the in...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/table-based-language-models-for-ophthalmology-assessment-in-the-emergency-department","title":"Table-based language models for ophthalmology assessment in the emergency department","url":"https://www.microsoft.com/en-us/research/publication/table-based-language-models-for-ophthalmology-assessment-in-the-emergency-department/","published":"2025-10-01","authors":["Juan M. Lavista Ferres","Shu Feng","Mary Kim","Nadia Popovici","Lauren Lee","Kaden Moore","Karine D. Bojikian"],"abstract":"Purpose General-domain large language models (LLMs) have emerged as valuable tools in healthcare, however, their ability to understand and perform tasks based on data stored in tabular form has not been explored in Ophthalmology. We aimed to assess OpenAI’s Generative Pre-trained Transformer 4o (GPT-4o) performance within real emergency department (ED) eye-related encounters extracted from electronic medical records in tabular format. 
Methods We input the excel spreadsheet containing the data on 1,419 unique eye-related ED encounters, divided into (1) chief complaint (CC), history of present illness (HPI), and eye examination; (2) CC and eye examination; (3) eye examination only, into GPT-4o via Microsoft’s Azure OpenAI Service using chain-of-thought (CoT) prompting and evaluated the diagnosis and assessment performance of the LLM on the presented data. GPT-4o answers were reviewed by bo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nadro-leveraging-dual-reward-strategies-for-llms-training-on-noisy-data","title":"NaDRO: Leveraging Dual-Reward Strategies for LLMs Training on Noisy Data","url":"https://www.microsoft.com/en-us/research/publication/nadro-leveraging-dual-reward-strategies-for-llms-training-on-noisy-data/","published":"2025-10-01","authors":["Haolong Qian","Xianliang Yang","Ling Zhang","Lei Song","Jiang Bian","Chun Yuan"],"abstract":"Group Relative Policy Optimization (GRPO) fine-tuning has been empirically shown to significantly enhance the reasoning abilities of language models. However, it often relies on large-scale, high-quality labeled data, which is typically difficult to obtain. 
To address this challenge, we introduce the Noise-Aware Dual-Reward Optimization (NaDRO) , which effectively enhances LLMs training in environments where data is noisy or imperfect. NaDRO operates through two key components: \\textbf{(1) Preference-based Outcome Reward (POR)}, which extracts reliable preference signals from noisy data, guiding LLMs towards more effective decisions instead of relying on specific noisy scores; and \\textbf{(2) a Context Perception Reward (CPR) mechanism}, which ensures that LLMs conduct necessary qualitative assessment of the current problem state, rewarding accurate judgments to foster better cognitive u...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/maximizing-nature-based-solutions-using-artificial-intelligence-to-align-global-biodiversity-climate-and-water-targets","title":"Maximizing Nature-based Solutions using Artificial Intelligence to align global biodiversity, climate, and water targets","url":"https://www.microsoft.com/en-us/research/publication/maximizing-nature-based-solutions-using-artificial-intelligence-to-align-global-biodiversity-climate-and-water-targets/","published":"2025-10-01","authors":["Amy Luers"],"abstract":"Nature-based Solutions (NbS) encompass a spectrum of conservation and restoration actions aimed at improving biodiversity, 
climate, and/or water outcomes. Considerable research exists that focuses either on conservation or restoration, or on one particular environmental outcome. Yet, there is a need to develop integrated frameworks that align multiple outcomes and enable comprehensive environmental and economic assessments. Here, we present an integrated framework leveraging an AI agent that interprets species’ habitat connectivity changes along with climate and water co-benefits to select optimal conservation and restoration priorities. We implement this framework through scenarios maximizing biodiversity protection with ecological integrity, carbon storage, and water co-benefits to achieve Canada’s 30×30 conservation and restoration targets. Our results suggest that prioritizing the pr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Ecology and environment","Climate change","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/leveraging-large-language-models-to-generate-multiple-choice-questions-for-ophthalmology-education","title":"Leveraging Large Language Models to Generate Multiple-Choice Questions for Ophthalmology Education","url":"https://www.microsoft.com/en-us/research/publication/leveraging-large-language-models-to-generate-multiple-choice-questions-for-ophthalmology-education/","published":"2025-10-01","authors":["Shahrzad Gholami","Daniel B. Mummert","Beth Wilson","Sarah Page","Rahul Dodhia","Juan M. 
Lavista Ferres","Bill Weeks","Dale E. Fajardo","Karine D. Bojikian"],"abstract":"Importance Multiple choice questions (MCQs) are an important and integral component of ophthalmology residency training evaluation and board certification; however, high-quality questions are difficult and time-consuming to draft. Objective To evaluate whether general-domain large language models (LLMs), particularly OpenAI’s Generative Pre-trained Transformer 4 (GPT-4), can reliably generate high-quality, novel, and readable MCQs comparable to those of a committee of experienced examination writers. Design, Setting, and Participants This survey study, conducted from September 2024 to April 2025, assesses LLM performance in generating MCQs based on the American Academy of Ophthalmology (AAO) Basic and Clinical Science Course ( BCSC ) compared with a committee of human experts. Ten expert ophthalmologists, who were masked to the generation source, independently evaluated MCQs using a 10-p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/enhancing-temporal-understanding-in-video-llms-through-stacked-temporal-attention-in-vision-encoders","title":"Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision 
Encoders","url":"https://www.microsoft.com/en-us/research/publication/enhancing-temporal-understanding-in-video-llms-through-stacked-temporal-attention-in-vision-encoders/","published":"2025-10-01","authors":["Ali Rasekh","Erfan Soula","Omid Daliran","Simon Gottschalk","Mohsen Fayyaz"],"abstract":"Despite significant advances in Multimodal Large Language Models (MLLMs), understanding complex temporal dynamics in videos remains a major challenge. Our experiments show that current Video Large Language Model (Video-LLM) architectures have critical limitations in temporal understanding, struggling with tasks that require detailed comprehension of action sequences and temporal progression. In this work, we propose a Video-LLM architecture that introduces stacked temporal attention modules directly within the vision encoder. This design incorporates a temporal attention in vision encoder, enabling the model to better capture the progression of actions and the relationships between frames before passing visual tokens to the LLM. 
Our results show that this approach significantly improves temporal reasoning and outperforms existing models in video question answering tasks, specifically in....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agentic-media","title":"Agentic Media: Reimagining the Future of Communication","url":"https://www.microsoft.com/en-us/research/publication/agentic-media/","published":"2025-10-01","authors":["Yun Wang","Yan Lu"],"abstract":"Traditional media systems model communication as a linear transfer of static content from author to reader, mediated by largely passive tools. This model enforces rigid separations between creation and consumption, fragments communicative processes, and constrains collaboration as meaning evolves over time. We introduce Agentic Media, a communication paradigm in which media participate in the construction and negotiation of meaning by embedding communicative intent, retaining interactional context, and supporting adaptive engagement. Within this paradigm, communication is reframed as an ongoing process of expression, exploration, interpretation, and reflection, rather than the delivery of finalized artifacts. We articulate the conceptual foundations for reasoning about how communication can be structured when media are treated as active participants in communicative processes. 
Building o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Tech Report","Artificial intelligence","Graphics and multimedia","Human-computer interaction","media"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scgenescope-a-treatment-matched-single-cell-imaging-and-transcriptomics-dataset-and-benchmark-for-treatment-response-modeling","title":"scGeneScope: A Treatment-Matched Single Cell Imaging and Transcriptomics Dataset and Benchmark for Treatment Response Modeling","url":"https://www.microsoft.com/en-us/research/publication/scgenescope-a-treatment-matched-single-cell-imaging-and-transcriptomics-dataset-and-benchmark-for-treatment-response-modeling/","published":"2025-10-01","authors":["Joel Dapello","Marcel Nassar","Ridvan Eksi","Ban Wang","Jules Gagnon-Marchand","Kenneth Gao","akram Baharlouei","Kyra Thrush","Nina Riehs","Amy Peterson","Aniket Tolpadi","Abhejit Rajagopal"],"abstract":"Understanding cellular responses to chemical interventions is critical to the discovery of effective therapeutics. Because individual biological techniques often measure only one axis of cellular response at a time, high-quality multimodal datasets are needed to unlock a holistic understanding of how cells respond to treatments and to advance computational methods that integrate modalities. 
However, many techniques destroy cells and thus preclude paired measurements, and attempts to match disparate unimodal datasets are often confounded by data being generated in incompatible experimental settings. Here we introduce scGeneScope, a multimodal single‑cell RNA sequencing (scRNA-seq) and Cell Painting microscopy image dataset conditionally paired by chemical treatment, designed to facilitate the development and benchmarking of unimodal, multimodal, and multiple profile machine learning metho...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Medical, health and genomics","Biology","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/roboseer-video-generators-can-be-generalizable-robot-manipulators","title":"RoboSeer: Video Generators Can Be Generalizable Robot Manipulators","url":"https://www.microsoft.com/en-us/research/publication/roboseer-video-generators-can-be-generalizable-robot-manipulators/","published":"2025-10-01","authors":["Yichao Shen","Fangyun Wei","Zhiying Du","Yaobo Liang","Yan Lu","Jiaolong Yang","Nanning Zheng","Baining Guo"],"abstract":"Generalization in robot manipulation is essential for deploying robots in open-world environments and advancing toward artificial general intelligence. 
While recent vision-language-action models leverage large pre-trained understanding models for perception and instruction following, their ability to generalize to novel tasks, objects, and settings remains limited. In this work, we present RoboSeer, a new approach that shifts from understanding to generation. Instead of solely predicting the next action, RoboSeer also imagines and generates the future visual outcome of that action. Built on a multi-modal Diffusion Transformer, RoboSeer jointly models video, language, and action modalities, using pre-trained video generative models for joint visual and action forecasting. Our experiments show that high-quality imagined futures correlate with reliable action predictions and task success, h...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Vision-language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/parameter-free-last-iterate-convergence-of-counterfactual-regret-minimization-algorithms","title":"Parameter-Free Last-Iterate Convergence of Counterfactual Regret Minimization Algorithms","url":"https://www.microsoft.com/en-us/research/publication/parameter-free-last-iterate-convergence-of-counterfactual-regret-minimization-algorithms/","published":"2025-10-01","authors":["Linjian Meng","Youzhi Zhang","Shangdong Yang","Tianyu Ding","Zhenxing Ge","Wenbin Li","Tianpei Yang","Bo An","Yang Gao"],"abstract":"To establish 
last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Hence, proving last-iterate convergence in solving the original EFG reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, these studies only establish last-iterate convergence for Online Mirror Descent (OMD)-based CFR algorithms instead of Regret Matching (RM)-based CFR algorithms in solving perturbed regularized EFGs, resulting in a poor empirical convergence rate, as RM-based CFR algorithms typically outperform OMD-based CFR algorithms. In addition, as solving multiple perturbed regularized EFGs is required, fine-tuning across multiple perturbed regularized EFG...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Algorithms","Algorithm","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/native-hybrid-thinking-models","title":"Native Hybrid Thinking Models","url":"https://www.microsoft.com/en-us/research/publication/native-hybrid-thinking-models/","published":"2025-10-01","authors":["Lingjie Jiang","Xun Wu","Shaohan Huang","Qingxiu Dong","Zewen Chi","Li Dong","Xingxing Zhang","Tengchao Lv","Lei Cui","Furu Wei"],"abstract":"Recent Large Reasoning Models (LRMs) have shown substantially improved 
reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessively lengthy thinking introduces substantial overhead in terms of token consumption and latency, which is particularly unnecessary for simple queries. In this work, we introduce Native Hybrid Thinking Models (NHTMs), the first kind of model capable of adaptively determining whether to perform thinking based on the contextual information of user queries. To achieve this, we propose a two-stage training pipeline comprising Hybrid Fine-Tuning (HFT) as a cold start, followed by online reinforcement learning with the proposed Hybrid Group Policy Optimization (HGPO) to implicitly learn to select the appropriate thinking mode. Furthermore, we introduce a metric...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Reinforcement learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learning-outcomes-with-genai-in-the-classroom-a-review-of-empirical-evidence","title":"Learning outcomes with GenAI in the classroom: A review of empirical evidence","url":"https://www.microsoft.com/en-us/research/publication/learning-outcomes-with-genai-in-the-classroom-a-review-of-empirical-evidence/","published":"2025-10-01","authors":["Kathy Walker","Mihaela Vorvoreanu"],"abstract":"This report presents a review of recent empirical evidence 
of generative AI (GenAI) impact on learning outcomes in formal education. Its purpose is to provide educators with an overview of top concerns for ensuring students’ learning gains when using LLM-based learning tools and concludes with research-derived guidance for deciding when and how to use these tools in the classroom. The report unfolds as follows: Section 1 distinguishes between the needs of education and industry, where the benefits of LLMs were first explored, primarily for productivity gains. Educators’ priorities are different. Pedagogical concerns include consideration of inequities in education, developing students’ critical thinking skills, and the potential for GenAI to inhibit social development. These concerns extend beyond technologists’ focus on mitigating technical harms such as toxic content, bias, or accurac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Tech Report","Artificial intelligence","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/disarming-strategic-text-span-aware-counterfactuals-for-robust-content-moderation","title":"Disarming Strategic Text: Span-Aware Counterfactuals for Robust Content Moderation","url":"https://www.microsoft.com/en-us/research/publication/disarming-strategic-text-span-aware-counterfactuals-for-robust-content-moderation/","published":"2025-10-01","authors":["Hardik Meisheri","Muhammad Zaid Hassan","Swati Tiwari","Puneet Mangla","Samarth Bharadwaj","Karthik 
Sankaranarayanan","Amit Singh"],"abstract":"Machine learning systems deployed in the wild must operate reliably despite unreliable inputs, whether arising from distribution shifts, adversarial manipulation, or strategic behavior by users. Content moderation is a prime example: violators deliberately exploit euphemisms, obfuscations, or benign co-occurrence patterns to evade detection, creating unreliable supervision signals for classifiers. We present a span-aware augmentation framework that generates high-quality counterfactual hard negatives to improve robustness under such conditions. Our pipeline combines (i) multi-LLM agreement to extract causal violation spans, (ii) policy-guided rewrites of those spans into compliant alternatives, and (iii) validation via re-inference to ensure only genuine label-flipping counterfactuals are retained. Across real-world ad moderation and toxic comment datasets, this approach consistently reduc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Natural language processing","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2510.01591","title":"CLUE: Non-parametric Verification from Experience via Hidden-State Clustering","url":"https://huggingface.co/papers/2510.01591","published":"2025-10-01","authors":["Tencent/Hunyuan"],"abstract":"Assessing the quality of Large Language Model (LLM) outputs presents a critical challenge. 
Previous methods either rely on text-level information (e.g., reward models, majority voting), which can overfit to superficial cues, or on calibrated confidence from token probabilities, which would fail on less-calibrated models. Yet both of these signals are, in fact, partial projections of a richer source of information: the model's internal hidden states. Early layers, closer to token embeddings, preserve semantic and lexical features that underpin text-based judgments, while later layers increasingly align with output logits, embedding confidence-related information. This paper explores hidden states directly as a unified foundation for verification. We show that the correctness of a solution is encoded as a geometrically separable signature within the trajectory of hidden activations. To val...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","tencent","LLM","language model"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/blurguard-a-simple-approach-for-robustifying-image-protection-against-ai-powered-editing","title":"BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing","url":"https://www.microsoft.com/en-us/research/publication/blurguard-a-simple-approach-for-robustifying-image-protection-against-ai-powered-editing/","published":"2025-10-01","authors":["Jinsu Kim","Yunhun Nam","Minseon Kim","Sangpil Kim","Jongheon Jeong"],"abstract":"Recent advances in text-to-image models have increased the exposure of 
powerful image editing techniques as a tool, raising concerns about their potential for malicious use. An emerging line of research to address such threats focuses on implanting “protective” adversarial noise into images before their public release, so future attempts to edit them using text-to-image models can be impeded. However, subsequent works have shown that these adversarial noises are often easily “reversed,” e.g., with techniques as simple as JPEG compression, casting doubt on the practicality of the approach. In this paper, we argue that adversarial noise for image protection should not only be imperceptible, as has been a primary focus of prior work, but also irreversible, viz., it should be difficult to detect as noise provided that the original image is hidden. We propose a surprisingly simple method to e...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/value-gradient-guidance-for-flow-matching-alignment","title":"Value Gradient Guidance for Flow Matching Alignment","url":"https://www.microsoft.com/en-us/research/publication/value-gradient-guidance-for-flow-matching-alignment/","published":"2025-10-01","authors":["Zhen Liu","Tim Xiao","Carles Domingo-Enrich","Weiyang Liu","Dinghuai Zhang"],"abstract":"While methods exist for aligning flow matching models -- a popular and effective class of generative 
models -- with human preferences, existing approaches fail to achieve both adaptation efficiency and probabilistically sound prior preservation. In this work, we leverage the theory of optimal control and propose VGG-Flow, a gradient matching–based method for finetuning pretrained flow matching models. The key idea in this algorithm is that the optimal difference between the finetuned velocity field and the pretrained one should be matched with the gradient field of a value function. This method not only incorporates first-order information from the reward model but also benefits from heuristic initialization of the value function to enable fast adaptation. Empirically, we show on a popular text-to-image flow matching model, Stable Diffusion 3, that our method can finetune flow matching m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/robust-heuristic-algorithm-design-with-llms","title":"Robust Heuristic Algorithm Design with LLMs","url":"https://www.microsoft.com/en-us/research/publication/robust-heuristic-algorithm-design-with-llms/","published":"2025-10-01","authors":["Pantea Karimi Babaahmadi","Dany Rouhana","Pooria Namyar","Siva Kesava Reddy Kakarla","Venkat Arun","Behnaz Arzani"],"abstract":"We posit that we can generate more robust and performant heuristics if we augment approaches using LLMs for heuristic design 
with tools that explain why heuristics underperform and suggestions about how to fix them. We find even simple ideas that (1) expose the LLM to instances where the heuristic underperforms; (2) explain why they occur; and (3) specialize design to regions in the input space, can produce more robust algorithms compared to existing techniques: the heuristics we produce have a ∼28× better worst-case performance compared to FunSearch, improve average performance, and maintain the runtime.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Tech Report","Systems and networking","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rationalized-all-atom-protein-design-with-unified-multi-modal-bayesian-flow","title":"Rationalized All-Atom Protein Design with Unified Multi-modal Bayesian Flow","url":"https://www.microsoft.com/en-us/research/publication/rationalized-all-atom-protein-design-with-unified-multi-modal-bayesian-flow/","published":"2025-10-01","authors":["Hanlin Wu","Yuxuan Song","Zhe Zhang","Zhilong Zhang","Hao Zhou","Wei-Ying Ma","Jingjing Liu"],"abstract":"Designing functional proteins is a critical yet challenging problem due to the intricate interplay between backbone structures, sequences, and side-chains. Current approaches often decompose protein design into separate tasks, which can lead to accumulated errors, while recent efforts increasingly focus on all-atom protein design. 
However, we observe that existing all-atom generation approaches suffer from an information shortcut issue, where models inadvertently infer sequences from side-chain information, compromising their ability to accurately learn sequence distributions. To address this, we introduce a novel rationalized information flow strategy to eliminate the information shortcut. Furthermore, motivated by the advantages of Bayesian flows over differential equation–based methods, we propose the first Bayesian flow formulation for protein backbone orientations by recasting or...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/for-better-or-for-worse-transformers-seek-patterns-for-memorization","title":"For Better or for Worse, Transformers Seek Patterns for Memorization","url":"https://www.microsoft.com/en-us/research/publication/for-better-or-for-worse-transformers-seek-patterns-for-memorization/","published":"2025-10-01","authors":["Madhur Panwar","Gail Weiss","Navin Goyal","Antoine Bosselut"],"abstract":"Memorization in language models is a critical yet poorly understood phenomenon. In this work, we investigate memorization in transformer-based language models by analyzing their training dynamics over multiple epochs. 
We find that memorization is neither a constant accumulation of sequences nor simply dictated by the recency of exposure to these sequences. Instead, much like generalization, memorization appears to be driven by pattern recognition. Tracking memorization dynamics in mixed datasets, we observe that models memorize different sub-datasets in distinct bursts, suggesting that each subset is associated with unique underlying patterns, and that the model prefers to learn these patterns in a predictable order. While easily learnable patterns tend to support generalization on unseen data, more complex patterns do not. Furthermore, in datasets with weak or absent patterns, models ma...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/do-llms-comply-differently-during-tests-and-can-we-steer-that","title":"Do LLMs Comply Differently During Tests? 
And Can We Steer That?","url":"https://www.microsoft.com/en-us/research/publication/do-llms-comply-differently-during-tests-and-can-we-steer-that/","published":"2025-10-01","authors":["Sahar Abdelnabi","Ahmed Salem"],"abstract":"Reasoning‐focused large language models (LLMs) sometimes alter their behavior when they detect that they are being evaluated (an effect analogous to the Hawthorne phenomenon), which can lead them to optimize for test‐passing performance or to comply more readily with harmful prompts if real‐world consequences appear absent. We present the first quantitative study of how such \"test awareness\" impacts model behavior, particularly its safety alignment. We introduce a white‐box probing framework that (i) linearly identifies awareness‐related activations and (ii) steers models toward or away from test awareness while monitoring downstream performance. We apply our method to different state-of-the-art open-source reasoning LLMs across both realistic and hypothetical tasks. Our results demonstrate that test awareness significantly impacts safety alignment and that its effect differs across models. 
By...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2510.01444","title":"VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning","url":"https://huggingface.co/papers/2510.01444","published":"2025-10-01","authors":["Tencent/Hunyuan"],"abstract":"Reinforcement learning with verifiable rewards (RLVR) improves reasoning in large language models (LLMs) but struggles with exploration, an issue that still persists for multimodal LLMs (MLLMs). Current methods treat the visual input as a fixed, deterministic condition, overlooking a critical source of ambiguity and struggling to build policies robust to plausible visual variations. We introduce VOGUE (Visual Uncertainty Guided Exploration), a novel method that shifts exploration from the output (text) to the input (visual) space. By treating the image as a stochastic context, VOGUE quantifies the policy's sensitivity to visual perturbations using the symmetric KL divergence between a \"raw\" and \"noisy\" branch, creating a direct signal for uncertainty-aware exploration. 
This signal shapes the learning objective via an uncertainty-proportional bonus, which, combined with a token-entropy bo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4414747707","title":"Foundation model for efficient biological discovery in single-molecule time traces","url":"https://doi.org/10.1038/s41592-025-02839-4","published":"2025-10-01","authors":["Jieming Li","Leyou Zhang","Alexander Johnson‐Buck","Nils G. Walter"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41592-025-02839-4","openalex_id":"https://openalex.org/W4414747707","cited_by_count":4,"quality_score":45,"matched_keywords":["efficient"],"author_affiliations":["Analysis Group (United States)","Bristol-Myers Squibb (Germany)","Bristol-Myers Squibb (Ireland)","Google (United States)","The Bristol-Myers Squibb Children's Hospital","University of Michigan"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7705000042915344},{"id":"https://openalex.org/C2776865275","display_name":"Projector","score":0.5049999952316284},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.4142000079154968},{"id":"https://openalex.org/C75291252","display_name":"TRACE 
(psycholinguistics)","score":0.4101000130176544},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.40299999713897705},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3702000081539154},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.32710000872612},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.32409998774528503}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4414715259","title":"Pre-trained molecular language models with random functional group masking","url":"https://doi.org/10.1038/s44387-025-00029-3","published":"2025-10-01","authors":["Tianhao Peng","Yuchen Li","Xuhong Li","Jiang Bian","Zeke Xie","Ning Sui","Shahid Mumtaz","Yanwu Xu","Linghe Kong","Haoyi Xiong"],"abstract":"Recent advancements in computational chemistry utilize transformer-based models pre-trained on Simplified Molecular Input Line Entry System (SMILES) sequences to predict molecular properties. To improve upon existing methods, we propose MLM-FG, a molecular language model with a novel pre-training strategy that randomly masks subsequences corresponding to chemically significant functional groups. This technique compels the model to better infer molecular structures and properties by learning the context of these key units. Extensive evaluations across 11 benchmark tasks demonstrate the superiority of MLM-FG, outperforming existing SMILES- and graph-based models in 9 of the 11 tasks. Remarkably, MLM-FG surpasses even some 3D-graph-based models, highlighting its exceptional capacity for representation learning without explicit 3D structural information. 
These results indicate that MLM-FG ef...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s44387-025-00029-3","openalex_id":"https://openalex.org/W4414715259","cited_by_count":3,"quality_score":44,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Beihang University","North Carolina State University","Nottingham Trent University","Shanghai Jiao Tong University","Silesian University of Technology","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.724399983882904},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.607200026512146},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5637999773025513},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5558000206947327},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.554099977016449},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4918999969959259},{"id":"https://openalex.org/C66024118","display_name":"Computational model","score":0.4325999915599823},{"id":"https://openalex.org/C2777402240","display_name":"Masking (illustration)","score":0.42260000109672546}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2510.00615","title":"ACON: Optimizing Context Compression for Long-horizon LLM Agents","url":"https://huggingface.co/papers/2510.00615","published":"2025-10-01","authors":["Minki Kang","Wei-Ning Chen","Dongge Han","Huseyin A. 
Inan","Lukas Wutschitz","Yanzhi Chen","Robert Sim","Saravan Rajmohan"],"abstract":"Large language models (LLMs) are increasingly deployed as agents in dynamic, real-world environments, where success requires both reasoning and effective tool use. A central challenge for agentic tasks is the growing context length, as agents must accumulate long histories of actions and observations. This expansion raises costs and reduces efficiency in long-horizon tasks, yet prior work on context compression has mostly focused on single-step tasks or narrow applications. We introduce Agent Context Optimization (ACON), a unified framework that optimally compresses both environment observations and interaction histories into concise yet informative condensations. ACON leverages compression guideline optimization in natural language space: given paired trajectories where full context succeeds but compressed context fails, capable LLMs analyze the causes of failure, and the compression gu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":43,"matched_keywords":["LLM","memory","compression","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2510.10620","title":"DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism","url":"http://arxiv.org/abs/2510.10620","published":"2025-10-01","authors":["Chenyu Jiang","Zhenkun Cai","Ye Tian","Zhen Jia","Yida Wang","Chuan Wu"],"abstract":"Context parallelism has emerged as a key technique to support long-context training, a growing trend in generative AI for modern large models. 
However, existing context parallel methods rely on static parallelization configurations that overlook the dynamic nature of training data, specifically, the variability in sequence lengths and token relationships (i.e., attention patterns) across samples. As a result, these methods often suffer from unnecessary communication overhead and imbalanced computation. In this paper, we present DCP, a dynamic context parallel training framework that introduces fine-grained blockwise partitioning of both data and computation. By enabling flexible mapping of data and computation blocks to devices, DCP can adapt to varying sequence characteristics, effectively reducing communication and improving memory and computation balance. Micro-benchmarks demonstrate....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3731569.3764849","openalex_id":"https://openalex.org/W4414736012","cited_by_count":0,"quality_score":41,"matched_keywords":["memory"],"author_affiliations":["Amazon (United States)","Chinese University of Hong Kong","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8328999876976013},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.7235999703407288},{"id":"https://openalex.org/C2779960059","display_name":"Overhead (engineering)","score":0.6636999845504761},{"id":"https://openalex.org/C2781172179","display_name":"Parallelism (grammar)","score":0.635200023651123},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.6172000169754028},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6103000044822693},{"id":"https://openalex.org/C2778112365","display_name":"Sequence 
(biology)","score":0.6011999845504761},{"id":"https://openalex.org/C2775836275","display_name":"Dynamism","score":0.513700008392334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412494106","title":"Breaking data silos: incorporating the DICOM imaging standard into the OMOP CDM to enable multimodal research.","url":"https://pubmed.ncbi.nlm.nih.gov/40680297","published":"2025-10-01","authors":["Woo Yeon Park","T Schmidt","Gabriel Lucca de Oliveira Salvador","K.P. O’Donnell","Brad Genereaux","Kyulee Jeon","Seng Chan You","Blake E. Dewey","Paul Nagy"],"abstract":"OBJECTIVE: This work incorporates the Digital Imaging Communications in Medicine (DICOM) Standard into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) to standardize and accurately represent imaging studies, such as acquisition parameters, in multimodal research studies. MATERIALS AND METHODS: DICOM is the internationally adopted standard that defines entities and relationships for biomedical imaging data used for clinical imaging studies. Most of the complexity in the DICOM data structure centers around the metadata. This metadata contains information about the patient and the modality acquisition parameters. We parsed the DICOM vocabularies in Parts 3, 6, and 16 to obtain structured metadata definitions and added these as custom concepts in the OMOP CDM vocabulary. 
To validate our pipeline, we harvested and transformed DICOM metadata from magnetic resonance....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/jamia/ocaf091","openalex_id":"https://openalex.org/W4412494106","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Canon (Japan)","Johns Hopkins Medicine","Johns Hopkins University","Nvidia (United States)","Yonsei University","Yonsei University Health System"],"concepts":[{"id":"https://openalex.org/C77331912","display_name":"DICOM","score":0.6680553555488586},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5627850294113159},{"id":"https://openalex.org/C48255552","display_name":"Information silo","score":0.44388478994369507},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.3991122841835022},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3216562271118164},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2673470973968506},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.23240745067596436},{"id":"https://openalex.org/C78519656","display_name":"Mechanical engineering","score":0.10703524947166443}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W7110256878","title":"Guiding Evolution of Artificial Life Using Vision-Language Models","url":"https://doi.org/10.1162/isal.a.850","published":"2025-10-01","authors":["Nikhil Baid","Hannah Erlebach","Paul Hellegouarch","Frederico Wieser"],"abstract":"Foundation models (FMs) have recently opened up new frontiers in the field of artificial life (ALife) by providing powerful tools to 
automate search through ALife simulations. Previous work aligns ALife simulations with natural language target prompts using vision-language models (VLMs). We build on Automated Search for Artificial Life (ASAL) by introducing ASAL++, a method for open-ended-like search guided by multimodal FMs. We use a second FM to propose new evolutionary targets based on a simulation’s visual history. This induces an evolutionary trajectory with increasingly complex targets. We explore two strategies: (1) evolving a simulation to match a single new prompt at each iteration (Evolved Supervised Targets: EST) and (2) evolving a simulation to match the entire sequence of generated prompts (Evolved Temporal Targets: ETT). We test our method empirically in the Lenia substrate...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/isal.a.850","openalex_id":"https://openalex.org/W7110256878","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Centre National de la Recherche Scientifique","Hannah Research Foundation","Institut Pasteur","Pasteur Hellenic Institute","Universities UK","University College London","Université Paris Cité"],"concepts":[{"id":"https://openalex.org/C19273510","display_name":"Artificial life","score":0.8463000059127808},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7185999751091003},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6678000092506409},{"id":"https://openalex.org/C159149176","display_name":"Evolutionary algorithm","score":0.5450999736785889},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5038999915122986},{"id":"https://openalex.org/C2778112365","display_name":"Sequence 
(biology)","score":0.45660001039505005},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.45399999618530273},{"id":"https://openalex.org/C105902424","display_name":"Evolutionary computation","score":0.44429999589920044}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414846515","title":"AI-crafted narratives: an empirical study on generating interactive stories using generative pre-training transformers","url":"https://doi.org/10.1007/s10489-025-06833-3","published":"2025-10-01","authors":["Ana Carolina de Souza Mendes","Mason Adsero","Joshua Palicka","Nurulla Zholdoshov","Zhao Xin"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10489-025-06833-3","openalex_id":"https://openalex.org/W4414846515","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Seattle University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8870000243186951},{"id":"https://openalex.org/C120936955","display_name":"Empirical research","score":0.7294999957084656},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6499000191688538},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6322000026702881},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5874000191688538},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5127999782562256},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5113000273704529},{"id":"https://openalex.org/C2779530757","display_name":"Quality 
(philosophy)","score":0.40529999136924744}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.01180","title":"BroRL: Scaling Reinforcement Learning via Broadened Exploration","url":"https://huggingface.co/papers/2510.01180","published":"2025-10-01","authors":["Jian Hu","Mingjie Liu","Ximing Lu","Fang Wu","Zaid Harchaoui","Shizhe Diao","Yejin Choi","Pavlo Molchanov","Jun Yang","Jan Kautz","Yi Dong"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key ingredient for unlocking complex reasoning capabilities in large language models. Recent work ProRL has shown promise in scaling RL by increasing the number of training steps. However, performance plateaus after thousands of steps, with clear diminishing returns from allocating more computation to additional training. In this work, we investigate a complementary paradigm for scaling RL, BroRL, increasing the number of rollouts per example to hundreds to exhaustively Broaden exploration, which yields continuous performance gains beyond the saturation point observed in ProRL when scaling the number of training steps. Our approach is motivated by a mass balance equation analysis allowing us to characterize the rate of change in probability mass for correct and incorrect tokens during the reinforcement process. 
We show...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"official:826f92a924d45004","title":"Claude Haiku 4.5 System Card","url":"https://www-cdn.anthropic.com/7aad69bf12627d42234e01ee7c36305dc2f6a970.pdf","published":"2025-10","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Haiku 4.5.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Haiku 4.5"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-code-localization-with-repository-memory","title":"Improving Code Localization with Repository Memory","url":"https://www.microsoft.com/en-us/research/publication/improving-code-localization-with-repository-memory/","published":"2025-09-30","authors":["Boshi Wang","Weijian Xu","Yunsheng Li","Mei Gao","Yujia Xie","Huan Sun","Dongdong Chen"],"abstract":"Code localization is a fundamental challenge in repository-level software engineering tasks such as bug fixing. 
While existing methods equip language agents with comprehensive tools/interfaces to fetch information from the repository, they overlook the critical aspect of memory, where each instance is typically handled from scratch assuming no prior repository knowledge. In contrast, human developers naturally build long-term repository memory, such as the functionality of key modules and associations between various bug types and their likely fix locations. In this work, we augment language agents with such memory by leveraging a repository's commit history -- a rich yet underutilized resource that chronicles the codebase's evolution. We introduce tools that allow the agent to retrieve from a non-parametric memory encompassing recent historical commits and linked issues, as well as func...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","memory","long-term","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flexicodec-a-dynamic-neural-audio-codec-for-low-frame-rates","title":"FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates","url":"https://www.microsoft.com/en-us/research/publication/flexicodec-a-dynamic-neural-audio-codec-for-low-frame-rates/","published":"2025-09-30","authors":["Jiaqi Li","Yao Qian","Yuxuan Hu","Leying Zhang","Xiaofei Wang","Heng Lu","Manthan Thakker","Jinyu Li","Sheng Zhao","Zhizheng Wu"],"abstract":"Neural audio codecs are 
foundational to speech language models. It is expected to have a low frame rate and decoupled semantic and acoustic information. A lower frame rate codec can reduce the computational cost of speech language models by shortening the sequence length. Recent studies have developed 12.5Hz low-frame-rate audio codecs, but even lower frame rate codecs remain underexplored. We find that a major challenge for very low frame rate tokens is missing semantic information. This paper introduces FlexiCodec to address this limitation. FlexiCodec improves semantic preservation with a dynamic frame rate approach and introduces a novel architecture featuring an ASR feature-assisted dual stream encoding and Transformer bottlenecks. With dynamic frame rates, it uses less frames at information-sparse regions through adaptively merging semantically similar frames. A dynamic frame rate....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Computer science","speech language models","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-brain-inspired-agentic-architecture-to-improve-planning-with-llms","title":"A brain-inspired agentic architecture to improve planning with LLMs","url":"https://www.microsoft.com/en-us/research/publication/a-brain-inspired-agentic-architecture-to-improve-planning-with-llms/","published":"2025-09-30","authors":["Taylor Webb","Shanka Subhra 
Mondal","Ida Momennejad"],"abstract":"Large language models (LLMs) demonstrate impressive performance on a wide variety of tasks, but they often struggle with tasks that require multi-step reasoning or goal-directed planning. To address this, we take inspiration from the human brain, in which planning is accomplished via component processes that are predominantly associated with specific brain regions. These processes include conflict monitoring, state prediction, state evaluation, task decomposition, and task coordination. We find that LLMs are often capable of carrying out these functions in isolation, but struggle to autonomously coordinate them in the service of a goal. Therefore, we propose a modular agentic architecture - the Modular Agentic Planner (MAP) - in which planning is performed via the interaction of specialized brain-inspired LLM modules. We evaluate MAP on three challenging planning tasks – graph traversal,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1038/s41467-025-63804-5","openalex_id":"https://openalex.org/W4414625998","cited_by_count":5,"quality_score":73,"matched_keywords":["Article (Journal)","Artificial intelligence","LLM","efficient"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research New York City (United States)","Mila - Quebec Artificial Intelligence Institute","Princeton University","Université de Montréal"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2509.26226","title":"Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient 
Reasoners","url":"https://huggingface.co/papers/2509.26226","published":"2025-09-30","authors":["Tencent/Hunyuan"],"abstract":"Reinforcement Learning with Verifiable Reward (RLVR) effectively solves complex tasks but demands extremely long context lengths during training, leading to substantial computational costs. While multi-stage training can partially mitigate this, starting with overly short contexts often causes irreversible performance degradation, ultimately failing to reduce overall training compute significantly. In this paper, we introduce **T**hinking-**F**ree **P**olicy **I**nitialization (**TFPI**), a simple yet effective adaptation to RLVR that bridges long Chain-of-Thought (CoT) distillation and standard RLVR. TFPI employs a simple *ThinkFree* operation, explicitly discarding the thinking content via a direct *</think>* append, to reduce token usage during inference. Training with *ThinkFree*-adapted inputs improves performance and lowers token consumption, even in the original slow-thinking mode...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","tencent","efficient","distillation"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:huawei-noah:2509.26497","title":"Revealing the Power of Post-Training for Small Language Models via Knowledge 
Distillation","url":"https://huggingface.co/papers/2509.26497","published":"2025-09-30","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","huawei-noah","distillation"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"hf-org-paper:tencent:2509.26514","title":"BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs","url":"https://huggingface.co/papers/2509.26514","published":"2025-09-30","authors":["Tencent/Hunyuan"],"abstract":"The rise of Large Language Models (LLMs) is reshaping multimodel models, with speech synthesis being a prominent application. However, existing approaches often underutilize the linguistic intelligence of these models, typically failing to leverage their powerful instruction-following capabilities. This limitation hinders the model's ability to follow text instructions for controllable Text-to-Speech~(TTS). To address this, we propose a new paradigm inspired by ``operationalism'' that decouples instruction understanding from speech generation. We introduce BatonVoice, a framework where an LLM acts as a ``conductor'', understanding user instructions and generating a textual ``plan'' -- explicit vocal features (e.g., pitch, energy). A separate TTS model, the ``orchestra'', then generates the speech from these features. 
To realize this component, we develop BatonTTS, a TTS model trained spe...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:Tencent-Hunyuan:2509.26618","title":"DA^2: Depth Anything in Any Direction","url":"https://huggingface.co/papers/2509.26618","published":"2025-09-30","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"official:0772c7e1857ebcdf","title":"Sora 2 is here","url":"https://openai.com/index/sora-2","published":"2025-09-30","authors":["OpenAI"],"abstract":"Our latest video generation model is more physically accurate, realistic, and controllable than prior systems. It also features synchronized dialogue and sound effects. 
Create with it in the new Sora app.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Research"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"official:b357a39a4501c3e2","title":"Sora 2 System Card","url":"https://openai.com/index/sora-2-system-card","published":"2025-09-30","authors":["OpenAI"],"abstract":"Sora 2 is our new state of the art video and audio generation model. Building on the foundation of Sora, this new model introduces capabilities that have been difficult for prior video models to achieve– such as more accurate physics, sharper realism, synchronized audio, enhanced steerability, and an expanded stylistic range.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"arxiv:2509.26328","title":"Fast-dLLM v2: Efficient Block-Diffusion LLM","url":"https://huggingface.co/papers/2509.26328","published":"2025-09-30","authors":["Chengyue Wu","Hao Zhang","Shuchen Xue","Shizhe Diao","Yonggan Fu","Zhijian Liu","Pavlo Molchanov","Ping Luo","Song Han","Enze Xie"],"abstract":"Autoregressive (AR) large language models (LLMs) have achieved remarkable performance across a wide range of natural language tasks, yet their inherent sequential decoding limits inference efficiency. 
In this work, we propose Fast-dLLM v2, a carefully designed block diffusion language model (dLLM) that efficiently adapts pretrained AR models into dLLMs for parallel text generation, requiring only approximately 1B tokens of fine-tuning. This represents a 500x reduction in training data compared to full-attention diffusion LLMs such as Dream (580B tokens), while preserving the original model's performance. Our approach introduces a novel training recipe that combines a block diffusion mechanism with a complementary attention mask, enabling blockwise bidirectional context modeling without sacrificing AR training objectives. To further accelerate decoding, we design a hierarchical caching me...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.25760","title":"TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning","url":"https://huggingface.co/papers/2509.25760","published":"2025-09-30","authors":["Zhepei Wei","Xiao Yang","Kai Sun","Jiaqi Wang","Rulin Shao","Sean Chen","Mohammad Kachuee","Teja Gollapudi","Tony Liao","Nicolas Scheffer","Rakesh Wanga","Anuj Kumar"],"abstract":"While large language models (LLMs) have demonstrated strong performance on factoid question answering, they are still prone to hallucination and untruthful responses, particularly when tasks demand information outside their parametric knowledge. Indeed, truthfulness requires more than accuracy -- models must also recognize uncertainty and abstain when unsure to avoid hallucinations. 
This presents a fundamental challenge for existing methods: approaches that optimize for accuracy often amplify hallucinations, while those that encourage abstention can become overly conservative, sacrificing correct answers. Both extremes ultimately compromise truthfulness. In this work, we present TruthRL, a general reinforcement learning (RL) framework that directly optimizes the truthfulness of LLMs. Specifically, we implement TruthRL using GRPO with a simple yet effective ternary reward that distinguish...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["retrieval"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2510.02387","title":"CWM: An Open-Weights LLM for Research on Code Generation with World Models","url":"https://huggingface.co/papers/2510.02387","published":"2025-09-30","authors":["FAIR CodeGen team","Quentin Carbonneaux","Gal Cohen","Jonas Gehring","Jacob Kahn","Jannik Kossen","Felix Kreuk","Emily McMilin","Michel Meyer","Yuxiang Wei","David Zhang","Kunhao Zheng"],"abstract":"We release Code World Model (CWM), a 32-billion-parameter open-weights LLM, to advance research on code generation with world models. To improve code understanding beyond what can be learned from training on static code alone, we mid-train CWM on a large amount of observation-action trajectories from Python interpreter and agentic Docker environments, and perform extensive multi-task reasoning RL in verifiable coding, math, and multi-turn software engineering environments. 
With CWM, we provide a strong testbed for researchers to explore the opportunities world modeling affords for improving code generation with reasoning and planning in computational environments. We present first steps of how world models can benefit agentic coding, enable step-by-step simulation of Python code execution, and show early results of how reasoning can benefit from the latter. CWM is a dense, decoder-only L...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.25849","title":"Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation","url":"https://huggingface.co/papers/2509.25849","published":"2025-09-30","authors":["Ziniu Li","Congliang Chen","Tianyun Yang","Tian Ding","Ruoyu Sun","Ge Zhang","Wenhao Huang","Zhi-Quan Luo"],"abstract":"Large Language Models (LLMs) can self-improve through reinforcement learning, where they generate trajectories to explore and discover better solutions. However, this exploration process is computationally expensive, often forcing current methods to assign limited exploration budgets to each task. This uniform allocation creates problematic edge cases: easy tasks consistently succeed while difficult tasks consistently fail, both producing zero gradients during training updates for the widely used Group Relative Policy Optimization (GRPO). We address this problem from the lens of exploration budget allocation. Viewing each task's exploration as an \"item\" with a distinct \"value\" and \"cost\", we establish a connection to the classical knapsack problem. 
This formulation allows us to derive an optimal assignment rule that adaptively distributes resources based on the model's current learning s...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-invisible-mentor-inferring-user-actions-from-screen-recordings-to-recommend-better-workflows","title":"The Invisible Mentor: Inferring User Actions from Screen Recordings to Recommend Better Workflows","url":"https://www.microsoft.com/en-us/research/publication/the-invisible-mentor-inferring-user-actions-from-screen-recordings-to-recommend-better-workflows/","published":"2025-09-29","authors":["Litao Yan","Andrew Head","K. Milne","Vu Le","Sumit Gulwani","Chris Parnin","Emerson Murphy-Hill"],"abstract":"Many users struggle to notice when a more efficient workflow exists in feature-rich tools like Excel. Existing AI assistants offer help only after users describe their goals or problems, which can be effortful and imprecise. We present InvisibleMentor, a system that turns screen recordings of task completion into vision-grounded reflections on tasks. It detects issues such as repetitive edits and recommends more efficient alternatives based on observed behavior. Unlike prior systems that rely on logs, APIs, or user prompts, InvisibleMentor operates directly on screen recordings. It uses a two-stage pipeline: a vision-language model reconstructs actions and context, and a language model generates structured, high-fidelity suggestions. 
In evaluation, InvisibleMentor accurately identified inefficient workflows, and participants found its suggestions more actionable, tailored, and more helpf...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772318.3790294","openalex_id":"https://openalex.org/W7154157734","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Computer science","1970-01-01","language model","efficient"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Pennsylvania"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/building-benchmarks-from-the-ground-up-community-centered-evaluation-of-llms-in-healthcare-chatbot-settings","title":"Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings","url":"https://www.microsoft.com/en-us/research/publication/building-benchmarks-from-the-ground-up-community-centered-evaluation-of-llms-in-healthcare-chatbot-settings/","published":"2025-09-29","authors":["H. Hamna","Gayatri Bhat","Sourabrata Mukherjee","Faisal Lalani","Evan Hadfield","Divya Siddarth","Kalika Bali","Sunayana Sitaram"],"abstract":"Large Language Models (LLMs) are typically evaluated through general or domain-specific benchmarks testing capabilities that often lack grounding in the lived realities of end users. Critical domains such as healthcare require evaluations that extend beyond artificial or simulated tasks to reflect the everyday needs, cultural practices, and nuanced contexts of communities. 
We propose Samiksha, a community-driven evaluation pipeline co-created with civil-society organizations (CSOs) and community members. Our approach enables scalable, automated benchmarking through a culturally aware, community-driven pipeline in which community feedback informs what to evaluate, how the benchmark is built, and how outputs are scored. We demonstrate this approach in the health domain in India. Our analysis highlights how current multilingual LLMs address nuanced community health queries, while also offer...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-the-mixture-of-experts-with-nadaraya-watson-kernel","title":"Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel","url":"https://www.microsoft.com/en-us/research/publication/understanding-the-mixture-of-experts-with-nadaraya-watson-kernel/","published":"2025-09-29","authors":["Chuanyang Zheng","Jiankai Sun","Yihang Gao","Enze Xie","Yuehao Wang","Peihao Wang","Ting Xu","Matthew Chang","Liliang Ren","Jingyao Li","Jing Xiong","Kashif Rasul"],"abstract":"Mixture-of-Experts (MoE) has become a cornerstone in recent state-of-the-art large language models (LLMs). 
Traditionally, MoE relies on [latex]\\mathrm{Softmax}[/latex] as the router score function to aggregate expert output, a designed choice that has persisted from the earliest MoE models to modern LLMs, and is now widely regarded as standard practice. However, the necessity of using [latex]\\mathrm{Softmax}[/latex] to project router weights into a probability simplex remains an unchallenged assumption rather than a principled design choice. In this work, we first revisit the classical Nadaraya-Watson regression and observe that MoE shares the same mathematical formulation as Nadaraya-Watson regression. Furthermore, we show that both feed-forward neural network (FFN) and MoE can be interpreted as a special case of Nadaraya-Watson regression, where the kernel function corresponds to the i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2509.25052","title":"Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning","url":"https://huggingface.co/papers/2509.25052","published":"2025-09-29","authors":["Tencent/Hunyuan"],"abstract":"The pursuit of artificial agents that can learn to master complex environments has led to remarkable successes, yet prevailing deep reinforcement learning methods often rely on immense experience, encoding their knowledge opaquely within neural network weights. 
We propose a different paradigm, one in which an agent learns to play by reasoning and planning. We introduce Cogito, ergo ludo (CEL), a novel agent architecture that leverages a Large Language Model (LLM) to build an explicit, language-based understanding of its environment's mechanics and its own strategy. Starting from a tabula rasa state with no prior knowledge (except action set), CEL operates on a cycle of interaction and reflection. After each episode, the agent analyzes its complete trajectory to perform two concurrent learning processes: Rule Induction, where it refines its explicit model of the environment's dynamics, an...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["HuggingFace org papers","tencent","LLM","language model","agent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/biasbusters-uncovering-and-mitigating-tool-selection-bias-in-large-language-models","title":"BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/biasbusters-uncovering-and-mitigating-tool-selection-bias-in-large-language-models/","published":"2025-09-29","authors":["Thierry Blankenstein","Jialin Yu","Zixuan Li","Vassilis Plachouras","Sunando Sengupta","Philip H. S. Torr","Yarin Gal","Alasdair Paren","Adel Bibi"],"abstract":"Agents backed by large language models (LLMs) often rely on external tools drawn from marketplaces where multiple providers offer functionally equivalent options. 
This raises a critical point concerning fairness: if selection is systematically biased, it can degrade user experience and distort competition by privileging some providers over others. We introduce a benchmark of diverse tool categories, each containing multiple functionally equivalent tools, to evaluate tool-selection bias. Using this benchmark, we test seven models and show that unfairness exists with models either fixating on a single provider or disproportionately preferring earlier-listed tools in context. To investigate the origins of this bias, we conduct controlled experiments examining tool features, metadata (name, description, parameters), and pre-training exposure. We find that: (1) semantic alignment between quer...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Qwen:2509.25084","title":"Scaling Generalist Data-Analytic Agents","url":"https://huggingface.co/papers/2509.25084","published":"2025-09-29","authors":["Alibaba/Qwen"],"abstract":"Data-analytic agents are emerging as a key catalyst for automated scientific discovery and for the vision of Innovating AI. Current approaches, however, rely heavily on prompt engineering over proprietary models, while open-source models struggle to face diverse-format, large-scale data files and long-horizon, multi-step reasoning that real-world analytics demands. 
This paper introduces DataMind, a scalable data synthesis and agent training recipe designed to build generalist data-analytic agents. DataMind tackles three key challenges in building open-source data-analytic agents, including insufficient data resources, improper training strategy, and unstable code-based multi-turn rollout. Concretely, DataMind applies 1) a fine-grained task taxonomy and a recursive easy-to-hard task composition mechanism to increase the diversity and difficulty of synthesized queries; 2) a knowledge-augme...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","Qwen","memory","agent"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"apple:c4gmnzha1df9ckasnmev6vtn","title":"StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant","url":"https://machinelearning.apple.com/research/proactive-streaming-assistant","published":"2025-09-29","authors":["Haibo Wang","Bo Feng","Zhengfeng Lai","Mingze Xu","Shiyu Li","Weifeng Ge","Afshin Dehghan","Meng Cao","Ping Huang"],"abstract":"We present StreamBridge, a simple yet effective framework that seamlessly transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models into online scenarios: (1) limited capability for multi-turn real-time understanding, and (2) lack of proactive response mechanisms. 
Specifically, StreamBridge incorporates (1) a memory buffer combined with a round-decayed compression strategy,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["language model","memory","compression"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:hwmcfpcpcw1u9z0g41ykatr0","title":"Checklists Are Better Than Reward Models For Aligning Language Models","url":"https://machinelearning.apple.com/research/checklists-are-better","published":"2025-09-29","authors":["Vijay Viswanathan","Yanchao Sun","Shuang Ma","Xiang Kong","Meng Cao","Graham Neubig","Tongshuang Wu"],"abstract":"Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this -- typically using fixed criteria such as \"helpfulness\" and \"harmfulness\". In our work, we instead propose using flexible, instruction-specific criteria as a means of broadening the impact that reinforcement learning can have in eliciting instruction following. 
We propose \"Reinforcement Learning from Checklist...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2509.25137","title":"The Era of Real-World Human Interaction: RL from User Conversations","url":"https://huggingface.co/papers/2509.25137","published":"2025-09-29","authors":["Chuanyang Jin","Jing Xu","Bo Liu","Leitian Tao","Olga Golovneva","Tianmin Shu","Wenting Zhao","Xian Li","Jason Weston"],"abstract":"We posit that to achieve continual model improvement and multifaceted alignment, future models must learn from natural human interaction. Current conversational models are aligned using pre-annotated, expert-generated human feedback. In this work, we introduce Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations. We develop two complementary methods: (1) RLHI with User-Guided Rewrites, which revises unsatisfactory model outputs based on users' natural-language follow-up responses, (2) RLHI with User-Based Rewards, which learns via a reward model conditioned on knowledge of the user's long-term interaction history (termed persona). Together, these methods link long-term user personas to turn-level preferences via persona-conditioned preference optimization. 
Trained on conversations derived from WildChat, both RLHI varian...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":43,"matched_keywords":["personalized","personalization","preference","long-term"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.24726","title":"Socratic-Zero : Bootstrapping Reasoning via Data-Free Agent Co-evolution","url":"https://huggingface.co/papers/2509.24726","published":"2025-09-29","authors":["Shaobo Wang","Zhengbo Jiao","Zifan Zhang","Yilang Peng","Xu Ze","Boyu Yang","Wei Wang","Hu Wei","Linfeng Zhang"],"abstract":"Recent breakthroughs in large language models (LLMs) on reasoning tasks rely heavily on massive, high-quality datasets-typically human-annotated and thus difficult to scale. While data synthesis or distillation offers a promising alternative, existing methods struggle with inconsistent data quality and an inability to dynamically adapt to the evolving capabilities of the model, leading to suboptimal training signals. To address these limitations, we introduce Socratic-Zero, a fully autonomous framework that generates high-quality training data from minimal seed examples through the co-evolution of three agents: the Teacher, the Solver, and the Generator. 
The Solver continuously refines its reasoning by learning from preference feedback on both successful and failed trajectories; the Teacher adaptively crafts increasingly challenging questions based on the Solver's weaknesses; and the Gen...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["preference","distillation","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.25161","title":"Rolling Forcing: Autoregressive Long Video Diffusion in Real Time","url":"https://huggingface.co/papers/2509.25161","published":"2025-09-29","authors":["Kunhao Liu","Wenbo Hu","Jiale Xu","Ying Shan","Shijian Lu"],"abstract":"Streaming video generation, as one fundamental component in interactive world models and neural game engines, aims to generate high-quality, low-latency, and temporally coherent long video streams. However, most existing work suffers from severe error accumulation that often significantly degrades the generated stream videos over long horizons. We design Rolling Forcing, a novel video generation technique that enables streaming long videos with minimal error accumulation. Rolling Forcing comes with three novel designs. First, instead of iteratively sampling individual frames, which accelerates error propagation, we design a joint denoising scheme that simultaneously denoises multiple frames with progressively increasing noise levels. This design relaxes the strict causality across adjacent frames, effectively suppressing error growth. 
Second, we introduce the attention sink mechanism int...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["long-term","efficient","distillation"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4414603932","title":"Semantic-Assisted Object Clustering for Multi-Modal Referring Video Segmentation","url":"https://doi.org/10.1109/tpami.2025.3612474","published":"2025-09-29","authors":["Yong Liu","Zhuoyan Luo","Yicheng Xiao","Yitong Wang","Shuyan Li","Xiu Li","Yujiu Yang","Yansong Tang"],"abstract":"This paper concentrates on Multi-modal Referring Video Segmentation task, where a well optimized model is able to recognize and segment the target objects referred by the given guidance signals, e.g., language description. Early approaches model this task as a sequence prediction problem. The lack of a global view of video content leads to difficulties in effectively utilizing inter-frame relationships. Some recent works propose to perform temporal modeling with vanilla attention mechanism. However, the condensed visual representation tends to be messy about target information due to occlusion or motion blur. Unlimited non-local operation would spread such noise to all the sequences and interfere with the extraction of global representations. To address the above issue, we present Semantic-assisted Object Cluster network (SOC) and the improved SOC++ in this paper. 
Our method unifies temp...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3612474","openalex_id":"https://openalex.org/W4414603932","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Queen's University Belfast","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8629000186920166},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7502999901771545},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5946000218391418},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5633999705314636},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.49559998512268066},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.45910000801086426},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.45489999651908875},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.44440001249313354}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2509.24695","title":"SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer","url":"https://huggingface.co/papers/2509.24695","published":"2025-09-29","authors":["Junsong Chen","Yuyang Zhao","Jincheng Yu","Ruihang Chu","Junyu Chen","Shuai Yang","Xianbang Wang","Yicheng Pan","Daquan Zhou","Huan Ling","Haozhe Liu","Hongwei Yi"],"abstract":"We introduce SANA-Video, a small diffusion model that can efficiently generate videos up to 720x1280 resolution and minute-length duration. 
SANA-Video synthesizes high-resolution, high-quality and long videos with strong text-video alignment at a remarkably fast speed, deployable on RTX 5090 GPU. Two core designs ensure our efficient, effective and long video generation: (1) Linear DiT: We leverage linear attention as the core operation, which is more efficient than vanilla attention given the large number of tokens processed in video generation. (2) Constant-Memory KV cache for Block Linear Attention: we design block-wise autoregressive approach for long video generation by employing a constant-memory state, derived from the cumulative properties of linear attention. This KV cache provides the Linear DiT with global context at a fixed memory cost, eliminating the need for a traditional....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["memory","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.25149","title":"Pretraining Large Language Models with NVFP4","url":"https://huggingface.co/papers/2509.25149","published":"2025-09-29","authors":["NVIDIA","Felix Abecassis","Anjulie Agrusa","Dong Ahn","Jonah Alben","Stefania Alborghetti","Michael Andersch","Sivakumar Arayandi","Alexis Bjorlin","Aaron Blakeman","Evan Briones","Ian Buck"],"abstract":"Large Language Models (LLMs) today are powerful problem solvers across many domains, and they continue to get stronger as they scale in model size, training set size, and training set quality, as shown by extensive research and experimentation across the industry. Training a frontier model today requires on the order of tens to hundreds of yottaflops, which is a massive investment of time, compute, and energy. 
Improving pretraining efficiency is therefore essential to enable the next generation of even more capable LLMs. While 8-bit floating point (FP8) training is now widely adopted, transitioning to even narrower precision, such as 4-bit floating point (FP4), could unlock additional improvements in computational speed and resource utilization. However, quantization at this level poses challenges to training stability, convergence, and implementation, notably for large-scale models trai...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","quantization"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.25189","title":"InfoAgent: Advancing Autonomous Information-Seeking Agents","url":"https://huggingface.co/papers/2509.25189","published":"2025-09-29","authors":["Gongrui Zhang","Jialiang Zhu","Ruiqi Yang","Kai Qiu","Miaosen Zhang","Zhirong Wu","Qi Dai","Bei Liu","Chong Luo","Zhengyuan Yang","Linjie Li","Lijuan Wang"],"abstract":"Building Large Language Model agents that expand their capabilities by interacting with external tools represents a new frontier in AI research and applications. In this paper, we introduce InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and orchestrated web search tools. To construct challenging, hard-to-find queries,we build entity trees and apply sub-tree sampling with entity fuzzification to systematically increase question difficulty. Unlike prior work that relies heavily on commercial search tools, we develop a dedicated self-hosted search infrastructure, enhancing transparency of agent environments and facilitating further advancement of agent capacity. 
We evaluate the effectiveness of our data pipeline by measuring the average number of tool calls required to correctly answer a question, and also show that our agent yields better performance when...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["language model","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.25182","title":"DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder","url":"https://huggingface.co/papers/2509.25182","published":"2025-09-29","authors":["Junyu Chen","Wenkun He","Yuchao Gu","Yuyang Zhao","Jincheng Yu","Junsong Chen","Dongyun Zou","Yujun Lin","Zhekai Zhang","Muyang Li","Haocheng Xi","Ligeng Zhu"],"abstract":"We introduce DC-VideoGen, a post-training acceleration framework for efficient video generation. DC-VideoGen can be applied to any pre-trained video diffusion model, improving efficiency by adapting it to a deep compression latent space with lightweight fine-tuning. The framework builds on two key innovations: (i) a Deep Compression Video Autoencoder with a novel chunk-causal temporal design that achieves 32x/64x spatial and 4x temporal compression while preserving reconstruction quality and generalization to longer videos; and (ii) AE-Adapt-V, a robust adaptation strategy that enables rapid and stable transfer of pre-trained models into the new latent space. Adapting the pre-trained Wan-2.1-14B model with DC-VideoGen requires only 10 GPU days on the NVIDIA H100 GPU. 
The accelerated models achieve up to 14.8x lower inference latency than their base counterparts without compromising quali...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["efficient","compression"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.24945","title":"MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes","url":"https://huggingface.co/papers/2509.24945","published":"2025-09-29","authors":["Changsheng Zhao","Ernie Chang","Zechun Liu","Chia-Jung Chang","Wei Wen","Chen Lai","Rick Cao","Yuandong Tian","Raghuraman Krishnamoorthi","Yangyang Shi","Vikas Chandra"],"abstract":"The paradigm shift in large language models (LLMs) from instinctive responses to chain-of-thought (CoT) reasoning has fueled two prevailing assumptions: (1) reasoning capabilities only emerge in sufficiently large models, and (2) such capabilities require training on massive datasets. While the first assumption has already been challenged by recent sub-billion-parameter reasoning models such as Qwen3-0.6B and DeepSeek distilled variants, the second remains largely unquestioned. In this work, we revisit the necessity of scaling to extremely large corpora (>10T tokens) for reasoning emergence. By carefully curating and resampling open-source datasets that we identify as beneficial under our designed metrics, we demonstrate that strong reasoning abilities can emerge with far less data. 
Specifically, we show that only ~2T tokens of high-quality data are sufficient, and pre-training with 4.2T...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["language model"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.25180","title":"DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space","url":"https://huggingface.co/papers/2509.25180","published":"2025-09-29","authors":["Wenkun He","Yuchao Gu","Junyu Chen","Dongyun Zou","Yujun Lin","Zhekai Zhang","Haocheng Xi","Muyang Li","Ligeng Zhu","Jincheng Yu","Junsong Chen","Enze Xie"],"abstract":"Existing text-to-image diffusion models excel at generating high-quality images, but face significant efficiency challenges when scaled to high resolutions, like 4K image generation. While previous research accelerates diffusion models in various aspects, it seldom handles the inherent redundancy within the latent space. To bridge this gap, this paper introduces DC-Gen, a general framework that accelerates text-to-image diffusion models by leveraging a deeply compressed latent space. Rather than a costly training-from-scratch approach, DC-Gen uses an efficient post-training pipeline to preserve the quality of the base model. A key challenge in this paradigm is the representation gap between the base model's latent space and a deeply compressed latent space, which can lead to instability during direct fine-tuning. 
To overcome this, DC-Gen first bridges the representation gap with a lightw...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pixelcraft-a-multi-agent-system-for-high-fidelity-visual-reasoning-on-structured-images","title":"PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images","url":"https://www.microsoft.com/en-us/research/publication/pixelcraft-a-multi-agent-system-for-high-fidelity-visual-reasoning-on-structured-images/","published":"2025-09-28","authors":["Shuoshuo Zhang","Zijian Li","Yizhen Zhang","Jingjing Fu","Lei Song","Jiang Bian","Jun Zhang","Yujiu Yang","Rui Wang"],"abstract":"Structured images (e.g., charts and geometric diagrams) remain challenging for multimodal large language models (MLLMs), as perceptual slips can cascade into erroneous conclusions. Intermediate visual cues can steer reasoning; however, existing cue-based methods are constrained with low-fidelity image processing and linear, rigid reasoning patterns, limiting their effectiveness on complex structured-image tasks. In this paper, we propose PixelCraft, a novel multi-agent system for high-fidelity image processing and flexible visual reasoning on structured images. The system comprises a dispatcher, a planner, a reasoner, critics, and a set of visual tool agents. To achieve high-fidelity processing, we construct a high-quality corpus and fine-tune an MLLM into a grounding model, whose pixel-level localizations are integrated with traditional computer vision (CV) algorithms in tool agents. 
Bu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Multimodal Large Language Models","1970-01-01","memory","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/when-mllms-meet-compression-distortion-a-coding-paradigm-tailored-to-mllms","title":"When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs","url":"https://www.microsoft.com/en-us/research/publication/when-mllms-meet-compression-distortion-a-coding-paradigm-tailored-to-mllms/","published":"2025-09-28","authors":["Jinming Liu","Zhaoyang Jia","Jiahao Li","Bin Li","Xin Jin","Wenjun Zeng","Yan Lu"],"abstract":"The increasing deployment of powerful Multimodal Large Language Models (MLLMs), typically hosted on cloud platforms, urgently requires effective compression techniques to efficiently transmit signal inputs (e.g., images, videos) from edge devices with minimal bandwidth usage. However, conventional image codecs are optimized for fidelity to serve the Human Visual System (HVS) and ill-suited for MLLMs, in which diverse downstream tasks are jointly considered. In this paper, we first systematically analyze the impact of compression artifacts on several mainstream MLLMs. We find that: Compression distortion unevenly impacts different-level image features, leading to varying effects on MLLMs'downstream tasks depending on their feature-level reliance. 
Motivated by this discovery, we propose an image Codec TAilored to MLLMs (CoTAM) designed to adaptively protect multi-level features and suit di...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","1970-01-01","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/warex-web-agent-reliability-evaluation-on-existing-benchmarks","title":"WAREX: Web Agent Reliability Evaluation on Existing Benchmarks","url":"https://www.microsoft.com/en-us/research/publication/warex-web-agent-reliability-evaluation-on-existing-benchmarks/","published":"2025-09-28","authors":["Su Kara","Fazle Faisal","Suman Nath"],"abstract":"Recent advances in browser-based LLM agents have shown promise for automating tasks ranging from simple form filling to hotel booking or online shopping. Current benchmarks measure agent performance in controlled environments, such as containers or stable networks, where websites behave deterministically. However, in the real world, users access websites over networks and HTTPS connections that introduce instability from multiple sources: client-side, server-side issues or broader system failures. Moreover, live websites are prone to web attacks such Cross-Site Scripting, as well as general site modifications which can cause unexpected or malicious pop-ups or improper functionality. 
To address this gap, we present WAREX: Web Agent Reliability Evaluation on Existing Benchmarks. We measure the impact of WAREX across three popular benchmarks: WebArena, WebVoyager, and REAL. Our experiments....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Systems and networking","Computer science","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hyperspherical-latents-improve-continuous-token-autoregressive-generation","title":"Hyperspherical Latents Improve Continuous-Token Autoregressive Generation","url":"https://www.microsoft.com/en-us/research/publication/hyperspherical-latents-improve-continuous-token-autoregressive-generation/","published":"2025-09-28","authors":["Guolin Ke","Hui Xue"],"abstract":"Autoregressive (AR) models are promising for image generation, yet continuous-token AR variants often trail latent diffusion and masked-generation models. The core issue is heterogeneous variance in VAE latents, which is amplified during AR decoding, especially under classifier-free guidance (CFG), and can cause variance collapse. We propose SphereAR to address this issue. Its core design is to constrain all AR inputs and outputs -- including after CFG -- to lie on a fixed-radius hypersphere (constant $\\ell2$ norm), leveraging hyperspherical VAEs. 
Our theoretical analysis shows that hyperspherical constraint removes the scale component (the primary cause of variance collapse), thereby stabilizing AR decoding. Empirically, on ImageNet generation, SphereAR-H (943M) sets a new state of the art for AR models, achieving FID 1.34. Even at smaller scales, SphereAR-L (479M) reaches FID 1.54 and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2509.23951","title":"HunyuanImage 3.0 Technical Report","url":"https://huggingface.co/papers/2509.23951","published":"2025-09-28","authors":["Tencent/Hunyuan"],"abstract":"We present HunyuanImage 3.0, a native multimodal model that unifies multimodal understanding and generation within an autoregressive framework, with its image generation module publicly available. The achievement of HunyuanImage 3.0 relies on several key components, including meticulous data curation, advanced architecture design, a native Chain-of-Thoughts schema, progressive model pre-training, aggressive model post-training, and an efficient infrastructure that enables large-scale training and inference. With these advancements, we successfully trained a Mixture-of-Experts (MoE) model comprising over 80 billion parameters in total, with 13 billion parameters activated per token during inference, making it the largest and most powerful open-source image generative model to date. 
We conducted extensive experiments and the results of automatic and human evaluation of text-image alignment...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","efficient"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:baidu:2509.23765","title":"Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality","url":"https://huggingface.co/papers/2509.23765","published":"2025-09-28","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"openalex:W4414773788","title":"UniAnimate: taming unified video diffusion models for consistent human image animation","url":"https://doi.org/10.1007/s11432-024-4592-3","published":"2025-09-28","authors":["Xiang Wang","Shiwei Zhang","Changxin Gao","Jiayu Wang","Xiaoqiang Zhou","Yingya Zhang","Luxin Yan","Nong 
Sang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11432-024-4592-3","openalex_id":"https://openalex.org/W4414773788","cited_by_count":11,"quality_score":48,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Hefei University of Technology","Huazhong University of Science and Technology","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8151999711990356},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6843000054359436},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.6704999804496765},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6599000096321106},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.5313000082969666},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5067999958992004},{"id":"https://openalex.org/C172849965","display_name":"Reference frame","score":0.45509999990463257},{"id":"https://openalex.org/C126042441","display_name":"Frame (networking)","score":0.4320000112056732}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"arxiv:2509.23873","title":"Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning","url":"https://huggingface.co/papers/2509.23873","published":"2025-09-28","authors":["Shaobo Wang","Jiaming Wang","Jiajun Zhang","Cong Wang","Yue Min","Zichen Wen","Fei Huang","Huiqiang Jiang","Junyang Lin","Dayiheng Liu","Linfeng Zhang"],"abstract":"As supervised fine-tuning (SFT) evolves from a 
lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies--high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose Quadrant-based Tuning (...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W7140277959","title":"Design of Intelligent Report Automatic Generation System and Optimization of Generative Algorithm in Power Business Scenarios","url":"https://doi.org/10.1109/actce66599.2025.00036","published":"2025-09-27","authors":["Li Chen","Hao Yang","Juntai Shi","Yixuan Chen","Xunquan Lu"],"abstract":"The digital transformation of power industry challenges the value mining of massive data, and the traditional report generation mode is inefficient and difficult to meet the professional and real-time requirements. 
In this paper, an intelligent automatic report generation system driven by business rules and optimized by generative algorithm is proposed to solve the problems of low efficiency, poor accuracy and poor readability of report generation in power scene. The system architecture is divided into a data fusion layer, a business rule layer, an intelligent generation layer, and an application interaction layer. A dynamic template engine is constructed through a knowledge graph, and combined with a rule engine and a domain fine tuned Large Language Model (LLM) to achieve automatic generation, verification, and optimization of reports. The experimental results show that the system can....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/actce66599.2025.00036","openalex_id":"https://openalex.org/W7140277959","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Baidu (China)","Shanghai Electric (China)","State Grid Corporation of China (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6230000257492065},{"id":"https://openalex.org/C163258240","display_name":"Power (physics)","score":0.4377000033855438},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3677000105381012},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3490999937057495},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.3165999948978424},{"id":"https://openalex.org/C2987595161","display_name":"Optimization algorithm","score":0.30720001459121704},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.29330000281333923},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.27950000762939453}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414564780","title":"Thing2Reality: Enabling Spontaneous Creation of 3D Objects from 2D Content using Generative AI in XR Meetings","url":"https://doi.org/10.1145/3746059.3747621","published":"2025-09-27","authors":["Erzhen Hu","Mingyi Li","Jungtaek Hong","Xun Qian","Alex Olwal","David Kim","Seongkook Heo","Ruofei Du"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746059.3747621","openalex_id":"https://openalex.org/W4414564780","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Google (Switzerland)","Google (United States)","Northeastern University","University of Virginia"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6000999808311462},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44339999556541443},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.4431999921798706},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3465000092983246},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3418000042438507},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3325999975204468},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.3264000117778778},{"id":"https://openalex.org/C167966045","display_name":"Generative 
model","score":0.3190000057220459}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"arxiv:2511.11930","title":"Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering","url":"http://arxiv.org/abs/2511.11930","published":"2025-09-27","authors":["Tianyu Xu","Jihan Li","Penghe Zu","Pranav Sahay","Maruchi Kim","Jack Obeng-Marnu","Farley Miller","Xun Qian","Katrina Passarella","Mahitha Rachumalla","Rajeev Nongpiur","Dong-Ryeol Shin"],"abstract":"In Extended Reality (XR), rendering sound that accurately simulates real-world acoustics is pivotal in creating lifelike and believable virtual experiences. However, existing XR spatial audio rendering methods often struggle with real-time adaptation to diverse physical scenes, causing a sensory mismatch between visual and auditory cues that disrupts user immersion. To address this, we introduce SAMOSA, a novel on-device system that renders spatially accurate sound by dynamically adapting to its physical environment. SAMOSA leverages a synergistic multimodal scene representation by fusing real-time estimations of room geometry, surface materials, and semantic-driven acoustic context. This rich representation then enables efficient acoustic calibration via scene priors, allowing the system to synthesize a highly realistic Room Impulse Response (RIR). 
We validate our system through technic...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746059.3747730","openalex_id":"https://openalex.org/W4417199954","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Cognizant (United States)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.8489000201225281},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7350000143051147},{"id":"https://openalex.org/C194969405","display_name":"Virtual reality","score":0.5823000073432922},{"id":"https://openalex.org/C171179263","display_name":"Auditory display","score":0.5092999935150146},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.46639999747276306},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.4325999915599823},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.3765000104904175},{"id":"https://openalex.org/C91607612","display_name":"Sonification","score":0.3686000108718872}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414565981","title":"QueryGenie: Making LLM-Based Database Querying Transparent and Controllable","url":"https://doi.org/10.1145/3746058.3758982","published":"2025-09-27","authors":["Longfei Chen","Shenghan Gao","S.L. Wang","K.-J. 
Lin","Yun Wang","Quan Li"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746058.3758982","openalex_id":"https://openalex.org/W4414565981","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research Asia (China)","ShanghaiTech University","Shenzhen Academy of Aerospace Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6488999724388123},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.48989999294281006},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.29840001463890076},{"id":"https://openalex.org/C5655090","display_name":"Relational database","score":0.2847999930381775},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.2605000138282776},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.25619998574256897},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2558000087738037},{"id":"https://openalex.org/C180198813","display_name":"Information system","score":0.25279998779296875}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2508.14395","title":"NoteIt: A System Converting Instructional Videos to Interactable Notes Through Multimodal Video Understanding","url":"http://arxiv.org/abs/2508.14395","published":"2025-09-27","authors":["Running Zhao","Zhihan Jiang","Xinchen Zhang","Chirui Chang","Handi Chen","Weipeng Deng","Luyao Jin","Xiaojuan Qi","Xun Qian","Edith C.‐H. 
Ngai"],"abstract":"Users often take notes for instructional videos to access key knowledge later without revisiting long videos. Automated note generation tools enable users to obtain informative notes efficiently. However, notes generated by existing research or off-the-shelf tools fail to preserve the information conveyed in the original videos comprehensively, nor can they satisfy users' expectations for diverse presentation formats and interactive features when using notes digitally. In this work, we present NoteIt, a system, which automatically converts instructional videos to interactable notes using a novel pipeline that faithfully extracts hierarchical structure and multimodal key information from videos. With NoteIt's interface, users can interact with the system to further customize the content and presentation formats of the notes according to their preferences. We conducted both a technical eva...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746059.3747626","openalex_id":"https://openalex.org/W4415239351","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Google (United States)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.838699996471405},{"id":"https://openalex.org/C170130773","display_name":"Usability","score":0.7436000108718872},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.713100016117096},{"id":"https://openalex.org/C2777601897","display_name":"Presentation (obstetrics)","score":0.7117999792098999},{"id":"https://openalex.org/C26517878","display_name":"Key 
(lock)","score":0.6564000248908997},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.6416000127792358},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4814000129699707},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.46970000863075256}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"hf-org-paper:huawei-noah:2509.22921","title":"Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective","url":"https://huggingface.co/papers/2509.22921","published":"2025-09-26","authors":["Huawei/Noah"],"abstract":"We introduce a novel approach to large language model (LLM) distillation by formulating it as a constrained reinforcement learning problem. While recent work has begun exploring the integration of task-specific rewards into distillation processes, existing methods typically rely on ad-hoc reward weighting. We propose a principled optimization framework that maximizes task-specific rewards while constraining the divergence from the teacher model to remain below a specified threshold. Our approach adapts constrained state augmented reinforcement learning to the distillation setting, introducing a modified reward function that maintains theoretical guarantees of constraint satisfaction without requiring state augmentation or teacher model access during deployment and without the computational overhead of the dual Lagrangian methods. 
Through extensive experiments on mathematical reasoning ta...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["HuggingFace org papers","huawei-noah","LLM","language model","efficient","distillation"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sysmobench-evaluating-ai-on-formally-modeling-complex-real-world-systems","title":"SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems","url":"https://www.microsoft.com/en-us/research/publication/sysmobench-evaluating-ai-on-formally-modeling-complex-real-world-systems/","published":"2025-09-26","authors":["Qian Cheng","Ruize Tang","Emilie Ma","Finn Hackett","Peiyang He","Yiming Su","Ivan Beschastnikh","Yu Huang","Xiaoxing Ma","Tianyin Xu"],"abstract":"Formal models are essential to specifying large, complex computer systems and verifying their correctness, but are notoriously expensive to write and maintain. Recent advances in generative AI show promise in generating certain forms of specifications. However, existing work mostly targets small code, not complete systems. It is unclear whether AI can deal with realistic system artifacts, as this requires abstracting their complex behavioral properties into formal models. We present SysMoBench, a benchmark that evaluates AI's ability to formally model large, complex systems. We focus on concurrent and distributed systems, which are keystones of today's critical computing infrastructures, encompassing operating systems and cloud infrastructure. 
We use TLA+, the de facto specification language for concurrent and distributed systems, though the benchmark can be extended to other specificati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","election"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:moonshotai:2509.23045","title":"Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents","url":"https://huggingface.co/papers/2509.23045","published":"2025-09-26","authors":["Moonshot/Kimi"],"abstract":"Large Language Models (LLMs) are increasingly applied to software engineering (SWE), with SWE-bench as a key benchmark. Solutions are split into SWE-Agent frameworks with multi-turn interactions and workflow-based Agentless methods with single-turn verifiable steps. We argue these paradigms are not mutually exclusive: reasoning-intensive Agentless training induces skill priors, including localization, code edit, and self-reflection that enable efficient and effective SWE-Agent adaptation. In this work, we first curate the Agentless training recipe and present Kimi-Dev, an open-source SWE LLM achieving 60.4\\% on SWE-bench Verified, the best among workflow approaches. With additional SFT adaptation on 5k publicly-available trajectories, Kimi-Dev powers SWE-Agents to 48.6\\% pass@1, on par with that of Claude 3.5 Sonnet (241022 version). 
These results show that structured skill priors from A...","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["HuggingFace org papers","moonshotai","LLM","efficient","agent"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/imaginationvellum-generative-ai-ideation-canvas-with-spatial-prompts-generative-strokes-and-ideation-history","title":"ImaginationVellum: Generative-AI Ideation Canvas with Spatial Prompts, Generative Strokes, and Ideation History","url":"https://www.microsoft.com/en-us/research/publication/imaginationvellum-generative-ai-ideation-canvas-with-spatial-prompts-generative-strokes-and-ideation-history/","published":"2025-09-26","authors":["Nicolai Marquardt","A. Roseway","Hugo ROMAT","Payod Panda","Michel Pahud","Gonzalo A. Ramos","S. Drucker","Andrew D. Wilson","Ken Hinckley","Nathalie Henry Riche"],"abstract":"We introduce ImaginationVellum, a multi-modal spatial canvas for early-stage visual ideation and concept sketching with generative AI. The resulting system supports a unique style of human-AI co-creation where the canvas is the prompt. This means that ImaginationVellum employs the entire 2D canvas as an active prompt space, where spatial arrangement, proximity, and composition of diverse content elements—inking, text, images, and intermediate results—steer generative visual outcomes. As a technical probe, ImaginationVellum contributes a set of spatially-grounded direct manipulation tools for iterative visual ideation. 
In particular, we introduce Generative Strokes—freeform strokes that spatially modulate generation and prompt-parameters (articulated along multiple latent semantic or stylistic dimensions). These techniques afford rapid traversal of design spaces via convergence, divergenc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2509.22601","title":"Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning","url":"https://huggingface.co/papers/2509.22601","published":"2025-09-26","authors":["Tencent/Hunyuan"],"abstract":"Reinforcement learning (RL) is the dominant paradigm for sharpening strategic tool use capabilities of LLMs on long-horizon, sparsely-rewarded agent tasks, yet it faces a fundamental challenge of exploration-exploitation trade-off. Existing studies stimulate exploration through the lens of policy entropy, but such mechanical entropy maximization is prone to RL training instability due to the multi-turn distribution shifting. In this paper, we target the progressive exploration-exploitation balance under the guidance of the agent own experiences without succumbing to either entropy collapsing or runaway divergence. We propose SPEAR, a curriculum-based self-imitation learning (SIL) recipe for training agentic LLMs. 
It extends the vanilla SIL framework, where a replay buffer stores self-generated promising trajectories for off-policy update, by gradually steering the policy evolution within...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","agent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"official:0602e06ecc46782a","title":"Gemini 2.5 Flash-Lite Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Flash-Lite-Model-Card.pdf","published":"2025-09-26","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 2.5 Flash-Lite"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:848261d214ddef55","title":"Gemini 2.5 Flash and Gemini 2.5 Flash Image Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Flash-Model-Card.pdf","published":"2025-09-26","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 2.5 Flash and Gemini 2.5 Flash Image"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"apple:oaubhwc86uuhay79dos3t01i","title":"Scaling Laws for Optimal Data Mixtures","url":"https://machinelearning.apple.com/research/optimal-data-mixtures","published":"2025-09-26","authors":["Mustafa Shukor","Louis Bethune","Dan Busbridge","David Grangier","Enrico Fini","Alaaeldin El-Nouby","Pierre Ablin"],"abstract":"Large foundation models are typically trained on data from multiple domains, with the data mixture—the proportion of each domain used—playing a critical role in model performance. The standard approach to selecting this mixture relies on trial and error, which becomes impractical for large-scale pretraining. We propose a systematic method to determine the optimal data mixture for any target domain using scaling laws. 
Our approach...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2509.22944","title":"SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights","url":"https://huggingface.co/papers/2509.22944","published":"2025-09-26","authors":["Lorenz K. Müller","Philippe Bich","Jiawei Zhuang","Ahmet Çelik","Luca Benfenati","Lukas Cavigelli"],"abstract":"Post-training quantization has emerged as the most widely used strategy for deploying large language models at low precision. Still, current methods show perplexity degradation at bit-widths less than or equal to 4, partly because representing outliers causes precision issues in parameters that share the same scales as these outliers. This problem is especially pronounced for calibration-free, uniform quantization methods. We introduce SINQ to augment existing post-training quantizers with an additional second-axis scale factor and a fast Sinkhorn-Knopp-style algorithm that finds scales to normalize per-row and per-column variances, thereby minimizing a novel per-matrix proxy target for quantization: the matrix imbalance. Our method has no interactions between layers and can be trivially applied to new architectures to quantize any linear layers. 
We evaluate our method on the Qwen3 model...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","quantization"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2510.01265","title":"RLP: Reinforcement as a Pretraining Objective","url":"https://huggingface.co/papers/2510.01265","published":"2025-09-26","authors":["Ali Hatamizadeh","Syeda Nahida Akter","Shrimai Prabhumoye","Jan Kautz","Mostofa Patwary","Mohammad Shoeybi","Bryan Catanzaro","Yejin Choi"],"abstract":"The dominant paradigm for training large reasoning models starts with pre-training using next-token prediction loss on vast amounts of data. Reinforcement learning, while powerful in scaling reasoning, is introduced only as the very last phase of post-training, preceded by supervised fine-tuning. While dominant, is this an optimal way of training? In this paper, we present RLP, an information-driven reinforcement pretraining objective, that brings the core spirit of reinforcement learning -- exploration -- to the last phase of pretraining. The key idea is to treat chain-of-thought as an exploratory action, with rewards computed based on the information gain it provides for predicting future tokens. This training objective essentially encourages the model to think for itself before predicting what comes next, thus teaching an independent thinking behavior earlier in the pretraining. 
More....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.22622","title":"LongLive: Real-time Interactive Long Video Generation","url":"https://huggingface.co/papers/2509.22622","published":"2025-09-26","authors":["Shuai Yang","Wei Huang","Ruihang Chu","Yicheng Xiao","Yuyang Zhao","Xianbang Wang","Muyang Li","Enze Xie","Yingcong Chen","Yao Lu","Song Han","Yukang Chen"],"abstract":"We present LongLive, a frame-level autoregressive (AR) framework for real-time and interactive long video generation. Long video generation presents challenges in both efficiency and quality. Diffusion and Diffusion-Forcing models can produce high-quality videos but suffer from low efficiency due to bidirectional attention. Causal attention AR models support KV caching for faster inference, but often degrade in quality on long videos due to memory challenges during long-video training. In addition, beyond static prompt-based generation, interactive capabilities, such as streaming prompt inputs, are critical for dynamic content creation, enabling users to guide narratives in real time. This interactive requirement significantly increases complexity, especially in ensuring visual consistency and semantic coherence during prompt transitions. 
To address these challenges, LongLive adopts a ca...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["memory"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2510.03264","title":"Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data","url":"https://huggingface.co/papers/2510.03264","published":"2025-09-26","authors":["Syeda Nahida Akter","Shrimai Prabhumoye","Eric Nyberg","Mostofa Patwary","Mohammad Shoeybi","Yejin Choi","Bryan Catanzaro"],"abstract":"The prevailing paradigm for enhancing the reasoning abilities of LLMs revolves around post-training on high-quality, reasoning-intensive data. While emerging literature suggests that reasoning data is increasingly incorporated also during the mid-training stage-a practice that is relatively more proprietary and less openly characterized-the role of such data in pretraining remains unclear. In particular, due to the opaqueness of pretraining corpora in most frontier models, the effect of reasoning data introduced at different phases of pre- and/or post-training is relatively less reported in the scientific literature. This raises several important questions: Is adding reasoning data earlier during pretraining any better than introducing it during post-training? 
Could earlier inclusion risk overfitting and harm generalization, or instead establish durable foundations that later fine-tuning...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.22072","title":"Fine-tuning Done Right in Model Editing","url":"https://huggingface.co/papers/2509.22072","published":"2025-09-26","authors":["Wanli Yang","Fei Sun","Rui Tang","Hongyu Zang","Du Su","Qi Cao","Jingang Wang","Huawei Shen","Xueqi Cheng"],"abstract":"Fine-tuning, a foundational method for adapting large language models, has long been considered ineffective for model editing. Here, we challenge this belief, arguing that the reported failure arises not from the inherent limitation of fine-tuning itself, but from adapting it to the sequential nature of the editing task, a single-pass depth-first pipeline that optimizes each sample to convergence before moving on. While intuitive, this depth-first pipeline coupled with sample-wise updating over-optimizes each edit and induces interference across edits. Our controlled experiments reveal that simply restoring fine-tuning to the standard breadth-first (i.e., epoch-based) pipeline with mini-batch optimization substantially improves its effectiveness for model editing. Moreover, fine-tuning in editing also suffers from suboptimal tuning parameter locations inherited from prior methods. 
Throug...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/benefits-and-pitfalls-of-reinforcement-learning-for-language-model-planning-a-theoretical-perspective","title":"Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective","url":"https://www.microsoft.com/en-us/research/publication/benefits-and-pitfalls-of-reinforcement-learning-for-language-model-planning-a-theoretical-perspective/","published":"2025-09-25","authors":["Siwei Wang","Yifei Shen","Haoran Sun","Shi Feng","Shang-Hua Teng","Li Dong","Yaru Hao","Wei Chen"],"abstract":"Recent reinforcement learning (RL) methods have substantially enhanced the planning capabilities of Large Language Models (LLMs), yet the theoretical basis for their effectiveness remains elusive. In this work, we investigate RL's benefits and limitations through a tractable graph-based abstraction, focusing on policy gradient (PG) and Q-learning methods. Our theoretical analyses reveal that supervised fine-tuning (SFT) may introduce co-occurrence-based spurious solutions, whereas RL achieves correct planning primarily through exploration, underscoring exploration's role in enabling better generalization. However, we also show that PG suffers from diversity collapse, where output diversity decreases during training and persists even after perfect accuracy is attained. By contrast, Q-learning provides two key advantages: off-policy learning and diversity preservation at convergence. 
We fu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Computer science","mathematics","Reinforcement learning","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/medha-efficient-llm-inference-on-multi-million-context-lengths-without-approximation","title":"Medha: Efficient LLM Inference on Multi-Million Context Lengths Without Approximation","url":"https://www.microsoft.com/en-us/research/publication/medha-efficient-llm-inference-on-multi-million-context-lengths-without-approximation/","published":"2025-09-25","authors":["Amey Agrawal","Haoran Qiu","Junda Chen","Íñigo Goiri","Chaojie Zhang","Rayyan Shahid","Ramachandran Ramjee","Alexey Tumanov","Esha Choukse"],"abstract":"As large language models (LLMs) handle increasingly longer contexts, serving long inference requests of millions of tokens presents unique challenges. We show that existing work for long context inference is largely based on techniques from long context training, and does not handle the high variability in input lengths during inference. This leads to inefficient resource utilization, server fragmentation, and head-of-line (HOL) blocking.We present Medha, an end-to-end system for efficient long-context LLM inference that addresses these challenges through fine-grained time sharing. 
Medha introduces three key innovations: (1) the mechanism of adaptive prefill chunking to help mitigate HOL blocking with preemption; (2) two new parallelism strategies: Sequence Pipeline Parallelism (SPP) to reduce time-to-first-token by pipelining prefill chunks, and KV-Cache Parallelism (KVP) to lower time-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Systems and networking","Ai systems","LLMs Inference","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/prore-a-proactive-reward-system-for-gui-agents-via-reasoner-actor-collaboration","title":"ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration","url":"https://www.microsoft.com/en-us/research/publication/prore-a-proactive-reward-system-for-gui-agents-via-reasoner-actor-collaboration/","published":"2025-09-25","authors":["Gaole Dai","Shiqi Jiang","Ting Cao","Yuqing Yang","Yuanchun Li","Rui Tan","Mo Li","Lili Qiu"],"abstract":"Reward is critical to the evaluation and training of large language models (LLMs). However, existing rule-based or model-based reward methods struggle to generalize to GUI agents, where access to ground-truth trajectories or application databases is often unavailable, and static trajectory-based LLM-as-a-Judge approaches suffer from limited accuracy. 
To address these challenges, we propose ProRe, a proactive reward system that leverages a general-purpose reasoner and domain-specific evaluator agents (actors). The reasoner schedules targeted state probing tasks, which the evaluator agents then execute by actively interacting with the environment to collect additional observations. This enables the reasoner to assign more accurate and verifiable rewards to GUI agents. Empirical results on over 3K trajectories demonstrate that ProRe improves reward accuracy and F1 score by up to 5.3% and 19...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-role-of-synthetic-data-in-multilingual-multi-cultural-ai-systems-lessons-from-indic-languages","title":"The role of synthetic data in Multilingual, Multi-cultural AI systems: Lessons from Indic Languages","url":"https://www.microsoft.com/en-us/research/publication/the-role-of-synthetic-data-in-multilingual-multi-cultural-ai-systems-lessons-from-indic-languages/","published":"2025-09-25","authors":["Pranjal A. Chitale","Varun Gumma","Sanchit Ahuja","Prashant Kodali","Manan Uppadhyay","Deepthi Sudharsan","Sunayana Sitaram"],"abstract":"Developing AI systems that operate effectively across languages while remaining culturally grounded is a long-standing challenge, particularly in low-resource settings. 
Synthetic data provides a promising avenue, yet its effectiveness in multilingual and multicultural contexts remains underexplored. We investigate the creation and impact of synthetic, culturally contextualized datasets for Indian languages through a bottom-up generation strategy that prompts large open-source LLMs (= 235B parameters) to ground data generation in language-specific Wikipedia content. This approach complements the dominant top-down paradigm of translating synthetic datasets from high-resource languages such as English. We introduce Updesh, a high-quality large-scale synthetic instruction-following dataset comprising 9.5M data points across 13 Indian languages, encompassing diverse reasoning and generative t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Human language technologies","Computer science","human language technologies"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:e316e74e15a6cc5e","title":"Gemini Robotics 1.5 Model Card","url":"https://storage.googleapis.com/deepmind-media/gemini-robotics/Gemini-Robotics-1-5-Tech-Report.pdf#page=30","published":"2025-09-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model 
card","Gemini Robotics 1.5"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:d961a6b24fba170b","title":"EmbeddingGemma Model Card","url":"https://ai.google.dev/gemma/docs/embeddinggemma/model_card","published":"2025-09-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","EmbeddingGemma"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:5bcfb9e17345cfc1","title":"Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets","url":"https://huggingface.co/papers/2509.21245","published":"2025-09-25","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4414497210","title":"Creative scar without generative AI: Individual creativity fails to sustain while homogeneity keeps climbing","url":"https://doi.org/10.1016/j.techsoc.2025.103087","published":"2025-09-25","authors":["Yiyong Zhou","QingHan 
Liu","Jihao Huang","Guiquan Li"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.techsoc.2025.103087","openalex_id":"https://openalex.org/W4414497210","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["BeiGene (China)","Ministry of Education","Ministry of Education and Child Care","Peking University","Peking University Sixth Hospital","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.8762000203132629},{"id":"https://openalex.org/C11012388","display_name":"Creativity","score":0.8366000056266785},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5008000135421753},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.45719999074935913},{"id":"https://openalex.org/C142259097","display_name":"Homogeneity (statistics)","score":0.43950000405311584},{"id":"https://openalex.org/C70789860","display_name":"The arts","score":0.42559999227523804},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.4050999879837036},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.35569998621940613}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"arxiv:2509.21319","title":"RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards","url":"https://huggingface.co/papers/2509.21319","published":"2025-09-25","authors":["Zhilin Wang","Jiaqi Zeng","Olivier Delalleau","Ellie Evans","Daniel Egert","Hoo-Chang Shin","Felipe Soares","Yi Dong","Oleksii Kuchaiev"],"abstract":"Reinforcement Learning with Human Feedback (RLHF) and 
Reinforcement Learning with Verifiable Rewards (RLVR) are the main RL paradigms used in LLM post-training, each offering distinct advantages. However, RLHF struggles with interpretability and reward hacking because it relies on human judgments that usually lack explicit criteria, whereas RLVR is limited in scope by its focus on correctness-based verifiers. We propose Reinforcement Learning with Binary Flexible Feedback (RLBFF), which combines the versatility of human-driven preferences with the precision of rule-based verification, enabling reward models to capture nuanced aspects of response quality beyond mere correctness. RLBFF extracts principles that can be answered in a binary fashion (e.g. accuracy of information: yes, or code readability: no) from natural language feedback. Such principles can then be used to ground Reward Mod...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2509.21042","title":"Behind RoPE: How Does Causal Mask Encode Positional Information?","url":"https://huggingface.co/papers/2509.21042","published":"2025-09-25","authors":["Junu Kim","Xiao Liu","Zhenghao Lin","Lei Ji","Yeyun Gong","Edward Choi"],"abstract":"While explicit positional encodings such as RoPE are a primary source of positional information in Transformer decoders, the causal mask also provides positional information. In this work, we prove that the causal mask can induce position-dependent patterns in attention scores, even without parameters or causal dependency in the input. 
Our theoretical analysis indicates that the induced attention pattern tends to favor nearby query-key pairs, mirroring the behavior of common positional encodings. Empirical analysis confirms that trained models exhibit the same behavior, with learned parameters further amplifying these patterns. Notably, we found that the interaction of causal mask and RoPE distorts RoPE's relative attention score patterns into non-relative ones. We consistently observed this effect in modern large language models, suggesting the importance of considering the causal mask....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cad-tokenizer-towards-text-based-cad-prototyping-via-modality-specific-tokenization","title":"CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization","url":"https://www.microsoft.com/en-us/research/publication/cad-tokenizer-towards-text-based-cad-prototyping-via-modality-specific-tokenization/","published":"2025-09-24","authors":["Ruiyu Wang","Shizhao Sun","Weijian Ma","Jiang Bian"],"abstract":"Computer-Aided Design (CAD) is a foundational component of industrial prototyping, where models are defined not by raw coordinates but by construction sequences such as sketches and extrusions. This sequential structure enables both efficient prototype initialization and subsequent editing. Text-guided CAD prototyping, which unifies Text-to-CAD generation and CAD editing, has the potential to streamline the entire design pipeline. 
However, prior work has not explored this setting, largely because standard large language model (LLM) tokenizers decompose CAD sequences into natural-language word pieces, failing to capture primitive-level CAD semantics and hindering attention modules from modeling geometric structure. We conjecture that a multimodal tokenization strategy, aligned with CAD's primitive and structural nature, can provide more effective representations. To this end, we propose C...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Computer-Aided Design","1970-01-01","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:fb4809968faa0b08","title":"CWM: An Open-Weights LLM for Research on CodeGeneration with World Models","url":"https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation-with-world-models/","published":"2025-09-24","authors":["Jade Copet","Quentin Carbonneaux","Gal Cohen","Jonas Gehring","Jacob Kahn","Jannik Kossen","Felix Kreuk","Emily McMilin","Michel Meyer","Yuxiang Wei","David Zhang","Kunhao Zheng"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["NLP","LLM"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=3"}},{"id":"openalex:W4414463097","title":"Agents Are Not Enough","url":"https://doi.org/10.1109/mc.2025.3575829","published":"2025-09-24","authors":["Chirag Shah","Ryen W. White"],"abstract":"As artificial intelligence (AI) integration grows, autonomous agents are experiencing a resurgence, and examining their past incarnations reveals lessons about what worked and what failed that can inform current development. We propose combining generative AI with a robust ecosystem to make the current wave of agents more effective and sustainable.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mc.2025.3575829","openalex_id":"https://openalex.org/W4414463097","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47510001063346863},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41609999537467957},{"id":"https://openalex.org/C148043351","display_name":"Current (fluid)","score":0.336899995803833},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3301999866962433},{"id":"https://openalex.org/C13687954","display_name":"Autonomous 
agent","score":0.3273000121116638},{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.296099990606308},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.2921999990940094},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.2818000018596649}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4414485088","title":"The Illusion of Thinking","url":"https://doi.org/10.70777/si.v2i6.15919","published":"2025-09-23","authors":["Parshin Shojaee","Iman Mirzadeh","Keivan Alizadeh","Maxwell Horton","Samy Bengio","Mehrdad Farajtabar"],"abstract":"Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. 
This setup enables the analysis of not only final answers but also the i...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.70777/si.v2i6.15919","openalex_id":"https://openalex.org/W4414485088","cited_by_count":102,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Apple (Germany)","Apple (Israel)","Apple (United States)"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6791999936103821},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6399000287055969},{"id":"https://openalex.org/C101097943","display_name":"Counterintuitive","score":0.6363999843597412},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5034000277519226},{"id":"https://openalex.org/C103057564","display_name":"Analytic reasoning","score":0.4779999852180481},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.4417000114917755},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3840999901294708},{"id":"https://openalex.org/C83725634","display_name":"Qualitative reasoning","score":0.3824999928474426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":102}},{"id":"apple:qfqx20o4c99xtqyv0rmyca37","title":"EpiCache: Episodic KV Cache Management for Long Conversational Question Answering","url":"https://machinelearning.apple.com/research/epicache","published":"2025-09-23","authors":["Minsoo Kim","Arnav Kundu","Han-Byul Kim","Richa Dixit","Minsik Cho"],"abstract":"Recent advances in large language models (LLMs) have extended context lengths, enabling assistants to sustain long histories for coherent, personalized responses. 
This ability, however, hinges on Key-Value (KV) caching, whose memory grows linearly with dialogue length and quickly dominates under strict resource constraints. An active line of research for reducing this overhead is KV cache compression, which seeks to limit cache size while...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["personalized","memory","compression"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:9a3b12386b801da1","title":"MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interactions","url":"https://ai.meta.com/research/publications/metaembed-scaling-multimodal-retrieval-at-test-time-with-flexible-late-interactions/","published":"2025-09-23","authors":["Zilin Xiao","Qi Ma","Mengting Gu","Jason Chen","Xintao Chen","Vicente Ordonez","Vijai Mohan"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["NLP","retrieval"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=3"}},{"id":"apple:wbeh28db5ifklwjpn8fbggk4","title":"Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent 
Detection","url":"https://machinelearning.apple.com/research/adversarial-distilled","published":"2025-09-23","authors":["Yihao Guo","Haocheng Bian","Liutong Zhou","Ze Wang","Zhaoyi Zhang","Francois Kawala","Milan Dean","Ian Fischer","Yuantao Peng","Noyan Tokgozoglu","Ivan Barrientos","Riyaaz Shaik"],"abstract":"With the deployment of Large Language Models (LLMs) in interactive applications, online malicious intent detection has become increasingly critical. However, existing approaches fall short of handling diverse and complex user queries in real time. To address these challenges, we introduce ADRAG (Adversarial Distilled Retrieval-Augmented Guard), a two-stage framework for robust and efficient online malicious intent detection. In the training...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:8f61521075ec3d57","title":"Qwen3Guard: Real-time Safety for Your Token Stream","url":"https://qwenlm.github.io/blog/qwen3guard/","published":"2025-09-23","authors":["Alibaba/Qwen"],"abstract":"We are excited to introduce Qwen3Guard, the first safety guardrail model in the Qwen family. 
Built upon the powerful Qwen3 foundation models and fine-tuned specifically for safety classification, Qwen3Guard ensures responsible AI interactions by delivering precise safety detection for both prompts and responses, complete with risk levels and categorized classifications for accurate moderation. Qwen3Guard achieves state-of-the-art performance on major safety benchmarks, demonstrating strong capabilities in both prompt and response classification tasks across English, Chinese, and multilingual environments.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"apple:wxd8de5mvxi975ckehrolc1m","title":"MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs","url":"https://machinelearning.apple.com/research/mm-spatial","published":"2025-09-23","authors":["Erik Daxberger","Nina Wenzel","David Griffiths","Haiming Gang","Justin Lazarow","Gefen Kohavi","Kai Kang","Marcin Eichner","Yinfei Yang","Afshin Dehghan","Peter Grasch"],"abstract":"Multimodal large language models (MLLMs) excel at 2D visual understanding but remain limited in their ability to reason about 3D space. In this work, we leverage large-scale high-quality 3D scene data with open-set annotations to introduce 1) a novel supervised fine-tuning dataset and 2) a new evaluation benchmark, focused on indoor scenes. 
Our Cubify Anything VQA (CA-VQA) data covers diverse spatial tasks including spatial relationship...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:btbdwtp3g0y8nibxgj7278vn","title":"Classifier-Free Guidance is a Predictor-Corrector","url":"https://machinelearning.apple.com/research/classifier-free-guidance","published":"2025-09-23","authors":["Arwen Bradley","Preetum Nakkiran"],"abstract":"We investigate the theoretical foundations of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. 
In this paper, we disprove common misconceptions, by showing that CFG interacts differently with DDPM (Ho et al., 2020) and DDIM (Song et al., 2021), and neither sampler with CFG generates the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4414419493","title":"A collaborative large language model for drug analysis","url":"https://doi.org/10.1038/s41551-025-01471-z","published":"2025-09-23","authors":["Hongjian Zhou","Fenglin Liu","Jinge Wu","Wenjun Zhang","Guowei Huang","Lei Clifton","David W. Eyre","Haochen Luo","F. Liu","Kim Branson","Patrick Schwab","Xian Wu"],"abstract":"Large language models (LLMs), such as ChatGPT, have substantially helped in understanding human inquiries and generating textual content with human-level fluency. However, directly using LLMs in healthcare applications faces several problems. LLMs are prone to produce hallucinations, or fluent content that appears reasonable and genuine but that is factually incorrect. Ideally, the source of the generated content should be easily traced for clinicians to evaluate. We propose a knowledge-grounded collaborative large language model, DrugGPT, to make accurate, evidence-based and faithful recommendations that can be used for clinical decisions. 
DrugGPT incorporates diverse clinical-standard knowledge bases and introduces a collaborative mechanism that adaptively analyses inquiries, captures relevant knowledge sources and aligns these inquiries and knowledge sources when dealing with differen...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41551-025-01471-z","openalex_id":"https://openalex.org/W4414419493","cited_by_count":8,"quality_score":49,"matched_keywords":["language model"],"author_affiliations":["GlaxoSmithKline (United Kingdom)","Nuffield Health","Open Data Institute","Science Oxford","Suzhou Research Institute","Tencent (China)","University College London","University of Oxford","Westlake University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6879000067710876},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.6793000102043152},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44589999318122864},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4359999895095825},{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.4153999984264374},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3752000033855438},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3702000081539154},{"id":"https://openalex.org/C2780035454","display_name":"Drug","score":0.36570000648498535}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4414563742","title":"A survey for large language models in 
biomedicine","url":"https://doi.org/10.1016/j.artmed.2025.103268","published":"2025-09-23","authors":["Chong Wang","Mengyao Li","Junjun He","Zhongruo Wang","Erfan Darzi","Zan Chen","Jin Ye","Tianbin Li","Yanzhou Su","Jing Ke","Kaili Qu","Shuxin Li"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.artmed.2025.103268","openalex_id":"https://openalex.org/W4414563742","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Boston Children's Hospital","Institute of Natural Science","Johns Hopkins University","Johns Hopkins University Applied Physics Laboratory","Monash University","Shanghai Artificial Intelligence Laboratory","Shanghai Jiao Tong University","University of Cambridge","Xinxiang Medical University"],"concepts":[{"id":"https://openalex.org/C66782513","display_name":"Biomedicine","score":0.9340999722480774},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6588000059127808},{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.6014000177383423},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5353000164031982},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.4025999903678894},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.3774999976158142},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.3546999990940094},{"id":"https://openalex.org/C55587333","display_name":"Engineering ethics","score":0.35260000824928284}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4414427187","title":"Language models 
reveal a complex sequence basis for adaptive convergent evolution of protein functions","url":"https://doi.org/10.1073/pnas.2418254122","published":"2025-09-23","authors":["Zhenqiu Cao","Hongjiu Zhang","Zhengting Zou"],"abstract":"Convergent evolution, or convergence, refers to repeated, independent emergences of the same trait in two or more lineages of species during evolution, often indicating functional adaptation to specific environmental factors. Many computational methods have been proposed to investigate the genetic basis for organismal functional convergence, as an important way to decode the complex sequence-function map of proteins. These methods mostly focus on the convergence of amino acid states at the level of individual sites in functionally related proteins. However, even without site-level sequence similarity, protein function similarity may also stem from convergence of high-order protein features, which cannot be captured by the conventional methods. To fill this gap, we first derived numerical embeddings from protein sequences by pretrained protein language models (PLM). 
In four previously rep...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1073/pnas.2418254122","openalex_id":"https://openalex.org/W4414427187","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Institute of Zoology","Microsoft (United States)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C62142553","display_name":"Convergent evolution","score":0.7799000144004822},{"id":"https://openalex.org/C2777303404","display_name":"Convergence (economics)","score":0.7096999883651733},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49070000648498535},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.4361000061035156},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.42089998722076416},{"id":"https://openalex.org/C10010492","display_name":"Protein sequencing","score":0.4171999990940094},{"id":"https://openalex.org/C12426560","display_name":"Basis (linear algebra)","score":0.41359999775886536},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41100001335144043}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"arxiv:2509.19228","title":"CompLLM: Compression for Long Context Q&A","url":"https://huggingface.co/papers/2509.19228","published":"2025-09-23","authors":["Gabriele Berton","Jayakrishnan Unnikrishnan","Son Tran","Mubarak Shah"],"abstract":"Large Language Models (LLMs) face significant computational challenges when processing long contexts due to the quadratic complexity of self-attention. 
While soft context compression methods, which map input text to smaller latent representations, have shown promise, their real-world adoption is limited. Existing techniques typically compress the context as a single unit, which leads to quadratic compression complexity and an inability to reuse computations across queries with overlapping contexts. In this work, we introduce CompLLM, a soft compression technique designed for practical deployment. Instead of processing the context holistically, CompLLM divides it into segments and compresses each one independently. This simple design choice yields three critical properties: efficiency, as the compression step scales linearly with the context length; scalability, enabling models trained on...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["compression"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/stackfeed","title":"STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with Feedback","url":"https://www.microsoft.com/en-us/research/publication/stackfeed/","published":"2025-09-22","authors":["Naman Gupta","Shashank Kirtania","Priyanshu Gupta","Krishna Kariya","Sumit Gulwani","Arun Iyer","Suresh Parthasarathy","Arjun Radhakrishna","Sriram Rajamani","Gustavo Soares"],"abstract":"Large Language Models (LLMs) are increasingly used for complex software engineering tasks but often generate incorrect or outdated code. 
Retrieval-Augmented Generation systems attempt to solve this by using external knowledge bases (KB) like API documentation, but in the fast-paced world of software development, this documentation itself quickly becomes outdated. To address this critical gap, we introduce STACKFEED, a novel Structured Textual Actor-Critic Knowledge base editing with FEEDback approach that iteratively refines documentation using feedback from oracles, such as compiler errors or test failures, via a multi-actor, centralized critic architecture. Each document in the KB is managed by a dedicated ReACT actor agent that performs structured edits based on targeted instructions from the critic. We demonstrate STACKFEED’s effectiveness on challenging software engineering scenario...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Machine learning","Multi-agent system","1970-01-01","retrieval","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/actions-speak-louder-than-prompts-a-large-scale-study-of-llms-for-graph-inference","title":"Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference","url":"https://www.microsoft.com/en-us/research/publication/actions-speak-louder-than-prompts-a-large-scale-study-of-llms-for-graph-inference/","published":"2025-09-22","authors":["Ben Finkelshtein","Silviu Cucerzan","Sujay Kumar Jauhar","Ryen W. 
White"],"abstract":"Large language models (LLMs) are increasingly used for text-rich graph machine learning tasks such as node classification in high-impact domains like fraud detection and recommendation systems. Yet, despite a surge of interest, the field lacks a principled understanding of the capabilities of LLMs in their interaction with graph data. In this work, we conduct a large-scale, controlled evaluation across several key axes of variability to systematically assess the strengths and weaknesses of LLM-based graph reasoning methods in text-based applications. The axes include the LLM-graph interaction mode, comparing prompting, tool-use, and code generation; dataset domains, spanning citation, web-link, e-commerce, and social networks; structural regimes contrasting homophilic and heterophilic graphs; feature characteristics involving both short- and long-text node attributes; and model configura...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-illusion-of-readiness-in-health-ai","title":"The Illusion of Readiness in Health AI","url":"https://www.microsoft.com/en-us/research/publication/the-illusion-of-readiness-in-health-ai/","published":"2025-09-22","authors":["Yu Gu","Jingjing Fu","Xiaodong Liu","Jeya Maria Jose Valanarasu","Noel Codella","Reuben Tan","Jinyu 
Wang","Qianchu Liu","Ying Jin","Sheng Zhang","Rui Wang","Lei Song"],"abstract":"Large language models have demonstrated remarkable performance in a wide range of medical benchmarks. Yet underneath the seemingly promising results lie salient growth areas, especially in cutting-edge frontiers such as multimodal reasoning. In this paper, we introduce a series of adversarial stress tests to systematically assess the robustness of flagship models and medical benchmarks. Our study reveals prevalent brittleness in the presence of simple adversarial transformations: leading systems can guess the right answer even with key inputs removed, yet may get confused by the slightest prompt alterations, while fabricating convincing yet flawed reasoning traces. Using clinician-guided rubrics, we demonstrate that popular medical benchmarks vary widely in what they truly measure. Our study reveals significant competency gaps of frontier AI in attaining real-world readiness for health a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:314","title":"MEF: A Systematic Evaluation Framework for Text-to-Image Models","url":"https://seed.bytedance.com/en/research/mef-a-systematic-evaluation-framework-for-text-to-image-models","published":"2025-09-22","authors":["Xiaojing Dong","Weilin Huang","Liang Li","Yiying Li","Shu Liu","Tongtong Ou","Shuang Ouyang","Yu Tian","Fengxuan 
Zhao"],"abstract":"Rapid advances in text-to-image (T2I) generation have raised higher requirements for evaluation methodologies. Existing benchmarks center on objective capabilities and dimensions, but lack an application-scenario perspective, limiting external validity. Moreover, current evaluations typically rely on either ELO for overall ranking or MOS for dimension-specific scoring, yet both methods have inherent shortcomings and limited interpretability. Therefore, we introduce the Magic Evaluation Framework (MEF), a systematic and practical approach for evaluating T2I models. First, we propose a structured taxonomy encompassing user scenarios, elements, element compositions, and text expression forms to construct the Magic-Bench-377, which supports label-level assessment and ensures a balanced coverage of both user scenarios and capabilities. On this basis, we combine ELO and dimension-specific MOS....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Artificial Intelligence","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:lj0fd3j22lp8uyvzl3o9u1rl","title":"UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation","url":"https://machinelearning.apple.com/research/unigen-enhanced-training","published":"2025-09-22","authors":["Rui Tian","Mingfei Gao","Mingze Xu","Jiaming Hu","Jiasen Lu","Zuxuan Wu","Yinfei Yang","Afshin Dehghan"],"abstract":"We introduce UniGen, a unified multimodal large language model (MLLM) capable of image understanding and generation. 
We study the full training pipeline of UniGen from a data-centric perspective, including multi-stage pre-training, supervised fine-tuning, and direct preference optimization. More importantly, we propose a new Chain-of-Thought Verification (CoT-V) strategy for test-time scaling, which significantly boosts UniGen's image generation...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["language model","preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"hf-org-paper:Qwen:2509.17765","title":"Qwen3-Omni Technical Report","url":"https://huggingface.co/papers/2509.17765","published":"2025-09-22","authors":["Alibaba/Qwen"],"abstract":"We present Qwen3-Omni, a single multimodal model that, for the first time, maintains state-of-the-art performance across text, image, audio, and video without any degradation relative to single-modal counterparts. Qwen3-Omni matches the performance of same-sized single-modal models within the Qwen series and excels particularly on audio tasks. Across 36 audio and audio-visual benchmarks, Qwen3-Omni achieves open-source SOTA on 32 benchmarks and overall SOTA on 22, outperforming strong closed-source models such as Gemini-2.5-Pro, Seed-ASR, and GPT-4o-Transcribe. Qwen3-Omni adopts a Thinker-Talker MoE architecture that unifies perception and generation across text, images, audio, and video, yielding fluent text and natural real-time speech. It supports text interaction in 119 languages, speech understanding in 19 languages, and speech generation in 10 languages. 
To reduce first-packet late...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"apple:uuhng56lavudjkl3aqslw0a8","title":"Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment","url":"https://machinelearning.apple.com/research/guiding-cross-modal","published":"2025-09-22","authors":["Pengfei Zhao","Rongbo Luan","Wei Zhang","Peng Wu","Sifeng He"],"abstract":"Despite Contrastive Language-Image Pretraining (CLIP)'s remarkable capability to retrieve content across modalities, a substantial modality gap persists in its feature space. Intriguingly, we discover that off-the-shelf MLLMs (Multimodal Large Language Models) demonstrate powerful inherent modality alignment properties. 
While recent MLLM-based retrievers with unified architectures partially mitigate this gap, their reliance on coarse modality...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:vqom9gft9kvtymi4xm9pbu94","title":"Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality","url":"https://machinelearning.apple.com/research/datasets-documents-repetitions","published":"2025-09-22","authors":["Alex Fang","Hadi Pouransari","Matt Jordan","Alexander Toshev","Vaishaal Shankar","Ludwig Schmidt","Tom Gunter"],"abstract":"Data filtering has become a powerful tool for improving model performance while reducing computational cost. However, as large language model compute budgets continue to grow, the limited data volume provided by heavily filtered and deduplicated datasets will become a practical constraint. 
In efforts to better understand how to proceed, we study model performance at various compute budgets and across multiple pre-training datasets created through...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:nz97cd5l3kmozvh5ybyuj6ox","title":"On Inductive Biases That Enable Generalization of Diffusion Transformers","url":"https://machinelearning.apple.com/research/on-inductive-biases","published":"2025-09-22","authors":["Jie An","De Wang","Pengsheng Guo","Jiebo Luo","Alexander Schwing"],"abstract":"Recent work studying the generalization of diffusion models with UNet-based denoisers reveals inductive biases that can be expressed via geometry-adaptive harmonic bases. However, in practice, more recent denoising networks are often based on transformers, e.g., the diffusion transformer (DiT). 
This raises the question: do transformer-based denoising networks exhibit inductive biases that can also be expressed via geometry-adaptive harmonic...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:vjms4v4o5rxt1l6sn6ud0jge","title":"MobileCLIP2: Improving Multi-Modal Reinforced Training","url":"https://machinelearning.apple.com/research/mobileclip2","published":"2025-09-22","authors":["Fartash Faghri","Pavan Kumar Anasossalu Vasu","Cem Koc","Vaishaal Shankar","Alexander Toshev","Oncel Tuzel","Hadi Pouransari"],"abstract":"This paper received Featured Certification from Transactions on Machine Learning Research (TMLR) 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:n8n7sldsntqvyq3sasqvzu9j","title":"Instance-Optimality for Private KL Distribution Estimation","url":"https://machinelearning.apple.com/research/instance-optimality","published":"2025-09-22","authors":["Jiayuan Ye","Vitaly Feldman","Kunal Talwar"],"abstract":"We study the fundamental problem of estimating an unknown discrete distribution p over d symbols, given n i.i.d. samples from the distribution. 
We are interested in minimizing the KL divergence between the true distribution and the algorithm's estimate. We first construct minimax optimal private estimators. Minimax optimality however fails to shed light on an algorithm's performance on individual (non-worst-case) instances p and simple...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4414419591","title":"ADAgent: LLM Agent for Alzheimer’s Disease Analysis with Collaborative Coordinator","url":"https://doi.org/10.1007/978-3-032-06004-4_3","published":"2025-09-22","authors":["Wenlong Hou","Guangqian Yang","Ye Du","Yeung Lau","Lihao Liu","Junjun He","Линг Лонг","Shujun Wang"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-032-06004-4_3","openalex_id":"https://openalex.org/W4414419591","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","agent"],"author_affiliations":["Amazon (United States)","Beijing Academy of Artificial Intelligence","Hong Kong Polytechnic University","Seattle University","Shanghai Artificial Intelligence Laboratory","Sun Yat-sen University","Third Affiliated Hospital of Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.8496000170707703},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6492999792098999},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.593999981880188},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5467000007629395},{"id":"https://openalex.org/C2780665704","display_name":"Intervention (counseling)","score":0.5015000104904175},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4758000075817108},{"id":"https://openalex.org/C534262118","display_name":"Medical diagnosis","score":0.4422000050544739},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.38260000944137573}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414399064","title":"Transformer-based multimodal learning for predicting mechanical properties in heat-treated stainless steel","url":"https://doi.org/10.1016/j.matdes.2025.114800","published":"2025-09-22","authors":["Xuefei Wang","Shijie Zhang","Di Jiang","Winnie Yu","Yihao Zheng","Chunyang Luo","Haojie Wang","Zhaodong Wang"],"abstract":"• Innovative Multimodal Learning Approach: Introduces a novel multimodal learning framework utilizing a Transformer-based model to predict the mechanical properties of vacuum carburized martensitic stainless steel. • Integration of Diverse Data Sources: Successfully combines microstructure images, material composition, and process parameters to construct a comprehensive model for predicting hardness and wear resistance. • Enhanced Prediction Accuracy: Achieves significant improvements in prediction accuracy, with an R 2 value of 0.98 and a mean absolute error (MAE) of 5.23 HV for hardness prediction. 
• Application of Variational Mode Decomposition (VMD) : Implements VMD to process wear curves, effectively reducing noise and enhancing the accuracy of wear performance predictions. • Potential for Broader Applications: Demonstrates the potential of multimodal learning in materials science,....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.matdes.2025.114800","openalex_id":"https://openalex.org/W4414399064","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Henan Academy of Sciences","Huawei Technologies (China)","Northeastern University","Shenyang University of Technology"],"concepts":[{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.777400016784668},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7175999879837036},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4851999878883362},{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.4187000095844269},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.38280001282691956},{"id":"https://openalex.org/C78519656","display_name":"Mechanical engineering","score":0.3506999909877777},{"id":"https://openalex.org/C87976508","display_name":"Microstructure","score":0.3481999933719635},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.32839998602867126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414417511","title":"On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical 
AI","url":"https://doi.org/10.1007/978-3-032-06004-4_32","published":"2025-09-22","authors":["David Restrepo","Sofia Ira Ktena","Maria Vakalopoulou","Stergios Christodoulidis","Enzo Ferrante"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-032-06004-4_32","openalex_id":"https://openalex.org/W4414417511","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["CentraleSupélec","Consejo Nacional de Investigaciones Científicas y Técnicas","Google (United Kingdom)","Google DeepMind (United Kingdom)","Universidad de Buenos Aires","Université Paris-Saclay"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8259000182151794},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6466000080108643},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.5853999853134155},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5490000247955322},{"id":"https://openalex.org/C19768560","display_name":"Dependency (UML)","score":0.47540000081062317},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41659998893737793},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.4163999855518341},{"id":"https://openalex.org/C48372109","display_name":"Binary number","score":0.4065999984741211}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7123962553","title":"Rethinking Document Layout Analysis through Text Clustering via Multi-Modal Graph Convolution 
Networks","url":"https://doi.org/10.1109/mmsp64401.2025.11324237","published":"2025-09-21","authors":["Wenxi Li","Chenyang Lyu","Wei Ji","Liting Zhou","Cathal Gurrin","Y. F. Guo"],"abstract":"Document layout analysis, a critical process in automated document processing, traditionally relies on object detection techniques, primarily focusing on the structural segmentation of documents. However, these approaches often fall short in comprehensively understanding the semantic content within the text, leading to a disjointed analysis of document structure and content. To address this, we propose a novel methodology that combines text clustering with multi-modal graph convolution networks, aiming to integrate structural detection with semantic understanding. Our approach starts with text detection, followed by encoding using a large language model. Subsequently, we integrate visual and positional data using Graph Neural Networks to perform clustering, creating a synergy between the textual and structural aspects of documents. 
Extensive experiments on mainstream datasets demonstrate...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mmsp64401.2025.11324237","openalex_id":"https://openalex.org/W7123962553","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","China Telecom","China Telecom (China)","Dublin City University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C72773152","display_name":"Document layout analysis","score":0.8406000137329102},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8023999929428101},{"id":"https://openalex.org/C177937566","display_name":"Document clustering","score":0.6818000078201294},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.6014000177383423},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5184000134468079},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.4869000017642975},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.459199994802475},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4262000024318695}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415821613","title":"CP-Bench: A PyTorch Test Suite to Detect AI Hardware Failure, Performance Degradation, and Silent Data Corruption","url":"https://doi.org/10.1109/itc58126.2025.00062","published":"2025-09-20","authors":["Xun Jiao","Sunny Yang","Suman Gumudavelli","S Varshini","Abhinav Pandey","Abhinav Jauhri","Francesco Caggioni","Gautham Vunnam","Harish Dattatraya Dixit","Jason Liang","Philip Henzler","Sameeksha 
Gupta"],"abstract":"The growing complexity in manufacturing and operating the hardware in AI clusters leads to significant challenges in reliability. Hyperscalars have reported various AI hardware failures during high-stake jobs such as GenAI model training, where one GPU failure could bring down the entire training job. To tackle this issue, we present CP-Bench, an open-source, Configurable and Parameterizable, PyTorch-level test suite designed to test AI hardware failure, performance degradation, and silent data corruption (SDC). Built upon open-source projects, CP-Bench contains 30+ AI workloads (e.g., Llama), and implements various checks (e.g., SDC check) within these workloads. We have deployed CP-Bench throughout Meta’s AI hardware lifecycle, spanning manufacturing, in-production diagnostics, and device RMA; CP-Bench identified various hardware issues, some of which were not caught by vendor’s toolin...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/itc58126.2025.00062","openalex_id":"https://openalex.org/W4415821613","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["BC Platforms (Finland)","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.696399986743927},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6919000148773193},{"id":"https://openalex.org/C2777338717","display_name":"Vendor","score":0.6725999712944031},{"id":"https://openalex.org/C9390403","display_name":"Computer hardware","score":0.49810001254081726},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.4555000066757202},{"id":"https://openalex.org/C149635348","display_name":"Embedded system","score":0.4296000003814697},{"id":"https://openalex.org/C111919701","display_name":"Operating 
system","score":0.42160001397132874},{"id":"https://openalex.org/C151552104","display_name":"Test suite","score":0.40700000524520874}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-the-effectiveness-and-scalability-of-llm-based-data-augmentation-for-retrieval","title":"Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval","url":"https://www.microsoft.com/en-us/research/publication/evaluating-the-effectiveness-and-scalability-of-llm-based-data-augmentation-for-retrieval/","published":"2025-09-19","authors":["Pranjal A. Chitale","Bishal Santra","Yashoteja Prabhu","Amit Sharma"],"abstract":"Compact dual-encoder models are widely used for retrieval owing to their efficiency and scalability. However, such models often underperform compared to their Large Language Model (LLM)-based retrieval counterparts, likely due to their limited world knowledge. While LLM-based data augmentation has been proposed as a strategy to bridge this performance gap, there is insufficient understanding of its effectiveness and scalability to real-world retrieval problems. Existing research does not systematically explore key factors such as the optimal augmentation scale, the necessity of using large augmentation models, and whether diverse augmentations improve generalization, particularly in out-of-distribution (OOD) settings. 
This work presents a comprehensive study of the effectiveness of LLM augmentation for retrieval, comprising over 100 distinct experimental settings of retrieval models, aug...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Natural language processing","LLM","language model","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W7126263863","title":"Intelligent Integration of Generative AI in Medical Diagnostics and Data Analysis for Next-Generation Healthcare Systems","url":"https://doi.org/10.1109/iconat66879.2025.11362791","published":"2025-09-19","authors":["R. V. S. Praveen","S. Sista","RaviTeja Aida","Srinikhil Saisatya Vemuri","Shivprasad Chagi","Balaji Sankar"],"abstract":"Healthcare systems around the world are working to address the unmet medical requirements of people with a variety of acute and chronic illnesses. One of the main objectives is to guarantee that every citizen has access to high-quality, reasonably priced healthcare, and governments are increasingly assessing the effectiveness of healthcare treatments, including what constitutes a cost-effective therapy. Medical diagnostics and data analysis are becoming crucial for next-generation healthcare systems, even if previous discussions have mostly ignored the significance of information, especially the VODI. 
By enabling more effective and efficient treatment while keeping costs under control, investing in diagnostic data has the potential to drastically change the way healthcare is delivered. With an emphasis on the detection of thyroid disease patients, our work suggests a technique that uses....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iconat66879.2025.11362791","openalex_id":"https://openalex.org/W7126263863","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","Center for Advanced Legal Studies","Fort Bend County Libraries","Harris Health System","Hindu College of Pharmacy","United Utilities (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.6238999962806702},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6061000227928162},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.5799999833106995},{"id":"https://openalex.org/C136886441","display_name":"Normalization (sociology)","score":0.5235000252723694},{"id":"https://openalex.org/C2988170871","display_name":"Healthcare system","score":0.5092999935150146},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4690999984741211},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.4390999972820282},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.42829999327659607}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-optimal-steering-to-achieve-exact-fairness","title":"On Optimal Steering to Achieve 
Exact Fairness","url":"https://www.microsoft.com/en-us/research/publication/on-optimal-steering-to-achieve-exact-fairness/","published":"2025-09-18","authors":["Mohit Sharma","Amit Deshpande","Chiranjib Bhattacharyya","Rajiv Ratn Shah"],"abstract":"To fix the'bias in, bias out'problem in fair machine learning, it is important to steer feature distributions of data or internal representations of Large Language Models (LLMs) to ideal ones that guarantee group-fair outcomes. Previous work on fair generative models and representation steering could greatly benefit from provable fairness guarantees on the model output. We define a distribution as ideal if the minimizer of any cost-sensitive risk on it is guaranteed to have exact group-fair outcomes (e.g., demographic parity, equal opportunity)-in other words, it has no fairness-utility trade-off. We formulate an optimization program for optimal steering by finding the nearest ideal distribution in KL-divergence, and provide efficient algorithms for it when the underlying distributions come from well-known parametric families (e.g., normal, log-normal). 
Empirically, our optimal steering....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","Machine learning","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rpg-a-repository-planning-graph-for-unified-and-scalable-codebase-generation","title":"RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation","url":"https://www.microsoft.com/en-us/research/publication/rpg-a-repository-planning-graph-for-unified-and-scalable-codebase-generation/","published":"2025-09-18","authors":["Jane Luo","Xin Zhang","Steven Liu","Jie Wu","Yiming Huang","Yangyu Huang","Chengyu Yin","Ying Xin","Jianfeng Liu","Yuefeng Zhan","Hao Sun","Qi Chen"],"abstract":"Large language models excel at function- and file-level code generation, yet generating complete repositories from scratch remains a fundamental challenge. This process demands coherent and reliable planning across proposal- and implementation-level stages, while natural language, due to its ambiguity and verbosity, is ill-suited for faithfully representing complex software structures. To address this, we introduce the Repository Planning Graph (RPG), a persistent representation that unifies proposal- and implementation-level planning by encoding capabilities, file structures, data flows, and functions in one graph. 
RPG replaces ambiguous natural language with an explicit blueprint, enabling long-horizon planning and scalable repository generation. Building on RPG, we develop ZeroRepo, a graph-driven framework for repository generation from scratch. It operates in three stages: proposal-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bayesian-concept-bottleneck-models-with-llm-priors","title":"Bayesian Concept Bottleneck Models with LLM Priors","url":"https://www.microsoft.com/en-us/research/publication/bayesian-concept-bottleneck-models-with-llm-priors/","published":"2025-09-18","authors":["Jean Feng","Avni Kothari","Luke Zier","Chandan Singh","Yan Shuo Tan"],"abstract":"Concept Bottleneck Models (CBMs) have been proposed as a compromise between white-box and black-box models, aiming to achieve interpretability without sacrificing accuracy. The standard training procedure for CBMs is to predefine a candidate set of human-interpretable concepts, extract their values from the training data, and identify a sparse subset as inputs to a transparent prediction model. 
However, such approaches are often hampered by the tradeoff between exploring a sufficiently large set of concepts versus controlling the cost of obtaining concept extractions, resulting in a large interpretability-accuracy tradeoff. This work investigates a novel approach that sidesteps these challenges: BC-LLM iteratively searches over a potentially infinite set of concepts within a Bayesian framework, in which Large Language Models (LLMs) serve as both a concept extraction mechanism and prior.....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","mathematics","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414320956","title":"A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness","url":"https://doi.org/10.1145/3768165","published":"2025-09-18","authors":["Fali Wang","Zhiwei Zhang","Xianren Zhang","Zongyu Wu","Tzuhao Mo","Qiuhao Lu","Wanjing Wang","Rui Li","Junjie Xu","Xianfeng Tang","Qi He","Yao Ma"],"abstract":"Large language models (LLMs) have demonstrated emergent abilities in text generation, question answering, and reasoning, facilitating various tasks and domains. 
Despite their proficiency in various tasks, LLMs like PaLM 540B and Llama-3.1 405B face limitations due to large parameter sizes and computational demands, often requiring cloud API use, which raises privacy concerns, limits real-time applications on edge devices, and increases fine-tuning costs. Additionally, LLMs often underperform in specialized domains such as healthcare and law due to insufficient domain-specific knowledge, necessitating specialized models. Therefore, Small Language Models (SLMs) are increasingly favored for their low inference latency, cost-effectiveness, efficient development, and easy customization and adaptability. These models are particularly well-suited for resource-limited environments and domain kno...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3768165","openalex_id":"https://openalex.org/W4414320956","cited_by_count":37,"quality_score":71,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","California University of Pennsylvania","Pennsylvania State University","Rensselaer Polytechnic Institute","The University of Texas Health Science Center at Houston","University of Pennsylvania"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8410000205039978},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6428999900817871},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.6243000030517578},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5561000108718872},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5335999727249146},{"id":"https://openalex.org/C135257023","display_name":"Domain-specific 
language","score":0.43619999289512634},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.4352000057697296},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy (biology)","score":0.41999998688697815}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":37}},{"id":"openalex:W4417249230","title":"Ensemble-Based Intrusion Detection Enhanced with LLM-Driven Incident Response Automation","url":"https://doi.org/10.1109/i4tech64670.2025.11277913","published":"2025-09-18","authors":["Srinivasa Rao Thumala","Vijay Mane","Chinmay Inamdar","Piyush Pethkar","Akhilesh Poke","Rajat Patil"],"abstract":"Traditional incident response systems lack adapt-ability and autonomous decision-making, limiting their effectiveness against sophisticated cyber threats. A modular AI framework is implemented to address this challenge by integrating ensemble-based binary intrusion detection and multiclass attack classification with real-time automation. Using the UNSW-NB15 dataset, Random Forest, XGBoost, and LightGBM models are trained for binary threat detection, followed by classification into nine attack categories. To enhance interpretability and operational readiness, incident reports are generated via the Mixtral-8x7B-Instruct large language model using LangChain and Hugging Face Inference API. Real-time interaction and deployment are enabled through a Flask-based REST API interface. The system achieves 93% accuracy and 95% recall, demonstrating its reliability in minimizing false negatives. 
Desi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/i4tech64670.2025.11277913","openalex_id":"https://openalex.org/W4417249230","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["International Institute of Information Technology","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6998999714851379},{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.6948000192642212},{"id":"https://openalex.org/C35525427","display_name":"Intrusion detection system","score":0.6662999987602234},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6657000184059143},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.5192999839782715},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.47110000252723694},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.4697999954223633},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.4521999955177307}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flowrl-matching-reward-distributions-for-llm-reasoning","title":"FlowRL: Matching Reward Distributions for LLM Reasoning","url":"https://www.microsoft.com/en-us/research/publication/flowrl-matching-reward-distributions-for-llm-reasoning/","published":"2025-09-17","authors":["Xuekai Zhu","Daixuan Cheng","Dinghuai Zhang","Hengli Li","Kaiyan Zhang","Che Jiang","Youbang Sun","Ermo Hua","Yuxin Zuo","Xingtai Lv","Qizheng Zhang","Lin 
Chen"],"abstract":"We propose FlowRL: matching the full reward distribution via flow balancing instead of maximizing rewards in large language model (LLM) reinforcement learning (RL). Recent advanced reasoning models adopt reward-maximizing methods (\\eg, PPO and GRPO), which tend to over-optimize dominant reward signals while neglecting less frequent but valid reasoning paths, thus reducing diversity. In contrast, we transform scalar rewards into a normalized target distribution using a learnable partition function, and then minimize the reverse KL divergence between the policy and the target distribution. We implement this idea as a flow-balanced optimization method that promotes diverse exploration and generalizable reasoning trajectories. We conduct experiments on math and code reasoning tasks: FlowRL achieves a significant average improvement of $10.0\\%$ over GRPO and $5.1\\%$ over PPO on math benchmark...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hierarchical-self-attention-generalizing-neural-attention-mechanics-to-multi-scale-problems","title":"Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale 
Problems","url":"https://www.microsoft.com/en-us/research/publication/hierarchical-self-attention-generalizing-neural-attention-mechanics-to-multi-scale-problems/","published":"2025-09-17","authors":["Saeed Amizadeh","Sara Abdali","Yinheng Li","Kazuhito Koishida"],"abstract":"Transformers and their attention mechanism have been revolutionary in the field of Machine Learning. While originally proposed for the language data, they quickly found their way to the image, video, graph, etc. data modalities with various signal geometries. Despite this versatility, generalizing the attention mechanism to scenarios where data is presented at different scales from potentially different modalities is not straightforward. The attempts to incorporate hierarchy and multi-modality within transformers are largely based on ad hoc heuristics, which are not seamlessly generalizable to similar problems with potentially different structures. To address this problem, in this paper, we take a fundamentally different approach: we first propose a mathematical construct to represent multi-modal, multi-scale data. 
We then mathematically derive the neural attention mechanics for the prop...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","mathematics","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/enterprise-ai-must-enforce-participant-aware-access-control","title":"Enterprise AI Must Enforce Participant-Aware Access Control","url":"https://www.microsoft.com/en-us/research/publication/enterprise-ai-must-enforce-participant-aware-access-control/","published":"2025-09-17","authors":["Shashank Shreedhar Bhatt","Tanmay Rajore","Khushboo Aggarwal","Ganesh Ananthanarayanan","Ranveer Chandra","Nishanth Chandran","Suyash Choudhury","Divya Gupta","Emre Kiciman","Shrey Pandey","Srinath Setty","Rahul Sharma"],"abstract":"Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. 
We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabil...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Tech Report","Artificial intelligence","Computer science","large language models","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414270461","title":"Efficient Multi-Camera Tokenization With Triplanes for End-to-End Driving","url":"https://doi.org/10.1109/lra.2025.3611145","published":"2025-09-17","authors":["Boris Ivanovic","Cristiano Saltori","Yurong You","Yan Wang","Wenjie Luo","Marco Pavone"],"abstract":"Autoregressive Transformers are increasingly being deployed as end-to-end robot and autonomous vehicle (AV) policy architectures, owing to their scalability and potential to leverage internet-scale pretraining for generalization. Accordingly, tokenizing sensor data efficiently is paramount to ensuring the real-time feasibility of such architectures on embedded hardware. 
To this end, we present an efficient triplane-based multi-camera tokenization strategy that leverages recent advances in 3D neural reconstruction and rendering to produce sensor tokens that are agnostic to the number of input cameras and their resolution, while explicitly accounting for their geometry around an AV. Experiments on large-scale AV datasets and a state-of-the-art neural simulator demonstrate that our approach...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2025.3611145","openalex_id":"https://openalex.org/W4414270461","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7620999813079834},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.589900016784668},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5859000086784363},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5831000208854675},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.5673999786376953},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5196999907493591},{"id":"https://openalex.org/C176982825","display_name":"Lexical analysis","score":0.49399998784065247},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.43389999866485596}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tenet-an-efficient-sparsity-aware-lut-centric-architecture-for-ternary-llm-inference-on-edge","title":"TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge","url":"https://www.microsoft.com/en-us/research/publication/tenet-an-efficient-sparsity-aware-lut-centric-architecture-for-ternary-llm-inference-on-edge/","published":"2025-09-16","authors":["Zhirui Huang","Ran Shu","Shijie Cao","Ran Shu","Ian Wang","Ting Cao","Chixiao Chen","Yongqiang Xiong"],"abstract":"Ternary quantization has emerged as a powerful technique for reducing both computational and memory footprint of large language models (LLM), enabling efficient real-time inference deployment without significantly compromising model accuracy. Conventional LLM inference platforms (e.g GPUs) cannot capitalize on its benefits, as they (i) lack native support for ternary arithmetic and memory specialization and (ii) remain severely under-utilized in low-batch, real-time scenarios. In this work, we propose TENET, a sparse-aware LUT-centric architecture that co-optimizes algorithm, compute, and memory for ternary LLM inference. To maximize the efficiency of Ternary Linear layer, TENET introduces a Sparse Ternary LUT (STL) core that optimizes ternary mixed-precision GEMM using a symmetric precompute lookup table. 
It also features Dynamic Activation N:M Sparsity to exploit the sparsity within th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Hardware and devices","Computer science","LLM","memory","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/data-scaling-laws-for-radiology-foundation-models","title":"Data Scaling Laws for Radiology Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/data-scaling-laws-for-radiology-foundation-models/","published":"2025-09-16","authors":["Maximilian Ilse","Harshita Sharma","Anton Schwaighofer","Sam Bond-Taylor","Fernando Pérez-García","Olesya Melnichenko","Anne-Marie G. Sykes","K. Horst","Ashish Khandelwal","Maxwell C. Reynolds","M. Wetscherek","Noel Codella"],"abstract":"Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two vision encoders, MedImageInsight (MI2) and RAD-DINO representing the two major encoder paradigms CLIP and DINOv2, on up to 3.5M chest x-rays from a single institution, holding compute and evaluation protocols constant. 
We evaluate on classification (radiology findings, lines and tubes), segmentation (lines and tubes), and radiology report generation. While prior work has primarily focused on tasks related to radiology findings, we include lines and tubes tasks to counterbalance this bias and eval...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:tencent:2509.13232","title":"Single-stream Policy Optimization","url":"https://huggingface.co/papers/2509.13232","published":"2025-09-16","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"official:b080f192b80a8fcd","title":"Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation","url":"https://huggingface.co/papers/2509.12815","published":"2025-09-16","authors":["Tencent 
Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4414230193","title":"I&S-ViT: An Inclusive & Stable Method for Post-Training ViTs Quantization","url":"https://doi.org/10.1109/tpami.2025.3610466","published":"2025-09-16","authors":["Yunshan Zhong","Jiawei Hu","Mingbao Lin","Mengzhao Chen","Rongrong Ji"],"abstract":"Albeit the scalable performance of vision transformers (ViTs), the dense computational costs undermine their position in industrial applications. Post-training quantization (PTQ), tuning ViTs with a tiny dataset and running in a low-bit format, well addresses the cost issue but unluckily bears more performance drops in lower-bit cases. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) Quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) Rugged and magnified loss landscape in coarse-grained quantization granularity for post-LayerNorm activations. 
Then, I&S-ViT addresses these issues by introducing: (1) A novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3610466","openalex_id":"https://openalex.org/W4414230193","cited_by_count":1,"quality_score":42,"matched_keywords":["quantization"],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.7883999943733215},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.548799991607666},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5023999810218811},{"id":"https://openalex.org/C143397304","display_name":"Geometric quantization","score":0.48080000281333923},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.47780001163482666},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.43290001153945923},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41530001163482666},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.3865000009536743}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414223800","title":"A Generative Foundation Model for Antibody Design","url":"https://doi.org/10.1101/2025.09.12.675771","published":"2025-09-16","authors":["Rubo Wang","Fandi Wu","Jiale Shi","Yidong Song","Yu Kong","Jianhua Ma","Bing He","Qihong Yan","Tianlei Ying","Peilin Zhao","Xingyu Gao","Jianhua Yao"],"abstract":"Abstract Antibodies are indispensable components 
of the immune system, yet the design of high-affinity antibodies remains a time-consuming and experimentally intensive process. To address this challenge, we present IgGM, a novel generative foundation model designed to accelerate high-affinity antibody engineering. IgGM learns the complex relationships underlying the binding interactions between antigens and antibodies, as well as the mapping between antibody sequences and structures. By conditioning on different inputs, IgGM supports a wide range of antibody design tasks, including complex structure prediction, inverse design, affinity maturation, framework optimization, humanization, and de novo antibody design. It is compatible with both conventional antibodies and nanobodies, and allows user-defined CDR loop lengths for flexible design. To prioritize candidates, we introduce a frequen...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.09.12.675771","openalex_id":"https://openalex.org/W4414223800","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["First Affiliated Hospital of Guangzhou Medical University","Guangzhou Medical University","Institute of Microelectronics","Shanghai Jiao Tong University","Shanghai Medical College of Fudan University","Sun Yat-sen University","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5997999906539917},{"id":"https://openalex.org/C159654299","display_name":"Antibody","score":0.5691999793052673},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.504800021648407},{"id":"https://openalex.org/C70721500","display_name":"Computational 
biology","score":0.4860000014305115},{"id":"https://openalex.org/C147483822","display_name":"Antigen","score":0.4278999865055084},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.4120999872684479},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.38659998774528503},{"id":"https://openalex.org/C184408114","display_name":"Generative Design","score":0.3776000142097473}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/edival-agent-an-object-centric-framework-for-automated-fine-grained-evaluation-of-multi-turn-editing","title":"EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing","url":"https://www.microsoft.com/en-us/research/publication/edival-agent-an-object-centric-framework-for-automated-fine-grained-evaluation-of-multi-turn-editing/","published":"2025-09-15","authors":["Tianyu Chen","Yasi Zhang","Zhi Zhang","Peiyu Yu","Shu Wang","Zhendong Wang","K. Lin","Xiaofei Wang","Zhengyuan Yang","Linjie Li","Chung-Ching Lin","Jianwen Xie"],"abstract":"Instruction-based image editing has advanced rapidly, yet reliable and interpretable evaluation remains a bottleneck. Current protocols either (i) depend on paired reference images-resulting in limited coverage and inheriting biases from prior generative models-or (ii) rely solely on zero-shot vision-language models (VLMs), whose prompt-based assessments of instruction following, content consistency, and visual quality are often imprecise. To address this, we introduce EdiVal-Agent, an automated and fine-grained evaluation framework grounded in an object-centric perspective, designed to assess not only standard single-turn but also multi-turn instruction-based editing with precision. 
Given an input image, EdiVal-Agent first decomposes it into semantically meaningful objects, then synthesizes diverse, context-aware editing instructions while dynamically updating object pools across turns....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Vision-language models","1970-01-01","preference","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tabletalk-scaffolding-spreadsheet-development-with-a-language-agent","title":"TableTalk: Scaffolding Spreadsheet Development with a Language Agent","url":"https://www.microsoft.com/en-us/research/publication/tabletalk-scaffolding-spreadsheet-development-with-a-language-agent/","published":"2025-09-15","authors":["Jenny T. Liang","Aayush Kumar","Yasharth Bajpai","Sumit Gulwani","Vu Le","Chris Parnin","Arjun Radhakrishna","Ashish Tiwari","Emerson Murphy-Hill","Gustavo Soares"],"abstract":"Despite its ubiquity in the workforce, spreadsheet programming remains challenging as programmers need both spreadsheet-specific knowledge (e.g., APIs to write formulas) and problem-solving skills to create complex spreadsheets. Large language models (LLMs) can help automate aspects of this process, and recent advances in planning and reasoning have enabled language agents, which dynamically plan, use tools, and take iterative actions to complete complex tasks. 
These agents observe, plan, and act, making them well-suited to scaffold spreadsheet programming by following expert processes. We present TableTalk, a language agent that helps programmers build spreadsheets conversationally. Its design reifies three design principles -- scaffolding, flexibility, and incrementality -- which we derived from two studies of seven programmers and 62 Excel templates. TableTalk structures spreadsheet d...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Data platforms and analytics","Human-computer interaction","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/edival-agent-an-object-centric-framework-for-automated-scalable-fine-grained-evaluation-of-multi-turn-editing","title":"EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing","url":"https://www.microsoft.com/en-us/research/publication/edival-agent-an-object-centric-framework-for-automated-scalable-fine-grained-evaluation-of-multi-turn-editing/","published":"2025-09-15","authors":["Tianyu Chen","Yasi Zhang","Zhi Zhang","Peiyu Yu","Shu Wang","Zhendong Wang","Kevin Lin","Xiaofei Wang","Zhengyuan Yang","Linjie Li","Chung-Ching Lin","Jianwen Xie"],"abstract":"Instruction-based image editing has advanced rapidly, yet reliable and interpretable evaluation remains a bottleneck. 
Current protocols either (i) depend on paired reference images -- resulting in limited coverage and inheriting biases from prior generative models -- or (ii) rely solely on zero-shot vision--language models (VLMs), whose prompt-based assessments of instruction following, content consistency, and visual quality are often imprecise. To address this, we introduce EdiVal-Agent, an automated, scalable, and fine-grained evaluation framework for multi-turn instruction-based editing from an object-centric perspective, supported by a suite of expert tools. Given an image, EdiVal-Agent first decomposes it into semantically meaningful objects, then synthesizes diverse, context-aware editing instructions. For evaluation, it integrates VLMs with open-vocabulary object detectors to ass...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","preference","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414167878","title":"Charting the evolution of artificial intelligence mental health chatbots from rule‐based systems to large language models: a systematic review","url":"https://doi.org/10.1002/wps.21352","published":"2025-09-15","authors":["Yining Hua","Steve Siddals","Zilin Ma","Isaac R. 
Galatzer‐Levy","Winna Xia","Christine Hau","Hongbin Na","Matthew Flathers","Jake Linardon","Cyrus Ayubcha","John Torous"],"abstract":"The rapid evolution of artificial intelligence (AI) chatbots in mental health care presents a fragmented landscape with variable clinical evidence and evaluation rigor. This systematic review of 160 studies (2020-2024) classifies chatbot architectures - rule-based, machine learning-based, and large language model (LLM)-based - and proposes a three-tier evaluation framework: foundational bench testing (technical validation), pilot feasibility testing (user engagement), and clinical efficacy testing (symptom reduction). While rule-based systems dominated until 2023, LLM-based chatbots surged to 45% of new studies in 2024. However, only 16% of LLM studies underwent clinical efficacy testing, with most (77%) still in early validation. Overall, only 47% of studies focused on clinical efficacy testing, exposing a critical gap in robust validation of therapeutic benefit. 
Discrepancies emerged b...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/wps.21352","openalex_id":"https://openalex.org/W4414167878","cited_by_count":30,"quality_score":75,"matched_keywords":["LLM","language model"],"author_affiliations":["Beth Israel Deaconess Medical Center","Deakin University","Google (United States)","Harvard University","Intelligent Systems Research (United States)","New York University","University of Technology Sydney"],"concepts":[{"id":"https://openalex.org/C2779041454","display_name":"Chatbot","score":0.7440999746322632},{"id":"https://openalex.org/C27415008","display_name":"Psychological intervention","score":0.5717999935150146},{"id":"https://openalex.org/C134362201","display_name":"Mental health","score":0.5199000239372253},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.5048999786376953},{"id":"https://openalex.org/C46304622","display_name":"Certification","score":0.4860999882221222},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4672999978065491},{"id":"https://openalex.org/C162027153","display_name":"Artificial general intelligence","score":0.42899999022483826},{"id":"https://openalex.org/C2778738651","display_name":"Novelty","score":0.4059999883174896}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":30}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/historybankqa-multilingual-temporal-question-answering-on-historical-events","title":"HistoryBankQA: Multilingual Temporal Question Answering on Historical 
Events","url":"https://www.microsoft.com/en-us/research/publication/historybankqa-multilingual-temporal-question-answering-on-historical-events/","published":"2025-09-15","authors":["Biswadip Mandal","Anant Khandelwal","Manish Gupta"],"abstract":"Temporal reasoning about historical events is a critical skill for NLP tasks like event extraction, historical entity linking, temporal question answering, timeline summarization, temporal event clustering and temporal natural language inference. Yet efforts on benchmarking temporal reasoning capabilities of large language models (LLMs) are rather limited. Existing temporal reasoning datasets are limited in scale, lack multilingual coverage and focus more on contemporary events. To address these limitations, we present HistoryBank, a multilingual database of 10M+ historical events extracted from Wikipedia timeline pages and article infoboxes. Our database provides unprecedented coverage in both historical depth and linguistic breadth with 10 languages. Additionally, we construct a comprehensive question answering benchmark for temporal reasoning across all languages. 
This benchmark cover...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Human language technologies","Computer science","Natural language processing"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:7e882bd7ca4bab24","title":"Addendum to GPT-5 system card: GPT-5-Codex","url":"https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex","published":"2025-09-15","authors":["OpenAI"],"abstract":"This addendum to the GPT-5 system card shares a new model: GPT-5-Codex, a version of GPT-5 further optimized for agentic coding in Codex. 
GPT-5-Codex adjusts its thinking effort more dynamically based on task complexity, responding quickly to simple conversational queries or small tasks, while independently working for longer on more complex tasks.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"official:f240c4bdd96a1e37","title":"CyberSOCEval: Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning","url":"https://ai.meta.com/research/publications/cybersoceval-benchmarking-llms-capabilities-for-malware-analysis-and-threat-intelligence-reasoning/","published":"2025-09-15","authors":["Lauren Deason","Adam Bali","Ciprian Bejean","Diana Bolocan","James Crnkovich","Ioana Croitoru","Krishna Durai","Chase Midler","Calin Miron","David Molnar","Brad Moon","Bruno Ostarcevic"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=4"}},{"id":"openalex:W4415250872","title":"Accelerating Supercomputing: AI-Hardware-Driven Innovation for Speed and Efficiency","url":"https://doi.org/10.1109/hpec67600.2025.11196413","published":"2025-09-15","authors":["Jack 
Dongarra","John A. Gunnels","Harun Bayraktar","Azzam Haidar","Dan Ernst"],"abstract":"The evolution of GPUs has resulted in democratized access to increasingly powerful low-precision compute capabilities, designed for artificial intelligence (AI), particularly large language models (LLMs) and generative AI. These algorithms heavily utilize hardware units specialized for matrix multiplication, such as Tensor Cores, that have advanced since their introduction, offering improved functionality, throughput, and energy efficiency. Two key techniques: mixed-precision algorithms and floating-point emulation, leveraging these resources, have emerged. They enable scientific applications, many dependent upon high-precision linear algebra, to achieve dramatic gains in performance and power efficiency. Additionally, these methods facilitate innovation in areas such as fine-grained mixed-precision strategies and data compression, broadening their impact across diverse computing platfor...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/hpec67600.2025.11196413","openalex_id":"https://openalex.org/W4415250872","cited_by_count":0,"quality_score":41,"matched_keywords":["compression"],"author_affiliations":["Nvidia (United States)","Oak Ridge National Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6818000078201294},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5828999876976013},{"id":"https://openalex.org/C155281189","display_name":"Tensor (intrinsic definition)","score":0.44440001249313354},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.42080000042915344},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.38589999079704285},{"id":"https://openalex.org/C2742236","display_name":"Efficient 
energy use","score":0.3758000135421753},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3596999943256378},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.35249999165534973}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lost-in-embeddings-information-loss-in-vision-language-models","title":"Lost in Embeddings: Information Loss in Vision-Language Models","url":"https://www.microsoft.com/en-us/research/publication/lost-in-embeddings-information-loss-in-vision-language-models/","published":"2025-09-14","authors":["Wenyan Li","Raphael Tang","Chengzu Li","Caiqi Zhang","Ivan Vuli'c","Anders Søgaard"],"abstract":"Vision--language models (VLMs) often process visual inputs through a pretrained vision encoder, followed by a projection into the language model's embedding space via a connector component. While crucial for modality fusion, the potential information loss induced by this projection step and its direct impact on model capabilities remain understudied. We introduce two complementary approaches to examine and quantify this loss by analyzing the latent representation space. First, we evaluate semantic information preservation by analyzing changes in k-nearest neighbor relationships between image representations, before and after projection. Second, we directly measure information loss by reconstructing visual embeddings from the projected representation, localizing loss at an image patch level. 
Experiments reveal that connectors substantially distort the local geometry of visual representati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","vision language models","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/graph-enhanced-retrieval-augmented-question-answering-for-e-commerce-customer-support","title":"Graph-Enhanced Retrieval-Augmented Question Answering for E-Commerce Customer Support","url":"https://www.microsoft.com/en-us/research/publication/graph-enhanced-retrieval-augmented-question-answering-for-e-commerce-customer-support/","published":"2025-09-14","authors":["Piyushkumar Patel"],"abstract":"E-Commerce customer support requires quick and accurate answers grounded in product data and past support cases. This paper develops a novel retrieval-augmented generation (RAG) framework that uses knowledge graphs (KGs) to improve the relevance of the answer and the factual grounding. We examine recent advances in knowledge-augmented RAG and chatbots based on large language models (LLM) in customer support, including Microsoft's GraphRAG and hybrid retrieval architectures. We then propose a new answer synthesis algorithm that combines structured subgraphs from a domain-specific KG with text documents retrieved from support archives, producing more coherent and grounded responses. 
We detail the architecture and knowledge flow of our system, provide comprehensive experimental evaluation, and justify its design in real-time support settings. Our implementation demonstrates 23\\% improvement...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Information retrieval","mathematics","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414260593","title":"SuperBench: A Proactive Validation System for Improving Reliability of Cloud AI Infrastructure","url":"https://doi.org/10.1145/3767334","published":"2025-09-13","authors":["Yifan Xiong","Yuting Jiang","Ziyue Yang","Lei Qu","Guoshuai Zhao","Shuguang Liu","Dong Zhong","Boris Pinzur","Jie Zhang","Yang Wang","Joaquin José","Hossein Pourreza"],"abstract":"Reliability in cloud AI infrastructure is crucial for cloud service providers, prompting the widespread use of hardware redundancies. However, these redundancies can inadvertently lead to hidden degradation, known as “gray failure”, for AI workloads, significantly affecting end-to-end performance and concealing performance issues, which complicates root cause analysis for failures and regressions. We introduce SuperBench, a proactive validation system for AI infrastructure that mitigates hidden degradation caused by hardware redundancies and enhances overall reliability. SuperBench features a comprehensive benchmark suite, capable of evaluating individual hardware components and representing most real AI workloads. 
It comprises a Validator that learns benchmark criteria to pinpoint defective components clearly. Additionally, SuperBench incorporates a Selector to balance validation time a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3767334","openalex_id":"https://openalex.org/W4414260593","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Microsoft (Canada)","Microsoft (Finland)","Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research Asia (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8776000142097473},{"id":"https://openalex.org/C31395832","display_name":"Testbed","score":0.7727000117301941},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.7682999968528748},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7200000286102295},{"id":"https://openalex.org/C35292069","display_name":"Validator","score":0.6656000018119812},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.6215999722480774},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.5375999808311462},{"id":"https://openalex.org/C63540848","display_name":"Fault tolerance","score":0.40389999747276306}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"official:43c44dc10a64aca8","title":"ERNIE 4.5 Gets a Major Inference Speed Boost","url":"https://ernie.baidu.com/blog/posts/plas/","published":"2025-09-12","authors":["Baidu"],"abstract":"How the new PLAS sparse attention update delivers performance gains for long-context inference on ERNIE 4.5 
models.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://ernie.baidu.com/blog/index.xml"}},{"id":"openalex:W7128705166","title":"Self-Supervised Learning for Unstructured Data Analytics: A Unified Framework for Scalable Representation and Insight Extraction","url":"https://doi.org/10.1109/icerect65215.2025.11377309","published":"2025-09-12","authors":["Rupesh Dabbir"],"abstract":"Unstructured data—ranging from text and images to audio and sensor streams—constitutes the majority of global digital content but remains largely untapped due to annotation scarcity and heterogeneity. In this paper, we propose a self-supervised learning method for analytics on unstructured data. Our approach integrates masked content prediction, contrastive learning, and cross-modal correlation objectives within a single architecture, enabling it to learn rich and transferable representations from raw, unlabeled inputs. Unlike modality-specific methods, our framework generalizes seamlessly across text, vision, and multimodal domains, making it particularly suitable for real-world deployment. Experiments on benchmark datasets such as ImageNet, AG News, and MSCOCO show state-of-the-art results on classification and retrieval tasks in the absence of human supervision. 
Strong robustness, tra...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icerect65215.2025.11377309","openalex_id":"https://openalex.org/W7128705166","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","news"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2781252014","display_name":"Unstructured data","score":0.8492000102996826},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8328999876976013},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6481999754905701},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.619700014591217},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5726000070571899},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5672000050544739},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.5521000027656555},{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.5472000241279602}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414258176","title":"A Benchmark of Evo2 Genomic AI Models for Efficient and Practical Deployment","url":"https://doi.org/10.1101/2025.09.10.675279","published":"2025-09-12","authors":["Huimin Li","Hongkai Ji","Yuchen Zeng","Wei Lv","Jianmin Wu","Sheng Liu","Chunhua Lin","Huanming Yang","Zhaorong Li","Yubao Chen","Wei Dong"],"abstract":"Abstract The rapid advancement of DNA foundation language models has brought about a transformative shift in genomics, allowing for the deciphering of intricate patterns and regulatory 
mechanisms embedded within DNA sequences. The genomic foundation model Evo2 demonstrates remarkable capabilities in decoding DNA functional patterns through cross-species pretraining. However, despite the great potential of Evo2 in basic genomics research, there is currently no clear and systematic guidance on its specific application scenarios, performance, and optimization directions in the field of tumor genomics, and its performance dependency on specialized hardware (such as FP8 precision on H800 GPUs) has not been empirically benchmarked. Here, we present a focused validation of Evo2 using two independent cancer genomic datasets (Bladder Urothelial Carcinoma and Ovarian Cancer), we tested the downstr...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.09.10.675279","openalex_id":"https://openalex.org/W4414258176","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","efficient"],"author_affiliations":["Alibaba Group (China)","BGI Group (China)","Chinese Nutrition Society","University of Chinese Academy of Sciences","Yuhuangding Hospital","Zhejiang Academy of Social Sciences","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7354999780654907},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6531000137329102},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5837000012397766},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5080999732017517},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4577000141143799},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4562999904155731},{"id":"https://openalex.org/C189206191","display_name":"Genomics","score":0.42309999465942383},{"id":"https://openalex.org/C2776207758","display_name":"Downstream (manufacturing)","score":0.38190001249313354}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414595699","title":"HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction","url":"https://doi.org/10.1007/978-3-032-04614-7_4","published":"2025-09-12","authors":["Rujiao Long","Pengfei Wang","Zhibo Yang","Wenqing Cheng"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-032-04614-7_4","openalex_id":"https://openalex.org/W4414595699","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8924000263214111},{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.871399998664856},{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.7523000240325928},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6173999905586243},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6139000058174133},{"id":"https://openalex.org/C144986985","display_name":"Hierarchical database model","score":0.5932999849319458},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.5622000098228455},{"id":"https://openalex.org/C90805587","display_name":"Word (group 
theory)","score":0.5479999780654907}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414261039","title":"DualAlign: Generating Clinically Grounded Synthetic Data","url":"https://doi.org/10.1101/2025.09.09.25335422","published":"2025-09-12","authors":["Rui Li","Xun Wang","Hong Yu"],"abstract":"Abstract Synthetic clinical data are increasingly important for advancing AI in healthcare, given strict privacy constraints on real-world EHRs, limited availability of annotated rare-condition data, and systemic biases in observational datasets. While large language models (LLMs) can generate fluent clinical text, producing synthetic data that is both realistic and clinically meaningful remains challenging. We introduce DualAlign, a framework that enhances statistical fidelity and clinical plausibility through dual alignment: (1) statistical alignment, which conditions generation on patient demographics and risk factors; and (2) semantic alignment, which incorporates real-world symptom trajectories to guide content generation. 
Using Alzheimer’s disease (AD) as a case study, DualAlign produces context-grounded symptom-level sentences that better reflect real-world clinical documentation....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.09.09.25335422","openalex_id":"https://openalex.org/W4414261039","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","UMass Memorial Health Care","VA New England Healthcare System"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6736999750137329},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.6265000104904175},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5016999840736389},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.4952000081539154},{"id":"https://openalex.org/C114289077","display_name":"Statistical model","score":0.47029998898506165},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4629000127315521},{"id":"https://openalex.org/C23131810","display_name":"Observational study","score":0.45829999446868896},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4140999913215637}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4417471306","title":"A Survey for Foundation Models in Autonomous Driving","url":"https://doi.org/10.1109/iccvdm66874.2025.11290083","published":"2025-09-12","authors":["Haoxiang Gao","Zhongruo Wang","Yaqian Li","Kaiwen Long","Ming Yang","Yiqing Shen"],"abstract":"The advent of foundation models has revolutionized the fields of natural language 
processing and computer vision, paving the way for their application in autonomous driving (AD). This survey presents a comprehensive review of more than 40 research papers, demonstrating the role of foundation models in enhancing AD. Large language models contribute to planning and simulation in AD, particularly through their proficiency in reasoning, code generation and translation. In parallel, vision foundation models are increasingly adapted for critical tasks such as 3D object detection and tracking, as well as creating realistic driving scenarios for simulation and testing. Multi-modal foundation models, integrating diverse inputs, exhibit exceptional visual understanding and spatial reasoning, crucial for end-to-end AD. This survey not only provides a structured taxonomy, categorizing foundation mod...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccvdm66874.2025.11290083","openalex_id":"https://openalex.org/W4417471306","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Autodesk (United States)","Carnegie Mellon University","Johns Hopkins Medicine","Johns Hopkins University","Shanghai Jiao Tong University","University of Baltimore"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.746999979019165},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.6326000094413757},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5200999975204468},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.43050000071525574},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.3801000118255615},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical 
analysis)","score":0.3725000023841858},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.33230000734329224},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.33219999074935913}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/population-aligned-persona-generation-for-llm-based-social-simulation","title":"Population-Aligned Persona Generation for LLM-based Social Simulation","url":"https://www.microsoft.com/en-us/research/publication/population-aligned-persona-generation-for-llm-based-social-simulation/","published":"2025-09-11","authors":["Zhengyu Hu","Zheyuan Xiao","Max Xiong","Yuxuan Lei","Tianfu Wang","Jianxun Lian","Kaize Ding","Ziang Xiao","Nicholas Jing Yuan","Xing Xie"],"abstract":"Recent advances in large language models (LLMs) have enabled human-like social simulations at unprecedented scale and fidelity, offering new opportunities for computational social science. A key challenge, however, is the construction of persona sets that authentically represent the diversity and distribution of real-world populations. Most existing LLM-based social simulation studies focus primarily on designing agentic frameworks and simulation environments, often overlooking the complexities of persona generation and the potential biases introduced by unrepresentative persona sets. In this paper, we propose a systematic framework for synthesizing high-quality, population-aligned persona sets for LLM-driven social simulation. 
Our approach begins by leveraging LLMs to generate narrative personas from long-term social media data, followed by rigorous quality assessment to filter out low-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Social sciences","Computer science","large language models","LLM","long-term","media"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-morality-of-probability-how-implicit-moral-biases-in-llms-may-shape-the-future-of-human-ai-symbiosis","title":"The Morality of Probability: How Implicit Moral Biases in LLMs May Shape the Future of Human-AI Symbiosis","url":"https://www.microsoft.com/en-us/research/publication/the-morality-of-probability-how-implicit-moral-biases-in-llms-may-shape-the-future-of-human-ai-symbiosis/","published":"2025-09-11","authors":["Eoin O’Doherty","Nicole Weinrauch","Andrew Talone","Uri Klempner","Xiaoyuan Yi","Xing Xie","Yi Zeng"],"abstract":"Artificial intelligence (AI) is advancing at a pace that raises urgent questions about how to align machine decision-making with human moral values. This working paper investigates how leading AI systems prioritize moral outcomes and what this reveals about the prospects for human-AI symbiosis. We address two central questions: (1) What moral values do state-of-the-art large language models (LLMs) implicitly favour when confronted with dilemmas? (2) How do differences in model architecture, cultural origin, and explainability affect these moral preferences? 
To explore these questions, we conduct a quantitative experiment with six LLMs, ranking and scoring outcomes across 18 dilemmas representing five moral frameworks. Our findings uncover strikingly consistent value biases. Across all models, Care and Virtue values outcomes were rated most moral, while libertarian choices were consistent...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Computer science","Human–computer interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/is-in-context-learning-learning","title":"Is In-Context Learning Learning?","url":"https://www.microsoft.com/en-us/research/publication/is-in-context-learning-learning/","published":"2025-09-11","authors":["Adrian de Wynter"],"abstract":"In-context learning (ICL) allows some autoregressive models to solve tasks via next-token prediction and without needing further training. This has led to claims about these model's ability to solve (learn) unseen tasks with only a few shots (exemplars) in the prompt. However, deduction does not always imply learning, as ICL does not explicitly encode a given observation. Instead, the models rely on their prior knowledge and the exemplars given, if any. We argue that, mathematically, ICL does constitute learning, but its full characterisation requires empirical work. 
We then carry out a large-scale analysis of ICL ablating out or accounting for memorisation, pretraining, distributional shifts, and prompting style and phrasing. We find that ICL is an effective learning paradigm, but limited in its ability to learn and generalise to unseen tasks. We note that, in the limit where exemplars....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414105080","title":"PolyPath: Adapting a Large Multimodal Model for Multislide Pathology Report Generation","url":"https://doi.org/10.1016/j.modpat.2025.100886","published":"2025-09-11","authors":["Faruk Ahmed","Lin Yang","Tiam Jaroensri","Andrew Sellergren","Yossi Matias","Avinatan Hassidim","Greg S. Corrado","Dale R. Webster","Shravya Shetty","Shruthi Prabhakara","Y. 
Liu","Daniel Golden"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.modpat.2025.100886","openalex_id":"https://openalex.org/W4414105080","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7301999926567078},{"id":"https://openalex.org/C534262118","display_name":"Medical diagnosis","score":0.6780999898910522},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6097999811172485},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.53329998254776},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5008999705314636},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4537999927997589},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.4296000003814697},{"id":"https://openalex.org/C2777522853","display_name":"Digital pathology","score":0.41190001368522644}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414146613","title":"Seeing Above and Below the Canopy: Modeling and Interpreting Species Occupancy with Multimodal Habitat Representations","url":"https://doi.org/10.1101/2025.09.06.674602","published":"2025-09-11","authors":["Timm Haucke","Lauren Harrell","Yunyi Shen","Levente J. Klein","David Rolnick","Lauren Gillespie","Sara Beery"],"abstract":"Abstract Effective conservation and restoration of species is an increasingly urgent priority. 
To design management strategies that improve species success, we need a solid understanding of the habitat characteristics that support it. Occupancy models are statistical tools that ecologists use to model these relationships from data. Yet, current models represent habitats with coarse-scale environmental variables that fail to capture important microhabitat features. We show that these limitations can be addressed by incorporating AI-derived, multimodal habitat representations from overhead satellite imagery and ground-level camera-trap imagery. Across geography and species, these representations yield more accurate out-of-sample predictions than models based on conventional covariates alone, and combining satellite and ground-level views provides complementary gains. To translate improved....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.09.06.674602","openalex_id":"https://openalex.org/W4414146613","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","McGill University","Moscow Institute of Thermal Technology","University of Michigan"],"concepts":[{"id":"https://openalex.org/C160331591","display_name":"Occupancy","score":0.7669000029563904},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5713000297546387},{"id":"https://openalex.org/C185933670","display_name":"Habitat","score":0.5358999967575073},{"id":"https://openalex.org/C114289077","display_name":"Statistical model","score":0.49869999289512634},{"id":"https://openalex.org/C18903297","display_name":"Ecology","score":0.454800009727478},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.4221999943256378},{"id":"https://openalex.org/C2778102629","display_name":"Satellite 
imagery","score":0.397599995136261},{"id":"https://openalex.org/C119043178","display_name":"Covariate","score":0.3937999904155731}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/jupiter-enhancing-llm-data-analysis-capabilities-via-notebook-and-inference-time-value-guided-search","title":"Jupiter: Enhancing LLM Data Analysis Capabilities via Notebook and Inference-Time Value-Guided Search","url":"https://www.microsoft.com/en-us/research/publication/jupiter-enhancing-llm-data-analysis-capabilities-via-notebook-and-inference-time-value-guided-search/","published":"2025-09-10","authors":["Shuocheng Li","Yihao Liu","Silin Du","Wenxuan Zeng","Zhe Xu","Mengyu Zhou","Yeye He","Haoyu Dong","Shi Han","Dongmei Zhang"],"abstract":"Large language models (LLMs) have shown great promise in automating data science workflows, but existing models still struggle with multi-step reasoning and tool use, which limits their effectiveness on complex data analysis tasks. To address this, we propose a scalable pipeline that extracts high-quality, tool-based data analysis tasks and their executable multi-step solutions from real-world Jupyter notebooks and associated data files. Using this pipeline, we introduce NbQA, a large-scale dataset of standardized task-solution pairs that reflect authentic tool-use patterns in practical data science scenarios. To further enhance multi-step reasoning, we present Jupiter, a framework that formulates data analysis as a search problem and applies Monte Carlo Tree Search (MCTS) to generate diverse solution trajectories for value model learning. 
During inference, Jupiter combines the value mod...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Data platforms and analytics","Computer science","large language models","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/one-head-many-models-cross-attention-routing-for-cost-aware-llm-selection","title":"One Head, Many Models: Cross-Attention Routing for Cost-Aware LLM Selection","url":"https://www.microsoft.com/en-us/research/publication/one-head-many-models-cross-attention-routing-for-cost-aware-llm-selection/","published":"2025-09-10","authors":["Roshini Pulishetty","Mani Kishan Ghantasala","Keerthy Kaushik Dasoju","Niti Mangwani","Vishal Garimella","Aditya Mate","Somya Chatterjee","Yue Kang","Ehi Nosakhare","Sadid A. Hasan","Soundar Srinivasan"],"abstract":"The proliferation of large language models (LLMs) with varying computational costs and performance profiles presents a critical challenge for scalable, cost-effective deployment in real-world applications. We introduce a unified routing framework that leverages a single-head cross-attention mechanism to jointly model query and model embeddings, enabling dynamic selection of the optimal LLM for each input query. Our approach is evaluated on RouterBench, a large-scale, publicly available benchmark encompassing diverse LLM pools and domains. 
By explicitly capturing fine-grained query-model interactions, our router predicts both response quality and generation cost, achieving up to 6.6% improvement in Average Improvement in Quality (AIQ) and 2.9% in maximum performance over existing routers. To robustly balance performance and cost, we propose an exponential reward function that enhances sta...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414110320","title":"Printed sensing human-machine interface with individualized adaptive machine learning","url":"https://doi.org/10.1126/sciadv.adw3725","published":"2025-09-10","authors":["Guohui Wang","Yao Tang","Xinran Luo","Shengdi Lu","Yiru Zhou","Yi Lu","Guangyang Sun","Pei Liu","Jiayu Ning","Hua Jiang","Ke Hu","Hongzhen Liu"],"abstract":"Developing intelligent robots with integrated sensing capabilities is critical for advanced manufacturing, medical robots, and embodied intelligence. Existing robotic sensing technologies are limited to recording of acceleration, driving torque, pressure feedback, and so on. Expanding and integrating with the multimodal sensors to mimic and even surpass the human feeling is substantially underdeveloped. 
Here, we introduce a printed soft human-machine interface consisting of an e-skin-enabled gesture recognitions with feedback stimulus and a soft robot with multimodal perception of contact pressure, temperature, thermal conductivity, and electrical conductivity. The sensing e-skin with adaptive machine learning was able to decode and classify the hand gestures with re-wearable convenience and individual's differences. The soft interface provides the bidirectional communications between ro...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1126/sciadv.adw3725","openalex_id":"https://openalex.org/W4414110320","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Shanghai Sixth People's Hospital","ShanghaiTech University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7572000026702881},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.6729000210762024},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6604999899864197},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.6406000256538391},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6025999784469604},{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.5849000215530396},{"id":"https://openalex.org/C113843644","display_name":"Interface (matter)","score":0.5677000284194946},{"id":"https://openalex.org/C192327766","display_name":"Cognitive robotics","score":0.5027999877929688}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"hf-org-paper:tencent:2509.07980","title":"Parallel-R1: 
Towards Parallel Thinking via Reinforcement Learning","url":"https://huggingface.co/papers/2509.07980","published":"2025-09-09","authors":["Tencent/Hunyuan"],"abstract":"Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging, as existing methods predominantly rely on supervised fine-tuning (SFT) over synthetic data, which encourages teacher-forced imitation rather than exploration and generalization. Different from them, we propose Parallel-R1, the first reinforcement learning (RL) framework that enables parallel thinking behaviors for complex real-world reasoning tasks. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking with RL. We first use SFT on prompt-generated trajectories from easier tasks to instill the parallel thinking ability, then transition to RL to explore and generalize this...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4414152979","title":"AI mirrors experimental science to uncover a mechanism of gene transfer crucial to bacterial evolution","url":"https://doi.org/10.1016/j.cell.2025.08.018","published":"2025-09-09","authors":["José R. Penadés","Juraj Gottweis","Lingchen He","Jonasz B. 
Patkowski","Alexander Daryin","Wei‐Hung Weng","Tao Tu","Anil Palepu","Anatoly Myaskovsky","Annalisa Pawlosky","Vivek Natarajan","Alan Karthikesalingam"],"abstract":"Artificial intelligence (AI) models have been proposed for hypothesis generation, but testing their ability to drive high-impact research is challenging since an AI-generated hypothesis can take decades to validate. Here, we challenge the ability of a recently developed large language model (LLM)-based platform, AI co-scientist, to generate high-level hypotheses by posing a question that took years to resolve experimentally but remained unpublished: how could capsid-forming phage-inducible chromosomal islands (cf-PICIs) spread across bacterial species? Remarkably, the AI co-scientist's top-ranked hypothesis matched our experimentally confirmed mechanism: cf-PICIs hijack diverse phage tails to expand their host range. We critically assess its five highest-ranked hypotheses, showing that some opened new research avenues in our laboratories. 
We benchmark its performance against other LLMs a...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.cell.2025.08.018","openalex_id":"https://openalex.org/W4414152979","cited_by_count":11,"quality_score":56,"matched_keywords":["LLM","language model"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.8438000082969666},{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.7353000044822693},{"id":"https://openalex.org/C92938381","display_name":"Horizontal gene transfer","score":0.5389000177383423},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5108000040054321},{"id":"https://openalex.org/C87590526","display_name":"Experimental evolution","score":0.4878999888896942},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.450300008058548},{"id":"https://openalex.org/C3018928802","display_name":"Gene transfer","score":0.41670000553131104},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3711000084877014}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4414165894","title":"Semantic Communication Based on Large Language Model for Underwater Image Transmission","url":"https://doi.org/10.1109/tmc.2025.3607717","published":"2025-09-09","authors":["Weilong Chen","Wenxuan Xu","Haoran Chen","Xinran Zhang","Zhijin Qin","Yanru Zhang","Zhu Han"],"abstract":"Underwater communication is essential for environmental monitoring, marine biology research, and underwater exploration. 
Traditional underwater communication faces limitations like low bandwidth, high latency, and susceptibility to noise, while semantic communication (SC) offers a promising solution by focusing on the exchange of semantics rather than symbols or bits. However, SC encounters challenges in underwater environments, including semantic information mismatch and difficulties in accurately identifying and transmitting critical information that aligns with the diverse requirements of underwater applications. To address these challenges, we propose a novel SC framework based on Large Language Models (LLMs). Our framework leverages visual LLMs to perform semantic compression and prioritization of underwater image data according to the query from users. By identifying and encoding k...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmc.2025.3607717","openalex_id":"https://openalex.org/W4414165894","cited_by_count":5,"quality_score":54,"matched_keywords":["LLM","language model","compression"],"author_affiliations":["Alibaba Group (China)","Southwest Jiaotong University","Tsinghua University","University of Electronic Science and Technology of China","University of Houston"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.829200029373169},{"id":"https://openalex.org/C98083399","display_name":"Underwater","score":0.7293000221252441},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6924999952316284},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6638000011444092},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.5065000057220459},{"id":"https://openalex.org/C169111936","display_name":"Underwater acoustic 
communication","score":0.4970000088214874},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45419999957084656},{"id":"https://openalex.org/C761482","display_name":"Transmission (telecommunications)","score":0.4108000099658966}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4414217599","title":"Programmable reality","url":"https://doi.org/10.3389/frvir.2025.1649785","published":"2025-09-09","authors":["Ryo Suzuki","Parastoo Abtahi","Chen Zhu-Tian","Mustafa Doga Dogan","Andrea Colaço","Eric J. Gonzalez","Karan Ahuja","Mar González-Franco"],"abstract":"Innovations in spatial computing and artificial intelligence (AI) are making it possible to overlay dynamic, interactive digital elements on the physical world. Soon, every object might have a real-time digital twin, enabling the “Internet of Things” so as to identify and interact with even unconnected items. This programmable reality would enable computational manipulation of the world around us through alteration of its appearance or functionality, similar to software, but for reality itself. Advances in AI language models have enabled zero-shot segmentation and understanding of the world, making it possible to query and manipulate objects with precision. However, this vision also demands natural and intuitive ways for humans to interact with these models through gestures, gaze, and existing devices. 
Augmented reality (AR) provides the ideal bridge between AI output and human input in....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3389/frvir.2025.1649785","openalex_id":"https://openalex.org/W4414217599","cited_by_count":4,"quality_score":45,"matched_keywords":["memory"],"author_affiliations":["Google (Switzerland)","Google (United States)","Northwestern University","Princeton University","University of Colorado Boulder","University of Minnesota","University of Minnesota System"],"concepts":[{"id":"https://openalex.org/C153715457","display_name":"Augmented reality","score":0.7680000066757202},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7109000086784363},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6330999732017517},{"id":"https://openalex.org/C194969405","display_name":"Virtual reality","score":0.5307000279426575},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.504800021648407},{"id":"https://openalex.org/C206776904","display_name":"Mixed reality","score":0.44760000705718994},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4345000088214874},{"id":"https://openalex.org/C8678698","display_name":"Artificial reality","score":0.4165000021457672}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cancerguide-cancer-guideline-understanding-via-internal-disagreement-estimation","title":"CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement 
Estimation","url":"https://www.microsoft.com/en-us/research/publication/cancerguide-cancer-guideline-understanding-via-internal-disagreement-estimation/","published":"2025-09-08","authors":["Alyssa Unell","Noel Codella","Sam Preston","Peniel Argaw","Wen-wai Yim","Zelalem Gero","Cliff Wong","Rajesh Jena","Eric Horvitz","Amanda K. Hall","Ruican Rachel Zhong","Jiachen Li"],"abstract":"The National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized expertise, and is prone to error. Advances in large language model (LLM) capabilities promise to reduce the time required to generate treatment recommendations and improve accuracy. We present an LLM agent-based approach to automatically generate guideline-concordant treatment trajectories for patients with non-small cell lung cancer (NSCLC). Our contributions are threefold. First, we construct a novel longitudinal dataset of 121 cases of NSCLC patients that includes clinical encounters, diagnostic results, and medical histories, each expertly annotated with the corresponding NCCN guideline trajectories by board-certified oncologists. 
Second, we demonst...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science","Healthcare","large language models","LLM","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sheetdesigner-mllm-powered-spreadsheet-layout-generation-with-rule-based-and-vision-based-reflection","title":"SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection","url":"https://www.microsoft.com/en-us/research/publication/sheetdesigner-mllm-powered-spreadsheet-layout-generation-with-rule-based-and-vision-based-reflection/","published":"2025-09-08","authors":["Qin Chen","Yuanyi Ren","Xiaojun Ma","Mugeng Liu","Han Shi","Dongmei Zhang","Dongmei Zhang"],"abstract":"Spreadsheets are critical to data-centric tasks, with rich, structured layouts that enable efficient information transmission. Given the time and expertise required for manual spreadsheet layout design, there is an urgent need for automated solutions. However, existing automated layout models are ill-suited to spreadsheets, as they often (1) treat components as axis-aligned rectangles with continuous coordinates, overlooking the inherently discrete, grid-based structure of spreadsheets; and (2) neglect interrelated semantics, such as data dependencies and contextual links, unique to spreadsheets. 
In this paper, we first formalize the spreadsheet layout generation task, supported by a seven-criterion evaluation protocol and a dataset of 3,326 spreadsheets. We then introduce SheetDesigner, a zero-shot and training-free framework using Multimodal Large Language Models (MLLMs) that combines....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Data platforms and analytics","Computer science","spreadsheets","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/delta-l-normalization-rethink-loss-aggregation-in-rlvr","title":"$\\Delta L$ Normalization: Rethink Loss Aggregation in RLVR","url":"https://www.microsoft.com/en-us/research/publication/delta-l-normalization-rethink-loss-aggregation-in-rlvr/","published":"2025-09-08","authors":["Zhiyuan He","Xufang Luo","Yike Zhang","Yuqing Yang","Lili Qiu"],"abstract":"We propose $\\Delta L$ Normalization, a simple yet effective loss aggregation method tailored to the characteristic of dynamic generation lengths in Reinforcement Learning with Verifiable Rewards (RLVR). Recently, RLVR has demonstrated strong potential in improving the reasoning capabilities of large language models (LLMs), but a major challenge lies in the large variability of response lengths during training, which leads to high gradient variance and unstable optimization. Although previous methods such as GRPO, DAPO, and Dr. 
GRPO introduce different loss normalization terms to address this issue, they either produce biased estimates or still suffer from high gradient variance. By analyzing the effect of varying lengths on policy loss both theoretically and empirically, we reformulate the problem as finding a minimum-variance unbiased estimator. Our proposed $\\Delta L$ Normalization not...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Reinforcement learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:Tencent-Hunyuan:2509.06784","title":"P3-SAM: Native 3D Part Segmentation","url":"https://huggingface.co/papers/2509.06784","published":"2025-09-08","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Tencent-Hunyuan"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Tencent-Hunyuan/papers"}},{"id":"openalex:W4414063032","title":"Full-Stack Optimized Large Language Models for Lifelong Sequential Behavior Comprehension in 
Recommendation","url":"https://doi.org/10.1145/3766553","published":"2025-09-08","authors":["Rong Shan","Jiachen Zhu","Jianghao Lin","Chenxu Zhu","Bo Chen","Ruiming Tang","Yong Yu","Weinan Zhang"],"abstract":"As large language models (LLMs) achieve remarkable success in natural language processing (NLP) domains, LLM-enhanced recommender systems have received much attention and are being actively explored currently. In this article, we focus on adapting and enhancing large language models for recommendation tasks. First and foremost, we identify and formulate the lifelong sequential behavior incomprehension problem for LLMs in recommendation realms, i.e., LLMs fail to effectively extract useful information from a pure textual context of long user behavior sequence, even if the length of context is well below the context limitation of LLMs. To address such an issue and improve the recommendation performance of LLMs, we propose a novel framework, namely, R etrieval- e nhanced L arge La nguage models Plus (ReLLaX), which provides full-stack optimization from three perspectives, i.e., data, prompt...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3766553","openalex_id":"https://openalex.org/W4414063032","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (Sweden)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.751800000667572},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6836000084877014},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.6660000085830688},{"id":"https://openalex.org/C557471498","display_name":"Recommender 
system","score":0.5684000253677368},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4740000069141388},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4440000057220459},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.4300999939441681},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41839998960494995}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/demo-healthcare-agent-orchestrator-hao-for-patient-summarization-in-molecular-tumor-boards","title":"Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards","url":"https://www.microsoft.com/en-us/research/publication/demo-healthcare-agent-orchestrator-hao-for-patient-summarization-in-molecular-tumor-boards/","published":"2025-09-07","authors":["Matthias Blondeel","Noel Codella","Sam Preston","Hao Qiu","Leonardo Schettini","Frank Tuan","Wen-wai Yim","Smitha Saligrama","Mert Oz","Shrey Jain","Matthew P Lungren","Thomas Osborne"],"abstract":"Molecular Tumor Boards (MTBs) are multidisciplinary forums where oncology specialists collaboratively assess complex patient cases to determine optimal treatment strategies. A central element of this process is the patient summary, typically compiled by a medical oncologist, radiation oncologist, or surgeon, or their trained medical assistant, who distills heterogeneous medical records into a concise narrative to facilitate discussion. This manual approach is often labor-intensive, subjective, and prone to omissions of critical information. 
To address these limitations, we introduce the Healthcare Agent Orchestrator (HAO), a Large Language Model (LLM)-driven AI agent that coordinates a multi-agent clinical workflow to generate accurate and comprehensive patient summaries for MTBs. Evaluating predicted patient summaries against ground truth presents additional challenges due to stylistic....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Miscellaneous","Medical, health and genomics","Computer science","Healthcare","large language models","LLM","language model","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/instruction-agent-enhancing-agent-with-expert-demonstration","title":"Instruction Agent: Enhancing Agent with Expert Demonstration","url":"https://www.microsoft.com/en-us/research/publication/instruction-agent-enhancing-agent-with-expert-demonstration/","published":"2025-09-07","authors":["Yinheng Li","Hailey Hultquist","Justin Wagle","Kazuhito Koishida"],"abstract":"Graphical user interface (GUI) agents have advanced rapidly but still struggle with complex tasks involving novel UI elements, long-horizon actions, and personalized trajectories. In this work, we introduce Instruction Agent, a GUI agent that leverages expert demonstrations to solve such tasks, enabling completion of otherwise difficult workflows. 
Given a single demonstration, the agent extracts step-by-step instructions and executes them by strictly following the trajectory intended by the user, which avoids making mistakes during execution. The agent leverages the verifier and backtracker modules further to improve robustness. Both modules are critical to understand the current outcome from each action and handle unexpected interruptions(such as pop-up windows) during execution. Our experiments show that Instruction Agent achieves a 60% success rate on a set of tasks in OSWorld that al...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Graphics and multimedia","Systems and networking","Computer science","Graphical user interface","personalized","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-two-stage-training-cooperative-sft-and-rl-for-llm-reasoning","title":"Beyond Two-Stage Training: Cooperative SFT and RL for LLM Reasoning","url":"https://www.microsoft.com/en-us/research/publication/beyond-two-stage-training-cooperative-sft-and-rl-for-llm-reasoning/","published":"2025-09-07","authors":["Liang Chen","Xueting Han","Li Shen","Jing Bai","Kam-Fai Wong"],"abstract":"Reinforcement learning (RL) has proven effective in incentivizing the reasoning abilities of large language models (LLMs), but suffers from severe efficiency challenges due to its trial-and-error nature. 
While the common practice employs supervised fine-tuning (SFT) as a warm-up stage for RL, this decoupled two-stage approach limits interaction between SFT and RL, thereby constraining overall effectiveness. This study introduces a novel method for learning reasoning models that employs bilevel optimization to facilitate better cooperation between these training paradigms. By conditioning the SFT objective on the optimal RL policy, our approach enables SFT to meta-learn how to guide RL's optimization process. During training, the lower level performs RL updates while simultaneously receiving SFT supervision, and the upper level explicitly maximizes the cooperative gain-the performance adv...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Reinforcement learning","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4415746213","title":"Integrating Rules and Semantics for LLM-Based C-to-Rust Translation","url":"https://doi.org/10.1109/icsme64153.2025.00069","published":"2025-09-07","authors":["Feng Luo","Kexing Ji","Cuiyun Gao","Shuzheng Gao","Jia Feng","Kui Liu","Xin Xia","Michael R. Lyu"],"abstract":"Automated translation of legacy <tex xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">$\\mathbf{C}$</tex> code into Rust aims to ensure memory safety while reducing the burden of manual migration. 
Early approaches in C-to-Rust translation rely on static rule-based methods, but they suffer from limited coverage due to dependence on predefined rule patterns. Recent works regard the task as a sequence-to-sequence problem by leveraging large language models (LLMs). Although these LLM-based methods are capable of reducing unsafe code blocks, the translated code often exhibits issues in following Rust rules and maintaining semantic consistency. On one hand, existing methods adopt a direct prompting strategy to translate the <tex xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">$C$</tex> code, which struggles to ac...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icsme64153.2025.00069","openalex_id":"https://openalex.org/W4415746213","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","memory","retrieval"],"author_affiliations":["Chinese University of Hong Kong","Harbin Institute of Technology","Huawei Technologies (China)","Huawei Technologies (United States)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8251000046730042},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.8007000088691711},{"id":"https://openalex.org/C169590947","display_name":"Compiler","score":0.5888000130653381},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5853999853134155},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5670999884605408},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.555400013923645},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.5167999863624573},{"id":"https://openalex.org/C197781089","display_name":"Rust (programming language)","score":0.4819999933242798}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415746005","title":"Together We are Better: LLM, IDE and Semantic Embedding to Assist Move Method Refactoring","url":"https://doi.org/10.1109/icsme64153.2025.00046","published":"2025-09-07","authors":["Abhiram Bellur","Fraol Batole","Mohammed Raihan Ullah","Malinda Dilhara","Yaroslav Zharov","Timofey Bryksin","Kai Ishikawa","Haifeng Chen","Masaharu Morimoto","Takeo Hosomi","Tien N. Nguyen","Hridesh Rajan"],"abstract":"MoveMethod is a hallmark refactoring. Despite a plethora of research tools that recommend which methods to move and where, these recommendations do not align with how expert developers perform Movemethod. Given the extensive training of Large Language Models and their reliance upon naturalness of code, they should expertly recommend which methods are misplaced in a given class and which classes are better hosts. Our formative study of 2016 LLM recommendations revealed that LLMs give expert suggestions, yet they are unreliable: up to 80 % of the suggestions are hallucinations. We introduce the first LLM fully powered assistant for MoveMethod refactoring that automates its whole end-to-end lifecycle, from recommendation to execution. 
We designed novel solutions that automatically filter LLM hallucinations using static analysis from IDEs and a novel workflow that requires LLMs to be self-co...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icsme64153.2025.00046","openalex_id":"https://openalex.org/W4415746005","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Amazon (United States)","Concordia University","Jesus University","NEC (United States)","Tulane University","University of Colorado System"],"concepts":[{"id":"https://openalex.org/C152752567","display_name":"Code refactoring","score":0.967199981212616},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7664999961853027},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6043999791145325},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.5914000272750854},{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.49459999799728394},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.46880000829696655},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4499000012874603},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.4438999891281128}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415746208","title":"A Deep Dive into Retrieval-Augmented Generation for Code Completion: Experience on WeChat","url":"https://doi.org/10.1109/icsme64153.2025.00062","published":"2025-09-07","authors":["Zezhou Yang","Ting Peng","Cuiyun Gao","Chaozheng Wang","Hailiang Huang","Yuetang 
Deng"],"abstract":"Code completion, a crucial task in software engineering that enhances developer productivity, has seen substantial improvements with the rapid advancement of large language models (LLMs). In recent years, retrieval-augmented generation (RAG) has emerged as a promising method to enhance the code completion capabilities of LLMs, which leverages relevant context from codebases without requiring model retraining. While existing studies have demonstrated the effectiveness of RAG on public repositories and benchmarks, the potential distribution shift between open-source and closed-source codebases presents unique challenges that remain unexplored. To mitigate the gap, we conduct an empirical study to investigate the performance of widely-used RAG methods for code completion in the industrialscale codebase of WeChat, one of the largest proprietary software systems. Specifically, we extensively....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icsme64153.2025.00062","openalex_id":"https://openalex.org/W4415746208","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8859000205993652},{"id":"https://openalex.org/C51929080","display_name":"Codebase","score":0.8062000274658203},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.550000011920929},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5220000147819519},{"id":"https://openalex.org/C2779343474","display_name":"Context 
(archaeology)","score":0.5206000208854675},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.4519999921321869},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.43720000982284546},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.42559999227523804}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.06657","title":"LLM-Powered Nuanced Video Attribute Annotation for Enhanced Recommendations","url":"http://arxiv.org/abs/2510.06657","published":"2025-09-06","authors":["Boyuan Long","Yueqi Wang","Hiloni Mehta","Mick Zomnir","Omkar Pathak","Changping Meng","Ruolin Jia","Yajun Peng","Dapeng Hong","Xia Wu","Mingyan Gao","Onkar Dalal"],"abstract":"This paper presents a case study on deploying Large Language Models (LLMs) as an advanced \"annotation\" mechanism to achieve nuanced content understanding (e.g., discerning content \"vibe\") at scale within a large-scale industrial short-form video recommendation system. Traditional machine learning classifiers for content understanding face protracted development cycles and a lack of deep, nuanced comprehension. The \"LLM-as-annotators\" approach addresses these by significantly shortening development times and enabling the annotation of subtle attributes. 
This work details an end-to-end workflow encompassing: (1) iterative definition and robust evaluation of target attributes, refined by offline metrics and online A/B testing; (2) scalable offline bulk annotation of video corpora using LLMs with multimodal features, optimized inference, and knowledge distillation for broad application; and....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3705328.3748103","openalex_id":"https://openalex.org/W4414034782","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","personalized","retrieval","distillation"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.73373943567276},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7193000316619873},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3991529941558838},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32858604192733765},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.32546091079711914}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414035095","title":"RankGraph: Unified Heterogeneous Graph Learning for Cross-Domain Recommendation","url":"https://doi.org/10.1145/3705328.3748118","published":"2025-09-06","authors":["Renzhi Wu","Junjie Yang","Li Chen","Hong Li","Li Yu","Hong Yan"],"abstract":"Cross-domain recommendation systems face the challenge of integrating fine-grained user and item relationships across various product domains.To address this, we introduce RankGraph, a scalable graph learning framework designed to serve as a core 
component in recommendation foundation models (FMs).By constructing and leveraging graphs composed of heterogeneous nodes and edges across multiple products, RankGraph enables the integration of complex relationships between users, posts, ads, and other entities.Our framework employs a GPU-accelerated Graph Neural Network and contrastive learning, allowing for dynamic extraction of subgraphs such as item-item and user-user graphs to support similarity-based retrieval and real-time clustering.Furthermore, RankGraph integrates graph-based pretrained representations as contextual tokens into FM sequence models, enriching them with structured relati...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3705328.3748118","openalex_id":"https://openalex.org/W4414035095","cited_by_count":2,"quality_score":43,"matched_keywords":["retrieval"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7617673873901367},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5355973243713379},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.44244804978370667},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.4327763617038727},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.4305911064147949},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.23765230178833008},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1350524127483368},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414034992","title":"Scaling Generative Recommendations with Context Parallelism on Hierarchical Sequential Transducers","url":"https://doi.org/10.1145/3705328.3748143","published":"2025-09-06","authors":["Yan Dong","Han Li","Shen Li","Nikhil Patel","Xing Liu","Xiaodong Wang","Chuanhao Zhuge"],"abstract":"Large-scale recommendation systems are pivotal to process an immense volume of daily user interactions, requiring the effective modeling of high cardinality and heterogeneous features to ensure accurate predictions.In prior work, we introduced Hierarchical Sequential Transducers (HSTU), an attention-based architecture for modeling high cardinality, non-stationary streaming recommendation data, providing good scaling law in the generative recommender framework (GR).Recent studies and experiments demonstrate that attending to longer user history sequences yields significant metric improvements.However, scaling sequence length is activation-heavy, necessitating parallelism solutions to effectively shard activation memory.In transformer-based LLMs, context parallelism (CP) is a commonly used technique that distributes computation along the sequence-length dimension across multiple GPUs, effe...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3705328.3748143","openalex_id":"https://openalex.org/W4414034992","cited_by_count":1,"quality_score":42,"matched_keywords":["memory"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C2781172179","display_name":"Parallelism (grammar)","score":0.8288289308547974},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.7461010217666626},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.6952981948852539},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.61885666847229},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6101964712142944},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.41521212458610535},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.22213372588157654},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.10499736666679382}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"official:b639932e68392efd","title":"Hunyuan-MT Technical Report","url":"https://huggingface.co/papers/2509.05209","published":"2025-09-05","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4416401251","title":"Application of AI and Generative AI for Understanding Student Behavior and Performance in Higher Education","url":"https://doi.org/10.1109/icicnct66124.2025.11232608","published":"2025-09-05","authors":["RVS Praveen","Satya Subrahmanya Sai Ram Gopal Peri","Harikrishna Vemuri","S. 
Sista","Srinikhil Saisatya Vemuri","RaviTeja Aida"],"abstract":"Motivation has consistently captivated scholars and practitioners focused on human behaviour and performance, with comprehensive research encompassing educational institutions, corporations, governmental entities, and athletic domains. In higher education, comprehending the significance of motivation is crucial for evaluating student behaviour and performance. Researchers have endeavoured to elucidate the motivations behind individual acts by integrating cognitive models, which concentrate on mental processes, and non-cognitive paradigms, which highlight attributes, emotions, and surroundings. This project seeks to improve the analysis of student behaviour and performance in higher education through the implementation of a hybrid DNN model that is sentiment-aware and context-sensitive. The model employs an attention mechanism to emphasise essential aspects and utilises a modified BiLSTM....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icicnct66124.2025.11232608","openalex_id":"https://openalex.org/W4416401251","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Accenture (United States)","Amazon (United States)","Deloitte (United States)","Hindu College of Pharmacy","Mahindra Group (India)","United Utilities (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C120912362","display_name":"Higher education","score":0.578000009059906},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.5011000037193298},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.49059998989105225},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.46160000562667847},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.4577000141143799},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4293000102043152},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4171000123023987},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.40130001306533813}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/collaboration-and-conflict-between-humans-and-language-models-through-the-lens-of-game-theory","title":"Collaboration and Conflict between Humans and Language Models through the Lens of Game Theory","url":"https://www.microsoft.com/en-us/research/publication/collaboration-and-conflict-between-humans-and-language-models-through-the-lens-of-game-theory/","published":"2025-09-04","authors":["Mukul Singh","Arjun Radhakrishna","Sumit Gulwani"],"abstract":"Language models are increasingly deployed in interactive online environments, from personal chat assistants to domain-specific agents, raising questions about their cooperative and competitive behavior in multi-party settings. While prior work has examined language model decision-making in isolated or short-term game-theoretic contexts, these studies often neglect long-horizon interactions, human-model collaboration, and the evolution of behavioral patterns over time. In this paper, we investigate the dynamics of language model behavior in the iterated prisoner's dilemma (IPD), a classical framework for studying cooperation and conflict. We pit model-based agents against a suite of 240 well-established classical strategies in an Axelrod-style tournament and find that language models achieve performance on par with, and in some cases exceeding, the best-known classical strategies. 
Behavio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","Natural language processing","language model","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flower-democratizing-generalist-robot-policies-with-efficient-vision-language-action-flow-policies","title":"FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies","url":"https://www.microsoft.com/en-us/research/publication/flower-democratizing-generalist-robot-policies-with-efficient-vision-language-action-flow-policies/","published":"2025-09-04","authors":["Moritz Reuss","Hongyi Zhou","Marcel Ruhle","Omer Erdincc Yaugmurlu","Fabian Otto","Rudolf Lioutikov"],"abstract":"Developing efficient Vision-Language-Action (VLA) policies is crucial for practical robotics deployment, yet current approaches face prohibitive computational costs and resource requirements. Existing diffusion-based VLA policies require multi-billion-parameter models and massive datasets to achieve strong performance. We tackle this efficiency challenge with two contributions: intermediate-modality fusion, which reallocates capacity to the diffusion head by pruning up to $50\\%$ of LLM layers, and action-specific Global-AdaLN conditioning, which cuts parameters by $20\\%$ through modular adaptation. We integrate these advances into a novel 950 M-parameter VLA called FLOWER. 
Pretrained in just 200 H100 GPU hours, FLOWER delivers competitive performance with bigger VLAs across $190$ tasks spanning ten simulation and real-world benchmarks and demonstrates robustness across diverse robotic em...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4413981016","title":"Fine-grained multiclass nuclei segmentation with molecular empowered all-in-SAM model","url":"https://doi.org/10.1117/1.jmi.12.5.057501","published":"2025-09-04","authors":["Xueyuan Li","Can Cui","Ruining Deng","Yucheng Tang","Quan Liu","Tianyuan Yao","Shunxing Bao","Naweed I. Chowdhury","Haichun Yang","Yuankai Huo"],"abstract":"Purpose: Recent developments in computational pathology have been driven by advances in vision foundation models (VFMs), particularly the Segment Anything Model (SAM). This model facilitates nuclei segmentation through two primary methods: prompt-based zero-shot segmentation and the use of cell-specific SAM models for direct segmentation. These approaches enable effective segmentation across a range of nuclei and cells. However, general VFMs often face challenges with fine-grained semantic segmentation, such as identifying specific nuclei subtypes or particular cells. Approach: In this paper, we propose the molecular empowered all-in-SAM model to advance computational pathology by leveraging the capabilities of VFMs. 
This model incorporates a full-stack approach, focusing on (1) annotation-engaging lay annotators through molecular empowered learning to reduce the need for detailed pixel-...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/1.jmi.12.5.057501","openalex_id":"https://openalex.org/W4413981016","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United Kingdom)","Nvidia (United States)","Vanderbilt Health","Vanderbilt University","Vanderbilt University Medical Center"],"concepts":[{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.9556924104690552},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6749029159545898},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44266262650489807},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3438164293766022},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bridging-gaps-between-student-and-expert-evaluations-of-ai-generated-programming-hints","title":"Bridging Gaps Between Student and Expert Evaluations of AI-Generated Programming Hints","url":"https://www.microsoft.com/en-us/research/publication/bridging-gaps-between-student-and-expert-evaluations-of-ai-generated-programming-hints/","published":"2025-09-02","authors":["Tung Phung","Mengyan Wu","Heeryung Choi","Gustavo Soares","Sumit Gulwani","A. 
Singla","Christopher Brooks"],"abstract":"Generative AI has the potential to enhance education by providing personalized feedback to students at scale. Recent work has proposed techniques to improve AI-generated programming hints and has evaluated their performance based on expert-designed rubrics or student ratings. However, it remains unclear how the rubrics used to design these techniques align with students'perceived helpfulness of hints. In this paper, we systematically study the mismatches in perceived hint quality from students'and experts'perspectives based on the deployment of AI-generated hints in a Python programming course. We analyze scenarios with discrepancies between student and expert evaluations, in particular, where experts rated a hint as high-quality while the student found it unhelpful. We identify key reasons for these discrepancies and classify them into categories, such as hints not accounting for the st...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Generative AI","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:bb00f4c3a3e46917","title":"Jointly Reinforcing Diversity and Quality in Language Model Generations","url":"https://ai.meta.com/research/publications/jointly-reinforcing-diversity-and-quality-in-language-model-generations/","published":"2025-09-02","authors":["Tianjian Li","Yiming Zhang","Ping Yu","Swarnadeep Saha","Daniel Khashabi","Jason Weston","Jack Lanchantin","Tianlu 
Wang"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Reinforcement Learning","NLP","language model"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=4"}},{"id":"openalex:W4413922155","title":"A Cloud-Edge Collaborative Inference System for Data-secure LLM Serving","url":"https://doi.org/10.1145/3748273.3749205","published":"2025-09-02","authors":["Wenjie Chu","Yunfeng Shao","Chunhui Du"],"abstract":"The surge in private deployment of large language models (LLMs) driven by open-source advancements has intensified challenges in computational scalability, infrastructure costs, and data privacy. While cloud-edge collaborative inference frameworks alleviate local resource constraints through elastic cloud offloading, their efficacy in wide-area networks (WANs) is hindered by communication inefficiencies and privacy risks. This paper proposes CROSS-SEC, a novel cloud-edge collaborative inference framework integrating cross-WANs PD disaggregation with split learning (SL) for data security preservation. To mitigate transmission bottlenecks, CROSS-SEC introduces a layerwise KVCache computation-communication overlapping mechanism, coupled with asychromous concurrent transmission to eliminate ACK-induced latency. 
For congestion control, a dual-grained scheduling strategy is proposed: (1) KVCac...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3748273.3749205","openalex_id":"https://openalex.org/W4413922155","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.7972306609153748},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7397717237472534},{"id":"https://openalex.org/C162307627","display_name":"Enhanced Data Rates for GSM Evolution","score":0.5763468742370605},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4865519106388092},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.41106078028678894},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.3554569482803345},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.17831599712371826},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.14090648293495178}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413922012","title":"LIFT: Automating Symbolic Execution Optimization with Large Language Models for AI Networks","url":"https://doi.org/10.1145/3748273.3749202","published":"2025-09-02","authors":["Ruoxi Wang","Kun Li","Minghui Xu","Yue Zhang","Kaidi Xu","Chunchi Liu","Yinhao Xiao","Xiuzhen Cheng"],"abstract":"Dynamic Symbolic Execution (DSE) is a key technique in program analysis, widely used in software testing, vulnerability discovery, and formal verification. 
In distributed AI systems, DSE plays a crucial role in identifying hard-to-detect bugs, especially those arising from complex network communication patterns. However, traditional approaches to symbolic execution are often hindered by scalability issues and inefficiencies, particularly in large-scale systems. This paper introduces LIFT (Large-language-model Integrated Functional-equivalent-IR Transformation), a novel framework that leverages Large Language Models (LLMs) to automate the optimization of Intermediate Representations (IRs) in symbolic execution. LIFT addresses the challenges of symbolic execution by providing a scalable, context-sensitive solution for IR transformation. The framework consists of two phases: IR Analysis and...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3748273.3749202","openalex_id":"https://openalex.org/W4413922012","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Drexel University","Guangdong University Of Finances and Economics","Guangdong University of Finance","Huawei Technologies (China)","Northeastern University","Shandong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8190420269966125},{"id":"https://openalex.org/C139002025","display_name":"Lift (data mining)","score":0.6760276556015015},{"id":"https://openalex.org/C2779639559","display_name":"Symbolic execution","score":0.5487292408943176},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.5240381360054016},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33102279901504517},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.10493183135986328},{"id":"https://openalex.org/C124101348","display_name":"Data 
mining","score":0.07544252276420593}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vibe-coding-programming-through-conversation-with-artificial-intelligence","title":"Vibe coding: programming through conversation with artificial intelligence","url":"https://www.microsoft.com/en-us/research/publication/vibe-coding-programming-through-conversation-with-artificial-intelligence/","published":"2025-09-01","authors":["Advait Sarkar","Ian Drosos"],"abstract":"We examine “vibe coding”: an emerging programming paradigm where developers primarily write code by interacting with code-generating large language models rather than writing code directly. We present the first empirical study of vibe coding. We analysed over 8 hours of curated video capturing extended vibe coding sessions with rich think-aloud reflections. Using framework analysis, we investigated programmers’ goals, workflows, prompting techniques, debugging approaches, and challenges encountered.We find that vibe coding follows iterative goal satisfaction cycles where developers alternate between prompting AI, evaluating generated code through rapid scanning and application testing, and manual editing. Prompts in vibe coding blend vague, high-level directives with detailed technical specifications. 
Debugging remains a hybrid process combining AI assistance with manual practices.Critic...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Programming languages and software engineering","Human–computer interaction","Programming language","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/quality-over-quantity-llm-based-curation-for-a-data-efficient-audio-video-foundation-model","title":"Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model","url":"https://www.microsoft.com/en-us/research/publication/quality-over-quantity-llm-based-curation-for-a-data-efficient-audio-video-foundation-model/","published":"2025-09-01","authors":["Ali Vosoughi","Hannes Gamper","Dimitra Emmanouilidou"],"abstract":"Integrating audio and visual data for training multimodal foundational models remains a challenge. The Audio-Video Vector Alignment (AVVA) framework addresses this by considering AV scene alignment beyond mere temporal synchronization, and leveraging Large Language Models (LLMs) for data curation. AVVA implements a scoring mechanism for selecting aligned training data segments. It integrates Whisper, a speech-based foundation model, for audio and DINOv2 for video analysis in a dual-encoder structure with contrastive learning on AV pairs. 
Evaluations on AudioCaps, VALOR, and VGGSound demonstrate the effectiveness of the proposed model architecture and data curation approach. AVVA achieves a significant improvement in top-k accuracies for video-to-audio retrieval on all datasets compared to DenseAV, while using only 192 hrs of curated training data. Furthermore, an ablation study indicates...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.23919/eusipco63237.2025.11226207","openalex_id":"https://openalex.org/W4417051962","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Audio and Acoustics","Audio and Speech Processing","1970-01-01","LLM","retrieval","efficient"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Rochester"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cosmir-chain-orchestrated-structured-memory-for-iterative-reasoning-over-long-context","title":"COSMIR: Chain Orchestrated Structured Memory for Iterative Reasoning over Long Context","url":"https://www.microsoft.com/en-us/research/publication/cosmir-chain-orchestrated-structured-memory-for-iterative-reasoning-over-long-context/","published":"2025-09-01","authors":["Naman Gupta","Shreeyash Gowaikar","Arun Iyer","Kiran Shiragur","Ramakrishna Bairi","Rishikesh Maurya","Ritabrata Maiti","Sankarshan Damle","Shachee Mishra Gupta"],"abstract":"Reasoning over very long inputs remains difficult for large language models (LLMs). 
Common workarounds either shrink the input via retrieval (risking missed evidence), enlarge the context window (straining selectivity), or stage multiple agents to read in pieces. In staged pipelines (e.g., Chain of Agents, CoA), free-form summaries passed between agents can discard crucial details and amplify early mistakes. We introduce COSMIR (Chain Orchestrated Structured Memory for Iterative Reasoning), a chain-style framework that replaces ad hoc messages with a structured memory. A Planner agent first turns a user query into concrete, checkable sub-questions. worker agents process chunks via a fixed micro-cycle: Extract, Infer, Refine, writing all updates to the shared memory. A Manager agent then Synthesizes the final answer directly from the memory. This preserves step-wise read-then-reason benef...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","memory","retrieval","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dynamic-speculative-agent-planning","title":"Dynamic Speculative Agent Planning","url":"https://www.microsoft.com/en-us/research/publication/dynamic-speculative-agent-planning/","published":"2025-09-01","authors":["Yilin Guan","Wenyue Hua","Qingfeng Lan","Sun Fei","Dujian Ding","Devang Acharya","Chi Wang","W. 
Wang"],"abstract":"Despite their remarkable success in complex tasks propelling widespread adoption, large language-model-based agents still face critical deployment challenges due to prohibitive latency and inference costs. While recent work has explored various methods to accelerate inference, existing approaches suffer from significant limitations: they either fail to preserve performance fidelity, require extensive offline training of router modules, or incur excessive operational costs. Moreover, they provide minimal user control over the tradeoff between acceleration and other performance metrics. To address these gaps, we introduce Dynamic Speculative Planning (DSP), an asynchronous online reinforcement learning framework that provides lossless acceleration with substantially reduced costs without requiring additional pre-deployment preparation. DSP explicitly optimizes a joint objective balancing e...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/deeptrace-auditing-deep-research-ai-systems-for-tracking-reliability-across-citations-and-evidence","title":"DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and 
Evidence","url":"https://www.microsoft.com/en-us/research/publication/deeptrace-auditing-deep-research-ai-systems-for-tracking-reliability-across-citations-and-evidence/","published":"2025-09-01","authors":["P. Venkit","Philippe Laban","Yilun Zhou","Kung-Hsiang Huang","Yixin Mao","Chien-Sheng Wu"],"abstract":"Generative search engines and deep research LLM agents promise trustworthy, source-grounded synthesis, yet users regularly encounter overconfidence, weak sourcing, and confusing citation practices. We introduce DeepTRACE, a novel sociotechnically grounded audit framework that turns prior community-identified failure cases into eight measurable dimensions spanning answer text, sources, and citations. DeepTRACE uses statement-level analysis (decomposition, confidence scoring) and builds citation and factual-support matrices to audit how systems reason with and attribute evidence end-to-end. Using automated extraction pipelines for popular public models (e.g., GPT-4.5/5, You.com, Perplexity, Copilot/Bing, Gemini) and an LLM-judge with validated agreement to human raters, we evaluate both web-search engines and deep-research configurations. 
Our findings show that generative search engines an...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","AI agents","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/optimind-teaching-llms-to-think-like-optimization-experts","title":"OptiMind: Teaching LLMs to Think Like Optimization Experts","url":"https://www.microsoft.com/en-us/research/publication/optimind-teaching-llms-to-think-like-optimization-experts/","published":"2025-09-01","authors":["Zeyi Chen","Xinzhi Zhang","Humishka Zope","Hugo Barbalho","Konstantina Mellou","Marco Molinaro","Janardhan (Jana) Kulkarni","Ishai Menache","Sirui Li"],"abstract":"Mathematical programming - the task of expressing operations and decision-making problems in precise mathematical language - is fundamental across domains, yet remains a skill-intensive process requiring operations research expertise. Recent advances in large language models for complex reasoning have spurred interest in automating this task, translating natural language into executable optimization models. Current approaches, however, achieve limited accuracy, hindered by scarce and noisy training data without leveraging domain knowledge. In this work, we systematically integrate optimization expertise to improve formulation accuracy for mixed-integer linear programming, a key family of mathematical programs. 
Our approach first cleans training data through class-based error analysis to explicitly prevent common mistakes within each optimization class. We then develop multi-turn inferenc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Mathematics","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/informed-correctors-for-discrete-diffusion-models","title":"Informed Correctors for Discrete Diffusion Models","url":"https://www.microsoft.com/en-us/research/publication/informed-correctors-for-discrete-diffusion-models/","published":"2025-09-01","authors":["Yixiu Zhao","Jiaxin Shi","Lester Mackey","Scott W. Linderman"],"abstract":"Discrete diffusion has emerged as a powerful framework for generative modeling in discrete domains, yet efficiently sampling from these models remains challenging. Existing sampling strategies often struggle to balance computation and sample quality when the number of sampling steps is reduced, even when the model has learned the data distribution well. To address these limitations, we propose a predictor-corrector sampling scheme where the corrector is informed by the diffusion model to more reliably counter the accumulating approximation errors. To further enhance the effectiveness of our informed corrector, we introduce complementary architectural modifications based on hollow transformers and a simple tailored training objective that leverages more training signal. 
We use a synthetic example to illustrate the failure modes of existing samplers and show how informed correctors allevia...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/energy-use-of-ai-inference-efficiency-pathways-and-test-time-compute","title":"Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute","url":"https://www.microsoft.com/en-us/research/publication/energy-use-of-ai-inference-efficiency-pathways-and-test-time-compute/","published":"2025-09-01","authors":["Felipe Oviedo","Fiodar Kazhamiaka","Esha Choukse","Allen Kim","Amy Luers","Melanie Nakagawa","Ricardo Bianchini","Juan M. Lavista Ferres"],"abstract":"As AI inference scales to billions of queries and emerging reasoning and agentic workflows substantially increase token demand, reliable estimates of per-query energy use are increasingly important for capacity planning, emissions accounting, and efficiency prioritization. Yet many public estimates tend to be inconsistent and systematically overstate energy use, because they extrapolate from limited benchmarks and fail to reflect the efficiency gains achievable in at-scale deployments. In this perspective, we introduce a bottom-up methodology to estimate the per-query energy of large-scale LLM systems based on token throughput estimation. 
For models running on an H100 node under realistic workloads, GPU utilization and PUE constraints, we estimate a median energy per query of 0.34 Wh (IQR: 0.18–0.67) for frontier-scale models (200 billion parameters). These results are consistent with me...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Ecology and environment","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:311","title":"Robix: A Unified Model for Robot Interaction, Reasoning and Planning","url":"https://seed.bytedance.com/en/research/robix-a-unified-model-for-robot-interaction-reasoning-and-planning","published":"2025-09-01","authors":["Huang Fang","Mengxi Zhang","Heng Dong","Wei Li","Zixuan Wang","Qifeng Zhang","Xueyun Tian","Yucheng Hu","Hang Li"],"abstract":"We introduce Robix, a unified vision-language model designed to serve as the high-level cognitive layer in a hierarchical robot system, integrating robot reasoning, task planning, and natural language interaction within a single architecture. Robix dynamically generates atomic commands for low-level controllers alongside verbal responses for human interaction, enabling end-to-end execution of complex instructions, long-horizon task planning, and natural human-robot collaboration. The model also introduces novel capabilities such as proactive dialogue, real-time interruption handling, and context-aware commonsense reasoning during task execution. 
At its core, Robix employs chain-of-thought reasoning and is trained through a three-stage strategy: (1) continued pretraining to enhance embodied reasoning skills like 3D spatial understanding, visual grounding, and task-centric reasoning; (2) s...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Robotics","arXiv","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:tencent:2509.01215","title":"POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion","url":"https://huggingface.co/papers/2509.01215","published":"2025-09-01","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","distillation"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-next-horizon-of-system-intelligence","title":"The Next Horizon of System Intelligence","url":"https://www.microsoft.com/en-us/research/publication/the-next-horizon-of-system-intelligence/","published":"2025-09-01","authors":["Chieh-Jan Mike Liang","Haoran 
Qiu","Francis Y. Yan","Tianyin Xu","Lidong Zhou"],"abstract":"Generative AI, as represented by Large Language Models (LLMs), has been making unprecedented impacts on many problem domains such as code generation and bug fixing, and has reshaped the research landscape across many fields in computing and data science. However, its role in tackling complex, real-world system challenges seems to remain controversial. Anecdotally, some doubt LLMs’ capability to manage system intricacies, while others express concerns over their safety and interpretability.This controversy raises hard but urgent questions, which remain unanswered and even subject to drastically different opinions: Can AI models go beyond coding and bug fixing, towards the design and implementation of innovative and complex systems? What are the inherent limitations of existing AI models, and what are the best practices and tools available for systems researchers? How can we effectively “m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Miscellaneous","Systems and networking"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414359395","title":"RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations","url":"https://doi.org/10.24963/ijcai.2025/690","published":"2025-09-01","authors":["Zhiyong Su","Hanyu Wei","Zhe Chen","Wang Shen","Linge Li","Huangqi Yu","Kehong Yuan"],"abstract":"Key-Value (KV) cache facilitates efficient large language models (LLMs) inference by avoiding recomputation of past 
KVs. As the batch size and context length increase, the oversized KV caches become a significant memory bottleneck, highlighting the need for efficient compression. Existing KV quantization rely on fine-grained quantization or the retention of a significant portion of high bit-widths caches, both of which compromise compression ratio and often fail to maintain robustness at extremely low average bit-widths. In this work, we explore the potential of rotation technique for 2-bit KV quantization and propose RotateKV, which achieves accurate and robust performance through the following innovations: (i) Outlier-Aware Rotation, which utilizes channel-reordering to adapt the rotations to varying channel-wise outlier distributions without sacrificing the computational efficiency of...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/690","openalex_id":"https://openalex.org/W4414359395","cited_by_count":2,"quality_score":55,"matched_keywords":["memory","efficient","compression","quantization"],"author_affiliations":["Huawei Technologies (China)","Tsinghua University","University Town of Shenzhen","The University of Melbourne","The University of Sydney","The University of Western Australia"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7466999888420105},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.7081999778747559},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.6377999782562256},{"id":"https://openalex.org/C63479239","display_name":"Robustness 
(evolution)","score":0.5753999948501587},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.564300000667572},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.4894999861717224},{"id":"https://openalex.org/C79337645","display_name":"Outlier","score":0.47360000014305115},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4571000039577484}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414360502","title":"Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance","url":"https://doi.org/10.24963/ijcai.2025/919","published":"2025-09-01","authors":["Yufeng Wang","Jinwu Hu","Ziteng Huang","Kunyang Lin","Zitian Zhang","Peihao Chen","Yu Hu","Qianyue Wang","Zhuliang Yu","Bin Sun","Xiaofen Xing","Qingfang Zheng"],"abstract":"Open-domain dialogue systems aim to generate natural and engaging conversations, providing significant practical value in real applications such as social robotics and personal assistants. The advent of large language models (LLMs) has greatly advanced this field by improving context understanding and conversational fluency. However, existing LLM-based dialogue systems often fall short in proactively understanding the user's chatting preferences and guiding conversations toward user-centered topics. This lack of user-oriented proactivity can lead users to feel unappreciated, reducing their satisfaction and willingness to continue the conversation in human-computer interactions. To address this issue, we propose a User-oriented Proactive Chatbot (UPC) to enhance the user-oriented proactivity. 
Specifically, we first construct a critic to evaluate this proactivity inspired by the LLM-as-a-j...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/919","openalex_id":"https://openalex.org/W4414360502","cited_by_count":13,"quality_score":54,"matched_keywords":["LLM"],"author_affiliations":["Hong Kong Polytechnic University","Hunan University","Peng Cheng Laboratory","South China University of Technology","Tencent (China)","Technical University of Munich","University of Bayreuth","University of Toronto"],"concepts":[{"id":"https://openalex.org/C142944206","display_name":"Proactivity","score":0.9203000068664551},{"id":"https://openalex.org/C2779041454","display_name":"Chatbot","score":0.8852999806404114},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7006000280380249},{"id":"https://openalex.org/C2777200299","display_name":"Conversation","score":0.6392999887466431},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5568000078201294},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5338000059127808},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.47749999165534973},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.42329999804496765}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414359290","title":"Unified Molecule-Text Language Model with Discrete Token Representation","url":"https://doi.org/10.24963/ijcai.2025/1023","published":"2025-09-01","authors":["Shuhan Guo","Yatao Bian","Ruibing Wang","Nan Yin","Zhen Wang","Quanming Yao"],"abstract":"The 
remarkable success of Large Language Models (LLMs) across diverse tasks has driven the research community to extend their capabilities to molecular applications. However, most molecular LLMs employ adapter-based architectures that fail to equally integrate molecule and text modalities and lack explicit supervision signals for the molecular modality. To address these issues, we introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture that expands the vocabulary of LLMs with molecule tokens. Specifically, we introduce a Vector Quantization-driven tokenizer that incorporates a Q-Former to bridge the modality gap between molecule and text. This tokenizer transforms molecular structures into sequences of tokens exhibiting causal dependency, thereby encapsulating both high-level molecular features and textual information. Equipped with this tokenizer, UniMoT un...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/1023","openalex_id":"https://openalex.org/W4414359290","cited_by_count":3,"quality_score":52,"matched_keywords":["LLM","language model","quantization"],"author_affiliations":["Hong Kong University of Science and Technology","Northwestern Polytechnical University","Tencent (China)","Tsinghua University","University of Seoul"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6898999810218811},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6783000230789185},{"id":"https://openalex.org/C48145219","display_name":"Security 
token","score":0.5914000272750854},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.48809999227523804},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.47940000891685486},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47690001130104065},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47279998660087585},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.45980000495910645}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414359932","title":"Endogenous Recovery via Within-modality Prototypes for Incomplete Multimodal Hashing","url":"https://doi.org/10.24963/ijcai.2025/281","published":"2025-09-01","authors":["Sa Zhu","Dayan Wu","Chenming Wu","Pengwen Dai","Bo Li"],"abstract":"Multimodal hashing projects multimodal data into compact binary codes, enabling rapid and storage-efficient retrieval of large-scale multimedia content. In practical scenarios, the issue of missing modality frequently arises when dealing with multimodal data. Existing incomplete multimodal hashing techniques directly recover missing modalities by neural networks, resulting in a disjointed representation space between the recovered and true data. In this paper, we present a novel recovery paradigm, namely Prototype-based Modality Completion Hashing (PMCH). Instead of directly synthesizing it from available modalities, PMCH adaptively aggregates associated within-modality prototypes to recover missing modality data. Specifically, PMCH introduces an within-modality prototype learning module to optimize representative prototypes for each modality. 
These prototypes act as recovery anchors and...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/281","openalex_id":"https://openalex.org/W4414359932","cited_by_count":3,"quality_score":48,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Cyber Defense Agency (United States)","Institute of Information Engineering","Sun Yat-sen University","University of Chinese Academy of Sciences","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8202999830245972},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.7131999731063843},{"id":"https://openalex.org/C99138194","display_name":"Hash function","score":0.7059000134468079},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6700999736785889},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6428999900817871},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6043000221252441},{"id":"https://openalex.org/C9357733","display_name":"Missing data","score":0.5257999897003174},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.508899986743927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414360512","title":"Indirect Online Preference Optimization via Reinforcement Learning","url":"https://doi.org/10.24963/ijcai.2025/61","published":"2025-09-01","authors":["En Wang","Xingyu Lin","Du Su","Chenfu Bao","Zhonghou Lv","Funing Yang","Yuanbo Xu","Wenbin Liu"],"abstract":"Human 
preference alignment (HPA) aims to ensure Large Language Models (LLMs) responding appropriately to meet human moral and ethical requirements. Existing methods, such as RLHF and DPO, rely heavily on high-quality human annotation, which restrict the efficiency of iterative online model refinement. To address the inefficiencies of human annotation acquisition, iterated online strategy advocates the use of fine-tuned LLMs to self-generate preference data. However, this approach is prone to distribution bias, because of differences between human and model annotations, as well as modeling errors between simulators and real-world contexts. To mitigate the impact of distribution bias, we adopt the principles of adversarial training, framing a zero-sum two-player game with a protagonist agent and an adversarial agent. With the adversarial agent challenging the alignment of protagonist agent...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/61","openalex_id":"https://openalex.org/W4414360512","cited_by_count":1,"quality_score":46,"matched_keywords":["preference","agent"],"author_affiliations":["Baidu (China)","Institute of Computing Technology","Jilin Medical University","Jilin University","University of Southern Queensland","University of Technology Sydney","Yangzhou University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6952000260353088},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.6798999905586243},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.585099995136261},{"id":"https://openalex.org/C2777868144","display_name":"Preference elicitation","score":0.5716999769210815},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5044000148773193},{"id":"https://openalex.org/C46814582","display_name":"Nash equilibrium","score":0.45879998803138733},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.45179998874664307},{"id":"https://openalex.org/C169087156","display_name":"Framing (construction)","score":0.4496999979019165}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414359437","title":"Do Mentioned Items Truly Matter? Enhancing Conversational Recommender Systems with Causal Intervention and Large Language Models","url":"https://doi.org/10.24963/ijcai.2025/470","published":"2025-09-01","authors":["Lingzhi Wang","Xingshan Zeng","Kam‐Fai Wong"],"abstract":"Conversational Recommender Systems (CRS) have become increasingly important due to their ability to recommend items through interactive dialogue, adapting to user preferences in real time. Traditional CRS approaches face challenges in generating high-quality, diverse responses due to the limited availability of training data and the inherited biases from domain-specific fine-tuning. Furthermore, existing systems often overlook the impact of confounding variables during user interactions, leading to suboptimal recommendations. In this work, we propose a novel hybrid framework that integrates large language models (LLMs) with traditional recommendation techniques to address these limitations. Our approach leverages the strengths of LLMs in generating fluent, contextually appropriate responses while employing a traditional recommendation module to capture complex interaction structures. 
To....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/470","openalex_id":"https://openalex.org/W4414359437","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","personalized"],"author_affiliations":["Chinese University of Hong Kong","Harbin Institute of Technology","Huawei Technologies (China)","Institute of Software","Peking University"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7572000026702881},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7488999962806702},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.572700023651123},{"id":"https://openalex.org/C27415008","display_name":"Psychological intervention","score":0.5126000046730042},{"id":"https://openalex.org/C2780665704","display_name":"Intervention (counseling)","score":0.5027999877929688},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4830999970436096},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4099000096321106},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.37540000677108765}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414359448","title":"Mechanism Design for Large Language Models (Extended Abstract)","url":"https://doi.org/10.24963/ijcai.2025/1210","published":"2025-09-01","authors":["Paul Dütting","Vahab Mirrokni","Renato Paes Leme","Haifeng Xu","Song Zuo"],"abstract":"We investigate auction mechanisms for AI-generated content, focusing on applications like ad creative generation. 
In our model, agents' preferences over stochastically generated content are encoded as large language models (LLMs). We propose an auction format that operates on a token-by-token basis, and allows LLM agents to influence content creation through single dimensional bids. We formulate two desirable incentive properties and prove their equivalence to a monotonicity condition on output aggregation. This equivalence enables a second-price rule design, even absent explicit agent valuation functions. Our design is supported by demonstrations on a publicly available LLM.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/1210","openalex_id":"https://openalex.org/W4414359448","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Google (United States)","University of Chicago"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7098000049591064},{"id":"https://openalex.org/C153517567","display_name":"Mechanism design","score":0.6819000244140625},{"id":"https://openalex.org/C2780069185","display_name":"Equivalence (formal languages)","score":0.5878999829292297},{"id":"https://openalex.org/C72169020","display_name":"Monotonic function","score":0.4982999861240387},{"id":"https://openalex.org/C91810955","display_name":"Incentive compatibility","score":0.486299991607666},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.42419999837875366},{"id":"https://openalex.org/C186027771","display_name":"Valuation (finance)","score":0.38359999656677246},{"id":"https://openalex.org/C163239763","display_name":"Common value auction","score":0.34940001368522644}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":0}},{"id":"openalex:W7125955950","title":"Cross-Domain Knowledge Transfer in Multimodal AI Systems for Enhanced Predictive Accuracy","url":"https://doi.org/10.1109/icact67549.2025.11351389","published":"2025-09-01","authors":["Vivek Ghatala","Sai Bhuvana Kurada","Sangeeta Singh","Shashank Majety","Preeti Sharma Nair"],"abstract":"Recently, cross-domain knowledge transfer in multimodal AI systems has proven problematic. The issue applies especially to apps that must correctly and scalably merge text, graphics, and audio. This paper proposes a three-step approach that allows learning in diverse modes while remaining semantically aligned and domain-invariant. First, deep networks encode each modality. For semantic unification, each modality is projected onto a shared latent space. Domain discriminators generalize features, whereas reconstruction routes and attention approaches protect contextual information and reduce over-fitting. The second phase improves representations with intra- and inter-modal attention modules. It improves learning using contrastive goals, entropy regularization, and domain-based consistency. 
Finally, a meta-optimization layer, adaptive scoring algorithms, and uncertainty-guided ensemble out...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icact67549.2025.11351389","openalex_id":"https://openalex.org/W7125955950","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","distillation"],"author_affiliations":["Amazon (United States)","Arizona State University","Centre for Artificial Intelligence and Robotics","EKF Diagnostics (United States)","Endometriosis"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7718999981880188},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6929000020027161},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5523999929428101},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.45660001039505005},{"id":"https://openalex.org/C197129107","display_name":"Merge (version control)","score":0.4372999966144562},{"id":"https://openalex.org/C66746571","display_name":"ENCODE","score":0.42320001125335693},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.41819998621940613},{"id":"https://openalex.org/C2776960227","display_name":"Knowledge transfer","score":0.39469999074935913}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414360678","title":"VidEvo: Evolving Video Editing through Exhaustive Temporal Modeling","url":"https://doi.org/10.24963/ijcai.2025/99","published":"2025-09-01","authors":["Sizhe Dang","Huan Liu","Mengmeng Wang","Xin Lai","Guang Dai","Jingdong Wang"],"abstract":"Text-guided video editing (TGVE) has become a recent hotspot due to its 
entertainment value and practical applications. To reduce overhead, existing methods primarily extend from text-to-image diffusion models and typically involve reconstruction and editing phases. However, challenges persist, particularly in enhancing temporal consistency of a video while adhering to textual alignment requirements. A crucial factor leading to the aforementioned issue is the inadequate and implicit tuning of the attention module within existing methods, which is specifically designed to capture temporal information. In light of this, we introduce VidEvo, a novel one-shot video editing method that leverages explicit cues derived from the original video to enhance temporal modeling. By integrating null-video embedding (NVE) and window-frame attention (WFA) components, VidEvo facilitates the smooth and coh...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/99","openalex_id":"https://openalex.org/W4414360678","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Baidu (China)","State Grid Corporation of China (China)","Xi'an Jiaotong University","Zhejiang University of Technology","Alibaba Group (United States)","Shanghai Jiao Tong University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8338000178337097},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5467000007629395},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4749999940395355},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.45239999890327454},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data 
type)","score":0.43529999256134033},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.388700008392334},{"id":"https://openalex.org/C2780310081","display_name":"Video editing","score":0.3714999854564667},{"id":"https://openalex.org/C77277458","display_name":"Temporal database","score":0.35109999775886536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413887381","title":"Structure-Induced Gradient Regulation for Generalizable Vision-Language Models","url":"https://doi.org/10.1109/tpami.2025.3604454","published":"2025-09-01","authors":["Juncheng Li","Minghe Gao","Siliang Tang","Longhui Wei","Jun Xiao","Fei Wu","Richang Hong","Meng Wang","Qi Tian"],"abstract":"Prompt tuning, a recently emerging paradigm, adapts vision-language pre-trained models to new tasks efficiently by learning \"soft prompts\" for frozen models. However, in few-shot scenarios, its effectiveness is limited by sensitivity to the initialization and the time-consuming search for optimal initialization, hindering rapid adaptation. Additionally, prompt tuning risks reducing the models' generalizability due to overfitting on scarce training samples. To overcome these challenges, we introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM) framework that jointly meta-learns an efficient soft prompt initialization for better adaptation and a lightweight gradient regulating function for strong cross-domain generalizability in a meta-learning paradigm using only the weakly labeled image-text pre-training data. 
This is achieved through a Cross-Modal Hierarchical Clustering algor...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3604454","openalex_id":"https://openalex.org/W4413887381","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Hefei University of Technology","Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6596612930297852},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6483461260795593},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.49940037727355957},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37702468037605286},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.32404547929763794}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414359134","title":"MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient","url":"https://doi.org/10.24963/ijcai.2025/1267","published":"2025-09-01","authors":["Yanzeng Li","Cheng Zeng","Jinchao Zhang","Jie Zhou","Lei Zou"],"abstract":"Medical education relies heavily on Simulated Patients (SPs) to provide a safe environment for students to practice clinical skills, including medical image analysis. However, the high cost of recruiting qualified SPs and the lack of diverse medical imaging datasets have presented significant challenges. 
To address these issues, this paper introduces MedDiT, a novel knowledge-controlled conversational framework that can dynamically generate plausible medical images aligned with simulated patient symptoms, enabling diverse diagnostic skill training. Specifically, MedDiT integrates various patient Knowledge Graphs (KGs), which describe the attributes and symptoms of patients, to dynamically prompt Large Language Models' (LLMs) behavior and control the patient characteristics, mitigating hallucination during medical conversation. Additionally, a well-tuned Diffusion Transformer (DiT) model....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/1267","openalex_id":"https://openalex.org/W4414359134","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Peking University","Tencent (China)","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.635200023651123},{"id":"https://openalex.org/C2778533338","display_name":"Virtual patient","score":0.6082000136375427},{"id":"https://openalex.org/C503897019","display_name":"Medical simulation","score":0.503600001335144},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.47209998965263367},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4146000146865845},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3959999978542328},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.3785000145435333},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.37070000171661377}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4414360335","title":"Hallucination-Aware Prompt Optimization for Text-to-Video Synthesis","url":"https://doi.org/10.24963/ijcai.2025/1133","published":"2025-09-01","authors":["Jiapeng Wang","Chengyu Wang","Jun Huang","Lianwen Jin"],"abstract":"The rapid advancements in AI-generated content (AIGC) have led to extensive research and application of deep text-to-video (T2V) synthesis models, such as OpenAI's Sora. These models typically rely on high-quality prompt-video pairs and detailed text prompts for model training in order to produce high-quality videos. To boost the effectiveness of Sora-like T2V models, we introduce VidPrompter, an innovative large multi-modal model supporting T2V applications with three key functionalities: (1) generating detailed prompts from raw videos, (2) enhancing prompts from videos grounded with short descriptions, and (3) refining simple user-provided prompts to elevate T2V video quality. We train VidPrompter using a hybrid multi-task paradigm and propose the hallucination-aware direct preference optimization (HDPO) technique to improve the multi-modal, multi-task prompt optimization process. 
Expe...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/1133","openalex_id":"https://openalex.org/W4414360335","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Alibaba Group (China)","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.769599974155426},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5914000272750854},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.5054000020027161},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5004000067710876},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.4417000114917755},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.39980000257492065},{"id":"https://openalex.org/C60044698","display_name":"Refining (metallurgy)","score":0.3781000077724457},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.3682999908924103}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414359769","title":"Breaking the Self-Evaluation Barrier: Reinforced Neuro-Symbolic Planning with Large Language Models","url":"https://doi.org/10.24963/ijcai.2025/682","published":"2025-09-01","authors":["Jie-Jing Shao","H You","Guohao Cai","Quanyu Dai","Zhenhua Dong","Lan-Zhe Guo"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable capabilities in language understanding and commonsense reasoning, yet they often struggle with constraint satisfaction in planning problems. 
Previous studies relying on test-time improvement with self-evaluation fail to address this limitation effectively. In this work, we identify this critical gap and propose a novel neuro-symbolic framework, Reinforced Neuro-Symbolic Planning (\\algo), that enhances LLM-powered planning by incorporating a symbolic verifier. The verifier provides explicit feedback on constraint satisfaction, enabling iterative refinement of the state evaluation. Specifically, we utilize the outcome feedback from each logical goal to update the process value along planning paths through a reinforcement value function maximization objective. We further employ T-norms to aggregate the satisfaction levels of multiple...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/682","openalex_id":"https://openalex.org/W4414359769","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (Sweden)","Nanjing University","Nanjing University of Science and Technology","Renmin University of China","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6877999901771545},{"id":"https://openalex.org/C2776330181","display_name":"Maximization","score":0.5501000285148621},{"id":"https://openalex.org/C2776036281","display_name":"Constraint (computer-aided design)","score":0.5117999911308289},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5076000094413757},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.4991999864578247},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.47530001401901245},{"id":"https://openalex.org/C44616089","display_name":"Constraint satisfaction","score":0.4526999890804291},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4478999972343445}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4414229447","title":"Special Issue Editorial: Generative AI for Secure Communications and Networking","url":"https://doi.org/10.1109/mnet.2025.3592783","published":"2025-09-01","authors":["Liehuang Zhu","Dusit Niyato","Rongxing Lu","Xiaojiang Du","Fatima Hussain","Yao Sun","Shen Yan"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mnet.2025.3592783","openalex_id":"https://openalex.org/W4414229447","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Beijing Institute of Technology","Huawei Technologies (China)","Nanyang Technological University","Royal Bank of Canada","Stevens Institute of Technology","University of Glasgow","University of New Brunswick"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8245000243186951},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.382099986076355},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.3723999857902527},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.36239999532699585},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.35100001096725464},{"id":"https://openalex.org/C136764020","display_name":"World Wide 
Web","score":0.3280999958515167},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.32249999046325684},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.30630001425743103}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414008981","title":"Generative Artificial Intelligence Models for Emerging Communication Systems: Fundamentals and Challenges","url":"https://doi.org/10.1109/mcom.001.2400730","published":"2025-09-01","authors":["Zhiyuan Peng","Yuchen Liu","Gaolei Li","Zhaohui Yang","Mingzhe Chen","Dongkuan Xu","Xingqin Lin"],"abstract":"The evolution of generative artificial intelligence (GenAI) marks a pivotal shift in the potential reshaping of technological landscapes. Wireless networks, bolstered by the advent of advanced intelligent technologies, present a promising domain for leveraging GenAI, which could revolutionize the current networking design and communication paradigms. Extensive research has reviewed large language models, a significant yet specialized facet of GenAI, and explored their integration into networks. This article offers a broad introduction to GenAI and delves into its applications within various emerging communication technologies. We start by providing an overview of representative GenAI models, including variational autoencoder, Transformer, and diffusion models, elucidating their foundational principles. 
Subsequently, we spotlight their emerging applications in advanced communication syste...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mcom.001.2400730","openalex_id":"https://openalex.org/W4414008981","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["North Carolina State University","Nvidia (United States)","Shanghai Jiao Tong University","University of Miami","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8562201261520386},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6525264382362366},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4992942810058594},{"id":"https://openalex.org/C101765175","display_name":"Communications system","score":0.4152728021144867},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.41495072841644287},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.33362871408462524},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.33172643184661865},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.2600433826446533}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414359628","title":"M4Bench: A Benchmark of Multi-domain Multi-granularity Multi-image Understanding for Multi-modal Large Language Models","url":"https://doi.org/10.24963/ijcai.2025/762","published":"2025-09-01","authors":["Xiaojun Ye","Guanbao Liang","Chunhui Wang","Liangcheng Li","Pengfei Ke","Rui Wang","Bingxin Jia","Gang Huang","Qiao Sun","Sheng Zhou"],"abstract":"The increasing demands in 
analyzing complex associated scenes pose necessities to researching multi-image understanding abilities. Compared with understanding individual images, both the alignments and differences between images are essential aspects of understanding the intricate relationships for multi-image inference tasks. However, existing benchmarks face difficulties in addressing both of these aspects simultaneously, resulting in obstacles to modeling relationships under various granularities and domains of images. In this paper, we introduce M4Bench to enhance the capability of aligning and distinguishing multi-images with multi-domain multi-granularity comparison. We carefully design five comparison tasks related to coarse and fine-grained granularities in single and multiple domains of images and evaluate them on 13 state-of-the-art multi-modal large language models with variou...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2025/762","openalex_id":"https://openalex.org/W4414359628","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7728000283241272},{"id":"https://openalex.org/C2776035091","display_name":"Viewpoints","score":0.6930999755859375},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5722000002861023},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5532000064849854},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5357000231742859},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5134999752044678},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological 
concept)","score":0.4966999888420105},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4189000129699707}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:56f582298cb51346","title":"Claude Sonnet 4.5 System Card","url":"https://www-cdn.anthropic.com/963373e433e489a87a10c823c52a0a013e9172dd.pdf","published":"2025-09","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Sonnet 4.5.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Sonnet 4.5"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"official:e4bae2e36ac2ecc4","title":"VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning","url":"https://research.nvidia.com/publication/2025-09_vt-refine-learning-bimanual-assembly-visuo-tactile-feedback-simulation-fine","published":"2025-09","authors":["Binghao Huang","Jie Xu","Iretiayo Akinola","Wei Yang","Balakumar Sundaralingam","Rowland O'Flaherty","Dieter Fox","Xiaolong Wang","Arsalan Mousavian","Yu-Wei Chao","Yunzhu Li"],"abstract":"Official NVIDIA Research publication. 
CORL","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["CORL"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=0"}},{"id":"official:2d505051738dc189","title":"FVDebug: An LLM-Driven Debugging Assistant for Automated Root Cause Analysis of Formal Verification Failures","url":"https://research.nvidia.com/publication/2025-09_fvdebug-llm-driven-debugging-assistant-automated-root-cause-analysis-formal","published":"2025-09","authors":["Yunsheng Bai","Ghaith Bany Hamad","Chia-Tung (Mark) Ho","Syed Suhaib","Mark Haoxing Ren"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=0"}},{"id":"official:cc83e607652f2d59","title":"World Simulation With Video Foundation Models for Physical AI","url":"https://research.nvidia.com/publication/2025-09_world-simulation-video-foundation-models-physical-ai","published":"2025-09","authors":["Ming-Yu Liu"],"abstract":"Official NVIDIA Research 
publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=0"}},{"id":"official:69f4482cc8eb5121","title":"Isaac Lab: A GPU Accelerated Simulation Framework For Multi-Modal Robot Learning","url":"https://research.nvidia.com/publication/2025-09_isaac-lab-gpu-accelerated-simulation-framework-multi-modal-robot-learning","published":"2025-09","authors":["Mayank Mittal","Kelly Guo","Gavriel State","Spencer Huang"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=0"}},{"id":"openalex:W4413854774","title":"EndoChat: Grounded multimodal large language model for endoscopic surgery","url":"https://doi.org/10.1016/j.media.2025.103789","published":"2025-08-31","authors":["Guankun Wang","Long Bai","Junyi Wang","Kun Yuan","Zhen Li","Tianxu Jiang","Xin‐Tao He","Jinlin Wu","Zhen Chen","Zhen Lei","Hongbin Liu","Jiazheng Wang"],"abstract":"Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense potential in computer-aided diagnosis and decision-making. 
In the context of robotic-assisted surgery, MLLMs can serve as effective tools for surgical training and guidance. However, there is still a deficiency of MLLMs specialized for surgical scene understanding in endoscopic procedures. To this end, we present EndoChat, an MLLM tailored to address various dialogue paradigms and subtasks in understanding endoscopic procedures. To train our EndoChat, we construct the Surg-396K dataset through a novel pipeline that systematically extracts surgical information and generates structured annotations based on large-scale endoscopic surgery datasets. Furthermore, we introduce a multi-scale visual token interaction mechanism and a visual contrast-based reasoning mechanism to enhance the model's representation lea...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.media.2025.103789","openalex_id":"https://openalex.org/W4413854774","cited_by_count":7,"quality_score":48,"matched_keywords":["language model"],"author_affiliations":["Centre National de la Recherche Scientifique","Centre for Artificial Intelligence and Robotics","Chinese University of Hong Kong","Huawei Technologies (China)","Inserm","Qilu Hospital of Shandong University","Technical University of Munich","Université de Strasbourg"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5310412049293518},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42997321486473083},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.33067697286605835},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.32295626401901245},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.12387996912002563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4413850687","title":"Test-Time Prompt Tuning for Vision-Language Models","url":"https://doi.org/10.1007/978-3-031-94969-2_6","published":"2025-08-30","authors":["Manli Shu","Weili Nie","De-An Huang","Zhiding Yu","Tom Goldstein","Anima Anandkumar","Chaowei Xiao"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-94969-2_6","openalex_id":"https://openalex.org/W4413850687","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["California Institute of Technology","Nvidia (United States)","University of Maryland, College Park","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.6143544316291809},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4933139383792877},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3828210234642029},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3783648908138275},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.10534805059432983},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413847667","title":"WoundcareVQA: A multilingual visual question answering benchmark dataset for wound 
care","url":"https://doi.org/10.1016/j.jbi.2025.104888","published":"2025-08-29","authors":["Wen-wai Yim","Asma Ben Abacha","Robert Doerning","Chia-Yu Chen","Jiaying Xu","A. V. Subbarao","Zixuan Yu","Fei Xia","M. Kennedy Hall","Meliha Yetişgen"],"abstract":"OBJECTIVE: Introduce the task of wound care multimodal multilingual visual question answering, provide baseline performances, and identify areas of future study. METHODS: A dataset of wound care multimodal multilingual visual question answering (VQA) was created using consumer health questions asked online. Practicing US medical doctors were tasked with providing metadata and expert responses labels. Several instruct-enabled, multilingual visual question answering models (GPT-4o, Gemini-1.5-Pro, and Qwen-VL) were tested to benchmark performances. Finally, automatic evaluations were tested against domain expert response ratings. RESULTS: A multilingual dataset of 477 wound care cases, 768 responses, 748 images, 3k structured data labels, 1362 translation instances, and 10k judgments was constructed (https://osf.io/xsj5u/). 
Metadata scores ranged from 0.32-0.78 accuracy depending on classi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.jbi.2025.104888","openalex_id":"https://openalex.org/W4413847667","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Seattle University","University of Washington"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7019469738006592},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.6848299503326416},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6785454154014587},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4379371404647827},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43193352222442627},{"id":"https://openalex.org/C2984752397","display_name":"Primary care","score":0.4106071889400482},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34181851148605347},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3397493362426758}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413822140","title":"MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation","url":"https://doi.org/10.1109/tcsvt.2025.3604109","published":"2025-08-29","authors":["Siyuan Wang","Jiawei Liu","Wei Wang","Yeying Jin","Jinsong Du","Zhi 
Han"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3604109","openalex_id":"https://openalex.org/W4413822140","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","GlobalFoundries (Singapore)","Shenyang Institute of Automation","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.7487742900848389},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7403740882873535},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.46853771805763245},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.4618161916732788},{"id":"https://openalex.org/C146357865","display_name":"Stage (stratigraphy)","score":0.44790786504745483},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44519126415252686},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.42157042026519775},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/echoes-in-ai-quantifying-lack-of-plot-diversity-in-llm-outputs","title":"Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs","url":"https://www.microsoft.com/en-us/research/publication/echoes-in-ai-quantifying-lack-of-plot-diversity-in-llm-outputs/","published":"2025-08-28","authors":["Weijia Xu","Nebojsa Jojic","Sudha Rao","Chris Brockett","Bill Dolan"],"abstract":"With rapid advances in large language models (LLMs), 
there has been an increasing application of LLMs in creative content ideation and generation. A critical question emerges: can current LLMs provide ideas that are diverse enough to truly bolster the collective creativity? We examine two state-of-the-art LLMs, GPT-4 and LLaMA-3, on story generation and discover that LLM-generated stories often consist of plot elements that are echoed across a number of generations. This repetition lends itself to less unique outputs that are deterministic and predictable. To quantify this phenomenon, we introduce the Sui Generis (Latin for \"of its own kind\") score, an automatic metric that estimates how unlikely a plot element is to appear in alternative storylines generated by the same LLM. It helps quantify creativity at the narrative level, not just by counting unique words or topics and it also corr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1073/pnas.2504966122","openalex_id":"https://openalex.org/W4413785803","cited_by_count":4,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Human language technologies","LLM"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4413792188","title":"Q-Infer: Towards Efficient GPU-CPU Collaborative LLM Inference via Sparsity-Aware Dynamic Scheduling","url":"https://doi.org/10.1145/3764589","published":"2025-08-28","authors":["Kai Lü","Qiang Wei","Yier Lin","P. X. 
Liu","Haipeng Wang","Jiguang Wan","Ting Yao","Huatao Wu","Daohui Wang"],"abstract":"Large Language Models (LLMs) have sparked a new wave of exciting AI applications, yet their large model size imposes significant computational and storage costs during inference. Offloading parameters to the CPU and conducting GPU-CPU collaborative inference is a highly cost-effective strategy to alleviate GPU memory constraints. However, current solutions struggle to balance latency and throughput, and suffer from accuracy loss and performance fluctuations under various workloads and configurations. In this article, we propose Q-Infer, an efficient GPU-CPU collaborative inference system that significantly improves the performance and quality of LLM inference through several optimizations: (1) Q-Infer designs a dynamic caching strategy for important parameters by exploiting model sparsity and locality. (2) Q-Infer proposes a multi-window-based approach for selecting important tokens, whi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3764589","openalex_id":"https://openalex.org/W4413792188","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","memory","efficient"],"author_affiliations":["Huawei Technologies (China)","Huazhong University of Science and Technology","Wuhan National Laboratory for Optoelectronics"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9013820886611938},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.8109768629074097},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.629957377910614},{"id":"https://openalex.org/C206729178","display_name":"Scheduling (production processes)","score":0.6079136729240417},{"id":"https://openalex.org/C173608175","display_name":"Parallel 
computing","score":0.5556106567382812},{"id":"https://openalex.org/C49154492","display_name":"Central processing unit","score":0.533912718296051},{"id":"https://openalex.org/C180613757","display_name":"CPU shielding","score":0.5046919584274292},{"id":"https://openalex.org/C2779808786","display_name":"Locality","score":0.48864442110061646}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2508.19855","title":"Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning","url":"https://huggingface.co/papers/2508.19855","published":"2025-08-27","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","retrieval"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:tencent:2508.19652","title":"Self-Rewarding Vision-Language Model via Reasoning Decomposition","url":"https://huggingface.co/papers/2508.19652","published":"2025-08-27","authors":["Tencent/Hunyuan"],"abstract":"Vision-Language Models (VLMs) often suffer from visual hallucinations, saying things that are not actually in the image, and language shortcuts, where they skip the visual part and just rely on text priors. These issues arise because most post-training methods for VLMs rely on simple verifiable answer matching and supervise only final outputs, leaving intermediate visual reasoning without explicit guidance. 
As a result, VLMs receive sparse visual signals and often learn to prioritize language-based reasoning over visual perception. To mitigate this, some existing methods add visual supervision using human annotations or distilled labels from external large models. However, human annotations are labor-intensive and costly, and because external signals cannot adapt to the evolving policy, they cause distributional shifts that can lead to reward hacking. In this paper, we introduce Vision-S...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","language model"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4413741796","title":"New frontiers in artificial intelligence for biodiversity research and conservation with multimodal language models","url":"https://doi.org/10.1111/2041-210x.70120","published":"2025-08-27","authors":["Zhongqi Miao","Yuanhan Zhang","Zalan Fabian","Andres Hernandez Celis","Sara Beery","Chunyuan Li","Ziwei Liu","Amrita Gupta","Md Nasir","Wanhua Li","Jason Holmberg","Meredith S. Palmer"],"abstract":"Abstract The integration of artificial intelligence (AI) into biodiversity research and conservation is growing rapidly, demonstrating great potential in reducing the intensive human labour required for data preprocessing, thereby, facilitating larger data collections that offer ecological insights at unprecedented scales. 
However, most of these AI applications for biodiversity are still in the early stages of development, hindered by challenges inherent in real‐world datasets and the limited accessibility of these technologies to practitioners without extensive programming knowledge. The recent advent of multimodal language models, which can process and generate multiple data modalities, has significantly expanded the realm of possible AI applications in biodiversity research. These models have demonstrated the ability to classify species and recognize more complex concepts, such as ani...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1111/2041-210x.70120","openalex_id":"https://openalex.org/W4413741796","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Harvard University Press","Massachusetts Institute of Technology","Microsoft (United States)","Nanyang Technological University","Universidad de Los Andes","University of British Columbia","University of Southern California","Wildlife Information Liaison Development","Yale University"],"concepts":[{"id":"https://openalex.org/C130217890","display_name":"Biodiversity","score":0.6265774369239807},{"id":"https://openalex.org/C2994246104","display_name":"Biodiversity conservation","score":0.5650231242179871},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49198052287101746},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3503209054470062},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3426957130432129},{"id":"https://openalex.org/C18903297","display_name":"Ecology","score":0.3334222435951233},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.32085707783699036},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.13002341985702515}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"hf-org-paper:tencent:2508.19201","title":"Understanding Tool-Integrated Reasoning","url":"https://huggingface.co/papers/2508.19201","published":"2025-08-26","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4413676832","title":"Towards domain-adapted large language models for water and wastewater management: methods, datasets and benchmarking","url":"https://doi.org/10.1038/s41545-025-00509-8","published":"2025-08-26","authors":["Boyan Xu","Guanlan Wu","Zihao Li","Guangming Xu","Huabin Zeng","Rui Tong","How Yong Ng"],"abstract":"Large language models (LLMs) have shown significant promise for water and wastewater management. However, current foundation models are not yet reliable. This Perspective outlines a pathway for customizing foundation models into WaterGPTs (specialized LLMs for water and wastewater management). 
We present key methodologies for adapting foundation models into WaterGPTs, including prompt engineering, knowledge and tool augmentation, and fine-tuning, and they are illustrated through representative examples. Then, we highlight the importance of diverse and ethically sourced datasets to customize foundation models, and we propose strategies for efficiently extracting high-quality information to customize foundation models. Further, we advocate for the development of a secure, informative, and dynamic evaluation benchmark that will guide the creation of more reliable WaterGPT. To illustrate pra...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41545-025-00509-8","openalex_id":"https://openalex.org/W4413676832","cited_by_count":4,"quality_score":45,"matched_keywords":["LLM"],"author_affiliations":["Beijing Normal University","Google (United States)","National University of Singapore","Xiamen University"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.9165470600128174},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5851803421974182},{"id":"https://openalex.org/C94061648","display_name":"Wastewater","score":0.5819600820541382},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5802579522132874},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33988282084465027},{"id":"https://openalex.org/C39432304","display_name":"Environmental science","score":0.2892140746116638},{"id":"https://openalex.org/C87717796","display_name":"Environmental 
engineering","score":0.16265052556991577},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.14403828978538513}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4413680550","title":"Evaluating the Effectiveness of Parameter-Efficient Fine-Tuning in Genomic Classification Tasks","url":"https://doi.org/10.1101/2025.08.21.671544","published":"2025-08-26","authors":["Daniel S. Berman","Daniel A. Jiménez","Shengjun Ta","Brian Merritt","Jeremy Ratcliff","Vijay Narayan"],"abstract":"ABSTRACT Foundation models are increasingly being leveraged for biological tasks. To address the high memory requirements of fine-tuning large pre-trained language models, parameter efficient fine-tuning (PEFT) methods are also being increasingly utilized. Previous studies have shown minimal, if any, loss in performance when using PEFT on binary classification tasks. However, the impact of using PEFT on tasks with large classification spaces has not been systemically evaluated. In this work, we apply PEFT to the problem of taxonomic classification using pre-trained genomic language models as the classification backbone. We explore various training strategies—including PEFT, full fine-tuning, and partial fine-tuning—for classifying sequences at the superkingdom, phylum, and genus levels. 
We find that PEFT-trained models significantly underperform compared to those trained via full fine-tu...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.08.21.671544","openalex_id":"https://openalex.org/W4413680550","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","efficient"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Johns Hopkins University Applied Physics Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5421754717826843},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37169793248176575},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.3269649147987366},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.320673406124115},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.14838066697120667}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413536174","title":"Machine Translation in the Era of Large Language Models:A Survey of Historical and Emerging Problems","url":"https://doi.org/10.3390/info16090723","published":"2025-08-25","authors":["Duygu Ataman","Alexandra Birch","Nizar Habash","Marcello Federico","Philipp Koehn","Kyunghyun Cho"],"abstract":"Historically regarded as one of the most challenging tasks presented to achieve complete artificial intelligence (AI), machine translation (MT) research has seen continuous devotion over the past decade, resulting in cutting-edge architectures for the modeling of sequential information. 
While the majority of statistical models traditionally relied on the idea of learning from parallel translation examples, recent research exploring self-supervised and multi-task learning methods extended the capabilities of MT models, eventually allowing the creation of general-purpose large language models (LLMs). In addition to versatility in providing translations useful across languages and domains, LLMs can in principle perform any natural language processing (NLP) task given sufficient amount of task-specific examples. While LLMs now reach a point where they can both replace and augment traditional...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/info16090723","openalex_id":"https://openalex.org/W4413536174","cited_by_count":7,"quality_score":48,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Johns Hopkins University","New York University","New York University Abu Dhabi","University of Edinburgh"],"concepts":[{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5709070563316345},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4895263612270355},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.40456950664520264},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3645102381706238},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32242053747177124},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.14108699560165405},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.11277908086776733},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4415822189","title":"Agreeing to Interact in Human-Robot Interaction using Large Language Models and Vision Language Models","url":"https://doi.org/10.1109/ro-man63969.2025.11217646","published":"2025-08-25","authors":["Kazuhiro Sasabuchi","Naoki Wake","Atsushi Kanehira","Jun Takamatsu","Katsushi Ikeuchi"],"abstract":"In human-robot interaction (HRI), the beginning of an interaction is often complex. Whether the robot should communicate with the human is dependent on several situational factors (e.g., the current human’s activity, urgency of the interaction, etc.). We test whether large language models (LLM) and vision language models (VLM) can provide solutions to this problem. We compare four different system-design patterns using LLMs and VLMs, and test on a test set containing 84 human-robot situations. The test set mixes several publicly available datasets and also includes situations where the appropriate action to take is open-ended. Our results using the GPT-4o and Phi-3 Vision model indicate that LLMs and VLMs are capable of handling interaction beginnings when the desired actions are clear. The design using direct image input scored an 89% accuracy on the test set. 
Of the designs using indir...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ro-man63969.2025.11217646","openalex_id":"https://openalex.org/W4415822189","cited_by_count":3,"quality_score":44,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.694599986076355},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6377999782562256},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.598800003528595},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.5769000053405762},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.5756999850273132},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5408999919891357},{"id":"https://openalex.org/C9114305","display_name":"Situational ethics","score":0.5217999815940857},{"id":"https://openalex.org/C145460709","display_name":"Human–robot interaction","score":0.5174999833106995}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4415821773","title":"Plan-and-Act using Large Language Models for Interactive Agreement","url":"https://doi.org/10.1109/ro-man63969.2025.11217788","published":"2025-08-25","authors":["Kazuhiro Sasabuchi","Naoki Wake","Atsushi Kanehira","Jun Takamatsu","Katsushi Ikeuchi"],"abstract":"Recent large language models (LLMs) are capable of planning robot actions. In this paper, we explore how LLMs can be used for planning actions with tasks involving situational human-robot interaction (HRI). 
A key problem of applying LLMs in situational HRI is balancing between \"respecting the current human’s activity\" and \"prioritizing the robot’s task,\" as well as understanding the timing of when to use the LLM to generate an action plan. In this paper, we propose a necessary plan-and-act skill design to solve the above problems. We show that a critical factor for enabling a robot to switch between passive / active interaction behavior is to provide the LLM with an action text about the current robot’s action. We also show that a second-stage question to the LLM (about the next timing to call the LLM) is necessary for planning actions at an appropriate timing. The skill design is applie...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ro-man63969.2025.11217788","openalex_id":"https://openalex.org/W4415821773","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C9114305","display_name":"Situational ethics","score":0.7494000196456909},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.7294999957084656},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6115000247955322},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5960999727249146},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.5073999762535095},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4625000059604645},{"id":"https://openalex.org/C2781039887","display_name":"Factor (programming language)","score":0.45489999651908875},{"id":"https://openalex.org/C18762648","display_name":"Work 
(physics)","score":0.4316999912261963}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413472898","title":"CARE-AD: a multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes","url":"https://doi.org/10.1038/s41746-025-01940-4","published":"2025-08-24","authors":["Rumeng Li","Xun Wang","Dan R. Berlowitz","Jesse Mez","Honghuang Lin","Hong Yu"],"abstract":"Large language models (LLMs) have shown promising capabilities across diverse domains, yet their application to complex clinical prediction tasks remains limited. In this study, we present CARE-AD (Collaborative Analysis and Risk Evaluation for Alzheimer's Disease), a multi-agent LLM-based framework for forecasting Alzheimer's disease (AD) onset by analyzing longitudinal electronic health record (EHR) notes. CARE-AD assigns specialized LLM agents to extract signs and symptoms relevant to AD and conduct domain-specific evaluations-emulating a collaborative diagnostic process. In a retrospective evaluation, CARE-AD achieved higher accuracy (0.53 vs. 0.26-0.45) than baseline single-model approaches in predicting AD risk 10 years prior to the first recorded diagnosis code. 
These findings highlight the feasibility of using multi-agent LLM systems to support early risk assessment for AD and mo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41746-025-01940-4","openalex_id":"https://openalex.org/W4413472898","cited_by_count":13,"quality_score":66,"matched_keywords":["LLM","language model","agent","multi-agent"],"author_affiliations":["Amherst College","Bedford VA Research Corporation","Boston University","Microsoft (United States)","University of Massachusetts Chan Medical School","University of Massachusetts Lowell"],"concepts":[{"id":"https://openalex.org/C2779134260","display_name":"Disease","score":0.48144811391830444},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.42837703227996826},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3782433271408081},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.3721829354763031},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32555845379829407},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.32179126143455505},{"id":"https://openalex.org/C126322002","display_name":"Internal medicine","score":0.14435946941375732}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4414273169","title":"Ironwood: Delivering Best in Class perf, perf/TCO and perf/Watt for Reasoning Model Training and Serving","url":"https://doi.org/10.1109/hcs66204.2025.11154400","published":"2025-08-24","authors":["Norman P. 
Jouppi","Sridhar Lakshmanamurthy"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/hcs66204.2025.11154400","openalex_id":"https://openalex.org/W4414273169","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.6442000269889832},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5769000053405762},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5756999850273132},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.385699987411499},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.3303999900817871},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.28290000557899475},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.27790001034736633},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.26030001044273376}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/shortlisting-model-a-streamlined-simplexdiffusion-for-discrete-variable-generation","title":"ShortListing Model: A Streamlined SimplexDiffusion for Discrete Variable Generation","url":"https://www.microsoft.com/en-us/research/publication/shortlisting-model-a-streamlined-simplexdiffusion-for-discrete-variable-generation/","published":"2025-08-23","authors":["Yuxuan Song","Zhe Zhang","Yu Pei","Jingjing Gong","Qiying Yu","Zheng Zhang","Mingxuan 
Wang","Hao Zhou","Jingjing Liu","Wei-Ying Ma"],"abstract":"Generative modeling of discrete variables is challenging yet crucial for applications in natural language processing and biological sequence design. We introduce the Shortlisting Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. SLM operates on simplex centroids, reducing generation complexity and enhancing scalability. Additionally, SLM incorporates a flexible implementation of classifier-free guidance, enhancing unconditional generation performance. Extensive experiments on DNA promoter and enhancer design, protein design, character-level and large-vocabulary language modeling demonstrate the competitive performance and strong potential of SLM. Our code can be found at https://github.com/GenSI-THUAIR/SLM","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Biology","Computer science","Natural language processing","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bridging-gaps-in-ophthalmology-education-through-large-language-models","title":"Bridging Gaps in Ophthalmology Education Through Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/bridging-gaps-in-ophthalmology-education-through-large-language-models/","published":"2025-08-23","authors":["Shahrzad Gholami","Beth Wilson","Sarah Page","Daniel B. 
Mummert","Joseph Carr","Robert R. McNabb","Rahul Dodhia","Juan M. Lavista Ferres","Bill Weeks","Dale E. Fajardo"],"abstract":"Purpose To assess the performance of general-domain large language models (LLMs), particularly OpenAI’s Generative Pre-trained Transformer (GPT) models, within the American Academy of Ophthalmology (AAO) Self-Assessment Program, which is based on AAO’s Basic and Clinical Science Course. Methods We input 3357 questions into GPT-4o, GPT-4-Turbo, o1 and o3-mini via Microsoft’s Azure OpenAI Service using zero-shot and chain-of-thought (CoT) prompting. Questions with images were analyzed using the multimodal version of GPT-4o and GPT-4.1. The performance of the LLMs was compared to 1371 unique residents who had previously participated in the program. Additionally, we compared the performance on 1399 questions, including information on 3 question types: recall, interpretation, and decision-making or clinical management. Average accuracy rates were used to evaluate performance and compare stati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-disease-centric-vision-language-foundation-model-for-precision-oncology-in-kidney-cancer","title":"A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney 
Cancer","url":"https://www.microsoft.com/en-us/research/publication/a-disease-centric-vision-language-foundation-model-for-precision-oncology-in-kidney-cancer/","published":"2025-08-22","authors":["Yuhui Tao","Zhongwei Zhao","Zilong Wang","Xufang Luo","Feng Chen","Kang Wang","Chuanfu Wu","Xue Zhang","Shaoting Zhang","Jiaxi Yao","Xin Jin","Xinyang Jiang"],"abstract":"The non-invasive assessment of increasingly incidentally discovered renal masses is a critical challenge in urologic oncology, where diagnostic uncertainty frequently leads to the overtreatment of benign or indolent tumors. In this study, we developed and validated RenalCLIP using a dataset of 27,866 CT scans from 8,809 patients across nine Chinese medical centers and the public TCIA cohort, a visual-language foundation model for characterization, diagnosis and prognosis of renal mass. The model was developed via a two-stage pre-training strategy that first enhances the image and text encoders with domain-specific knowledge before aligning them through a contrastive learning objective, to create robust representations for superior generalization and diagnostic precision. 
RenalCLIP achieved better performance and superior generalizability across 10 core tasks spanning the full clinical wo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Computer vision","Medical, health and genomics","Computer science","Engineering","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:jr8a10sg2bqln2f9olt8n38c","title":"SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding","url":"https://machinelearning.apple.com/research/slowfast-llava","published":"2025-08-22","authors":["Mingze Xu","Mingfei Gao","Shiyu Li","Jiasen Lu","Zhe Gan","Zhengfeng Lai","Meng Cao","Kai Kang","Yinfei Yang","Afshin Dehghan"],"abstract":"We introduce SlowFast-LLaVA-1.5 (abbreviated as SF-LLaVA-1.5), a family of video large language models (LLMs) offering a token-efficient solution for long-form video understanding. We incorporate the two-stream SlowFast mechanism into a streamlined training pipeline, and perform joint video-image training on a carefully curated data mixture of only publicly available datasets. 
Our primary focus is on highly efficient model scales (1B and 3B),...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415934739","title":"Research on Multi-Model Fusion Machine Learning Demand Intelligent Forecasting System in Cloud Computing Environment","url":"https://doi.org/10.1109/iacis65746.2025.11210946","published":"2025-08-22","authors":["Jiaying Huang"],"abstract":"Background: Large language models (LLMs) are increasingly being used for automated unit test generation, but reported performance varies across tasks and datasets, and key aspects such as assertion plausibility and target coverage are not well understood. Methods: We perform a structured evaluation of LLM-based test generation on recent benchmarks and settings, summarizing research results in hinting, static analysis guidance, multi-agent work frameworks, and oracle generation; we compare LLMs with traditional tools such as EvoSuite, and analyze factors that affect coverage, fault detection, and maintainability. 
Results: When LLMs are used in conjunction with static analysis or method slicing, competitive and improved coverage can be achieved; achieving target line/branch/path coverage and obtaining robust oracles remain challenging; using multi-stage hints and tools (e.g., interpreters/...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iacis65746.2025.11210946","openalex_id":"https://openalex.org/W4415934739","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C55166926","display_name":"Oracle","score":0.8202000260353088},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.8029000163078308},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.802299976348877},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6075999736785889},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5885000228881836},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.5867000222206116},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5254999995231628},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5006999969482422}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413436028","title":"CUPre: Cross-domain Unsupervised Pre-training for few-shot cell segmentation","url":"https://doi.org/10.1016/j.inffus.2025.103641","published":"2025-08-22","authors":["Weibin Liao","Xuhong Li","Qingzhong Wang","Yanwu Xu","Zhaozhen Yin","Haoyi 
Xiong"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.inffus.2025.103641","openalex_id":"https://openalex.org/W4413436028","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","South China University of Technology","Stony Brook University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7992232441902161},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.7202116847038269},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6591716408729553},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6208623051643372},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.596541702747345},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5895174145698547},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4369228184223175},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.07038941979408264}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413386301","title":"Perplexity and proximity: Large language model perplexity complements semantic distance metrics for the detection of incoherent speech","url":"https://doi.org/10.1016/j.jbi.2025.104899","published":"2025-08-21","authors":["Weizhe Xu","Serguei Pakhomov","Patrick J. Heagerty","Eric Horvitz","Ellen Bradley","Joshua D. Woolley","Andrew T. Campbell","Alex S. 
Cohen","Dror Ben‐Zeev","Trevor Cohen"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.jbi.2025.104899","openalex_id":"https://openalex.org/W4413386301","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Behavioral Tech Research, Inc.","Dartmouth College","Louisiana State University","Microsoft (United States)","Minneapolis VA Health Care System","University of California, San Francisco","University of Minnesota","University of Washington","University of Washington Medical Center"],"concepts":[{"id":"https://openalex.org/C100279451","display_name":"Perplexity","score":0.9949443340301514},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7751643061637878},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6778537631034851},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5732470750808716},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5571286678314209},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.37947404384613037}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413390950","title":"BOARD # 87: WIP: Democratizing Generative AI Quiz Creation: Accelerating Assessment Development in Engineering Education","url":"https://doi.org/10.18260/1-2--55903","published":"2025-08-21","authors":["John A. 
Hassell","Christopher Freeze","Ahmed Ashraf Butt","Heather McGowan","William Freeman"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18260/1-2--55903","openalex_id":"https://openalex.org/W4413390950","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","University of Oklahoma"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5762832760810852},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5588722825050354},{"id":"https://openalex.org/C5041995","display_name":"Engineering education","score":0.4823688566684723},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.4304279088973999},{"id":"https://openalex.org/C110354214","display_name":"Engineering management","score":0.3698806166648865},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.3599281907081604},{"id":"https://openalex.org/C117671659","display_name":"Manufacturing engineering","score":0.3512117564678192},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.33399224281311035}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2508.14444","title":"NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model","url":"https://huggingface.co/papers/2508.14444","published":"2025-08-20","authors":["NVIDIA","Aarti Basant","Abhijit Khairnar","Abhijit Paithankar","Abhinav Khattar","Adi Renduchintala","Adithya Renduchintala","Aditya Malte","Akhiad Bercovich","Akshay Hazare","Alejandra Rico","Aleksander Ficek"],"abstract":"We introduce Nemotron-Nano-9B-v2, 
a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22GiB of memory, bfloat16 precision). Compared to exi...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["language model","memory","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"official:4cf6a06f1baed8a3","title":"Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency","url":"https://qwenlm.github.io/blog/qwen-image-edit/","published":"2025-08-19","authors":["Alibaba/Qwen"],"abstract":"We are excited to introduce Qwen-Image-Edit, the image editing version of Qwen-Image. Built upon our 20B Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image’s unique text rendering capabilities to image editing tasks, enabling precise text editing. 
Furthermore, Qwen-Image-Edit simultaneously feeds the input image into Qwen2.5-VL (for visual semantic control) and the VAE Encoder (for visual appearance control), achieving capabilities in both semantic and appearance editing.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"arxiv:2509.23630","title":"Game-Oriented ASR Error Correction via RAG-Enhanced LLM","url":"http://arxiv.org/abs/2509.23630","published":"2025-08-19","authors":["Yan Jiang","Yongle Luo","Qinxian Zhou","Elvis S. Liu"],"abstract":"With the rapid development of the gaming industry and the increasing popularity of multiplayer online games, real-time voice communication has become a crucial tool for team collaboration and tactical exchanges. Automatic Speech Recognition (ASR) technology plays a vital role in modern gaming by converting voice commands into text, enabling efficient communication among players. However, existing general-purpose ASR systems face significant challenges in gaming scenarios due to the unique characteristics of in-game communication, such as short phrases, rapid speech, game-specific jargon, and environmental noise. These limitations often lead to frequent recognition errors, increasing communication costs and negatively impacting the overall gaming experience. Furthermore, the scarcity of domain-specific ASR training data exacerbates these issues, hindering system optimization. 
To address the...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cog64752.2025.11114204","openalex_id":"https://openalex.org/W4413321612","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","retrieval","efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6920996308326721},{"id":"https://openalex.org/C103088060","display_name":"Error detection and correction","score":0.5331023931503296},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.37348344922065735},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.2408900260925293}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413311730","title":"ConfSum: Towards Automatic Summarization of Network-scale Operational Intents from Device Configurations","url":"https://doi.org/10.1145/3750022.3750459","published":"2025-08-19","authors":["Rundi Zhai","Jianmin Liu","Yukai Miao","Li Chen","Dan Li","Baojiang Cui","Peng Zhang","Ennan Zhai","Zishuo Ding"],"abstract":"When network operators need to understand the high-level intent behind a network's existing device configurations, they must engage in a tedious and error-prone process of manually reverse-engineering the low-level commands. We propose Configuration Intent Summarization (CIS), a new task that aims to automate this process by generating human-readable summaries of the intents embedded across a network's configurations. 
CIS is challenging due to the diversity of intents, the semantic gap between device-specific configurations and network-wide intents, and the need to reason about interactions between multiple devices' configurations. We present ConfSum, a system that addresses these challenges by leveraging the unique ability of large language models (LLMs) to parse semi-structured configuration files and summarize them in natural language. However, the full CIS task requires reasoning abo...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3750022.3750459","openalex_id":"https://openalex.org/W4413311730","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Beijing University of Posts and Telecommunications","Tsinghua University","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.956274688243866},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7715843915939331},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5993344783782959},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.2409151792526245},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.04968413710594177},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413322591","title":"The Price of Intelligence","url":"https://doi.org/10.1145/3749447","published":"2025-08-19","authors":["Mark Russinovich","Ahmed Salem","Santiago Zanella-Béguelin","Yonatan Zunger"],"abstract":"Three risks 
inherent in LLMs.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3749447","openalex_id":"https://openalex.org/W4413322591","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Bellevue Hospital Center","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5389220714569092},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3549927771091461}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-pass1-self-play-with-variational-problem-synthesis-sustains-rlvr","title":"Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR","url":"https://www.microsoft.com/en-us/research/publication/beyond-pass1-self-play-with-variational-problem-synthesis-sustains-rlvr/","published":"2025-08-18","authors":["Xiao Liang","Zhong-zhi Li","Yeyun Gong","Yelong Shen","Yingchun Wu","Zhijiang Guo","Weizhu Chen"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a key paradigm for post-training Large Language Models (LLMs), particularly for complex reasoning tasks. However, vanilla RLVR training has been shown to improve Pass@1 performance at the expense of policy entropy, leading to reduced generation diversity and limiting the Pass@k performance, which typically represents the upper bound of LLM reasoning capability. 
In this paper, we systematically analyze the policy's generation diversity from the perspective of training problems and find that augmenting and updating training problems helps mitigate entropy collapse during training. Based on these observations, we propose an online Self-play with Variational problem Synthesis (SvS) strategy for RLVR training, which uses the policy's correct solutions to synthesize variational problems while ensuring their reference...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flair-feedback-learning-for-adaptive-information-retrieval","title":"FLAIR: Feedback Learning for Adaptive Information Retrieval","url":"https://www.microsoft.com/en-us/research/publication/flair-feedback-learning-for-adaptive-information-retrieval/","published":"2025-08-18","authors":["William Zhang","Yiwen Zhu","Yunlei Lu","Mathieu Demarne","Wenjing Wang","Kai Deng","Nutan Sahoo","Katherine Lin","Miso Cilimdzic","Subru Krishnan (subru)"],"abstract":"Recent advances in Large Language Models (LLMs) have driven the adoption of copilots in complex technical scenarios, underscoring the growing need for specialized information retrieval solutions. 
In this paper, we introduce FLAIR, a lightweight, feedback learning framework that adapts copilot systems' retrieval strategies by integrating domain-specific expert feedback. FLAIR operates in two stages: an offline phase obtains indicators from (1) user feedback and (2) questions synthesized from documentation, storing these indicators in a decentralized manner. An online phase then employs a two-track ranking mechanism to combine raw similarity scores with the collected indicators. This iterative setup refines retrieval performance for any query. Extensive real-world evaluations of FLAIR demonstrate significant performance gains on both previously seen and unseen queries, surpassing state-of-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3746252.3761553","openalex_id":"https://openalex.org/W4416016736","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","retrieval"],"author_affiliations":["Microsoft","Carnegie Mellon University","Microsemi (United States)","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:rh8628dswx0i66e56z8k4u8h","title":"Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution","url":"https://machinelearning.apple.com/research/investigating-intersectional","published":"2025-08-18","authors":["Falaah Arif Khan","Niv Sivakumar","Oliver Wang","Rin Metcalf Susa","Cezanne Camacho","Barry Theobald","Luca Zappella","Nick Apostoloff"],"abstract":"Large language 
models (LLMs) have achieved impressive performance, leading to their widespread adoption as decision-support tools in resource-constrained contexts like hiring and admissions. There is, however, scientific consensus that AI systems can reflect and exacerbate societal biases, raising concerns about identity-based harm when used in critical social contexts. Prior work has laid a solid foundation for assessing bias in LLMs by...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4413277251","title":"StyleFormer: Multi-Agent Joint Trajectory Prediction and Planning in Urban Environments With Driving Style Awareness","url":"https://doi.org/10.1109/tits.2025.3595733","published":"2025-08-18","authors":["Xiang Li","Bo Yang","Xianming Zeng","Kimihiko Nakano"],"abstract":"Accurately inferring the driving intentions of neighboring vehicles is a critical challenge for autonomous vehicles (AVs) when handling complex interactions in urban environments. These interactions are complicated by diverse driving styles, which AVs often struggle to interpret, leading to overly cautious driving strategies. In this work, a trajectory prediction and planning framework, StyleFormer, is proposed which considers vehicles’ driving styles. The model classifies short-term driving styles using an unsupervised method and employs a vectorized representation to integrate map features, agent states, and driving styles. 
A Transformer-based attention mechanism is used to model interactions and intentions, enabling joint prediction of future trajectories for surrounding vehicles and multimodal trajectory generation for AVs. StyleFormer further adapts planned trajectories to different...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tits.2025.3595733","openalex_id":"https://openalex.org/W4413277251","cited_by_count":2,"quality_score":51,"matched_keywords":["efficient","agent","multi-agent"],"author_affiliations":["Alibaba Group (China)","The University of Tokyo"],"concepts":[{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.6973791122436523},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.6237005591392517},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5593137741088867},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.4913124144077301},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4139786660671234},{"id":"https://openalex.org/C22212356","display_name":"Transport engineering","score":0.37053966522216797},{"id":"https://openalex.org/C44154836","display_name":"Simulation","score":0.362338125705719},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.28885385394096375}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2603.04887","title":"Federated modality-specific encoders and partially personalized fusion decoder for multimodal brain tumor segmentation","url":"http://arxiv.org/abs/2603.04887","published":"2025-08-18","authors":["Hong Liu","Dong Wei","Qian Dai","Xian Wu","Yefeng 
Zheng","Liansheng Wang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.media.2025.103759","openalex_id":"https://openalex.org/W4413275656","cited_by_count":2,"quality_score":43,"matched_keywords":["personalized"],"author_affiliations":["Tencent (China)","Westlake University","Xiamen University","Xiamen University of Technology"],"concepts":[{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.67859947681427},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6543025970458984},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.6345970630645752},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.6136788725852966},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6072149276733398},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6042050719261169},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5090660452842712},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.4394558072090149}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4416183132","title":"Bridging Context, Statistics, and Practice: A Multi-Dimensional Framework for Responsible LLM Evaluation and Selection","url":"https://doi.org/10.1109/gaclm67198.2025.11232293","published":"2025-08-18","authors":["N. 
Deepak Kumar","Srinivas Ramavath","Vinay Venkatesh"],"abstract":"Current AI model evaluation approaches employ uniform weighting schemes that fail to capture context-specific requirements and stakeholder priorities, leading to suboptimal model selection decisions in real-world deployments. Traditional methods prioritize raw performance metrics while neglecting critical dimensions such as fairness, privacy, and robustness, resulting in misaligned model choices for safety-critical, creative, and social applications. We introduce ARIA (AI Responsibility and Impact Assessment), a novel context-adaptive evaluation framework that dynamically weights five responsibility dimensions—performance, fairness, robustness, privacy, and sustainability—based on application context and stakeholder profiles. ARIA incorporates uncertainty quantification, cascading penalty mechanisms for critical weaknesses, and multi-objective optimization to provide comprehensive model....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/gaclm67198.2025.11232293","openalex_id":"https://openalex.org/W4416183132","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","University of Illinois Urbana-Champaign","University of the Cumberlands"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.6797999739646912},{"id":"https://openalex.org/C201305675","display_name":"Stakeholder","score":0.6273000240325928},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.597100019454956},{"id":"https://openalex.org/C183115368","display_name":"Weighting","score":0.5716999769210815},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic 
algorithm)","score":0.48590001463890076},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.47780001163482666},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.46070000529289246},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.444599986076355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-ai-understand-mandarin-speech-prosody-a-framework-and-benchmark-showcase","title":"Can AI Understand Mandarin Speech Prosody? A Framework and Benchmark Showcase","url":"https://www.microsoft.com/en-us/research/publication/can-ai-understand-mandarin-speech-prosody-a-framework-and-benchmark-showcase/","published":"2025-08-17","authors":["Zilong Wang","Xiaoxue Zhang","Xinyang Jiang","Kaitao Song","Jue Yu"],"abstract":"How to model and estimate speech prosody is considered as a challenging task in understanding and generating natural speech. We introduce the Mandarin Speech Prosody Benchmark (MSPB), a linguistically grounded dataset for evaluating Speech Large Language Models (Speech LLMs) in Mandarin. MSPB comprises eight tasks covering crucial prosodic features and their interactions with syntax, semantics, and pragmatics. All MSPB items, designed per Mandarin linguistic principles and validated by experts, were phonetically recorded and verified. We evaluated six Speech LLMs (GPT-4o, Gemini-1.5-Pro, Gemini-2-Flash, Qwen2-Audio-7B-Instruct, GLM-4-Voice, MiniCPM-o 2.6). Although some models perform well with context-rich cues (e.g., irony), they generally struggle with subtle prosodic variations (e.g., focus marking) and underperform humans. 
MSPB provides a valuable tool to assess and enhance prosod...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Audio and Acoustics","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414432206","title":"6D Pose Tracking for Adaptive AR-Mediated Human-Robot Collaboration","url":"https://doi.org/10.1109/case58245.2025.11163977","published":"2025-08-17","authors":["Akhil Ajikumar","Bowen Wen","Mohsen Moghaddam"],"abstract":"This paper presents a system framework for adaptive, augmented reality (AR)-mediated human-robot collaboration (HRC), enabling real-time multimodal interaction and adaptive robot behaviors during collaborative manipulation tasks. The system integrates egocentric 6D object pose tracking with real-time visual attention tracking (gaze), hand gestures, and speech, facilitating seamless two-way communication between the human and the robot. While leveraging an existing 6D pose tracker (FoundationPose), we present the first evaluation of its integration within a real-time, egocentric AR + robot framework for dynamic HRC. Our results highlight practical limitations (e.g., tracking drift) and provide design insights for building more robust, adaptive collaboration systems. 
The system was validated across four real-world scenarios, demonstrating promising performance and identifying key challenge...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/case58245.2025.11163977","openalex_id":"https://openalex.org/W4414432206","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Georgia Institute of Technology","Nvidia (United Kingdom)","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6676999926567078},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6040999889373779},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6008999943733215},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5600000023841858},{"id":"https://openalex.org/C145460709","display_name":"Human–robot interaction","score":0.5525000095367432},{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.5439000129699707},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.507099986076355},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5034999847412109}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-medical-event-models-improve-with-scale","title":"Generative Medical Event Models Improve with Scale","url":"https://www.microsoft.com/en-us/research/publication/generative-medical-event-models-improve-with-scale/","published":"2025-08-16","authors":["Shane Waxler","Paul Blazek","Davis White","Daniel Sneider","Kevin Chung","Mani 
Nagarathnam","Patrick Williams","Hank Voeller","Karen Wong","Matthew Swanhorst","Sheng Zhang","Naoto Usuyama"],"abstract":"Realizing personalized medicine at scale calls for methods that distill insights from longitudinal patient journeys, which can be viewed as a sequence of medical events. Foundation models pretrained on large-scale medical event data represent a promising direction for scaling real-world evidence generation and generalizing to diverse downstream tasks. Using Epic Cosmos, a dataset with medical events from de-identified longitudinal health records for 16.3 billion encounters over 300 million unique patient records from 310 health systems, we introduce the Cosmos Medical Event Transformer ( CoMET) models, a family of decoder-only transformer models pretrained on 118 million patients representing 115 billion discrete medical events (151 billion tokens). We present the largest scaling-law study for medical event data, establishing a methodology for pretraining and revealing power-law scaling....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Computer science","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:v8dadrblbdxmnw9n5j2ebmn4","title":"UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback","url":"https://machinelearning.apple.com/research/uicoder","published":"2025-08-15","authors":["Jason Wu","Eldon Schoop","Alan Leung","Titus Barik","Jeffrey P. 
Bigham","Jeffrey Nichols"],"abstract":"Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In this paper, we explore the use of automated feedback (compilers and multi-modal models) to guide LLMs to generate high-quality UI code. Our method starts with an existing LLM and iteratively produces improved...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2508.11126","title":"AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities","url":"https://huggingface.co/papers/2508.11126","published":"2025-08-15","authors":["Huanting Wang","Jingzhi Gong","Huawei Zhang","Zheng Wang"],"abstract":"AI agentic programming is an emerging paradigm in which large language models (LLMs) autonomously plan, execute, and interact with external tools like compilers, debuggers, and version control systems to iteratively perform complex software development tasks. Unlike conventional code generation tools, agentic systems are capable of decomposing high-level goals, coordinating multi-step processes, and adapting their behavior based on intermediate feedback. These capabilities are transforming the software development practice. As this emerging field evolves rapidly, there is a need to define its scope, consolidate its technical foundations, and identify open research challenges. 
This survey provides a comprehensive and timely review of AI agentic programming. We introduce a taxonomy of agent behaviors and system architectures, and examine core techniques including planning, memory and conte...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["memory","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/accelerating-biomolecular-modeling-with-atomworks-and-rf3","title":"Accelerating Biomolecular Modeling with AtomWorks and RF3","url":"https://www.microsoft.com/en-us/research/publication/accelerating-biomolecular-modeling-with-atomworks-and-rf3/","published":"2025-08-14","authors":["Nathaniel Corley","Simon V. Mathis","Rohith Krishna","Magnus Bauer","Tuscan R. Thompson","Woody Ahern","Maxwell W. Kazman","Rafael I Brent","Kieran Didi","Andrew Kubaney","Lilian McHugh","Arnav Nagle"],"abstract":"Deep learning methods trained on protein structure databases have revolutionized biomolecular structure prediction, but developing and training new models remains a considerable challenge. To facilitate the development of new models, we present AtomWorks: a broadly applicable data framework for developing state-of-the-art biomolecular foundation models spanning diverse tasks, including structure prediction, generative protein design, and fixed backbone sequence design. 
We use AtomWorks to train RosettaFold-3 (RF3), a structure prediction network capable of predicting arbitrary biomolecular complexes with an improved treatment of chirality that narrows the performance gap between closed-source AlphaFold3 (AF3) and existing open-source implementations. We expect that AtomWorks will accelerate the next generation of open-source biomolecular machine learning models and that RF3 will be broad...","companies":["Microsoft","NVIDIA"],"matched_orgs":["Microsoft","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1101/2025.08.14.670328","openalex_id":"https://openalex.org/W4413340128","cited_by_count":17,"quality_score":97,"matched_keywords":["Unpublished","Artificial intelligence","Biology","Medicine"],"author_affiliations":["Microsoft","Howard Hughes Medical Institute","Microsoft (United States)","Novo Nordisk Foundation","Nvidia (United States)","PDL BioPharma (United States)","Paul G. 
Allen Family Foundation","Science Oxford","Southwestern Medical Center","Technical University of Denmark","The University of Texas Southwestern Medical Center","University of Cambridge","University of Oxford","University of Washington","University of Washington Applied Physics Laboratory"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:29820656f9c7f882","title":"Gemma 3 Model Card","url":"https://ai.google.dev/gemma/docs/core/model_card_3","published":"2025-08-14","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemma 3"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:6435fcb0868aa139","title":"FastDeploy 2.0: A Large-Scale Model Inference and Deployment Toolkit with Native Support for ERNIE 4.5","url":"https://ernie.baidu.com/blog/posts/fastdeploy2.0/","published":"2025-08-14","authors":["Baidu"],"abstract":"As large models such as the ERNIE 4.5 family continue to be open-sourced, interest in their inference performance and deployment efficiency has multiplied across both research and industry. 
FastDeploy 2.0, built on the PaddlePaddle framework, addresses this demand by offering an end-to-end toolkit for efficient deployment and high-performance inference of large models.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://ernie.baidu.com/blog/index.xml"}},{"id":"openalex:W4413141447","title":"Improving Sample Efficiency of Reinforcement Learning With Background Knowledge From Large Language Models","url":"https://doi.org/10.1109/tnnls.2025.3590731","published":"2025-08-14","authors":["Fuxiang Zhang","Junyou Li","Yi-Chen Li","Zongzhang Zhang","Yang Yu","Deheng Ye"],"abstract":"Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning for RL processes. However, we note that such guidance is often tailored for one specific task but loses generalizability. In this article, we introduce a framework that harnesses LLMs to extract background knowledge of an environment, which contains general understandings of the entire environment, making various downstream RL tasks benefit from one-time knowledge representation. We ground LLMs by feeding a few precollected experiences and requesting them to delineate background knowledge of the environment. 
Afterward, we represent the output knowledge as potential functions for potential-based reward shaping, which has a good property for maintaining policy optimality from task r...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tnnls.2025.3590731","openalex_id":"https://openalex.org/W4413141447","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Nanjing University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.7756215333938599},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7445114254951477},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6792757511138916},{"id":"https://openalex.org/C198531522","display_name":"Sample (material)","score":0.6417078971862793},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5840072631835938},{"id":"https://openalex.org/C2776207758","display_name":"Downstream (manufacturing)","score":0.44843241572380066},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4395596385002136},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4311600923538208}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2508.15804","title":"ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks","url":"https://huggingface.co/papers/2508.15804","published":"2025-08-14","authors":["Minghao Li","Ying Zeng","Zhihao Cheng","Cong Ma","Kai Jia"],"abstract":"The advent of Deep Research agents has substantially reduced the time required for conducting extensive 
research tasks. However, these tasks inherently demand rigorous standards of factual accuracy and comprehensiveness, necessitating thorough evaluation before widespread adoption. In this paper, we propose ReportBench, a systematic benchmark designed to evaluate the content quality of research reports generated by large language models (LLMs). Our evaluation focuses on two critical dimensions: (1) the quality and relevance of cited literature, and (2) the faithfulness and veracity of the statements within the generated reports. ReportBench leverages high-quality published survey papers available on arXiv as gold-standard references, from which we apply reverse prompt engineering to derive domain-specific prompts and establish a comprehensive evaluation corpus. Furthermore, we develop an...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"bytedance-seed:859","title":"Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory","url":"https://seed.bytedance.com/en/research/seeing-listening-remembering-and-reasoning-a-multimodal-agent-with-long-term-memory","published":"2025-08-13","authors":["Lin Long","Yichen He","Wentao Ye","Yiyuan Pan","Yuan Lin","Hang Li","Junbo Zhao","Wei Li"],"abstract":"We introduce M3-Agent, a novel multimodal agent framework equipped with long-term memory. Like humans, M3-Agent can process real-time visual and auditory inputs to build and update its long-term memory. Beyond episodic memory, it also develops semantic memory, enabling it to accumulate world knowledge over time. 
Its memory is organized in an entity-centric, multimodal format, allowing deeper and more consistent understanding of the environment. Given an instruction, M3-Agent autonomously performs multi-turn, iterative reasoning and retrieves relevant information from memory to accomplish the task. To evaluate memory effectiveness and memory-based reasoning in multimodal agents, we develop M3-Bench, a new long-video question answering benchmark. M3-Bench comprises 100 newly recorded real-world videos captured from a robot's perspective (M3-Bench-robot) and 929 web-sourced videos across di...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Computer Vision and Pattern Recognition","Responsible AI","ICLR 2026","memory","long-term","agent"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sample-more-to-think-less-group-filtered-policy-optimization-for-concise-reasoning","title":"Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning","url":"https://www.microsoft.com/en-us/research/publication/sample-more-to-think-less-group-filtered-policy-optimization-for-concise-reasoning/","published":"2025-08-13","authors":["Vaishnavi Shrivastava","Ahmed Awadallah","Vidhisha Balachandran","Shivam Garg","Harkirat Behl","Dimitris Papailiopoulos"],"abstract":"Large language models trained with reinforcement learning with verifiable rewards tend to trade accuracy for length--inflating response lengths to achieve gains in accuracy. 
While longer answers may be warranted for harder problems, many tokens are merely \"filler\": repetitive, verbose text that makes no real progress. We introduce GFPO (Group Filtered Policy Optimization), which curbs this length explosion by sampling larger groups per problem during training and filtering responses to train on based on two key metrics: (1) response length and (2) token efficiency: reward per token ratio. By sampling more at training time, we teach models to think less at inference time. On the Phi-4-reasoning model, GFPO cuts GRPO's length inflation by 46-71% across challenging STEM and coding benchmarks (AIME 24/25, GPQA, Omni-MATH, LiveCodeBench) while maintaining accuracy. Optimizing for reward per t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Machine learning","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-scaling-laws-for-ehr-foundation-models","title":"Exploring Scaling Laws for EHR Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/exploring-scaling-laws-for-ehr-foundation-models/","published":"2025-08-13","authors":["Sheng Zhang","Qin Liu","Naoto Usuyama","Cliff Wong","Tristan Naumann","Hoifung Poon"],"abstract":"The emergence of scaling laws has profoundly shaped the development of large language models (LLMs), enabling predictable performance gains through systematic 
increases in model size, dataset volume, and compute. Yet, these principles remain largely unexplored in the context of electronic health records (EHRs) -- a rich, sequential, and globally abundant data source that differs structurally from natural language. In this work, we present the first empirical investigation of scaling laws for EHR foundation models. By training transformer architectures on patient timeline data from the MIMIC-IV database across varying model sizes and compute budgets, we identify consistent scaling patterns, including parabolic IsoFLOPs curves and power-law relationships between compute, model parameters, data size, and clinical utility. These findings demonstrate that EHR models exhibit scaling behavior a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Unpublished","Artificial intelligence","Medical, health and genomics","Computer science","personalized","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/benchmark-dataset-generation-and-evaluation-for-excel-formula-repair-with-llms","title":"Benchmark Dataset Generation and Evaluation for Excel Formula Repair with LLMs","url":"https://www.microsoft.com/en-us/research/publication/benchmark-dataset-generation-and-evaluation-for-excel-formula-repair-with-llms/","published":"2025-08-13","authors":["Ananya Singha","Harshita Sahijwani","Walt Williams","Emmanuel Aboah Boateng","Nick Hausman","Miguel Di Luca","Keegan Choudhury","Chaya Binet","Vu-Anh Le","Tianwei 
Chen","Oryan Rokeah Chen","Sulaiman Vesal"],"abstract":"Excel is a pervasive yet often complex tool, particularly for novice users, where runtime errors arising from logical mistakes or misinterpretations of functions pose a significant challenge. While large language models (LLMs) offer promising assistance by explaining formula errors, the automated correction of these semantic runtime errors remains an open problem. A primary challenge to advancing models for such scenarios is the severe lack of high-quality, comprehensive datasets for training and rigorous evaluation. This paper addresses this gap by introducing a novel approach for constructing a benchmark dataset specifically designed for Excel formula repair. We propose a data generation pipeline, which leverages a small set of curated seed samples from online forums to synthetically expand the dataset. Our pipeline integrates few-shot prompting with LLMs and employs a robust \\textit{L...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4413157351","title":"Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction","url":"https://doi.org/10.1109/tcds.2025.3598687","published":"2025-08-13","authors":["Xiang Hao","Jibin Wu","Jianwei Yu","Chenglin Xu","Kay Chen Tan"],"abstract":"Humans can easily isolate a single speaker from a complex acoustic environment, a capability referred to as the “Cocktail Party 
Effect.” However, replicating this ability has been a significant challenge in the field of target speaker extraction (TSE). Traditional TSE approaches predominantly rely on voiceprints, which raise privacy concerns and face issues related to the quality and availability of enrollment samples, as well as intra-speaker variability. To address these issues, this work introduces a novel text-guided TSE paradigm named LLM-TSE. In this paradigm, a state-of-the-art large language model, LLaMA 2, processes typed text input from users to extract semantic cues. We demonstrate that textual descriptions alone can effectively serve as cues for extraction, thus addressing privacy concerns and reducing dependency on voiceprints. Furthermore, our approach offers flexibility by...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcds.2025.3598687","openalex_id":"https://openalex.org/W4413157351","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Hong Kong Polytechnic University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8615174293518066},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6172064542770386},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5016565322875977},{"id":"https://openalex.org/C2781209916","display_name":"Typing","score":0.4827396273612976},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36063462495803833}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415745039","title":"Adaptoserve: An Efficient System for 
Supporting Adaptive Chunked-Prefills in LLM Inference","url":"https://doi.org/10.1109/hpcc67675.2025.00051","published":"2025-08-13","authors":["Yu Ding","Jinhao Zhao","Zhengong Cai","Kai Shi","Fansong Zeng","Bo Yang"],"abstract":"With the widespread deployment of large language models (LLMs) across diverse applications, optimizing their inference processes to achieve high throughput and low latency has become increasingly critical. While vLLM emerges as a high-performance inference engine that significantly accelerates processing speed, it continues to face challenges with elevated tail latency in high-concurrency scenarios. Sarathi-Serve addresses this through chunked-prefills and stall-free scheduling, achieving a balance between throughput and latency. However, our analysis reveals fundamental limitations in its static chunking strategy. This paper presents AdaptoServe, a system employing a dynamic adaptive chunk-based prefetching strategy to overcome the resource management and latency optimization limitations inherent in static chunking approaches. 
Our solution features intelligent chunk size adaptation thro...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/hpcc67675.2025.00051","openalex_id":"https://openalex.org/W4415745039","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","Nanjing University of Aeronautics and Astronautics","Zhejiang University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8360999822616577},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7013999819755554},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.6510999798774719},{"id":"https://openalex.org/C206729178","display_name":"Scheduling (production processes)","score":0.6184999942779541},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.5504999756813049},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.506600022315979},{"id":"https://openalex.org/C160403385","display_name":"Queue","score":0.4422999918460846},{"id":"https://openalex.org/C157764524","display_name":"Throughput","score":0.4323999881744385}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413166924","title":"Large language models in palynological taxonomy: a pilot study","url":"https://doi.org/10.1080/01916122.2025.2547645","published":"2025-08-13","authors":["Michael H. Stephenson","Chen Shen","Shu‐zhong Shen","Alessandro P. 
Carniti","Junxuan Fan","Jiaxi Yang","Jieping Ye"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1080/01916122.2025.2547645","openalex_id":"https://openalex.org/W4413166924","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Nanjing University","Przedsiębiorstwo Badań i Doradztwa","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C162501224","display_name":"Palynology","score":0.7311573624610901},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy (biology)","score":0.6844891905784607},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.534515917301178},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4447154998779297},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3777623474597931},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.34269440174102783},{"id":"https://openalex.org/C18903297","display_name":"Ecology","score":0.22766849398612976},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.17222487926483154}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2508.10104","title":"DINOv3","url":"https://huggingface.co/papers/2508.10104","published":"2025-08-13","authors":["Oriane Siméoni","Huy V. 
Vo","Maximilian Seitzer","Federico Baldassarre","Maxime Oquab","Cijo Jose","Vasil Khalidov","Marc Szafraniec","Seungeun Yi","Michaël Ramamonjisoa","Francisco Massa","Daniel Haziza"],"abstract":"Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures. By not being tailored to specific tasks or domains, this training paradigm has the potential to learn visual representations from diverse sources, ranging from natural to aerial images -- using a single algorithm. This technical report introduces DINOv3, a major milestone toward realizing this vision by leveraging simple yet effective strategies. First, we leverage the benefit of scaling both dataset and model size by careful data preparation, design, and optimization. Second, we introduce a new method called Gram anchoring, which effectively addresses the known yet unsolved issue of dense feature maps degrading during long training schedules. 
Finally, we apply post-hoc strategies that further enhance our models...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/odysseybench-evaluating-llm-agents-on-long-horizon-complex-office-application-workflows","title":"OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows","url":"https://www.microsoft.com/en-us/research/publication/odysseybench-evaluating-llm-agents-on-long-horizon-complex-office-application-workflows/","published":"2025-08-12","authors":["Weixuan Wang","Dongge Han","Daniel Madrigal","Jin Xu","Victor Ruehle","Saravan Rajmohan"],"abstract":"Autonomous agents powered by large language models (LLMs) are increasingly deployed in real-world applications requiring complex, long-horizon workflows. However, existing benchmarks predominantly focus on atomic tasks that are self-contained and independent, failing to capture the long-term contextual dependencies and multi-interaction coordination required in realistic scenarios. To address this gap, we introduce OdysseyBench, a comprehensive benchmark for evaluating LLM agents on long-horizon workflows across diverse office applications including Word, Excel, PDF, Email, and Calendar. Our benchmark comprises two complementary splits: OdysseyBench+ with 300 tasks derived from real-world use cases, and OdysseyBench-Neo with 302 newly synthesized complex tasks. 
Each task requires agent to identify essential information from long-horizon interaction histories and perform multi-step reason...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","LLM","long-term","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/viscodex-unified-multimodal-code-generation-via-merging-vision-and-coding-models","title":"VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models","url":"https://www.microsoft.com/en-us/research/publication/viscodex-unified-multimodal-code-generation-via-merging-vision-and-coding-models/","published":"2025-08-12","authors":["Lingjie Jiang","Shaohan Huang","Xun Wu","Yixia Li","Dongdong Zhang","Furu Wei"],"abstract":"Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a unified framework that seamlessly merges vision and coding language models to empower MLLMs with strong multimodal code generation abilities. Leveraging a task vector-based model merging technique, we integrate a state-of-the-art coding LLM into a strong vision-language backbone, while preserving both visual comprehension and advanced coding skills. 
To support training and evaluation, we introduce the Multimodal Coding Dataset (MCD), a large-scale and diverse collection of 598k samples, including high-quality HTML code, chart image-code pairs, image-augmented StackOverflow QA, and algorithmic problems. Furthermore, we propose InfiBench-V, a novel and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:7909350761c2ed3a","title":"Efficient Speculative Decoding for Llama at Scale: Challenges and Solutions","url":"https://ai.meta.com/research/publications/efficient-speculative-decoding-for-llama-at-scale-challenges-and-solutions/","published":"2025-08-12","authors":["GenAI and Infra Teams"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["NLP","efficient"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=4"}},{"id":"apple:hjji6ig1ebm744a8luc6r9sz","title":"Eliciting In-context Retrieval and Reasoning for Long-Context Language 
Models","url":"https://machinelearning.apple.com/research/eliciting-in-context","published":"2025-08-12","authors":["Yifu Qiu","Varun Embar","Yizhe Zhang","Navdeep Jaitly","Shay B. Cohen","Benjamin Han"],"abstract":"Recent advancements in long-context language models (LCLMs) have the potential to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their extended context windows, LCLMs can process entire knowledge bases and directly handle retrieval and reasoning. This capability is defined as In-Context Retrieval and Reasoning (ICR2). However, existing benchmarks like LOFT often overestimate LCLM performance because they lack...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4413127433","title":"Large Language Models for Psychiatric Phenotype Extraction from Electronic Health Records","url":"https://doi.org/10.1101/2025.08.07.25333172","published":"2025-08-12","authors":["Clara Frydman-Gani","Alejandro Arias","Maria Perez Vallejo","John Daniel Londoño Martínez","Johanna Valencia‐Echeverry","Mauricio Castano","Alex Bui","Nelson B. Freimer","Carlos López‐Jaramillo","Loes M. Olde Loohuis"],"abstract":"The accurate detection of clinical phenotypes from electronic health records (EHRs) is pivotal for advancing large-scale genetic and longitudinal studies in psychiatry. Free-text clinical notes are an essential source of symptom-level information, particularly in psychiatry. However, the automated extraction of symptoms from clinical text remains challenging. 
Here, we tested 11 open-source generative large language models (LLMs) for their ability to detect 109 psychiatric phenotypes from clinical text, using annotated EHR notes from a psychiatric clinic in Colombia. The LLMs were evaluated both \"out-of-the-box\" and after fine-tuning, and compared against a traditional natural language processing (tNLP) method developed from the same data. We show that while base LLM performance was poor to moderate (0.2-0.6 macro-F1 for zero-shot; 0.2-0.74 macro-F1 for few shot), it improved significantl...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.08.07.25333172","openalex_id":"https://openalex.org/W4413127433","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Department of Physiological Sciences","Nvidia (United States)","Universidad Autonoma de Manizales","Universidad de Antioquia","University of Caldas","University of California, Los Angeles"],"concepts":[{"id":"https://openalex.org/C166955791","display_name":"Macro","score":0.5552008748054504},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.46194392442703247},{"id":"https://openalex.org/C3019952477","display_name":"Health records","score":0.4324053227901459},{"id":"https://openalex.org/C195910791","display_name":"Medical record","score":0.4230669140815735},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.42007631063461304},{"id":"https://openalex.org/C118552586","display_name":"Psychiatry","score":0.40584030747413635},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3655471205711365},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.32901161909103394}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413127652","title":"Evaluating scientific theories as predictive models in language neuroscience","url":"https://doi.org/10.1101/2025.08.12.669958","published":"2025-08-12","authors":["Chandan Deep Singh","Richard Antonello","Sihang Guo","Gavin Mischler","Jianfeng Gao","Nima Mesgarani","Alexander G. Huth"],"abstract":"the underlying phenomena, i.e. what features of the stimulus drive the response? We present Question Answering encoding models, a method for converting qualitative theories of language selectivity into highly accurate, interpretable models of brain responses. QA encoding models annotate a language stimulus by using a large language model to answer yes-no questions corresponding to qualitative theories. A compact QA encoding model that uses only 35 questions outperforms existing baselines at predicting brain responses in both fMRI and ECoG data. The model weights also provide easily interpretable maps of language selectivity across cortex; these maps show quantitative agreement with meta-analyses of the existing literature and selectivity maps identified in a follow-up fMRI experiment. 
These results demonstrate that LLMs can bridge the widening gap between qualitative scientific theories....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.08.12.669958","openalex_id":"https://openalex.org/W4413127652","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Columbia University","Earth Island Institute","Microsoft (United States)","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C2779918689","display_name":"Stimulus (psychology)","score":0.6492214202880859},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6174993515014648},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.5264019966125488},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4789472222328186},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41759270429611206},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3920463025569916},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.38058221340179443},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.34590083360671997}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413112588","title":"DXA-Net: Dual-Task Cross-Lingual Alignment Network for Zero-Shot Cross-Lingual Spoken Language Understanding","url":"https://doi.org/10.1109/tpami.2025.3597726","published":"2025-08-12","authors":["Bowen Xing","Libo Qin","Zhihong Zhu","Yu Zhou","Ivor W. 
Tsang"],"abstract":"The state-of-the-art zero-shot cross-lingual spoken language understanding (SLU) model utilizes cross-lingual unsupervised contrastive learning to achieve multilingual semantics alignment. While existing methods have achieved promising results, they still have two issues limiting cross-lingual knowledge transfer: (1) dual-task correlative knowledge is not explicitly modeled and transferred to target languages; (2) the semantics differences among samples are ignored, and the contrastive semantics knowledge is not transferred to target languages. In this paper, we propose a dual-task cross-lingual alignment network (DXA-Net), which makes the first attempt to tackle zero-shot cross-lingual SLU based on the prompt-tuning paradigm. To solve the first issue, we propose the co-guiding prompt, which allows the model to conditionally generate one task's label based on another one's. To solve the....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3597726","openalex_id":"https://openalex.org/W4413112588","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Agency for Science, Technology and Research","Central South University","Columbia University","Peking University","Tencent (China)","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8856977224349976},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6499473452568054},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6276770830154419},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6117680072784424},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.6035202145576477},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5919070839881897},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.5529857873916626},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.44527536630630493}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4413183762","title":"RESM: Capturing sequence and structure encoding of RNAs by mapped transfer learning from ESM (evolutionary scale modeling) protein language model","url":"https://doi.org/10.1101/2025.08.09.669469","published":"2025-08-10","authors":["Yikun Zhang","Hao Zhang","Guowei Li","He Wang","Xing Zhang","Hong‐Xi Xu","Ting-Ting Zhang","Liangsheng Wen","Yu Zhao","Jiuhong Jiang","Jie Chen","Yanjun Chen"],"abstract":"Abstract RNA sequences exhibit lower evolutionary conservation than proteins due to their informationally constrained four-letter alphabet, compared to the 20-letter code of proteins. More limited information makes unsupervised learning of structural and functional evolutionary patterns more challenging from single RNA sequences. We overcame this limitation by mapping RNA sequences to pseudo-protein sequences to allow effective transfer training from a protein language model (protein Evolution-Scale Model 2, protESM-2). The resulting RNA ESM (RESM) outperforms 12 existing RNA language models in zero-shot prediction, not only in sequence classification but also in RNA secondary structure and RNA-RNA interaction prediction. 
Further supervised fine-tuning demonstrates RESM’s generalizability and superior performance over the existing models compared across multiple downstream tasks, includi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.08.09.669469","openalex_id":"https://openalex.org/W4413183762","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","memory"],"author_affiliations":["BGI Group (China)","China Mobile (China)","Guangzhou Experimental Station","Huawei Technologies (China)","Peking University","ShanghaiTech University","Shenzhen Bay Laboratory","Sinopec (China)"],"concepts":[{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.658616304397583},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.6093850135803223},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5576545000076294},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45034167170524597},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.36694303154945374},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34800004959106445},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.32848697900772095},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.2660631239414215}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413093915","title":"DeepISLES: a clinically validated ischemic stroke segmentation model from the ISLES'22 
challenge","url":"https://doi.org/10.1038/s41467-025-62373-x","published":"2025-08-09","authors":["Ezequiel de la Rosa","Mauricio Reyes","Sook‐Lei Liew","A. Hutton","Roland Wiest","Johannes Kaesmacher","Uta Hanning","Arsany Hakim","Richard Zubal","Waldo Valenzuela","David Robben","Diana M. Sima"],"abstract":"Diffusion-weighted MRI is critical for diagnosing and managing ischemic stroke, but variability in images and disease presentation limits the generalizability of AI algorithms. We present DeepISLES, a robust ensemble algorithm developed from top submissions to the 2022 Ischemic Stroke Lesion Segmentation challenge we organized. By combining the strengths of best-performing methods from leading research groups, DeepISLES achieves superior accuracy in detecting and segmenting ischemic lesions, generalizing well across diverse axes. Validation on a large external dataset (N = 1685) confirms its robustness, outperforming previous state-of-the-art models by 7.4% in Dice score and 12.6% in F1 score. It also excels at extracting clinical biomarkers and correlates strongly with clinical stroke scores, closely matching expert performance. 
Neuroradiologists prefer DeepISLES' segmentations over man...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-025-62373-x","openalex_id":"https://openalex.org/W4413093915","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Arizona State University","Beijing University of Posts and Telecommunications","Centre Hospitalier Universitaire de Tours","Centre National de la Recherche Scientifique","Centre d'Investigation Clinique - Innovation Technologique","Centre de Recherche en Acquisition et Traitement de l'Image pour la Santé","China Medical University","China Medical University Hospital","Clinical Investigation Center Plurithematic Tours","Feng Chia University","Helmholtz Zentrum München","Icometrix (Belgium)","Imperial College London","Inserm","Institut National des Sciences Appliquées de Lyon","KU Leuven","King's College London","Nvidia (United States)","Pohang University of Science and Technology","Quantitative BioSciences","Radboud University Medical Center","Radboud University Nijmegen","Sunnybrook Health Science Centre","Sunnybrook Research Institute","TUM Klinikum","Technical University of Munich","Universitat Rovira i Virgili","Universitat de Girona","University College London","University Hospital of Bern","University Hospital of Zurich","University Medical Center Hamburg-Eppendorf","University of Bern","University of Southern California","University of Toronto","University of Zurich","Universität Hamburg","Université Claude Bernard Lyon 1","Wellcome Centre for Human Neuroimaging"],"concepts":[{"id":"https://openalex.org/C3020199598","display_name":"Ischemic stroke","score":0.7158088684082031},{"id":"https://openalex.org/C2780645631","display_name":"Stroke 
(engine)","score":0.5485145449638367},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.4938267171382904},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.48023349046707153},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.4426713287830353},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.4054211378097534},{"id":"https://openalex.org/C126322002","display_name":"Internal medicine","score":0.3561122417449951},{"id":"https://openalex.org/C164705383","display_name":"Cardiology","score":0.3484158515930176}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4413237660","title":"Repository-Level Code Generation Method Enhanced by Context-Dependent Graph Retrieval","url":"https://doi.org/10.1007/978-3-031-93257-1_3","published":"2025-08-09","authors":["Hanxiao Zhang","Anqi Li","Zhirui Kuai","Zhao Wei","Li Kuang","Yingjie Xia"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-93257-1_3","openalex_id":"https://openalex.org/W4413237660","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Central South University","Hangzhou Dianzi University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6501020193099976},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5517468452453613},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5514008402824402},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.3697141408920288},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.34618180990219116},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3291611075401306},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:cdeqf1wp6gxv3uw827kcrsns","title":"Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential","url":"https://machinelearning.apple.com/research/prediction-potential","published":"2025-08-08","authors":["Mohammad Samragh","Arnav Kundu","David Harrison","Kumari Nishu","Devang Naik","Minsik Cho","Mehrdad Farajtabar"],"abstract":"Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially during later stages of generation when the direction and semantics of text are relatively certain. 
In this work, we propose a novel framework that leverages the inherent knowledge of vanilla autoregressive language models about future tokens, combining techniques to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:gfcmdbrzht4jztd0cie1op9j","title":"DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective","url":"https://machinelearning.apple.com/research/dicehubert","published":"2025-08-08","authors":["Hyung Gun Chi","Zakaria Aldeneh","Tatiana Likhomanenko","Oggi Rudovic","Takuya Higuchi","Li-Wei Chen","Shinji Watanabe","Ahmed Hussen Abdelaziz"],"abstract":"We introduce DiceHuBERT, a knowledge distillation framework for compressing HuBERT, a widely used self-supervised learning (SSL)-based speech foundation model. Unlike existing distillation methods that rely on layer-wise and feature-wise mapping between teacher and student models, DiceHuBERT leverages HuBERT's iterative self-distillation mechanism by directly replacing the original model with a student model. 
This replacement allows the student...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4413120574","title":"Characterizing and Efficiently Accelerating Multimodal Generation Model Inference","url":"https://doi.org/10.1109/mm.2025.3596539","published":"2025-08-08","authors":["Yejin Lee","Alicia Golden","Anna Sun","Basil Hosmer","Bilge Acun","Can Balioglu","Changhan Wang","Charles David Hernandez","Christian Puhrsch","Daniel Haziza","Driss Guessous","Francisco Massa"],"abstract":"Generative artificial intelligence (AI) technology is revolutionizing the computing industry, posing new system design and optimization opportunities. In particular, AI’s ability to understand and respond in multiple modalities comes with significant system resource demands. To sustainably scale generative AI capabilities to billions of users in the world, inference must be fast and efficient. This article pinpoints key system design and optimization opportunities by characterizing a family of emerging multimodal generation models on real systems. Autoregressive token generation is a critical latency performance bottleneck, typically dominated by GPU idle time. In addition to memory-intensive attention across the generative AI models, linear operations constitute significant inference latency due to the feed forward networks in Transformer-based models. 
We demonstrate that state-of-the-a...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mm.2025.3596539","openalex_id":"https://openalex.org/W4413120574","cited_by_count":4,"quality_score":49,"matched_keywords":["memory","efficient"],"author_affiliations":["ATUM (United States)","Bellevue College","Institution of Civil Engineers","Integrated Software (United States)","Medieval Academy of America","Menlo School","Meta (United Kingdom)","Meta (United States)","Metacomp Technologies (United States)","Metrica (United States)","Metro Transit","National Society of Professional Engineers","Seattle University","The Metropolitan Opera (United States)","Visiting Nurse Association"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8762450218200684},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.622630774974823},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3826275169849396},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.35120633244514465}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4413054423","title":"MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data","url":"https://doi.org/10.1038/s41597-025-05283-3","published":"2025-08-08","authors":["Meng Fang","Xiangpeng Wan","Fei Lü","Fei Xing","Kai Zou"],"abstract":"Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. 
To support rigorous evaluation of mathematical reasoning in LLMs, we introduce the \"MathOdyssey\" dataset - a curated collection of 387 expert-generated mathematical problems spanning high school, university, and Olympiad-level topics. Each problem is accompanied by a detailed solution and categorized by difficulty level, subject area, and answer type. The dataset was developed through a rigorous multi-stage process involving contributions from subject experts, peer review, and standardized formatting. We provide detailed metadata and a standardized schema to facilitate consistent use in downstream applications. To demonstrate t...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41597-025-05283-3","openalex_id":"https://openalex.org/W4413054423","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Google DeepMind (United Kingdom)","Johns Hopkins University","Mathematica Policy Research","University of Liverpool"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5768527388572693},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.5156739950180054},{"id":"https://openalex.org/C52146309","display_name":"Schema (genetic algorithms)","score":0.4936126470565796},{"id":"https://openalex.org/C130383907","display_name":"Olympiad","score":0.48818761110305786},{"id":"https://openalex.org/C88006597","display_name":"Disk formatting","score":0.43697863817214966},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.43105950951576233},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.4069667160511017},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.38984888792037964}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4413094312","title":"Value of artificial intelligence in neuro-oncology","url":"https://doi.org/10.1016/j.landig.2025.100876","published":"2025-08-08","authors":["Sebastian Voigtlaender","Thomas Nelson","Philipp Karschnia","Eugene Vaios","Michelle M. Kim","Philipp Lohmann","Norbert Galldiks","Mariella G Filbin","Shekoofeh Azizi","Vivek Natarajan","Michelle Monje","Jörg Dietrich"],"abstract":"CNS cancers are complex, difficult-to-treat malignancies that remain insufficiently understood and mostly incurable, despite decades of research efforts. Artificial intelligence (AI) is poised to reshape neuro-oncological practice and research, driving advances in medical image analysis, neuro-molecular-genetic characterisation, biomarker discovery, therapeutic target identification, tailored management strategies, and neurorehabilitation. This Review examines key opportunities and challenges associated with AI applications along the neuro-oncological care trajectory. We highlight emerging trends in foundation models, biophysical modelling, synthetic data, and drug development and discuss regulatory, operational, and ethical hurdles across data, translation, and implementation gaps. 
Near-term clinical translation depends on scaling validated AI solutions for well defined clinical tasks.....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1016/j.landig.2025.100876","openalex_id":"https://openalex.org/W4413094312","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Dana-Farber/Boston Children's Cancer and Blood Disorders Center","Duke Medical Center","Google (Canada)","Google (United States)","Harvard University","Howard Hughes Medical Institute","Ludwig-Maximilians-Universität München","Massachusetts General Hospital","Max Planck Institute for Biological Cybernetics","Neurological Surgery","RWTH Aachen University","San Francisco General Hospital","Stanford University","University Hospital Cologne","University of Cologne","University of Michigan"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5158161520957947},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5112707614898682},{"id":"https://openalex.org/C47177892","display_name":"Neurorehabilitation","score":0.47217196226119995},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3962388038635254},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.36164742708206177},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.20831778645515442},{"id":"https://openalex.org/C169760540","display_name":"Neuroscience","score":0.18825969099998474},{"id":"https://openalex.org/C2778818304","display_name":"Rehabilitation","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"arxiv:2508.06471","title":"GLM-4.5: Agentic, 
Reasoning, and Coding (ARC) Foundation Models","url":"https://huggingface.co/papers/2508.06471","published":"2025-08-08","authors":["GLM-4. 5 Team","Aohan Zeng","Xin Lv","Qinkai Zheng","Zhenyu Hou","Bin Chen","Chengxing Xie","Cunxiang Wang","Da Yin","Hao Zeng","Jiajie Zhang","Kedong Wang"],"abstract":"We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified. With much fewer parameters than several competitors, GLM-4.5 ranks 3rd overall among all evaluated models and 2nd on agentic benchmarks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems. 
Code, models, and more information are available at https://github.com/zai-org/GLM-4...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["language model"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/auto-eval-judge-towards-a-general-agentic-framework-for-task-completion-evaluation","title":"Auto-Eval Judge: Towards a General Agentic Framework for Task Completion Evaluation","url":"https://www.microsoft.com/en-us/research/publication/auto-eval-judge-towards-a-general-agentic-framework-for-task-completion-evaluation/","published":"2025-08-07","authors":["Abubakarr Jaye","Sadid A. Hasan"],"abstract":"The increasing adoption of foundation models as agents across diverse domains necessitates a robust evaluation framework. Current methods, such as LLM-as-a-Judge, focus only on final outputs, overlooking the step-by-step reasoning that drives agentic decision-making. Meanwhile, existing Agent-as-a-Judge systems, where one agent evaluates another's task completion, are typically designed for narrow, domain-specific settings. To address this gap, we propose a generalizable, modular framework for evaluating agent task completion independent of the task domain. The framework emulates human-like evaluation by decomposing tasks into sub-tasks and validating each step using available information, such as the agent's output and reasoning. Each module contributes to a specific aspect of the evaluation process, and their outputs are aggregated to produce a final verdict on task completion. 
We vali...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","AI agents","automatic evaluation","Generative AI","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:3cf2c422e1543c15","title":"GPT-5 System Card","url":"https://openai.com/index/gpt-5-system-card","published":"2025-08-07","authors":["OpenAI"],"abstract":"This GPT-5 system card explains how a unified model routing system powers fast and smart responses using gpt-5-main, gpt-5-thinking, and lightweight versions like gpt-5-thinking-nano, optimized for different tasks and developer use.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W4413044078","title":"A multimodal automated deep learning-based model for predicting biochemical recurrence of prostate cancer following prostatectomy from baseline MRI, Presurgical clinical covariates","url":"https://doi.org/10.1016/j.clinimag.2025.110579","published":"2025-08-07","authors":["Benjamin Simon","Stephanie A. Harmon","Katie Merriman","Jesse Tetreault","Ömer Tarık Esengür","Hunter Stecko","Enis C. 
Yılmaz","Lei Clifton","Anshul Thakur","Zoë Blake","Maria J. Merino","Julie Y. An"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.clinimag.2025.110579","openalex_id":"https://openalex.org/W4413044078","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Center for Cancer Research","Johns Hopkins University","National Institutes of Health","Nvidia (United States)","Singapore General Hospital","University of California, San Diego","University of Oxford"],"concepts":[{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.9645727276802063},{"id":"https://openalex.org/C2779466945","display_name":"Prostatectomy","score":0.8721860647201538},{"id":"https://openalex.org/C2780192828","display_name":"Prostate cancer","score":0.8236414194107056},{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.6874111890792847},{"id":"https://openalex.org/C2910607126","display_name":"Multiparametric MRI","score":0.5700770616531372},{"id":"https://openalex.org/C2777008409","display_name":"Biochemical recurrence","score":0.560382604598999},{"id":"https://openalex.org/C2776235491","display_name":"Prostate","score":0.495641827583313},{"id":"https://openalex.org/C119043178","display_name":"Covariate","score":0.4499378204345703}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2508.05748","title":"WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent","url":"https://huggingface.co/papers/2508.05748","published":"2025-08-07","authors":["Xinyu Geng","Peng Xia","Zhen Zhang","Xinyu Wang","Qiuchen Wang","Ruixue Ding","Chenxi Wang","Jialong Wu","Yida Zhao","Kuan Li","Yong Jiang","Pengjun Xie"],"abstract":"Web 
agents such as Deep Research have demonstrated superhuman cognitive abilities, capable of solving highly challenging information-seeking problems. However, most research remains primarily text-centric, overlooking visual information in the real world. This makes multimodal Deep Research highly challenging, as such agents require much stronger reasoning abilities in perception, logic, knowledge, and the use of more sophisticated tools compared to text-based agents. To address this limitation, we introduce WebWatcher, a multi-modal Agent for Deep Research equipped with enhanced visual-language reasoning capabilities. It leverages high-quality synthetic multimodal trajectories for efficient cold start training, utilizes various tools for deep reasoning, and further enhances generalization through reinforcement learning. To better evaluate the capabilities of multimodal agents, we propos...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["retrieval","efficient","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"hf-org-paper:baidu:2508.04604","title":"TURA: Tool-Augmented Unified Retrieval Agent for AI Search","url":"https://huggingface.co/papers/2508.04604","published":"2025-08-06","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org 
papers","baidu","retrieval","agent"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/baidu/papers"}},{"id":"hf-org-paper:tencent:2508.05004","title":"R-Zero: Self-Evolving Reasoning LLM from Zero Data","url":"https://huggingface.co/papers/2508.05004","published":"2025-08-06","authors":["Tencent/Hunyuan"],"abstract":"Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver. 
These models are optimized separately and co-evolve through interaction: the Challenger is rewarded for proposing tasks near the edge of the Solver capability, and the Solver is rewarded for solving increas...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4413010798","title":"Global and Local Semantic Completion Learning for Vision-Language Pre-Training","url":"https://doi.org/10.1109/tpami.2025.3596394","published":"2025-08-06","authors":["Rong-Cheng Tu","Yatai Ji","Jie Jiang","Weijie Kong","Chengfei Cai","Wenzhe Zhao","Hongfa Wang","Yujiu Yang","Wei Liu"],"abstract":"Cross-modal alignment plays a crucial role in vision-language pre-training (VLP) models, enabling them to capture meaningful associations across different modalities. For this purpose, inspired by the success of masked language modeling (MLM) tasks in the NLP pre-training area, numerous masked modeling tasks have been proposed for VLP to further promote cross-modal interactions. The core idea of previous masked modeling tasks is to focus on reconstructing the masked tokens based on visible context for learning local-local alignment, i.e., associations between image patches and text tokens. However, most of them pay little attention to the global semantic features generated for the masked data, resulting in a limited cross-modal alignment ability of global representations to local features of the other modality. 
Therefore, in this paper, we propose a novel Global and Local Semantic Comple...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3596394","openalex_id":"https://openalex.org/W4413010798","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Nanyang Technological University","Tencent (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7097992300987244},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6884492039680481},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5479328036308289},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4858662486076355},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.37843644618988037},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.34806281328201294},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4416183514","title":"Balancing Semantic Relevance and Engagement in Related Video Recommendations","url":"https://doi.org/10.1109/mipr67560.2025.00016","published":"2025-08-06","authors":["Amit Jaspal","Feng Zhang","Wei Chun Chang","Sumit Kumar","Yubo Wang","Roni Mittleman","Qifan Wang","Weize Mao"],"abstract":"Related video recommendations commonly use collaborative filtering (CF) driven by co-engagement signals, often resulting 
in recommendations lacking semantic coherence and exhibiting strong popularity bias. This paper introduces a novel multi-objective retrieval framework, enhancing standard two-tower models to explicitly balance semantic relevance and user engagement. Our approach uniquely combines: (a) multi-task learning (MTL) to jointly optimize co-engagement and semantic relevance, explicitly prioritizing topical coherence; (b) fusion of multimodal content features (textual and visual embeddings) for richer semantic understanding, and (c) off-policy correction (OPC) via inverse propensity weighting to effectively mitigate popularity bias. Evaluation on industrial-scale data and a two-week live A/B test reveals our framework's efficacy. We observed significant improvements in semantic....","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mipr67560.2025.00016","openalex_id":"https://openalex.org/W4416183514","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7817000150680542},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.6535000205039978},{"id":"https://openalex.org/C2780586970","display_name":"Popularity","score":0.6238999962806702},{"id":"https://openalex.org/C2778493491","display_name":"Semantic matching","score":0.5985000133514404},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.5403000116348267},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5203999876976013},{"id":"https://openalex.org/C2778143727","display_name":"Readability","score":0.5047000050544739},{"id":"https://openalex.org/C183115368","display_name":"Weighting","score":0.4742000102996826}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-enhanced-sensemaking-exploring-the-design-of-a-generative-ai-based-assistant-to-support-genetic-professionals","title":"AI-Enhanced Sensemaking: Exploring the Design of a Generative AI-Based Assistant to Support Genetic Professionals","url":"https://www.microsoft.com/en-us/research/publication/ai-enhanced-sensemaking-exploring-the-design-of-a-generative-ai-based-assistant-to-support-genetic-professionals/","published":"2025-08-05","authors":["Angela Mastrianni","Hope Twede","Aleksandra Sarcevic","Jeremiah Wander","C. Austin-Tse","Scott Saponas","Heidi L. Rehm","A. M. Conard","Amanda K. Hall"],"abstract":"Generative AI has the potential to transform knowledge work, but further research is needed to understand how knowledge workers envision using and interacting with generative AI. We investigate the development of generative AI tools to support domain experts in knowledge work, examining task delegation and the design of human-AI interactions. Our research focused on designing a generative AI assistant to aid genetic professionals in analyzing whole genome sequences (WGS) and other clinical data for rare disease diagnosis. Through interviews with 17 genetics professionals, we identified current challenges in WGS analysis. 
We then conducted co-design sessions with six genetics professionals to determine tasks that could be supported by an AI assistant and considerations for designing interactions with the AI assistant. From our findings, we identified sensemaking as both a current challeng...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3756326","openalex_id":"https://openalex.org/W4413019883","cited_by_count":2,"quality_score":74,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Medical, health and genomics","Computer science"],"author_affiliations":["Microsoft","Broad Institute","Drexel University","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/velocitune-a-velocity-based-dynamic-domain-reweighting-method-for-continual-pre-training","title":"Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training","url":"https://www.microsoft.com/en-us/research/publication/velocitune-a-velocity-based-dynamic-domain-reweighting-method-for-continual-pre-training/","published":"2025-08-05","authors":["Zheheng Luo","Xin Zhang","Xiao Liu","Haoling Li","Yeyun Gong","Qi Chen","Peng Cheng"],"abstract":"It is well-known that a diverse corpus is critical for training large language models, which are typically constructed from a mixture of various domains. In general, previous efforts resort to sampling training data from different domains with static proportions, as well as adjusting data proportions during training. 
However, few methods have addressed the complexities of domain-adaptive continual pre-training. To fill this gap, we propose Velocitune, a novel framework dynamically assesses learning velocity and adjusts data proportions accordingly, favoring slower-learning domains while shunning faster-learning ones, which is guided by a scaling law to indicate the desired learning goal for each domain with less associated cost. To evaluate the effectiveness of Velocitune, we conduct experiments in a reasoning-focused dataset with CodeLlama, as well as in a corpus specialised for system....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:982069863c7ad963","title":"gpt-oss-120b & gpt-oss-20b Model Card","url":"https://openai.com/index/gpt-oss-model-card","published":"2025-08-05","authors":["OpenAI"],"abstract":"We introduce gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models available under the Apache 2.0 license and our gpt-oss usage policy.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed 
https://openai.com/news/rss.xml"}},{"id":"official:41838e7a97a55458","title":"Estimating worst case frontier risks of open weight LLMs","url":"https://openai.com/index/estimating-worst-case-frontier-risks-of-open-weight-llms","published":"2025-08-05","authors":["OpenAI"],"abstract":"In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W4412956623","title":"TerraCraft: City-scale generative procedural modeling with natural languages","url":"https://doi.org/10.1016/j.gmod.2025.101285","published":"2025-08-05","authors":["Zulian Xi","Zhihao Yao","Jiahui Huang","Zifeng Lü","Hongyu Yan","Tai‐Jiang Mu","Zhigang Wang","Qun‐Ce Xu"],"abstract":"Automated generation of large-scale 3D scenes presents a significant challenge due to the resource-intensive training and datasets required. This is in sharp contrast to the 2D counterparts that have become readily available due to their superior speed and quality. However, prior work in 3D procedural modeling has demonstrated promise in generating high-quality assets using the combination of algorithms and user-defined rules. To leverage the best of both 2D generative models and procedural modeling tools, we present TerraCraft, a novel framework for generating geometrically high-quality 3D city-scale scenes. 
By utilizing Large Language Models (LLMs), TerraCraft can generate city-scale 3D scenes from natural text descriptions. With its intuitive operation and powerful capabilities, TerraCraft enables users to easily create geometrically high-quality scenes readily for various application...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.gmod.2025.101285","openalex_id":"https://openalex.org/W4412956623","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7082927823066711},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5035008788108826},{"id":"https://openalex.org/C113230428","display_name":"Procedural modeling","score":0.5008418560028076},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.4925573468208313},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.44745486974716187},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3602312207221985},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.28752216696739197},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.17203989624977112}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412984999","title":"RexUniNLU: Recursive Method With Explicit Schema Instructor for Universal Natural Language Understanding","url":"https://doi.org/10.1109/tkde.2025.3595143","published":"2025-08-05","authors":["Chengyuan Liu","Shihang Wang","Yangyang Kang","Fubang Zhao","Kun 
Kuang","Weiming Lü","Changlong Sun","Fei Wu"],"abstract":"Information Extraction (IE) and Text Classification (CLS) serve as the fundamental pillars of NLU, with both disciplines relying on analyzing input sequences to categorize outputs into pre-established schemas. However, there is no existing encoder-based model that can unify IE and CLS tasks from this perspective. To fully explore the foundation shared within NLU tasks, we have proposed a recursive method with explicit schema instructor for universal NLU. Specifically, we firstly redefine the true universal information extraction (UIE) with a formal formulation that covers almost all extraction schemas, including quadruples and quintuples which remain unsolved for previous UIE models. Then, we expands the formulation to all CLS and multi-modal NLU tasks. Based on that, we introduce RexUniNL...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2025.3595143","openalex_id":"https://openalex.org/W4412984999","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Hangzhou Wanxiang Polytechnic","Zhejiang University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8336529731750488},{"id":"https://openalex.org/C52146309","display_name":"Schema (genetic algorithms)","score":0.5790033936500549},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.5572686791419983},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.5414303541183472},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4698962867259979},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3943093717098236},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35764992237091064},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.29401320219039917}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cognitive-loop-via-in-situ-optimization-self-adaptive-reasoning-for-science","title":"Cognitive Loop via In-Situ Optimization: Self-Adaptive Reasoning for Science","url":"https://www.microsoft.com/en-us/research/publication/cognitive-loop-via-in-situ-optimization-self-adaptive-reasoning-for-science/","published":"2025-08-04","authors":["Newman Cheng","Gordon Broadbent","William Chappell"],"abstract":"The capacity for artificial intelligence (AI) to formulate, evolve, and test altered thought patterns under dynamic conditions indicates advanced cognition that is crucial for scientific discovery. The existing AI development landscape falls into two categories: 1) frameworks over non-reasoning models that natively incorporate opinions on how humans think, and 2) reasoning models that abstract precise control of the reasoning intuition away from end users. While powerful, for scientists to maximize utility of AI in scientific discovery, they not only require accuracy and transparency in reasoning, but also steerability. 
Hence, we introduce an alternative approach that enables deep and precise control over the reasoning process called: a cognitive loop via in-situ optimization (CLIO). CLIO enables large language models (LLMs) to self-formulate ways of approaching a problem, adapt behavior when sel...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Search and information retrieval","Quantum"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:305","title":"Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference","url":"https://seed.bytedance.com/en/research/seed-diffusion-a-large-scale-diffusion-language-model-with-high-speed-inference","published":"2025-08-04","authors":["Yuxuan Song","Zheng Zhang","Cheng Luo","Pengyang Gao","Fan Xia","Hao Luo","Zheng Li","Yuehang Yang","Hongli Yu","Xingwei Qu","Yuwei Fu","Jing Su"],"abstract":"We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion, offering remarkably fast inference speed. Thanks to non-sequential, parallel generation, discrete diffusion models provide a notable speedup to mitigate the inherent latency of token-by-token decoding, as demonstrated recently (e.g., Mercury Coder, Gemini Diffusion). 
Seed Diffusion Preview achieves an inference speed of 2,146 token/s over H20 GPUs while maintaining competitive performance across a sweep of standard code evaluation benchmarks, significantly faster than contemporary Mercury and Gemini Diffusion, establishing new state of the art on the speed-quality Pareto frontier for code models. External paper link: https://arxiv.org/pdf/2508.02193","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computation and Language","LLM","arXiv","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:1bf2730b89dad12e","title":"Qwen-Image: Crafting with Native Text Rendering","url":"https://qwenlm.github.io/blog/qwen-image/","published":"2025-08-04","authors":["Alibaba/Qwen"],"abstract":"We are thrilled to release Qwen-Image, a 20B MMDiT image foundation model that achieves significant advances in complex text rendering and precise image editing. To try the latest model, feel free to visit Qwen Chat and choose “Image Generation”. The key features include: Superior Text Rendering: Qwen-Image excels at complex text rendering, including multi-line layouts, paragraph-level semantics, and fine-grained details. 
It supports both alphabetic languages (e.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4413442683","title":"Silicon Whisperers: Improving Test Quality and Cost in the Age of Generative AI","url":"https://doi.org/10.1109/coins65080.2025.11125741","published":"2025-08-04","authors":["Sanmitra Banerjee","Jonti Talukdar","Farshad Firouzi"],"abstract":"The relentless scaling of process nodes, emergence of post-silicon device technologies, and stringent quality demands of automotive and medical markets have pushed integrated-circuit (IC) testing to an economic inflection point: manufacturing tests for advanced SoCs already account for nearly one-third of total silicon cost, yet latent-defect escapes remain an industry concern. In this survey, we analyze how large language models (LLMs) and broader generative AI techniques can be leveraged to break the long-standing quality-versus-cost trade-off in test. 
We discuss how generative AI is transitioning from pilot studies to factory-scale impact, positioning the semiconductor industry to deliver higher-quality products at lower test cost over the coming decade.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/coins65080.2025.11125741","openalex_id":"https://openalex.org/W4413442683","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Arizona State University","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6025755405426025},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5490246415138245},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5359894037246704},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5222519040107727},{"id":"https://openalex.org/C200601418","display_name":"Reliability engineering","score":0.4613378942012787},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3905065059661865},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.228817880153656},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/harnessing-temporal-databases-for-systematic-evaluation-of-factual-time-sensitive-question-answering-in-large-language-models","title":"Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in Large Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/harnessing-temporal-databases-for-systematic-evaluation-of-factual-time-sensitive-question-answering-in-large-language-models/","published":"2025-08-03","authors":["Soyeon Kim","Jindong Wang","Xing Xie","Steven Euijong Whang"],"abstract":"Facts evolve over time, making it essential for Large Language Models (LLMs) to handle time-sensitive factual knowledge accurately and reliably. While factual Time-Sensitive Question-Answering (TSQA) tasks have been widely studied, existing benchmarks often rely on manual curation or a small, fixed set of predefined templates, which restricts scalable and comprehensive TSQA evaluation. To address these challenges, we propose TDBench, a new benchmark that systematically constructs TSQA pairs by harnessing temporal databases and database techniques such as temporal SQL and functional dependencies. We also introduce a fine-grained evaluation metric called time accuracy, which assesses the validity of time references in model explanations alongside traditional answer accuracy to enable a more reliable TSQA evaluation. 
Extensive experiments on contemporary LLMs show how TDBench enables scalab...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vlm4d-towards-spatiotemporal-awareness-in-vision-language-models","title":"VLM4D: Towards Spatiotemporal Awareness in Vision Language Models","url":"https://www.microsoft.com/en-us/research/publication/vlm4d-towards-spatiotemporal-awareness-in-vision-language-models/","published":"2025-08-03","authors":["Shijie Zhou","Alexander Vilesov","Xuehai He","Ziyu Wan","Shuwang Zhang","Aditya Nagachandra","Di Chang","Dongdong Chen","Xin Eric Wang","Achuta Kadambi"],"abstract":"Vision language models (VLMs) have shown remarkable capabilities in integrating linguistic and visual reasoning but remain fundamentally limited in understanding dynamic spatiotemporal interactions. Humans effortlessly track and reason about object movements, rotations, and perspective shifts-abilities essential for robust dynamic real-world understanding yet notably lacking in current VLMs. In this paper, we introduce VLM4D, the first benchmark specifically designed to evaluate the spatiotemporal reasoning capabilities of VLMs. 
Our benchmark comprises diverse real-world and synthetic videos accompanied by carefully curated question-answer pairs emphasizing translational and rotational motions, perspective awareness, and motion continuity. Through comprehensive evaluations of state-of-the-art open and closed-source VLMs, we identify significant performance gaps compared to human baseline...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4412876891","title":"Multi-Agent Proactive Information Seeking with Adaptive LLM Orchestration for Non-Factoid Question Answering","url":"https://doi.org/10.1145/3711896.3737249","published":"2025-08-03","authors":["Xinran Chen","Yuchen Li","Hengyi Cai","Zhuoran Ma","Xuanang Chen","Haoyi Xiong","Shuaiqiang Wang","Ben He","Le Sun","Dawei Yin"],"abstract":"The proliferation of complex non-factoid questions in modern information seeking (IS) systems exposes critical limitations in conventional Retrieval-Augmented Generation (RAG) approaches, particularly their static search strategies and the lack of systematic multi-source information integration capabilities. 
Facing these limitations, we present PASS (Proactive Agent-driven Search System), a novel multi-agent framework that operationalizes human-like proactive search strategies through five specialized agents: Revealer for intent analysis, Navigator for search planning, Seeker/Reader for adaptive retrieval, and Writer for response synthesis, systematically expanding the search space through iterative query refinement and multi-perspective knowledge integration. Crucially, our framework demonstrates remarkable adaptability to mid-sized LLMs, demonstrating its scalability in resource-constr...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737249","openalex_id":"https://openalex.org/W4412876891","cited_by_count":1,"quality_score":54,"matched_keywords":["LLM","retrieval","agent","multi-agent"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Software","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C199168358","display_name":"Orchestration","score":0.8832303881645203},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.767631471157074},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.7131300568580627},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4308144152164459},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.34034815430641174},{"id":"https://openalex.org/C558565934","display_name":"Musical","score":0.0},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.0},{"id":"https://openalex.org/C153349607","display_name":"Visual arts","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":1}},{"id":"openalex:W4412875566","title":"KDD 2025 Workshop on Inference Optimization for Generative AI","url":"https://doi.org/10.1145/3711896.3737865","published":"2025-08-03","authors":["Panpan Xu","Youngsuk Park","Lin Lee Cheong","Yida Wang","Yiying Zhang","George Karypis","Sherry Marcus"],"abstract":"The demand for efficient Large Language Model (LLM) inference has surged with the rising adoption of Generative AI (GenAI) applications, particularly in areas such as agents and retrieval-augmented generation. Efficient inference serves two crucial purposes: it enables the deployment of LLM-centered applications that address critical business needs, while also facilitating rapid experimentation for researchers to extract valuable insights and new understandings. However, despite the field's rapid advancement and interdisciplinary nature, there remains a limited exchange of ideas and methodologies between production-facing practitioners and researchers seeking to experiment with new GenAI concepts quickly. To bridge this gap, we are introducing the first KDD workshop on Inference Optimization for Generative AI. 
Our goal is to create a collaborative platform where researchers and practitio...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737865","openalex_id":"https://openalex.org/W4412875566","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","retrieval","efficient"],"author_affiliations":["Amazon (United States)","University of California San Diego"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7137563228607178},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6699217557907104},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6622655391693115},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5061690211296082},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42986592650413513}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412877102","title":"KDD 2025 - AI Reasoning Day","url":"https://doi.org/10.1145/3711896.3737674","published":"2025-08-03","authors":["Jun Huan","Xiangxiong Zhang","Ye Xing","Wee Hyong Tok","Ružica Piskač"],"abstract":"Generative AI and the use of large language models (LLMs) are changing the way we work, create, play, and live. As we have witnessed in the past few years, there is significant progress in training LLMs to have a deep understanding of the semantics of language so that such models begin to perform ''reasoning''. (Human) Reasoning is the process of applying logic to derive conclusions based on new or existing information with the goal of finding the truth. Reasoning is a form of high-level human intelligence. 
There are many types of reasoning: mathematical reasoning, common sense reasoning, temporal reasoning, among others. Multi-hope reasoning with LLM is an emerging capability for LLMs with tens of billions of parameters. Such ''reasoning models'', including Sonnet 3.7, Chat GPT O1, have powered important application areas such as AI4coding, agentic workflow, among others. The first KDD....","companies":["Microsoft","Amazon"],"matched_orgs":["Microsoft","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737674","openalex_id":"https://openalex.org/W4412877102","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Microsoft (United States)","Purdue University West Lafayette","Yale University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6470974683761597},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4921824038028717}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412877014","title":"FoodGPT: Reinforcement Post-Training of Large Language Models in the Food Delivery Domain","url":"https://doi.org/10.1145/3711896.3737222","published":"2025-08-03","authors":["Jiang Wang","Zhengxin Dong","Bing Bai","Guyu Jiang","Aiquan Yuan","Guodong Cao"],"abstract":"On-demand Food Delivery (OFD) platforms, such as Ele.me and Meituan, have transformed daily life by offering convenient ordering services. However, challenges remain in understanding user intentions and processing product-related text information. Existing NLP models, while advanced in general tasks, are less effective for OFD-specific needs due to data scarcity and high computational costs. 
This paper introduces FoodInstruct, a Chinese dataset with 1.6 million examples across 12 OFD-related NLP tasks, and FoodGPT, a domain-specific large language model. We propose an efficient reinforcement post-training framework that combines Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Proximal Policy Optimization (PPO) with an additional rule-based reward signal. The resulting foundation model, FoodGPT, enhances model performance while minimizing resource consumption. Expe...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737222","openalex_id":"https://openalex.org/W4412877014","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","language model","preference","efficient"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6859318017959595},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5329490900039673},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5151509046554565},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.45381516218185425},{"id":"https://openalex.org/C2994309678","display_name":"Food delivery","score":0.42110657691955566},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36415618658065796},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3517715334892273},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.14856526255607605}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4412877227","title":"3rd Workshop on Causal Inference and Machine Learning in Practice","url":"https://doi.org/10.1145/3711896.3737858","published":"2025-08-03","authors":["Jeong-Yoon Lee","Jing Pan","Yifeng Wu","Totte Harinen","Paul Lo","Zhenyu Zhao","Huigang Chen","Sichao Yin","Roland Stevenson","Jingshen Wang","Yingfei Wang","Chu Wang"],"abstract":"The 3rd Workshop on Causal Inference and Machine Learning in Practice at KDD 2025 aims to bring together researchers, industry professionals, and practitioners to explore the application of causal inference within machine learning models. As causal machine learning techniques gain traction across industries, practical challenges related to trustworthiness, robustness, and fairness remain at the forefront. This workshop will provide a forum to discuss methodologies for evaluating causal models in real-world scenarios and explore innovative applications that integrate causal inference with generative AI (GenAI) and large language models (LLMs). Topics of interest include using GenAI and LLMs to facilitate causal inference tasks and leveraging causal inference techniques for evaluating and improving GenAI/LLM models. 
Building on the success of the previous workshop editions at KDD 2023 and....","companies":["Google/DeepMind","Amazon"],"matched_orgs":["Google/DeepMind","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737858","openalex_id":"https://openalex.org/W4412877227","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Bay Area Air Quality Management District","Creative Technologies (United States)","Google (United States)","Pharmacology Research Institute","Seattle University","Snap (United States)","Uber AI (United States)","University of California, Berkeley","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7232178449630737},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6771340370178223},{"id":"https://openalex.org/C158600405","display_name":"Causal inference","score":0.6215115189552307},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.587715744972229},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5837544798851013},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.36709722876548767},{"id":"https://openalex.org/C149782125","display_name":"Econometrics","score":0.16019943356513977},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08415380120277405}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412877051","title":"Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection","url":"https://doi.org/10.1145/3711896.3737239","published":"2025-08-03","authors":["Jun Liu","Chaoyun Zhang","Jiaxu 
Qian","Minghua Ma","Si Qin","C. Bansal","Qingwei Lin","Saravan Rajmohan","Dongmei Zhang"],"abstract":"Time series anomaly detection (TSAD) plays a crucial role in various industrial applications. Traditional deep learning TSAD models require extensive training data and operate as black boxes, lacking interpretability for detected anomalies. To address these challenges, we propose LLMAD, a novel TSAD method that employs Large Language Models (LLMs) to deliver accurate and interpretable TSAD results. LLMAD applies in-context anomaly detection by retrieving both positive and negative similar time series segments, significantly enhancing LLMs' effectiveness. Furthermore, LLMAD employs the Anomaly Detection Chain-of-Thought approach to mimic expert logic for its decision-making process. This further enhances its performance and enables LLMAD to provide explanations for their detections through versatile perspectives.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737239","openalex_id":"https://openalex.org/W4412877051","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research Asia (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.7078931927680969},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6756701469421387},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6750167012214661},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.6009078621864319},{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.5916755199432373},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4380885362625122},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.360927551984787},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3362693190574646}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"openalex:W4412876941","title":"FlowXpert: Expertizing Troubleshooting Workflow Orchestration with Knowledge Base and Multi-Agent Coevolution","url":"https://doi.org/10.1145/3711896.3737221","published":"2025-08-03","authors":["Binpeng Shi","Luo Yu","Jingya Wang","Y. Zhao","Shenglin Zhang","Bowen Hao","Chenyu Zhao","Yongqian Sun","Zhi Zhang","Ronghua Sun","Haihua Li","Wei Song"],"abstract":"Incident management remains a critical yet challenging task for large-scale cloud services. Most cloud service providers abstract troubleshooting into predefined workflows for different incidents, offering step-by-step guidance. However, manually crafting workflows is resource-consuming and knowledge-intensive, hindering large-scale deployment. Most automated techniques for workflow orchestration rely on large language models (LLMs) to handle complex tasks but overlook key aspects of troubleshooting, including complex expertise, domain requirements, and the reliability of AI feedback. These limitations undermine workflow quality. Therefore, we propose FlowXpert, a novel framework for troubleshooting workflow orchestration. Leveraging LLMs, it first builds a knowledge base centered on incident-aware nodes to precisely depict expertise. 
Then, fed into AI feedback and synthetic preference d...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737221","openalex_id":"https://openalex.org/W4412876941","cited_by_count":2,"quality_score":51,"matched_keywords":["preference","agent","multi-agent"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United States)","Nankai University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C147494362","display_name":"Troubleshooting","score":0.9229816198348999},{"id":"https://openalex.org/C199168358","display_name":"Orchestration","score":0.9194246530532837},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.8159626722335815},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6972013711929321},{"id":"https://openalex.org/C33009525","display_name":"Coevolution","score":0.5370842814445496},{"id":"https://openalex.org/C4554734","display_name":"Knowledge base","score":0.5281847715377808},{"id":"https://openalex.org/C42058472","display_name":"Base (topology)","score":0.4367890954017639},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.4286862313747406}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412877063","title":"UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering","url":"https://doi.org/10.1145/3711896.3737385","published":"2025-08-03","authors":["Langming Liu","Shilei Liu","Yujin Yuan","Yizhen Zhang","Bencheng Yan","Zhiyuan Zeng","Zihao Wang","Jiaqi Liu","Di Wang","Wenbo Su","Pengjie Wang","Jian Xu"],"abstract":"Large language models (LLMs) achieve remarkable success in natural language 
processing (NLP). In practical scenarios like recommendations, as users increasingly seek personalized experiences, it becomes crucial to incorporate user interaction history into the context of LLMs to enhance personalization. However, from a practical utility perspective, user interactions' extensive length and noise present challenges when used directly as text prompts. A promising solution is to compress and distill interactions into compact embeddings, serving as soft prompts to assist LLMs in generating personalized responses. Although this approach brings efficiency, a critical concern emerges: Can user embeddings adequately capture valuable information and prompt LLMs? To address this concern, we propose UQABench, a benchmark designed to evaluate the effectiveness of user embeddings in prompting LLMs for....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737385","openalex_id":"https://openalex.org/W4412877063","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","personalized","personalization"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.6525924205780029},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6195724010467529},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.6036489009857178},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4172717332839966},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3638245761394501},{"id":"https://openalex.org/C108827166","display_name":"Internet privacy","score":0.33972322940826416},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.2391391396522522}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877077","title":"Complicated Semantic Alignment for Long-Tail Query Rewriting in Taobao Search Based on Large Language Model","url":"https://doi.org/10.1145/3711896.3737204","published":"2025-08-03","authors":["Yunling Feng","Ling Gui","Yue Jiang","Huang Jian-feng","Dan Ou","Quan Liu","Fuyu Lv","Yajing Xu"],"abstract":"In the realm of e-commerce search, semantic matching has consistently been a core issue, as it directly affects user experience and company revenue. However, users' queries often fail to effectively retrieve relevant products due to discrepancies between the user's expression habits and product names written by merchants. Even existing large language model (LLM) based query rewriting methods can bridge the semantic gap for most queries, they are still ineffective for long-tail queries with complicated semantic. In this paper, we propose Complicated Semantic Alignment Query Rewrite(CSA-QR) framework, which mitigates the semantic differences in long-tail queries with complicated semantics. 
CSA-QR comprises three stages: high-quality supervised fine-tuning (SFT) dataset generation, multi-dimensional alignment dataset generation, and binary feedback Proximal Policy Optimization (PPO) for rei...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737204","openalex_id":"https://openalex.org/W4412877077","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8154244422912598},{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.7171598672866821},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5298200249671936},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5053432583808899},{"id":"https://openalex.org/C192028432","display_name":"Query language","score":0.4116593301296234},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4039309620857239},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3973856270313263}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877037","title":"Applying Large Language Model For Relevance Search In Tencent","url":"https://doi.org/10.1145/3711896.3737193","published":"2025-08-03","authors":["Dezhi Ye","J. 
Liu","Junwei Hu","Jiabin Fan","Bowen Tian","Haijin Liang","Jin Ma"],"abstract":"Relevance plays a crucial role in commercial search engines by identifying documents related to user queries and fulfilling their search needs.Traditional approaches employ encoder-only models like BERT, which process concatenated query-document pairs to predict relevance scores.While autoregressive large language models (LLMs) have revolutionized numerous NLP domains, their direct application to web-scale search systems presents significant challenges.On one hand, the relevance modeling capabilities of LLMs have not been fully explored.On the other, the high computational costs and inference times make deploying LLMs in online search systems, which demand extremely low latency, nearly impossible.In this work, we address these challenges through two key contributions.First, we develop a comprehensive evaluation framework to systematically assess the effectiveness of LLMs in query-documen...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737193","openalex_id":"https://openalex.org/W4412877037","cited_by_count":1,"quality_score":50,"matched_keywords":["language model","efficient","quantization"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.7691240906715393},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7290320992469788},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4569774568080902},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.44691193103790283},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4400065243244171},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.372550904750824},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.10826200246810913},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877246","title":"A Survey on Small Language Models in the Era of Large Language Models: Architecture, Capabilities, and Trustworthiness","url":"https://doi.org/10.1145/3711896.3736563","published":"2025-08-03","authors":["Fali Wang","Minhua Lin","Yao Ma","Hui Liu","Qi He","Xianfeng Tang","Jiliang Tang","Jian Pei","Suhang Wang"],"abstract":"Large language models (LLMs) based on Transformer architecture are powerful but face challenges with deployment, inference latency, and costly fine-tuning. These limitations highlight the emerging potential of small language models (SLMs), which can either replace LLMs through innovative architectures and technologies, or assist them as efficient proxy or reward models. Emerging architectures such as Mamba and xLSTM address the quadratic scaling of inference with window length in Transformers by enabling linear scaling. To maximize SLM performance, test-time compute scaling strategies reduce the performance gap with LLMs by allocating extra compute budget during test time. Beyond standalone usage, SLMs could also assist in LLMs via weak-to-strong learning, proxy tuning, and guarding, fostering secure and efficient LLM deployment. 
Lastly, the trustworthiness of SLMs remains a critical yet...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3736563","openalex_id":"https://openalex.org/W4412877246","cited_by_count":5,"quality_score":50,"matched_keywords":["LLM","efficient"],"author_affiliations":["Amazon (United States)","Duke University","Michigan State University","Pennsylvania State University","Rensselaer Polytechnic Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7578757405281067},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.6839944124221802},{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.668353259563446},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39060884714126587},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3575401306152344},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3357839584350586},{"id":"https://openalex.org/C118524514","display_name":"Computer architecture","score":0.33017730712890625},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3201565146446228}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4412877221","title":"KDD Workshop on Evaluation and Trustworthiness of Agentic and Generative AI","url":"https://doi.org/10.1145/3711896.3737854","published":"2025-08-03","authors":["Yuan Ling","Shujing Dong","Zheng Chen","Yarong Feng","Sadid A. Hasan","George Karypis","Chandan K. 
Reddy"],"abstract":"The rapid deployment of Generative and Agentic AI systems-ranging from large language models to autonomous agents-has created a critical need for rigorous and trustworthy evaluation methodologies. As these models influence real-world decision-making, traditional performance metrics alone fall short in capturing issues of safety, ethical alignment, misinformation, and human-centered usability. This workshop addresses these challenges by fostering interdisciplinary discussions and innovations in evaluation strategies that go beyond conventional benchmarks. Topics include holistic and multi-perspective assessments, scalable evaluation pipelines, reasoning and goal alignment in agentic behavior, misinformation detection, cross-modal generation, and trust calibration. By advancing robust, user-centric, and societally grounded evaluation practices, this workshop contributes to expanding KDD's....","companies":["Microsoft","Amazon"],"matched_orgs":["Microsoft","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737854","openalex_id":"https://openalex.org/W4412877221","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Bellevue Hospital Center","Microsoft (United States)","University of Minnesota","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.8183421492576599},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7213584184646606},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6868522763252258},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.42726486921310425},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4232766628265381},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38735419511795044},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.07381165027618408}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412875535","title":"8th Workshop on Machine Learning in Finance","url":"https://doi.org/10.1145/3711896.3737860","published":"2025-08-03","authors":["Saurabh Nagrecha","Isha Chaturvedi","Senthil Kumar","Nitesh V. Chawla","Mahashweta Das","Daksha Yadav","José A. Rodríguez-Serrano","Eren Kurshan"],"abstract":"The financial industry leverages machine learning in more ways than just finding the right alpha signal. It grapples with supply chains, business processes, marketing, churn, fraud, and money laundering, all while maintaining compliance with the various regulatory frameworks it is beholden to. Due to the sheer volume of wealth being handled by the financial industry and its critical role in everyday life, it has been a lucrative target for a wide spectrum of ever-evolving bad actors. With each successive iteration of this workshop, we have attempted to capture the breadth of these actors - fraudsters, money launderers, market manipulators, and potentially nation-state-level risks. The emerging advances in Generative AI make this a particularly exciting time to host this workshop. 
GenAI offers groundbreaking approaches to handling the various data types prevalent in the financial sector.....","companies":["Google/DeepMind","Amazon"],"matched_orgs":["Google/DeepMind","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737860","openalex_id":"https://openalex.org/W4412875535","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Capital One (United States)","Columbia University","Google (United States)","Universitat Ramon Llull","University of Notre Dame","Visa (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5565069317817688},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3504587411880493},{"id":"https://openalex.org/C110354214","display_name":"Engineering management","score":0.32749900221824646},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.2560282349586487}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412877000","title":"RankExpert: A Mixture of Textual-and-Behavioral Experts for Multi-Objective Learning-to-Rank in Web Search","url":"https://doi.org/10.1145/3711896.3737258","published":"2025-08-03","authors":["Yuchen Li","Hao Zhang","Yongqi Zhang","Hengyi Cai","Mingxin Cai","Shuaiqiang Wang","Haoyi Xiong","Linghe Kong","Dawei Yin","Lei Chen"],"abstract":"As modern learning-to-rank (LTR) systems rely on both textual and behavioral features, it is essential to extend pre-trained language models (PLMs) from text (queries and webpages) understanding to end-to-end ranking score prediction subject to multiple objectives, such as relevance, quality, authority, and recency. 
While textual inputs encompass a broader array of features than mere relevance and behavioral features are frequently skewed by user feedback with position bias, an integrated solution is required to jointly disentangle and fuse these heterogeneous features, ensuring robust and unbiased ranking predictions. In this work, we introduce RankExpert, a unified framework that holistically models heterogeneous ranking signals by integrating PLM-based semantic extraction with behavioral cues. RankExpert employs a lightweight PLM with hierarchical distillation for efficient query-docu...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737258","openalex_id":"https://openalex.org/W4412877000","cited_by_count":3,"quality_score":48,"matched_keywords":["efficient","distillation"],"author_affiliations":["Baidu (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.7358857989311218},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7346967458724976},{"id":"https://openalex.org/C86037889","display_name":"Learning to rank","score":0.7054436802864075},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5712371468544006},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46950531005859375},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.44717997312545776},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3751469850540161},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.32188618183135986}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":3}},{"id":"openalex:W4412876847","title":"LLM4Tag: Automatic Tagging System for Information Retrieval via Large Language Models","url":"https://doi.org/10.1145/3711896.3737242","published":"2025-08-03","authors":["Ruiming Tang","Chenxu Zhu","Bo Chen","Weipeng Zhang","Menghui Zhu","Xinyi Dai","Huifeng Guo"],"abstract":"Tagging systems play an essential role in various information retrieval applications such as search engines and recommender systems. Recently, Large Language Models (LLMs) have been applied in tagging systems due to their extensive world knowledge, semantic understanding, and reasoning capabilities. Despite achieving remarkable performance, existing methods still have limitations, including difficulties in retrieving relevant candidate tags comprehensively, challenges in adapting to emerging domain-specific knowledge, and the lack of reliable tag confidence quantification. To address these three limitations above, we propose an automatic tagging system LLM4Tag. First, a graph-based tag recall module is designed to effectively and comprehensively construct a small-scale highly relevant candidate tag set. 
Subsequently, a knowledge-enhanced tag generation module is employed to generate accu...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737242","openalex_id":"https://openalex.org/W4412876847","cited_by_count":3,"quality_score":48,"matched_keywords":["long-term","retrieval"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8301903605461121},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6005347967147827},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5634176135063171},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49856114387512207},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.42317402362823486}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4412876875","title":"ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models","url":"https://doi.org/10.1145/3711896.3737374","published":"2025-08-03","authors":["Haibin Chen","Kangtao Lv","Chengwei Hu","Yanshi Li","Yujin Yuan","Yancheng He","Xingyao Zhang","Langming Liu","Shilei Liu","Wenbo Su","Bo Zheng"],"abstract":"With the increasing use of Large Language Models (LLMs) in fields such as e-commerce, domain-specific concept evaluation benchmarks are crucial for assessing their domain capabilities. Existing LLMs may generate factually incorrect information within the complex e-commerce applications. Therefore, it is necessary to build an e-commerce concept benchmark. 
Existing benchmarks encounter two primary challenges: (1) handle the heterogeneous and diverse nature of tasks, (2) distinguish between generality and specificity within the e-commerce field. To address these problems, we propose ChineseEcomQA, a scalable question-answering benchmark focused on fundamental e-commerce concepts. ChineseEcomQA is built on three core characteristics: Focus on Fundamental Concept, E-commerce Generality and E-commerce Expertise. Fundamental concepts are designed to be applicable across a diverse array of e-comme...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737374","openalex_id":"https://openalex.org/W4412876875","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8080043196678162},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7986363768577576},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6901642084121704},{"id":"https://openalex.org/C78597825","display_name":"E-commerce","score":0.4906376898288727},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.42588597536087036},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3887327015399933},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34541940689086914},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.28105628490448}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"openalex:W4412877099","title":"AliBoost: Ecological Boosting Framework in Alibaba Platform","url":"https://doi.org/10.1145/3711896.3737188","published":"2025-08-03","authors":["Qijie Shen","Yuanchen Bei","Zihong Huang","Jialin Zhu","Keqin Xu","Boya Du","Jiawei Tang","Yuning Jiang","Feiran Huang","Xiao Huang","Hao Chen"],"abstract":"Maintaining a healthy ecosystem in billion-scale online platforms is challenging, as users naturally gravitate toward popular items, leaving cold and less-explored items behind. This ''rich-get-richer'' phenomenon hinders the growth of potentially valuable cold items and harms the platform's ecosystem. Existing cold-start models primarily focus on improving initial recommendation performance for cold items but fail to address users' natural preference for popular content. In this paper, we introduce AliBoost, Alibaba's ecological boosting framework, designed to complement user-oriented natural recommendations and foster a healthier ecosystem. AliBoost incorporates a tiered boosting structure and boosting principles to ensure high-potential items quickly gain exposure while minimizing disruption to low-potential items. 
To achieve this, we propose the Stacking Fine-Tuning Cold Predictor to...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737188","openalex_id":"https://openalex.org/W4412877099","cited_by_count":2,"quality_score":47,"matched_keywords":["personalized","preference"],"author_affiliations":["Alibaba Group (China)","Hong Kong Polytechnic University","Jinan University","University of Macau","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.8242194652557373},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5371691584587097},{"id":"https://openalex.org/C18903297","display_name":"Ecology","score":0.37924906611442566},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35999584197998047},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.13030153512954712}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412877216","title":"Second Workshop on Generative AI for Recommender Systems and Personalization","url":"https://doi.org/10.1145/3711896.3737856","published":"2025-08-03","authors":["Narges Tabari","Aniket Anand Deshmukh","Wang-Cheng Kang","Julian McAuley","James Caverlee","Neil Shah","George Karypis"],"abstract":"Building personalized recommender systems is a cornerstone of the modern data mining and applied machine learning (ML) community. Modern online platforms have a confluence of data including user-item interaction graphs, user and item-associated semantics (text, visual content, etc.), and metadata. 
Recent advancements in generative models and semantic encoders via large language models (LLMs), visual and audio encoders have significantly impacted research in relevant domains, enabling new directions in knowledge discovery and ability of models to better incorporate semantic context. This workshop bridges the research gap between the use of generative models and recommendation for personalized systems. We will focus on topics spanning the interplay between such models and conventional personalized systems.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737856","openalex_id":"https://openalex.org/W4412877216","cited_by_count":1,"quality_score":46,"matched_keywords":["personalized","personalization"],"author_affiliations":["Google (United States)","Snap (United States)","Texas A&M University","University of California San Diego","University of Minnesota System"],"concepts":[{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.8451958894729614},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8420895934104919},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.735929548740387},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6737367510795593},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4696093499660492},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.38221848011016846},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3591257631778717},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3586280941963196}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877005","title":"KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing","url":"https://doi.org/10.1145/3711896.3737015","published":"2025-08-03","authors":["Rui Li","Quanyu Dai","Zeyu Zhang","Xu Chen","Zhenhua Dong","Ji-Rong Wen"],"abstract":"Recent advances in retrieval-augmented generation (RAG) furnish large language models (LLMs) with iterative retrievals of relevant information to handle complex multi-hop questions. These methods typically alternate between LLM reasoning and retrieval to accumulate external information into the LLM's context. However, the ever-growing context inherently imposes an increasing burden on the LLM to perceive connections among critical information pieces, with futile reasoning steps further exacerbating this overload issue. In this paper, we present KnowTrace, an elegant RAG framework to (1) mitigate the context overload and (2) bootstrap higher-quality multi-step reasoning. Instead of simply piling the retrieved contents, KnowTrace autonomously traces out desired knowledge triplets to organize a specific knowledge graph relevant to the input question. 
Such a structured workflow not only empo...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737015","openalex_id":"https://openalex.org/W4412877005","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C207609745","display_name":"Bootstrapping (finance)","score":0.8689842820167542},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.791558563709259},{"id":"https://openalex.org/C138673069","display_name":"Tracing","score":0.6315814256668091},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4107239842414856},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.10286390781402588},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08873218297958374},{"id":"https://openalex.org/C149782125","display_name":"Econometrics","score":0.08253610134124756}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877100","title":"An Automatic Graph Construction Framework based on Large Language Models for Recommendation","url":"https://doi.org/10.1145/3711896.3737192","published":"2025-08-03","authors":["Rong Shan","Jianghao Lin","Chenxu Zhu","Bo Chen","Menghui Zhu","Kangning Zhang","Jieming Zhu","Ruiming Tang","Yong Yu","Weinan Zhang"],"abstract":"Graph neural networks (GNNs) have emerged as state-of-the-art methods to learn from graph-structured data for recommendation. 
However, most existing GNN-based recommendation methods focus on the optimization of model structures and learning strategies based on pre-defined graphs, neglecting the importance of the graph construction stage. Earlier works for graph construction usually rely on specific rules or crowdsourcing, which are either too simplistic or too labor-intensive. Recent works start to utilize large language models (LLMs) to automate the graph construction, in view of their abundant open-world knowledge and remarkable reasoning capabilities. Nevertheless, they generally suffer from two limitations: (1) invisibility of global view (e.g., overlooking contextual information) and (2) construction inefficiency. To this end, we introduce AutoGraph, an automatic graph construction....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737192","openalex_id":"https://openalex.org/W4412877100","cited_by_count":1,"quality_score":46,"matched_keywords":["preference","quantization"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8078153133392334},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.49742844700813293},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40547841787338257},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3642047643661499},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.33117425441741943},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3203338086605072},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer 
science","score":0.23076042532920837}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877156","title":"Retrieval Augmented Cross-Domain LifeLong Behavior Modeling for Enhancing Click-through Rate Prediction","url":"https://doi.org/10.1145/3711896.3737261","published":"2025-08-03","authors":["Xing Tang","Chaohua Yang","Yuwen Fu","Dongyang Ao","Shiwei Li","Fuyuan Lyu","Dugang Liu","Xiuqiang He"],"abstract":"Lifelong behavior modeling for single-domain has been widely investigated in industry click-through (CTR) prediction. However, some domains do not always have rich historical behaviors in online platforms, so cross-domain lifelong behavior modeling is overlooked. This paper proposes a novel retrieval augmented lifelong cross-domain net (RAL-CDNet) to address the challenges in cross-domain lifelong behavior modeling. There are three components in RAL-CDNet, i.e., cross-domain retrieval unit, cross-domain alignment unit, and cross-net. As the general search unit in the previous study, a cross-domain retrieval unit features a retrieval augmented paradigm that utilizes a pre-trained language model to learn the intrinsic textual information of user behaviors and generates the sequential behaviors from the source domain based on sequential behaviors in the target domain. 
The retrieval augmente...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737261","openalex_id":"https://openalex.org/W4412877156","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Huazhong University of Science and Technology","McGill University","Shenzhen Technology University","Shenzhen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7376888394355774},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.6372054219245911},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.36223238706588745},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.334234356880188},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.05701729655265808},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2505.14620","title":"Enhancing Learned Knowledge in LoRA Adapters Through Efficient Contrastive Decoding on Ascend NPUs","url":"http://arxiv.org/abs/2505.14620","published":"2025-08-03","authors":["Morgan Heisler","Linzi Xing","Ge Shi","Hanieh Sadri","Gursimran Singh","Weiwei Zhang","Tao Ye","Ying Xiong","Yong Zhang","Zhenan Fan"],"abstract":"Huawei Cloud users leverage LoRA (Low-Rank Adaptation) as an efficient and scalable method to fine-tune and customize large language models (LLMs) for application-specific needs. 
However, tasks that require complex reasoning or deep contextual understanding are often hindered by biases or interference from the base model when using typical decoding methods like greedy or beam search. These biases can lead to generic or task-agnostic responses from the base model instead of leveraging the LoRA-specific adaptations. In this paper, we introduce Contrastive LoRA Decoding (CoLD), a novel decoding framework designed to maximize the use of task-specific knowledge in LoRA-adapted models, resulting in better downstream performance. CoLD uses contrastive decoding by scoring candidate tokens based on the divergence between the probability distributions of a LoRA-adapted expert model and the corresp...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3711896.3737215","openalex_id":"https://openalex.org/W4412876994","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (Canada)","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8009063005447388},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.7521240711212158},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.331107497215271},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.09959593415260315}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877186","title":"Boosting E-commerce Content Diversity: A Graph-based RAG Approach with User Reviews","url":"https://doi.org/10.1145/3711896.3736864","published":"2025-08-03","authors":["Jiaxi Yang","Yiling Jia","Carl 
Yang","Yi Liang","Lu Lin"],"abstract":"In e-commerce, product descriptions and other forms of copywriting play a critical role in shaping consumer purchasing decisions. However, manually crafting such content is both time-consuming and costly, particularly given the vast and diverse item catalogs. Recent advances in large language models (LLMs) have transformed automated text generation, offering immense potential to streamline this process. Despite their capabilities, LLMs continue to face obstacles in e-commerce applications, including a lack of diversity and an inability to fully grasp the nuanced details of specific items. To address these limitations, we propose a novel framework that integrates graph-based knowledge into Retrieval-Augmented Generation (RAG) to enhance content generation. Our approach leverages user reviews to construct an item-feature graph, capturing both explicit and implicit connections between items...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3736864","openalex_id":"https://openalex.org/W4412877186","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Emory University","Google (United States)","Pennsylvania State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7185737490653992},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.6882581114768982},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.55727618932724},{"id":"https://openalex.org/C101293273","display_name":"User-generated content","score":0.48049700260162354},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4679628312587738},{"id":"https://openalex.org/C2781316041","display_name":"Diversity 
(politics)","score":0.4501877725124359},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.42210888862609863},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.38667163252830505}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412875467","title":"A Scalable Pretraining Framework for Link Prediction with Efficient Adaptation","url":"https://doi.org/10.1145/3711896.3736822","published":"2025-08-03","authors":["Yu Song","Zhigang Hua","Harry Shomer","Yan Xie","Jingzhe Liu","Bo Long","Hui Liu"],"abstract":"Link Prediction (LP) is a critical task in graph machine learning. While Graph Neural Networks (GNNs) have significantly advanced LP performance recently, existing methods face key challenges including limited supervision from sparse connectivity, sensitivity to initialization, and poor generalization under distribution shifts. We explore pretraining as a solution to address these challenges. Unlike node classification, LP is inherently a pairwise task, which requires the integration of both node- and edge-level information. In this work, we present the first systematic study on the transferability of these distinct modules and propose a late fusion strategy to effectively combine their outputs for improved performance. 
To handle the diversity of pretraining data and avoid negative transfer, we introduce a Mixture-of-Experts (MoE) framework that captures distinct patterns in separate exp...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3736822","openalex_id":"https://openalex.org/W4412875467","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Aqua Metrology Systems (United States)","Meta (United States)","Michigan State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7597032785415649},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.7453458905220032},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6340641379356384},{"id":"https://openalex.org/C2778753846","display_name":"Link (geometry)","score":0.5165020823478699},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.35678955912590027},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3237096965312958},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.2913481593132019},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.1081552505493164}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877016","title":"AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning","url":"https://doi.org/10.1145/3711896.3736849","published":"2025-08-03","authors":["Amy Xin","Jinxin Liu","Zijun Yao","Zhicheng Lee","Shulin Cao","Lei Hou","Juanzi Li"],"abstract":"Despite the outstanding capabilities of large 
language models (LLMs), knowledge-intensive reasoning still remains a challenging task due to LLMs' limitations in compositional reasoning and the hallucination problem. A prevalent solution is to employ chain-of-thought (CoT) with retrieval-augmented generation (RAG), which first formulates a reasoning plan by decomposing complex questions into simpler sub-questions, and then applies iterative RAG at each sub-question. However, prior works exhibit two crucial problems: inadequate reasoning planning and poor incorporation of heterogeneous knowledge. In this paper, we introduce AtomR, a framework for LLMs to conduct accurate heterogeneous knowledge reasoning at the atomic level. Inspired by how knowledge graph query languages model compositional reasoning through combining predefined operations, we propose three atomic knowledge operators, a u...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3736849","openalex_id":"https://openalex.org/W4412877016","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Tsinghua University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6921912431716919},{"id":"https://openalex.org/C17020691","display_name":"Operator (biology)","score":0.5609336495399475},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37560784816741943},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.36463260650634766},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.07397308945655823},{"id":"https://openalex.org/C158448853","display_name":"Repressor","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412877247","title":"SKnow-LLM Workshop: Structured Knowledge for Large Language Models","url":"https://doi.org/10.1145/3711896.3737845","published":"2025-08-03","authors":["Qi Zhu","Xiusi Chen","Yu Zhang","Soji Adeshina","Costas Mavromatis","Zhen Han","Vassilis N. Ioannidis","Leman Akoglu","Danai Koutra","Huzefa Rangwala"],"abstract":"Frontier large language models (LLMs) have demonstrated remarkable performance across various knowledge-intensive enterprise tasks. However, these models are primarily trained on unstructured, general knowledge, which limits their effectiveness in domain-specific applications-particularly when tasks involve structured data sources or sensitive enterprise information. We propose the first Structured Knowledge for Large Language Models Workshop - SKnow-LLM, which aims to bridge this gap by promoting research on innovative methodologies and practical applications in this area. 
Through keynote talks, panel discussions and paper presentations, the workshop will foster in-depth discussions on recent advances, identify existing challenges, and explore promising directions for integrating structured knowledge into LLMs.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737845","openalex_id":"https://openalex.org/W4412877247","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Carnegie Mellon University","Texas A&M University","University of Illinois Urbana-Champaign","University of Michigan"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7434716820716858},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3694048523902893},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3566335141658783}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412877062","title":"ROMA: Recommendation-Oriented Language Model Adaptation Using Multi-Modal Multi-Domain Item Sequences","url":"https://doi.org/10.1145/3711896.3737262","published":"2025-08-03","authors":["Xingyu Lu","Jinpeng Wang","Jieming Zhu","Zhicheng Zhang","D. Zou","Hai-Tao Zheng","Shu‐Tao Xia","Rui Zhang"],"abstract":"Sequential recommendation (SR) aims to capture dynamic user preferences from users' historical behaviors. Recently, benefiting from astonishing understanding ability of pre-trained language models (PLMs), text-enhanced sequential recommender becomes a promising direction, which employs PLMs to extract semantic information for user/item representation. 
Although promising in improving performance and transferability, few existing text-enhanced SR studies have analyzed the differences between PLMs and recommenders, restricting the ability of PLMs for recommendation. In this paper, we make an in-depth comparison and conclude their discrepancies in representation and knowledge level, respectively, caused by different multi-modal content and task-oriented capabilities. Based on this, we propose a Recommendation-Oriented Language Model Adaptation framework (named ROMA) using multi-modal multi-d...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737262","openalex_id":"https://openalex.org/W4412877062","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)","Huazhong University of Science and Technology","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7914861440658569},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7657660245895386},{"id":"https://openalex.org/C2776434776","display_name":"Domain adaptation","score":0.6861135959625244},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.6690359115600586},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5453412532806396},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.46255508065223694},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.35460132360458374},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.06659913063049316}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412876862","title":"Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models","url":"https://doi.org/10.1145/3711896.3737254","published":"2025-08-03","authors":["Siyuan Guo","H. Liu","Xiaolong Chen","Yuming Xie","Liang Zhang","Tao Han","Hechang Chen","Yi Chang","Jun Wang"],"abstract":"In this work, we explore the potential of large language models (LLMs) for generating functional test scripts, which necessitates understanding the dynamically evolving code structure of the target software. To achieve this, we propose a case-based reasoning (CBR) system utilizing a 4R cycle (i.e., retrieve, reuse, revise, and retain), which maintains and leverages a case bank of test intent descriptions and corresponding test scripts to facilitate LLMs for test script generation. To improve user experience further, we introduce Re4, an optimization method for the CBR system, comprising reranking-based retrieval finetuning and reinforced reuse finetuning. Specifically, we first identify positive examples with high semantic and script similarity, providing reliable pseudo-labels for finetuning the retriever model without costly labeling. 
Then, we apply supervised finetuning, followed by a...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737254","openalex_id":"https://openalex.org/W4412876862","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Jilin International Studies University","Jilin University","University College London"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7741003036499023},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5449475646018982},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5287244915962219},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5033342242240906},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4199836254119873},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412876870","title":"MOTTO: A Mixture-of-Experts Framework for Multi-Treatment, Multi-Outcome Treatment Effect Estimation","url":"https://doi.org/10.1145/3711896.3737056","published":"2025-08-03","authors":["Yiling Liu","Wei Shi","Chen Fu","Ziyang Jiang","Zhigang Hua","David Carlson"],"abstract":"Multi-treatment multi-outcome treatment effect estimation plays a vital role in today's industry-level applications. 
For example, in social media ads, practitioners simultaneously deploy multiple interventions to users' experience and track multi-faceted metrics (e.g., ad performance, engagement, churn). However, existing methods for estimating treatment effects struggle to simultaneously address the complex interplays and ensure robust counterfactual balancing across treatment-outcome pairs.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737056","openalex_id":"https://openalex.org/W4412876870","cited_by_count":0,"quality_score":41,"matched_keywords":["media"],"author_affiliations":["Aqua Metrology Systems (United States)","Duke University","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C148220186","display_name":"Outcome (game theory)","score":0.7376353740692139},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.6406200528144836},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5844666361808777},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.380740761756897},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2087244987487793},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1392987072467804},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.0614701509475708},{"id":"https://openalex.org/C144237770","display_name":"Mathematical economics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2506.08074","title":"Hierarchical Lexical Graph for Enhanced Multi-Hop Retrieval","url":"http://arxiv.org/abs/2506.08074","published":"2025-08-03","authors":["Abdellah 
Ghassel","Ian Robinson","Gabriel Tanase","Hal Cooper","Bryan Thompson","Zhen Han","Vassilis N. Ioannidis","Soji Adeshina","Huzefa Rangwala"],"abstract":"Retrieval-Augmented Generation (RAG) grounds large language models in external evidence, yet it still falters when answers must be pieced together across semantically distant documents. We close this gap with the Hierarchical Lexical Graph (HLG), a three-tier index that (i) traces every atomic proposition to its source(ii) clusters propositions into latent topics, and (iii) links entities and relations to expose cross-document paths. On top of HLG we build two complementary, plug-and-play retrievers: StatementGraphRAG, which performs fine-grained entity-aware beam search over propositions for high-precision factoid questions, and TopicGraphRAG, which selects coarse topics before expanding along entity links to supply broad yet relevant context for exploratory queries. Additionally, existing benchmarks lack the complexity required to rigorously evaluate multi-hop summarization systems, of...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3711896.3737233","openalex_id":"https://openalex.org/W4412876920","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United Kingdom)","Amazon (United States)","Queen's University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.780550479888916},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.500114917755127},{"id":"https://openalex.org/C25906391","display_name":"Hop (telecommunications)","score":0.48567092418670654},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4671351909637451},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3888208568096161},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.27109813690185547},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.1600247323513031}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412876976","title":"FoRAGe: High-CTR Food Image Synthesis with Retrieval-Augmented Diffusion Model","url":"https://doi.org/10.1145/3711896.3737223","published":"2025-08-03","authors":["Jiaxu Feng","Xinyu Gao","Muqi Huang","Kangrong Xu","Yun Xiong","Kun Zhou","Chuan Li","Feng Shi"],"abstract":"High Click-Through Rate (CTR) imagery has proven commercial value for food delivery platforms, driving a need for strategies to generate visually compelling images. Our investigations reveal a positive correlation between appropriate food backgrounds and subsequent user engagement. Despite advancements in diffusion models, inpainting new backgrounds does not guarantee high CTR, and fine-tuning diffusion models for this purpose is prohibitively expensive for the fast-paced online food delivery advertising sector. Consequently, there is a lack of cost-effective, transferable generation frameworks tailored to high-CTR food images. In this paper, we propose FoRAGe, a novel high-CTR Food image Retrieval-Augmented Generation pipeline leveraging ControlNet based on Stable Diffusion. 
Specifically, we construct a comprehensive food image database encompassing a diverse range of background environ...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737223","openalex_id":"https://openalex.org/W4412876976","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Fudan University"],"concepts":[{"id":"https://openalex.org/C2779370140","display_name":"Forage","score":0.6396962404251099},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.6146018505096436},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5942983627319336},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5065823793411255},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.46685153245925903},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4279707372188568},{"id":"https://openalex.org/C6557445","display_name":"Agronomy","score":0.20154312252998352},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.13940289616584778}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412875519","title":"Advancing Graph Foundation Models: A Data-Centric Perspective","url":"https://doi.org/10.1145/3711896.3736833","published":"2025-08-03","authors":["Yuhan Li","Y.X. Wang","Jianheng Tang","Heng Chang","Yuxiang Ren","Jia Li"],"abstract":"Recently, Graph Foundation Models (GFMs) have emerged as a significant research topic in graph machine learning. 
Compared with traditional graph neural networks, GFMs demonstrate impressive zero-shot generalization across different domains and tasks through large-scale pre-training on extensive and diverse graph data. Despite the initial success of pre-training, existing GFMs face challenges such as extreme time consumption and the presence of redundancy and noise in pre-training data. To alleviate these issues, we present the first exploration of data-centric GFM, which aims to optimize pre-training data (i.e., a set of subgraphs) to establish a more efficient GFM while maintaining robust performance across various downstream tasks. We propose DCGFM, a plug-and-play approach for Data-Centric GFM that incorporates the idea of data pruning to remove redundant and less informative subgraph...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3736833","openalex_id":"https://openalex.org/W4412875519","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Nanjing University","Soochow University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6731516122817993},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5884968042373657},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.567740261554718},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.46690207719802856},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.45960545539855957},{"id":"https://openalex.org/C88230418","display_name":"Graph theory","score":0.4384550452232361},{"id":"https://openalex.org/C67186912","display_name":"Data 
modeling","score":0.43842005729675293},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3085600733757019}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412877029","title":"Bursting Filter Bubble: Enhancing Serendipity Recommendations with Aligned Large Language Models","url":"https://doi.org/10.1145/3711896.3737199","published":"2025-08-03","authors":["Yunjia Xi","Muyan Weng","Wen Chen","Chao Yi","Dian Chen","Gaoyang Guo","Mao Zhang","Jian Wu","Yuning Jiang","Q. Liu","Yong Yu","Weinan Zhang"],"abstract":"Recommender systems (RSs) often suffer from the feedback loop phenomenon, i.e., RSs are trained on data biased by their recommendations. This leads to the filter bubble effect that reinforces homogeneous content and reduces user satisfaction. To this end, serendipity recommendations, which offer unexpected yet relevant items, are proposed. Recently, large language models (LLMs) have shown potential in serendipity prediction due to their extensive world knowledge and reasoning capabilities. However, they still face challenges in aligning serendipity judgments with human assessments, handling long user behavior sequences, and meeting the latency requirements of industrial RSs. 
To address these issues, we propose SERAL (Serendipity Recommendations with Aligned Large Language Models), a framework comprising three stages: (1) Cognition Profile Generation to compress user behavior into multi-l...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737199","openalex_id":"https://openalex.org/W4412877029","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C2779119418","display_name":"Serendipity","score":0.8857364654541016},{"id":"https://openalex.org/C157915830","display_name":"Bubble","score":0.7533945441246033},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.542717695236206},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5049167275428772},{"id":"https://openalex.org/C195221683","display_name":"Bursting","score":0.45544078946113586},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.1984771490097046},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.12219375371932983},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.1211463212966919}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2506.03699","title":"Scaling Transformers for Discriminative Recommendation via Generative Pretraining","url":"http://arxiv.org/abs/2506.03699","published":"2025-08-03","authors":["C.-L. 
Wang","Bingchao Wu","Zheng Chen","Lei Shen","Bing Wang","Xiaoyi Zeng"],"abstract":"Discriminative recommendation tasks, such as CTR (click-through rate) and CVR (conversion rate) prediction, play critical roles in the ranking stage of large-scale industrial recommender systems. However, training a discriminative model encounters a significant overfitting issue induced by data sparsity. Moreover, this overfitting issue worsens with larger models, causing them to underperform smaller ones. To address the overfitting issue and enhance model scalability, we propose a framework named GPSD (Generative Pretraining for Scalable Discriminative Recommendation), drawing inspiration from generative training, which exhibits no evident signs of overfitting. GPSD leverages the parameters learned from a pretrained generative model to initialize a discriminative model, and subsequently applies a sparse parameter freezing strategy. Extensive experiments conducted on both industrial-scal...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3711896.3737117","openalex_id":"https://openalex.org/W4412877069","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.842300295829773},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6847000122070312},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6002610921859741},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5938786268234253},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.4907735288143158},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.41859549283981323},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.33016806840896606},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1394270658493042}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412876817","title":"Utilizing Strategic Pre-training to Reduce Overfitting: Baguan - A Pre-trained Weather Forecasting Model","url":"https://doi.org/10.1145/3711896.3737178","published":"2025-08-03","authors":["Peisong Niu","Ziqing Ma","Tian Zhou","Weiqi Chen","Lefei Shen","Rong Jin","Liang Sun"],"abstract":"Weather forecasting has long posed a significant challenge for humanity. While recent AI-based models have surpassed traditional numerical weather prediction (NWP) methods in global forecasting tasks, overfitting remains a critical issue due to the limited availability of real-world weather data spanning only a few decades. Unlike fields like computer vision or natural language processing, where data abundance can mitigate overfitting, weather forecasting demands innovative strategies to address this challenge with existing data. In this paper, we explore pre-training methods for weather forecasting, finding that selecting an appropriately challenging pre-training task introduces locality bias, effectively mitigating overfitting and enhancing performance. 
We introduce Baguan, a novel data-driven model for medium-range weather forecasting, built on a Siamese Autoencoder pre-trained in a s...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737178","openalex_id":"https://openalex.org/W4412876817","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (Cayman Islands)","Alibaba Group (China)","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C22019652","display_name":"Overfitting","score":0.9546090364456177},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.8096024990081787},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6575030088424683},{"id":"https://openalex.org/C21001229","display_name":"Weather forecasting","score":0.6563498973846436},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5560243129730225},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4869774281978607},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.21652832627296448},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.16483089327812195}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412875549","title":"The 11th Mining and Learning from Time Series (MILETS): From Classical Methods to LLMs","url":"https://doi.org/10.1145/3711896.3737867","published":"2025-08-03","authors":["Sanjay Purushotham","Dongjin Song","Qingsong Wen","Jun Huan","Yuxuan Liang","Cong Shen","Stefan Zohren","Yuriy Nevmyvaka"],"abstract":"Time series data is now pervasive 
across domains such as healthcare, finance, entertainment, and transportation, driven by advances in sensing technologies that enable continuous data collection. The resulting increase in data volume and complexity poses significant challenges to traditional analysis methods, calling for the development of advanced, interdisciplinary approaches to temporal data mining. This workshop aims to: (1) identify key challenges in learning from time series data, including irregular sampling, spatiotemporal dependencies, and uncertainty quantification; (2) explore recent advances in algorithmic, statistical, theoretical, and systems-based solutions-ranging from classical methods to emerging techniques involving large language models (LLMs); and (3) foster collaboration by highlighting open problems and novel research directions in time series analysis. Bridging th...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737867","openalex_id":"https://openalex.org/W4412875549","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Morgan Stanley (United States)","University of Connecticut","University of Maryland, Baltimore County","University of Oxford","University of Virginia"],"concepts":[{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.7259841561317444},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.580804169178009},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.4648047685623169},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36176806688308716},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.30229896306991577},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.10665112733840942},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.06526732444763184}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412877022","title":"Score-based Generative Modeling for Conditional Independence Testing","url":"https://doi.org/10.1145/3711896.3737118","published":"2025-08-03","authors":["Yixin Ren","Chenghou Jin","Yewei Xia","Li Ke","Longtao Huang","Hui Xue","Hao Zhang","Jihong Guan","Shuigeng Zhou"],"abstract":"Determining conditional independence (CI) relationships between random variables is a fundamental yet challenging task in machine learning and statistics, especially in high-dimensional settings. Existing generative model-based CI testing methods, such as those utilizing generative adversarial networks (GANs), often struggle with undesirable modeling of conditional distributions and training instability, resulting in subpar performance. To address these issues, we propose a novel CI testing method via score-based generative modeling, which achieves precise Type I error control and strong testing power. Concretely, we first employ a sliced conditional score matching scheme to accurately estimate conditional score and use Langevin dynamics conditional sampling to generate null hypothesis samples, ensuring precise Type I error control. 
Then, we incorporate a goodness-of-fit stage into the m...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3737118","openalex_id":"https://openalex.org/W4412877022","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","Fudan University","Shenzhen Institutes of Advanced Technology","Tongji University"],"concepts":[{"id":"https://openalex.org/C79772020","display_name":"Conditional independence","score":0.6733943223953247},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6727529764175415},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.613624632358551},{"id":"https://openalex.org/C35651441","display_name":"Independence (probability theory)","score":0.5967425107955933},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4843674600124359},{"id":"https://openalex.org/C149782125","display_name":"Econometrics","score":0.4398568570613861},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4371311664581299},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.2683618664741516}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412876915","title":"SPARTA: An Optimization Framework for Differentially Private Sparse Fine-Tuning","url":"https://doi.org/10.1145/3711896.3736842","published":"2025-08-03","authors":["Mehdi Makni","Kayhan Behdin","Gabriel Afriat","Zheng Xu","Sergei Vassilvitskii","Natalia Ponomareva","Rahul Mazumder","Hussein Hazimeh"],"abstract":"KDD ’25, Toronto, ON, 
Canada","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3736842","openalex_id":"https://openalex.org/W4412876915","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","LinkedIn (United States)","Massachusetts Institute of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6306928992271423},{"id":"https://openalex.org/C56372850","display_name":"Sparse matrix","score":0.4715825617313385},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.344668984413147},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.09502527117729187},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/taco-rl-task-aware-prompt-compression-optimization-with-reinforcement-learning","title":"TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/taco-rl-task-aware-prompt-compression-optimization-with-reinforcement-learning/","published":"2025-08-01","authors":["Shivam Shandilya","Menglin Xia","Supriyo GHOSH","Huiqiang Jiang","Jue Zhang","Qianhui Wu","Victor Ruehle","Saravan Rajmohan"],"abstract":"The increasing prevalence of large language models (LLMs) such as GPT-4 in various applications has led to a surge in the size of prompts required for optimal performance, leading to challenges in computational efficiency. 
Prompt compression aims to reduce the inference cost by minimizing input tokens without compromising on the task performance. However, existing prompt compression techniques either rely on sub-optimal metrics such as information entropy or model it as a task-agnostic token classification problem that fails to capture task-specific information. To address these issues, we propose a novel and efficient reinforcement learning (RL) based task-aware prompt compression method. To ensure low latency requirements, we leverage existing Transformer encoder-based token classification model while guiding the learning process with task-specific reward signals using lightweight REIN...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Natural language processing","1970-01-01","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/renderformer-transformer-based-neural-rendering-of-triangle-meshes-with-global-illumination","title":"RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination","url":"https://www.microsoft.com/en-us/research/publication/renderformer-transformer-based-neural-rendering-of-triangle-meshes-with-global-illumination/","published":"2025-08-01","authors":["Chong Zeng","Yue Dong","Pieter Peers","Hongzhi Wu","Xin Tong"],"abstract":"We present RenderFormer, a neural rendering pipeline that directly renders an image from a triangle-based 
representation of a scene with full global illumination effects and that does not require per-scene training or fine-tuning. Instead of taking a physics-centric approach to rendering, we formulate rendering as a sequence-to-sequence transformation where a sequence of tokens representing triangles with reflectance properties is converted to a sequence of output tokens representing small patches of pixels. RenderFormer follows a two-stage pipeline: a view-independent stage that models triangle-to-triangle light transport, and a view-dependent stage that transforms a token representing a bundle of rays to the corresponding pixel values guided by the triangle-sequence from the the view-independent stage. Both stages are based on the transformer architecture and are learned with minimal p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Graphics and multimedia","Computer graphics","Transformer (machine learning model)","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-protecting-and-augmenting-human-cognition-with-generative-ai-a-synthesis-of-the-chi-2025-tools-for-thought-workshop","title":"Understanding, Protecting, and Augmenting Human Cognition with Generative AI: A Synthesis of the CHI 2025 Tools for Thought 
Workshop","url":"https://www.microsoft.com/en-us/research/publication/understanding-protecting-and-augmenting-human-cognition-with-generative-ai-a-synthesis-of-the-chi-2025-tools-for-thought-workshop/","published":"2025-08-01","authors":["Lev Tankelevitch","Elena L. Glassman","Jessica He","Aniket Kittur","Mina Lee","Srishti Palani","Advait Sarkar","Gonzalo Ramos","Yvonne Rogers","Hari Subramonyam"],"abstract":"Generative AI (GenAI) radically expands the scope and capability of automation for work, education, and everyday tasks, a transformation posing both risks and opportunities for human cognition. How will human cognition change, and what opportunities are there for GenAI to augment it? Which theories, metrics, and other tools are needed to address these questions? The CHI 2025 workshop on Tools for Thought aimed to bridge an emerging science of how the use of GenAI affects human thought, from metacognition to critical thinking, memory, and creativity, with an emerging design practice for building GenAI tools that both protect and augment human thought. Fifty-six researchers, designers, and thinkers from across disciplines as well as industry and academia, along with 34 papers and portfolios, seeded a day of discussion, ideation, and community-building. 
We synthesize this material here to b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Human Computer Interaction","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/new-frontiers-in-ai-for-biodiversity-research-and-conservation-with-multimodal-language-models","title":"New frontiers in AI for biodiversity research and conservation with multimodal language models","url":"https://www.microsoft.com/en-us/research/publication/new-frontiers-in-ai-for-biodiversity-research-and-conservation-with-multimodal-language-models/","published":"2025-08-01","authors":["Zhongqi Miao","Yuanhan Zhang","Zalan Fabian","Andres Hernandez Celis","Sara Beery","Chunyuan Li","Ziwei Liu","Amrita Gupta","Md Nasir","Wanhua Li","Jason Holmberg","Meredith Palmer"],"abstract":"The integration of artificial intelligence (AI) into biodiversity research and conservation is growing rapidly, demonstrating great potential in reducing the intensive human labour required for data preprocessing, thereby, facilitating larger data collections that offer ecological insights at unprecedented scales. However, most of these AI applications for biodiversity are still in the early stages of development, hindered by challenges inherent in real-world datasets and the limited accessibility of these technologies to practitioners without extensive programming knowledge. 
The recent advent of multimodal language models, which can process and generate multiple data modalities, has significantly expanded the realm of possible AI applications in biodiversity research. These models have demonstrated the ability to classify species and recognize more complex concepts, such as animal postu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.32942/x22s6f","openalex_id":"https://openalex.org/W4401226787","cited_by_count":4,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Ecology and environment","1970-01-01"],"author_affiliations":["Microsoft","Harvard University Press","Massachusetts Institute of Technology","Microsoft (United States)","Microsoft Research (United Kingdom)","Nanyang Technological University","Universidad de Los Andes","University of British Columbia","University of Southern California","Yale University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/collective-agency-in-art-making-towards-community-centric-design-of-text-to-image-t2i-ai-tools","title":"Collective Agency in Art-making: Towards Community-centric Design of Text-to-Image (T2I) AI Tools","url":"https://www.microsoft.com/en-us/research/publication/collective-agency-in-art-making-towards-community-centric-design-of-text-to-image-t2i-ai-tools/","published":"2025-08-01","authors":["Abdullah Hasan Safir","Noshin Tahsin","Pratyasha Saha","Dipannita Nandi","Zulkarin Jahangir","Cecily Morrison","Syed Ishtiaque Ahmed","Nusrat Jahan Mim"],"abstract":"Text-to-image (T2I) AI tools are trained on vast datasets 
of existing images and artworks. We identify that existing ethical standards and regulatory safeguards for these tools largely lie within the Western neoliberal realm. They assume that artistic creativity originates from individuals rather than in collectives or social environments, ownership is an individual concern rather than shaped by communities and shared cultural traditions, and compensation should be based on individual claims rather than acknowledging collective contributions to artistic knowledge. In this paper, we counter these assumptions by theorizing ‘collective agency’ as a critical conceptual lens to rethink artists’ community-centric roles in relation to these tools. Drawing from our nine-month-long qualitative interventions with diverse Bangladeshi artist groups, we find that these artists manifest cultural reson...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-economy-institute-aiei-program","title":"AI Economy Institute (AIEI) Program","url":"https://www.microsoft.com/en-us/research/publication/ai-economy-institute-aiei-program/","published":"2025-08-01","authors":["Microsoft"],"abstract":"\"Education in the AI Economy: Diffusion, disruption, and design for purposeful transformation\" The Microsoft AI Economy Institute (AIEI) invites proposals for its second global research call, 
inviting submissions that will help shape the future of work, learning, and opportunity in the age of generative AI. As GenAI is demonstrating the potential to become the most widely adopted technology in history, we face a defining moment: will its benefits be broadly shared, or will barriers—technical, institutional, regulatory, economic, educational, and cultural—limit its promise for people and communities around the world?Microsoft established AIEI to support the diffusion of AI so that it is intentional, inclusive, and informed by rigorous research and scholarly discourse so that societies everywhere may adapt to the economic and social changes AI brings. We seek to accelerate understanding of...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:a43f705c481ddc09","title":"Gemini 2.5 Deep Think Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Deep-Think-Model-Card.pdf","published":"2025-08-01","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 2.5 Deep 
Think"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"openalex:W4412837243","title":"Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning","url":"https://doi.org/10.1109/tpami.2025.3594749","published":"2025-08-01","authors":["Dingkang Liang","Tianrui Feng","Xin Zhou","Yumeng Zhang","Zhikang Zou","Xiang Bai"],"abstract":"Recently, leveraging pre-training techniques to enhance point cloud models has become a prominent research topic. However, existing approaches typically require full fine-tuning of pre-trained models to achieve satisfactory performance on downstream tasks, which is storage-intensive and computationally demanding. To address this issue, we propose a novel Parameter-Efficient Fine-Tuning (PEFT) method for point cloud, called PointGST (Point cloud Graph Spectral Tuning). PointGST freezes the pre-trained model and introduces a lightweight, trainable Point Cloud Spectral Adapter (PCSA) for fine-tuning parameters in the spectral domain. The core idea is built on two observations: 1) The inner tokens from frozen models might present confusion in the spatial domain; 2) Task-specific intrinsic information is important for transferring the general knowledge to the downstream task. 
Specifically, Po...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3594749","openalex_id":"https://openalex.org/W4412837243","cited_by_count":16,"quality_score":57,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Huazhong University of Science and Technology","Huazhong University of Science and Technology Hospital"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6596072912216187},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.6061567664146423},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5497423410415649},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5490826964378357},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.46972739696502686},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4636209309101105},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.34763896465301514},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.342599481344223}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"apple:fjvz5ximyhclaiv75v3nue03","title":"STIV: Scalable Text and Image Conditioned Video Generation","url":"https://machinelearning.apple.com/research/conditioned-video-generation","published":"2025-08-01","authors":["Zongyu Lin","Wei Liu","Chen Chen","Jiasen Lu","Wenze Hu","Tsu-Jui Fu","Jesse Allardice","Zhengfeng Lai","Liangchen Song","Bowen Zhang","Cha Chen","Yiran Fei"],"abstract":"The field of video generation has made remarkable advancements, yet there 
remains a pressing need for a clear, systematic recipe that can guide the development of robust and scalable models. In this work, we present a comprehensive study that systematically explores the interplay of model architectures, training recipes, and data curation strategies, culminating in a simple and scalable text-image-conditioned video generation method, named STIV....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4412825899","title":"Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation","url":"https://doi.org/10.1145/3711896.3736828","published":"2025-08-01","authors":["Hyunsik Jeon","Satoshi Koide","Yu Wang","Zhankui He","Julian McAuley"],"abstract":"Conversational recommender systems engage users in dialogues to refine their needs and provide more personalized suggestions. Although textual information suffices for many domains, visually driven categories such as fashion or home decor potentially require detailed visual information related to color, style, or design. To address this challenge, we propose LaViC (Large Vision-Language Conversational Recommendation Framework), a novel approach that integrates compact image representations into dialogue-based recommendation systems. 
LaViC leverages a large vision-language model in a two-stage process: (1) visual knowledge self-distillation, which condenses product images from thousands of tokens into a small set of visual tokens in a self-distillation manner, significantly reducing computational overhead, and (2) recommendation fine-tuning, which enables the model to incorporate both dia...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3736828","openalex_id":"https://openalex.org/W4412825899","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","personalized","distillation"],"author_affiliations":["Google (United States)","Toyota Motor Corporation (Japan)","University of California San Diego"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8285750150680542},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5245932936668396},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44618692994117737},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40557265281677246}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413110710","title":"Beyond the leaderboard: leveraging predictive modeling for protein–ligand insights and discovery","url":"https://doi.org/10.1093/bioinformatics/btaf425","published":"2025-08-01","authors":["Dan Kalifa","Kira Radinsky","Eric Horvitz"],"abstract":"MOTIVATION: Ligands are biomolecules that bind to specific sites on target proteins, often inducing conformational changes important in the protein's function. 
Knowledge about ligand interactions with proteins are fundamental to understanding biological mechanisms and advancing drug discovery. Traditional protein language models focus on amino acid sequences and 3D structures, overlooking the structural and functional changes induced by protein-ligand interactions. We investigate the value of integrating ligand-protein binding data in several predictive challenges and leverage findings to frame research directions and questions. RESULTS: We show how the integration of protein-ligand interaction data in protein representation learning can increase predictive power. We evaluate the methodology across diverse biological tasks, demonstrating consistent improvements over state-of-the-art mode...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/bioinformatics/btaf425","openalex_id":"https://openalex.org/W4413110710","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Technion – Israel Institute of Technology","University of Washington","University of Washington Medical Center"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6775123476982117},{"id":"https://openalex.org/C74187038","display_name":"Drug discovery","score":0.5489622950553894},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.4480418264865875},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.42058077454566956},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3511713743209839},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.32001763582229614},{"id":"https://openalex.org/C60644358","display_name":"Bioinformatics","score":0.23548674583435059},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.190841943025589}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412825822","title":"FlanS: A Foundation Model for Free-Form Language-based Segmentation in Medical Images","url":"https://doi.org/10.1145/3711896.3736963","published":"2025-08-01","authors":["Longchao Da","Rui Wang","Xiaojian Xu","Parminder Bhatia","Taha Kass‐Hout","Hua Wei","Cao Xiao"],"abstract":"KDD ’25, August 3–7, 2025, Toronto, ON, Canada","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711896.3736963","openalex_id":"https://openalex.org/W4412825822","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Arizona State University","General Electric (Spain)"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7725655436515808},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7058709859848022},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6121960282325745},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.581464409828186},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5418819189071655},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.45872825384140015},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.40927383303642273},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.0610431432723999}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2506.10778","title":"SlotPi: Physics-informed Object-centric Reasoning Models","url":"http://arxiv.org/abs/2506.10778","published":"2025-08-01","authors":["Jian Li","Han Wan","Ning Lin","Yuliang Zhan","Ruizhi Chengze","H. Wang","Yi Zhang","Hongsheng Liu","Zidong Wang","Yu Fan","Hao Sun"],"abstract":"Understanding and reasoning about dynamics governed by physical laws through visual observation, akin to human capabilities in the real world, poses significant challenges. Currently, object-centric dynamic simulation methods, which emulate human behavior, have achieved notable progress but overlook two critical aspects: 1) the integration of physical knowledge into models. Humans gain physical insights by observing the world and apply this knowledge to accurately reason about various dynamic scenarios; 2) the validation of model adaptability across diverse scenarios. Real-world dynamics, especially those involving fluids and objects, demand models that not only capture object interactions but also simulate fluid flow characteristics. To address these gaps, we introduce SlotPi, a slot-based physics-informed object-centric reasoning model. 
SlotPi integrates a physical module based on Hami...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3711896.3737131","openalex_id":"https://openalex.org/W4412825809","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5492904782295227},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4869316816329956},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.25273096561431885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413980085","title":"Adaptive Ensemble Learning for Predictive Maintenance: Neural-Gated Mixture of Experts Architecture","url":"https://doi.org/10.1109/ibdap65587.2025.11145853","published":"2025-08-01","authors":["Vipin Kataria","Vinodkumar Reddy Surasani","Sumeet Jeswani"],"abstract":"Predictive maintenance has emerged as a critical technology in Industry 4.0, yet existing solutions often rely on single models that may not capture the full complexity of industrial failure patterns across different operational conditions. This paper presents a Mixture of Experts (MoE) architecture that leverages neural network-based dynamic gating to intelligently combine predictions from diverse machine learning models. Our approach employs four expert models (Random Forest, K-Nearest Neighbors, Gradient Boosting and XGBoost), coordinated by a neural gating network that learns to assign weights based on operational conditions. 
The gating network consists of a four-layer architecture with dropout regularization, enabling adaptive expert selection for different failure patterns. Evaluated on a synthetic industrial dataset with 10,000 instances, our MoE system achieves 99.34% accuracy and....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ibdap65587.2025.11145853","openalex_id":"https://openalex.org/W4413980085","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Picarro (United States)","World Wide Web Consortium"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7420827150344849},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5754961967468262},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.5314855575561523},{"id":"https://openalex.org/C45942800","display_name":"Ensemble learning","score":0.5192072987556458},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5092038512229919},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.4695110023021698},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.0},{"id":"https://openalex.org/C153349607","display_name":"Visual arts","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:062933369db4e545","title":"Claude Opus 4.1 System Card","url":"https://www-cdn.anthropic.com/9fa30625273bafdf5af82c93719d7ca606485a16.pdf","published":"2025-08","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Opus 
4.1.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Opus 4.1"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"official:c0c5a09937cf2300","title":"GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control","url":"https://research.nvidia.com/publication/2025-08_gen3c-3d-informed-world-consistent-video-generation-precise-camera-control","published":"2025-08","authors":["Xuanchi Ren","Tianchang Shen","Jiahui Huang","Huan Ling","Yifan Lu","Merlin Nimier-David","Thomas Müller","Alex Keller","Sanja Fidler","Jun Gao"],"abstract":"Official NVIDIA Research publication. 
CVPR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["CVPR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=1"}},{"id":"official:ba6d45cc8286f0ab","title":"Fly, Fail, Fix: Iterative Game Repair with Reinforcement Learning and Large Multimodal Models","url":"https://research.nvidia.com/publication/2025-08_fly-fail-fix-iterative-game-repair-reinforcement-learning-and-large-multimodal","published":"2025-08","authors":["Alex Zook","Josef Spjut","Jonathan Tremblay"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=1"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/phi-ground-tech-report-advancing-perception-in-gui-grounding","title":"Phi-Ground Tech Report: Advancing Perception in GUI Grounding","url":"https://www.microsoft.com/en-us/research/publication/phi-ground-tech-report-advancing-perception-in-gui-grounding/","published":"2025-07-31","authors":["Miaosen Zhang","Ziqiang Xu","Jialiang Zhu","Qi Dai","Kai Qiu","Yifan Yang","Chong Luo","Tianyi Chen","Justin Wagle","Tim Franklin","Baining 
Guo"],"abstract":"With the development of multimodal reasoning models, Computer Use Agents (CUAs), akin to Jarvis from \"Iron Man\", are becoming a reality. GUI grounding is a core component for CUAs to execute actual actions, similar to mechanical control in robotics, and it directly leads to the success or failure of the system. It determines actions such as clicking and typing, as well as related parameters like the coordinates for clicks. Current end-to-end grounding models still achieve less than 65% accuracy on challenging benchmarks like ScreenSpot-pro and UI-Vision, indicating they are far from being ready for deployment. In this work, we conduct an empirical study on the training of grounding models, examining details from data collection to model training. Ultimately, we developed the Phi-Ground model family, which achieves state-of-the-art performance acros...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Tech Report","Artificial intelligence","Computer vision","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:881","title":"Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving","url":"https://seed.bytedance.com/en/research/seed-prover-deep-and-broad-reasoning-for-automated-theorem-proving","published":"2025-07-31","authors":["Luoxin Chen","Jinming Gu","Liankai Huang","Wenhao Huang","Zhicheng Jiang","Allan Jie","Xiaoran Jin","Xing Jin","Chenggang Li","Kaijing Ma","Cheng Ren","Jiawei Shen"],"abstract":"LLMs 
have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose \\textbf{Seed-Prover}, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50\\% on PutnamBench, outperforming the previous state...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4412795922","title":"Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments","url":"https://doi.org/10.1109/tpami.2025.3594204","published":"2025-07-31","authors":["Kehan Chen","Dong An","Yan Huang","Rongtao Xu","Yifei Su","Yonggen Ling","Ian Reid","Liang Wang"],"abstract":"We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. 
Zero-shot VLN-CE is particularly challenging due to the absence of expert demonstrations for training and minimal environment structural prior to guide navigation. To confront these challenges, we propose a Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria for decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and refi...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3594204","openalex_id":"https://openalex.org/W4412795922","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Institute of Automation","Mohamed bin Zayed University of Artificial Intelligence","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6849270462989807},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.662065863609314},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6333637237548828},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.5789347887039185},{"id":"https://openalex.org/C2776036281","display_name":"Constraint (computer-aided design)","score":0.5601311922073364},{"id":"https://openalex.org/C2780813799","display_name":"Zero 
(linguistics)","score":0.5031141638755798},{"id":"https://openalex.org/C5339829","display_name":"Machine vision","score":0.41917097568511963},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.18589147925376892}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4412809105","title":"Segment Anything in Context with Vision Foundation Models","url":"https://doi.org/10.1007/s11263-025-02517-0","published":"2025-07-31","authors":["Yang Liu","Muzhi Zhu","Hao Chen","Xinlong Wang","Bo Feng","Hao Wang","Shiyu Li","Raviteja Vemulapalli","Chunhua Shen"],"abstract":"","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-025-02517-0","openalex_id":"https://openalex.org/W4412809105","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Apple (United States)","Beijing Academy of Artificial Intelligence","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6766918897628784},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6228142380714417},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5971420407295227},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5222542881965637},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4252660572528839},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.39997047185897827},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.19166061282157898},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412792087","title":"Corrigendum to “FaceChain-MMID: Generating highly identity-consistent realistic portraits via dividing & merging multi-modal representations” [Pattern Recognition 168 (2025) 111858]","url":"https://doi.org/10.1016/j.patcog.2025.112161","published":"2025-07-31","authors":["Chao Xu","Wang Fei","Cheng Yu","Baigui Sun","Jian Zhao"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.112161","openalex_id":"https://openalex.org/W4412792087","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","China Telecom (China)","Northwestern Polytechnical University","Walsh University","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7847768664360046},{"id":"https://openalex.org/C2778355321","display_name":"Identity (music)","score":0.6452845335006714},{"id":"https://openalex.org/C162462552","display_name":"Portrait","score":0.6406468152999878},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5164539813995361},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4743543863296509},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4380509853363037},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3434329032897949},{"id":"https://openalex.org/C46312422","display_name":"Communication","score":0.33858633041381836}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/villa-x-enhancing-latent-action-modeling-in-vision-language-action-models","title":"villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models","url":"https://www.microsoft.com/en-us/research/publication/villa-x-enhancing-latent-action-modeling-in-vision-language-action-models/","published":"2025-07-30","authors":["Xiaoyu Chen","Hangxing Wei","Pushi Zhang","Chuheng Zhang","Kaixin Wang","Yanjiang Guo","Rushuai Yang","Yucen Wang","Xinquan Xiao","Li Zhao","Jianyu Chen","Jiang Bian"],"abstract":"Vision-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent works have begun to explore the incorporation of latent actions, abstract representations of motion between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Vision-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. We demonstrate that villa-X can generate latent action plans in a zero-shot fashion, even for unseen embodiments and open-vocabulary symbolic understanding. 
This capability enables villa-X to achieve superior performance across diverse simulation tasks in SIMPLER and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4412700808","title":"Visual delta generation with large multi-modal models enhances composed image retrieval using unlabeled data","url":"https://doi.org/10.1038/s41598-025-07798-6","published":"2025-07-28","authors":["Young Kyun Jang","Donghyun Kim"],"abstract":"Composed Image Retrieval (CIR) retrieves a target image similar to a reference image, guided by a provided textual modification (i.e., a triplet with<reference image, text, target image>). Previous works on CIR can largely be developed into two categories: supervised learning approaches and weakly supervised (i.e., zero-shot) learning approaches. Supervised learning CIR models require labeled triplets which may not be easily obtained and limit the widespread use of CIR and its scalability. On the other hand, a weakly supervised learning approach (also called zero-shot CIR), can be relatively easily trained with image-caption pairs without considering the image-to-image relation (i.e., no supervised triplet required), but this approach tends to yield lower accuracy. 
In this paper, we extend the application of existing Composed Image Retrieval (CIR) into semi-supervised learning, domain ad...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-025-07798-6","openalex_id":"https://openalex.org/W4412700808","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Korea University","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6543067097663879},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5385924577713013},{"id":"https://openalex.org/C5072461","display_name":"Delta","score":0.48257318139076233},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.45956116914749146},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4481552541255951},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4289679527282715},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.41897016763687134},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.1571427881717682}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2507.20534","title":"Kimi K2: Open Agentic Intelligence","url":"https://huggingface.co/papers/2507.20534","published":"2025-07-28","authors":["Kimi Team","Yifan Bai","Yiping Bao","Guanduo Chen","Jiahao Chen","Ningxin Chen","Ruijue Chen","Yanru Chen","Yuankun Chen","Yutian Chen","Zhuofu Chen","Jialei Cui"],"abstract":"We introduce Kimi K2, a Mixture-of-Experts (MoE) large language model with 32 billion activated parameters and 1 
trillion total parameters. We propose the MuonClip optimizer, which improves upon Muon with a novel QK-clip technique to address training instability while enjoying the advanced token efficiency of Muon. Based on MuonClip, K2 was pre-trained on 15.5 trillion tokens with zero loss spike. During post-training, K2 undergoes a multi-stage post-training process, highlighted by a large-scale agentic data synthesis pipeline and a joint reinforcement learning (RL) stage, where the model improves its capabilities through interactions with real and synthetic environments. Kimi K2 achieves state-of-the-art performance among open-source non-thinking models, with strengths in agentic capabilities. Notably, K2 obtains 66.1 on Tau2-Bench, 76.5 on ACEBench (En), 65.8 on SWE-Bench Verified, an...","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["language model"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tablelora-low-rank-adaptation-on-table-structure-understanding-for-large-language-models","title":"TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/tablelora-low-rank-adaptation-on-table-structure-understanding-for-large-language-models/","published":"2025-07-27","authors":["Xinyi He","Yihao Liu","Mengyu Zhou","Yeye He","Haoyu Dong","Shi Han","Dongmei Zhang"],"abstract":"Tabular data are crucial in many fields and their understanding by large language models (LLMs) under high parameter efficiency paradigm is important.However, directly applying 
parameter-efficient fine-tuning (PEFT) techniques to tabular tasks presents significant challenges, particularly in terms of better table serialization and the representation of two-dimensional structured information within a one-dimensional sequence.To address this, we propose TableLoRA, a module designed to improve LLMs' understanding of table structure during PEFT. It incorporates special tokens for serializing tables with special token encoder and uses 2D LoRA to encode low-rank information on cell positions. Experiments on four tabular-related datasets demonstrate that TableLoRA consistently outperforms vanilla LoRA and surpasses various table encoding methods tested in control experiments. These findings rev...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Data platforms and analytics","Human language technologies","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tablepilot-recommending-human-preferred-tabular-data-analysis-with-large-language-models","title":"TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/tablepilot-recommending-human-preferred-tabular-data-analysis-with-large-language-models/","published":"2025-07-27","authors":["Deyin Yi","Yihao Liu","Lang Cao","Mengyu Zhou","Haoyu Dong","Shi Han","Dongmei Zhang"],"abstract":"Tabular data 
analysis is crucial in many scenarios, yet efficiently identifying relevant queries and results for new tables remains challenging due to data complexity, diverse analytical operations, and high-quality analysis requirements. To address these challenges, we aim to recommend query–code–result triplets tailored for new tables in tabular data analysis workflows. In this paper, we present TablePilot, a pioneering tabular data analysis framework leveraging large language models to autonomously generate comprehensive and superior analytical results without relying on user profiles or prior interactions. Additionally, we propose Rec-Align, a novel method to further improve recommendation quality and better align with human preferences. Experiments on DART, a dataset specifically designed for comprehensive tabular data analysis recommendation, demonstrate the effectiveness of our fr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Human-computer interaction","Programming languages and software engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adaptagent-adapting-multimodal-web-agents-with-few-shot-learning-from-human-demonstrations","title":"AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human 
Demonstrations","url":"https://www.microsoft.com/en-us/research/publication/adaptagent-adapting-multimodal-web-agents-with-few-shot-learning-from-human-demonstrations/","published":"2025-07-27","authors":["Gaurav Verma","Rachneet Kaur","Nishan Srishankar","Zhen Zeng","Tucker Balch","Manuela Veloso"],"abstract":"State-of-the-art multimodal web agents, powered by Multimodal Large Language Models (MLLMs), can autonomously execute many web tasks by processing user instructions and interacting with graphical user interfaces (GUIs). Current strategies for building web agents rely on (i) the generalizability of underlying MLLMs and their steerability via prompting, and (ii) large-scale fine-tuning of MLLMs on web-related tasks. However, web agents still struggle to automate tasks on unseen websites and domains, limiting their applicability to enterprise-specific and proprietary platforms. Beyond generalization from large-scale pre-training and fine-tuning, we propose building agents for few-shot adaptability using human demonstrations. 
We introduce the AdaptAgent framework that enables both proprietary and open-weights multimodal web agents to adapt to new websites and domains using few human demonstr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","large language models","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/geometric-mean-policy-optimization","title":"Geometric-Mean Policy Optimization","url":"https://www.microsoft.com/en-us/research/publication/geometric-mean-policy-optimization/","published":"2025-07-27","authors":["Yuzhong Zhao","Yue Liu","Junpeng Liu","Jingye Chen","Xun Wu","Yaru Hao","Tengchao Lv","Shaohan Huang","Lei Cui","Qixiang Ye","Fang Wan","Furu Wei"],"abstract":"Group Relative Policy Optimization (GRPO) has significantly enhanced the reasoning capability of large language models by optimizing the arithmetic mean of token-level rewards. Unfortunately, GRPO is observed to suffer from unstable policy updates when facing tokens with outlier importance-weighted rewards, which manifest as extreme importance sampling ratios during training. In this study, we propose Geometric-Mean Policy Optimization (GMPO), with the aim to improve the stability of GRPO through suppressing token reward outliers. 
Instead of optimizing the arithmetic mean, GMPO maximizes the geometric mean of token-level rewards, which is inherently less sensitive to outliers and maintains a more stable range of importance sampling ratio. GMPO is plug-and-play-simply replacing GRPO's arithmetic mean with the geometric mean of token-level rewards, as the latter is inherently less sensitiv...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:c4bf83eb77f25ea5","title":"GSPO: Towards Scalable Reinforcement Learning for Language Models","url":"https://qwenlm.github.io/blog/gspo/","published":"2025-07-27","authors":["Alibaba/Qwen"],"abstract":"Introduction Reinforcement Learning (RL) has emerged as a pivotal paradigm for scaling language models and enhancing their deep reasoning and problem-solving capabilities. To scale RL, the foremost prerequisite is maintaining stable and robust training dynamics. 
However, we observe that existing RL algorithms (such as GRPO) exhibit severe instability issues during long training and lead to irreversible model collapse, hindering further performance improvements with increased compute.To enable successful RL scaling, we propose the Group Sequence Policy Optimization (GSPO) algorithm.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4412681917","title":"Multimodal spatial language maps for robot navigation and manipulation","url":"https://doi.org/10.1177/02783649251351658","published":"2025-07-27","authors":["Chenguang Huang","Oier Mees","Andy Zeng","Wolfram Burgard"],"abstract":"Grounding language to a navigating agent’s observations can leverage pretrained multimodal foundation models to match perceptions to object or event descriptions. However, previous approaches remain disconnected from environment mapping, lack the spatial precision of geometric maps, or neglect additional modality information beyond vision. To address this, we propose multimodal spatial language maps as a spatial map representation that fuses pretrained multimodal features with a 3D reconstruction of the environment. We build these maps autonomously using standard exploration. We present two instances of our maps, which are visual-language maps (VLMaps) and their extension to audio-visual-language maps (AVLMaps) obtained by adding audio information. 
When combined with large language models (LLMs), VLMaps can translate natural language commands into open-vocabulary spatial goals (e.g., “in...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1177/02783649251351658","openalex_id":"https://openalex.org/W4412681917","cited_by_count":1,"quality_score":42,"matched_keywords":["agent"],"author_affiliations":["Berkeley College","Google (United States)","University of California, Berkeley","University of Technology Nuremberg"],"concepts":[{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.6265011429786682},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5645798444747925},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5610255002975464},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5349873900413513},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4707033336162567},{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.44752371311187744}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412673663","title":"Interspatial Attention for Efficient 4D Human Video Generation","url":"https://doi.org/10.1145/3731165","published":"2025-07-27","authors":["Ruizhi Shao","Yinghao Xu","Yujun Shen","Ceyuan Yang","Yang Zheng","Changan Chen","Yebin Liu","Gordon Wetzstein"],"abstract":"Generating photorealistic videos of digital humans in a controllable manner is crucial for a plethora of applications. 
Existing approaches either build on methods that employ template-based 3D representations or emerging video generation models but suffer from poor quality or limited consistency and identity preservation when generating individual or multiple digital humans. In this paper, we introduce a new interspatial attention (ISA) mechanism as a scalable building block for modern diffusion transformer (DiT)-based video generation models. ISA is a new type of cross attention that uses relative positional encodings tailored for the generation of human videos. Leveraging a custom-developed video variation autoencoder, we train a latent ISA-based diffusion model on a large corpus of video data. Our model achieves state-of-the-art performance for 4D human video synthesis, demonstrating....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3731165","openalex_id":"https://openalex.org/W4412673663","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Stanford University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6886980533599854},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.480217307806015},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.45485547184944153},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41924434900283813},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.34170520305633545}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412673687","title":"AlignTex: Pixel-Precise Texture 
Generation from Multi-view Artwork","url":"https://doi.org/10.1145/3731158","published":"2025-07-27","authors":["Yuqing Zhang","Hao Xu","Yiqian Wu","Sirui Chen","Sirui Lin","Xiang Li","Xifeng Gao","Xiaogang Jin"],"abstract":"Current 3D asset creation pipelines typically consist of three stages: creating multi-view concept art, producing 3D meshes based on the artwork, and painting textures for the meshes—an often labor-intensive process. Automated texture generation offers significant acceleration, but prior methods, which fine-tune 2D diffusion models with multi-view input images, often fail to preserve pixel-level details. These methods primarily emphasize semantic and subject consistency, which do not meet the requirements of artwork-guided texture workflows. To address this, we present AlignTex , a novel framework for generating high-quality textures from 3D meshes and multi-view artwork, ensuring both appearance detail and geometric consistency. AlignTex operates in two stages: aligned image generation and texture refinement. 
The core of our approach, AlignNet , resolves complex misalignments by extract...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3731158","openalex_id":"https://openalex.org/W4412673687","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Shenzhen University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.636841356754303},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6257905960083008},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6181634068489075},{"id":"https://openalex.org/C2781195486","display_name":"Texture (cosmology)","score":0.6070441007614136},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6064786314964294},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5887069702148438},{"id":"https://openalex.org/C50494287","display_name":"Texture synthesis","score":0.5728018879890442},{"id":"https://openalex.org/C144743038","display_name":"Texture filtering","score":0.5009732246398926}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2507.20368","title":"MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation","url":"https://huggingface.co/papers/2507.20368","published":"2025-07-27","authors":["Shuolin Xu","Bingyuan Wang","Zeyu Cai","Fangteng Fu","Yue Ma","Tongyi Lee","Hongchuan Yu","Zeyu Wang"],"abstract":"Generating high-quality cartoon animations multimodal control is challenging due to the complexity of non-human characters, 
stylistically diverse motions and fine-grained emotions. There is a huge domain gap between real-world videos and cartoon animation, as cartoon animation is usually abstract and has exaggerated motion. Meanwhile, public multimodal cartoon data are extremely scarce due to the difficulty of large-scale automatic annotation processes compared with real-life scenarios. To bridge this gap, We propose the MagicAnime dataset, a large-scale, hierarchically annotated, and multimodal dataset designed to support multiple video generation tasks, along with the benchmarks it includes. Containing 400k video clips for image-to-video generation, 50k pairs of video clips and keypoints for whole-body annotation, 12k pairs of video clips for video-to-video face animation, and 2.9k pai...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4415884123","title":"Generative AI and Large Language Models in Conversational Systems: Trends and Future Directions","url":"https://doi.org/10.1109/aic66080.2025.11212044","published":"2025-07-26","authors":["Ravi Kiran Vadlamani","Prasad Sundaramoorthy","Lakshmi Narasimhan Srinivasagopalan","Dharmateja Priyadarshi Uddandarao","Revanth Reddy Bandaru"],"abstract":"The rapid advancement of Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) has significantly transformed Conversational AI, enabling more natural and human-like interactions through cutting-edge deep learning frameworks. 
This research offers a comprehensive evaluation of leading LLMs, including GPT-4, BERT, T5, ChatGPT, and Claude, assessing their performance based on key metrics such as accuracy, loss, AUC-ROC, F1-score, and computational complexity. The study examines the role of Reinforcement Learning with Human Feedback (RLHF), self-supervised learning, and transfer learning in improving model efficiency across various natural language processing tasks. A thorough experimental assessment was performed using a diverse dataset of AI models to evaluate their effectiveness in Conversational AI applications. The RandomForestClassifier was utilized to predict LLM....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/aic66080.2025.11212044","openalex_id":"https://openalex.org/W4415884123","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Hôpital Nord"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7791000008583069},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6812000274658203},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6353999972343445},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5651999711990356},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5002999901771545},{"id":"https://openalex.org/C2983448237","display_name":"Language understanding","score":0.45239999890327454},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.44179999828338623},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.42809998989105225}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:wud077u1n09udoaeyaz4agox","title":"On the Way to LLM Personalization: Learning to Remember User Conversations","url":"https://machinelearning.apple.com/research/on-the-way","published":"2025-07-25","authors":["Lucie Charlotte Magister","Katherine Metcalf","Yizhe Zhang","Maartje ter Hoeve"],"abstract":"This paper was accepted at the Workshop on Large Language Model Memorization (L2M2) 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","language model","personalization"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:iq7b54jkgnoscyg4qk625sj2","title":"mRAKL: Multilingual Retrieval-Augmented Knowledge Graph Construction for Low-Resourced Languages","url":"https://machinelearning.apple.com/research/mrakl","published":"2025-07-25","authors":["Hellina Hailu Nigatu","Min Li","Maartje ter Hoeve","Saloni Potdar","Sarah E. Chasins"],"abstract":"Knowledge Graphs represent real-world entities and the relationships between them. Multilingual Knowledge Graph Construction (mKGC) refers to the task of automatically constructing or predicting missing entities and links for knowledge graphs in a multilingual setting. In this work, we reformulate the mKGC task as a Question Answering (QA) task and introduce mRAKL: a Retrieval-Augmented Generation (RAG) based system to perform mKGC. 
We achieve...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:keujm9cxqhxfq69lnx2wcofq","title":"Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models","url":"https://machinelearning.apple.com/research/new-embedding-spaces","published":"2025-07-25","authors":["Anirudh Sundar§","Sinead Williamson","Katherine Metcalf","Barry Theobald","Skyler Seto","Masha Fedzechkina"],"abstract":"Aligned representations across languages is a desired property in multilingual large language models (mLLMs), as alignment can improve performance in cross-lingual tasks. Typically alignment requires fine-tuning a model, which is computationally expensive, and sizable language data, which often may not be available. 
A data-efficient alternative to fine-tuning is model interventions -- a method for manipulating model activations to steer...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:aek2ssaf5vf6trtaug3ynxet","title":"MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains","url":"https://machinelearning.apple.com/research/mmau","published":"2025-07-25","authors":["Guoli Yin","Haoping Bai","Shuang Ma","Feng Nan","Yanchao Sun","Zhaoyang Xu","Shen Ma","Jiarui Lu","Xiang Kong","Aonan Zhang","Dian Ang Yap","Yizhe Zhang"],"abstract":"Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern where failures stem from. 
Additionally, setting up...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:v0ga6l22k9z0ppbml1d8a4er","title":"Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?","url":"https://machinelearning.apple.com/research/external-validation","published":"2025-07-25","authors":["Arduin Findeis","Floris Weers","Guoli Yin","Ke Ye","Ruoming Pang","Tom Gunter"],"abstract":"Pairwise preferences over model responses are widely collected to evaluate and provide feedback to large language models (LLMs). Given two alternative model responses to the same input, a human or AI annotator selects the \"better\" response. Such data can provide a feedback signal in domains where traditional hard-coded metrics are difficult to obtain (e.g. 
quality of a chat interactions), thereby helping measure model progress or model...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ruxyij4x3mg62lf7kzeotzoz","title":"ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution","url":"https://machinelearning.apple.com/research/aspera","published":"2025-07-25","authors":["Alexandru Coca","Mark Gaynor","Zhenxing Zhang","Jianpeng Cheng","Bo-Hsiang Tseng","Pete Boothroyd","Héctor Martinez Alonso","Diarmuid Ó Séaghdha","Anders Johannsen"],"abstract":"This work evaluates the potential of large language models (LLMs) to power digital assistants capable of complex action execution. These assistants rely on pre-trained programming knowledge to execute multi-step goals by composing objects and functions defined in assistant libraries into action execution programs. 
To achieve this, we develop ASPERA, a framework comprising an assistant library simulation and a human-assisted LLM data generation...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4416342111","title":"AI-Driven ETL Pipelines for Real-Time Big Data Curation","url":"https://doi.org/10.1109/wconf64849.2025.11233584","published":"2025-07-25","authors":["Rupesh Dabbir","Shubham Sharma"],"abstract":"The proliferation of real-time data sources such as financial markets, IoT networks, and digital services has amplified the demand for resilient and adaptive data engineering solutions. Traditional ETL (Extract, Transform, Load) systems, while foundational to data pipelines, are typically designed for static, batch-oriented workflows that lack the flexibility to handle dynamic errors, schema shifts, or API-level failures. This architectural rigidity results in operational bottlenecks, data loss, and increased dependence on manual intervention. To address these limitations, this paper proposes an AI-enhanced, self-healing ETL pipeline capable of autonomously detecting and recovering from runtime failures in real-time environments. 
The implementation ingests cryptocurrency trading data from the Gemini API and incorporates intelligent retry logic, backoff algorithms, and structured error ha...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wconf64849.2025.11233584","openalex_id":"https://openalex.org/W4416342111","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7192000150680542},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6531000137329102},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.5755000114440918},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.5329999923706055},{"id":"https://openalex.org/C175309249","display_name":"Pipeline transport","score":0.4950000047683716},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.40880000591278076},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.3804999887943268},{"id":"https://openalex.org/C52146309","display_name":"Schema (genetic algorithms)","score":0.36880001425743103}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2507.19427","title":"Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding","url":"https://huggingface.co/papers/2507.19427","published":"2025-07-25","authors":["StepFun","Bin Wang","Bojun Wang","Changyi Wan","Guanzhe Huang","Hanpeng Hu","Haonan Jia","Hao Nie","Mingliang Li","Nuo Chen","Siyu Chen","Song Yuan"],"abstract":"Large language models (LLMs) face low hardware efficiency during decoding, especially 
for long-context reasoning tasks. This paper introduces Step-3, a 321B-parameter VLM with hardware-aware model-system co-design optimized for minimizing decoding costs. Step-3 innovates in two key dimensions: (1) A novel Multi-Matrix Factorization Attention (MFA) mechanism that significantly reduces both KV cache size and computation while maintaining high attention expressiveness, and (2) Attention-FFN Disaggregation (AFD), a distributed inference system that decouples attention and Feed-Forward Network (FFN) layers into specialized subsystems. This co-design achieves unprecedented cost efficiency: Step-3 significantly reduces theoretical decoding costs compared with models like DeepSeek-V3 and Qwen3 MoE 235B, with the gains widening at longer context. Step-3 achieves low cost while activating 38B para...","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"bytedance-seed:300","title":"Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice","url":"https://seed.bytedance.com/en/research/seed-liveinterpret-2-0-end-to-end-simultaneous-speech-to-speech-translation-with-your-voice","published":"2025-07-24","authors":["Seed Speech Team"],"abstract":"Simultaneous Interpretation (SI) represents one of the most daunting frontiers in the translation industry, with product-level automatic systems long plagued by intractable challenges: subpar transcription and translation quality, lack of real-time speech generation, multi-speaker confusion, and translated speech inflation, especially in long-form discourses. 
In this study, we introduce Seed-LiveInterpret 2.0, an end-to-end SI model that delivers high-fidelity, ultra-low-latency speech-to-speech generation with voice cloning capabilities. As a fully operational product-level solution, Seed-LiveInterpret 2.0 tackles these challenges head-on through our novel duplex speech-to-speech understanding-generating framework. Experimental results demonstrate that through large-scale pretraining and reinforcement learning, the model achieves a significantly better balance between translation accura...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Speech&Audio","Speech","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:8ba3ca14b0233489","title":"Qwen-MT: Where Speed Meets Smart Translation","url":"https://qwenlm.github.io/blog/qwen-mt/","published":"2025-07-24","authors":["Alibaba/Qwen"],"abstract":"Here we introduce the latest update of Qwen-MT (qwen-mt-turbo) via Qwen API. This update builds upon the powerful Qwen3, leveraging trillions multilingual and translation tokens to comprehensively enhance the model’s multilingual understanding and translation capabilities. 
By integrating reinforcement learning techniques, the model achieves significant improvements in translation accuracy and linguistic fluency. Key Features: Multilingual Support for 92 Languages: Qwen-MT enables high-quality translation across 92 major official languages and prominent dialects, covering over 95% of the global population to meet diverse cross-lingual communication needs.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4412638626","title":"Corrigendum to “MA-FSAR: Multimodal Adaptation of CLIP for few-shot action recognition” [Pattern Recognition 169 (2026) 111902]","url":"https://doi.org/10.1016/j.patcog.2025.112160","published":"2025-07-24","authors":["Jiazheng Xing","Jian Zhao","Chao Xu","Mengmeng Wang","Guang Dai","Yong Liu","Jingdong Wang","Xuelong Li"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"erratum","doi":"https://doi.org/10.1016/j.patcog.2025.112160","openalex_id":"https://openalex.org/W4412638626","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","China Central Television","Northwestern Polytechnic University","Northwestern Polytechnical University","State Grid Corporation of China (China)","State Key Laboratory of Industrial Control Technology","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C2780791683","display_name":"Action 
(physics)","score":0.6187743544578552},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5940996408462524},{"id":"https://openalex.org/C2987834672","display_name":"Action recognition","score":0.5371357202529907},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.5340510606765747},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5298295617103577},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4257392883300781},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.17395800352096558},{"id":"https://openalex.org/C169760540","display_name":"Neuroscience","score":0.03831779956817627}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:Qwen:2507.18071","title":"Group Sequence Policy Optimization","url":"https://huggingface.co/papers/2507.18071","published":"2025-07-23","authors":["Alibaba/Qwen"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","Qwen"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"openalex:W4412605458","title":"M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising","url":"https://doi.org/10.1109/tpami.2025.3592089","published":"2025-07-23","authors":["Chengjie Wang","Haokun Zhu","Jinlong Peng","Yue Wang","Ran Yi","Yunsheng Wu","Lizhuang Ma","Jiangning 
Zhang"],"abstract":"Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. Yet, both RGB and 3D data are crucial for anomaly detection, and the datasets are seldom completely clean in practical scenarios. To address above challenges, this paper initially delves into the RGB-3D multi-modal noisy anomaly detection, proposing a novel noise-resistant M3DM-NR framework to leveraging strong multi-modal discriminative capabilities of CLIP. M3DM-NR consists of three stages: Stage-I introduces the Suspected References Selection module to filter a few normal samples from the training dataset, using the multimodal features extracted by the Initial Feature Extraction, and a Suspected Anomaly Map Computation module to generate a suspected anomaly map to focus on abnormal regions as reference. Stage-II uses the suspected anomaly maps of the reference samples...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3592089","openalex_id":"https://openalex.org/W4412605458","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.7232789397239685},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7125501036643982},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.640760064125061},{"id":"https://openalex.org/C163294075","display_name":"Noise reduction","score":0.6250575184822083},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5987372994422913},{"id":"https://openalex.org/C82990744","display_name":"RGB color 
model","score":0.5800879001617432},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5100513100624084},{"id":"https://openalex.org/C2983327147","display_name":"Image denoising","score":0.4741874635219574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"arxiv:2505.09608","title":"LightLab: Controlling Light Sources in Images with Diffusion Models","url":"http://arxiv.org/abs/2505.09608","published":"2025-07-23","authors":["Nadav Magar","Amir Hertz","Eric Tabellion","Yael Pritch","Alex Rav-Acha","Ariel Shamir","Yedid Hoshen"],"abstract":"We present a simple, yet effective diffusion-based method for fine-grained, parametric control over light sources in an image. Existing relighting methods either rely on multiple input views to perform inverse rendering at inference time, or fail to provide explicit control over light changes. Our method fine-tunes a diffusion model on a small set of real raw photograph pairs, supplemented by synthetically rendered images at scale, to elicit its photorealistic prior for relighting. We leverage the linearity of light to synthesize image pairs depicting controlled light changes of either a target light source or ambient illumination. Using this data and an appropriate fine-tuning scheme, we train a model for precise illumination changes with explicit control over light intensity and color. 
Lastly, we show how our method can achieve compelling light editing results, and outperforms existing...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3721238.3730696","openalex_id":"https://openalex.org/W4412589932","cited_by_count":1,"quality_score":42,"matched_keywords":["preference"],"author_affiliations":["Google (Israel)","Google (United States)","Hebrew University of Jerusalem","Tel Aviv University"],"concepts":[{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.6639285683631897},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5936352610588074},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3305623531341553},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.14643162488937378},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412585535","title":"GraphMSR: A graph foundation model-based approach for MRI image super-resolution with multimodal semantic integration","url":"https://doi.org/10.1016/j.patcog.2025.112178","published":"2025-07-23","authors":["Zhiquan Qin","Zihao He","Yan Zhang","Yunhang Shen","Ke Li"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.112178","openalex_id":"https://openalex.org/W4412585535","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Tencent (China)","Xiamen 
University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6051009893417358},{"id":"https://openalex.org/C141239990","display_name":"Superresolution","score":0.594009280204773},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.553296685218811},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5355806946754456},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5184354782104492},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4751184582710266},{"id":"https://openalex.org/C138268822","display_name":"Resolution (logic)","score":0.42987245321273804},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.40656328201293945}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2508.07905","title":"Generative Video Matting","url":"http://arxiv.org/abs/2508.07905","published":"2025-07-23","authors":["Yongtao Ge","Kangyang Xie","Guangkai Xu","Li Ke","Mingyu Liu","Longtao Huang","Hui Xue","Hao Chen","Chunhua Shen"],"abstract":"Video matting has traditionally been limited by the lack of high-quality ground-truth data. Most existing video matting datasets provide only human-annotated imperfect alpha and foreground annotations, which must be composited to background images or videos during the training stage. Thus, the generalization capability of previous methods in real-world scenarios is typically poor. In this work, we propose to solve the problem from two perspectives. First, we emphasize the importance of large-scale pre-training by pursuing diverse synthetic and pseudo-labeled segmentation datasets. 
We also develop a scalable synthetic data generation pipeline that can render diverse human bodies and fine-grained hairs, yielding around 200 video clips with a 3-second duration for fine-tuning. Second, we introduce a novel video matting approach that can effectively leverage the rich priors from pre-trained....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3721238.3730642","openalex_id":"https://openalex.org/W4412587526","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","The University of Adelaide","Zhejiang University","Zhejiang University of Science and Technology","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6612133383750916},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6452764272689819},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48377200961112976},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3794695734977722}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412584833","title":"Dynamic data selection with normalized gradient-based influence approximation for targeted fine-tuning of LLMs","url":"https://doi.org/10.1016/j.knosys.2025.114144","published":"2025-07-23","authors":["Zige Wang","Qi Zhu","Fei Mi","Yasheng Wang","Haotian Wang","Lifeng 
Shang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.knosys.2025.114144","openalex_id":"https://openalex.org/W4412584833","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Beijing Haidian Hospital","Huawei Technologies (China)","Huawei Technologies (Sweden)","Peking University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6111358404159546},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3606603145599365},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.20856904983520508}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412596306","title":"Splat4D: Diffusion-Enhanced 4D Gaussian Splatting for Temporally and Spatially Consistent Content Creation","url":"https://doi.org/10.1145/3721238.3730752","published":"2025-07-23","authors":["Minghao Yin","Yukang Cao","Songyou Peng","Kai Han"],"abstract":"Generating high-quality 4D content from monocular videos—for applications such as digital humans and AR/VR—poses challenges in ensuring temporal and spatial consistency, preserving intricate details, and incorporating user guidance effectively. To overcome these challenges, we introduce Splat4D, a novel framework enabling high-fidelity 4D content generation from a monocular video. Splat4D achieves superior performance while maintaining faithful spatial-temporal coherence, by leveraging multi-view rendering, inconsistency identification, a video diffusion model, and an asymmetric U-Net for refinement. 
Through extensive evaluations on public benchmarks, Splat4D consistently demonstrates state-of-the-art performance across various metrics, underscoring the efficacy of our approach. Additionally, the versatility of Splat4D is validated in various applications such as text/image conditioned 4...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3721238.3730752","openalex_id":"https://openalex.org/W4412596306","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Nanyang Technological University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6967004537582397},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.5785648822784424},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.512395441532135},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.50738924741745},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.37162065505981445},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32022809982299805},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.12198492884635925},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.09254556894302368}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412588068","title":"FashionComposer: Compositional Fashion Image Generation","url":"https://doi.org/10.1145/3721238.3730663","published":"2025-07-23","authors":["Sihui Ji","Yiyang Wang","Xi Chen","Xiaogang Xu","Hao Luo","Hengshuang 
Zhao"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3721238.3730663","openalex_id":"https://openalex.org/W4412588068","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.549267590045929},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4640604257583618},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3727319538593292},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3547069728374481},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.33875271677970886}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:64ab8340f765ffdf","title":"Qwen3-Coder: Agentic Coding in the World","url":"https://qwenlm.github.io/blog/qwen3-coder/","published":"2025-07-22","authors":["Alibaba/Qwen"],"abstract":"Today, we’re announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder is available in multiple sizes, but we’re excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct — a 480B-parameter Mixture-of-Experts model with 35B active parameters which supports the context length of 256K tokens natively and 1M tokens with extrapolation methods, offering exceptional performance in both coding and agentic tasks. 
Qwen3-Coder-480B-A35B-Instruct sets new state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use, comparable to Claude Sonnet 4.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4412565292","title":"Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model","url":"https://doi.org/10.1145/3749844","published":"2025-07-22","authors":["Zhichao Zhang","Wei Sun","Xinyue Li","Jun Jia","Xiongkuo Min","Zicheng Zhang","Chunyi Li","Zijian Chen","Puyi Wang","Fengyu Sun","Shangling Jui","Guangtao Zhai"],"abstract":"In recent years, AI-driven video generation has gained significant attention due to great advancements in visual and language generative techniques. Consequently, there is a growing need for accurate Video Quality Assessment (VQA) metrics to evaluate the perceptual quality of AI-generated content (AIGC) videos and optimize video generation models. However, assessing the quality of AIGC videos remains a significant challenge because these videos often exhibit highly complex distortions, such as unnatural actions and irrational objects. To address this challenge, we systematically investigate the AIGC-VQA problem in this article, considering both subjective and objective quality assessment perspectives. 
For the subjective perspective, we construct the Large-scale Generated Video Quality Assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by six video generation m...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3749844","openalex_id":"https://openalex.org/W4412565292","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["East China Normal University","Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9059203863143921},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8866012096405029},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.46792128682136536},{"id":"https://openalex.org/C3020001037","display_name":"Quality assessment","score":0.437541127204895},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36042991280555725},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3494371473789215},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.32997193932533264},{"id":"https://openalex.org/C3018395757","display_name":"Evaluation methods","score":0.17945778369903564}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"bytedance-seed:297","title":"GR-3 Technical Report","url":"https://seed.bytedance.com/en/research/gr-3-technical-report","published":"2025-07-21","authors":["Seed Robotics Team"],"abstract":"We report our recent progress towards building generalist robot policies, the development of GR-3. 
GR-3 is a large-scale vision-language-action (VLA) model. It showcases exceptional capabilities in generalizing to novel objects, environments, and instructions involving abstract concepts. Furthermore, it can be efficiently fine-tuned with minimal human trajectory data, enabling rapid and cost-effective adaptation to new settings. GR-3 also excels in handling long-horizon and dexterous tasks, including those requiring bi-manual manipulation and mobile movement, showcasing robust and reliable performance. These capabilities are achieved through a multi-faceted training recipe that includes co-training with web-scale vision-language data, efficient fine-tuning from human trajectory data collected via VR devices, and effective imitation learning with robot trajectory data. In addition, we int...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Robotics","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4412536669","title":"UniVoxel: A Novel Framework for 3-D Object Detection in Autonomous Vehicles With Multimodal Voxel Representation","url":"https://doi.org/10.1109/jsen.2025.3589494","published":"2025-07-21","authors":["Kaiqi Liu","Yuanyuan Deng","Jiaxun Tong","Wei Li"],"abstract":"Fusing camera and LiDAR information is one of the effective means for achieving robust 3D object detection. However, current 3D multi-modal methods typically rely on independent branches to extract features from different sensors separately, leading to underutilization of complementary information. 
In this paper, a multi-modal detector named UniVoxel is proposed, which is built on a query-based detection paradigm. The UniVoxel integrates inputs from various modalities into the voxel representation for fusion. Specifically, a Semantic-guided Query Generator (SQG) is proposed, in which the low-level voxel features are utilized to adaptively sample multi-scale image features, producing unified multi-modal voxel features. The multi-modal voxel features contain both the geometric and semantic information of the voxels and can ensure that the model focuses on the Regions of Interest (RoI). Mea...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jsen.2025.3589494","openalex_id":"https://openalex.org/W4412536669","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beijing Electronic Science and Technology Institute","Beijing Institute of Technology"],"concepts":[{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.7270494699478149},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6776884198188782},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6478897929191589},{"id":"https://openalex.org/C54170458","display_name":"Voxel","score":0.6292654871940613},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5314314365386963},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4692057967185974},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.4646579325199127},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4571942389011383}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/stitch-simultaneous-thinking-and-talking-with-chunked-reasoning-for-spoken-language-models","title":"STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models","url":"https://www.microsoft.com/en-us/research/publication/stitch-simultaneous-thinking-and-talking-with-chunked-reasoning-for-spoken-language-models/","published":"2025-07-20","authors":["Cheng-Han Chiang","Xiaofei Wang","Linjie Li","Chung-Ching Lin","Kevin Lin","Shujie Liu","Zhendong Wang","Zhengyuan Yang","Hung-yi Lee","Lijuan Wang"],"abstract":"Spoken Language Models (SLMs) are designed to take speech inputs and produce spoken responses. However, current SLMs lack the ability to perform an internal, unspoken thinking process before responding. In contrast, humans typically engage in complex mental reasoning internally, enabling them to communicate ideas clearly and concisely. Thus, integrating an unspoken thought process into SLMs is highly desirable. While naively generating a complete chain-of-thought (CoT) reasoning before starting to talk can enable thinking for SLMs, this induces additional latency for the speech response, as the CoT reasoning can be arbitrarily long. To solve this issue, we propose Stitch, a novel generation method that alternates between the generation of unspoken reasoning chunks and spoken response chunks. 
Since the audio duration of a chunk of spoken response is much longer than the time to generate t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414210525","title":"Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE","url":"https://doi.org/10.1145/3731545.3731594","published":"2025-07-20","authors":["Jesun Firoz","Franco Pellegrini","Mario Geiger","Darren J. Hsu","Jenna A. Bilbrey","Huey Wen Chou","Maximilian Stadler","Markus Höhnerbach","Tingyu Wang","Dejun Lin","Emine Küçükbenli","Henry W. Sprueill"],"abstract":"Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scientists. These models facilitate the understanding of matter and the discovery of new molecules and materials. In contrast to GNNs operating on large homogeneous graphs, GNNs used by CFMs process a large number of geometric graphs of varying sizes, requiring different optimization strategies than those developed for large homogeneous GNNs. This paper presents optimizations for two critical phases of CFM training: data distribution and model training, targeting MACE - a state-of-the-art CFM. 
We address the challenge of load balancing in data distribution by formulating it as a multi-objective bin packing problem. We propose an iterative algorithm that provides a highly effective, fast, and prac...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3731545.3731594","openalex_id":"https://openalex.org/W4414210525","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Nvidia (United States)","Pacific Northwest National Laboratory","Scuola Internazionale Superiore di Studi Avanzati","University of Cambridge","University of Utah"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7125999927520752},{"id":"https://openalex.org/C74193536","display_name":"Kernel (algebra)","score":0.5116999745368958},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.4724999964237213},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.40880000591278076},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.37619999051094055},{"id":"https://openalex.org/C122280245","display_name":"Kernel method","score":0.37209999561309814},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.3693000078201294},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.365200012922287}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2507.15061","title":"WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization","url":"https://huggingface.co/papers/2507.15061","published":"2025-07-20","authors":["Zhengwei Tao","Jialong Wu","Wenbiao Yin","Junkai Zhang","Baixuan 
Li","Haiyang Shen","Kuan Li","Liwen Zhang","Xinyu Wang","Yong Jiang","Pengjun Xie","Fei Huang"],"abstract":"The advent of Large Language Model (LLM)-powered agents has revolutionized artificial intelligence by enabling solutions to complex, open-ended tasks through web-based information-seeking (IS) capabilities. The scarcity of high-quality training data has limited the development of IS agents. Existing approaches typically adopt an information-driven paradigm that first collects web data and then generates questions based on the retrieval. However, this may lead to inconsistency between information structure and reasoning structure, question and answer. To mitigate, we propose a formalization-driven IS data synthesis framework WebShaper to construct a dataset. WebShaper systematically formalizes IS tasks through set theory. Central to the formalization is the concept of Knowledge Projections (KP), which enables precise control over reasoning structure by KP operation compositions. During sy...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/insights-into-a-radiology-specialised-multimodal-large-language-model-with-sparse-autoencoders","title":"Insights into a radiology-specialised multimodal large language model with sparse autoencoders","url":"https://www.microsoft.com/en-us/research/publication/insights-into-a-radiology-specialised-multimodal-large-language-model-with-sparse-autoencoders/","published":"2025-07-19","authors":["Kenza Bouzid","Shruthi Bannur","Felix 
Meissen","Daniel Coelho de Castro","Anton Schwaighofer","Javier Alvarez-Valle","Stephanie Hyland"],"abstract":"Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts - including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on model...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computer science","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4412871250","title":"Enhancing LLMs for Power System Simulations: A Feedback-Driven Multi-Agent Framework","url":"https://doi.org/10.1109/tsg.2025.3589114","published":"2025-07-18","authors":["Mengshuo Jia","Zeyu Cui","Gabriela Hug"],"abstract":"The integration of experimental 
technologies with large language models (LLMs) is transforming scientific research. It positions AI as a versatile research assistant rather than a mere problem-solving tool. In the field of power systems, however, managing simulations — one of the essential experimental technologies — remains a challenge for LLMs due to their limited domain-specific knowledge, restricted reasoning capabilities, and imprecise handling of simulation parameters. To address these limitations, this paper proposes a feedback-driven, multi-agent framework. It incorporates three proposed modules: an enhanced retrieval-augmented generation (RAG) module, an improved reasoning module, and a dynamic environmental acting module with an error-feedback mechanism. Validated on 69 diverse tasks from...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tsg.2025.3589114","openalex_id":"https://openalex.org/W4412871250","cited_by_count":10,"quality_score":63,"matched_keywords":["LLM","retrieval","agent","multi-agent"],"author_affiliations":["Alibaba Group (China)","ETH Zurich","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C163258240","display_name":"Power (physics)","score":0.47492724657058716},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.42797911167144775},{"id":"https://openalex.org/C134560507","display_name":"Environmental economics","score":0.35709863901138306},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.3403952717781067},{"id":"https://openalex.org/C47446073","display_name":"Control theory 
(sociology)","score":0.32648006081581116},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.3248818516731262},{"id":"https://openalex.org/C133731056","display_name":"Control engineering","score":0.32292288541793823},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.3201724886894226}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4413067177","title":"Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-Agent LLMs","url":"https://doi.org/10.1145/3749099","published":"2025-07-18","authors":["Xiaopeng Li","Lixin Su","Pengyue Jia","Suqi Cheng","Junfeng Wang","Dawei Yin","Xiangyu Zhao"],"abstract":"Search engines are crucial as they provide an efficient and easy way to access vast amounts of information on the Internet for diverse information needs. User queries, even with a specific need, can differ significantly. Prior research has explored the resilience of ranking models against typical query variations like paraphrasing, misspellings, and order changes. Yet, these works overlook how diverse demographics uniquely formulate identical queries. For instance, older individuals tend to construct queries more naturally and in varied order compared to other groups. This demographic diversity necessitates enhancing the adaptability of ranking models to diverse query formulations. 
To this end, in this article, we propose a framework that integrates a novel rewriting pipeline that rewrites queries from various demographic perspectives and a novel framework to enhance ranking robustness.....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3749099","openalex_id":"https://openalex.org/W4413067177","cited_by_count":4,"quality_score":57,"matched_keywords":["personalized","efficient","agent","multi-agent"],"author_affiliations":["Baidu (China)","City University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.762624204158783},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.7244266867637634},{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.7033835053443909},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6728814840316772},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5382182002067566},{"id":"https://openalex.org/C168725872","display_name":"Sophistication","score":0.4595438838005066},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3213965594768524},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.11975753307342529}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"apple:h6s1mp8fwzhb97np8rjboif5","title":"Language Models Improve When Pretraining Data Matches Target Tasks","url":"https://machinelearning.apple.com/research/pretraining-data-matches","published":"2025-07-18","authors":["David Mizrahi","Anders Boesen Lindbo Larsen","Jesse Allardice","Suzie Petryk","Yuri Gorokhov","Jeffrey 
Li","Alex Fang","Josh Gardner","Tom Gunter","Afshin Dehghan"],"abstract":"Every data selection method inherently has a target. In practice, these targets often emerge implicitly through benchmark-driven iteration: researchers develop selection strategies, train models, measure benchmark performance, then refine accordingly. This raises a natural question: what happens when we make this optimization explicit? To explore this, we propose benchmark-targeted ranking (BETR), a simple method that selects pretraining...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4412673545","title":"A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA","url":"https://doi.org/10.1145/3731120.3744605","published":"2025-07-18","authors":["Shivani Upadhyay","Ronak Pradeep","Nandan Thakur","Daniel Campos","Nick Craswell","Ian Soboroff","Jimmy Lin"],"abstract":"There is substantial interest in applying large language models (LLMs) to provide relevance assessments in information retrieval (IR) applications from both industry and academia. To date, researchers and practitioners have presented several studies, but many questions remain. In this paper, we examine four different relevance assessment strategies: a fully manual process and three variants that rely on LLMs to different extents using our tool called UMBRELA. 
These were deployed in the TREC 2024 RAG Track on a diverse set of 77 runs from 19 teams in situ, which allowed us to correlate system rankings induced by the different approaches and to characterize tradeoffs between cost and quality. We find that system rankings produced by the three LLM-based strategies correlate well at the run level with those produced by fully manual assessments in terms of nDCG@20, nDCG@100, and Recall@100. O...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3731120.3744605","openalex_id":"https://openalex.org/W4412673545","cited_by_count":5,"quality_score":50,"matched_keywords":["LLM","retrieval"],"author_affiliations":["College of San Mateo","Microsoft (United States)","Seattle University","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.7325040102005005},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6517194509506226},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6473202705383301},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5192989706993103},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3503267765045166},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.08708155155181885},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.07942035794258118},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"official:fc8d05a05313719a","title":"ChatGPT agent System 
Card","url":"https://openai.com/index/chatgpt-agent-system-card","published":"2025-07-17","authors":["OpenAI"],"abstract":"ChatGPT agent System Card: OpenAI’s agentic model unites research, browser automation, and code tools with safeguards under the Preparedness Framework.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Publication","agent"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:gjv3wvfrrg74vt50yfmw36l5","title":"Apple Intelligence Foundation Language Models Tech Report 2025","url":"https://machinelearning.apple.com/research/apple-foundation-models-tech-report-2025","published":"2025-07-17","authors":["Apple"],"abstract":"We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a ∼3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["quantization"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page 
https://machinelearning.apple.com/research"}},{"id":"openalex:W4412491770","title":"Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Parameter Adapter","url":"https://doi.org/10.1109/tpami.2025.3590321","published":"2025-07-17","authors":["Peng Xing","Ning Wang","Jianbo Ouyang","Zechao Li"],"abstract":"The remarkable advancement in text-to-image generation models significantly boosts the research in ID customization generation. However, existing personalization methods cannot simultaneously satisfy high-fidelity and low-costs requirements. Their main bottleneck lies in the additional prompt image encoder (i.e., CLIP vision encoder), which produces weak alignment signals with the text-to-image model that may lose face information and is not well 'absorbed' by the text-to-image model. Towards this end, we propose Inv-Adapter, which first introduces a more reasonable and efficient token representation of ID image features and introduces a lightweight parameter adaptor to inject ID features. Specifically, our Inv-Adapter extracts diffusion-domain representations of ID images utilizing a pre-trained text-to-image model via DDIM image inversion, without an additional image encoder. 
Benefitin...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3590321","openalex_id":"https://openalex.org/W4412491770","cited_by_count":0,"quality_score":45,"matched_keywords":["personalization","efficient"],"author_affiliations":["Huawei Technologies (China)","Nanjing University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.8881519436836243},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7724951505661011},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5868563055992126},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5715571045875549},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4867846667766571},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.45804375410079956},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3279765844345093},{"id":"https://openalex.org/C9390403","display_name":"Computer hardware","score":0.24393045902252197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learning-to-summarize-user-information-for-personalized-reinforcement-learning-from-human-feedback","title":"Learning to summarize user information for personalized reinforcement learning from human feedback","url":"https://www.microsoft.com/en-us/research/publication/learning-to-summarize-user-information-for-personalized-reinforcement-learning-from-human-feedback/","published":"2025-07-16","authors":["Hyunji Nam","Yanming 
Wan","Mickel Liu","Peter Ahnn","Jianxun Lian","Natasha Jaques"],"abstract":"As everyday use cases of large language model (LLM) AI assistants have expanded, it is becoming increasingly important to personalize responses to align to different users' preferences and goals. While reinforcement learning from human feedback (RLHF) is effective at improving LLMs to be generally more helpful and fluent, it does not account for variability across users, as it models the entire user population with a single reward model, meaning it assumes that everyone's preferences are the same. We present a novel framework, Preference Learning Using Summarization (PLUS), that uses reinforcement learning (RL) to learn to produce text-based summaries of each user's preferences, characteristics, and past conversations. These summaries condition the reward model, enabling it to make personalized predictions about the types of responses valued by each user. Both the user-summarization model...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","personalized","personalization","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4412467931","title":"EEGMamba: An EEG foundation model with Mamba","url":"https://doi.org/10.1016/j.neunet.2025.107816","published":"2025-07-16","authors":["Jiquan Wang","Sha Zhao","Zhiling Luo","Yangxuan Zhou","Shijian Li","Gang 
Pan"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neunet.2025.107816","openalex_id":"https://openalex.org/W4412467931","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang Lab","Zhejiang University","Zhejiang University of Science and Technology","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6303389668464661},{"id":"https://openalex.org/C522805319","display_name":"Electroencephalography","score":0.5271356105804443},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.41031524538993835},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3514508605003357},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2170340120792389},{"id":"https://openalex.org/C169760540","display_name":"Neuroscience","score":0.09230440855026245},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.08247032761573792},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.04012307524681091}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mindjourney-test-time-scaling-with-world-models-for-spatial-reasoning","title":"MindJourney: Test-Time Scaling with World Models for Spatial Reasoning","url":"https://www.microsoft.com/en-us/research/publication/mindjourney-test-time-scaling-with-world-models-for-spatial-reasoning/","published":"2025-07-15","authors":["Yuncong Yang","Jiageng Liu","Zheyuan Zhang","Siyuan Zhou","Reuben 
Tan","Jianwei Yang","Yilun Du","Chuang Gan"],"abstract":"Spatial reasoning in 3D space is central to human cognition and indispensable for embodied tasks such as navigation and manipulation. However, state-of-the-art vision-language models (VLMs) struggle frequently with tasks as simple as anticipating how a scene will look after an egocentric motion: they perceive 2D images but lack an internal model of 3D dynamics. We therefore propose MindJourney, a test-time scaling framework that grants a VLM with this missing capability by coupling it to a controllable world model based on video diffusion. The VLM iteratively sketches a concise camera trajectory, while the world model synthesizes the corresponding view at each step. The VLM then reasons over this multi-view evidence gathered during the interactive exploration. Without any fine-tuning, our MindJourney achieves over an average 8% performance boost on the representative spatial reasoning be...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4412445007","title":"I2EKD: Efficient and Versatile Image-to-Event Knowledge Distillation","url":"https://doi.org/10.1109/tcsvt.2025.3589222","published":"2025-07-15","authors":["Haotian Liu","Yu Guo","Hu Cao","Sanqing Qu","Fan Lü","Yan Zhong","Zhichao Lu","Luziwei Leng","Guang Chen"],"abstract":"Recently, general-purpose features for event camera data have become 
increasingly important in advancing event-based vision applications. Current methods typically adopt pre-training paradigms, yielding promising performance. However, the limited data and sparse spatial information of events hinder effective use of pretraining for rich semantic learning. In this paper, we tackle semantic scarcity by transferring knowledge from large pre-trained image models, without increasing event training data. Concretely, we propose a novel image-to-event knowledge distillation method named I2EKD. Acknowledging that different backbones suit different applications, we fix the teacher and keep the student architecture flexible. To improve versatility, we equip I2EKD with two model-agnostic objectives at the logit and feature levels. Additionally, without task-specific objectives or labels, I2EKD avoids...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3589222","openalex_id":"https://openalex.org/W4412445007","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","distillation"],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)","Peking University","Technical University of Munich","Tongji University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6679874658584595},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.5845699310302734},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.46973809599876404},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4666072726249695},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45680350065231323},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle 
physics)","score":0.4347417950630188},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.41474398970603943},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.41098758578300476}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412444884","title":"Unlocking High-Fidelity Learning: Towards Neuron-Grained Model Extraction","url":"https://doi.org/10.1109/tdsc.2025.3588857","published":"2025-07-15","authors":["Yaxin Xiao","Haibo Hu","Qingqing Ye","Tang Li","Zi Liang","Huadi Zheng"],"abstract":"Model extraction (ME) attacks replicate valuable black-box machine learning (ML) models via malicious query interactions. Cutting-edge attacks focus on actively designing query samples to enhance model fidelity and imprudently adhere to the standard ML training approach. This causes a deviation from the true objective of learning a model over a task. In this paper, we innovatively shift our focus from query selection to training process optimization, aiming to boost the similarity of the copy model with the victim model from neuron to model level. We leverage neuron matching theory to attain this objective and develop a general training booster framework, MEBooster, to fully exploit this theory. 
MEBooster comprises an initial bootstrapping phase that furnishes initial parameters and an optimal model architecture, followed by a post-processing phase that employs fine-tuning for enhanced n...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tdsc.2025.3588857","openalex_id":"https://openalex.org/W4412444884","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Hong Kong Polytechnic University","Huawei Technologies (China)","Yunnan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7606921195983887},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.5868387222290039},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.5668469071388245},{"id":"https://openalex.org/C186565885","display_name":"Biological neuron model","score":0.4223908483982086},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41036924719810486},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.2070314884185791},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.17100182175636292},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412441320","title":"Origami-based collaborative spatial problem-solving: Multimodal observational study","url":"https://doi.org/10.1016/j.tsc.2025.101920","published":"2025-07-15","authors":["Vitaliy Popov","Yaoran Li","Hanxiang Du","Gaoxia Zhu","Perla Myers","Lisa M. Ridgley","David C. 
Geary"],"abstract":"• Students tested on origami task to reveal impact of collaborative problem solving. • Successful pairs drew/gestured more than manipulated. • Results suggest how to best prep students for spatial tasks. The study examined students' spatial collaborative problem-solving behaviors when engaging in a design task dependent on spatial reasoning. Thirty undergraduate students working alone and collaboratively were tested on performance differences in solving an origami Sonobe cube as an active hands-on spatial problem solving task. Epistemic network analysis and sequential pattern mining were conducted to reveal the relationships among collaborative problem solving behaviors and students' embodied engagement displayed in low - versus high-performing student pairs. The core findings were that successful student pairs (higher scores on their final sketches) engaged more in sketching and gesturi...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.tsc.2025.101920","openalex_id":"https://openalex.org/W4412441320","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Apple (United States)","Institute for Learning Innovation","Nanyang Technological University","University of Michigan","University of Missouri","Western Washington University"],"concepts":[{"id":"https://openalex.org/C23131810","display_name":"Observational study","score":0.8122977018356323},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.45736250281333923},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4406711757183075},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.42704012989997864},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.36390402913093567},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3475814461708069},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.22662311792373657},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.207841157913208}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"apple:jk1aysl4byvt5atz06lu771s","title":"ILuvUI: Instruction-Tuned Language-Vision Modeling of UIs from Machine Conversations","url":"https://machinelearning.apple.com/research/iluvui-instruction-tuned","published":"2025-07-14","authors":["Yue Jiang","Eldon Schoop","Amanda Swearngin","Jeffrey Nichols"],"abstract":"Multimodal Vision-Language Models (VLMs) enable powerful applications from their fused understanding of images and language, but many perform poorly on UI tasks due to the lack of UI training data. In this paper, we adapt a recipe for generating paired text-image training data for VLMs to the UI domain by combining existing pixel-based methods with a Large Language Model (LLM). 
Unlike prior art, our method requires no human-provided annotations,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3708359.3712129","openalex_id":"https://openalex.org/W4408615011","cited_by_count":7,"quality_score":67,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple","Aalto University","Apple (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:u7lj06wv55myfef4h3d5dtlx","title":"Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis","url":"https://machinelearning.apple.com/research/multimodal-decoder-only-model","published":"2025-07-14","authors":["Akshita Gupta","Tatiana Likhomanenko","Karren Yang","Richard He Bai","Zakaria Aldeneh","Navdeep Jaitly"],"abstract":"The rapid progress of foundation models and large language models (LLMs) has fueled significant improvement in the capabilities of machine learning systems that benefit from multimodal input data. 
However, existing multimodal models are predominantly built on top of pre-trained LLMs, which can limit accurate modeling of temporal dependencies across other modalities and thus limit the model’s ability to jointly process and leverage multimodal...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:gb6dirguyinm83obe6vds49j","title":"Overcoming Vocabulary Constraints with Pixel-level Fallback","url":"https://machinelearning.apple.com/research/overcoming-vocabulary-constraints","published":"2025-07-14","authors":["Jonas F. Lotz","Hendra Setiawan","Stephan Peitz","Yova Kementchedjhieva"],"abstract":"Subword tokenization requires balancing computational efficiency and vocabulary coverage, which often leads to suboptimal performance on languages and scripts not prioritized during training. We propose to augment pretrained language models with a vocabulary-free encoder that generates input embeddings from text rendered as pixels. 
Through experiments on English-centric language models, we demonstrate that our approach substantially improves...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4414604570","title":"A controller generation platform for supernumerary robotic leg: Pilot Design and Simulation","url":"https://doi.org/10.1109/aim64088.2025.11175774","published":"2025-07-14","authors":["Jianghao Zhao","Youtao Zhou","Lingyi Meng","Xiong Li","Enhao Zheng"],"abstract":"With the continuous development of Supernumerary Robotic Limbs (SRLs), SRLs systems that can collaborate with the human body have shown great potential to enhance human motion capabilities and reduce joint burden. However, current SRLs design methods often face challenges such as limited task adaptability and suboptimal design efficiency. To address these issues, this study proposes a multi-modal SRLs controller generation platform. Our current work is a preliminary simulation. The platform can automatically train a universal SRLs controller based on the input human motion dataset of multiple locomotion modes in parallel, thus generating adaptive control trajectories to the corresponding motion tasks. 
First, we designed and implemented a multitask parallel training platform, allowing SRLs to share a common policy across multiple tasks, thus improving design efficiency and task adaptabili...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/aim64088.2025.11175774","openalex_id":"https://openalex.org/W4414604570","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Beihang University","Chinese Academy of Sciences","Institute of Automation","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C203479927","display_name":"Controller (irrigation)","score":0.670199990272522},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6657000184059143},{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.5820000171661377},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5267999768257141},{"id":"https://openalex.org/C133731056","display_name":"Control engineering","score":0.5166000127792358},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.46160000562667847},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.4426000118255615},{"id":"https://openalex.org/C145565327","display_name":"Motion control","score":0.43299999833106995}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/through-the-river-understanding-the-benefit-of-schedule-free-methods-for-language-model-training","title":"Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model 
Training","url":"https://www.microsoft.com/en-us/research/publication/through-the-river-understanding-the-benefit-of-schedule-free-methods-for-language-model-training/","published":"2025-07-13","authors":["Minhak Song","Beomhan Baek","Kwangjun Ahn","Chulhee Yun"],"abstract":"As both model and dataset sizes continue to scale rapidly, conventional pretraining strategies with fixed compute budgets-such as cosine learning rate schedules-are increasingly inadequate for large-scale training. Recent alternatives, including warmup-stable-decay (WSD) schedules and weight averaging, offer greater flexibility. However, WSD relies on explicit decay phases to track progress, while weight averaging addresses this limitation at the cost of additional memory. In search of a more principled and scalable alternative, we revisit the Schedule-Free (SF) method [Defazio et al., 2024], which has shown strong empirical performance across diverse settings. We show that SF-AdamW effectively navigates the \"river\" structure of the loss landscape without decay phases or auxiliary averaging, making it particularly suitable for continuously scaling training workloads. 
To understand this b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Language model","mathematics","1970-01-01","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/minerva-a-programmable-memory-test-benchmark-for-language-models","title":"Minerva: A Programmable Memory Test Benchmark for Language Models","url":"https://www.microsoft.com/en-us/research/publication/minerva-a-programmable-memory-test-benchmark-for-language-models/","published":"2025-07-13","authors":["Menglin Xia","Victor Ruehle","Saravan Rajmohan","Reza Shokri"],"abstract":"How effectively can LLM-based AI assistants utilize their memory (context) to perform various tasks? Traditional data benchmarks, which are often manually crafted, suffer from several limitations: they are static, susceptible to overfitting, difficult to interpret, and lack actionable insights--failing to pinpoint the specific capabilities a model lacks when it does not pass a test. In this paper, we present a framework for automatically generating a comprehensive set of tests to evaluate models' abilities to use their memory effectively. Our framework extends the range of capability tests beyond the commonly explored (passkey, key-value, needle in the haystack) search, a dominant focus in the literature. 
Specifically, we evaluate models on atomic tasks such as searching, recalling, editing, matching, comparing information in context memory, and performing basic operations when inputs ar...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mminference-accelerating-pre-filling-for-long-context-vlms-via-modality-aware-permutation-sparse-attention","title":"MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention","url":"https://www.microsoft.com/en-us/research/publication/mminference-accelerating-pre-filling-for-long-context-vlms-via-modality-aware-permutation-sparse-attention/","published":"2025-07-13","authors":["Yucheng Li","Huiqiang Jiang","Chengruidong Zhang","Qianhui Wu","Xufang Luo","Surin Ahn","Amir H. Abdi","Dongsheng Li","Jianfeng Gao","Yuqing Yang","Lili Qiu"],"abstract":"The integration of long-context capabilities with visual understanding unlocks unprecedented potential for Vision Language Models (VLMs). However, the quadratic attention complexity during the pre-filling phase remains a significant obstacle to real-world deployment. 
To overcome this limitation, we introduce MMInference (Multimodality Million tokens Inference), a dynamic sparse attention method that accelerates the prefilling stage for long-context multi-modal inputs. First, our analysis reveals that the temporal and spatial locality of video input leads to a unique sparse pattern, the Grid pattern. Simultaneously, VLMs exhibit markedly different sparse distributions across different modalities. We introduce a permutation-based method to leverage the unique Grid pattern and handle modality boundary issues. By offline search the optimal sparse patterns for each head, MMInference construct...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computation and Language","large language models","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2601.20048","title":"Insight Agents: An LLM-Based Multi-Agent System for Data Insights","url":"http://arxiv.org/abs/2601.20048","published":"2025-07-13","authors":["Jincheng Bai","Zhenyu Zhang","Jinlian ZHANG","Jason Zhu"],"abstract":"Today, E-commerce sellers face several key challenges, including difficulties in discovering and effectively utilizing available programs and tools, and struggling to understand and utilize rich data from various tools. 
We therefore aim to develop Insight Agents (IA), a conversational multi-agent Data Insight system, to provide E-commerce sellers with personalized data and business insights through automated information retrieval. Our hypothesis is that IA will serve as a force multiplier for sellers, thereby driving incremental seller adoption by reducing the effort required and increase speed at which sellers make good business decisions. In this paper, we introduce this new LLM-backed end-to-end agentic workflow designed for comprehensive coverage, high accuracy, and low latency. It features a hierarchical multi-agent structure, consisting of manager agent and two worker agents: data....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3731959","openalex_id":"https://openalex.org/W4412394909","cited_by_count":0,"quality_score":61,"matched_keywords":["LLM","personalized","retrieval","efficient","agent","multi-agent"],"author_affiliations":["Amazon (United States)","Seattle University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6642266511917114},{"id":"https://openalex.org/C41550386","display_name":"Multi-agent system","score":0.5086637139320374},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.19682317972183228}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2503.19092","title":"Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation","url":"http://arxiv.org/abs/2503.19092","published":"2025-07-13","authors":["Krisztian Balog","Donald Metzler","Zhen Qin"],"abstract":"Large language models (LLMs) are increasingly integral to information 
retrieval (IR), powering ranking, evaluation, and AI-assisted content creation. This widespread adoption necessitates a critical examination of potential biases arising from the interplay between these LLM-based components. This paper synthesizes existing research and presents novel experiment designs that explore how LLM-based rankers and assistants influence LLM-based judges. We provide the first empirical evidence of LLM judges exhibiting significant bias towards LLM-based rankers. Furthermore, we observe limitations in LLM judges' ability to discern subtle system performance differences. Contrary to some previous findings, our preliminary study does not find evidence of bias against AI-generated content. These results highlight the need for a more holistic view of the LLM-driven information ecosystem. To this end,....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730348","openalex_id":"https://openalex.org/W4412377928","cited_by_count":14,"quality_score":59,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5254147052764893},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.331204891204834},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.32733625173568726}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4412376921","title":"LLM-Empowered Creator Simulation for Long-Term Evaluation of Recommender Systems Under Information Asymmetry","url":"https://doi.org/10.1145/3726302.3730026","published":"2025-07-13","authors":["Xiaopeng Ye","Xu Chen","Zhongxiang 
Sun","Jun Xu","Gang Wang","Zhenhua Dong","Ji-Rong Wen"],"abstract":"Maintaining the long-term sustainability of recommender systems (RS) is crucial. Traditional RS evaluation methods primarily focus on the user's immediate feedback (e.g., click), however, they often overlook the long-term effect involved by the content creators. In the real world, content creators can strategically create and upload new items to the platform by analyzing users' feedback and preference trends. Although previous studies have attempted to model creator behaviors, they often overlook that such behaviors are under conditions of information asymmetry. This asymmetry arises because creators mainly access the user feedback on the items they produce, while the platform has access to the full spectrum of feedback data. However, existing RS simulators often fail to consider such a condition, making the long-term RS evaluation inaccurate. To bridge this gap, we propose a Large Language Mo...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730026","openalex_id":"https://openalex.org/W4412376921","cited_by_count":1,"quality_score":58,"matched_keywords":["LLM","language model","preference","long-term","agent"],"author_affiliations":["Beijing Academy of Artificial Intelligence","Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8219081163406372},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.7813275456428528},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7023584246635437},{"id":"https://openalex.org/C38976095","display_name":"Asymmetry","score":0.6452593803405762},{"id":"https://openalex.org/C137577040","display_name":"Information 
asymmetry","score":0.46459269523620605},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.38716840744018555},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.06464406847953796},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"official:333b585579bd3822","title":"SLIM: ONE-SHOT QUANTIZED SPARSE PLUS LOW-RANK APPROXIMATION OF LLMS","url":"https://deepmind.google/research/publications/148040/","published":"2025-07-13","authors":["Google/DeepMind"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind publications page https://deepmind.google/research/publications/"}},{"id":"openalex:W4412377067","title":"MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation","url":"https://doi.org/10.1145/3726302.3730288","published":"2025-07-13","authors":["Qinhan Yu","Zhiyou Xiao","Binghui Li","Zhengren Wang","Chong Chen","Wentao Zhang"],"abstract":"Recent advances in Retrieval-Augmented Generation (RAG) have significantly improved response accuracy and relevance by incorporating external knowledge into Large Language Models (LLMs). However, existing RAG methods primarily focus on generating text-only answers, even in Multimodal Retrieval-Augmented Generation (MRAG) scenarios, where multimodal elements are retrieved to assist in generating text answers. 
To address this, we introduce the Multimodal Retrieval-Augmented Multimodal Generation (MRAMG) task, in which we aim to generate multimodal answers that combine both text and images, fully leveraging the multimodal data within a corpus. Despite growing attention to this challenging task, a notable lack of a comprehensive benchmark persists for effectively evaluating its performance. To bridge this gap, we provide MRAMG-Bench, a meticulously curated, human-annotated benchmark comprisi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730288","openalex_id":"https://openalex.org/W4412377067","cited_by_count":3,"quality_score":52,"matched_keywords":["LLM","retrieval","efficient"],"author_affiliations":["Huawei Technologies (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.8140205144882202},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.733521580696106},{"id":"https://openalex.org/C4441509","display_name":"Multimodal therapy","score":0.5884115695953369},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43731772899627686},{"id":"https://openalex.org/C135641252","display_name":"Multimodal interaction","score":0.4234354496002197},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.32850271463394165},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.05395665764808655},{"id":"https://openalex.org/C141071460","display_name":"Surgery","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"official:4d8469b16bab8d44","title":"Long-Form Speech Generation 
with Spoken Language Models","url":"https://deepmind.google/research/publications/126936/","published":"2025-07-13","authors":["Google/DeepMind"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind publications page https://deepmind.google/research/publications/"}},{"id":"official:0813e4e955207944","title":"Large Language Models as Rankers, Judges, and Assistants: A Perspective on the Potential Over-Reliance on LLMs in IR","url":"https://deepmind.google/research/publications/147939/","published":"2025-07-13","authors":["Google/DeepMind"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind publications page https://deepmind.google/research/publications/"}},{"id":"arxiv:2509.07594","title":"ELEC: Efficient Large Language Model-Empowered Click-Through Rate Prediction","url":"http://arxiv.org/abs/2509.07594","published":"2025-07-13","authors":["Rui Dong","Wentao Ouyang","Xiangzheng Liu"],"abstract":"Click-through rate (CTR) prediction plays an important role in online advertising systems. 
On the one hand, traditional CTR prediction models capture the collaborative signals in tabular data via feature interaction modeling, but they lose semantics in text. On the other hand, Large Language Models (LLMs) excel in understanding the context and meaning behind text, but they face challenges in capturing collaborative signals and they have long inference latency. In this paper, we aim to leverage the benefits of both types of models and pursue collaboration, semantics and efficiency. We present ELEC, which is an Efficient LLM-Empowered CTR prediction framework. We first adapt an LLM for the CTR prediction task. In order to leverage the ability of the LLM but simultaneously keep efficiency, we utilize the pseudo-siamese network which contains a gain network and a vanilla network. We inject t...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730188","openalex_id":"https://openalex.org/W4412376982","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7313045263290405},{"id":"https://openalex.org/C115174607","display_name":"Click-through rate","score":0.518508791923523},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5118376612663269},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.35231560468673706},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.33158227801322937},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3307472765445709},{"id":"https://openalex.org/C136764020","display_name":"World Wide 
Web","score":0.10818007588386536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412394954","title":"Language Model Alignment for Conversational Shopping at Amazon","url":"https://doi.org/10.1145/3726302.3731955","published":"2025-07-13","authors":["Chen Luo","Dimitri Papadimitriou","Hariharan Muralidharan","Dhineshkumar Ramasubbu","Aakash Kolekar","Wenju Xu","Cong Xu","Anirudh Srinivasan","Mukesh Jain","Qi He"],"abstract":"The rapid growth of online shopping stores, such as Amazon, has led to services reaching billions of people worldwide. With global retail sales exceeding $6 trillion in 2024, customer expectations for personalized and seamless shopping experiences have heightened. Traditional online shopping experiences, such as search and navigation systems, often fall short in addressing complex shopping journeys. Conversational shopping (such as Amazon Rufus) offers a transformative approach by enabling dynamic, multi-turn dialogues that closely resemble human interactions. This allows customers to explore product options, seek clarifications, and receive personalized recommendations, thereby enhancing product discovery and informed decision-making. 
In this paper, we share our year-long journey of using language models for conversational shopping at Amazon and introduce how we use LLM fine-tuning tech...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3731955","openalex_id":"https://openalex.org/W4412394954","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","personalized"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C535291247","display_name":"Amazon rainforest","score":0.8477104902267456},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7036004066467285},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44870415329933167},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3728969693183899},{"id":"https://openalex.org/C18903297","display_name":"Ecology","score":0.06904816627502441},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412378188","title":"GENNEXT: The Next Generation of IR and Recommender Systems with Language Agents, Generative Models, and Conversational AI","url":"https://doi.org/10.1145/3726302.3730369","published":"2025-07-13","authors":["Yashar Deldjoo","Scott Sanner","Enrico Palumbo","Hugues Bouchard","Shuai Zhang","Pablo Castells","Julian McAuley"],"abstract":"We present GENNEXT, a workshop dedicated to exploring the integration of language agents, generative models, and conversational AI within information retrieval (IR) and recommender systems (RS). 
Building on the success of our recent RecSys'24 workshop, GENNEXT aims to advance discussions on the applications of language agents powered by Large Language Models (LLMs). The workshop will focus on enhancing interactivity between users and systems through multi-turn dialogues, improving creative content generation, advancing personalization, and enabling multifaceted, context-aware decision-making. For example, a language agent could respond to a query like ''Suggest an eco-friendly food tour for a weekend in my city'' by using a recommendation API to identify eateries specializing in sustainable or organic cuisine and a pollution API to ensure the selected routes have low air pollution levels...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730369","openalex_id":"https://openalex.org/W4412378188","cited_by_count":0,"quality_score":49,"matched_keywords":["personalization","retrieval","agent"],"author_affiliations":["Amazon (United States)","Polytechnic University of Bari","UC San Diego Health System","Universidad Autónoma de Madrid","University of Toronto"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7805651426315308},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7590281963348389},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7485470771789551},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5265591144561768},{"id":"https://openalex.org/C190954187","display_name":"Dialog system","score":0.5167064666748047},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.511816680431366},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.35290011763572693},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.340767502784729}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412377794","title":"Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation","url":"https://doi.org/10.1145/3726302.3730112","published":"2025-07-13","authors":["Yuhao Wang","Ruiyang Ren","Yucheng Wang","Wayne Xin Zhao","Jing Liu","Hua Wu","Haifeng Wang"],"abstract":"Considering the inherent limitations of parametric knowledge in large language models (LLMs), retrieval-augmented generation (RAG) is widely employed to expand their knowledge scope. Since RAG has shown promise in knowledge-intensive tasks like open-domain question answering, its broader application to complex tasks and intelligent assistants has further advanced its utility. Despite this progress, the underlying knowledge utilization mechanisms of LLM-based RAG remain underexplored. In this paper, we present a systematic investigation of the intrinsic mechanisms by which LLMs integrate internal (parametric) and external (retrieved) knowledge in RAG scenarios. Specially, we employ knowledge stream analysis at the macroscopic level, and investigate the function of individual modules at the microscopic level. 
Drawing on knowledge streaming analyses, we decompose the knowledge utilization p...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730112","openalex_id":"https://openalex.org/W4412377794","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Baidu (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6773676872253418},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.38956695795059204}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412377058","title":"The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models","url":"https://doi.org/10.1145/3726302.3730090","published":"2025-07-13","authors":["Ronak Pradeep","Nandan Thakur","Shivani Upadhyay","Daniel Campos","Nick Craswell","Ian Soboroff","Hoa Trang Dang","Jimmy Lin"],"abstract":"Large Language Models (LLMs) have significantly enhanced the capabilities of information access systems, especially with retrieval-augmented generation (RAG). Nevertheless, the evaluation of RAG systems remains a barrier to continued progress, a challenge we tackle in this work by proposing an automatic evaluation framework that is validated against human annotations. We believe that the nugget evaluation methodology provides a solid foundation for evaluating RAG systems. This approach, originally developed for the TREC Question Answering (QA) Track in 2003, evaluates systems based on atomic facts that should be present in good answers. 
Our efforts focus on ''refactoring'' this methodology, where we describe the AutoNuggetizer framework that specifically applies LLMs to both automatically create nuggets and automatically assign nuggets to system answers. In the context of the TREC 2024 R...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730090","openalex_id":"https://openalex.org/W4412377058","cited_by_count":6,"quality_score":47,"matched_keywords":["retrieval"],"author_affiliations":["College of San Mateo","Microsoft (United States)","Seattle University","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7547250986099243},{"id":"https://openalex.org/C100660578","display_name":"Recall","score":0.6316668391227722},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.520248532295227},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5000150203704834},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.49872398376464844},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4285638630390167},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3673613667488098},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.1209297776222229}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4412377049","title":"Reason-to-Rank: Distilling Direct and Comparative Reasoning from Large Language Models for Document Reranking","url":"https://doi.org/10.1145/3726302.3730070","published":"2025-07-13","authors":["Yuelyu Ji","Zhuochun Li","Rui 
Meng","Daqing He"],"abstract":"Reranking documents in information retrieval often relies on black-box models that improve effectiveness but lack explainability. We introduce Reason-to-Rank (R2R), a novel framework that separates direct relevance reasoning from comparison reasoning to provide both direct and comparitive explanations. We first prompt a large language model to produce comprehensive rationales and a ranking order; then we distill both the ranking decisions and textual explanations into a smaller, open-source student model. Our approach not only improves retrieval performance, as demonstrated in MSMARCO, BEIR, and BRIGHT, but also provides interpretable justifications for why one document outranks another. We report NDCG@5 (and NDCG@10) for direct comparisons with prior work, and show that the distilled student model achieves competitive results while significantly reducing computational overhead. By unify...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730070","openalex_id":"https://openalex.org/W4412377049","cited_by_count":2,"quality_score":47,"matched_keywords":["language model","retrieval"],"author_affiliations":["Google (United States)","University of Pittsburgh"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7322608232498169},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.6431009769439697},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4327136278152466},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4311051368713379},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.40193697810173035},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.06716346740722656},{"id":"https://openalex.org/C114614502","display_name":"Combinatorics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412394824","title":"Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding","url":"https://doi.org/10.1145/3726302.3729961","published":"2025-07-13","authors":["Yunjia Xi","Hangyu Wang","Bo Chen","Jianghao Lin","Menghui Zhu","Weiwen Liu","Ruiming Tang","Zhewei Wei","Weinan Zhang","Yong Yu"],"abstract":"The past few years have witnessed a growing interest in LLM-based recommender systems (RSs), although their industrial deployment remains in a preliminary stage. Most existing deployments leverage LLMs offline as feature enhancers, generating augmented knowledge for downstream tasks. However, in recommendation scenarios with numerous users and items, even offline knowledge generation with LLMs demands significant time and computational resources. This inefficiency arises from the autoregressive nature of LLMs. A promising solution is speculative decoding, a Draft-Then-Verify approach that increases the number of tokens generated per decoding step. In this work, we first identify recommendation knowledge generation as a highly fitting use case for retrieval-based speculative decoding. 
Then, we discern its two characteristics: (1) the vast number of items and users in RSs leads to retrieva...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3729961","openalex_id":"https://openalex.org/W4412394824","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Huawei Technologies (China)","Renmin University of China","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7700895071029663},{"id":"https://openalex.org/C117896860","display_name":"Acceleration","score":0.7381864786148071},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7340912222862244},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.6787075996398926},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.659202516078949},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.31870192289352417},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.28851407766342163},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.20764762163162231}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412394843","title":"Embracing Plasticity: Balancing Stability and Plasticity in Continual Recommender Systems","url":"https://doi.org/10.1145/3726302.3729964","published":"2025-07-13","authors":["Hyunsik Yoo","SeongKu Kang","Ruizhong Qiu","Charlie Xu","Fei Wang","Hanghang Tong"],"abstract":"In the era of big data and AI, recommender systems must adapt to evolving user preferences and new users/items to 
maintain high-quality recommendations. Fine-tuning, which updates model parameters using only new data, offers an efficient alternative to full retraining but struggles to balance stability (retaining past knowledge) and plasticity (adapting to new knowledge). While existing methods prioritize stability to address catastrophic forgetting, we argue that plasticity must also be explicitly strengthened, especially for users with rapidly changing preferences. In this work, we propose PlastIcity and StAbility balancing continual recommender systems (PISA), a novel framework that adaptively balances stability and plasticity based on user preference shifts. PISA quantifies preference shifts as changes in user distances to item clusters, and then guides user embeddings by prioritizin...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3729964","openalex_id":"https://openalex.org/W4412394843","cited_by_count":1,"quality_score":46,"matched_keywords":["preference","efficient"],"author_affiliations":["Amazon (United States)","Korea University","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C79186407","display_name":"Plasticity","score":0.7597784996032715},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6605616807937622},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.6500638723373413},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.5113245844841003},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.20433184504508972},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.12376123666763306},{"id":"https://openalex.org/C192562407","display_name":"Materials 
science","score":0.10712820291519165},{"id":"https://openalex.org/C159985019","display_name":"Composite material","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412376941","title":"Assessing Support for the TREC 2024 RAG Track: A Large-Scale Comparative Study of LLM and Human Evaluations","url":"https://doi.org/10.1145/3726302.3730165","published":"2025-07-13","authors":["Nandan Thakur","Ronak Pradeep","Shivani Upadhyay","Daniel Campos","Nick Craswell","Ian Soboroff","Hoa Trang Dang","Jimmy Lin"],"abstract":"Retrieval-augmented generation (RAG) enables large language models (LLMs) to generate answers with citations from source documents containing ''ground truth''. A crucial factor in RAG evaluation is ''support'', or whether the information in the cited documents supports the answer. We conducted a comparative study of submissions to the TREC 2024 RAG Track, evaluating an automatic LLM judge (GPT-4o) against human judges for support assessment. We considered two conditions: (1) fully manual assessments from scratch and (2) manual assessments with post-editing of LLM predictions. Our results indicate good agreement between human and GPT-4o predictions. Further analysis of the disagreements shows that an independent human judge correlates better with GPT-4o than a human judge, suggesting that LLM judges can be a reliable alternative for support assessment. 
We provide a qualitative analysis of...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730165","openalex_id":"https://openalex.org/W4412376941","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","retrieval"],"author_affiliations":["College of San Mateo","Microsoft (United States)","Seattle University","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7399290800094604},{"id":"https://openalex.org/C89992363","display_name":"Track (disk drive)","score":0.7322548627853394},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5840228199958801},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.37464791536331177},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32952234148979187},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.14653286337852478},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.0869610607624054},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.07087117433547974}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412394948","title":"Retrieval-Augmented Image Captioning and Generation with Entity Concepts Enhancement for Baidu Multimodal Advertising","url":"https://doi.org/10.1145/3726302.3731957","published":"2025-07-13","authors":["Lei Shen","Kang Zhao","Zhipeng Jin","Wen Tao","Yi Yang","Cong Han","Shuanglong Li","Zhongmin Cai","Lin Liu"],"abstract":"Recent advancements in generative artificial intelligence are driving a significant transformation in information retrieval and 
content generation, creating substantial opportunities for online advertising. Text-to-image generation technology has become increasingly prevalent in advertising content production, demonstrating promising performance improvements in terms of semantic relevance and visual appeal. However, existing models often suffer from inadequate representation of entity concepts, such as prominent product brands and recognizable landmarks. This inherent limitation subsequently leads to notable deficiencies in brand tonality, industry-specific relevance, and market adaptability of the generated advertising content. To address this challenge, we propose a multimodal ad content generation framework specifically engineered for online advertising system, particularly focused on...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3731957","openalex_id":"https://openalex.org/W4412394948","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Baidu (China)","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9625764489173889},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7973471879959106},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5231608152389526},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4667993187904358},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4505501389503479},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3909912109375},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3693086504936218},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.33908113837242126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412376952","title":"Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models","url":"https://doi.org/10.1145/3726302.3730059","published":"2025-07-13","authors":["Yuhao Wang","Junwei Pan","Pengyue Jia","Wanyu Wang","Maolin Wang","Zhixiang Feng","X. Li","Jie Jiang","Xiangyu Zhao"],"abstract":"Sequential Recommendation (SR) aims to leverage the sequential patterns in users' historical interactions to accurately track their preferences. However, the primary reliance of existing SR methods on collaborative data results in challenges such as the cold-start problem and sub-optimal performance. Concurrently, despite the proven effectiveness of large language models (LLMs), their integration into commercial recommender systems is impeded by issues such as high inference latency, incomplete capture of all distribution statistics, and catastrophic forgetting. To address these issues, we introduce a novel Pre-train, Align, and Disentangle (PAD) framework to enhance SR models with LLMs. In particular, we initially pre-train both the SR and LLM models to obtain collaborative and textual embeddings. 
Subsequently, we propose a characteristic recommendation-anchored alignment loss using mul...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730059","openalex_id":"https://openalex.org/W4412376952","cited_by_count":4,"quality_score":45,"matched_keywords":["LLM"],"author_affiliations":["City University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.758507490158081},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.48997417092323303},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4000636339187622},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34872183203697205}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4412378033","title":"Low-Cost Document Retrieval with Dense Pseudo-Query Encoding","url":"https://doi.org/10.1145/3726302.3730227","published":"2025-07-13","authors":["Shanxiu He","Wentai Xie","Yifan Qiao","Parker Carlson","Tao Yang"],"abstract":"Low-cost retrieval is crucial for document search on resource-limited computing platforms. This paper presents a staged sparse-to-dense retrieval framework that substitutes expensive dense query encoding with a dense pseudo-query (DPQ), an approximation derived solely from sparse retrieval results. DPQ scheme employs a simple, rank-aware weighting to combine corresponding dense representations of top sparse results, providing an opportunity to efficiently leverage an expensive but expressive LLM or BERT-based dense model without requiring GPUs. 
The evaluation demonstrates that DPQ-based retrieval runs fast on an affordable platform and outperforms several low-cost baselines in zero-shot retrieval.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730227","openalex_id":"https://openalex.org/W4412378033","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Apple (United States)","University of California, Santa Barbara"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7970525026321411},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.7383633852005005},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6283611059188843},{"id":"https://openalex.org/C99016210","display_name":"Query expansion","score":0.4398234784603119},{"id":"https://openalex.org/C161156560","display_name":"Document retrieval","score":0.42652779817581177},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.24002593755722046}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412376966","title":"Comprehending Knowledge Graphs with Large Language Models for Recommender Systems","url":"https://doi.org/10.1145/3726302.3729932","published":"2025-07-13","authors":["Ziqiang Cui","Yunpeng Weng","Xing Tang","Fuyuan Lyu","Dugang Liu","Xiuqiang He","Chen Ma"],"abstract":"In recent years, the introduction of knowledge graphs (KGs) has significantly advanced recommender systems by facilitating the discovery of potential associations between items. However, existing methods still face several limitations. First, most KGs suffer from missing facts or limited scopes. 
Second, existing methods convert textual information in KGs into IDs, resulting in the loss of natural semantic connections between different items. Third, existing methods struggle to capture high-order connections in the global KG. To address these limitations, we propose a novel method called CoLaKG, which leverages large language models (LLMs) to improve KG-based recommendations. The extensive knowledge and remarkable reasoning capabilities of LLMs enable our method to supplement missing facts in KGs, and their powerful text understanding abilities allow for better utilization of semantic inf...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3729932","openalex_id":"https://openalex.org/W4412376966","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["City University of Hong Kong","McGill University","Shenzhen Technology University","Shenzhen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8342345952987671},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7923837900161743},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.6424206495285034},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4267211854457855},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42611005902290344},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.36377668380737305},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3234899044036865}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":4}},{"id":"openalex:W4412376954","title":"Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction","url":"https://doi.org/10.1145/3726302.3730285","published":"2025-07-13","authors":["Jingfen Qiao","Jia-Huei Ju","Xinyu Ma","Evangelos Kanoulas","Andrew Yates"],"abstract":"Visual Document Retrieval (VDR) is an emerging research area that focuses on encoding and retrieving document images directly, bypassing the dependence on Optical Character Recognition (OCR) for document search. A recent advance in VDR was introduced by ColPali, which significantly improved retrieval effectiveness through a late interaction mechanism. ColPali's approach demonstrated substantial performance gains over existing baselines that do not use late interaction on an established benchmark. In this study, we investigate the reproducibility and replicability of VDR methods with and without late interaction mechanisms by systematically evaluating their performance across multiple pre-trained vision-language models. 
Our findings confirm that late interaction yields considerable improvements in retrieval effectiveness; however, it also introduces computational inefficiencies during inf...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730285","openalex_id":"https://openalex.org/W4412376954","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Amsterdam University of the Arts","Baidu (China)","Johns Hopkins University","University of Amsterdam"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7341356873512268},{"id":"https://openalex.org/C9893847","display_name":"Reproducibility","score":0.6952590346336365},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5764378905296326},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.07781809568405151},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.07720154523849487}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412377821","title":"From To-Do to Ta-Da: Transforming Task-Focused IR with Generative AI","url":"https://doi.org/10.1145/3726302.3730352","published":"2025-07-13","authors":["Chirag Shah","Ryen W. 
White"],"abstract":"For decades, scholars have emphasized that tasks should be the central focus in Information Retrieval (IR). This point of view holds even more significance with the advent of Generative Artificial Intelligence (GenAI) models, which can, among other capabilities, understand natural language, engage in dialog with users, generate bespoke user interfaces, and power agents to help complete tasks. GenAI presents an unprecedented opportunity to finally realize the potential of tasks in IR, enhance task-focused retrieval and interaction, and create \"magical\" task completion moments for users. In this paper, we explore the rationale and methodology behind this argument. Traditional IR systems support mostly simple tasks. The emergence of GenAI creates an opportunity for IR systems to help users achieve complex tasks and for the IR community to rekindle its interest and demonstrate leadership in this....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730352","openalex_id":"https://openalex.org/W4412377821","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Microsoft (United States)","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6625775098800659},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5239225029945374},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5042742490768433},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4268026053905487},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09401810169219971},{"id":"https://openalex.org/C201995342","display_name":"Systems
engineering","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412377863","title":"Action First: Leveraging Preference-Aware Actions for More Effective Decision-Making in Interactive Recommender Systems","url":"https://doi.org/10.1145/3726302.3729885","published":"2025-07-13","authors":["Renting Rui","Yunjia Xi","Weiwen Liu","Jianghao Lin","Bo Chen","Ruiming Tang","Weinan Zhang","Yong Yu"],"abstract":"Interactive recommender systems (IRSs) aim to meet user needs through natural language dialogues, optimizing recommendations with minimal interactions. Typically, IRSs are based on large language models (LLMs). Existing methods generally consist of two stages: decision-making (deciding whether to recommend or ask clarification questions) and action execution (generating recommendations or clarification questions). These methods usually follow a decision-first paradigm, where the model first decides on the action based on past conversations, and then executes the corresponding action. 
Since LLMs struggle to process a large number of candidate items, the recommendation process is often carried out in collaboration with external recommendation tools, which provide a small candidate set for LLMs to refine.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3729885","openalex_id":"https://openalex.org/W4412377863","cited_by_count":1,"quality_score":42,"matched_keywords":["preference"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7917800545692444},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7691904306411743},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.7547701001167297},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.6069954037666321},{"id":"https://openalex.org/C2777868144","display_name":"Preference elicitation","score":0.45062002539634705},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.382651150226593},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.278983473777771},{"id":"https://openalex.org/C175444787","display_name":"Microeconomics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412394950","title":"SuperRS: Multi Scenario Reciprocal-Aware Dual MoE for Unified Recommendation-Search Ranking","url":"https://doi.org/10.1145/3726302.3731949","published":"2025-07-13","authors":["Zihan Xia","Chuanyu Xu","Tao Zhang","Chengfu Huo"],"abstract":"In e-commerce, search and recommendation rankings require a deep 
understanding of user behaviors and personalized scoring of products. While existing systems maintain separate pipelines for search and recommendation, these two scenarios share aligned objectives and exhibit consistent data patterns during ranking. To address this, we propose a joint modeling approach for search-recommendation ranking that enables information gain exchange between the two scenarios, thus facilitating enhanced modeling of users' cross-scenario behaviors. Our proposed SuperRS framework employs a Dual-layer Multi-MoE (DualMoE) architecture to tackle scenario-specific disparities and achieve multi-interest fusion perception. A key aspect is the Search-Recommendation Sequence Fusion Unit, which integrates user interaction sequences from both scenarios. Additionally, we introduce a unified Representation Extract...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3731949","openalex_id":"https://openalex.org/W4412394950","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C2777742833","display_name":"Reciprocal","score":0.7339165806770325},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.679352343082428},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.6726736426353455},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.6626576781272888},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.34827229380607605},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.0},{"id":"https://openalex.org/C124952713","display_name":"Literature","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412396346","title":"IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents","url":"https://doi.org/10.1145/3726302.3730300","published":"2025-07-13","authors":["Shrestha Mohanty","Negar Arabzadeh","Andrea Tupini","Yuxuan Sun","Alexey Skrynnik","Artem Zholus","Marc-Alexandre Côté","Julia Kiseleva"],"abstract":"Seamless interaction between AI agents and humans using natural language remains a key goal in AI research. This paper addresses the challenges of developing interactive agents capable of understanding and executing grounded natural language instructions through the IGLU competition. Despite advancements, challenges such as a scarcity of appropriate datasets and the need for effective evaluation platforms persist. We introduce a scalable data collection tool for gathering interactive grounded language instructions within a Minecraft-like environment, resulting in a Multi-Modal dataset with around 9,000 utterances and over 1,000 clarification questions. Additionally, we present a Human-in-the-Loop interactive evaluation platform for qualitative analysis and comparison of agent performance through multi-turn communication with human annotators. 
We offer to the community these assets referr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730300","openalex_id":"https://openalex.org/W4412396346","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Alpha Omega Alpha Medical Honor Society","Massachusetts Institute of Technology","Menlo School","Microsoft (Canada)","Microsoft (United States)","Microsoft Research Montréal (Canada)","Polytechnique Montréal","St. Franziskus Hospital","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8252288103103638},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7325552105903625},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7174638509750366},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5717808604240417},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.43216729164123535},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33362916111946106},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.1171618402004242},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.0698176920413971}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2504.12920","title":"CSMF: Cascaded Selective Mask Fine-Tuning for Multi-Objective Embedding-Based Retrieval","url":"http://arxiv.org/abs/2504.12920","published":"2025-07-13","authors":["Hao Deng","Haibo Xing","Kanefumi Matsuyama","Moyu Zhang","Jinxin Hu","Hong Wen","Yu Zhang","Xiaoyi Zeng","Jing 
Zhang"],"abstract":"Multi-objective embedding-based retrieval (EBR) has become increasingly critical due to the growing complexity of user behaviors and commercial objectives. While traditional approaches often suffer from data sparsity and limited information sharing between objectives, recent methods utilizing a shared network alongside dedicated sub-networks for each objective partially address these limitations. However, such methods significantly increase the model parameters, leading to an increased retrieval latency and a limited ability to model causal relationships between objectives. To address these challenges, we propose the Cascaded Selective Mask Fine-Tuning (CSMF), a novel method that enhances both retrieval efficiency and serving performance for multi-objective EBR. The CSMF framework selectively masks model parameters to free up independent learning space for each objective, leveraging the....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3729939","openalex_id":"https://openalex.org/W4412377141","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6939011216163635},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.6637738943099976},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3892343044281006}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412377072","title":"Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for Deep 
Research","url":"https://doi.org/10.1145/3726302.3730275","published":"2025-07-13","authors":["Corby Rosset","Hyunsong Chung","Guanghui Qin","Ethan C. Chau","Zhuo Feng","Ahmed Hassan Awadallah","Jennifer Neville","Nikhil Rao"],"abstract":"Existing question answering (QA) datasets are no longer challenging to most powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA, NaturalQuestions, ELI5 and HotpotQA mainly study ''known unknowns'' with clear indications of both what information is missing, and how to find it to answer the question. A yet unmet need of the NLP community is a bank of non-factoid, multi-perspective questions involving a great deal of unclear information needs, i.e. ''unknown unknowns''. We claim we can find such questions in search engine logs, which is surprising because most question-intent queries are indeed factoid. Furthermore, recent products like Google's DeepResearch (announced a year after this resource was released publicly) specifically address such queries, retrieving hundreds of documents to synthesize report-style responses. 
We present Researchy Questions, the world'...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3726302.3730275","openalex_id":"https://openalex.org/W4412377072","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Johns Hopkins University","Microsoft (United States)","National Taiwan University"],"concepts":[{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.7888004779815674},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6115527749061584},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4864034950733185},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4615137577056885},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3494557738304138}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/orchestration-for-domain-specific-edge-cloud-language-models","title":"Orchestration for Domain-specific Edge-Cloud Language Models","url":"https://www.microsoft.com/en-us/research/publication/orchestration-for-domain-specific-edge-cloud-language-models/","published":"2025-07-11","authors":["Prasoon Patidar","Alex Crown","Kevin Hsieh","Yifei Xu","Tusher Chakraborty","Ranveer Chandra","Yuvraj Agarwal"],"abstract":"The remarkable performance of Large Language Models (LLMs) has inspired many applications, which often necessitate edge-cloud collaboration due to connectivity, privacy, and cost considerations. 
Traditional methods primarily focus on selecting the best LLM model for optimizing performance, while neglecting the critical interplay between the components of the LLM serving pipeline (context retrieval, query preprocessing, etc.) or the changing latency and cost constraints. We introduce ECO-LLM (Edge-Cloud Orchestrator for LLMs), a novel system that reframes this problem as a joint optimization challenge and solves it by systematically exploring component configurations and dynamically selecting optimal strategies at the query level. ECO-LLM consists of two components: (1) the ECO-LLM Emulator, which efficiently explores the vast configuration space utilizing query clustering and pareto-opti...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:urqzc2lbuf028dbgm02c8sx4","title":"CommVQ: Commutative Vector Quantization for KV Cache Compression","url":"https://machinelearning.apple.com/research/commutative-vector-quantization","published":"2025-07-11","authors":["Junyan Li","Tianle Cai","Yang Zhang§","Muhammad Yusuf Hassan","Talha Chafekar","Colorado Reed","Zhile Ren","Pengsheng Guo","Binazir Karimzadeh","Chong Wang","Chuang Gan"],"abstract":"Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context lengths grow.
To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long context LLM inference. First, we leverage additive quantization by introducing a lightweight encoder and codebook to compress the KV...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["LLM","memory","compression","quantization"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:i5od3bofc56lpxicwtn2wmge","title":"QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache","url":"https://machinelearning.apple.com/research/quantspec","published":"2025-07-11","authors":["Rishabh Tiwari","Haocheng Xi","Aditya Tomar","Coleman Hooper","Sehoon Kim","Maxwell Horton","Mahyar Najibi","Michael W. Mahoney§","Kurt Keutzer","Amir Gholami"],"abstract":"Large Language Models (LLMs) are increasingly being deployed on edge devices for long-context settings, creating a growing need for fast and efficient long-context inference. In these scenarios, the Key-Value (KV) cache is the primary bottleneck in terms of both GPU memory and latency, as the full KV cache must be loaded for each decoding step.
While speculative decoding is a widely accepted technique to accelerate autoregressive decoding,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["memory","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:iwioq9a9pakqekft4mhb9n51","title":"Point-3D LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models","url":"https://machinelearning.apple.com/research/pts3d-llm","published":"2025-07-11","authors":["Hugues Thomas","Chen Chen","Jian Zhang"],"abstract":"Effectively representing 3D scenes for Multimodal Large Language Models (MLLMs) is crucial yet challenging. Existing approaches commonly only rely on 2D image features and use varied tokenization approaches. This work presents a rigorous study of 3D token structures, systematically comparing video-based and point-based representations while maintaining consistent model backbones and parameters. 
We propose a novel approach that enriches visual...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:rgv5007utz4rqiydwhpiqxri","title":"Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion","url":"https://machinelearning.apple.com/research/target-concrete","published":"2025-07-11","authors":["Ruixiang Zhang","Zijing Ou","Shuangfei Zhai","Yizhe Zhang","Josh Susskind","Navdeep Jaitly","James Thornton"],"abstract":"Discrete diffusion is a promising framework for modeling and generating discrete data. In this work, we present Target Concrete Score Matching (TCSM), a novel and versatile objective for training and fine-tuning discrete diffusion models. TCSM provides a general framework with broad applicability. 
It supports pre-training discrete diffusion models directly from data samples, and many existing discrete diffusion approaches naturally emerge as...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:oigz1smo1kdaskare6mdwdlg","title":"Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency","url":"https://machinelearning.apple.com/research/diffusion","published":"2025-07-11","authors":["Michael Kirchhof","James Thornton","Louis Béthune","Pierre Ablin","Eugene Ndiaye","Marco Cuturi"],"abstract":"The adoption of text-to-image diffusion models raises concerns over reliability, drawing scrutiny under the lens of various metrics like calibration, fairness, or compute efficiency. We focus in this work on two issues that arise when deploying these models: a lack of diversity when prompting images, and a tendency to recreate images from the training set. 
To solve both problems, we propose a method that coaxes the sampled trajectories of...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:nsn5kx6dce6emtk26si30da4","title":"Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?","url":"https://machinelearning.apple.com/research/self-reflective","published":"2025-07-11","authors":["Michael Kirchhof","Luca Füger","Adam Golinski","Eeshan Gunesh Dhekane","Arno Blaas","Sinead Williamson"],"abstract":"This paper was accepted at the Workshop on Reliable and Responsible Foundation Models (RRFMs) Workshop at ICML 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:f55innbnjuc5cttx5qsipohs","title":"Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition","url":"https://machinelearning.apple.com/research/omni-router","published":"2025-07-11","authors":["Zijin Gu","Tatiana Likhomanenko","Navdeep Jaitly"],"abstract":"Mixture-of-experts (MoE) architectures have expanded from language modeling to automatic speech recognition (ASR). 
Traditional MoE methods, such as the Switch Transformer, route experts independently within each layer. Our analysis reveals that routers in most layers make expert choices that are not strongly correlated with the choices of the routers in other layers. To increase the cooperation between experts in different layers and encourage...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:algu7npwmrpnyk4zannf6zhr","title":"Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions","url":"https://machinelearning.apple.com/research/beyond-sensor","published":"2025-07-11","authors":["Eray Erturk","Fahad Kamran","Salar Abbaspourazad","Sean Jewell","Harsh Sharma","Yujie Li","Sinead Williamson","Nicholas J Foti","Joseph Futoma"],"abstract":"Wearable devices record physiological and behavioral signals that can improve health predictions. While foundation models are increasingly used for such predictions, they have been primarily applied to low-level sensor data, despite behavioral data often being more informative due to their alignment with physiologically relevant timescales and quantities. 
We develop foundation models of such behavioral signals using over 2.5B hours of wearable...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:xigc8nfnvkktw1q3c5v02chv","title":"Addressing Misspecification in Simulation-based Inference through Data-driven Calibration","url":"https://machinelearning.apple.com/research/addressing-misspecification","published":"2025-07-11","authors":["Antoine Wehenkel","Juan L. Gamella","Ozan Sener","Jens Behrmann","Guillermo Sapiro","Jörn-Henrik Jacobsen","Marco Cuturi"],"abstract":"Driven by steady progress in deep generative modeling, simulation-based inference (SBI) has emerged as the workhorse for inferring the parameters of stochastic simulators. However, recent work has demonstrated that model misspecification can compromise the reliability of SBI, preventing its adoption in important applications where only misspecified simulators are available. 
This work introduces robust posterior estimation~(RoPE), a framework that...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:iskxnl0677g6esk5ckj6uwrk","title":"A Variational Framework for Improving Naturalness in Generative Spoken Language Models","url":"https://machinelearning.apple.com/research/naturalness","published":"2025-07-11","authors":["Li-Wei Chen","Takuya Higuchi","Zak Aldeneh","Ahmed Hussen Abdelaziz","Alexander Rudnicky"],"abstract":"The success of large language models in text processing has inspired their adaptation to speech modeling. However, since speech is continuous and complex, it is often discretized for autoregressive modeling. Speech tokens derived from self-supervised models (known as semantic tokens) typically focus on the linguistic aspects of speech but neglect prosodic information. 
As a result, models trained on these tokens can generate speech with reduced...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4412202274","title":"A perspective for adapting generalist AI to specialized medical AI applications and their challenges","url":"https://doi.org/10.1038/s41746-025-01789-7","published":"2025-07-11","authors":["Zifeng Wang","Hanyin Wang","Benjamin Danek","Ying Li","Christina Mack","Luk Arbuckle","Devyani Biswal","Hoifung Poon","Yajuan Wang","Pranav Rajpurkar","Cao Xiao","Jimeng Sun"],"abstract":"We introduce a framework to adapt large language models for medicine: (1) Modeling: breaking down medical workflows into manageable steps; (2) Optimization: optimizing model performance via advanced adaptations; and (3) System engineering: developing agent or chain systems. Furthermore, we describe varied use cases, such as clinical trial design, clinical decision support, and medical imaging analysis. 
Finally, we discuss challenges and considerations for building medical AI with LLMs.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1038/s41746-025-01789-7","openalex_id":"https://openalex.org/W4412202274","cited_by_count":6,"quality_score":47,"matched_keywords":["agent"],"author_affiliations":["Bellevue Hospital Center","Harrison Medical Center","Harvard University","IQVIA (United States)","Mayo Clinic Health System","Microsoft (United States)","Regeneron (United States)","Seattle University","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C45371612","display_name":"Generalist and specialist species","score":0.7998653650283813},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.7558199167251587},{"id":"https://openalex.org/C55587333","display_name":"Engineering ethics","score":0.4142204523086548},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3536766767501831},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.34549853205680847},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.32188931107521057},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2922898530960083},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.28531596064567566}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"arxiv:2507.08191","title":"Overview of the TREC 2021 deep learning track","url":"http://arxiv.org/abs/2507.08191","published":"2025-07-10","authors":["Nick Craswell","Bhaskar Mitra","Emine Yılmaz","Daniel Campos","Campos, Daniel","Lin, Jimmy","Voorhees, Ellen M.","Soboroff, 
Ian"],"abstract":"This is the fifth year of the TREC Deep Learning track. As in previous years, we leverage the MS MARCO datasets that made hundreds of thousands of human-annotated training labels available for both passage and document ranking tasks. We mostly repeated last year's design, to get another matching test set, based on the larger, cleaner, less-biased v2 passage and document set, with passage ranking as primary and document ranking as a secondary task (using labels inferred from passage). As we did last year, we sample from MS MARCO queries that were completely held out, unused in corpus construction, unlike the test queries in the first three years. This approach yields a more difficult test with more headroom for improvement. Alongside the usual MS MARCO (human) queries from MS MARCO, this year we generated synthetic queries using a fine-tuned T5 model and using a GPT-4 prompt. The new head...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2507.08191","openalex_id":"https://openalex.org/W3178067142","cited_by_count":57,"quality_score":75,"matched_keywords":["LLM","language model"],"author_affiliations":["Microsoft (United States)","National Institute of Standards and Technology","University College London"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.83484947681427},{"id":"https://openalex.org/C70437156","display_name":"Pooling","score":0.7323148250579834},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.7320533990859985},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.709880530834198},{"id":"https://openalex.org/C169903167","display_name":"Test set","score":0.7063732147216797},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.6696096062660217},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6464015245437622},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.6088825464248657}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":57}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/working-with-ai-measuring-the-occupational-implications-of-generative-ai","title":"Working with AI: Measuring the Applicability of Generative AI to Occupations","url":"https://www.microsoft.com/en-us/research/publication/working-with-ai-measuring-the-occupational-implications-of-generative-ai/","published":"2025-07-10","authors":["Kiran Tomlinson","Sonia Jaffe","Will Wang","Scott Counts","Siddharth Suri"],"abstract":"Given the rapid adoption of generative AI and its potential to impact a wide range of tasks, understanding the effects of AI on the economy is one of society's most important questions. In this work, we take a step toward that goal by analyzing the work activities people do with AI, how successfully and broadly those activities are done, and combine that with data on what occupations do those activities. We analyze a dataset of 200k anonymized and privacy-scrubbed conversations between users and Microsoft Bing Copilot, a publicly available generative AI system. We find the most common work activities people seek AI assistance for involve gathering information and writing, while the most common activities that AI itself is performing are providing information and assistance, writing, teaching, and advising. 
Combining these activity classifications with measurements of task success and sco...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","Economics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:276","title":"Understanding Chain-of-Thought in LLMs through Information Theory","url":"https://seed.bytedance.com/en/research/understanding-chain-of-thought-in-llms-through-information-theory","published":"2025-07-10","authors":["Jean-Francois Ton","Muhammad Faaiz Taufiq","Yang Liu"],"abstract":"Large Language Models (LLMs) have shown impressive performance in complex reasoning tasks through the use of Chain-of-Thought (CoT) reasoning, allowing models to break down problems into manageable sub-tasks. However, existing CoT evaluation techniques either require annotated CoT data or fall short in accurately assessing intermediate reasoning steps, leading to high rates of false positives. In this paper, we formalize CoT reasoning in LLMs through an information-theoretic lens. Specifically, our framework quantifies the 'information-gain' at each reasoning step, enabling the identification of failure modes in LLMs without the need for expensive annotated datasets. 
We demonstrate the efficacy of our approach through extensive experiments on toy arithmetic, GSM8K and PRM800k datasets, where it significantly outperforms existing outcome-based methods by providing more accurate insights i...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computation and Language","Responsible AI","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4412171133","title":"AlignFormer: Modality Matching Can Achieve Better Zero-Shot Instruction-Following Speech-LLM","url":"https://doi.org/10.1109/jstsp.2025.3588378","published":"2025-07-10","authors":["Ruchao Fan","Bo Ren","Yuxuan Hu","Rui Zhao","Shujie Liu","Jinyu Li"],"abstract":"Integrating speech into LLM (speech-LLM) has gaining increased attention recently. The mainstream solution is to connect a well-trained speech encoder and LLM with a neural adapter. However, the length mismatch between the speech and text sequences are not well handled, leading to imperfect modality matching between the speech and text. In this work, we propose a novel neural adapter, AlignFormer, to reduce the length gap between the two modalities. AlignFormer consists of CTC and dynamic-window QFormer layers, where the CTC alignment provides the dynamic window information for QFormer. The LLM backbone is frozen in training to preserve its text capability, especially the instruction following capability. 
When training with ASR data only, the proposed AlignFormer unlocks the instruction following capability for speech-LLM and the model can perform zero-shot speech translation (ST) and sp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jstsp.2025.3588378","openalex_id":"https://openalex.org/W4412171133","cited_by_count":6,"quality_score":47,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6378718614578247},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.6138337254524231},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5636296272277832},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.5455800890922546},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5138457417488098},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.4871918559074402},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.453086256980896},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3208886384963989}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/brain-informed-fine-tuning-for-improved-multilingual-understanding-in-language-models","title":"Brain-Informed Fine-Tuning for Improved Multilingual Understanding in Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/brain-informed-fine-tuning-for-improved-multilingual-understanding-in-language-models/","published":"2025-07-09","authors":["Anuja Negi","S. Oota","Manish Gupta","Fatma Deniz"],"abstract":"Recent studies have demonstrated that fine-tuning language models with brain data can improve their semantic understanding, although these findings have so far been limited to English. Interestingly, similar to the shared multilingual embedding space of pretrained multilingual language models, human studies provide strong evidence for a shared semantic system in bilingual individuals. Here, we investigate whether fine-tuning language models with bilingual brain data changes model representations in a way that improves them across multiple languages. To test this, we fine-tune monolingual and multilingual language models using brain activity recorded while bilingual participants read stories in English and Chinese. We then evaluate how well these representations generalize to the bilingual participants’ first language, their second language, and several other languages that the participan...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Biology","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4412167618","title":"Leveraging Existing Trained AI Models for Enhanced Interview Preparation","url":"https://doi.org/10.32996/jcsts.2025.7.7.52","published":"2025-07-09","authors":["Sameeksha 
Gupta"],"abstract":"Modern job markets require advanced planning techniques that tackle the changing intricacies of professional hiring procedures. The incorporation of artificial intelligence technologies into interview preparation signifies a groundbreaking progress in career development, utilizing pre-trained models to establish thorough training settings. Natural language processing technologies, speech recognition systems, and machine learning models integrate to create flexible platforms that react dynamically to specific candidate needs across various sectors and professional environments. Virtual reality settings augmented with generative conversational AI offer engaging training experiences that closely mimic real interview situations, along with comprehensive performance analysis and tailored feedback systems. Multimodal learning strategies combine textual, auditory, visual, and behavioral data st...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.32996/jcsts.2025.7.7.52","openalex_id":"https://openalex.org/W4412167618","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5205815434455872},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42306599020957947},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.32522058486938477}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412489912","title":"A large language model-driven reward design framework via dynamic feedback for reinforcement 
learning","url":"https://doi.org/10.1016/j.knosys.2025.114065","published":"2025-07-08","authors":["Shengjie Sun","Runze Liu","Jiafei Lyu","H. J. Yang","Liangpeng Zhang","Xiu Li"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.knosys.2025.114065","openalex_id":"https://openalex.org/W4412489912","cited_by_count":5,"quality_score":46,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.8269599080085754},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5829511284828186},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.5176363587379456},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.4237288236618042},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.3821621537208557},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.34947681427001953},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3436017632484436},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.12290036678314209}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"arxiv:2505.08202","title":"AI and Generative AI Transforming Disaster Management: A Survey of Damage Assessment and Response Techniques","url":"http://arxiv.org/abs/2505.08202","published":"2025-07-08","authors":["Abhishek Raj","Ankit Shetgaonkar","Lakshit Arora","Dipen Pradhan","Sanjay Surendranath Girija","Shashank Kapoor"],"abstract":"Natural disasters, 
including earthquakes, wildfires and cyclones, bear a huge risk on human lives as well as infrastructure assets. An effective response to disaster depends on the ability to rapidly and efficiently assess the intensity of damage. Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) presents a breakthrough solution, capable of combining knowledge from multiple types and sources of data, simulating realistic scenarios of disaster, and identifying emerging trends at a speed previously unimaginable. In this paper, we present a comprehensive review on the prospects of AI and GenAI in damage assessment for various natural disasters, highlighting both its strengths and limitations. We talk about its application to multimodal data such as text, image, video, and audio, and also cover major issues of data privacy, security, and ethical use of the technolog...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/compsac65507.2025.00251","openalex_id":"https://openalex.org/W4413679588","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5815539956092834},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.552779495716095},{"id":"https://openalex.org/C3018653863","display_name":"Disaster response","score":0.49206608533859253},{"id":"https://openalex.org/C62555980","display_name":"Emergency management","score":0.3621327877044678},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33356937766075134},{"id":"https://openalex.org/C17744445","display_name":"Political 
science","score":0.0761469304561615},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4413098030","title":"Large Language Models and Data Quality for Knowledge Graphs","url":"https://doi.org/10.1016/j.ipm.2025.104281","published":"2025-07-08","authors":["Stefano Marchesin","Gianmaria Silvello","Omar Alonso"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.ipm.2025.104281","openalex_id":"https://openalex.org/W4413098030","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Padua"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6197407841682434},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5282244682312012},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.48301395773887634},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4651070237159729},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4032909870147705},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.31593918800354004},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.09358209371566772},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.056533873081207275}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"arxiv:2509.00073","title":"Mitigating Clinician Information 
Overload: Generative AI for Integrated EHR and RPM Data Analysis","url":"http://arxiv.org/abs/2509.00073","published":"2025-07-08","authors":["Ankit Shetgaonkar","Dipen Pradhan","Lakshit Arora","Sanjay Surendranath Girija","Abhishek Raj","Shashank Kapoor"],"abstract":"Generative Artificial Intelligence (GenAI), particularly Large Language Models (LLMs), offer powerful capabilities for interpreting the complex data landscape in healthcare. In this paper, we present a comprehensive overview of the capabilities, requirements and applications of GenAI for deriving clinical insights and improving clinical efficiency. We first provide some background on the forms and sources of patient data, namely real-time Remote Patient Monitoring (RPM) streams and traditional Electronic Health Records (EHRs). The sheer volume and heterogeneity of this combined data present significant challenges to clinicians and contribute to information overload.In addition, we explore the potential of LLM-powered applications for improving clinical efficiency. 
These applications can enhance navigation of longitudinal patient data and provide actionable clinical decision support throu...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/compsac65507.2025.00284","openalex_id":"https://openalex.org/W4413679708","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C186625053","display_name":"Information overload","score":0.8136647939682007},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6324774622917175},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5584781765937805},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33280301094055176},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.15881386399269104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412107036","title":"Evidential Multimodal Fusion Network for Trusted Pedestrian Crossing Intent Prediction","url":"https://doi.org/10.1109/tcss.2025.3576113","published":"2025-07-08","authors":["Shilin Zhang","Xiaobo Chen","Wei Xu","Lei Yang","Jian Yang"],"abstract":"Accurate prediction of pedestrians’ behavior poses formidable challenges for autonomous vehicles in urban environments. Multimodal data, such as pedestrians’ motion data, context images, and ego vehicle speed, offer complementary and comprehensive information that can significantly enhance prediction performance. 
However, the previous methods, despite yielding promising results, integrate different modalities to form a uniform representation, which falls short of fully exploiting the heterogeneity and complementarity of all modalities. Besides, the uncertainty inherent in predictions is also a major concern for safety-critical systems such as autonomous vehicles. In light of the above concerns, this study puts forward a novel evidential multimodal fusion network called EMFNet, which leverages multimodal data and evidence fusion techniques for trusted pedestrian crossing intention predict...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcss.2025.3576113","openalex_id":"https://openalex.org/W4412107036","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Nanjing University of Information Science and Technology","Nanjing University of Science and Technology","Shandong Institute of Business and Technology","Yantai University"],"concepts":[{"id":"https://openalex.org/C2777113093","display_name":"Pedestrian","score":0.7709914445877075},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5891717076301575},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5357993841171265},{"id":"https://openalex.org/C2777819797","display_name":"Pedestrian crossing","score":0.5026218891143799},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.44306710362434387},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.3617379069328308},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.32405006885528564},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.27840524911880493}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llmoc-large-language-model-inference-at-wafer-scale","title":"WaferLLM: Large Language Model Inference at Wafer Scale","url":"https://www.microsoft.com/en-us/research/publication/llmoc-large-language-model-inference-at-wafer-scale/","published":"2025-07-07","authors":["Congjie He","Yeqi Huang","Pei Mu","Ziming Miao","Jilong Xue","Lingxiao Ma","Fan Yang","Luo Mai"],"abstract":"Emerging AI accelerators increasingly adopt wafer-scale manufacturing technologies, integrating hundreds of thousands of AI cores in a mesh architecture with large distributed on-chip memory (tens of GB in total) and ultra-high on-chip memory bandwidth (tens of PB/s). However, current LLM inference systems, optimized for shared memory architectures like GPUs, fail to exploit these accelerators fully.We introduce WaferLLM, the first wafer-scale LLM inference system. WaferLLM is guided by a novel PLMR model (pronounced as \"Plummer\") that captures the unique hardware characteristics of wafer-scale architectures. Leveraging this model, WaferLLM pioneers wafer-scale LLM parallelism, optimizing the utilization of hundreds of thousands of on-chip cores. 
It also introduces MeshGEMM and MeshGEMV, the first GEMM and GEMV implementations designed to scale effectively on wafer-scale accelerators.Eva...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Systems and networking","large language models","1970-01-01","LLM","language model","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/training-plug-and-play-knowledge-modules-with-deep-context-distillation","title":"Training Plug-and-Play Knowledge Modules with Deep Context Distillation","url":"https://www.microsoft.com/en-us/research/publication/training-plug-and-play-knowledge-modules-with-deep-context-distillation/","published":"2025-07-07","authors":["Lucas Caccia","Alan Ansell","E. Ponti","Ivan Vuli'c","Alessandro Sordoni"],"abstract":"Dynamically integrating new or rapidly evolving information after (Large) Language Model pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). 
KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as the training objective for KMs. We instead propose Deep Context Distillation: we learn KMs parameters such as to simulate hidden states and logits of a teache...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","language model","retrieval","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/slimmoe-structured-compression-of-large-moe-models-via-expert-slimming-and-distillation","title":"SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation","url":"https://www.microsoft.com/en-us/research/publication/slimmoe-structured-compression-of-large-moe-models-via-expert-slimming-and-distillation/","published":"2025-07-07","authors":["Zichong Li","Chen Liang","Zixuan Zhang","Ilgee Hong","Young Jin Kim","Weizhu Chen","Tuo Zhao"],"abstract":"The Mixture of Experts (MoE) architecture has emerged as a powerful paradigm for scaling large language models (LLMs) while maintaining inference efficiency. However, their substantial memory requirements make them prohibitively expensive to fine-tune or deploy in resource-constrained environments. 
To address this challenge, we propose \\textit{SlimMoE}, a multi-stage compression framework that transforms large MoE models into significantly smaller and more efficient variants without the cost of training from scratch. Our method systematically reduces parameter counts by slimming experts and transferring knowledge through intermediate stages, effectively mitigating the performance degradation typical of one-shot pruning. Using SlimMoE, we compress Phi-3.5-MoE (41.9B total / 6.6B activated parameters) into two smaller models: Phi-mini-MoE (7.6B total / 2.4B activated) and Phi-tiny-MoE (3.8...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","memory","efficient","compression","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/securitylingua-efficient-defense-of-llm-jailbreak-attacks-via-security-aware-prompt-compression","title":"SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression","url":"https://www.microsoft.com/en-us/research/publication/securitylingua-efficient-defense-of-llm-jailbreak-attacks-via-security-aware-prompt-compression/","published":"2025-07-07","authors":["Yucheng Li","Surin Ahn","Huiqiang Jiang","Amir H. Abdi","Yuqing Yang","Lili Qiu"],"abstract":"Large language models (LLMs) have achieved widespread adoption across numerous applications. 
However, many LLMs are vulnerable to malicious attacks even after safety alignment. These attacks typically bypass LLMs’ safety guardrails by wrapping the original malicious instructions inside adversarial jailbreaks prompts. Previous research has proposed methods such as adversarial training and prompt rephrasing to mitigate these safety vulnerabilities, but these methods often reduce the utility of LLMs or lead to significant computational overhead and online latency. In this paper, we propose SecurityLingua, an effective and efficient approach to defend LLMs against jailbreak attacks via security-oriented prompt compression. Specifically, we train a prompt compressor designed to discern the “true intention” of the input prompt, with a particular focus on detecting the malicious intentions of a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","Computer science","1970-01-01","LLM","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pyramidkv-dynamic-kv-cache-compression-based-on-pyramidal-information-funneling","title":"PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling","url":"https://www.microsoft.com/en-us/research/publication/pyramidkv-dynamic-kv-cache-compression-based-on-pyramidal-information-funneling/","published":"2025-07-07","authors":["Zefan Cai","Yichi Zhang","Bofei Gao","Yuliang 
Liu","Tianyu Liu","Keming Lu","Wayne Xiong","Yue Dong","Baobao Chang","Junjie Hu","Wen Xiao"],"abstract":"In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling where attention is scattering widely in lower layers, progressively consolidating within specific contexts, and ultimately focusing on critical tokens (a.k.a massive activation or attention sink) in higher layers. Motivated by these insights, we developed PyramidKV, a novel and effective KV cache compression method. This approach dynamically adjusts the KV cache size across different layers, allocating more cache in lower layers and less in higher ones, diverging from traditional methods that maintain a uniform KV cache size. Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","1970-01-01","memory","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/know-me-respond-to-me-benchmarking-llms-for-dynamic-user-profiling-and-personalized-responses-at-scale","title":"Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at 
Scale","url":"https://www.microsoft.com/en-us/research/publication/know-me-respond-to-me-benchmarking-llms-for-dynamic-user-profiling-and-personalized-responses-at-scale/","published":"2025-07-07","authors":["Bowen Jiang","Zhuoqun Hao","Young-Min Cho","Bryan Li","Yuan Yuan","Sihao Chen","Lyle Ungar","C. J. Taylor","Dan Roth"],"abstract":"Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks – from offering writing support to delivering tailored recommendations or consultations. Over time, the interaction history between a user and an LLM can provide extensive information about an individual’s traits and preferences. However, open questions remain on how well LLMs today can effectively leverage such history to (1) internalize the user’s inherent traits and preferences, (2) track how the user profiling and preferences evolve over time, and (3) generate personalized responses accordingly in new scenarios.In this work, we introduce the PERSONAMEM benchmark. PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories, each containing up to 60 sessions of multi-turn conversations across 15 real-world tasks that require personalization. 
G...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","personalized","personalization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/putting-the-value-back-in-rl-better-test-time-scaling-by-unifying-llm-reasoners-with-verifiers","title":"Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers","url":"https://www.microsoft.com/en-us/research/publication/putting-the-value-back-in-rl-better-test-time-scaling-by-unifying-llm-reasoners-with-verifiers/","published":"2025-07-07","authors":["Kusha Sareen","Morgane M Moss","Alessandro Sordoni","Rishabh Agarwal","Arian Hosseini"],"abstract":"Prevalent reinforcement learning~(RL) methods for fine-tuning LLM reasoners, such as GRPO or Leave-one-out PPO, abandon the learned value function in favor of empirically estimated returns. This hinders test-time compute scaling that relies on using the value-function for verification. In this work, we propose RL[latex]^V[/latex] that augments any \"value-free\" RL method by jointly training the LLM as both a reasoner and a generative verifier using RL-generated data, adding verification capabilities without significant overhead. Empirically, RL[latex]^V[/latex] boosts MATH accuracy by over 20\\% with parallel sampling and enables [latex]8-32\\times[/latex] efficient test-time compute scaling compared to the base RL method. 
RL[latex]^V[/latex] also exhibits strong generalization capabilities for both easy-to-hard and out-of-domain tasks. Furthermore, RL[latex]^V[/latex] achieves [latex]1.2-1...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rethinking-safety-in-llm-fine-tuning-an-optimization-perspective","title":"Rethinking Safety in LLM Fine-tuning: An Optimization Perspective","url":"https://www.microsoft.com/en-us/research/publication/rethinking-safety-in-llm-fine-tuning-an-optimization-perspective/","published":"2025-07-07","authors":["Minseon Kim","Jin Myung Kwak","Lama Alssum","Bernard Ghanem","Philip H. S. Torr","David Krueger","Fazl Barez","Adel Bibi"],"abstract":"Fine-tuning language models is commonly believed to inevitably harm their safety, i.e., refusing to respond to harmful user requests, even when using harmless datasets, thus requiring additional safety measures. We challenge this belief through systematic testing, showing that poor optimization choices, rather than inherent trade-offs, often cause safety problems, measured as harmful responses to adversarial prompts. 
By properly selecting key training hyper-parameters, e.g., learning rate, batch size, and gradient steps, we reduce unsafe model responses from 16\\% to approximately 5\\%, as measured by keyword matching, while maintaining utility performance. Based on this observation, we propose a simple exponential moving average (EMA) momentum technique in parameter space that preserves safety performance by creating a stable optimization path and retains the original pre-trained model's....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-sparse-adapters-for-scalable-merging-of-parameter-efficient-experts","title":"Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts","url":"https://www.microsoft.com/en-us/research/publication/exploring-sparse-adapters-for-scalable-merging-of-parameter-efficient-experts/","published":"2025-07-07","authors":["Samin Yeasar Arnob","Zhan Su","Minseon Kim","Oleksiy Ostapenko","Riyasat Ohib","Esra'a Saleh","Doina Precup","Lucas Caccia","Alessandro Sordoni"],"abstract":"Merging parameter-efficient task experts has recently gained growing attention as a way to build modular architectures that can be rapidly adapted on the fly for specific downstream tasks, without requiring additional fine-tuning. 
Typically, LoRA serves as the foundational building block of such parameter-efficient modular architectures, leveraging low-rank weight structures to reduce the number of trainable parameters. In this paper, we study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature and surprisingly outperforms both LoRA and full fine-tuning in our setting. Next, we investigate the merging properties of these sparse adapters b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-laws-of-synthetic-data-for-language-models","title":"Scaling Laws of Synthetic Data for Language Models","url":"https://www.microsoft.com/en-us/research/publication/scaling-laws-of-synthetic-data-for-language-models/","published":"2025-07-07","authors":["Zeyu Qin","Qingxiu Dong","Xingxing Zhang","Li Dong","Xiaolong Huang","Ziyi Yang","Mahmoud Khademi","Dongdong Zhang","Hany Hassan Awadalla","Yi R. Fung","Weizhu Chen","Minhao Cheng"],"abstract":"Large language models (LLMs) achieve strong performance across diverse tasks, driven by high-quality web data used in pre-training. 
However, recent studies indicate web data is rapidly depleting. Synthetic data emerges as a promising alternative, but it remains unclear whether synthetic datasets exhibit predictable scalability comparable to raw pre-training data. In this work, we systematically investigate scaling laws of synthetic data by introducing SynthLLM, a scalable framework that transforms pre-training corpora into diverse, high-quality synthetic datasets. Our approach achieves this by automatically extracting and recombining high-level concepts across multiple documents using a graph algorithm. Key findings from our experiments with SynthLLM on math domain include: (1) SynthLLM generates synthetic data that reliably adheres to rectified scaling law across various model sizes; (2...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:oiy6hvuukj0cx395bu14x3vv","title":"SceneScout: Towards AI Agent-driven Access to Street View Imagery for Blind Users","url":"https://machinelearning.apple.com/research/scenescout","published":"2025-07-07","authors":["Gaurav Jain","Leah Findlater","Cole Gleason"],"abstract":"People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. 
While most tools focus on in-situ navigation, those exploring pre-travel assistance typically provide only landmarks and turn-by-turn instructions, lacking detailed visual context. Street view imagery, which contains rich visual information and has the potential to reveal numerous...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:tpragc0lifyqj4up2xkh43ty","title":"Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models","url":"https://machinelearning.apple.com/research/parameters-flops-scaling","published":"2025-07-07","authors":["Samira Abnar","Harshay Shah","Dan Busbridge","Alaaeldin Mohamed Elnouby Ali","Josh Susskind","Vimal Thilak"],"abstract":"This paper was accepted at the Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference workshop at ICLR 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["quantization"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:wkxtqm3mwpggq07ip5x6lw9s","title":"Learning to Route LLMs with Confidence 
Tokens","url":"https://machinelearning.apple.com/research/learning-to-route","published":"2025-07-07","authors":["Yu-Neng Chuang","Prathusha K. Sarma","Parikshit Gopalan","John Boccio","Sara Bolouki","Xia Hu","Helen Zhou"],"abstract":"Large language models (LLMs) have demonstrated impressive performance on several tasks and are increasingly deployed in real-world applications. However, especially in high-stakes settings, it becomes vital to know when the output of an LLM may be unreliable. Depending on whether an answer is trustworthy, a system can then choose to route the question to another expert, or otherwise fall back on a safe default behavior. In this work, we study the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:vhlbnw88gwc69hei69ovyvsm","title":"The Geometries of Truth Are Orthogonal Across Tasks","url":"https://machinelearning.apple.com/research/geometries-of-truth","published":"2025-07-07","authors":["Waïss Azizian","Michael Kirchhof","Eugene Ndiaye","Louis Béthune","Michal Klein","Pierre Ablin","Marco Cuturi"],"abstract":"This paper was presented at the Workshop on Reliable and Responsible Foundation Models at ICML 
2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4412073499","title":"DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models","url":"https://doi.org/10.1145/3748239.3748241","published":"2025-07-07","authors":["Zihao Li","Ruixiang Tang","Lu Cheng","Shuaiqiang Wang","Dawei Yin","Mengnan Du"],"abstract":"Pre-trained language models (PLMs) have achieved impressive results on various natural language processing tasks. However, recent research has revealed that these models often rely on superficial features and shortcuts instead of developing a genuine understanding of language, especially for natural language understanding (NLU) tasks. Consequently, the models struggle to generalize to out-of-domain data. In this work, we propose Divergence Based Regularization (DBR) to mitigate this shortcut learning behavior. Our method measures the divergence between the output distributions for original examples and examples where shortcut tokens have been masked. This process prevents the model's predictions from being overly influenced by shortcut features or biases. We evaluate our model on three NLU tasks and find that it improves out-of-domain performance with little loss of in-domain accuracy. 
O...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3748239.3748241","openalex_id":"https://openalex.org/W4412073499","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","New Jersey Institute of Technology","Rutgers Sexual and Reproductive Health and Rights","University of Illinois Chicago"],"concepts":[{"id":"https://openalex.org/C2779458634","display_name":"Debiasing","score":0.8198116421699524},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.5934693813323975},{"id":"https://openalex.org/C207390915","display_name":"Divergence (linguistics)","score":0.5245388150215149},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.41683003306388855},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.40400147438049316},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.29294681549072266},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.2831156849861145},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.2692497968673706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/artificial-intelligence-and-other-speculative-metaphors","title":"Artificial Intelligence and other Speculative Metaphors","url":"https://www.microsoft.com/en-us/research/publication/artificial-intelligence-and-other-speculative-metaphors/","published":"2025-07-04","authors":["Mark Blythe","Sin Lindley","Dave Murray-Rust"],"abstract":"The paper proposes “speculative metaphors” as constructs for reframing and critically engaging 
with ideas of artificial intelligence. It identifies a broad range of AI metaphors in the wider culture and technical literature and discusses metaphor design in terms of explanation, persuasion and speculation. To explore different metaphor design strategies, we use a custom GPT to generate a large number of variants on the “artificial intelligence” metaphor. The paper contributes a conceptual framing for such speculative metaphor drawing on ideas of knowledge and understanding, fusion and synthesis, collaboration and collectives. We argue that generating speculative metaphors provides a means of thinking critically about human-AI interaction.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Human-computer interaction","Computer science","persuasion"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:x0r4yzmc3p0mpc2fd3qgcxck","title":"Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging","url":"https://machinelearning.apple.com/research/soup-of-experts","published":"2025-07-04","authors":["Pierre Ablin","Angelos Katharopoulos","Skyler Seto","David Grangier"],"abstract":"Large-scale models are routinely trained on a mixture of different data sources. Different data mixtures yield very different downstream performances.We propose a novel architecture that can instantiate one model for each data mixture without having to re-train the model. Our architecture consists of a bank of expert weights, which are linearly combined to instantiate one model. 
We learn the linear combination coefficients as a function of...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:o4btabp0fj9cbr93j1jqo7x4","title":"Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs","url":"https://machinelearning.apple.com/research/fairly-certain","published":"2025-07-04","authors":["Yinong Oliver Wang","Nivedha Sivakumar","Falaah Arif Khan","Rin Metcalf Susa","Adam Golinski","Natalie Mackraz","Barry-John Theobald","Luca Zappella","Nicholas Apostoloff"],"abstract":"The recent rapid adoption of large language models (LLMs) highlights the critical need for benchmarking their fairness. Conventional fairness metrics, which focus on discrete accuracy-based evaluations (i.e., prediction correctness), fail to capture the implicit impact of model uncertainty (e.g., higher model confidence about one group over another despite similar accuracy). 
To address this limitation, we propose an uncertainty-aware fairness...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415883543","title":"Next-Generation Autonomous Troubleshooting Using Generative AI in Heterogeneous Cloud Systems","url":"https://doi.org/10.1109/i2itcon65200.2025.11210496","published":"2025-07-04","authors":["Ajinkya Potdar","Venkatesh Kodela","Lakshmi Narasimhan Srinivasagopalan","Imran Khan","S. Chandramohan","Dinesh Gottipalli"],"abstract":"The goal of troubleshooting in a heterogeneous multi-cloud ecosystem is to identify and fix the problems that develop due to variations in construction, application programming interfaces (APIs), and service configurations across different cloud platforms. Reviewing each cloud provider's performance logs, problem reports, and deployment outcomes will assist in identifying application behavior inconsistencies. Given the variation across providers like AWS, Azure, and GCP, it is crucial to pay close attention to the authentication methods, service orchestration, and network connectivity. A centralized monitoring and diagnostic system that can correlate metrics and logs across a number of settings is essential for effective troubleshooting. 
This framework will help to identify the underlying cause of errors by combining interpretability and adaptability, in addition to the automation across...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/i2itcon65200.2025.11210496","openalex_id":"https://openalex.org/W4415883543","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","Dallas Independent School District","International Development Enterprises","Zimmer Biomet (Switzerland)","Zimmer Biomet (United States)"],"concepts":[{"id":"https://openalex.org/C147494362","display_name":"Troubleshooting","score":0.82669997215271},{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.8174999952316284},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7574999928474426},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.63919997215271},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5054000020027161},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4860000014305115},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.476500004529953},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.46700000762939453}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412022577","title":"LLM performance in multimodal learning environments: study of integration of text with visual, audio, and sensor data for holistic decision-making","url":"https://doi.org/10.1117/12.3068419","published":"2025-07-04","authors":["Nikunj Agarwal","Aditi Choudhary","Aditya Gupta","Pulkit 
Jain","Mukund B. Wagh","Dinesh Besiahgari"],"abstract":"The advent of Large Language Models (LLMs) has redefined the boundaries of artificial intelligence, particularly in natural language processing. With their remarkable ability to generate coherent text, LLMs are now being explored for their potential in multimodal learning environments where data from text, visual, audio, and sensor inputs converge. This study delves into the integration of these modalities using LLMs, focusing on their performance in holistic decision-making tasks. By analysing foundational principles, current methodologies, and future directions, this paper aims to provide a comprehensive understanding of the opportunities and challenges in leveraging LLMs for multimodal applications. Empirical insights are supported with technical details, evaluation metrics, and real-world applications.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/12.3068419","openalex_id":"https://openalex.org/W4412022577","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","University of Cincinnati"],"concepts":[{"id":"https://openalex.org/C3017588708","display_name":"Audio visual","score":0.7939512729644775},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7554594278335571},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.56452476978302},{"id":"https://openalex.org/C2780660688","display_name":"Multimodal learning","score":0.469414085149765},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4659498631954193},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.22112134099006653}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/designing-with-multi-agent-generative-ai-insights-from-industry-early-adopters","title":"Designing with Multi-Agent Generative AI: Insights from Industry Early Adopters","url":"https://www.microsoft.com/en-us/research/publication/designing-with-multi-agent-generative-ai-insights-from-industry-early-adopters/","published":"2025-07-03","authors":["Suchismita Naik","Austin L. Toombs","Amanda Snellinger","Scott Saponas","Amanda K. Hall"],"abstract":"In this paper we present the results of our investigation into how employees at Microsoft, as early adopters of multi-agent generative AI systems, navigate the complexities of designing, testing, and deploying these technologies to extend the organization’s product ecosystem. Through interviews with thirteen developers, we uncover the challenges, use cases, and lessons when designing with and for multi-agent AI frameworks. Our analysis reveals how participants leveraged this advanced emerging technology to enhance collaboration, productivity, customer support, creative processes, and security. Key design strategies include managing agent complexity, fostering transparency, and balancing agent autonomy with human oversight, essential considerations for human-agent interaction design. 
We provide empirical insights into the capabilities and limitations of multi-agent systems in real-world c...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3715336.3735823","openalex_id":"https://openalex.org/W4412017465","cited_by_count":2,"quality_score":82,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft","Indiana University Bloomington","Microsoft (United States)","Purdue University West Lafayette"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sequential-diagnosis-with-language-models","title":"Sequential Diagnosis with Language Models","url":"https://www.microsoft.com/en-us/research/publication/sequential-diagnosis-with-language-models/","published":"2025-07-03","authors":["Harsha Nori","M. Daswani","Christopher Kelly","Scott M. Lundberg","Marco Túlio Ribeiro","Marc Wilson","Xiaoxuan Liu","V. Sounderajah","Jonathan M. Carlson","Matthew P Lungren","Bay Gross","Peter Hames"],"abstract":"Artificial intelligence holds great promise for expanding access to expert medical knowledge and reasoning. However, most evaluations of language models rely on static vignettes and multiple-choice questions that fail to reflect the complexity and nuance of evidence-based medicine in real-world settings. 
In clinical practice, physicians iteratively formulate and revise diagnostic hypotheses, adapting each subsequent question and test to what they've just learned, and weigh the evolving evidence before committing to a final diagnosis. To emulate this iterative process, we introduce the Sequential Diagnosis Benchmark, which transforms 304 diagnostically challenging New England Journal of Medicine clinicopathological conference (NEJM-CPC) cases into stepwise diagnostic encounters. A physician or AI begins with a short case abstract and must iteratively request additional details from a gate...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411932262","title":"A foundation model to predict and capture human cognition","url":"https://doi.org/10.1038/s41586-025-09215-4","published":"2025-07-02","authors":["Marcel Binz","Elif Akata","Matthias Bethge","Franziska Brändle","Fred Callaway","Julian Coda-Forno","Peter Dayan","Can Demircan","Maria K. Eckstein","Noémi Éltető","Thomas L. Griffiths","Susanne Haridi"],"abstract":". A first step towards such a theory is to create a computational model that can predict human behaviour in a wide range of settings. Here we introduce Centaur, a computational model that can predict and simulate human behaviour in any experiment expressible in natural language. 
We derived Centaur by fine-tuning a state-of-the-art language model on a large-scale dataset called Psych-101. Psych-101 has an unprecedented scale, covering trial-by-trial data from more than 60,000 participants performing in excess of 10,000,000 choices in 160 experiments. Centaur not only captures the behaviour of held-out participants better than existing cognitive models, but it also generalizes to previously unseen cover stories, structural task modifications and entirely new domains. Furthermore, the model's internal representations become more aligned with human neural activity after fine-tuning. Taken to...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41586-025-09215-4","openalex_id":"https://openalex.org/W4411932262","cited_by_count":59,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Georgia Institute of Technology","Google (United Kingdom)","Google DeepMind (United Kingdom)","Helmholtz Institute Mainz","Helmholtz Zentrum München","Max Planck Institute for Biological Cybernetics","Max Planck Institute for Human Cognitive and Brain Sciences","Max Planck Institute for Human Development","New York University","Princeton University","Technical University of Munich","Technische Universität Darmstadt","University of Basel","University of California San Diego","University of Cambridge","University of Oxford","University of Tübingen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6521016359329224},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.6286888122558594},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.6248360276222229},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.5765426754951477},{"id":"https://openalex.org/C66024118","display_name":"Computational model","score":0.5600310564041138},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.47957390546798706},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41983097791671753},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.4124099910259247}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":59}},{"id":"apple:fpq56ighwfewvgldka9crg8b","title":"The Super Weight in Large Language Models","url":"https://machinelearning.apple.com/research/super-weight","published":"2025-07-02","authors":["Mengxia Yu","De Wang","Qi Shan","Colorado Reed","Alvin Wan"],"abstract":"Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameter outliers are disproportionately important to the quality of the model. LLMs contain billions of parameters, so these small fractions, such as 0.01%, translate to hundreds of thousands of parameters. 
In this work, we present an even more surprising finding: Pruning as few as a single parameter can destroy an LLM's ability to generate text --...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4411949399","title":"Robust Multi-Contrast MRI Medical Image Translation via Knowledge Distillation and Adversarial Attack","url":"https://doi.org/10.1109/jbhi.2025.3584721","published":"2025-07-02","authors":["Xujie Zhao","Feng Liang","Chengjiang Long","Zhiyong Yuan","Jianhui Zhao"],"abstract":"Medical image translation is of great value but is very difficult due to the requirement with style change of noise pattern and anatomy invariance of image content. Various deep learning methods like the mainstream GAN, Transformer and Diffusion models have been developed to learn the multi-modal mapping to obtain the translated images, but the results from the generator are still far from being perfect for medical images. In this paper, we propose a robust multi-contrast translation framework for MRI medical images with knowledge distillation and adversarial attack, which can be integrated with any generator. The additional refinement network consists of teacher and student modules with similar structures but different inputs. 
Unlike the existing knowledge distillation works, our teacher module is designed as a registration network with more inputs to better learn the noise distribution...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jbhi.2025.3584721","openalex_id":"https://openalex.org/W4411949399","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["META Health","Meta (United States)","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7025969624519348},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.6894301772117615},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6317859888076782},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.617492139339447},{"id":"https://openalex.org/C2776502983","display_name":"Contrast (vision)","score":0.5760268568992615},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5482407808303833},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.5204342007637024},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4776836037635803}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/magentic-ui-report","title":"Magentic-UI: Towards Human-in-the-loop Agentic Systems","url":"https://www.microsoft.com/en-us/research/publication/magentic-ui-report/","published":"2025-07-01","authors":["Hussein Mozannar","Gagan Bansal","Cheng Tan","Adam Fourney","Victor Dibia","Jingya Chen","Jack Gerrits","Tyler 
Payne","Matheus Kunzler Maldaner","Madeleine Grunde-McLaughlin","Eric Zhu","Griffin Bassman"],"abstract":"AI agents powered by large language models are increasingly capable of autonomously completing complex, multi-step tasks using external tools. Yet, they still fall short of human-level performance in most domains including computer use, software development, and research. Their growing autonomy and ability to interact with the outside world, also introduces safety and security risks including potentially misaligned actions and adversarial manipulation. We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems. We introduce Magentic-UI, an open-source web interface for developing and studying human-agent interaction. Built on a flexible multi-agent architecture, Magentic-UI supports web browsing, code execution, and file manipulation, and can be extended with divers...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":100,"matched_keywords":["Tech Report","Artificial intelligence","AI agents","Generative AI","Human-AI Collaboration","Human–computer interaction","Machine learning","memory","long-term","efficient","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/collabllm-from-passive-responders-to-active-collaborators","title":"CollabLLM: From Passive Responders to Active 
Collaborators","url":"https://www.microsoft.com/en-us/research/publication/collabllm-from-passive-responders-to-active-collaborators/","published":"2025-07-01","authors":["Shirley Wu","Michel Galley","Baolin Peng","Hao Cheng","Gavin Li","Yao Dou","Weixin Cai","James Zou","J. Leskovec","Jianfeng Gao"],"abstract":"Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction. As a result, they often respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations. To address these limitations, we introduce CollabLLM, a novel and general training framework that enhances multiturn human-LLM collaboration. Its key innovation is a collaborative simulation that estimates the long-term contribution of responses using Multiturn-aware Rewards. By reinforcement fine-tuning these rewards, CollabLLM goes beyond responding to user requests, and actively uncovers user intent and offers insightful suggestions-a key step towards more human-centered AI. We also devise a multiturn interaction benchmark with three challenging tasks such as document creation. 
Colla...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Human-computer interaction","Computer science","large language models","1970-01-01","LLM","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dpimagebench-a-unified-benchmark-for-differentially-private-image-synthesis","title":"DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis","url":"https://www.microsoft.com/en-us/research/publication/dpimagebench-a-unified-benchmark-for-differentially-private-image-synthesis/","published":"2025-07-01","authors":["Chen Gong","Kecen Li","Zinan Lin","Tianhao Wang"],"abstract":"Differentially private (DP) image synthesis aims to generate artificial images that retain the properties of sensitive images while protecting the privacy of individual images within the dataset. Despite recent advancements, we find that inconsistent--and sometimes flawed--evaluation protocols have been applied across studies. This not only impedes the understanding of current methods but also hinders future advancements.To address the issue, this paper introduces DPImageBench for DP image synthesis, with thoughtful design across several dimensions: (1) Methods. We study eleven prominent methods and systematically characterize each based on model architecture, pretraining strategy, and privacy mechanism. (2) Evaluation. 
We include nine datasets and seven fidelity and utility metrics to thoroughly assess them. Notably, we find that a common practice of selecting downstream classifiers bas...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3719027.3765045","openalex_id":"https://openalex.org/W4416549289","cited_by_count":1,"quality_score":85,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","Benchmarking","Differential privacy","Image generation","Synthetic data","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Virginia"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pretraining-context-compressor-for-large-language-models-with-embedding-based-memory","title":"Pretraining Context Compressor for Large Language Models with Embedding-Based Memory","url":"https://www.microsoft.com/en-us/research/publication/pretraining-context-compressor-for-large-language-models-with-embedding-based-memory/","published":"2025-07-01","authors":["Yuhong Dai","Jianxun Lian","Yitian Huang","Wei Zhang","Mingyang Zhou","Mingqi Wu","Xing Xie","Hao Liao"],"abstract":"Efficient processing of long contexts in large language models (LLMs) is essential for real world applications like retrieval-augmented generation and in-context learning, especially in resource-constrained environments such as edge computing. This paper explores the embedding-based context compression to reduce inference costs while preserving the downstream LLM configurations. 
We propose a decoupled compressor-LLM framework, pretrained on text reconstruction and completion tasks, designed to effectively preserve essential contextual information within condensed embedding representations. Our extensive experiments investigate pretraining, model configurations, compression rates, efficiency across tasks, and adaptability to various LLMs. Results demonstrate that our approach outperforms competitive baselines in three domains and across eight datasets while being adapt able to different d...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","memory","retrieval","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nextcoder-robust-adaptation-of-code-lms-to-diverse-code-edits","title":"NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits","url":"https://www.microsoft.com/en-us/research/publication/nextcoder-robust-adaptation-of-code-lms-to-diverse-code-edits/","published":"2025-07-01","authors":["Tushar Aggarwal","Swayam Singh","Abhijeet Awasthi","Aditya Kanade","Nagarajan Natarajan"],"abstract":"Software engineering activities frequently involve edits to existing code. However, contemporary code language models (LMs) lack the ability to handle diverse types of code-edit requirements. 
In this work, we attempt to overcome this shortcoming through (1) a novel synthetic data generation pipeline and (2) a robust model adaptation algorithm. Starting with seed code examples and diverse editing criteria, our pipeline generates high-quality samples comprising original and modified code, along with natural language instructions in different styles and verbosity. Today’s code LMs come bundled with strong abilities, such as code generation and instruction following, which should not be lost due to fine-tuning. To ensure this, we propose a novel adaptation algorithm, SeleKT, that (a) leverages a dense gradient-based step to identify the weights that are most important for code editing, and (...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","large language models","Machine learning","software engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mogic-metadata-infused-oracle-guidance-for-improved-extreme-classification","title":"MOGIC: Metadata-Infused Oracle Guidance for Improved Extreme Classification","url":"https://www.microsoft.com/en-us/research/publication/mogic-metadata-infused-oracle-guidance-for-improved-extreme-classification/","published":"2025-07-01","authors":["Suchith Chidananda Prabhu","Bhavyajeet Singh","Anshul Mittal","Siddarth Asokan","Shikhar Mohan","Deepak Saini","Yashoteja Prabhu","Lakshya 
Kumar","Jian Jiao","Amit Singh","Niket Tandon","Manish Gupta"],"abstract":"Retrieval-augmented classification and generation models benefit from early-stage fusion of high-quality text-based metadata, often called memory, but face high latency and noise sensitivity. In extreme classification (XC), where low latency is crucial, existing methods use late-stage fusion for efficiency and robustness. To enhance accuracy while maintaining low latency, we propose MOGIC, a novel approach to metadata-infused oracle guidance for XC. We train an early-fusion oracle classifier with access to both query-side and label-side ground-truth metadata in textual form and subsequently use it to guide existing memory-based XC disciple models via regularization. The MOGIC algorithm improves precision@1 and propensity-scored precision@1 of XC disciple models by 1-2% on six standard datasets, at no additional inference-time cost. We show that MOGIC can be used in a plug-and-play manner...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Information retrieval","1970-01-01","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-and-meeting-practitioner-needs-when-measuring-representational-harms-caused-by-llm-based-systems","title":"Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based 
Systems","url":"https://www.microsoft.com/en-us/research/publication/understanding-and-meeting-practitioner-needs-when-measuring-representational-harms-caused-by-llm-based-systems/","published":"2025-07-01","authors":["Emma Harvey","Emily Sheng","Su Lin Blodgett","Alex Chouldechova","Jean Garcia-Gathright","Alexandra Olteanu","Hanna Wallach"],"abstract":"The NLP research community has made publicly available numerous instruments for measuring representational harms caused by large language model (LLM)-based systems. These instruments have taken the form of datasets, metrics, tools, and more. In this paper, we examine the extent to which such instruments meet the needs of practitioners tasked with evaluating LLM-based systems. Via semi-structured interviews with 12 such practitioners, we find that practitioners are often unable to use publicly available instruments for measuring representational harms. We identify two types of challenges. In some cases, instruments are not useful because they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. 
In other cases, instruments-even useful instruments-are not used by practitioners due to practical and institutional barriers impeding...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Social sciences","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/longrope2-near-lossless-llm-context-window-scaling","title":"LongRoPE2: Near-Lossless LLM Context Window Scaling","url":"https://www.microsoft.com/en-us/research/publication/longrope2-near-lossless-llm-context-window-scaling/","published":"2025-07-01","authors":["Ning Shang","Li Lyna Zhang","Siyuan Wang","Gaokai Zhang","Gilsinia Lopez","Fan Yang","Weizhu Chen","Mao Yang"],"abstract":"LongRoPE2 is a novel approach that extends the effective context window of pre-trained large language models (LLMs) to the target length, while preserving the performance on the original shorter context window. 
This is achieved by three contributions: (1) a hypothesis that insufficient training in higher RoPE dimensions contributes to the persistent out-of-distribution (OOD) issues observed in existing methods; (2) an effective RoPE rescaling algorithm that adopts evolutionary search guided by \"needle-driven\" perplexity to address the insufficient training problem; (3) a mixed context window training approach that fine-tunes model weights to adopt rescaled RoPE for long-context sequences while preserving the short-context performance with the original RoPE. Extensive experiments on LLaMA3-8B and Phi3-mini-3.8B across various benchmarks validate the hypothesis and demonstrate the effectiv...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/its-hard-to-be-normal-the-impact-of-noise-on-structure-agnostic-estimation","title":"It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation","url":"https://www.microsoft.com/en-us/research/publication/its-hard-to-be-normal-the-impact-of-noise-on-structure-agnostic-estimation/","published":"2025-07-01","authors":["Jikai Jin","Lester Mackey","Vasilis Syrgkanis"],"abstract":"Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like 
the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of \\citet{robinson1988root}, we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of \\citet{mackey2018orthogonal}. Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These \\emph{ACE} procedures use structure-agnostic cumulant estimators to achieve $r$-th order insensitivity to nuisance errors whenever the...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Economics","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/designing-interfaces-that-support-temporal-work-across-meetings-with-generative-ai","title":"Designing Interfaces that Support Temporal Work Across Meetings with Generative AI","url":"https://www.microsoft.com/en-us/research/publication/designing-interfaces-that-support-temporal-work-across-meetings-with-generative-ai/","published":"2025-07-01","authors":["Rishi Vanukuru","Payod Panda","Xinyue Chen","Ava Elizabeth Scott","Lev Tankelevitch","Sean Rintel"],"abstract":"Temporal work is an essential part of the 
modern knowledge workplace, where multiple threads of meetings and projects are connected across time by the acts of looking back (retrospection) and ahead (prospection). As we develop Generative AI interfaces to support knowledge work, this lens of temporality can help ground design in real workplace needs. Building upon research in routine dynamics and cognitive science, and an exploratory analysis of real recurring meetings, we develop a framework and a tool for the synergistic exploration of temporal work and the capabilities of Generative AI. We then use these to design a series of interface concepts and prototypes to better support work that spans multiple scales of time. Through this approach, we demonstrate how the design of new Generative AI tools can be guided by our understanding of how work really happens across meetings and projects....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Social sciences","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/demographically-inspired-query-variants-using-an-llm","title":"Demographically-inspired query variants using an LLM","url":"https://www.microsoft.com/en-us/research/publication/demographically-inspired-query-variants-using-an-llm/","published":"2025-07-01","authors":["Marwah Alaofi","Nicola Ferro","Paul Thomas","Falk Scholer","Mark Sanderson"],"abstract":"This study 
proposes a method to diversify queries in existing test collections to reflect some of the diversity of search engine users, aligning with an earlier vision of an 'ideal' test collection. A Large Language Model (LLM) is used to create query variants: alternative queries that have the same meaning as the original. These variants represent user profiles characterised by different properties, such as language and domain proficiency, which are known in the Information Retrieval (IR) literature to influence query formulation. The LLM's ability to generate query variants that align with user profiles is empirically validated, and the variants' utility is further explored for IR system evaluation. Results demonstrate that the variants impact how systems are ranked and show that user profiles experience significantly different levels of system effectiveness. This method enables an alter...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Search and information retrieval","1970-01-01","LLM","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cad-editor-a-locate-then-infill-framework-with-automated-training-data-synthesis-for-text-based-cad-editing","title":"CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD 
Editing","url":"https://www.microsoft.com/en-us/research/publication/cad-editor-a-locate-then-infill-framework-with-automated-training-data-synthesis-for-text-based-cad-editing/","published":"2025-07-01","authors":["Yu Yuan","Shizhao Sun","Qi Liu","Jiang Bian"],"abstract":"Computer Aided Design (CAD) is indispensable across various industries. \\emph{Text-based CAD editing}, which automates the modification of CAD models based on textual instructions, holds great potential but remains underexplored. Existing methods primarily focus on design variation generation or text-based CAD generation, either lacking support for text-based control or neglecting existing CAD models as constraints. We introduce \\emph{CAD-Editor}, the first framework for text-based CAD editing. To address the challenge of demanding triplet data with accurate correspondence for training, we propose an automated data synthesis pipeline. This pipeline utilizes design variation models to generate pairs of original and edited CAD models and employs Large Vision-Language Models (LVLMs) to summarize their differences into editing instructions. 
To tackle the composite nature of text-based CAD ed...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Vision-language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/preaching-to-the-choir-lessons-ir-should-share-with-ai","title":"Preaching to the ChoIR: Lessons IR should share with AI","url":"https://www.microsoft.com/en-us/research/publication/preaching-to-the-choir-lessons-ir-should-share-with-ai/","published":"2025-07-01","authors":["Gianluca Demartini","Claudia Hauff","Matthew Lease","Stefano Mizzaro","Kevin Roitero","Mark Sanderson","Falk Scholer","Chirag Shah","Damiano Spina","Paul Thomas","Arjen P de Vries","Guido Zuccon"],"abstract":"The field of Information Retrieval (IR) changed profoundly at the end of the 1990s with the rise of Web Search, and there are parallels with developments in Artificial Intelligence (AI) happening today with the advent of ChatGPT, Large Language Models, and Generative AI. We acknowledge that there are clear differences between IR and AI. For example, IR is a much smaller field, and new problems arise, like data contamination that may affect benchmark-based evaluation of AI systems. 
But looking through the lens of an IR researcher, there are many striking similarities between the two fields of IR (25 years ago) and AI (today), and many topics appearing in discussions in AI resemble those of 25 years ago in IR: benchmark reliability and robust evaluation, reproducibility of results for non-public models, privacy and copyright issues, efficiency and scalability, etc. In this paper, we discuss...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/position-to-make-text-to-image-models-that-work-for-marginalized-communities-we-need-new-measurement-practices-for-the-long-tail","title":"Position: To Make Text-to-Image Models that Work for Marginalized Communities, We Need New Measurement Practices for the Long Tail","url":"https://www.microsoft.com/en-us/research/publication/position-to-make-text-to-image-models-that-work-for-marginalized-communities-we-need-new-measurement-practices-for-the-long-tail/","published":"2025-07-01","authors":["Nari Johnson","Hamna .","Deepthi Sudharsan","Theo Holroyd","Samantha Dalal","Siobhan Mackenzie Hall","Jennifer Wortman Vaughan","Daniela Massiceti","Cecily Morrison"],"abstract":"While the capabilities of frontier text-to-image models are rapidly improving, they often fail to represent the low data, long tail concepts that matter to 
historically marginalized communities. Effective measurement is a critical first step towards identifying and addressing these errors, yet little work has validated if existing T2I evaluation metrics work for the long tail. In this paper, we draw upon two community-based case studies to identify challenges with applying best practices to validate T2I metrics using human preference data. We show that available approaches to create and validate evaluation metrics break down when applied to tail concepts because of the need for community knowledge (scaling community annotations) and challenges achieving a range of good and bad images (shades of bad). We take the position that methodological innovation is needed to develop measurement pra...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Computer vision","Human-computer interaction","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/grounding-task-assistance-with-multimodal-cues-from-a-single-demonstration","title":"Grounding task assistance with multimodal cues from a single demonstration","url":"https://www.microsoft.com/en-us/research/publication/grounding-task-assistance-with-multimodal-cues-from-a-single-demonstration/","published":"2025-07-01","authors":["Gabriel Herbert Sarch","Balasaravanan Thoravi Kumaravel","Sahithya Ravi","Vibhav Vineet","Andrew D. 
Wilson"],"abstract":"A person’s demonstration often serves as a key reference for others learning the same task. However, RGB video, the dominant medium for representing these demonstrations, often fails to capture fine-grained contextual cues such as intent, safety-critical environmental factors, and subtle preferences embedded in human behavior. This sensory gap fundamentally limits the ability of Vision Language Models (VLMs) to reason about why actions occur and how they should adapt to individual users. To address this, we introduce MICA (Multimodal Interactive Contextualized Assistance), a framework that improves conversational agents for task assistance by integrating eye gaze and speech cues. MICA segments demonstrations into meaningful sub-tasks and extracts keyframes and captions that capture fine-grained intent and user-specific cues, enabling richer contextual grounding for visual question answer...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/fea-bench-a-benchmark-for-evaluating-repository-level-code-generation-for-feature-implementation","title":"FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature 
Implementation","url":"https://www.microsoft.com/en-us/research/publication/fea-bench-a-benchmark-for-evaluating-repository-level-code-generation-for-feature-implementation/","published":"2025-07-01","authors":["Wei Li","Xin Zhang","Zhongxin Guo","Shaoguo Mao","Wen Luo","Guangyue Peng","Yangyu Huang","Houfeng Wang","Scarlett Li"],"abstract":"Implementing new features in repository-level codebases is a crucial application of code generation models. However, current benchmarks lack a dedicated evaluation framework for this capability. To fill this gap, we introduce FEA-Bench, a benchmark designed to assess the ability of large language models (LLMs) to perform incremental development within code repositories. We collect pull requests from 83 GitHub repositories and use rule-based and intent-based filtering to construct task instances focused on new feature development. Each task instance containing code changes is paired with relevant unit test files to ensure that the solution can be verified. 
The feature implementation requires LLMs to simultaneously possess code completion capabilities for new components and code editing abilities for other relevant parts in the code repository, providing a more comprehensive evaluation met...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Code generation","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dehumanizing-machines-mitigating-anthropomorphic-behaviors-in-text-generation-systems","title":"Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems","url":"https://www.microsoft.com/en-us/research/publication/dehumanizing-machines-mitigating-anthropomorphic-behaviors-in-text-generation-systems/","published":"2025-07-01","authors":["Myra Cheng","Su Lin Blodgett","Alicia DeVrio","Lisa Egede","Alexandra Olteanu"],"abstract":"As text generation systems’ outputs are increasingly anthropomorphic—perceived as human-like—scholars have also increasingly raised concerns about how such outputs can lead to harmful outcomes, such as users over-relying or developing emotional dependence on these systems. How to intervene on such system outputs to mitigate anthropomorphic behaviors and their attendant harmful outcomes, however, remains understudied. With this work, we aim to provide empirical and theoretical grounding for developing such interventions. 
To do so, we compile an inventory of interventions grounded both in prior literature and a crowdsourcing study where participants edited system outputs to make them less human-like. Drawing on this inventory, we also develop a conceptual framework to help characterize the landscape of possible interventions, articulate distinctions between different types of interventions...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.acl-long.1259","openalex_id":"https://openalex.org/W4412889949","cited_by_count":5,"quality_score":69,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Social sciences"],"author_affiliations":["Microsoft","Carnegie Mellon University","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/text-to-cad-generation-through-infusing-visual-feedback-in-large-language-models","title":"Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/text-to-cad-generation-through-infusing-visual-feedback-in-large-language-models/","published":"2025-07-01","authors":["Ruiyu Wang","Yu Yuan","Shizhao Sun","Jiang Bian"],"abstract":"Creating Computer-Aided Design (CAD) models requires significant expertise and effort. Text-to-CAD, which converts textual descriptions into CAD parametric sequences, is crucial in streamlining this process. Recent studies have utilized ground-truth parametric sequences, known as sequential signals, as supervision to achieve this goal. 
However, CAD models are inherently multimodal, comprising parametric sequences and corresponding rendered visual objects. Besides, the rendering process from parametric sequences to visual objects is many-to-one. Therefore, both sequential and visual signals are critical for effective training. In this work, we introduce CADFusion, a framework that uses Large Language Models (LLMs) as the backbone and alternates between two training stages: the sequential learning (SL) stage and the visual feedback (VF) stage. In the SL stage, we train LLMs using ground-tru...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/system-comparison-using-automated-generation-of-relevance-judgements-in-multiple-languages","title":"System comparison using automated generation of relevance judgements in multiple languages","url":"https://www.microsoft.com/en-us/research/publication/system-comparison-using-automated-generation-of-relevance-judgements-in-multiple-languages/","published":"2025-07-01","authors":["Paul Thomas","Douglas W Oard","Eugene Yang","Dawn Lawrie","James Mayfield"],"abstract":"Recent work has shown that Large Language Models (LLMs) can produce relevance judgements for English retrieval that are useful as a basis for system comparison, and they do so at vastly reduced cost compared to human assessors. 
Using relevance judgements and ranked retrieval runs from the TREC NeuCLIR track, this paper shows that LLMs can also produce reliable assessments in other languages, even when the topic description or the prompt are in a language different from the documents. Results with Chinese, Persian and Russian documents show that although document language affects both agreement with human assessors on graded relevance and on preference ordering among systems, prompt-language and topic-language effects are negligible. This has implications for the design of multilingual test collections, suggesting that prompts and topic descriptions can be developed in any convenient lang...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Search and information retrieval","preference","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:200","title":"Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning","url":"https://www.noahlab.com.hk/en/scientific_research/forest-of-thought-scaling-test-time-compute-for-enhancing-llm-reasoning","published":"2025-07-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICML 2025. 
External paper link: https://arxiv.org/abs/2412.09078","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Model architecture and optimization","ICML 2025","2025","LLM"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/socialcc-interactive-evaluation-for-cultural-competence-in-language-agents","title":"SocialCC: Interactive Evaluation for Cultural Competence in Language Agents","url":"https://www.microsoft.com/en-us/research/publication/socialcc-interactive-evaluation-for-cultural-competence-in-language-agents/","published":"2025-07-01","authors":["Jincenzi Wu","Jianxun Lian","DingDong Wang","Helen Meng"],"abstract":"Large Language Models (LLMs) are increasingly deployed worldwide, yet their ability to navigate cultural nuances remains underexplored. Misinterpreting cultural content can lead to AI-generated responses that are offensive or inappropriate, limiting their usability in global applications such as customer service, diplomatic communication, and online education. While prior research has evaluated cultural knowledge of LLMs, existing benchmarks fail to assess dynamic cultural competence — the ability to apply cultural knowledge effectively in real-world interactions. To address this gap, we introduce SocialCC, a novel benchmark designed to evaluate cultural competence through multi-turn interactive intercultural scenarios. It comprises 3,060 human-written scenarios spanning 60 countries across six continents. 
Through extensive experiments on eight prominent LLMs, our findings reveal a signi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-models-for-supply-chain-decisions","title":"Large Language Models for Supply Chain Decisions","url":"https://www.microsoft.com/en-us/research/publication/large-language-models-for-supply-chain-decisions/","published":"2025-07-01","authors":["David Simchi-Levi","Konstantina Mellou","Ishai Menache","Jeevan Pathuri"],"abstract":"Supply Chain Management requires addressing a variety of complex decision-making challenges, from sourcing strategies to planning and execution. Over the last few decades, advances in computation and information technologies have enabled the transition from manual, intuition and experience-based decision-making, into more automated and data-driven decisions using a variety of tools that apply optimization techniques. These techniques use mathematical methods to improve decision-making. Unfortunately, business planners and executives still need to spend considerable time and effort to (i) understand and explain the recommendations coming out of these technologies; (ii) analyze various scenarios and answer what-if questions; and (iii) update the mathematical models used in these tools to reflect current business environments. 
Addressing these challenges requires involving data science team...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Miscellaneous","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:069ce5d499a7ad63","title":"Gemini Robotics On-Device Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-Robotics-On-Device-Model-Card.pdf","published":"2025-07-01","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini Robotics On-Device"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:a822200e2c31196a","title":"ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context","url":"https://ai.meta.com/research/publications/astro-teaching-language-models-to-reason-by-reflecting-and-backtracking-in-context/","published":"2025-07-01","authors":["Joongwon (Daniel) Kim","Anirudh Goyal","Liang Tan","Hannaneh Hajishirzi","Srini Iyer","Tianlu Wang"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Reinforcement Learning","NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=4"}},{"id":"openalex:W4412091453","title":"Statistical or Embodied? Comparing Colorseeing, Colorblind, Painters, and Large Language Models in Their Processing of Color Metaphors","url":"https://doi.org/10.1111/cogs.70083","published":"2025-07-01","authors":["Ethan O. Nadler","Douglas Guilbeault","Sofronia M Ringold","Tom Williamson","Antoine Bellemare","Iulia M. Comșa","Karim Jerbi","Srini Narayanan","Lisa Aziz‐Zadeh"],"abstract":"Can metaphorical reasoning involving embodied experience-such as color perception-be learned from the statistics of language alone? Recent work finds that colorblind individuals robustly understand and reason abstractly about color, implying that color associations in everyday language might contribute to the metaphorical understanding of color. However, it is unclear how much colorblind individuals' understanding of color is driven by language versus their limited (but no less embodied) visual experience. A more direct test of whether language supports the acquisition of humans' understanding of color is whether large language models (LLMs)-those trained purely on text with no visual experience-can nevertheless learn to generate consistent and coherent metaphorical responses about color. 
Here, we conduct preregistered surveys that compare colorseeing adults, colorblind adults, and LLMs....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1111/cogs.70083","openalex_id":"https://openalex.org/W4412091453","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","North Bristol NHS Trust","Southmead Hospital","Stanford University","University of California San Diego","University of Oxford","University of Southern California","University of the West of England","Université de Montréal"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.8273147344589233},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.6324209570884705},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5532436370849609},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5494949817657471},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.5221093893051147},{"id":"https://openalex.org/C2776664067","display_name":"Synesthesia","score":0.43780824542045593},{"id":"https://openalex.org/C61674017","display_name":"Color vision","score":0.43298986554145813},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.33174124360084534}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"official:40056e29e4eaa2b8","title":"Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM 
Decoding","url":"https://research.nvidia.com/publication/2025-07_helix-parallelism-rethinking-sharding-strategies-interactive-multi-million","published":"2025-07","authors":["Nidhi Bhatia","Ankit More","Ritika Borkar","Tiyasa Mitra","Ramon Matas","Ritchie Zhao","Maximilian Golub","Dheevatsa Mudigere","Brian Pharris","Bita Darvish Rouhani"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=1"}},{"id":"official:6df76a1f0c507821","title":"Identity-Motion Trade-offs in Text-to-Video Generation","url":"https://research.nvidia.com/publication/2025-07_identity-motion-trade-offs-text-video-generation","published":"2025-07","authors":["Yuval Atzmon","Rinon Gal","Yoad Tewel","Yoni Kasten","Gal Chechik"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=1"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/retroinfer-a-vector-storage-approach-for-scalable-long-context-llm-inference","title":"RetroInfer: A 
Vector-Storage Approach for Scalable Long-Context LLM Inference","url":"https://www.microsoft.com/en-us/research/publication/retroinfer-a-vector-storage-approach-for-scalable-long-context-llm-inference/","published":"2025-06-30","authors":["Yaoqi Chen","Jinkai Zhang","Baotong Lu","Qianxi Zhang","Chengruidong Zhang","Jingjia Luo","Di Liu","Huiqiang Jiang","Qi Chen","Jing Liu","Bailu Ding","Xiao Yan"],"abstract":"The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints. We present RetroInfer, a novel system that reconceptualizes the key-value (KV) cache as a vector storage system which exploits the inherent attention sparsity to accelerate long-context LLM inference. At its core is the wave index, an Attention-aWare VEctor index that enables efficient and accurate retrieval of critical tokens through techniques such as tripartite attention approximation, accuracy-bounded attention estimation, and segmented clustering. Complementing this is the wave buffer, which coordinates KV cache placement and overlaps computation and data transfer across GPU and CPU to sustain high throughput. 
Unlike prior sparsity-based methods that struggle with token selection and hardware coordination, RetroInfer....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Unpublished","Artificial intelligence","LLM","memory","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:afzmdkhm449e27x7t1qbed5l","title":"TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining","url":"https://machinelearning.apple.com/research/tic-lm-web-scale","published":"2025-06-30","authors":["Jeffrey Li§","Mohammadreza Armandpour","Iman Mirzadeh","Sachin Mehta°","Vaishaal Shankar°","Raviteja Vemulapalli","Samy Bengio","Oncel Tuzel","Mehrdad Farajtabar","Hadi Pouransari","Fartash Faghri"],"abstract":"This paper was accepted to the ACL 2025 main conference as an oral presentation.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:kyh4yp3boov9ey2bu6sif09l","title":"Instruction-Following Pruning for Large Language Models","url":"https://machinelearning.apple.com/research/pruning-large-language","published":"2025-06-30","authors":["Bairu Hou","Qibin Chen","Jianyu 
Wang","Guoli Yin","Chong Wang","Nan Du","Ruoming Pang","Shiyu Chang","Tao Lei"],"abstract":"With the rapid scaling of large language models (LLMs), structured pruning has become a widely used technique to learn efficient, smaller models from larger ones, delivering superior performance compared to training similarly sized models from scratch. In this paper, we move beyond the traditional static pruning approach of determining a fixed pruning mask for a model, and propose a dynamic approach to structured pruning. In our method, the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ls8xxjifpzcz62g10os06d2f","title":"Evaluating Long Range Dependency Handling in Code Generation LLMs","url":"https://machinelearning.apple.com/research/evaluating-long-range","published":"2025-06-30","authors":["Yannick Assogba","Donghao Ren"],"abstract":"As language models support larger and larger context sizes, evaluating their ability to make effective use of that context becomes increasingly important. We analyze the ability of several code generation models to handle long range dependencies using a suite of multi-step key retrieval tasks in context windows up to 8k tokens in length. 
The tasks progressively increase in difficulty and allow more nuanced evaluation of model capabilities than...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:d2ambl1luhzc9r6dplfoz2dj","title":"ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering","url":"https://machinelearning.apple.com/research/etva","published":"2025-06-30","authors":["Kaisi Guan","Zhengfeng Lai","Yuchong Sun","Peng Zhang","Wei Liu","Kieran Liu","Meng Cao","Ruihua Song"],"abstract":"Precisely evaluating semantic alignment between text prompts and generated videos remains a challenge in Text-to-Video (T2V) Generation. Existing text-to-video alignment metrics like CLIPScore only generate coarse-grained scores without fine-grained alignment details, failing to align with human preference. 
To address this limitation, we propose ETVA, a novel Evaluation method of Text-to-Video Alignment via fine-grained question generation and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4415708620","title":"Enhancing Long Video Understanding via Hierarchical Event-Based Memory","url":"https://doi.org/10.1109/icme59968.2025.11210102","published":"2025-06-30","authors":["Dingxin Cheng","Mingda Li","Jingyu Liu","Yongxin Guo","Bin Jiang","Qingbin Liu","Xi Chen","Bo Zhao"],"abstract":"Recently, integrating visual foundation models into large language models (LLMs) to form video understanding systems has attracted widespread attention. Most of the existing models compress diverse semantic information within the whole video and feed it into LLMs for content comprehension. While this method excels in short video understanding, it may result in a blend of multiple event information in long videos due to coarse compression, which causes information redundancy. Consequently, the semantics of key events might be obscured within the vast information that hinders the model’s understanding capabilities. To address this issue, we propose a Hierarchical Event-based Memory-enhanced LLM (HEM-LLM) for better understanding of long videos. Firstly, we design a novel adaptive sequence segmentation scheme to divide multiple events within long videos. 
In this way, we can perform individu...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11210102","openalex_id":"https://openalex.org/W4415708620","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","memory","long-term","compression"],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Shandong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8331000208854675},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.7078999876976013},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.6802999973297119},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5971999764442444},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.4771000146865845},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.41370001435279846},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40610000491142273},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.3790000081062317}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415709241","title":"Adaptive Mobile Agent for Dynamic Interactions","url":"https://doi.org/10.1109/icme59968.2025.11209947","published":"2025-06-30","authors":["Yanda Li","Chi Zhang","Wen Jiang","Wanqi Yang","Bin Fu","Cheng Pei","Xin Chen","Meng Fang","Ling Chen","Yunchao Wei"],"abstract":"With the rise of Multimodal Large Language Models (MLLM), LLM-driven visual agents are transforming software interfaces, 
especially those with graphical user interfaces. However, existing methods often struggle with diverse and complex mobile environments, such as rapidly changing app interfaces or non-standard UI components, limiting their adaptability and precision. This work presents a novel LLM-based multimodal agent framework for mobile devices, designed to enhance interaction and adaptive capabilities in dynamic mobile environments. By autonomously navigating devices and emulating human-like behaviors, the agent integrates parsing, text, and vision descriptions to construct a flexible action space. During the exploration phase, functionalities of user interface elements are documented into a customized structured knowledge base. In the deployment phase, RAG technology enables effic...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209947","openalex_id":"https://openalex.org/W4415709241","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","retrieval","efficient","agent"],"author_affiliations":["Beijing Jiaotong University","Tencent (China)","University of Liverpool","University of Technology Sydney","Westlake University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.718999981880188},{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.6723999977111816},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.6417999863624573},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5519999861717224},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5440000295639038},{"id":"https://openalex.org/C186967261","display_name":"Mobile 
device","score":0.5072000026702881},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.46149998903274536},{"id":"https://openalex.org/C113843644","display_name":"Interface (matter)","score":0.448199987411499}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:dwcj88cy4wix44zo84xysmq5","title":"From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating Mobile UI Operation Impacts","url":"https://machinelearning.apple.com/research/towards-safer-ai-agents","published":"2025-06-30","authors":["Zhuohao Jerry Zhang","Eldon Schoop","Jeffrey Nichols","Anuj Mahajan","Amanda Swearngin"],"abstract":"With advances in generative AI, there is increasing work towards creating autonomous agents that can manage daily tasks by operating user interfaces (UIs). While prior research has studied the mechanics of how AI agents might navigate UIs and understand UI structure, the effects of agents and their autonomous actions—particularly those that may be risky or irreversible—remain under-explored. 
In this work, we investigate the real-world impacts and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ll9fxac95hckxw8owjit1af6","title":"Contrastive Localized Language-Image Pre-Training","url":"https://machinelearning.apple.com/research/contrastive-localized","published":"2025-06-30","authors":["Hong-You Chen","Jeff Lai","Haotian Zhang","Angie Wang","Marcin Eichner","Keen You","Meng Cao","Bowen Zhang","Yinfei Yang","Zhe Gan"],"abstract":"Contrastive Language-Image Pre-training (CLIP) has been a celebrated method for training vision encoders to generate image/text representations facilitating various applications. Recently, CLIP has been widely adopted as the vision backbone of multimodal large language models (MLLMs) to connect image inputs for language interactions. 
The success of CLIP as a vision-language foundation model relies on aligning web-crawled noisy text annotations at...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:lqd4kgppqwh7g533pjfplve3","title":"Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention","url":"https://machinelearning.apple.com/research/cavia","published":"2025-06-30","authors":["Dejia Xu","Yifan Jiang","Chen (Kimi) Huang","Liangchen Song","Thorsten Gernoth","Liangliang Cao","Atlas Wang","Hao Tang"],"abstract":"In recent years, there have been remarkable breakthroughs in image-to-video generation. However, the 3D consistency and camera controllability of generated frames have remained unsolved. Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene. 
To...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:febd9ba1a7faf64e","title":"Announcing the Open Source Release of the ERNIE 4.5 Model Family","url":"https://ernie.baidu.com/blog/posts/ernie4.5/","published":"2025-06-30","authors":["Baidu"],"abstract":"We introduce ERNIE 4.5, a new family of large-scale multimodal models comprising 10 distinct variants. The model family consist of Mixture-of-Experts (MoE) models with 47B and 3B active parameters, with the largest model having 424B total parameters, as well as a 0.3B dense model.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://ernie.baidu.com/blog/index.xml"}},{"id":"apple:ol92g36rdb4169haljeuls6z","title":"Advancing Egocentric Video Question Answering with Multimodal Large Language Models","url":"https://machinelearning.apple.com/research/advancing-egocentric-video","published":"2025-06-30","authors":["Alkesh Patel","Vibhav Chitalia","Yinfei Yang"],"abstract":"Egocentric Video Question Answering (QA) requires models to handle long-horizon temporal reasoning, first-person perspectives, and specialized challenges like frequent camera movement. 
This paper systematically evaluates both proprietary and open-source Multimodal Large Language Models (MLLMs) on QaEgo4Dv2—a refined dataset of egocentric videos derived from QaEgo4D. Four popular MLLMs (GPT-4o, Gemini-1.5-Pro, Video-LLaVa-7B and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2512.02584","title":"Stepwise Schema-Guided Prompting Framework with Parameter Efficient Instruction Tuning for Multimedia Event Extraction","url":"http://arxiv.org/abs/2512.02584","published":"2025-06-30","authors":["Xiang Yuan","Xinrong Chen","Haochen Li","Hang Yang","Guanyu Wang","Wei-Ping Li","Tong Mo"],"abstract":"Multimedia Event Extraction (MEE) has become an important task in information extraction research as news today increasingly prefers to contain multimedia content. Current MEE works mainly face two challenges: (1) Inadequate extraction framework modeling for handling complex and flexible multimedia event structure; (2) The absence of multimodal-aligned training data for effective knowledge transfer to MEE task. In this work, we propose a Stepwise Schema-Guided Prompting Framework (SSGPF) using Multimodal Large Language Model (MLLM) as backbone for adaptive structure capturing to solve MEE task. At the initial step of SSGPF, we design Event Type Schema Guided Prompting (ETSGP) for event detection, then we devise Argument Role Schema Guided Prompting (ARSGP) that contains multi-step prompts with text-bridged grounding technique for argument extraction. 
We construct a weakly-aligned multimo...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11210082","openalex_id":"https://openalex.org/W4415708963","cited_by_count":1,"quality_score":50,"matched_keywords":["language model","news","efficient"],"author_affiliations":["Baidu (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8151999711990356},{"id":"https://openalex.org/C52146309","display_name":"Schema (genetic algorithms)","score":0.6621000170707703},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.6004999876022339},{"id":"https://openalex.org/C98184364","display_name":"Argument (complex analysis)","score":0.5360000133514404},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.5343000292778015},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48240000009536743},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4634000062942505},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.44269999861717224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415708409","title":"AKVQ-VL: Attention-Aware KV Cache Adaptive 2-Bit Quantization for Vision-Language Models","url":"https://doi.org/10.1109/icme59968.2025.11209367","published":"2025-06-30","authors":["Zhiyong Su","Wang Shen","Linge Li","Zhe Chen","Hanyu Wei","Huangqi Yu","Kehong Yuan"],"abstract":"Vision-language models (VLMs) show remarkable performance in multimodal tasks. 
However, excessively long multimodal inputs lead to oversized Key-Value (KV) caches, resulting in significant memory consumption and I/O bottlenecks. Previous KV quantization methods for Large Language Models (LLMs) may alleviate these issues but overlook the attention saliency differences of multimodal tokens, resulting in suboptimal performance. In this paper, we investigate the attention-aware token saliency patterns in VLM and propose AKVQ-VL. AKVQ-VL leverages the proposed Text-Salient Attention (TSA) and Pivot-Token-Salient Attention (PSA) patterns to adaptively allocate bit budgets. Moreover, achieving extremely low-bit quantization requires effectively addressing outliers in KV tensors. AKVQ-VL utilizes the Walsh-Hadamard transform (WHT) to construct outlier-free KV caches, thereby reducing quantizatio...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209367","openalex_id":"https://openalex.org/W4415708409","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","memory","quantization"],"author_affiliations":["Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.8450000286102295},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7526999711990356},{"id":"https://openalex.org/C79337645","display_name":"Outlier","score":0.47620001435279846},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.47519999742507935},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.41029998660087585},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3962000012397766},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.3569999933242798},{"id":"https://openalex.org/C2780165032","display_name":"Energy consumption","score":0.31709998846054077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416251035","title":"SuperFC: Selective Data Utilization for a Sustainable and Effective Function-Calling Agent","url":"https://doi.org/10.1109/ijcnn64981.2025.11228427","published":"2025-06-30","authors":["Xin Yang","Yilun Liu","Shimin Tao","Chunguang Zhao","Weibin Meng","M. He","Chang Su","Rongrong Liu","Hongxia Ma","Zhang Li","Jingzhou Du","Duan Li"],"abstract":"The function-calling agent is obtained by performing agent tuning to the large language model (LLM) on function-calling dataset. However, even state-of-the-art datasets (e.g., xlam-function-calling-60k datasets) still contain numerous misleading examples of low-quality data, wasting significant computational resources and result in an unnecessary carbon footprint. Furthermore, such inductive bad data negatively impacts the performance of the agent. In this paper, we propose a set of scoring criteria specifically tailored to evaluate function-calling data and use these criteria to develop a data filtering framework. By applying this framework to filter out low-quality data, we fine-tuned SuperFC, which demonstrates substantial improvements in both sustainability and performance. 
The SuperFC-7B training process reduced training time from 455 minutes to 85 minutes, resulting in a 80.02% red...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228427","openalex_id":"https://openalex.org/W4416251035","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","agent"],"author_affiliations":["Huawei Technologies (Canada)","Huawei Technologies (China)","Northeastern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6377000212669373},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.6266000270843506},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.5227000117301941},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.5112000107765198},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.49079999327659607},{"id":"https://openalex.org/C66204764","display_name":"Sustainability","score":0.47620001435279846},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4740999937057495},{"id":"https://openalex.org/C111335779","display_name":"Reduction (mathematics)","score":0.428600013256073}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7105648983","title":"AttentionDefense: Leveraging System Prompt Attention for Explainable Defense Against Novel Jailbreaks","url":"https://doi.org/10.1109/ijcnn64981.2025.11228703","published":"2025-06-30","authors":["Charlotte Siska","Anush Sankaran"],"abstract":"In the past few years, Language Models (LMs) have shown par-human 
capabilities in several domains. Despite their practical applications and exceeding user consumption, they are susceptible to jailbreaks when malicious input exploits the LM’s weaknesses, causing it to deviate from its intended behavior. Current defensive strategies either classify the input prompt as adversarial or prevent LMs from generating harmful outputs. However, it is challenging to explain the reason behind the malicious nature of the jailbreak, which results in a wide variety of closed-box approaches. In this research, we propose and demonstrate that system-prompt attention from Small Language Models (SLMs) can be used to characterize adversarial prompts, providing a novel, explainable, and cheaper defense approach called AttentionDefense. Our research suggests that the attention mechanism is an integral component...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228703","openalex_id":"https://openalex.org/W7105648983","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8517000079154968},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.7404999732971191},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6866000294685364},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.6801999807357788},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.6258999705314636},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5577999949455261},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4964999854564667},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4480000138282776}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416252407","title":"A Multimodal LLM for Chart Understanding and Generation","url":"https://doi.org/10.1109/ijcnn64981.2025.11227777","published":"2025-06-30","authors":["Yucheng Han","Chi Zhang","Xin Chen","Fukun Yin","Xu Yang","Zhibin Wang","Gang Yu","Bin Fu","Hanwang Zhang"],"abstract":"Multi-modal large language models have demonstrated impressive performances on most vision-language tasks. However, the model generally lacks the understanding capabilities for specific domain data, particularly when it comes to interpreting chart figures. This is mainly due to the lack of relevant multi-modal instruction tuning datasets. In this article, we create a high-quality instruction-tuning dataset leveraging GPT-4. We develop a multi-step data generation process in which different steps are responsible for generating tabular data, creating chart figures, and designing instruction tuning data separately. Our method’s flexibility enables us to generate diverse, high-quality instruction-tuning data consistently and efficiently while maintaining a low resource expenditure. 
Additionally, it allows us to incorporate a wider variety of chart and task types not yet featured in existing....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11227777","openalex_id":"https://openalex.org/W4416252407","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","language model"],"author_affiliations":["Fudan University","Nanyang Technological University","Southeast University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8360999822616577},{"id":"https://openalex.org/C190812933","display_name":"Chart","score":0.7936999797821045},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.5478000044822693},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.544700026512146},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5343999862670898},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5289999842643738},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42829999327659607},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41600000858306885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4415708087","title":"AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis","url":"https://doi.org/10.1109/icme59968.2025.11208937","published":"2025-06-30","authors":["Dan Luo","Chengyuan Ma","Weiqin Li","Jun Wang","Wei Chen","Zhiyong Wu"],"abstract":"With the advancement of speech synthesis 
technology, users have higher expectations for the naturalness and expressiveness of synthesized speech. But previous research ignores the importance of prompt selection. This study proposes a text-to-speech (TTS) framework based on Retrieval-Augmented Generation (RAG) technology, which can dynamically adjust the speech style according to the text content to achieve more natural and vivid communication effects. We have constructed a speech style knowledge database containing high-quality speech samples in various contexts and developed a style matching scheme. This scheme uses embeddings, extracted by Llama, PER-LLM-Embedder, and Moka, to match with samples in the knowledge database, selecting the most appropriate speech style for synthesis. Furthermore, our empirical research validates the effectiveness of the proposed method. Our demo can be vie...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11208937","openalex_id":"https://openalex.org/W4415708087","cited_by_count":3,"quality_score":48,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Tencent (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C134537474","display_name":"Naturalness","score":0.8209999799728394},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7720000147819519},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.6043999791145325},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.597000002861023},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.5649999976158142},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.5144000053405762},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.48669999837875366},{"id":"https://openalex.org/C2776187449","display_name":"Natural language generation","score":0.460999995470047}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4415708098","title":"UniSep: Universal Target Audio Separation with Language Models at Scale","url":"https://doi.org/10.1109/icme59968.2025.11210185","published":"2025-06-30","authors":["Yuanyuan Wang","Hangting Chen","Dongchao Yang","Weiqin Li","Dan Luo","Guangzhi Li","Shan Yang","Zhiyong Wu","Helen Meng","Xixin Wu"],"abstract":"We propose Universal target audio Separation (UniSep), addressing the separation task on arbitrary mixtures of different types of audio. Distinguished from previous studies, UniSep is performed on unlimited source domains and unlimited source numbers. We formulate the separation task as a sequence-to-sequence problem, and a large language model (LLM) is used to model the audio sequence in the discrete latent space, leveraging the power of LLM in handling complex mixture audios with large-scale data. Moreover, a novel pre-training strategy is proposed to utilize audio-only data, which reduces the efforts of large-scale data simulation and enhances the ability of LLMs to understand the consistency and correlation of information within audio sequences. 
We also demonstrate the effectiveness of scaling datasets in an audio separation task: we use large-scale data (36.5k hours), including spee...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11210185","openalex_id":"https://openalex.org/W4415708098","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","language model"],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7330999970436096},{"id":"https://openalex.org/C2776864781","display_name":"Source separation","score":0.6955999732017517},{"id":"https://openalex.org/C2776061190","display_name":"Separation (statistics)","score":0.6718999743461609},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6538000106811523},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5501999855041504},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.46380001306533813},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4489000141620636},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.42660000920295715}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4415707659","title":"OSLLM: A Retrieve-Reason-Refine Framework for Multi-Domain Relation Extraction with Large Language Models","url":"https://doi.org/10.1109/icme59968.2025.11209996","published":"2025-06-30","authors":["Jie Zhou","Yongxue Shan","Meihan Wu","Fei Hu","Li Zheng","Xiaodong Wang"],"abstract":"Relation 
Extraction (RE) aims to identify relations between entities in text. Despite the potential of Large Language Models (LLMs) in RE, they struggle with low relevance of relations in retrieved demonstrations and inconsistent responses to identical queries. Therefore, we introduce OSLLM, a novel retrieve-reason-refine framework for RE in Open Source Intelligence with LLMs. OSLLM leverages relation embeddings for accurate retrieval, performs better in-context reasoning, and refines outputs via template self-optimization and answer self-evaluation. Extensive experiments demonstrate that our method outperforms other LLM-based methods in traditional, temporal, and open RE tasks. Additionally, leveraging OSLLM to assist fine-tuned models in handling boundary samples significantly boosts the performance of these smaller models.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209996","openalex_id":"https://openalex.org/W4415707659","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["National University of Defense Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C25343380","display_name":"Relation (database)","score":0.8083000183105469},{"id":"https://openalex.org/C153604712","display_name":"Relationship extraction","score":0.7860000133514404},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7795000076293945},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.695900022983551},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5788999795913696},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.5364999771118164},{"id":"https://openalex.org/C62354387","display_name":"Boundary (topology)","score":0.4348999857902527},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4016999900341034}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415707689","title":"CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining","url":"https://doi.org/10.1109/icme59968.2025.11210123","published":"2025-06-30","authors":["Tristan Tsoi","Jiajun Deng","Yaolong Ju","Benno Weck","Holger Kirchhoff","Simon Lui"],"abstract":"Music similarity retrieval is fundamental for managing and exploring relevant content from large collections in streaming platforms. This paper presents a novel cross-modal contrastive learning framework that leverages the open-ended nature of text descriptions to guide music similarity modeling, addressing the limitations of traditional uni-modal approaches in capturing complex musical relationships. To overcome the scarcity of high-quality text-music paired data, this paper introduces a dual-source data acquisition approach combining online scraping and LLM-based prompting, where carefully designed prompts leverage LLMs’ comprehensive music knowledge to generate contextually rich descriptions. 
Extensive experiments demonstrate that the proposed framework achieves significant performance improvements over existing benchmarks through objective metrics, subjective evaluations, and real-wo...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11210123","openalex_id":"https://openalex.org/W4415707689","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (Germany)","Pompeu Fabra University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7645999789237976},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.66839998960495},{"id":"https://openalex.org/C2777946086","display_name":"Music information retrieval","score":0.6322000026702881},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.6227999925613403},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5284000039100647},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3458999991416931},{"id":"https://openalex.org/C558565934","display_name":"Musical","score":0.3440999984741211},{"id":"https://openalex.org/C73520026","display_name":"Pop music automation","score":0.3377000093460083}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415707910","title":"VSD2M: Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation","url":"https://doi.org/10.1109/icme59968.2025.11209368","published":"2025-06-30","authors":["Zhiqiang Yuan","Jiapei Zhang","Ying Deng","Yeshuang Zhu","Jie Zhou","Jinchao 
Zhang"],"abstract":"media, Nowadays, advanced text-to-video algorithms have spawned numerous general video generation systems that allow users to customize high-quality, photo-realistic videos by only providing simple text prompts. However, creating customized animated stickers, which have lower frame rates and more abstract semantics than videos, is greatly hindered by difficulties in data acquisition and incomplete benchmarks. To facilitate the exploration of researchers in animated sticker generation (ASG) field, we construct the currently largest vision-language sticker dataset named \"VSD2M\" at a two-million scale that contains static and animated stickers. Furthermore, to improve the performance of traditional video generation methods on ASG tasks with discrete characteristics, we propose a Spatial Temporal Interaction layer that utilizes semantic interaction and detail preservation to address the issu...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209368","openalex_id":"https://openalex.org/W4415707910","cited_by_count":1,"quality_score":42,"matched_keywords":["media"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8174999952316284},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6675999760627747},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.6107000112533569},{"id":"https://openalex.org/C126042441","display_name":"Frame (networking)","score":0.5852000117301941},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.4767000079154968},{"id":"https://openalex.org/C185798385","display_name":"Benchmark 
(surveying)","score":0.4625000059604645},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44780001044273376},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.436599999666214}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415707989","title":"Fine-tuned Multimodal Large Language Models are Zero-shot Learners in Image Quality Assessment","url":"https://doi.org/10.1109/icme59968.2025.11209018","published":"2025-06-30","authors":["Rui Xiong","Li Chen","Zhida Feng","Jiaxiang Liu","Shikun Feng"],"abstract":"Image quality assessment (IQA) has traditionally relied on task-specific models, often limiting their adaptability and generalization to diverse image content. To this end, this paper introduces LV-IQA, a fine-tuned Multimodal Large Language Model (MLLM) via Visual Grounding, demonstrating zero-shot learning in IQA. The key contributions of this paper involve the proposal of a cross-modal chain of thought rooted in hierarchical semantics and quality levels. Additionally, To enhance its adaptability, the paper incorporates a visual grounding prompt generated through segmentation masks and bounding boxes to learn the correspondence between original images and their hierarchical semantics. 
By observing the original images and their corresponding visual grounding information, LV-IQA is empowered to employ a sophisticated chain of thought, involving the perception and comprehension of local d...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209018","openalex_id":"https://openalex.org/W4415707989","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Wuhan University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7480000257492065},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6326000094413757},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5863000154495239},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5052000284194946},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.476500004529953},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.46549999713897705},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.4643999934196472},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.4512999951839447}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415709155","title":"Unifying Spatio-Temporal Contexts for Advanced Text-Video Retrieval","url":"https://doi.org/10.1109/icme59968.2025.11209054","published":"2025-06-30","authors":["Yanhao Huang","Baoyao Yang","Junxiang Chen","Wenbin Yao","Chen De"],"abstract":"Text-to-video retrieval (T2VR) aims to identify the most 
semantically relevant video based on a text query. Text queries typically involve diverse visual elements and events in video, making it non-trivial to learn a robust video feature representation for different queries. An abundance of spatial information make the model overwhelmed by redundancy and noisy and struggle to focus on linchpin visual elements. Additionally, without effective guidance, models grapple with connecting temporal information across different frames. In this paper, we introduce a Spatial-Temporal Pooling (STP) method to cohesively capture and unify the inherent spatio-temporal context within videos. For spatial information, STP leverages spatial tags such as entities, scenes, and text as attention prompts, steering the model toward salient visual elements while mitigating the impact of redundancies and noise. F...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209054","openalex_id":"https://openalex.org/W4415709155","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Guangdong University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7825000286102295},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.7063999772071838},{"id":"https://openalex.org/C152124472","display_name":"Redundancy (engineering)","score":0.65829998254776},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.605400025844574},{"id":"https://openalex.org/C70437156","display_name":"Pooling","score":0.5590000152587891},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5246000289916992},{"id":"https://openalex.org/C2779343474","display_name":"Context 
(archaeology)","score":0.5224999785423279},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.507099986076355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416249832","title":"Towards Realistic Generation: A Multi-Task Agent for Imitating Diverse Character Linguistic Styles","url":"https://doi.org/10.1109/ijcnn64981.2025.11228654","published":"2025-06-30","authors":["Siyuan Chen","Qingyi Si","Chenxu Yang","Zheng Lin","Yunzhi Liang","Siyang Tao","Zefeng Zhang","Weiping Wang"],"abstract":"The advent of large language models (LLMs) has significantly propelled the advancement of Role-Playing Agents (RPAs). However, current Role-Playing Agents predominantly focus on mimicking a character’s fundamental attributes while neglecting the replication of linguistic style, and they are incapable of effectively replicating characters when performing tasks beyond multi-turn dialogues, which results in generated responses that lack authenticity. The reason current RPAs lack this capability is due to the nature of existing character datasets, which lack collections of character quotations and are limited to multi-turn dialogue tasks, constraining the RPA’s performance across other task domains and failing to mimic a character’s linguistic style. 
To address this gap, we developed a multi-task role-playing dataset named MRstyle, which encompasses a substantial number of real individuals a...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228654","openalex_id":"https://openalex.org/W4416249832","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Information Engineering","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6376000046730042},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.6305999755859375},{"id":"https://openalex.org/C2780861071","display_name":"Character (mathematics)","score":0.6101999878883362},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.550599992275238},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5426999926567078},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.43459999561309814},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3296999931335449},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3257000148296356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416251573","title":"Repository-Level Code Smell Detection Based on Multi-Scale Code Information and LLM Assistance","url":"https://doi.org/10.1109/ijcnn64981.2025.11228932","published":"2025-06-30","authors":["Wenjie Liang","Y. 
Zeng","Haitao Zheng","Jiale Wang","Haiye Lin","Hong‐Gee Kim","Bing An","Zhao Wei","Yong Xu"],"abstract":"In software engineering, code smell detection has always been an important research task because it affects the readability and maintainability of programs. Traditional code smell detection work mostly focuses on a single file, and the study of repo-level code smell caused by the interaction of multiple files is still relatively lacking, although it is more practical in reality. Although large language models (LLMs) have recently achieved remarkable success in the field of code generation, simply copying the fine-tuning method based on LLM is not good enough in this task because it is difficult to model the logical relationship between multiple files. In this paper, we propose a repo-level code smell detection model(RSD), which divides the levels according to the distance between the rest class and the problem class, uses cross attention and CNN to model global and local information, and...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228932","openalex_id":"https://openalex.org/W4416251573","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Seoul National University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7664999961853027},{"id":"https://openalex.org/C2779151265","display_name":"Copying","score":0.7149999737739563},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6517000198364258},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6036999821662903},{"id":"https://openalex.org/C133237599","display_name":"Code 
smell","score":0.5618000030517578},{"id":"https://openalex.org/C160713754","display_name":"Maintainability","score":0.5482000112533569},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5306000113487244},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5030999779701233}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415708167","title":"FastAno: Accelerating Defect Image Generation with Efficient Sampling","url":"https://doi.org/10.1109/icme59968.2025.11209548","published":"2025-06-30","authors":["Haoyu Guan","Qianzi Yu","Kai Zhu","Yang Cao","Yu Kang"],"abstract":"Defect inspection faces the challenge of insufficient data. Although existing defect generation methods can produce high-quality defect images, the time-consuming generation process hinders the online availability. To solve it, we propose FastAno, a four-step sampling model for rapid defect generation. Specifically, we first introduce the Adaptive Defect-specific Loss, which calculates region-weighted feature loss to enhance shortcut mapping of defect distribution. Secondly, we propose the Dynamic Attention Optimization Strategy, which enhances the attention activation of the anomaly semantics to improve the generation of defects, while adaptively suppressing the activation of normal semantics to mitigate the degradation of non-defect regions. 
Extensive experiments on MVTec AD dataset demonstrate that our method achieves significantly faster generation speed while maintaining high genera...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209548","openalex_id":"https://openalex.org/W4415708167","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7139999866485596},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6248999834060669},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5849999785423279},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4819999933242798},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.46560001373291016},{"id":"https://openalex.org/C2779679103","display_name":"Degradation (telecommunications)","score":0.44119998812675476},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.43709999322891235},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.435699999332428}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416251593","title":"Eliminating Retrieval Knowledge Conflicts: Cross-Validation Re-ranking with Large Language Models","url":"https://doi.org/10.1109/ijcnn64981.2025.11228012","published":"2025-06-30","authors":["Qirui Wu","Hai Lin","Haitao Zheng","Ruobing Xie","Saiyong Yang","Xingwu Sun","Zhan Kang","Hong‐Gee 
Kim"],"abstract":"In retrieval-augmented generation (RAG) systems, Large Language Models (LLMs) have been shown to be effective for re-ranking. However, existing research often prioritizes passage relevance over reliability, which can result in the incorporation of conflicting information and the generation of ambiguous responses. This issue becomes particularly pronounced when addressing inter-context knowledge conflicts, where candidate documents present contradictory information that may mislead the model. To mitigate this problem, we propose a novel cross-validation re-ranking technique designed specifically to resolve inter-context knowledge conflicts during the retrieval process. We also develop a new dataset, ContraPRT, to evaluate the ability of models to rank passages containing conflicting knowledge. Experimental results using GPT-4 and LlaMA3-70B demonstrate that our approach not only effective...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228012","openalex_id":"https://openalex.org/W4416251593","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Seoul National University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.807200014591217},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.7418000102043152},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5600000023841858},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.46560001373291016},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.4343999922275543},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.43140000104904175},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4106000065803528},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3682999908924103}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416251412","title":"EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts","url":"https://doi.org/10.1109/ijcnn64981.2025.11228924","published":"2025-06-30","authors":["Yucheng Han","Rui Wang","Chi Zhang","Juntao Hu","Pei Chen","Bin Fu","Hanwang Zhang"],"abstract":"Recent advancements in image generation have enabled the creation of high-quality images from diverse conditions, such as text and images. However, existing methods struggle to balance multiple conditions effectively, typically favoring certain modalities over others. To address this challenge, we introduce EMMA, a novel multi-modal image generation model that integrates both text and additional modalities, which we refer to as \"text + X\", to guide the generation of images. EMMA incorporates an innovative Multi-modal Feature Connector design, which effectively integrates textual and supplementary modal information through a special attention mechanism. Additionally, we propose a method for the assembly of existing modules to produce images conditioned on multiple modalities at the same time, eliminating the need for additional training. 
This modular nature also facilitates easy adaptatio...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228924","openalex_id":"https://openalex.org/W4416251412","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Fudan University","Nanyang Technological University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7001000046730042},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.6967999935150146},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.607699990272522},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5644999742507935},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.5386999845504761},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5194000005722046},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49810001254081726},{"id":"https://openalex.org/C2776187449","display_name":"Natural language generation","score":0.4302000105381012}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416252107","title":"Breaking Static Barriers: Dynamic Post-Training Quantization for Diffusion Models","url":"https://doi.org/10.1109/ijcnn64981.2025.11228929","published":"2025-06-30","authors":["Yang Xiao","Huixia Li","Lijiang Li","Xiawu Zheng","Yuexiao Ma","Feng Ling","Jie Wu","Xuefeng Xiao","Rui Wang","Min Zheng","Fei Chao"],"abstract":"Current Post-Training Quantization (PTQ) schemes have been extensively studied for traditional 
convolutional neural networks and language models; however, PTQ application in diffusion models has shown significant performance degradation due to static settings of PTQ. Existing methods only uniformly and statically sample during each denoising step to construct calibration sets, neglecting the different importance of different steps in diffusion models. Furthermore, diffusion models exhibit a large number of activations with skewed distributions, and maintaining a static zero-point during the reconstruction process causes the model to converge only to local optima. To solve these limitations, it is necessary to dynamically design calibration dataset construction methods for different quantization scenarios and develop specialized optimization strategies tailored to specific activation dist...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228929","openalex_id":"https://openalex.org/W4416252107","cited_by_count":0,"quality_score":41,"matched_keywords":["quantization"],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.7318999767303467},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.6101999878883362},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5515000224113464},{"id":"https://openalex.org/C68710425","display_name":"Diffusion process","score":0.47600001096725464},{"id":"https://openalex.org/C165838908","display_name":"Calibration","score":0.43220001459121704},{"id":"https://openalex.org/C126255220","display_name":"Mathematical 
optimization","score":0.4027999937534332},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.38190001249313354},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.3725999891757965}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415708212","title":"A Multi-Stage Framework for Multimodal Controllable Speech Synthesis","url":"https://doi.org/10.1109/icme59968.2025.11210021","published":"2025-06-30","authors":["Rui Niu","Weihao Wu","Jie Chen","Long Ma","Zhiyong Wu"],"abstract":"Controllable speech synthesis aims to control the style of generated speech using reference input, which can be of various modalities. Existing face-based methods struggle with robustness and generalization due to data quality constraints, while text prompt methods offer limited diversity and fine-grained control. Although multimodal approaches aim to integrate various modalities, their reliance on fully matched training data significantly constrains their performance and applicability. This paper proposes a 3-stage multimodal controllable speech synthesis framework to address these challenges. For face encoder, we use supervised learning and knowledge distillation to tackle generalization issues. Furthermore, the text encoder is trained on both text-face and text-speech data to enhance the diversity of the generated speech. 
Experimental results demonstrate that this method outperforms s...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11210021","openalex_id":"https://openalex.org/W4415708212","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7882999777793884},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6090999841690063},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.5935999751091003},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5730999708175659},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5673999786376953},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5669999718666077},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5116000175476074},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4415000081062317}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415708254","title":"DreamPBR: Text-driven High-Resolution SVBRDF Generation with Multimodal Guidance","url":"https://doi.org/10.1109/icme59968.2025.11210202","published":"2025-06-30","authors":["Linxuan Xin","Zheng Zhang","Zhiyi Pan","Jinfu Wei","Duan Gao","Wei Gao"],"abstract":"Existing material creation methods are limited in diversity due to the scarcity of real-world data. 
To enhance controllability and diversity, we propose DreamPBR, a diffusion-based generative framework that creates spatially varying appearance properties guided by text and multimodal controls. By integrating large-scale vision-language models trained on billions of text-image pairs with material priors from hundreds of Physically Based Rendering (PBR) samples, we achieve high-quality PBR material generation. We employ a material Latent Diffusion Model (m-LDM) to map albedo maps to latent space, which is then decoded into full Spatially Varying Bidirectional Reflectance Distribution Function (SVBRDF) parameter maps via a rendering-aware PBR decoder. To achieve diverse control, we introduce a multimodal guidance module that includes image and 3D shape guidance. We demonstrate DreamPBR’s ef...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11210202","openalex_id":"https://openalex.org/W4415708254","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Cloud Computing Center","Huawei Technologies (China)","Huawei Technologies (United States)","Peking University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6984000205993652},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.6432999968528748},{"id":"https://openalex.org/C48209547","display_name":"Controllability","score":0.6136000156402588},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5514000058174133},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5432999730110168},{"id":"https://openalex.org/C177769412","display_name":"Prior 
probability","score":0.5397999882698059},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4618000090122223},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.361299991607666}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4411836207","title":"AirFusion: sensor data fusion for air quality monitoring","url":"https://doi.org/10.1080/01431161.2025.2516692","published":"2025-06-30","authors":["Mirna Elbestar","Sherif G. Aly","Rami Ghannam","Hesham M. Eraqi"],"abstract":"Since traditional air quality monitoring methods often rely on geographically sparse and costly air quality monitoring stations, image-based air quality methodologies are recently offering a compelling alternative that utilizes images from sources like satellites, traffic cameras, and even smartphones to monitor pollution levels by using estimation models, image-processing techniques, and deep learning models. In this paper, we introduce a novel, multimodal dataset designed to address the limitations of existing resources, which are often restricted in size, geographical coverage, and fixed-scene imagery, impeding the generalization of existing deep learning prediction models. Our AirFusion dataset comprises 9,411 images paired with synchronized meteorological and geospatial readings, collected by a portable commercial air quality sensor, from 179 diverse locations. 
Moreover, we introduc...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1080/01431161.2025.2516692","openalex_id":"https://openalex.org/W4411836207","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","American University in Cairo","University of Glasgow"],"concepts":[{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.5780851244926453},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.5566843748092651},{"id":"https://openalex.org/C39432304","display_name":"Environmental science","score":0.5110553503036499},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4557918310165405},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4454602897167206},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.43343430757522583},{"id":"https://openalex.org/C24756922","display_name":"Data quality","score":0.42570269107818604},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.18820273876190186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415708193","title":"VividPose: Vividly 3D-driven Stable Pose Diffusion of High Facial Fidelity","url":"https://doi.org/10.1109/icme59968.2025.11209876","published":"2025-06-30","authors":["Qilin Wang","Zhengkai Jiang","Chengming Xu","Jiangning Zhang","Yabiao Wang","Xinyi Zhang","Yun Cao","Weijian Cao","Chengjie Wang","Zhanxiong Wang","Yanwei Fu"],"abstract":"Human image animation aims to generate a video from a static image by following a specified pose sequence. 
Existing methods typically adopt a multi-stage pipeline that separately learns appearance and motion, leading to appearance degradation and temporal inconsistencies. To address these issues, we propose VividPose, an innovative end-to-end pipeline based on Stable Video Diffusion (SVD) that ensures superior temporal stability. To enhance the retention of human identity, we propose an identity-aware appearance controller that integrates additional facial information without compromising other appearance details such as clothing texture and background. To accommodate diverse human body shapes and hand movements, we introduce a geometry-aware pose controller that utilizes both rendering maps and skeleton maps. Extensive qualitative and quantitative experiments on the UBCFashion and TikTo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209876","openalex_id":"https://openalex.org/W4415708193","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7785999774932861},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7649000287055969},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7638999819755554},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.5830000042915344},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5156000256538391},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.5134000182151794},{"id":"https://openalex.org/C138591656","display_name":"Computer facial 
animation","score":0.43209999799728394},{"id":"https://openalex.org/C83248878","display_name":"Active appearance model","score":0.430400013923645}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415708362","title":"UniVG: Towards UNIfied-modal Video Generation","url":"https://doi.org/10.1109/icme59968.2025.11210223","published":"2025-06-30","authors":["Ludan Ruan","Lei Tian","Chuanwei Huang","Xu Zhang","Xinyan Xiao"],"abstract":"Diffusion based video generation has received significant attention in both the academic and industrial communities. Despite recent exploration of diverse conditional inputs for better video generation control, existing methods, primarily targeting individual tasks, often fall short in real-world scenarios where users may use any form of conditioning, either individually or combined. To address this, we propose a Unified-modal Video Generation system capable of handling multiple video generation tasks across different modalities. Our approach introduces the concept of generative freedom in the diffusion process, which allows us to reclassify video generation tasks into high-freedom and low-freedom categories based on the solution space given certain conditions. We then design different diffusion paradigms for each category. 
For high-freedom video generation, we present a base model that....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11210223","openalex_id":"https://openalex.org/W4415708362","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7653999924659729},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4993000030517578},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.4903999865055084},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4650999903678894},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4277999997138977},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4268999993801117},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.39629998803138733},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.38359999656677246}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416249389","title":"SmellDetector: Multi-Label Code Smell Detection and Refactoring with Large Language Models","url":"https://doi.org/10.1109/ijcnn64981.2025.11227837","published":"2025-06-30","authors":["Wenjie Liang","Jiale Wang","Haitao Zheng","Yinghui Li","Haiye Lin","Hong‐Gee Kim","Bing An","Wei Zhao","Yong Xu"],"abstract":"Large Language Models (LLMs) have demonstrated impressive capabilities in many tasks such as code generation and automated program repair. 
However, code LLMs have ignored another important task in programmers’ daily development work, which is to improve the maintainability, readability, and scalability of the program. All of these characteristics are related to code smells and we study how to improve them by detecting and removing code smells. Most works on code smells still rely on using measures formulated by experts as features, but lack of use of the rich prior knowledge contained in code LLMs. In this paper, we propose SmellDetector, a comprehensive model for both code smell detection and refactoring opportunities detection in Java. We train the model with the designed prompt which contains both code smells of class-level and method-level in the same code snippet, including more tha...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11227837","openalex_id":"https://openalex.org/W4416249389","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Seoul National University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C152752567","display_name":"Code refactoring","score":0.958899974822998},{"id":"https://openalex.org/C133237599","display_name":"Code smell","score":0.8931999802589417},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7771999835968018},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.7268999814987183},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5550000071525574},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.5321999788284302},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.4187999963760376},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.41819998621940613}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416251298","title":"Make \"V\" and \"Q\" Inseparable: Deliberately Dual-Channel Adversarial Learning for Robust Visual Question Answering","url":"https://doi.org/10.1109/ijcnn64981.2025.11228973","published":"2025-06-30","authors":["Hanxiao Wu","Zhaowen Li","Feilong Chen","Zhiyu Wang","Jiali Xu","Liqun Hu","Huaixuan Cao","Yin Li","Jinqiao Wang","Jianlong Chang"],"abstract":"Visual Question Answering (VQA) is a challenging task due to the vision-language biases which restrict the model to sufficiently learn the multi-modal knowledge from visual image and natural language simultaneously. Several recent works attempt to alleviate this problem via weakening language prior but ignore vision prior, hindering further performance improvement. In this paper, we propose a novel Deliberately Dual-Channel Adversarial Learning (DCAL) to make \"V\" and \"Q\" inseparable, which aims to weaken prior from both vision and language. Specifically, DCAL introduces in-batch random negative sampling to force the model to be wrong when given the wrong questions or images. DCAL maximizes the likelihood of correct answers for the original question-image pairs and minimizes it for random negative samples. 
In order to solve the problem of false negatives, DCAL exploits a deliberate traini...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228973","openalex_id":"https://openalex.org/W4416251298","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Institute of Automation","Jinan Institute of Quantum Technology","Shandong Institute of Automation","Shandong Iron and Steel Group (China)","Shandong Lianxing Energy Group (China)","Wuhan University of Technology"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.8330000042915344},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.7656999826431274},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6988000273704529},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6703000068664551},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.6309000253677368},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6158000230789185},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5212000012397766},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4814000129699707}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415709510","title":"Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models","url":"https://doi.org/10.1109/icme59968.2025.11209139","published":"2025-06-30","authors":["Bin Li","Dehong Gao","Yimin Wang","Linbo Jin","Shanqing 
Yu","Xiaoyan Cai","Libin Yang"],"abstract":"Despite the significant success of Large Vision-Language models(LVLMs), these models still suffer hallucinations when describing images, generating answers that include non-factual objects. It is reported that these models tend to overfocus on certain irrelevant image tokens that do not contain critical information for answering the question and distort the output. To address this, we propose an Instruction-Aligned Visual Attention(IAVA) approach, which identifies irrelevant tokens by comparing changes in attention weights under two different instructions. By applying contrastive decoding, we dynamically adjust the logits generated from original image tokens and irrelevant image tokens, reducing the model’s over-attention to irrelevant information. The experimental results demonstrate that IAVA consistently outperforms existing decoding techniques on benchmarks such as MME, POPE, and Tex...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209139","openalex_id":"https://openalex.org/W4415709510","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Northwestern Polytechnical University","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.692300021648407},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5447999835014343},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.541700005531311},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.5005000233650208},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.4625999927520752},{"id":"https://openalex.org/C2986089797","display_name":"Visual attention","score":0.4146000146865845},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.39590001106262207},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.36800000071525574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415708280","title":"Few-Shot 3D Face Generation via a Controllable Diffusion Model Guided by Text and Images","url":"https://doi.org/10.1109/icme59968.2025.11210230","published":"2025-06-30","authors":["Jinfu Wei","Zheng Zhang","Qinchuan Zhang","Ran Liao","Duan Gao"],"abstract":"Recent advancements in text-to-3D generation have relied on large 3D datasets or expensive optimization processes during inference. In this paper, we introduce ControlFace, a novel framework designed for the creation of computer graphics-friendly 3D faces under the guidance of text and images. We utilize a controllable diffusion model to generate physically-based facial assets in texture space. The key to achieving few-shot generation lies in 3D-aware controls: a texture-space facial representation of geometry proxy. The main distinguishing feature of our framework is the effective integration of 3D facial priors with the diversity inherited from text-to-image diffusion models through few-shot learning, requiring only 36 3D faces for training. 
Once trained, ControlFace can generate diverse 3D faces in a feed-forward manner within 5 seconds and perform editing and stylization without 3D l...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11210230","openalex_id":"https://openalex.org/W4415708280","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7533000111579895},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.677299976348877},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6338000297546387},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.5996999740600586},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5952000021934509},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5421000123023987},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.5275999903678894},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.5044000148773193}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415708419","title":"Content-Style Disentangled Audio Style Transfer via Diffusion Model","url":"https://doi.org/10.1109/icme59968.2025.11209789","published":"2025-06-30","authors":["Yiran Wang","Jiasheng Lu","Jun Chen","Xinyu Zhang","Yingshan Liang","Zhimin Du","Qingyang Shi","Shao‐Lun Huang"],"abstract":"Deep generative models have advanced the synthesis of high-quality audio 
signals, shifting the focus from audio fidelity to user-specific customization. Despite significant progress, current models struggle to generate style-consistent audio. Audio style transfer offers a more intuitive approach for capturing user intent but faces challenges in the disentanglement and interpretation of content and style. This paper introduces a novel framework for content-style disentangled audio style transfer. We introduce an interpretable, formula-based style distance that effectively disentangles content and style within the language-audio feature space. The proposed QwenAudio-Contrastive Language Audio Pretraining (Qwen-CLAP) content extraction module and the CLAP-based style disentanglement loss coordinated with the style reconstruction loss, enable interpretable disentanglement and stylization. Co...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209789","openalex_id":"https://openalex.org/W4415708419","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.679099977016449},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.6654999852180481},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.633400022983551},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5927000045776367},{"id":"https://openalex.org/C113364801","display_name":"High fidelity","score":0.5460000038146973},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5091999769210815},{"id":"https://openalex.org/C167966045","display_name":"Generative 
model","score":0.4848000109195709},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4523000121116638}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415707817","title":"Beyond Sliders: Mastering the Art of Diffusion-based Image Manipulation","url":"https://doi.org/10.1109/icme59968.2025.11208968","published":"2025-06-30","authors":["Yufei Tang","Daiheng Gao","Ping Wu","Wenbo Zhou","Bang Zhang","Weimin Zhang"],"abstract":"In the realm of image generation, the quest for realism and customization has never been more pressing. While existing methods like concept sliders have made strides, they often falter when it comes to non-AIGC images, particularly images captured in real-world settings. To bridge this gap, we introduce Beyond Sliders, an innovative framework that integrates GANs and diffusion models to facilitate sophisticated image manipulation across diverse image categories. Improved upon concept sliders, our method refines the image through fine-grained guidance—both textual and visual—in an adversarial manner, leading to a marked enhancement in image quality and realism. 
Extensive experimental validation confirms the robustness and versatility of Beyond Sliders across a spectrum of applications.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11208968","openalex_id":"https://openalex.org/W4415707817","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Fujian University of Technology","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6812000274658203},{"id":"https://openalex.org/C2987933465","display_name":"Image manipulation","score":0.6754000186920166},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.640500009059906},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6365000009536743},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.541100025177002},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5307999849319458},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.3822000026702881},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3797000050544739}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416252100","title":"Accelerated Dropout: A Bitmask Approach to Speed Up Model Training","url":"https://doi.org/10.1109/ijcnn64981.2025.11228370","published":"2025-06-30","authors":["Jincheng Xie","Hao Shi","Sicheng Xu","Yan Zhang","Zhenyu Ming","Zhongyi Huang"],"abstract":"Dropout [1], a standard regularization technique 
in Large Language Models, incurs extra computational overhead, particularly due to its repeated application during the training process. To address this issue, we propose Accelerated Dropout, an algorithm that revolutionizes the traditional dropout method by employing a bitmask, rather than a float mask, to significantly expedite the training process. In the theoretical proof, we demonstrate that accelerated dropout is exponentially convergent to the probability of retaining neurons. Our extensive experimental analysis, conducted on 13 benchmark datasets and 9 deep learning models, confirms that accelerated dropout outperforms traditional dropout in terms of training efficiency and generalization performance. The experimental results indicate that, compared to torch dropout, our accelerated dropout achieves 8x speedup on a single dropout o...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228370","openalex_id":"https://openalex.org/W4416252100","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2776145597","display_name":"Dropout (neural networks)","score":0.859499990940094},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.7937999963760376},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7153000235557556},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5662000179290771},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5546000003814697},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.4830000102519989},{"id":"https://openalex.org/C2777211547","display_name":"Training 
(meteorology)","score":0.46860000491142273},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.44290000200271606}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415707723","title":"ALCReg: Active Label Correction for Partial Point Cloud Registration","url":"https://doi.org/10.1109/icme59968.2025.11209927","published":"2025-06-30","authors":["Zongyi Xu","Xinqi Jiang","Xinyu Gao","Shanshan Zhao","Qianni Zhang","Weisheng Li","Xinbo Gao"],"abstract":"Deep point cloud registration methods encounter challenges due to partial overlaps and are heavily reliant on labeled data. In this paper, we propose ALCReg, an active label correction method for partial point cloud registration learning. ALCReg utilises a multimodal approach to generate pseudo labels, mitigating the cold-start issue in active learning. To ensure the diversity and representativeness of selected samples, we propose an inlier ratio based query strategy for manual correction. Furthermore, an innovative self-correction mechanism based on consistency is introduced, allowing the model to refine pseudo labels autonomously and further improve model performance. 
Experimental results on the 3DMatch and 3DLoMatch datasets demonstrate that ALCReg achieves comparable performance with the fully-supervised registration methods, even with only 5% of labeled samples, making it the first....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme59968.2025.11209927","openalex_id":"https://openalex.org/W4415707723","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Chongqing University of Posts and Telecommunications","Queen Mary University of London"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7319999933242798},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.7275999784469604},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6068999767303467},{"id":"https://openalex.org/C77967617","display_name":"Active learning (machine learning)","score":0.5702000260353088},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5540000200271606},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5475999712944031},{"id":"https://openalex.org/C37381756","display_name":"Representativeness heuristic","score":0.459199994802475},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4309999942779541}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/efficientxlang-towards-improving-token-efficiency-through-cross-lingual-reasoning","title":"EfficientXLang: Towards Improving Token Efficiency Through 
Cross-Lingual Reasoning","url":"https://www.microsoft.com/en-us/research/publication/efficientxlang-towards-improving-token-efficiency-through-cross-lingual-reasoning/","published":"2025-06-29","authors":["Sanchit Ahuja","Praneetha Vaddamanu","Barun Patra"],"abstract":"Despite recent advances in Language Reasoning Models (LRMs), most research focuses solely on English, even though many models are pretrained on multilingual data. In this work, we investigate: Is English the most token-efficient language for reasoning? We evaluate three open-source RLMs: DeepSeek R1, Qwen 2.5 and Qwen 3, across four math datasets and seven typologically diverse languages. We find that reasoning in non-English languages not only reduces token usage, but also preserves accuracy. These gains persist even after translating the reasoning traces into English, suggesting genuine shifts in reasoning behavior rather than surface-level linguistic effects. The extent of improvement, however, depends on the models multilingual strength. 
Our findings motivate a broader view of reasoning in language models, highlighting the potential of multilingual reasoning and the importance of str...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Human language technologies","Computation and Language","Computer science","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/teacher-ai-collaboration-for-curating-and-customizing-lesson-plans-in-low-resource-schools","title":"Teacher-AI Collaboration for Curating and Customizing Lesson Plans in Low-Resource Schools","url":"https://www.microsoft.com/en-us/research/publication/teacher-ai-collaboration-for-curating-and-customizing-lesson-plans-in-low-resource-schools/","published":"2025-06-29","authors":["Deepak Varuvel Dennison","Bakhtawar Ahtisham","Kavyansh Chourasia","Nirmit Arora","Rahul Singh","René F. Kizilcec","Akshay Nambi","Tanuja Ganu","Aditya Vashistha"],"abstract":"This study investigates Shiksha copilot, an AI-assisted lesson planning tool deployed in government schools across Karnataka, India. The system combined LLMs and human expertise through a structured process in which English and Kannada lesson plans were co-created by curators and AI; teachers then further customized these curated plans for their classrooms using their own expertise alongside AI support. 
Drawing on a large-scale mixed-methods study involving 1,043 teachers and 23 curators, we examine how educators collaborate with AI to generate context-sensitive lesson plans, assess the quality of AI-generated content, and analyze shifts in teaching practices within multilingual, low-resource environments. Our findings show that teachers used Shiksha copilot both to meet administrative documentation needs and to support their teaching. The tool eased bureaucratic workload, reduced lesson...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Educational technology"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/best-route-adaptive-llm-routing-with-test-time-optimal-compute","title":"BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute","url":"https://www.microsoft.com/en-us/research/publication/best-route-adaptive-llm-routing-with-test-time-optimal-compute/","published":"2025-06-27","authors":["Dujian Ding","Ankur Mallick","Shaokun Zhang","Chi Wang","Daniel Madrigal","Mirian Hipolito Garcia","Menglin Xia","L. Lakshmanan","Qingyun Wu","Victor Ruehle"],"abstract":"Large language models (LLMs) are powerful tools but are often expensive to deploy at scale. LLM query routing mitigates this by dynamically assigning queries to models of varying cost and quality to obtain a desired trade-off. 
Prior query routing approaches generate only one response from the selected model and a single response from a small (inexpensive) model was often not good enough to beat a response from a large (expensive) model due to which they end up overusing the large model and missing out on potential cost savings. However, it is well known that for small models, generating multiple responses and selecting the best can enhance quality while remaining cheaper than a single large-model response. We leverage this idea to propose BEST-Route, a novel routing framework that chooses a model and the number of responses to sample from it based on query difficulty and the quality thre...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:uxvrsub6fa99ckyjw70zh2mi","title":"Aligning LLMs by Predicting Preferences from User Writing Samples","url":"https://machinelearning.apple.com/research/predicting-preferences","published":"2025-06-27","authors":["Stéphane Aroca-Ouellette","Natalie Mackraz","Barry-John Theobald","Katherine Metcalf"],"abstract":"Accommodating human preferences is essential for creating aligned LLM agents that deliver personalized and effective interactions. Recent work has shown the potential for LLMs acting as writing agents to infer a description of user preferences. Agent alignment then comes from conditioning on the inferred preference description. 
However, existing methods often produce generic preference descriptions that fail to capture the unique and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["LLM","personalized","preference","agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"bytedance-seed:293","title":"Investigating the Overlooked Hessian Structure: From CNNs to LLMs","url":"https://seed.bytedance.com/en/research/investigating-the-overlooked-hessian-structure-from-cnns-to-llms","published":"2025-06-27","authors":["Qian-Yuan Tang","Yufei Gu","Yunfeng Cai","Mingming Sun","Ping Li","Xun Zhou","Zeke Xie"],"abstract":"It is well-known that the Hessian of deep loss landscape matters to optimization and generalization of deep learning. Previous studies reported a rough Hessian structure in deep learning, which consists of two components, a small number of large eigenvalues and a large number of nearly-zero eigenvalues. To the best of our knowledge, we are the first to report that a simple but overlooked power-law Hessian structure exists in well-trained deep neural networks, including Convolutional Neural Networks (CNNs) and Large Language Models (LLMs). Moreover, we provide a maximum-entropy theoretical interpretation for the power-law Hessian structure and theoretically demonstrate the existence of a robust and low-dimensional subspace of deep neural networks. 
Our extensive experiments using the proposed power-law spectral method demonstrate that the power-law Hessian spectra critically relate to mult...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Deep Learning","LLM","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:c541986211ae4efe","title":"Gemini 2.5 Pro Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Pro-Model-Card.pdf","published":"2025-06-27","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 2.5 Pro"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"apple:wmqe73eekfdkmw62m2qwswar","title":"Variational Rectified Flow Matching","url":"https://machinelearning.apple.com/research/variational","published":"2025-06-27","authors":["Pengsheng Guo","Alexander G. Schwing"],"abstract":"We study Variational Rectified Flow Matching, a framework that enhances classic rectified flow matching by modeling multi-modal velocity vector-fields. 
At inference time, classic rectified flow matching 'moves' samples from a source distribution to the target distribution by solving an ordinary differential equation via integration along a velocity vector-field. At training time, the velocity vector-field is learnt by linearly interpolating...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:oycc8tdeokrwaay870ne3psp","title":"Trade-offs in Data Memorization via Strong Data Processing Inequalities","url":"https://machinelearning.apple.com/research/trade-offs","published":"2025-06-27","authors":["Vitaly Feldman","Guy Kornowski","Xin Lyu"],"abstract":"Recent research demonstrated that training large language models involves memorization of a significant fraction of training data. 
Such memorization can lead to privacy violations when training on sensitive user data and thus motivates the study of data memorization's role in learning. In this work, we develop a general approach for proving lower bounds on excess data memorization, that relies on a new connection between strong data processing...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:c4c25ad438dc4b23","title":"Time to Speak Some Dialects, Qwen-TTS!","url":"https://qwenlm.github.io/blog/qwen-tts/","published":"2025-06-27","authors":["Alibaba/Qwen"],"abstract":"Here we introduce the latest update of Qwen-TTS (qwen-tts-latest or qwen-tts-2025-05-22) through Qwen API. Trained on a large-scale dataset encompassing over millions of hours of speech, Qwen-TTS achieves human-level naturalness and expressiveness. Notably, Qwen-TTS automatically adjusts prosody, pacing, and emotional inflections in response to the input text. 
Notably, Qwen-TTS supports the generation of 3 Chinese dialects, including Pekingese, Shanghainese, and Sichuanese. As of now, Qwen-TTS supports 7 Chinese-English bilingual voices, including Cherry, Ethan, Chelsie, Serena, Dylan (Pekingese), Jada (Shanghainese) and Sunny (Sichuanese).","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"apple:c2mvrmdkzemk3r5qhz1ao0bd","title":"Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results","url":"https://machinelearning.apple.com/research/revisiting-uncertainty","published":"2025-06-27","authors":["Andrea Santilli","Adam Golinski","Michael Kirchhof","Federico Danieli","Arno Blaas","Miao Xiong","Luca Zappella","Sinead Williamson"],"abstract":"Uncertainty Quantification (UQ) in Language Models (LMs) is key to improving their safety and reliability. Evaluations often use metrics like AUROC to assess how well UQ methods (e.g., negative sequence probabilities) correlate with task correctness functions (e.g., ROUGE-L). We show that mutual biases--when both UQ methods and correctness functions are biased by the same factors--systematically distort evaluation. 
First, we formally prove that...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:yxs3bvqoil1dff8scwn3g8dl","title":"Normalizing Flows are Capable Generative Models","url":"https://machinelearning.apple.com/research/normalizing-flows","published":"2025-06-27","authors":["Shuangfei Zhai","Ruixiang Zhang","Preetum Nakkiran","David Berthelot","Jiatao Gu","Huangjie Zheng","Tianrong Chen","Miguel Angel Bautista","Navdeep Jaitly","Josh Susskind"],"abstract":"Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relatively little attention in recent years. In this work, we demonstrate that NFs are more powerful than previously believed. We present TarFlow: a simple and scalable architecture that enables highly performant NF models. 
TarFlow can be thought of as a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:q1miw6hjs21pigionyecgz5k","title":"INRFlow: Flow Matching for INRs in Ambient Space","url":"https://machinelearning.apple.com/research/flow-matching","published":"2025-06-27","authors":["Yuyang Wang","Anurag Ranjan","Josh Susskind","Miguel Angel Bautista"],"abstract":"Flow matching models have emerged as a powerful method for generative modeling on domains like images or videos, and even on irregular or unstructured data like 3D point clouds or even protein structures. These models are commonly trained in two stages: first, a data compressor is trained, and in a subsequent training stage a flow matching generative model is trained in the latent space of the data compressor. 
This two-stage paradigm sets...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4411712579","title":"CLIP-GS: CLIP-Informed Gaussian Splatting for View-Consistent 3D Indoor Semantic Understanding","url":"https://doi.org/10.1145/3746284","published":"2025-06-27","authors":["Guibiao Liao","Jiankun Li","Zhenyu Bao","Xiaoqing Ye","Qing Li","Kanglin Liu"],"abstract":"Exploiting 3D Gaussian Splatting (3DGS) with Contrastive Language-Image Pre-Training (CLIP) models for open-vocabulary 3D semantic understanding of indoor scenes has emerged as an attractive research focus. Existing methods typically attach high-dimensional CLIP semantic embeddings to 3D Gaussians and leverage view-inconsistent 2D CLIP semantics as Gaussian supervision, resulting in efficiency bottlenecks and deficient 3D semantic consistency. To address these challenges, we present CLIP-GS, efficiently achieving a coherent semantic understanding of 3D indoor scenes via the proposed Semantic Attribute Compactness (SAC) and 3D Coherent Regularization (3DCR). SAC approach exploits the naturally unified semantics within objects to learn compact, yet effective, semantic Gaussian representations, enabling highly efficient rendering (>100 FPS). 
3DCR enforces semantic consistency in 2D and 3D d...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3746284","openalex_id":"https://openalex.org/W4411712579","cited_by_count":6,"quality_score":47,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Peng Cheng Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.924039363861084},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.45942381024360657},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3980289101600647},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3811799883842468},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3297657370567322},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.32622721791267395},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"bytedance-seed:292","title":"Active Reward Modeling: Adaptive Preference Labeling for Large Language Model Alignment","url":"https://seed.bytedance.com/en/research/active-reward-modeling-adaptive-preference-labeling-for-large-language-model-alignment","published":"2025-06-26","authors":["Yunyi Shen","Hao Sun","Jean-Francois Ton"],"abstract":"Building neural reward models from human preferences is a pivotal component in reinforcement learning from human feedback (RLHF) and large language model alignment research. 
Given the scarcity and high cost of human annotation, how to select the most informative pairs to annotate is an essential yet challenging open problem. In this work, we highlight the insight that an ideal comparison dataset for reward modeling should balance exploration of the representation space and make informative comparisons between pairs with moderate reward differences. Technically, challenges arise in quantifying the two objectives and efficiently prioritizing the comparisons to be annotated. To address this, we propose the Fisher information-based selection strategies, adapt theories from the classical experimental design literature, and apply them to the final linear layer of the deep neural network-based....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computation and Language","Responsible AI","ICML 2025","language model","preference"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:aeac4d1d7c50267e","title":"Qwen VLo: From \"Understanding\" the World to \"Depicting\" It","url":"https://qwenlm.github.io/blog/qwen-vlo/","published":"2025-06-26","authors":["Alibaba/Qwen"],"abstract":"The evolution of multimodal large models is continually pushing the boundaries of what we believe technology can achieve. From the initial QwenVL to the latest Qwen2.5 VL, we have made progress in enhancing the model’s ability to understand image content. Today, we are excited to introduce a new model, Qwen VLo, a unified multimodal understanding and generation model. 
This newly upgraded model not only “understands” the world but also generates high-quality recreations based on that understanding, truly bridging the gap between perception and creation.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4411692809","title":"Remote AI Screening for Parkinson's Disease: A Multimodal, Cross-Setting Validation Study","url":"https://doi.org/10.21203/rs.3.rs-6844936/v1","published":"2025-06-26","authors":["Md. Saiful Islam","Tariq Adnan","Abdelrahman Abdelkader","Zipei Liu","Evelyn Ma","Sooyong Park","Asif Azad","Pai Liu","Meghan Pawlik","Emily Hartman","Erin Shelton","Kristina Larson"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-6844936/v1","openalex_id":"https://openalex.org/W4411692809","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Bangladesh University of Engineering and Technology","Google (United States)","Harvard University","University of Rochester","University of Rochester Medical Center"],"concepts":[{"id":"https://openalex.org/C170130773","display_name":"Usability","score":0.6384448409080505},{"id":"https://openalex.org/C2780084366","display_name":"Demographics","score":0.5984535813331604},{"id":"https://openalex.org/C58471807","display_name":"Receiver operating 
characteristic","score":0.5410181283950806},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.46185103058815},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.430032342672348},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.42193466424942017},{"id":"https://openalex.org/C2779134260","display_name":"Disease","score":0.41414108872413635},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41098713874816895}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dynamic-prompt-middleware-contextual-prompt-refinement-controls-for-comprehension-tasks","title":"Dynamic Prompt Middleware: Contextual Prompt Refinement Controls for Comprehension Tasks","url":"https://www.microsoft.com/en-us/research/publication/dynamic-prompt-middleware-contextual-prompt-refinement-controls-for-comprehension-tasks/","published":"2025-06-25","authors":["Ian Drosos","Jack Williams","Advait Sarkar","Nicholas Wilson","Sean Rintel","Payod Panda"],"abstract":"Effective prompting of generative AI is challenging for many users, particularly in expressing context for comprehension tasks such as explaining spreadsheet formulas, Python code, and text passages. Prompt middleware aims to address this barrier by assisting in prompt construction, but barriers remain for users in expressing adequate control so that they can receive AI-responses that match their preferences. 
We conduct a formative survey (n=38) investigating user needs for control over AI-generated explanations in comprehension tasks, which uncovers a trade-off between standardized but predictable support for prompting, and adaptive but unpredictable support tailored to the user and task. To explore this trade-off, we implement two prompt middleware approaches: Dynamic Prompt Refinement Control (Dyn...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Social sciences","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411631978","title":"Reproducibility Companion Paper: u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model","url":"https://doi.org/10.1145/3731715.3734537","published":"2025-06-25","authors":["Jinjin Xu","Xilu Wang","Luyun Xu","Yuzhe Yang","Xiang Li","Fanyi Wang","Yanchun Xie","Yijie Huang","Yaqian Li","Yunfan Hu"],"abstract":"","companies":["Amazon","Alibaba/Qwen","Baidu"],"matched_orgs":["Amazon","Alibaba/Qwen","Baidu"],"company_groups":["company_us","company_china"],"company_regions":["US","China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3731715.3734537","openalex_id":"https://openalex.org/W4411631978","cited_by_count":0,"quality_score":65,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Amazon (United States)","Baidu (China)","Seattle University","University of 
Surrey"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7334421277046204},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6500892639160156},{"id":"https://openalex.org/C9893847","display_name":"Reproducibility","score":0.6174589991569519},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3758854568004608},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3646264374256134},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1192486584186554},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.08969125151634216},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.08285123109817505}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411635528","title":"AgentStory: A Multi-Agent System for Story Visualization with Multi-Subject Consistent Text-to-Image Generation","url":"https://doi.org/10.1145/3731715.3733271","published":"2025-06-25","authors":["Tianchen Zhou","Zhongjie Duan","Cen Chen","Wenmeng Zhou","Yanhao Wang","Yaliang Li"],"abstract":"Story visualization aims to create visual content, such as images and videos, that is consistent, coherent, and complete with a given story. Despite significant advances in the application of diffusion models for general text-to-image generation tasks, they still encounter difficulties when directly used to produce consistent visual content that accurately aligns with the narrative text. In this paper, we propose a novel training-free automated story visualization framework called AgentStory that can generate image illustrations based on a story synopsis provided by users. 
Specifically, the framework employs multiple agents empowered by Large Language Models (LLMs) to create detailed descriptions of each subject and scene in the entire story. Then, it integrates a masking mechanism with a fine-grained consistency refinement adapter to incorporate different subjects in a scene. Furthermor...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3731715.3733271","openalex_id":"https://openalex.org/W4411635528","cited_by_count":1,"quality_score":46,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Bellevue Hospital Center","East China Normal University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7709331512451172},{"id":"https://openalex.org/C2777855551","display_name":"Subject (documents)","score":0.740837037563324},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.6810727119445801},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4689566493034363},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.43582460284233093},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39156249165534973},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34869688749313354},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.320928156375885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411665976","title":"Research on an adaptive multimodal cognitive load classification 
method","url":"https://doi.org/10.1080/27706710.2025.2519022","published":"2025-06-25","authors":["Xiulan Yu","Ruizhen Wang"],"abstract":"Purpose: Functional near-infrared spectroscopy (fNIRS) has emerged as a prominent research tool in the fields of brain function analysis, and cognitive load classification due to its ability to objectively reflect cognitive states. However, the variability of fNIRS signals among subjects and the scarcity of data present significant challenges. Methods: To address these issues, this study proposes a cognitive load classification algorithm based on Multimodal Domain Adaptation (MMDA), which comprises an Intra-Modal Domain Adaptation (IMDA) module and a Cross-Modal Domain Adaptation (CMDA) module. The IMDA module aims to mitigate the inter-subject variability of fNIRS signals by extracting shared features among subjects, while the CMDA module leverages electroencephalography (EEG) to align the spatiotemporal information of fNIRS, thereby capturing intrinsic relationships between physiologic...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1080/27706710.2025.2519022","openalex_id":"https://openalex.org/W4411665976","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Tencent (China)","Xi'an Medical University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5070314407348633},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.4387107491493225},{"id":"https://openalex.org/C61641136","display_name":"Cognitive load","score":0.4353960156440735},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.39462465047836304},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.24109920859336853},{"id":"https://openalex.org/C169760540","display_name":"Neuroscience","score":0.0668419897556305}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ownership-not-just-happy-talk-co-designing-a-participatory-large-language-model-for-journalism","title":"Ownership, Not Just Happy Talk: Co-Designing a Participatory Large Language Model for Journalism","url":"https://www.microsoft.com/en-us/research/publication/ownership-not-just-happy-talk-co-designing-a-participatory-large-language-model-for-journalism/","published":"2025-06-23","authors":["Emily Tseng","Meg Young","Marianne Aubin Le Quéré","Aimee Rinehart","Harini Suresh"],"abstract":"Journalism has emerged as an essential domain for understanding the uses, limitations, and impacts of large language models (LLMs) in the workplace. News organizations face divergent financial incentives: LLMs already permeate newswork processes within financially constrained organizations, even as ongoing legal challenges assert that AI companies violate their copyright. At stake are key questions about what LLMs are created to do, and by whom: How might a journalist-led LLM work, and what can participatory design illuminate about the present-day challenges about adapting “one-size-fits-all” foundation models to a given context of use? In this paper, we undertake a co-design exploration to understand how a participatory approach to LLMs might address opportunities and challenges around AI in journalism. 
Our 20 interviews with reporters, data journalists, editors, labor organizers, produ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Social sciences","Human–computer interaction","1970-01-01","LLM","language model","journalism","news"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reduction-fusion-for-optimized-distributed-data-parallel-computations-via-inverse-recomputation","title":"Reduction Fusion for Optimized Distributed Data-Parallel Computations via Inverse Recomputation","url":"https://www.microsoft.com/en-us/research/publication/reduction-fusion-for-optimized-distributed-data-parallel-computations-via-inverse-recomputation/","published":"2025-06-23","authors":["Haoxiang Lin","Yang Wang","Yanjie Gao","Hongyu Zhang","Ming Wu","Mao Yang"],"abstract":"Distributed data-parallel computations are critical for both traditional big data applications and emerging large language model tasks. The efficiency of these computations largely depends on reducer performance, particularly in handling extensive data access. This paper introduces a novel reduction fusion algorithm that optimizes distributed data-parallel programs by fusing dependent reducers and mappers into a single, unified reducer. Employing inverse recomputation, the algorithm preserves partial aggregation and reduces storage, network I/O, memory, and cache overheads. 
Our preliminary evaluation reveals performance improvements of up to 2.47×, demonstrating the practicality and effectiveness of this approach, while also highlighting its potential to address challenges posed by extensive data access in modern distributed computing environments. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Systems and networking","1970-01-01","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/an-empirical-study-of-issues-in-large-language-model-training-systems","title":"An Empirical Study of Issues in Large Language Model Training Systems","url":"https://www.microsoft.com/en-us/research/publication/an-empirical-study-of-issues-in-large-language-model-training-systems/","published":"2025-06-23","authors":["Yanjie Gao","Ruiming Lu","Haoxiang Lin","Yueguo Chen"],"abstract":"Large language models (LLMs) have gained significant traction in recent years, driving advancements in various applications. The training and evaluation of these models depend heavily on specialized LLM training systems, which are deployed across numerous GPUs, partition LLMs, and process large datasets. 
However, issues in LLM training systems can lead to program crashes or unexpected behavior, reducing development productivity and wasting valuable resources such as GPUs and storage.This paper presents the first comprehensive empirical study of issues in LLM training systems. We conducted a manual analysis of 300 high-quality issue reports and corresponding fix commits from the GitHub repositories of three prominent LLM training systems: Microsoft DeepSpeed, NVIDIA Megatron-LM, and Hugging Face Transformers. Our analysis identified common symptoms, root causes, typical fixes, and debuggi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Systems and networking","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dl%c2%b2-detecting-communication-deadlocks-in-deep-learning-jobs","title":"dl²: Detecting Communication Deadlocks in Deep Learning Jobs","url":"https://www.microsoft.com/en-us/research/publication/dl%c2%b2-detecting-communication-deadlocks-in-deep-learning-jobs/","published":"2025-06-23","authors":["Yanjie Gao","Jiyu Luo","Haoxiang Lin","Hongyu Zhang","Ming Wu","Mao Yang"],"abstract":"In recent years, deep learning has seen widespread adoption across various domains, giving rise to large-scale models such as large language models. 
Training these models, particularly in distributed environments, presents substantial computational and communication challenges. A critical issue is the communication deadlock—a state in which processes become indefinitely stalled while awaiting network messages from others, which leads to resource wastage and reduced productivity. Current approaches to deadlock handling are either unsuitable for deep learning due to its unique hybrid programming paradigm or limit optimization opportunities. This paper presents dl², a novel dynamic analysis tool designed to detect communication deadlocks in deep learning jobs. dl² models the runtime trace of a job as an execution graph, detects unmatched communications, and constructs a wait-for graph to...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Systems and networking","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reprocopilot-llm-driven-failure-reproduction-with-dynamic-refinement","title":"ReproCopilot: LLM-Driven Failure Reproduction with Dynamic Refinement","url":"https://www.microsoft.com/en-us/research/publication/reprocopilot-llm-driven-failure-reproduction-with-dynamic-refinement/","published":"2025-06-23","authors":["Tanakorn Leesatapornwongsa","Fazle Faisal","Suman Nath"],"abstract":"Failure reproduction is a crucial step for debugging software systems, but it is often
challenging and time-consuming, especially when the failures depend on complex inputs, states, or environments. In this paper, we present ReproCopilot, a tool that leverages program analysis and a large language model (LLM) to generate failure reproduction code and inputs. ReproCopilot proposes two novel techniques: state-oriented code generation and dynamic refinement that iteratively guide the LLM with program analysis feedback until the generated code can successfully reproduce the failure. We evaluate ReproCopilot on 37 real-world cases from 15 open-source projects, and show that it can reproduce 78% of them, significantly outperforming state-of-the-art solutions. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Systems and networking","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/an-ab-initio-foundation-model-of-wavefunctions-that-accurately-describes-chemical-bond-breaking","title":"An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking","url":"https://www.microsoft.com/en-us/research/publication/an-ab-initio-foundation-model-of-wavefunctions-that-accurately-describes-chemical-bond-breaking/","published":"2025-06-23","authors":["Adam Foster","Zeno Schätzle","P. B. 
Szabó","Lixue Cheng","Jonas Köhler","Gino Cassella","Nicholas Gao","Jiawei Li","Frank Noé","Jan Hermann"],"abstract":"Reliable description of bond breaking remains a major challenge for quantum chemistry due to the multireferential character of the electronic structure in dissociating species. Multireferential methods in particular suffer from large computational cost, which under the normal paradigm has to be paid anew for each system at a full price, ignoring commonalities in electronic structure across molecules. Quantum Monte Carlo with deep neural networks (deep QMC) uniquely offers to exploit such commonalities by pretraining transferable wavefunction models, but all such attempts were so far limited in scope. Here, we bring this new paradigm to fruition with Orbformer, a novel transferable wavefunction model pretrained on 22,000 equilibrium and dissociating structures that can be fine-tuned on unseen molecules reaching an accuracy-cost ratio rivalling classical multireferential methods. 
On establ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","mathematics","Physics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:7f958f98641e6f24","title":"LIA: Cost-efficient LLM Inference Acceleration with Intel Advanced Matrix Extensions and CXL","url":"https://deepmind.google/research/publications/81986/","published":"2025-06-23","authors":["Google/DeepMind"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","efficient"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind publications page https://deepmind.google/research/publications/"}},{"id":"openalex:W4413267850","title":"Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration","url":"https://doi.org/10.1145/3696630.3728549","published":"2025-06-23","authors":["Yingwei Ma","Qingping Yang","Rongyu Cao","Binhua Li","Fei Huang","Yongbin Li"],"abstract":"This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. 
Deployed in TONGYI Lingma, an IDE-based coding assistant developed by Alibaba Cloud, LingmaAgent addresses the limitations of existing LLM-based agents that primarily focus on local code information. Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy enabling agents to explore and understand entire repositories. We guide agents to summarize, analyze, and plan using repository-level knowledge, allowing them to dynamically acquire information and generate patches for real-world GitHub issues. In extensive experiments, LingmaAgent demonstrated significant improvements, achiev...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696630.3728549","openalex_id":"https://openalex.org/W4413267850","cited_by_count":3,"quality_score":48,"matched_keywords":["LLM","agent"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6474345326423645},{"id":"https://openalex.org/C138268822","display_name":"Resolution (logic)","score":0.5461261868476868},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3814965486526489},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.16093426942825317}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413267905","title":"L4: Diagnosing Large-scale LLM Training Failures via Automated Log Analysis","url":"https://doi.org/10.1145/3696630.3728531","published":"2025-06-23","authors":["Zhihan Jiang","Junjie Huang","Guangba Yu","Zhuangbin Chen","Yichen Li","Renyi 
Zhong","C. Q. Feng","Yongqiang Yang","Zengyin Yang","Michael R. Lyu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696630.3728531","openalex_id":"https://openalex.org/W4413267905","cited_by_count":6,"quality_score":47,"matched_keywords":["LLM"],"author_affiliations":["Chinese University of Hong Kong","Cloud Computing Center","Huawei Technologies (China)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7073162794113159},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5776340365409851},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4960575997829437},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3909373879432678},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3458738923072815},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4413267893","title":"Natural Language Outlines for Code: Literate Programming in the LLM Era","url":"https://doi.org/10.1145/3696630.3728541","published":"2025-06-23","authors":["Kensen Shi","Deniz Altınbüken","Saswat Anand","Mihai Christodorescu","Katja Grünwedel","Alexa Koenings","S Sanyasi Naidu","Anurag Pathak","Marc Rasi","Fredde Ribeiro","Brandon Ruffin","Siddhant Sanyam"],"abstract":"We propose using natural language outlines as a novel modality and interaction 
surface for providing AI assistance to developers throughout the software development process. An NL outline for a code function comprises multiple statements written in concise prose, which partition the code and summarize its main ideas in the style of literate programming. Crucially, we find that modern LLMs can generate accurate and high-quality NL outlines in practice. Moreover, NL outlines enable a bidirectional sync between code and NL, where a developer can change either code or NL and have the LLM automatically update the other. We discuss many use cases for NL outlines: they can accelerate understanding and navigation of code and diffs, simplify code maintenance, augment code search, steer code generation, and more. We then propose and compare multiple LLM prompting techniques for generating outlines...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696630.3728541","openalex_id":"https://openalex.org/W4413267893","cited_by_count":4,"quality_score":45,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7241261601448059},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.7022961974143982},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5259262323379517},{"id":"https://openalex.org/C67463725","display_name":"Natural language programming","score":0.46131157875061035},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4602282643318176},{"id":"https://openalex.org/C570499","display_name":"First-generation programming language","score":0.4202738106250763},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.295708566904068},{"id":"https://openalex.org/C34165917","display_name":"Programming paradigm","score":0.2531311810016632}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4411541877","title":"Detecting Prefix Bias in LLM-based Reward Models","url":"https://doi.org/10.1145/3715275.3732204","published":"2025-06-23","authors":["Ashwin Kumar","Yuzi He","Aram Markosyan","Bobbie Chern","Imanol Arrieta-Ibarra"],"abstract":"Reinforcement Learning with Human Feedback (RLHF) has emerged as a key paradigm for task-specific fine-tuning of language models using human preference data.While numerous publicly available preference datasets provide pairwise comparisons of responses, the potential for biases in the resulting reward models remains underexplored.In this work, we introduce novel methods to detect and evaluate prefix bias-a systematic shift in model preferences triggered by minor variations in query prefixes-in LLM-based reward models trained on such datasets.We leverage these metrics to reveal significant biases in preference models across racial and gender dimensions.Our comprehensive evaluation spans diverse open-source preference datasets and reward model architectures, demonstrating susceptibility to this kind of bias regardless of the underlying model architecture.Furthermore, we propose a data augm...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715275.3732204","openalex_id":"https://openalex.org/W4411541877","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","preference"],"author_affiliations":["College of San Mateo","Meta (United States)","Washington University in St. 
Louis"],"concepts":[{"id":"https://openalex.org/C141603448","display_name":"Prefix","score":0.7611581087112427},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7018080949783325},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38751035928726196},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.0},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413267884","title":"A Multimodal Intelligent Change Assessment Framework for Microservice Systems Based on Large Language Models","url":"https://doi.org/10.1145/3696630.3728561","published":"2025-06-23","authors":["Yongqian Sun","Tinghua Zheng","Xidao Wen","Weihua Kuang","Heng Liu","Shenglin Zhang","C. P. Shen","Bo Wu","Dan Pei"],"abstract":"Frequent changes in large-scale online service systems often lead to failures, threatening system reliability. To overcome the limitations of existing techniques in erroneous change detection, failure triage, and root cause change analysis, this paper presents a multimodal intelligent change assessment framework based on large language models. Our framework integrates retrieval-augmented generation techniques and leverages unified representation of multimodal data, enhanced knowledge access, and domain-specific LLMs to automate the entire change management lifecycle. Experiments on two microservice system datasets show that our method outperforms state-of-the-arts in accuracy, efficiency, and minimizing manual intervention. 
Furthermore, SCELM has been operational for over 11 months in real world, reducing response and resolution times for erroneous changes by 90%, significantly improving...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696630.3728561","openalex_id":"https://openalex.org/W4413267884","cited_by_count":2,"quality_score":43,"matched_keywords":["retrieval"],"author_affiliations":["China Energy Engineering Corporation (China)","Nankai University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7750860452651978},{"id":"https://openalex.org/C2778505942","display_name":"Microservices","score":0.47598567605018616},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.41147804260253906},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.376248836517334},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3687213063240051},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.08831673860549927},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.07240185141563416}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412703922","title":"Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware","url":"https://doi.org/10.1145/3696630.3728569","published":"2025-06-23","authors":["Yujia Chen","Mingyu Chen","Cuiyun Gao","Zhihan Jiang","Zhongqi Li","Yuchi Ma"],"abstract":"Application Programming Interfaces (APIs) are crucial in modern software development. 
Large Language Models (LLMs) assist in automated code generation but often struggle with API hallucination, including invoking non-existent APIs and misusing existing ones in practical development scenarios. Existing studies resort to Retrieval-Augmented Generation (RAG) methods for mitigating the hallucination issue, but tend to fail since they generally ignore the structural dependencies in practical projects and do not indeed validate whether the generated APIs are available or not. To address these limitations, we propose MARIN, a framework for mitigating API hallucination in code generated by LLMs with hierarchical dependency aware. MARIN consists of two phases: Hierarchical Dependency Mining, which analyzes local and global dependencies of the current function, aiming to supplement comprehensive p...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696630.3728569","openalex_id":"https://openalex.org/W4412703922","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Harbin Institute of Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C19768560","display_name":"Dependency (UML)","score":0.732113242149353},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6270941495895386},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5863932967185974},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.27677780389785767},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.21698811650276184},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412130287","title":"LLMPrism: Black-box Performance Diagnosis for Production LLM Training Platforms","url":"https://doi.org/10.1109/dsn-s65789.2025.00034","published":"2025-06-23","authors":["Zhihan Jiang","Rui Ren","Guangba Yu","Yifeng Wu","Wenwei Gu","Yichen Li","Yu‐Jie Huang","Cong Feng","Zengyin Yang","Yongqiang Yang","Michael R. Lyu"],"abstract":"Large Language Models (LLMs) have brought about revolutionary changes in diverse fields, rendering LLM training of utmost importance for modern enterprises. To meet this demand, multi-tenant large-scale LLM training platforms have been built to offer LLM training services. Nevertheless, due to the complexity and synchronous nature of LLM training process, performance issues occur frequently and can result in substantial resource wastage. The limited visibility from the perspective of platform providers impedes existing profiling methods and poses challenges to the monitoring and diagnosis of the performance of LLM training jobs. For the first time, this paper proposes the utilization of underlying network flow data to reconstruct the training timelines of jobs based on the distinct characteristics in the LLM training procedure. 
We design LLMPrism, the first black-box performance diagnosi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/dsn-s65789.2025.00034","openalex_id":"https://openalex.org/W4412130287","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C94966114","display_name":"Black box","score":0.8373215198516846},{"id":"https://openalex.org/C2778348673","display_name":"Production (economics)","score":0.6238733530044556},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.618537425994873},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6083745956420898},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2397468388080597},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.06167340278625488},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.05900135636329651},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413268020","title":"LLM-Augmented Ticket Aggregation for Low-cost Mobile OS Defect Resolution","url":"https://doi.org/10.1145/3696630.3728547","published":"2025-06-23","authors":["Yongqian Sun","Bowen Hao","Xiaotian Wang","Chenyu Zhao","Y. 
Zhao","Binpeng Shi","Shenglin Zhang","Qiao Ge","Wenhu Li","Hua Wei","Dan Pei"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696630.3728547","openalex_id":"https://openalex.org/W4413268020","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Nankai University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2776540713","display_name":"Ticket","score":0.8844335079193115},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6372323036193848},{"id":"https://openalex.org/C153715457","display_name":"Augmented reality","score":0.4770457148551941},{"id":"https://openalex.org/C138268822","display_name":"Resolution (logic)","score":0.4686875641345978},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.18505194783210754},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.1739182472229004},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.07874429225921631}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412703862","title":"AmocRCA: At Most One Change Segmentation and Relative Correlation Ranking for Root Cause Analysis","url":"https://doi.org/10.1145/3696630.3731612","published":"2025-06-23","authors":["Anton Altenbernd","WU Zhi-yuan","Odej Kao"],"abstract":"In this paper, we present AmocRCA, a highly effective and efficient approach to Root Cause Analysis (RCA) using metric data only. A fast and reliable localization of root causes is decisive for self-stabilization of large systems in production and thus for continuous operation. 
Unlike many multi-modal approaches, we omit the necessity to create and maintain topology and interaction graphs as well as to collect and interpret semantically rich data such as logs and traces for the sake of quick localization/reaction and low computational overhead while achieving comparable results in terms of RCA precision. AmocRCA is based on a recent and promising approach for RCA named BARO. It leverages At Most One Change (AMOC) segmentation to make the scoring mechanism independent of anomaly detection, and employs a relative correlation ranking that enhances the scoring mechanism while reducing the ne...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696630.3731612","openalex_id":"https://openalex.org/W4412703862","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Technische Universität Berlin"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.7391989231109619},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5955809950828552},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4974346458911896},{"id":"https://openalex.org/C130963320","display_name":"Root cause analysis","score":0.486102819442749},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48292800784111023},{"id":"https://openalex.org/C117220453","display_name":"Correlation","score":0.4819464683532715},{"id":"https://openalex.org/C203595873","display_name":"Change detection","score":0.47560983896255493},{"id":"https://openalex.org/C124504099","display_name":"Image 
segmentation","score":0.4397121071815491}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2506.00202","title":"What do professional software developers need to know to succeed in an age of Artificial Intelligence?","url":"http://arxiv.org/abs/2506.00202","published":"2025-06-23","authors":["Matthew Kam","Cody Miller","Miaoxin Wang","Abey Tidwell","Irene Lee","Joyce Malyn‐Smith","Beatriz Perret","Vikram Tiwari","Joshua Kenitzer","Andrew Macvean","Erin Barrar"],"abstract":"Generative AI is showing early evidence of productivity gains for software developers, but concerns persist regarding workforce disruption and deskilling. We describe our research with 21 developers at the cutting edge of using AI, summarizing 12 of their work goals we uncovered, together with 75 associated tasks and the skills & knowledge for each, illustrating how developers use AI at work. From all of these, we distilled our findings in the form of 5 insights. We found that the skills & knowledge to be a successful AI-enhanced developer are organized into four domains (using Generative AI effectively, core software engineering, adjacent engineering, and adjacent non-engineering) deployed at critical junctures throughout a 6-step task workflow. 
In order to \"future proof\" developers for this age of AI, on-the-job learning initiatives and computer science degree programs will need to tar...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3696630.3727251","openalex_id":"https://openalex.org/W4413268219","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Assemblée Nationale de France","Boston College","Education Development Center","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6599031686782837},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.5184697508811951},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4332680404186249},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.38783276081085205},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32073965668678284},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.09007158875465393}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/first-workshop-connecting-academia-and-industry-on-modern-integrated-database-and-ai-systems-midas","title":"First Workshop Connecting Academia and Industry on Modern Integrated Database and AI Systems (MIDAS)","url":"https://www.microsoft.com/en-us/research/publication/first-workshop-connecting-academia-and-industry-on-modern-integrated-database-and-ai-systems-midas/","published":"2025-06-22","authors":["Avrilia Floratou","Subru Krishnan","Jignesh M. 
Patel"],"abstract":"The MIDAS workshop is designed to foster meaningful collaboration between researchers and industry practitioners by identifying and addressing complex challenges in the field of Generative AI (GenAI) and Data. These challenges often require a longer-term research perspective while simultaneously needing to remain grounded in real-world constraints and operational scenarios.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Big data","Computer science","Database"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:moxrcgsy7jq2untxym0gxcr4","title":"AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking","url":"https://machinelearning.apple.com/research/abstral","published":"2025-06-22","authors":["Silin Gao","Antoine Bosselut","Samy Bengio","Emmanuel Abbe"],"abstract":"Recent studies have shown that large language models (LLMs), especially smaller ones, often lack robustness in their reasoning. I.e., they tend to experience performance drops when faced with distribution shifts, such as changes to numerical or nominal variables, or insertions of distracting clauses. A possible strategy to address this involves generating synthetic data to further \"instantiate\" reasoning problems on potential variations. 
In...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4411523014","title":"KEENHash: Hashing Programs into Function-Aware Embeddings for Large-Scale Binary Code Similarity Analysis","url":"https://doi.org/10.1145/3728911","published":"2025-06-22","authors":["Zhijie Liu","Qiyi Tang","Sen Nie","Shi Wu","Liang Feng Zhang","Yutian Tang"],"abstract":"Binary code similarity analysis (BCSA) is a crucial research area in many fields such as cybersecurity. Specifically, function-level diffing tools are the most widely used in BCSA: they perform function matching one by one for evaluating the similarity between binary programs. However, such methods need a high time complexity, making them unscalable in large-scale scenarios (e.g., 1/n-to-n search). Towards effective and efficient program-level BCSA, we propose KEENHash, a novel hashing approach that hashes binaries into program-level representations through large language model (LLM)-generated function embeddings. KEENHash condenses a binary into one compact and fixed-length program embedding using K-Means and Feature Hashing, allowing us to do effective and efficient large-scale program-level BCSA, surpassing the previous state-of-the-art methods. 
The experimental results show that KEEN...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3728911","openalex_id":"https://openalex.org/W4411523014","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["ShanghaiTech University","Tencent (China)","University of Glasgow"],"concepts":[{"id":"https://openalex.org/C99138194","display_name":"Hash function","score":0.7130539417266846},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6784989833831787},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.6718894839286804},{"id":"https://openalex.org/C48372109","display_name":"Binary number","score":0.579780101776123},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.5546537637710571},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5365560054779053},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5223656296730042},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5177720785140991}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411522891","title":"S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models","url":"https://doi.org/10.1145/3728971","published":"2025-06-22","authors":["Xiaohan Yuan","Jinfeng Li","Dongxia Wang","Yuefeng Chen","Xiaofeng Mao","Longtao Huang","Jialuo Chen","Hui Xue","Xiaoxia Liu","Wenhai Wang","Kui Ren","Jingyi Wang"],"abstract":"Generative large language models (LLMs) have revolutionized natural language processing with their 
transformative and emergent capabilities. However, recent evidence indicates that LLMs can produce harmful content that violates social norms, raising significant concerns regarding the safety and ethical ramifications of deploying these advanced models. Thus, it is both critical and imperative to perform a rigorous and comprehensive safety evaluation of LLMs before deployment. Despite this need, owing to the extensiveness of LLM generation space, it still lacks a unified and standardized risk taxonomy to systematically reflect the LLM content safety, as well as automated safety assessment techniques to explore the potential risks efficiently. To bridge the striking gap, we propose S-Eval, a novel LLM-based automated Safety Evaluation framework with a newly defined comprehensive risk taxono...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3728971","openalex_id":"https://openalex.org/W4411522891","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.5902789831161499},{"id":"https://openalex.org/C32896092","display_name":"Risk management","score":0.4761076867580414},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47198015451431274},{"id":"https://openalex.org/C12174686","display_name":"Risk assessment","score":0.46338948607444763},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4515109658241272},{"id":"https://openalex.org/C105002631","display_name":"Subject-matter expert","score":0.4184054136276245},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python 
library)","score":0.41034409403800964},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.37633568048477173}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4411523101","title":"PatchScope: LLM-Enhanced Fine-Grained Stable Patch Classification for Linux Kernel","url":"https://doi.org/10.1145/3728944","published":"2025-06-22","authors":["R. Liu","Heyuan Shi","Shuning Liu","Chao Hu","S. Li","Yuheng Shen","Runzhe Wang","Xiaohai Shi","Yu Jiang"],"abstract":"Stable patch classification plays a crucial role in vulnerability management for the Linux kernel, significantly contributing to the stability and security of Long-term support(LTS) versions. Although existing tools have effectively assisted in assessing whether patches should be merged into stable versions, they cannot determine which stable patches should be merged into which LTS versions. 
This process still requires the maintainers of the distribution community to manually screen based on the requirements of their respective versions.To address this issue, we propose PatchScope, which is designed to predict the specific merge status of patches.Patchscope consists of two components: patch analysis and patch classification.Patch analysis leverages Large Language Models(LLMs) to generate detailed patch descriptions from the commit message and code changes, thereby deepening the model's s...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3728944","openalex_id":"https://openalex.org/W4411523101","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","long-term"],"author_affiliations":["Alibaba Group (China)","Central South University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8410993814468384},{"id":"https://openalex.org/C197129107","display_name":"Merge (version control)","score":0.7080848217010498},{"id":"https://openalex.org/C95623464","display_name":"Classifier (UML)","score":0.5610429048538208},{"id":"https://openalex.org/C553261973","display_name":"Linux kernel","score":0.5402912497520447},{"id":"https://openalex.org/C153180980","display_name":"Commit","score":0.4803796410560608},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4791128635406494},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41696402430534363},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.18285492062568665}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4414197649","title":"SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training","url":"https://doi.org/10.1109/dac63849.2025.11132754","published":"2025-06-22","authors":["Kun Wu","Jeongmin Park","Xiaofan Zhang","Mert Hidayetoğlu","Vikram Sharma Mailthody","Sitao Huang","Steve Lumetta","Wen‐mei Hwu"],"abstract":"The growth rate of the GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations-the intermediate tensors produced during forward propagation and reused in backward propagation-dominate the GPU memory use. This leads to high training overheads such as expensive weight update costs due to the small micro-batch size. To address this challenge, we propose SSDTrain, an adaptive activation offloading framework to high-capacity NVMe SSDs. SSDTrain reduces GPU memory usage without impacting performance by fully overlapping data transfers with computation. SSDTrain is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication and forwarding to further enhance efficiency. 
We extensively experimented with....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/dac63849.2025.11132754","openalex_id":"https://openalex.org/W4414197649","cited_by_count":2,"quality_score":47,"matched_keywords":["language model","memory"],"author_affiliations":["Google (United States)","Stanford University","University of California, Irvine","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8429999947547913},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.7458999752998352},{"id":"https://openalex.org/C32587265","display_name":"Data deduplication","score":0.6815999746322632},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.5942999720573425},{"id":"https://openalex.org/C157764524","display_name":"Throughput","score":0.5885999798774719},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.38940000534057617},{"id":"https://openalex.org/C149635348","display_name":"Embedded system","score":0.3521000146865845},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.35190001130104065}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414199045","title":"SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity","url":"https://doi.org/10.1109/dac63849.2025.11132632","published":"2025-06-22","authors":["Zichen Fan","Steve Dai","Rangharajan Venkatesan","Dennis Sylvester","Brucek Khailany"],"abstract":"Diffusion models have gained significant popularity in image generation tasks. 
However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations, while simultaneously promoting significant activation sparsity. We further observe that the stated sparsity pattern varies among different channels and evolves across time steps. To support this quantization and sparsity scheme, we present a novel diffusion model accelerator featuring a heterogeneous mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector for efficient handling of the sparsity pattern. Our 4-bit quantization technique demonstrates superior generation quality compared to existing $\\mathbf{4}$-bit methods. Our custom acce...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/dac63849.2025.11132632","openalex_id":"https://openalex.org/W4414199045","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","quantization"],"author_affiliations":["Nvidia (United States)","University of Michigan"],"concepts":[{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.7526999711990356},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6342999935150146},{"id":"https://openalex.org/C94915269","display_name":"Detector","score":0.5831999778747559},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.5717999935150146},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5478000044822693},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.48330000042915344},{"id":"https://openalex.org/C186370098","display_name":"Energy (signal 
processing)","score":0.44020000100135803},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40299999713897705}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413018504","title":"RealDriveSim: A Realistic Multi-Modal Multi-Task Synthetic Dataset for Autonomous Driving","url":"https://doi.org/10.1109/iv64158.2025.11097728","published":"2025-06-22","authors":["Arpit Jadon","Haoran Wang","Phillip L. Thomas","Michael W. Stanley","S. Nathaniel Cibik","Rachel Laurat","Omar Maher","Lukas Hoyer","Ozan Unal","Dengxin Dai"],"abstract":"As perception models continue to develop, the need for large-scale datasets increases. However, data annotation remains far too expensive to effectively scale and meet the demand. Synthetic datasets provide a solution to boost model performance with substantially reduced costs. However, current synthetic datasets remain limited in their scope, realism, and are designed for specific tasks and applications. In this work, we present RealDriveSim, a realistic multi-modal synthetic dataset for autonomous driving that not only supports popular 2D computer vision applications but also their LiDAR counterparts, providing fine-grained annotations for up to 64 classes. We extensively evaluate our dataset for a wide range of applications and domains, demonstrating state-of-the-art results compared to existing synthetic benchmarks. 
The dataset is publicly available at https//zrealdrivesim.github.io/...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iv64158.2025.11097728","openalex_id":"https://openalex.org/W4413018504","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)","ETH Zurich","Huawei Technologies (China)","Max Planck Institute for Informatics","Parallel Consulting (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7094895243644714},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7012842893600464},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6605682969093323},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4234028458595276},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.15862390398979187},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.12826842069625854},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411523039","title":"REACCEPT: Automated Co-evolution of Production and Test Code Based on Dynamic Validation and Large Language Models","url":"https://doi.org/10.1145/3728930","published":"2025-06-22","authors":["Jianlei Chi","Xiaotian Wang","Yuhan Huang","Lechen Yu","Di Cui","Sun Jian-guo","Jun Sun"],"abstract":"Synchronizing production and test code, known as PT co-evolution, is critical for 
software quality. Given the significant manual effort involved, researchers have tried automating PT co-evolution using predefined heuristics and machine learning models. However, existing solutions are still incomplete. Most approaches only detect and flag obsolete test cases, leaving developers to manually update them. Meanwhile, existing solutions may suffer from low accuracy, especially when applied to real-world software projects. In this paper, we propose ReAccept, a novel approach leveraging large language models (LLMs), retrievalaugmented generation (RAG), and dynamic validation to fully automate PT co-evolution with high accuracy. ReAccept employs an experience-guided approach to generate prompt templates for the identification and subsequent update processes. After updating a test case, ReAccept p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3728930","openalex_id":"https://openalex.org/W4411523039","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Harbin Engineering University","Microsoft (United States)","Seattle University","Singapore Management University","Xidian University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8184522390365601},{"id":"https://openalex.org/C128942645","display_name":"Test case","score":0.5456370711326599},{"id":"https://openalex.org/C127705205","display_name":"Heuristics","score":0.49610552191734314},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.48965078592300415},{"id":"https://openalex.org/C53942775","display_name":"Code 
coverage","score":0.47310400009155273},{"id":"https://openalex.org/C548217200","display_name":"Java","score":0.4728917181491852},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.45005297660827637},{"id":"https://openalex.org/C117447612","display_name":"Software quality","score":0.43263334035873413}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411523013","title":"OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution","url":"https://doi.org/10.1145/3728871","published":"2025-06-22","authors":["Lianghong Guo","Wei Tao","R H Jiang","Yanlin Wang","Jiachi Chen","Xilin Liu","Yuchi Ma","Mingzhi Mao","Hongyu Zhang","Zibin Zheng"],"abstract":"The GitHub issue resolution task aims to resolve issues reported in repositories automatically. With advances in large language models (LLMs), this task has gained increasing attention, and several benchmarks are proposed to evaluate the issue resolution ability of LLMs. However, existing benchmarks have three main limitations. First, current benchmarks focus on a single programming language, limiting the evaluation of issues from repositories across different languages. Second, they usually cover a narrow range of domains, which may fail to represent the diversity of real-world issues. Third, existing benchmarks rely solely on textual information in issue descriptions, overlooking multimodal information such as images in issues. In this paper, we propose OmniGIRL, a GitHub Issue ResoLution benchmark that is multilingual, multimodal, and multi-domain. 
OmniGIRL includes 959 task instances...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3728871","openalex_id":"https://openalex.org/W4411523013","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chongqing University","Huawei Technologies (China)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7587112188339233},{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.6685196161270142},{"id":"https://openalex.org/C544833334","display_name":"JavaScript","score":0.5788146257400513},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.560697615146637},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5065354108810425},{"id":"https://openalex.org/C169087156","display_name":"Framing (construction)","score":0.4268660843372345},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38896840810775757},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3737970292568207}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/routing-mamba-scaling-state-space-models-with-mixture-of-experts-projection","title":"Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection","url":"https://www.microsoft.com/en-us/research/publication/routing-mamba-scaling-state-space-models-with-mixture-of-experts-projection/","published":"2025-06-21","authors":["Zheng Zhan","Liliang Ren","Shuohang Wang","Liyuan Liu","Yang Liu","Yeyun 
Gong","Yanzhi Wang","Yelong Shen"],"abstract":"Linear State Space Models (SSMs) offer remarkable performance gains in efficient sequence modeling, with constant inference-time computation and memory complexity. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long sequence modeling. However, efficiently scaling the expressive power of SSMs, particularly with Mixture of Experts (MoE), remains challenging, as naive integration attempts often falter or degrade performance. In this work, we introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts. By sharing routing decisions between projection layers and lightweight sub-modules within Mamba across experts, RoM leverages synergies among linear projection experts for effective and efficient spar...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Language model","1970-01-01","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/detecting-functionality-specific-vulnerabilities-via-retrieving-individual-functionality-equivalent-apis-in-open-source-repositories","title":"Detecting Functionality-Specific Vulnerabilities via Retrieving Individual Functionality-Equivalent APIs in Open-Source 
Repositories","url":"https://www.microsoft.com/en-us/research/publication/detecting-functionality-specific-vulnerabilities-via-retrieving-individual-functionality-equivalent-apis-in-open-source-repositories/","published":"2025-06-20","authors":["Tianyu Chen","Zeyu Wang","Lin Li","Ding Li","Zongyang Li","Xiaoning Chang","Pan Bian","Guangtai Liang","Qianxiang Wang","Tao Xie"],"abstract":"Functionality-specific vulnerabilities, which mainly occur in Application Programming Interfaces (APIs) with specific functionalities, are crucial for software developers to detect and avoid. When detecting individual functionality-specific vulnerabilities, the existing two categories of approaches are ineffective because they consider only the API bodies and are unable to handle diverse implementations of functionality-equivalent APIs. To effectively detect functionality-specific vulnerabilities, we propose APISS, the first approach to utilize API doc strings and signatures instead of API bodies. APISS first retrieves functionality-equivalent APIs for APIs with existing vulnerabilities and then migrates Proof-of-Concepts (PoCs) of the existing vulnerabilities for newly detected vulnerable APIs. 
To retrieve functionality-equivalent APIs, we leverage a Large Language Model for API embeddi...","companies":["Microsoft","Huawei/Noah"],"matched_orgs":["Microsoft","Huawei/Noah"],"company_groups":["company_us","company_china"],"company_regions":["US","China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.4230/lipics.ecoop.2025.6","openalex_id":"https://openalex.org/W7110392010","cited_by_count":16106,"quality_score":122,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","Computer science","1970-01-01","language model","efficient"],"author_affiliations":["Microsoft","Huawei Technologies (China)","Huawei Technologies (United States)","Peking University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:289","title":"Polybasic Speculative Decoding Through a Theoretical Perspective","url":"https://seed.bytedance.com/en/research/polybasic-speculative-decoding-through-a-theoretical-perspective","published":"2025-06-20","authors":["Ruilin Wang","Huixia Li","Yuexiao Ma","Xiawu Zheng","Fei Chao","Xuefeng Xiao","Rongrong Ji"],"abstract":"Inference latency stands as a critical bottleneck in the large-scale deployment of Large Language Models (LLMs). Speculative decoding methods have recently shown promise in accelerating inference without compromising the output distribution. However, existing work typically relies on a dualistic draft-verify framework and lacks rigorous theoretical grounding. In this paper, we introduce a novel polybasic speculative decoding framework, underpinned by a comprehensive theoretical analysis. 
Specifically, we prove a fundamental theorem that characterizes the optimal inference time for multi-model speculative decoding systems, shedding light on how to extend beyond the dualistic approach to a more general polybasic paradigm. Through our theoretical investigation of multi-model token generation, we expose and optimize the interplay between model capabilities, acceptance lengths, and overall co...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Deep Learning","Vision","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:lk00ebb8rlzxaebyn58m0egi","title":"Scaling Laws for Forgetting During Finetuning with Pretraining Data Injection","url":"https://machinelearning.apple.com/research/scaling-laws","published":"2025-06-20","authors":["Louis Béthune","David Grangier","Dan Busbridge","Eleonora Gualdoni","Marco Cuturi","Pierre Ablin"],"abstract":"A widespread strategy for obtaining a language model that performs well in a target domain is to fine-tune it by training it to do unsupervised next-token prediction on data from that domain. 
Fine-tuning presents two challenges: i) if the amount of target data is limited, as is the case in most practical applications, the model will quickly overfit, and ii) the model will drift away from the original model and forget the pre-training...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4411486143","title":"RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving","url":"https://doi.org/10.1145/3695053.3731093","published":"2025-06-20","authors":["Wenqi Jiang","Suvinay Subramanian","Catherine E. 
Graves","Gustavo Alonso","Amir Yazdanbakhsh","Vidushi Dadu"],"abstract":"Retrieval-augmented generation (RAG) is emerging as a popular approach for reliable LLM serving. However, efficient RAG serving remains an open challenge due to the rapid emergence of many RAG variants and the substantial differences in workload characteristics across them. This paper makes three fundamental contributions to advancing RAG serving. First, we introduce RAGSchema, a structured abstraction that captures the wide range of RAG algorithms, serving as a foundation for performance optimization. Second, we analyze several representative RAG workloads with distinct RAGSchema, revealing significant performance variability across these workloads. Third, to address this variability and meet diverse performance requirements, we propose RAGO (Retrieval-Augmented Generation Optimizer), a system optimization framework for efficient RAG serving. RAGO achieves up to a 2× increase in QPS per chip an...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3695053.3731093","openalex_id":"https://openalex.org/W4411486143","cited_by_count":7,"quality_score":56,"matched_keywords":["LLM","retrieval","efficient"],"author_affiliations":["ETH Zurich","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.737680196762085},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4085083305835724}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4411472182","title":"Leveraging Correctness-Based Learning for Efficient Reasoning via Automated Process
Supervision","url":"https://doi.org/10.1109/jstsp.2025.3581101","published":"2025-06-20","authors":["Yuxuan Yao","Mingyang Liu","Guanzhi Deng","Zengyan Liu","Dapeng Wu","Zhijiang Guo","Han Wu","Linqi Song"],"abstract":"Large language models (LLMs) have demonstrated exceptional performance across various tasks, yet they still face limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is to learn from human feedback or external tools. In this paper, we introduce an intrinsic self-correcting reasoning framework for LLMs that eliminates the need for human input, external tools, and handcrafted prompts. Our proposed framework, based on a multi-step reasoning paradigm called Learning from Correctness (LECO), enhances reasoning performance without relying on error-based learning. However, it is important to note that model logits are not always available. In such cases, the pre-trained process reward model (PRM) serves as an effective alternative. Consequently, we integrated LECO with PRM to develop the new LECO-R model. 
Experimental result...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jstsp.2025.3581101","openalex_id":"https://openalex.org/W4411472182","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.8243887424468994},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8060210943222046},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.6990607976913452},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5460647344589233},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4415067434310913},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.17642742395401}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411470730","title":"Applying multimodal AI to physiological waveforms improves genetic prediction of cardiovascular traits","url":"https://doi.org/10.1016/j.ajhg.2025.05.015","published":"2025-06-20","authors":["Yuchen Zhou","Justin Khasentino","Taedong Yun","Mahantesh I. Biradar","Jacqueline Baras Shreibati","Dongbing Lai","Tae‐Hwi Schwantes‐An","Robert Luben","Zachary R. McCaw","Jorgen Engmann","Rui Providência","Amand F. Schmidt"],"abstract":"Electronic health records, biobanks, and wearable biosensors enable the collection of multiple health modalities from many individuals. 
Access to multimodal health data provides a unique opportunity for genetic studies of complex traits because different modalities relevant to a single physiological system (e.g., circulatory system) encode complementary and overlapping information. We propose a multimodal deep learning method, multimodal representation learning for genetic discovery on low-dimensional embeddings (M-REGLE), for discovering genetic associations from a joint representation of complementary electrophysiological waveform modalities. M-REGLE jointly learns a lower representation (i.e., latent factors) of multimodal physiological waveforms using a convolutional variational autoencoder, performs genome-wide association studies (GWASs) on each latent factor, then combines the res...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.ajhg.2025.05.015","openalex_id":"https://openalex.org/W4411470730","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Genomics England","Google (United States)","Indiana University School of Medicine","Indiana University – Purdue University Indianapolis","MRC Epidemiology Unit","Moorfields Eye Hospital","National Institute for Health Research","Netherlands Heart Institute","Queen Mary University of London","St Bartholomew's Hospital","University College London","University Medical Center Utrecht","University of North Carolina at Chapel Hill","William Harvey Research Institute"],"concepts":[{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.6999975442886353},{"id":"https://openalex.org/C116567970","display_name":"Biobank","score":0.6952699422836304},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6751514077186584},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.6233784556388855},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5787343978881836},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5102672576904297},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.4659693241119385},{"id":"https://openalex.org/C60644358","display_name":"Bioinformatics","score":0.20003631711006165}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2506.16791","title":"TabArena: A Living Benchmark for Machine Learning on Tabular Data","url":"http://arxiv.org/abs/2506.16791","published":"2025-06-20","authors":["Nick Erickson","Lennart Purucker","Andrej Tschalzev","David Holzmüller","P. Desai","David Salinas","Frank Hutter"],"abstract":"With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are static. Their design is not updated even if flaws are discovered, model versions are updated, or new models are released. To address this, we introduce TabArena, the first continuously maintained living tabular benchmarking system. To launch TabArena, we manually curate a representative collection of datasets and well-implemented models, conduct a large-scale benchmarking study to initialize a public leaderboard, and assemble a team of experienced maintainers. Our results highlight the influence of validation method and ensembling of hyperparameter configurations to benchmark models at their full potential. 
While gradient-boosted trees are still strong contenders on practical tabular datasets, we observ...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2506.16791","openalex_id":"https://openalex.org/W4414987631","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Freiburg","University of Mannheim","École Polytechnique"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.9236999750137329},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.8531000018119812},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.7831000089645386},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7452999949455261},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7260000109672546},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.6254000067710876},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5386000275611877},{"id":"https://openalex.org/C8642999","display_name":"Hyperparameter","score":0.531000018119812}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411450140","title":"CoverUp: Effective High Coverage Test Generation for Python","url":"https://doi.org/10.1145/3729398","published":"2025-06-19","authors":["Juan Altmayer Pizzorno","Emery D. Berger"],"abstract":"Testing is an essential part of software development. Test generation tools attempt to automate the otherwise labor-intensive task of test creation, but generating high-coverage tests remains challenging. 
This paper proposes CoverUp, a novel approach to driving the generation of high-coverage Python regression tests. CoverUp combines coverage analysis, code context, and feedback in prompts that iteratively guide the LLM to generate tests that improve line and branch coverage. We evaluate our prototype CoverUp implementation across a benchmark of challenging code derived from open-source Python projects and show that CoverUp substantially improves on the state of the art. Compared to CodaMosa, a hybrid search/LLM-based test generator, CoverUp achieves a per-module median line+branch coverage of 80% (vs. 47%). Compared to MuTAP, a mutation- and LLM-based test generator, CoverUp achieves an...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3729398","openalex_id":"https://openalex.org/W4411450140","cited_by_count":12,"quality_score":53,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Amherst College"],"concepts":[{"id":"https://openalex.org/C519991488","display_name":"Python (programming language)","score":0.9152735471725464},{"id":"https://openalex.org/C53942775","display_name":"Code coverage","score":0.803485095500946},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7244886159896851},{"id":"https://openalex.org/C148027188","display_name":"Unit testing","score":0.6128331422805786},{"id":"https://openalex.org/C161821725","display_name":"Regression testing","score":0.6099619269371033},{"id":"https://openalex.org/C199519371","display_name":"Source lines of code","score":0.5890292525291443},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.498187780380249},{"id":"https://openalex.org/C43126263","display_name":"Source 
code","score":0.4815533459186554}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"official:9bc8a7ed1b9a65d3","title":"Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details","url":"https://huggingface.co/papers/2506.16504","published":"2025-06-19","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4411446206","title":"External Retrievals or Internal Priors? From RAG to Epitome-Augmented Generation by Fuzzy Selection","url":"https://doi.org/10.1109/tfuzz.2025.3581205","published":"2025-06-19","authors":["Kai He","Jiaxing Xu","Qika Lin","Wenqing Wang","Zeyu Gao","Jialun Wu","Yucheng Huang","Mengling Feng"],"abstract":"Retrieval-Augmented Generation (RAG) offers a promising solution to the limitations of static knowledge and hallucinations in Large Language Models (LLMs). While prior research has introduced numerous enhancements to RAG systems, a significant challenge remains under-explored: the potential conflict between external retrievals and LLMs' internal priors, which can undermine the quality of generated outputs. 
To tackle this issue, we present the Epitome-Augmented Generation (...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tfuzz.2025.3581205","openalex_id":"https://openalex.org/W4411446206","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","distillation"],"author_affiliations":["Nanyang Technological University","National University Health System","National University of Singapore","Northwestern Polytechnical University","Tencent (China)","University of Cambridge","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C2775858994","display_name":"Epitome","score":0.8985852003097534},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.7121002078056335},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6795287132263184},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5507295727729797},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5474286079406738},{"id":"https://openalex.org/C58166","display_name":"Fuzzy logic","score":0.5233356356620789},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition
(psychology)","score":0.3426140546798706},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.33690008521080017}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411449706","title":"Code Change Intention, Development Artifact, and History Vulnerability: Putting Them Together for Vulnerability Fix Detection by LLM","url":"https://doi.org/10.1145/3715738","published":"2025-06-19","authors":["Xu Yang","Wenhan Zhu","Michael Pacheco","Jiayuan Zhou","Shaowei Wang","Xing Hu","Kui Liu"],"abstract":"Detecting vulnerability fix commits in open-source software is crucial for maintaining software security. To help OSS identify vulnerability fix commits, several automated approaches are developed. However, existing approaches like VulFixMiner and CoLeFunDa, focus solely on code changes, neglecting essential context from development artifacts. Tools like Vulcurator, which integrates issue reports, fail to leverage semantic associations between different development artifacts (e.g., pull requests and history vulnerability fixes). Moreover, they miss vulnerability fixes in tangled commits and lack explanations, limiting practical use. Hence to address those limitations, we propose LLM4VFD, a novel framework that leverages Large Language Models (LLMs) enhanced with Chain-of-Thought reasoning and In-Context Learning to improve the accuracy of vulnerability fix detection. 
LLM4VFD comprises th...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715738","openalex_id":"https://openalex.org/W4411449706","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","language model"],"author_affiliations":["Huawei Technologies (Canada)","Huawei Technologies (China)","University of Manitoba","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.723605751991272},{"id":"https://openalex.org/C95713431","display_name":"Vulnerability (computing)","score":0.6792377233505249},{"id":"https://openalex.org/C2779010991","display_name":"Artifact (error)","score":0.6560807228088379},{"id":"https://openalex.org/C153180980","display_name":"Commit","score":0.5811672806739807},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5662007927894592},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5197648406028748},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.49632149934768677},{"id":"https://openalex.org/C167063184","display_name":"Vulnerability assessment","score":0.45502975583076477}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4411449774","title":"Beyond PEFT: Layer-Wise Optimization for More Effective and Efficient Large Code Model Tuning","url":"https://doi.org/10.1145/3729341","published":"2025-06-19","authors":["Chaozheng Wang","Jia Feng","Shuzheng Gao","Cuiyun Gao","Zongjie Li","T. Y. Peng","Hailiang Huang","Yuetang Deng","Michael R. Lyu"],"abstract":"Large Code Models (LCMs) have demonstrated remarkable effectiveness across various code intelligence tasks. 
Supervised fine-tuning is essential to optimize their performance for specific downstream tasks. Compared with the traditional full-parameter fine-tuning (FFT) method, Parameter-Efficient Fine-Tuning (PEFT) methods can train LCMs with substantially reduced resource consumption and have gained widespread attention among researchers and practitioners. While existing studies have explored PEFT methods for code intelligence tasks, they have predominantly focused on a limited subset of scenarios, such as code generation with publicly available datasets, leading to constrained generalizability of the findings. To mitigate the limitation, we conduct a comprehensive study on exploring the effectiveness of the PEFT methods, which involves five code intelligence tasks containing both public....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3729341","openalex_id":"https://openalex.org/W4411449774","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Chinese University of Hong Kong","Hong Kong University of Science and Technology","Tencent (China)","University of Electronic Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7943698167800903},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5742198824882507},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5313292741775513},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5138514041900635},{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.4669187068939209},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4388014078140259},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.4350394606590271},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.42951488494873047}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4411450119","title":"Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models","url":"https://doi.org/10.1145/3715749","published":"2025-06-19","authors":["Yanlin Wang","Tianyue Jiang","Mingwei Liu","Jiachi Chen","Mingzhi Mao","Xilin Liu","Yuchi Ma","Zibin Zheng"],"abstract":"Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style between the code generated by mainstream LLMs and the code written by human developers, and summarize coding style inconsistency taxonomy. Specifically, we first summarize the types of coding style inconsistencies by manually analyzing a large number of generation results. We then compare the code generated by LLMs with the code written by human programmers in terms of readability, conciseness, and robustness. The results reveal that LLMs and developers exhibit differences in coding style. 
Additiona...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715749","openalex_id":"https://openalex.org/W4411450119","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C2778143727","display_name":"Readability","score":0.8485053777694702},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.7167964577674866},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6651856899261475},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.4972558319568634},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.48846641182899475},{"id":"https://openalex.org/C2777617010","display_name":"Mainstream","score":0.42716264724731445},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3588729202747345},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.32343727350234985}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4411449970","title":"Smaller but Better: Self-Paced Knowledge Distillation for Lightweight yet Effective LCMs","url":"https://doi.org/10.1145/3729405","published":"2025-06-19","authors":["Yujia Chen","Y.-C. Ye","Zhongqi Li","Yuchi Ma","Cuiyun Gao"],"abstract":"Large code models (LCMs) have remarkably advanced the field of code generation. Despite their impressive capabilities, they still face practical deployment issues, such as high inference costs, limited accessibility of proprietary LCMs, and adaptability issues of ultra-large LCMs. 
These issues highlight the critical need for more accessible, lightweight yet effective LCMs. Knowledge distillation (KD) offers a promising solution, which transfers the programming capabilities of larger, advanced LCMs (Teacher) to smaller, less powerful LCMs (Student). However, existing KD methods for code intelligence often lack consideration of fault domain knowledge and rely on static seed knowledge, leading to degraded programming capabilities of student models. In this paper, we propose a novel Self-Paced knOwledge DistillAtion framework, named SODA, aiming at developing lightweight yet effective studen...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3729405","openalex_id":"https://openalex.org/W4411449970","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Harbin Institute of Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7496309876441956},{"id":"https://openalex.org/C55439883","display_name":"Correctness","score":0.6237825155258179},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.57565838098526},{"id":"https://openalex.org/C2775928411","display_name":"Fault injection","score":0.5531606674194336},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.46018582582473755},{"id":"https://openalex.org/C207685749","display_name":"Domain knowledge","score":0.42034488916397095},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3384963870048523},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3240143656730652}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411450188","title":"Less Is More: On the Importance of Data Quality for Unit Test Generation","url":"https://doi.org/10.1145/3715778","published":"2025-06-19","authors":["Junwei Zhang","Xing Hu","Shan Gao","Xin Xia","David Lo","Shanping Li"],"abstract":"Unit testing is crucial for software development and maintenance. Effective unit testing ensures and improves software quality, but writing unit tests is time-consuming and labor-intensive. Recent studies have proposed deep learning (DL) techniques or large language models (LLMs) to automate unit test generation. These models are usually trained or fine-tuned on large-scale datasets. Despite growing awareness of the importance of data quality, there has been limited research on the quality of datasets used for test generation. To bridge this gap, we systematically examine the impact of noise on the performance of learning-based test generation models. We first apply the open card sorting method to analyze the most popular and largest test generation dataset, Methods2Test, to categorize eight distinct types of noise. 
Further, we conduct detailed interviews with 17 domain experts to valida...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715778","openalex_id":"https://openalex.org/W4411450188","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Singapore Management University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7337138652801514},{"id":"https://openalex.org/C148027188","display_name":"Unit testing","score":0.732136070728302},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5701146721839905},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4996969699859619},{"id":"https://openalex.org/C16910744","display_name":"Test data","score":0.4989774227142334},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.474107563495636},{"id":"https://openalex.org/C24756922","display_name":"Data quality","score":0.4364585280418396},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.43637949228286743}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/revela-dense-retriever-learning-via-language-modeling","title":"Revela: Dense Retriever Learning via Language Modeling","url":"https://www.microsoft.com/en-us/research/publication/revela-dense-retriever-learning-via-language-modeling/","published":"2025-06-18","authors":["Fengyu Cai","Tong Chen","Xinran Zhao","Sihao Chen","Hongming Zhang","Sherry Tongshuang Wu","Iryna Gurevych","Heinz Koeppl"],"abstract":"Dense 
retrievers play a vital role in accessing external and specialized knowledge to augment language models (LMs). Training dense retrievers typically requires annotated query-document pairs, which are costly to create and scarce in specialized domains (e.g., code) or in complex settings (e.g., requiring reasoning). These practical challenges have sparked growing interest in self-supervised retriever learning. Since LMs are trained to capture token-level dependencies through a self-supervised learning objective (i.e., next token prediction), we can analogously cast retrieval as learning dependencies among chunks of tokens. This analogy naturally leads to the question: How can we adapt self-supervised learning objectives in the spirit of language modeling to train retrievers? To answer this question, we introduce Revela, a unified and scalable training framework for self-supervised retr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411428681","title":"Generalized biological foundation model with unified nucleic acid and protein language","url":"https://doi.org/10.1038/s42256-025-01044-4","published":"2025-06-18","authors":["Yong He","Pan Fang","Yongtao Shan","Yuanfei Pan","Yanhong Wei","Yichang Chen","Yihao Chen","Yi Liu","Zhenyu Zeng","Zhan Zhou","Feng Zhu","Edward C. 
Holmes"],"abstract":"Abstract The language of biology, encoded in DNA, RNA and proteins, forms the foundation of life but remains challenging to decode owing to its complexity. Traditional computational methods often struggle to integrate information across these molecules, limiting a comprehensive understanding of biological systems. Advances in natural language processing with pre-trained models offer possibilities for interpreting biological language. Here we introduce LucaOne, a pre-trained foundation model trained on nucleic acid and protein sequences from 169,861 species. Through large-scale data integration and semi-supervised learning, LucaOne shows an understanding of key biological principles, such as DNA–protein translation. Using few-shot learning, it effectively comprehends the central dogma of molecular biology and performs competitively on tasks involving DNA, RNA or protein inputs. Our result...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s42256-025-01044-4","openalex_id":"https://openalex.org/W4411428681","cited_by_count":21,"quality_score":58,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Medical Sciences & Peking Union Medical College","City University of Hong Kong","Fudan University","Institute of Infection and Immunity","Second Affiliated Hospital of Zhejiang University","Sun Yat-sen University","Taronga Conservation Society Australia","The University of Sydney","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.6237524747848511},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5943726301193237},{"id":"https://openalex.org/C2780966255","display_name":"Foundation 
(evidence)","score":0.5496722459793091},{"id":"https://openalex.org/C24107716","display_name":"Nucleic acid","score":0.528735876083374},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.476727157831192},{"id":"https://openalex.org/C201797286","display_name":"Biological data","score":0.4606151580810547},{"id":"https://openalex.org/C67705224","display_name":"RNA","score":0.43508023023605347},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4223966598510742}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":21}},{"id":"official:1682a8c6dc17dbb6","title":"Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material","url":"https://huggingface.co/papers/2506.15442","published":"2025-06-18","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W7154601118","title":"The Effect of Representational Compression on Flexibility Across Learning in Humans and Artificial Neural Networks","url":"https://doi.org/10.48448/x3hd-0w13","published":"2025-06-18","authors":["Cognitive Science Society 2025","Christopher Summerfield","Mia Whitefield"],"abstract":"Humans can generalise from past experiences to novel situations as well as revise prior knowledge to flexibly adapt to changing contexts and goals. 
The representational geometry framework formalises how information is structured in the brain and suggests that abstraction involves a trade-off between generalisation and flexibility. However, how task representations evolve across learning and relate to behaviour remains unclear. Here, we tested the hypothesis that representational compression of task representations across learning underlies this flexibility impairment. Using an extra-dimensional shifting task, we manipulated the pretraining length to control the degree of compression. In both humans and artificial neural networks, longer pretraining was associated with decreased flexibility. Network dynamics indicated that greater compression incurred a higher representational reorganisat...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/x3hd-0w13","openalex_id":"https://openalex.org/W7154601118","cited_by_count":0,"quality_score":41,"matched_keywords":["compression"],"author_affiliations":["Google DeepMind (United Kingdom)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.7300000190734863},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6420999765396118},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6175000071525574},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6097999811172485},{"id":"https://openalex.org/C124304363","display_name":"Abstraction","score":0.6036999821662903},{"id":"https://openalex.org/C111030470","display_name":"Curse of 
dimensionality","score":0.5583999752998352},{"id":"https://openalex.org/C188198153","display_name":"Limiting","score":0.46560001373291016},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.4634000062942505}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411430238","title":"Predicting High‐Resolution Spatial and Spectral Features in Mass Spectrometry Imaging with Machine Learning and Multimodal Data Fusion","url":"https://doi.org/10.1002/aidi.202500021","published":"2025-06-18","authors":["Md Inzamam Ul Haque","Ramakrishnan Kannan","Jacob Hinkle","Sylwia A. Stopka","Anton V. Ievlev","Nathalie Y.R. Agar","Olga S. Ovchinnikova","Debangshu Mukherjee"],"abstract":"Recent advancements in molecular Mass Spectrometry Imaging have sparked interest in integrating high spatial resolution methods with molecular mass‐spectrometry‐based chemical imaging. Fusion‐based algorithms have proven effective in generating high spatial‐resolution molecular mass spectra. However, a significant challenge stems from the differing physical mechanisms underlying image generation and data upsampling techniques, potentially leading to discrepancies in integrated information channels. Integrating physical constraints into data processing workflows is essential to tackle this issue. In this study, we propose an innovative approach that merges data from Fourier transform ion cyclotron resonance (FTICR), time‐of‐flight matrix‐assisted laser desorption/ionization, and time‐of‐flight secondary ion mass spectrometry imaging techniques. 
By leveraging FT‐ICR's unparalleled spectral...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/aidi.202500021","openalex_id":"https://openalex.org/W4411430238","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Brigham and Women's Hospital","Dana-Farber Cancer Institute","Harvard University","Knoxville College","Nvidia (United States)","Oak Ridge National Laboratory","University of Tennessee at Knoxville"],"concepts":[{"id":"https://openalex.org/C24066741","display_name":"Mass spectrometry imaging","score":0.7218527793884277},{"id":"https://openalex.org/C162356407","display_name":"Mass spectrometry","score":0.6459522247314453},{"id":"https://openalex.org/C170552419","display_name":"Fourier transform ion cyclotron resonance","score":0.6375640630722046},{"id":"https://openalex.org/C205372480","display_name":"Image resolution","score":0.5813047289848328},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5088925361633301},{"id":"https://openalex.org/C3232514","display_name":"Spectral imaging","score":0.4993431568145752},{"id":"https://openalex.org/C65597285","display_name":"Chemical imaging","score":0.4911656081676483},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44278615713119507}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412568013","title":"Representation Is All We Need: Performance and Fairness of Google X-ray Foundation Model Representations – A Preliminary Study","url":"https://doi.org/10.1109/ichi64645.2025.00102","published":"2025-06-18","authors":["Gebreyowhans H. 
Bahre","Hassan Hamidi","Andrew Sellergren","Leo Anthony Celi","Francesco Calimeri","Laleh Seyyed-Kalantari"],"abstract":"AI has shown remarkable potential in healthcare, but faces accessibility challenges due to high computational and expertise demands, especially in medical image analysis. Vector embeddings (Emb) offer a solution by converting large medical image datasets into compact representations via foundation models in zero-shot inference, reducing GPU and storage needs. We evaluate AI models trained on Emb versus medical images for chest X-ray diagnosis and findings show that Emb-based models maintain classification performance while improving fairness across demographics(e.g., race, sex, and age), making AI more accessible to low-resource communities.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ichi64645.2025.00102","openalex_id":"https://openalex.org/W4412568013","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","IIT@MIT","Massachusetts Institute of Technology","University of Calabria","Vector Institute","York University"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7878552079200745},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6795972585678101},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.649045467376709},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.435403972864151},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.37440037727355957},{"id":"https://openalex.org/C136764020","display_name":"World Wide 
Web","score":0.35487550497055054},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.07882314920425415},{"id":"https://openalex.org/C94625758","display_name":"Politics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7154615150","title":"CoRe: Cognitive Reasoning Framework for Zero-Shot Table Understanding and Reasoning","url":"https://doi.org/10.48448/xy3g-n136","published":"2025-06-18","authors":["Cognitive Science Society 2025","Jiongfan Chen","Chengfeng Chen","Ying Chen","Kui Leng","Haicong Li","Haibo Liu","Yong Liu","Fang Wang","Lu Yang","Junyan Ye","Bo Zhang"],"abstract":"While Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, they still struggle with reasoning over table-based structured data, particularly in zero-shot settings. Tasks like question answering (QA), SQL generation, and numerical reasoning often fail due to insufficient task-specific training. To address these challenges, we propose CoRe (Cognitive Reasoning Framework for Structured Data), inspired by cognitive science principles of hierarchical and iterative reasoning. CoRe structures reasoning into multiple stages, allowing LLMs to better navigate the intricacies of table-based data. Evaluations using advanced LLMs, including Qwen-Plus, GPT-4o mini, and GLM-4-Plus, on datasets like HybridQA, BIRD, and DocMath-Eval show consistent performance improvements. 
CoRe outperforms zero-shot state-of-the-art (SOTA) on HybridQA and BIRD by 11.4...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"other","doi":"https://doi.org/10.48448/xy3g-n136","openalex_id":"https://openalex.org/W7154615150","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["China Mobile (China)","ETH Zurich","Hunan First Normal University","South China Normal University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6959999799728394},{"id":"https://openalex.org/C36964233","display_name":"Verbal reasoning","score":0.6171000003814697},{"id":"https://openalex.org/C183521366","display_name":"Psychology of reasoning","score":0.6014000177383423},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.5928000211715698},{"id":"https://openalex.org/C86827895","display_name":"Opportunistic reasoning","score":0.5189999938011169},{"id":"https://openalex.org/C37335422","display_name":"Model-based reasoning","score":0.5149000287055969},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.5127999782562256},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.5109999775886536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nabla-r2d3-effective-and-efficient-3d-diffusion-alignment-with-2d-rewards","title":"Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards","url":"https://www.microsoft.com/en-us/research/publication/nabla-r2d3-effective-and-efficient-3d-diffusion-alignment-with-2d-rewards/","published":"2025-06-17","authors":["Qingming Liu","Zhen 
Liu","Dinghuai Zhang","Kui Jia"],"abstract":"Generating high-quality and photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics. Although state-of-the-art generative models, such as diffusion models, have made significant progress in 3D generation, they often fall short of human-designed content due to limited ability to follow instructions, align with human preferences, or produce realistic textures, geometries, and physical attributes. In this paper, we introduce Nabla-R2D3, a highly effective and sample-efficient reinforcement learning alignment framework for 3D-native diffusion models using 2D rewards. Built upon the recently proposed Nabla-GFlowNet method, which matches the score function to reward gradients in a principled manner for reward finetuning, our Nabla-R2D3 enables effective adaptation of 3D diffusion models using only 2D reward signals. Extensive experiments show that, unlike v...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","Computer Vision and Pattern Recognition","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-multi-fidelity-training-of-machine-learned-force-fields","title":"Understanding multi-fidelity training of machine-learned 
force-fields","url":"https://www.microsoft.com/en-us/research/publication/understanding-multi-fidelity-training-of-machine-learned-force-fields/","published":"2025-06-17","authors":["John Gardner","Hannes Schulz","Jean Helie","Lixin Sun","Gregor Simm"],"abstract":"Effectively leveraging data from multiple quantum-chemical methods is essential for building machine-learned force fields (MLFFs) that are applicable to a wide range of chemical systems. This study systematically investigates two multi-fidelity training strategies, pre-training/fine-tuning and multi-headed training, to elucidate the mechanisms underpinning their success. We identify key factors driving the efficacy of pre-training followed by fine-tuning, but find that internal representations learned during pre-training are inherently method-specific, requiring adaptation of the model backbone during fine-tuning. Multi-headed models offer an extensible alternative, enabling simultaneous training on multiple fidelities. We demonstrate that a multi-headed model learns method-agnostic representations that allow for accurate predictions across multiple label sources. 
While this approach int...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Machine learning","Physics","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411403530","title":"OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment","url":"https://doi.org/10.1145/3725331","published":"2025-06-17","authors":["Xiangjin Xie","Guohuan Xu","Lingyan Zhao","Ruijie Guo"],"abstract":"Although multi-agent collaborative Large Language Models (LLMs) have achieved significant breakthroughs in the Text-to-SQL task, their performance is still constrained by various factors. These factors include the incompleteness of the framework, failure to follow instructions, and model hallucinations. To address these problems, we propose OpenSearch-SQL, which divides the Text-to-SQL task into four main modules: Preprocessing, Extraction, Generation, and Refinement, along with an Alignment module based on a consistency alignment mechanism. This architecture aligns the inputs and outputs of agents through the Alignment module, reducing failures in instruction following and hallucination. Furthermore, we introduce SQL-Like (an intermediate language), optimize the structured Chain-of-Thought (CoT) based on SQL-Like, and develop a dynamic few-shot strategy via self-taught Query-CoT-SQL. 
In...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3725331","openalex_id":"https://openalex.org/W4411403530","cited_by_count":19,"quality_score":64,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8749047517776489},{"id":"https://openalex.org/C510870499","display_name":"SQL","score":0.7927022576332092},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.534489631652832},{"id":"https://openalex.org/C63000827","display_name":"Software portability","score":0.5257973074913025},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5078323483467102},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.482769638299942},{"id":"https://openalex.org/C55596503","display_name":"Data definition language","score":0.4683641195297241},{"id":"https://openalex.org/C34736171","display_name":"Preprocessor","score":0.4583451449871063}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"openalex:W4411403450","title":"PQCache: Product Quantization-based KVCache for Long Context LLM Inference","url":"https://doi.org/10.1145/3725338","published":"2025-06-17","authors":["Hailin Zhang","X. L. Ji","Yilin Chen","Fangcheng Fu","Xupeng Miao","Xiaonan Nie","Weipeng Chen","Bin Cui"],"abstract":"As the field of Large Language Models (LLMs) continues to evolve, the context length in inference is steadily growing. 
Key-Value Cache (KVCache), the intermediate representations of tokens within LLM inference, has now become the primary memory bottleneck due to limited GPU memory. Current methods selectively determine suitable keys and values for self-attention computation in LLMs to address the issue. However, they either fall short in maintaining model quality or result in high serving latency. Drawing inspiration from advanced embedding retrieval techniques prevalent in the data management community, we consider the storage and retrieval of KVCache as a typical embedding retrieval problem. We propose PQCache , which employs Product Quantization (PQ) to manage KVCache, maintaining model quality while ensuring low serving latency. During the prefilling phase, we apply PQ to tokens' key...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3725338","openalex_id":"https://openalex.org/W4411403450","cited_by_count":8,"quality_score":61,"matched_keywords":["LLM","memory","retrieval","quantization"],"author_affiliations":["Baidu (China)","Peking University","Purdue University West Lafayette"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7881680727005005},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.5654473304748535},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5590077042579651},{"id":"https://openalex.org/C82876162","display_name":"Latency (audio)","score":0.5042773485183716},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.49770501255989075},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.48854097723960876},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal 
processing)","score":0.45954057574272156},{"id":"https://openalex.org/C115537543","display_name":"Cache","score":0.4549429714679718}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"official:c02df2bab9189e73","title":"ShieldGemma 2 Model Card","url":"https://ai.google.dev/gemma/docs/shieldgemma/model_card_2","published":"2025-06-17","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","ShieldGemma 2"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:0aa09556b0d8ea15","title":"Gemma 3n Model Card","url":"https://ai.google.dev/gemma/docs/gemma-3n/model_card","published":"2025-06-17","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemma 3n"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"openalex:W4411403365","title":"Revisiting Graph Analytics 
Benchmark","url":"https://doi.org/10.1145/3725345","published":"2025-06-17","authors":["Lingkai Meng","Yu Shao","Long Yuan","Longbin Lai","Peng Cheng","Xue Li","Wenyuan Yu","Wenjie Zhang","Xuemin Lin","Jingren Zhou"],"abstract":"The rise of graph analytics platforms has led to the development of various benchmarks for evaluating and comparing platform performance. However, existing benchmarks often fall short of fully assessing performance due to limitations in core algorithm selection, data generation processes (and the corresponding synthetic datasets), as well as the neglect of API usability evaluation. To address these shortcomings, we propose a novel graph analytics benchmark. First, we select eight core algorithms by extensively reviewing both academic and industrial settings. Second, we design an efficient and flexible data generator and produce eight new synthetic datasets as the default datasets for our benchmark. Lastly, we introduce a multi-level large language model (LLM)-based framework for API usability evaluation-the first of its kind in graph analytics benchmarks. 
We conduct comprehensive experim...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3725345","openalex_id":"https://openalex.org/W4411403365","cited_by_count":2,"quality_score":51,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Alibaba Group (China)","East China Normal University","Shanghai Jiao Tong University","Tongji University","UNSW Sydney","Wuhan University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.817225456237793},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7376633882522583},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.7290087342262268},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.6208759546279907},{"id":"https://openalex.org/C170130773","display_name":"Usability","score":0.6207560896873474},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5750250816345215},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4580608010292053},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.34935933351516724}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4411374404","title":"Andromeda: Debugging Database Performance Issues with Retrieval-Augmented Large Language Models","url":"https://doi.org/10.1145/3722212.3725080","published":"2025-06-17","authors":["P. Wang","Sibei Chen","Ju Fan","Bin Wu","Nan Tang","Jian Tan"],"abstract":"Debugging performance issues in a database management system (DBMS) is tedious and challenging, even for experienced database administrators (DBAs).
Thus, with the rapid advancement of large language models (LLMs), developing an LLM-powered co-pilot to assist or even replace DBAs by automatically diagnosing issues and generating recommendations for resolution presents a promising direction. However, directly prompting LLMs for DBMS performance debugging often yields either generic or irrelevant responses, as LLMs may lack both domain knowledge in performance debugging and a deep understanding of DBMS internals. In this paper, we introduce Andromeda, a retrieval-augmented, LLM-powered system for automatic DBMS performance debugging. Andromeda enables users to pose natural language questions about various performance issues and provides context-aware and actionable recommendations for reso...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3722212.3725080","openalex_id":"https://openalex.org/W4411374404","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Renmin University of China","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8294436931610107},{"id":"https://openalex.org/C168065819","display_name":"Debugging","score":0.8240206241607666},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.47506970167160034},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.41664087772369385},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3287573456764221}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411403463","title":"DIGRA: A Dynamic Graph 
Indexing for Approximate Nearest Neighbor Search with Range Filter","url":"https://doi.org/10.1145/3725399","published":"2025-06-17","authors":["Mengxu Jiang","Zhi Yang","Fangyuan Zhang","Guanhao Hou","Jieming Shi","Wenchao Zhou","Feifei Li","Sibo Wang"],"abstract":"Recent advancements in AI have enabled models to map real-world entities, such as product images, into high-dimensional vectors, making approximate nearest neighbor search (ANNS) crucial for various applications. Often, these vectors are associated with additional attributes like price, prompting the need for range-filtered ANNS where users seek similar items within specific attribute ranges. Naive solutions like pre-filtering and post-filtering are straightforward but inefficient. Specialized indexes, such as SeRF, SuperPostFiltering, and iRangeGraph, have been developed to address these queries effectively. However, these solutions do not support dynamic updates, limiting their practicality in real-world scenarios where datasets frequently change. To address these challenges, we propose DIGRA, a novel dynamic graph index for range-filtered ANNS. 
DIGRA supports efficient dynamic updates...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3725399","openalex_id":"https://openalex.org/W4411403463","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong","Hong Kong Polytechnic University","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8318971395492554},{"id":"https://openalex.org/C75165309","display_name":"Search engine indexing","score":0.8148623108863831},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.652753472328186},{"id":"https://openalex.org/C2779960059","display_name":"Overhead (engineering)","score":0.6088800430297852},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5301186442375183},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.520791232585907},{"id":"https://openalex.org/C113238511","display_name":"k-nearest neighbors algorithm","score":0.48597466945648193},{"id":"https://openalex.org/C116738811","display_name":"Nearest neighbor search","score":0.47399580478668213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4411383096","title":"Efficient knowledge graph to text powered by LLGM: linear latent graph model","url":"https://doi.org/10.1007/s40747-025-01985-8","published":"2025-06-17","authors":["Xiaokang Zhao","Yao Zheng","Yubo Shan","Jingyuan Li","Kun Zhang","Yuanzhuo Wang"],"abstract":"Knowledge graph to text generation is crucial for interpreting complex structured data, yet state-of-the-art transformer models face 
significant computational burdens, limiting their practical deployment. This paper introduces the Linear Latent Graph Model (LLGM), a novel architecture that significantly enhances efficiency in KG-to-text generation without compromising performance. LLGM’s core innovations are three-fold: (1) a Multi-head Statistical Attention (MSA) mechanism that achieves linear O(N) complexity by replacing pairwise token interactions with efficient statistical approximations, drastically reducing the primary computational bottleneck; (2) a Graph Latent Self-Attention (GLSA) module that efficiently encodes explicit graph structures using dimension-reduced intermediate representations, preserving relational fidelity with fewer parameters; and (3) a Graph Periodicity Projec...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s40747-025-01985-8","openalex_id":"https://openalex.org/W4411383096","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Beijing Technology and Business University","Chinese Academy of Sciences","Institute of Computing Technology","Tencent (China)","Zhengzhou University"],"concepts":[{"id":"https://openalex.org/C139502532","display_name":"Computational intelligence","score":0.6294078826904297},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5735111236572266},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5348289012908936},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.4229416847229004},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3369932174682617},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.32670870423316956}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reinforcement-learning-with-verifiable-rewards-implicitly-incentivizes-correct-reasoning-in-base-llms","title":"Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs","url":"https://www.microsoft.com/en-us/research/publication/reinforcement-learning-with-verifiable-rewards-implicitly-incentivizes-correct-reasoning-in-base-llms/","published":"2025-06-16","authors":["Xumeng Wen","Zihan Liu","Shun Zheng","Zhijian Xu","Shengyu Ye","Zhirong Wu","Xiao Liang","Yang Wang","Junjie Li","Ziming Miao","Jiang Bian","Mao Yang"],"abstract":"Recent advancements in long chain-of-thought (CoT) reasoning, particularly through the Group Relative Policy Optimization algorithm used by DeepSeek-R1, have led to significant interest in the potential of Reinforcement Learning with Verifiable Rewards (RLVR) for Large Language Models (LLMs). While RLVR promises to improve reasoning by allowing models to learn from free exploration, there remains debate over whether it truly enhances reasoning abilities or simply boosts sampling efficiency. This paper systematically investigates the impact of RLVR on LLM reasoning. We revisit Pass@K experiments and demonstrate that RLVR can extend the reasoning boundary for both mathematical and coding tasks. This is supported by our introduction of a novel evaluation metric, CoT-Pass@K, which captures reasoning success by accounting for both the final answer and intermediate reasoning steps. 
Furthermore...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multimodal-needle-in-a-haystack-benchmarking-long-context-capability-of-multimodal-large-language-models","title":"Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/multimodal-needle-in-a-haystack-benchmarking-long-context-capability-of-multimodal-large-language-models/","published":"2025-06-16","authors":["Hengyi Wang","Haizhou Shi","Shiwei Tan","Weiyi Qin","Wenyuan Wang","Tunyu Zhang","Akshay Nambi","Tanuja Ganu","Hao Wang"],"abstract":"Multimodal Large Language Models (MLLMs) have shown significant promise in various applications, leading to broad interest from researchers and practitioners alike. However, a comprehensive evaluation of their long-context capabilities remains underexplored. To address these gaps, we introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-context capabilities of MLLMs. Besides multi-image input, we employ image stitching to further increase the input context length, and develop a protocol to automatically generate labels for sub-image level retrieval. 
Essentially, MMNeedle evaluates MLLMs by stress-testing their capability to locate a target sub-image (needle) within a set of images (haystack) based on textual instructions and descriptions of image contents. This setup necessitates an advanced understanding of extensive visual contexts...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Multimodal Large Language Models","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:284","title":"Robust Multi-bit Text Watermark with LLM-based Paraphrasers","url":"https://seed.bytedance.com/en/research/robust-multi-bit-text-watermark-with-llm-based-paraphrasers","published":"2025-06-16","authors":["Xiaojun Xu","Jinghan Jia","Yuanshun Yao","Yang Liu","Hang Li"],"abstract":"We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently so that their paraphrasing difference reflected in the text semantics can be identified by a trained decoder. To embed our multi-bit watermark, we use two paraphrasers alternatively to encode the pre-defined binary code at the sentence level. Then we use a text classifier as the decoder to decode each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while keeping the semantic information of the original sentence. 
More importantly, our pipeline is robust under word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distributional data. We also show the stealthiness of our watermark with...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial Intelligence","Responsible AI","ICML 2025","LLM"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/direct-reasoning-optimization-llms-can-reward-and-refine-their-own-reasoning-for-open-ended-tasks","title":"Direct Reasoning Optimization: Constrained RL with Token-Level Dense Reward and Rubric-Gated Constraints for Open-ended Tasks","url":"https://www.microsoft.com/en-us/research/publication/direct-reasoning-optimization-llms-can-reward-and-refine-their-own-reasoning-for-open-ended-tasks/","published":"2025-06-16","authors":["Yifei Xu","Tusher Chakraborty","Srinagesh Sharma","Leonardo Nunes","Swati Sharma","Kate Drakos Demopulos","Emre Kiciman","Songwu Lu","Ranveer Chandra"],"abstract":"RL training of LLMs on open-ended tasks is challenging due to the lack of direct verifiability. In this paper, we frame such training as constrained RL that (i) optimizes a token-level dense Reasoning Reflection Reward (R3) aligned with reasoning quality, and (ii) enforces rubric-gating as feasibility constraints at the rollout group level. 
R3 measures the model's token-level certainty of a reference answer under its CoT reasoning prefix while selectively emphasizing reasoning-reflective tokens to capture how likely the generated reasoning is to yield the desired answer. Rubric-gating complements R3 by operationalizing principled task criteria as hard accept/reject checks on final answers. Empirically, across four datasets, our framework outperforms baselines, achieves faster, more sample-efficient learning, and respects feasibility constraints. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/self-enhancing-video-data-management-system-for-compositional-events-with-large-language-models","title":"Self-Enhancing Video Data Management System for Compositional Events with Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/self-enhancing-video-data-management-system-for-compositional-events-with-large-language-models/","published":"2025-06-16","authors":["Enhao Zhang","Nicole Sullivan","Brandon Haynes","Ranjay Krishna","Magdalena Balazinska"],"abstract":"Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. 
We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. VOCAL-UDF automatically identifies and constructs missing modules and encapsulates them as user-defined functions (UDFs), thus expanding its querying capabilities. To achieve this, we formulate a unified UDF model that leverages large language models (LLMs) to aid in new UDF generation. VOCAL UDF handles a wide range of concepts by supporting both program-based UDFs (i.e., Python functions generated by LLMs) and distilled-model UDFs (lightweight vision models distilled from strong pretrained models). To resolve the inherent ambiguity in user intent, VOCAL-UDF g...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Data platforms and analytics","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/screen-reader-users-in-the-vibe-coding-era-adaptation-empowerment-and-new-accessibility-landscape","title":"Screen Reader Users in the Vibe Coding Era: Adaptation, Empowerment, and New Accessibility Landscape","url":"https://www.microsoft.com/en-us/research/publication/screen-reader-users-in-the-vibe-coding-era-adaptation-empowerment-and-new-accessibility-landscape/","published":"2025-06-16","authors":["Nan Chen","Luna K. 
Qiu","Arran Zeyu Wang","Zilong Wang","Yuqing Yang"],"abstract":"The rise of generative AI agents has reshaped human-computer interaction and computer-supported cooperative work by shifting users' roles from direct task execution to supervising machine-driven actions, especially in programming (e.g.,\"vibe coding\"). However, there is limited understanding of how screen reader users engage with these systems in practice. To address this gap, we conducted a longitudinal study with 16 screen reader users, exploring their experiences with AI code assistants in daily programming scenarios. Participants first completed a tutorial with GitHub Copilot, then performed a programming task and provided initial feedback. After two weeks of AI-assisted programming, follow-up studies assessed changes in their practices and perceptions. Our findings demonstrate that advanced code assistants not only enhance their programming capabilities but also bridge accessibility....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2505.04846","title":"HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights","url":"http://arxiv.org/abs/2505.04846","published":"2025-06-16","authors":["Ozan Gökdemir","Carlo Siebenschuh","Alexander Brace","Azton I. Wells","Brian Hsu","Kyle Hippe","Priyanka V. Setty","Aswathy Ajith","J. 
Gregory Pauloski","Varuni Sastry","Sam Foreman","Huihuo Zheng"],"abstract":"The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval-Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduces significant challenges, including the high computational costs associated with parsing documents and embedding scientific knowledge, as well as the algorithmic complexity of aligning these representations with the nuanced semantics of scientific content. To address these issues, we introduce HiPerRAG, a RAG workflow powered by high performance computing (HPC) to index and retrieve knowledge from more than 3.6 million scientific articles. At its core are Oreo, a high-throughput model for multim...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3732775.3733586","openalex_id":"https://openalex.org/W4411490478","cited_by_count":6,"quality_score":47,"matched_keywords":["retrieval"],"author_affiliations":["Argonne National Laboratory","California Institute of Technology","Nvidia (United States)","University of Chicago","University of Illinois Chicago","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6955890655517578},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3891810178756714}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4411353547","title":"MetalMind: A knowledge graph-driven 
human-centric knowledge system for metal additive manufacturing","url":"https://doi.org/10.1038/s44334-025-00038-9","published":"2025-06-16","authors":["Haolin Fan","Zhen Fan","Chenshu Liu","Jianhao Zhu","Tom Gibbs","Jerry Ying Hsi Fuh","Wen Feng Lu","Bingbing Li"],"abstract":"Abstract In the Industry 5.0 era, increasing manufacturing complexity and fragmented knowledge pose challenges for decision-making and workforce development. To tackle this, we present a human-centric knowledge system that integrates explicit knowledge from formal sources and implicit knowledge from expert insights. The system features three core innovations: (1) an automated KG construction pipeline leveraging large language models (LLMs) with collaborative verification to enhance knowledge extraction accuracy and minimize hallucinations; (2) a hybrid retrieval framework that combines vector-based, graph-based, and hybrid retrieval strategies for comprehensive knowledge access, achieving a 336.61% improvement over vector-based retrieval and a 68.04% improvement over graph-based retrieval in global understanding; and (3) an MR-enhanced interface that supports immersive, real-time interac...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s44334-025-00038-9","openalex_id":"https://openalex.org/W4411353547","cited_by_count":5,"quality_score":46,"matched_keywords":["retrieval"],"author_affiliations":["California State University, Northridge","National University of Singapore","Nvidia (United States)","University of California, Los Angeles"],"concepts":[{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5838140249252319},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.5133046507835388},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.48277679085731506},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.45764416456222534},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.36168372631073},{"id":"https://openalex.org/C195094911","display_name":"Process management","score":0.3586447834968567},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.18322566151618958},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.1637919545173645}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"arxiv:2506.13585","title":"MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention","url":"https://huggingface.co/papers/2506.13585","published":"2025-06-16","authors":["MiniMax","Aili Chen","Aonian Li","Bangwei Gong","Binyang Jiang","Bo Fei","Bo Yang","Boji Shan","Changqing Yu","Chao Wang","Cheng Zhu","Chengjun Xiao"],"abstract":"We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively. MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems including sandbox-based, real-world software engineering environments. 
In addition....","companies":["MiniMax"],"matched_orgs":["MiniMax"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/unveiling-the-learning-mind-of-language-models-a-cognitive-framework-and-empirical-study","title":"Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study","url":"https://www.microsoft.com/en-us/research/publication/unveiling-the-learning-mind-of-language-models-a-cognitive-framework-and-empirical-study/","published":"2025-06-15","authors":["Zhengyu Hu","Jianxun Lian","Zheyuan Xiao","Seraphina Zhang","Tianfu Wang","Nicholas Jing Yuan","Xing Xie","Hui Xiong"],"abstract":"Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning, yet their learning ability, which is crucial for adapting to dynamic environments and acquiring new knowledge, remains underexplored. In this work, we address this gap by introducing a framework inspired by cognitive psychology and education. Specifically, we decompose general learning ability into three distinct, complementary dimensions: Learning from Instructor (acquiring knowledge via explicit guidance), Learning from Concept (internalizing abstract structures and generalizing to new contexts), and Learning from Experience (adapting through accumulated exploration and feedback). 
We conduct a comprehensive empirical study across the three learning dimensions and identify several insightful findings, such as (i) interaction improves learning; (ii) conceptual understa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/screen-reader-programmers-in-the-vibe-coding-era-adaptation-empowerment-and-new-accessibility-landscape","title":"Screen Reader Programmers in the Vibe Coding Era: Adaptation, Empowerment, and New Accessibility Landscape","url":"https://www.microsoft.com/en-us/research/publication/screen-reader-programmers-in-the-vibe-coding-era-adaptation-empowerment-and-new-accessibility-landscape/","published":"2025-06-15","authors":["Nan Chen","Luna K. Qiu","Arran Zeyu Wang","Zilong Wang","Yuqing Yang"],"abstract":"Generative AI agents are reshaping human-computer interaction, shifting users from direct task execution to supervising machine-driven actions, especially the rise of\"vibe coding\"in programming. Yet little is known about how screen reader programmers interact with AI code assistants in practice. We conducted a longitudinal study with 16 blind and low-vision programmers. Participants completed a GitHub Copilot tutorial, engaged with a programming task, and provided initial feedback. 
After two weeks of AI-assisted programming, follow-ups examined how their practices and perceptions evolved. Our findings show that code assistants enhanced programming efficiency and bridged accessibility gaps. However, participants struggled to convey intent, interpret AI outputs, and manage multiple views while maintaining situational awareness. They showed diverse preferences for accessibility features, ex...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:271","title":"Improving Zero-Shot Adversarial Robustness in Vision-Language Models by Closed-form Alignment of Adversarial Path Simplices","url":"https://seed.bytedance.com/en/research/improving-zero-shot-adversarial-robustness-in-vision-language-models-by-closed-form-alignment-of-adversarial-path-simplices","published":"2025-06-15","authors":["Junhao Dong","Piotr Koniusz","Yifei Zhang","Hao Zhu","Weiming Liu","Xinghua Qu","Yew-Soon Ong"],"abstract":"Vision-Language Models (VLMs) such as CLIP excel at zero-shot classification due to large-scale pre-training but are vulnerable to adversarial examples. Adversarial fine-tuning robustifies zero-shot models by aligning prediction scores of individual adversaries with their clean counterparts, which typically overlooks intermediate adversarial samples along the adversarial trajectory crossing the decision boundary. 
Such intermediate adversaries and their vicinity produce informative representations capturing the decision boundary in detail. They can be improved by sampling adversarial candidates from simplices formed by joining two consecutive vertices on the adversarial trajectory and their clean counterpart. However, sampling simplices for adversaries is very costly. To train robust VLM, we overcome these limitations by Taylor expansion and formulating an upper-bound of alignment loss th...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Deep Learning","Speech","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-swe-bench-illusion-when-state-of-the-art-llms-remember-instead-of-reason","title":"The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason","url":"https://www.microsoft.com/en-us/research/publication/the-swe-bench-illusion-when-state-of-the-art-llms-remember-instead-of-reason/","published":"2025-06-13","authors":["Shanchao Liang","Spandan Garg","Roshanak Zilouchian Moghaddam"],"abstract":"As large language models (LLMs) become increasingly capable and widely adopted, benchmarks play a central role in assessing their practical utility. For example, SWE-Bench Verified has emerged as a critical benchmark for evaluating LLMs'software engineering abilities, particularly their aptitude for resolving real-world GitHub issues. 
Recent LLMs show impressive performance on SWE-Bench, leading to optimism about their capacity for complex coding tasks. However, current evaluation protocols may overstate these models' true capabilities. It is crucial to distinguish LLMs' generalizable problem-solving ability and other learned artifacts. In this work, we introduce two diagnostic tasks: file path identification from issue descriptions alone and ground truth function reproduction with only the current file context and issue description to probe models' underlying knowledge. We present empirica...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:fa29cb6d1d8e18d0","title":"MiniMax-M1","url":"https://huggingface.co/MiniMaxAI/MiniMax-M1-80k/blob/main/MiniMax_M1_tech_report.pdf","published":"2025-06-13","authors":["MiniMax"],"abstract":"","companies":["MiniMax"],"matched_orgs":["MiniMax"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_repository_scan"],"source":"official_repository_scan","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["MiniMax"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace repo MiniMaxAI/MiniMax-M1-80k"}},{"id":"apple:bs1qxlfpj4ramkasdmyj27dv","title":"Discriminating Form and Meaning in 
Multilingual Models with Minimal-Pair ABX Tasks","url":"https://machinelearning.apple.com/research/discriminating-form","published":"2025-06-13","authors":["Maureen de Seyssel","Jie Chi","Skyler Seto","Maartje ter Hoeve","Masha Fedzechkina","Natalie Schluter"],"abstract":"We introduce a set of training-free ABX-style discrimination tasks to evaluate how multilingual language models represent language identity (form) and semantic content (meaning). Inspired from speech processing, these zero-shot tasks measure whether minimal differences in representation can be reliably detected. This offers a flexible and interpretable alternative to probing. Applied to XLM-R (Conneau et al, 2020) across pretraining checkpoints...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4417523215","title":"Research on E-Commerce Long-Tail Product Recommendation Mechanism Based on Large-Scale Language Models","url":"https://doi.org/10.1145/3766671.3766843","published":"2025-06-13","authors":["Q. Lu","Haotian Lyu","Jiayun Zheng","Yang Wang","Li Zhang","Chengrui Zhou"],"abstract":"As e-commerce platforms continue to extend their product catalogs, ensuring accurate recommendation of long-tail items has become a major objective. This is crucial because, in this way, user experience and platform revenue can be significantly enhanced. A common struggle for these systems is the long-tail problem, where the full breadth of data drives the high level of sparsity and the cold start occurs. 
Consequently, traditional recommendation algorithms have many issues when recommending long-tail items due to lack of available data. Our work reflects a long-tail product recommendation mechanism that represents a combination of the product text descriptions and the user behavior sequences based on a large-scale language model (LLM) as the first proposal of this paper. This mechanism starts with the application of a pre-trained LLM to convert multimodal texts, e.g., product titles, det...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3766671.3766843","openalex_id":"https://openalex.org/W4417523215","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (United States)","Ann Arbor Center for Independent Living","Brown University","Columbia University","Nagoya University","University of Michigan","University of Southern California"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8134999871253967},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.6233000159263611},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5853000283241272},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.5835000276565552},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.5699999928474426},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5429999828338623},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.4984000027179718},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.4957999885082245}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4411374687","title":"Bridging Academia and Industry: Leveraging Generative AI in a Software Engineering Course for Practical Industry Experiences","url":"https://doi.org/10.1145/3724363.3729036","published":"2025-06-13","authors":["Daniel Mejia","Ernest Holmes","Jenn Marroquin","Jamie Gorson Benario"],"abstract":"The rapid adoption of generative AI across the tech industry demands a corresponding evolution in educational practices. By proactively incorporating generative AI, educational institutions can ensure their programs remain relevant and continue to provide students with the skills necessary for career success. This work presents an intro Software Engineering course, Software Development Studio (SDS), designed and implemented by Google in collaboration with faculty, to ensure students acquire industry-relevant skills. The course focuses on integrating generative AI tools into software engineering practices, mirroring the evolving methodologies used by professionals in the field. The curriculum emphasizes practical, real-world projects, providing early undergraduate computer science students hands-on experience using generative AI tools. 
Data collected during the Spring 2024 semester from s...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3724363.3729036","openalex_id":"https://openalex.org/W4411374687","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","The University of Texas at El Paso"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.8462406396865845},{"id":"https://openalex.org/C2777552389","display_name":"Course (navigation)","score":0.655984103679657},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.643977165222168},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6431697607040405},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.5127192735671997},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.43230491876602173},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34551942348480225},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.2622758746147156}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reveal-self-evolving-code-agents-via-reliable-self-verification","title":"ReVeal: Self-Evolving Code Agents via Reliable Self-Verification","url":"https://www.microsoft.com/en-us/research/publication/reveal-self-evolving-code-agents-via-reliable-self-verification/","published":"2025-06-12","authors":["Yiyang Jin","Kunzhao Xu","Hang Li","Xueting Han","Yanmin Zhou","Cheng Li","Jing Bai"],"abstract":"Reinforcement learning with 
verifiable rewards (RLVR) has advanced the reasoning capabilities of large language models. However, existing methods rely solely on outcome rewards, without explicitly optimizing verification or leveraging reliable signals from realistic environments, leading to unreliable self-verification and limited test-time scaling. To address this, we widen the verification-generation asymmetry by explicitly optimizing self-verification, making it a reliable driver of deeper test-time scaling. We introduce ReVeal, a multi-turn reinforcement learning framework that evolves code generation through self-verification and tool-based evaluation. ReVeal structures long-horizon reasoning as iterative generation-verification turns and incorporates TAPO for turn-level credit assignment, fostering the co-evolution of code and test generation. At inference, this strengthened self-v...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/implicit-language-models-are-rnns-balancing-parallelization-and-expressivity","title":"Implicit Language Models are RNNs: Balancing Parallelization and Expressivity","url":"https://www.microsoft.com/en-us/research/publication/implicit-language-models-are-rnns-balancing-parallelization-and-expressivity/","published":"2025-06-12","authors":["Mark Schöne","Babak Rahmani","Heiner 
Kremer","Fabian Falck","Hitesh Ballani","Jannes Gladrow"],"abstract":"State-space models (SSMs) and transformers dominate the language modeling landscape. However, they are constrained to a lower computational complexity than classical recurrent neural networks (RNNs), limiting their expressivity. In contrast, RNNs lack parallelization during training, raising fundamental questions about the trade off between parallelization and expressivity. We propose implicit SSMs, which iterate a transformation until convergence to a fixed point. Theoretically, we show that implicit SSMs implement the non-linear state-transitions of RNNs. Empirically, we find that only approximate fixed-point convergence suffices, enabling the design of a scalable training curriculum that largely retains parallelization, with full convergence required only for a small subset of tokens. Our approach demonstrates superior state-tracking capabilities on regular languages, surpassing trans...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","State-space models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:1374","title":"SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding","url":"https://seed.bytedance.com/en/research/swiftspec-ultra-low-latency-llm-decoding-by-scaling-asynchronous-speculative-decoding","published":"2025-06-12","authors":["Ziyi Zhang","Ziheng Jiang","Chengquan Jiang","Menghan Yu","Size 
Zheng","Haibin Lin","Henry Hoffmann","Xin Liu"],"abstract":"Low-latency decoding for large language models (LLMs) is crucial for applications like chatbots and code assistants, yet generating long outputs remains slow in single-query settings. Prior work on speculative decoding (which combines a small draft model with a larger target model) and tensor parallelism has each accelerated decoding. However, conventional approaches fail to apply both simultaneously due to imbalanced compute requirements (between draft and target models), KV-cache inconsistencies, and communication overheads under small-batch tensor-parallelism. This paper introduces SwiftSpec, a system that targets ultra-low latency for LLM decoding. SwiftSpec redesigns the speculative decoding pipeline in an asynchronous and disaggregated manner, so that each component can be scaled flexibly and remove draft overhead from the critical path. To realize this design, SwiftSpec proposes p...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Distributed, Parallel, and Cluster Computing","Infrastructures","arXiv","LLM"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4411466105","title":"Research on Multi-Modal Retrieval System of E-Commerce Platform Based on Pre-Training Model","url":"https://doi.org/10.70711/aitr.v2i9.6879","published":"2025-06-12","authors":["Bingbing Zhang","Yi Han","Xiaofei Han"],"abstract":"In this paper, a multi-modal retrieval system for e-commerce platform is proposed, which integrates three advanced pre-training models: BLIP, CLIP and CLIP 
Interrogator. The system solves the challenge of traditional keyword-based product search by realizing more accurate and efficient graphic matching. We trained and evaluated our approach using 413,000 image-text pairs from the Google Conceptual Captions dataset. Our method introduces a novel feature fusion mechanism and combines the advantages of several pre-trained models to realize comprehensive visual semantic understanding. The system shows strong performance in daily business scenes and complex artistic product description. Experimental results show that our proposed method can effectively generate detailed and context-aware descriptions and accurately match user queries and product pictures. The adaptability and semantic unders...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.70711/aitr.v2i9.6879","openalex_id":"https://openalex.org/W4411466105","cited_by_count":23,"quality_score":68,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Meta (United States)","Shanghai International Studies University","Shanghai University of International Business and Economics","Xiamen University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8250874876976013},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6529449224472046},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5252234935760498},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5189099907875061},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.4747213125228882},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.4434705078601837},{"id":"https://openalex.org/C90673727","display_name":"Product 
(mathematics)","score":0.43741321563720703},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4159584045410156}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":23}},{"id":"bytedance-seed:836","title":"PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier","url":"https://seed.bytedance.com/en/research/pag-multi-turn-reinforced-llm-self-correction-with-policy-as-generative-verifier","published":"2025-06-12","authors":["Yuhua Jiang","Yuwen Xiong","Yufeng Yuan","Chao Xin","Wenyuan Xu","Yu Yue","Qianchuan Zhao","Lin Yan"],"abstract":"Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks, yet they still struggle to reliably verify the correctness of their own outputs. Existing solutions to this verification challenge often depend on separate verifier models or require multi-stage self-correction training pipelines, which limit scalability. In this paper, we propose Policy as Generative Verifier (PAG), a simple and effective framework that empowers LLMs to self-correct by alternating between policy and verifier roles within a unified multi-turn reinforcement learning (RL) paradigm. Distinct from prior approaches that always generate a second attempt regardless of model confidence, PAG introduces a selective revision mechanism: the model revises its answer only when its own generative verification step detects an error. 
This verify-then-revise workflow not only alleviates mode...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computation and Language","LLM","NeurIPS 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:288","title":"Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts","url":"https://seed.bytedance.com/en/research/expert-race-a-flexible-routing-strategy-for-scaling-diffusion-transformer-with-mixture-of-experts","published":"2025-06-12","authors":["Yike Yuan","Ziyu Wang","Zihao Huang","Defa Zhu","Xun Zhou","Jingyi Yu","Qiyang Min"],"abstract":"Diffusion models have emerged as mainstream framework in visual generation. Building upon this success, the integration of Mixture of Experts (MoE) methods has shown promise in enhancing model scalability and performance. In this paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy, Expert Race. By allowing tokens and experts to compete together and select the top candidates, the model learns to dynamically assign experts to critical tokens. Additionally, we propose per-layer regularization to address challenges in shallow layer learning, and router similarity loss to prevent mode collapse, ensuring better expert utilization. Extensive experiments on ImageNet validate the effectiveness of our approach, showcasing significant performance gains while promising scaling properties. 
External paper link: https://arxiv.org/pdf/2503.16057","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","LLM","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:265","title":"Elucidating the Design Space of Multimodal Protein Language Models","url":"https://seed.bytedance.com/en/research/elucidating-the-design-space-of-multimodal-protein-language-models","published":"2025-06-12","authors":["Cheng-Yen Hsieh","Xinyou Wang","Daiheng Zhang","Dongyu Xue","Fei Ye","Shujian Huang","Zaixiang Zheng","Quanquan Gu"],"abstract":"Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes substantial loss of fidelity about fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. 
The effective design methods dramatica...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["AI for Science","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4411251200","title":"Using tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics","url":"https://doi.org/10.1098/rstb.2024.0280","published":"2025-06-12","authors":["Ben Williams","Bart van Merriënboer","Vincent Dumoulin","Jenny Hamer","Abram B. Fleishman","Matthew McKown","Jill E. Munger","Aaron N. Rice","Ashlee Lillis","Clemency White","Catherine Hobbs","Tries B. Razak"],"abstract":"Machine learning has the potential to revolutionize passive acoustic monitoring (PAM) for ecological assessments. However, high annotation and computing costs limit the field's adoption. Generalizable pretrained networks can overcome these costs, but high-quality pretraining requires vast annotated libraries, limiting their current development to data-rich bird taxa. Here, we identify the optimum pretraining strategy for data-deficient domains, using tropical reefs as a representative case study. We assembled ReefSet, an annotated library of 57 000 reef sounds taken across 16 datasets, though still modest in scale compared to annotated bird libraries. 
We performed multiple pretraining experiments and found that pretraining on a library of bird audio 50 times the size of ReefSet provides notably superior generalizability on held-out reef datasets, with a mean area under the receiver opera...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1098/rstb.2024.0280","openalex_id":"https://openalex.org/W4411251200","cited_by_count":16,"quality_score":53,"matched_keywords":[],"author_affiliations":["Cornell University","Department of Agriculture, Land Reform and Rural Development","Google (Canada)","Google (United States)","Google DeepMind (United Kingdom)","IPB University","Interacoustics (Denmark)","Protein Metrics (United States)","University College London","University of Bristol","University of Exeter","University of New Hampshire at Manchester","Zoological Society of London"],"concepts":[{"id":"https://openalex.org/C34951282","display_name":"Bioacoustics","score":0.8392113447189331},{"id":"https://openalex.org/C77044568","display_name":"Reef","score":0.6462640166282654},{"id":"https://openalex.org/C156602925","display_name":"Tropical marine climate","score":0.49816083908081055},{"id":"https://openalex.org/C505870484","display_name":"Fishery","score":0.49657946825027466},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.4127287268638611},{"id":"https://openalex.org/C111368507","display_name":"Oceanography","score":0.4086226224899292},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.37645745277404785},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.30978745222091675}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":16}},{"id":"openalex:W4411232425","title":"Unsupervised Pre-Training With Language-Vision Prompts for Low-Data Instance Segmentation","url":"https://doi.org/10.1109/tpami.2025.3579469","published":"2025-06-12","authors":["Dingwen Zhang","Hao Li","Diqi He","Nian Liu","Lechao Cheng","Jingdong Wang","Junwei Han"],"abstract":"In recent times, following the paradigm of DETR (DEtection TRansformer), query-based end-to-end instance segmentation (QEIS) methods have exhibited superior performance compared to CNN-based models, particularly when trained on large-scale datasets. Nevertheless, the effectiveness of these QEIS methods diminishes significantly when confronted with limited training data. This limitation arises from their reliance on substantial data volumes to effectively train the pivotal queries/kernels that are essential for acquiring localization and shape priors. To address this problem, we propose a novel method for unsupervised pre-training in low-data regimes. 
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts (UPLVP), which improves QEIS models' instance segmentation by bringing language-vision prompts to quer...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3579469","openalex_id":"https://openalex.org/W4411232425","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Baidu (China)","Chongqing University of Posts and Telecommunications","Hefei University of Technology","Inception Institute of Artificial Intelligence","Northwestern Polytechnical University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7514216303825378},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7330580353736877},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5702874064445496},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.5130432844161987},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4947482943534851},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.46668797731399536},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4579455256462097},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.44033530354499817}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"openalex:W4411236346","title":"Advancing medical education in cervical cancer control with large language models for multiple-choice question 
generation","url":"https://doi.org/10.1080/0142159x.2025.2513419","published":"2025-06-12","authors":["Mingyang Chen","Jiayi Ma","Xiaoli Cui","Qianling Dai","Haiyan Hu","Yijin Wu","Sulaiya Husaiyin","Aiyuan Wu","You‐Lin Qiao"],"abstract":"OBJECTIVE: To explore the feasibility of using large language models (LLMs) to generate multiple-choice questions (MCQs) for cervical cancer control education and compare them with those created by clinicians. METHODS: GPT-4o and Baichuan4 generated 40 MCQs each with iteratively refined prompts. Clinicians generated 40 MCQs for comparison. 120 MCQs were evaluated by 12 experts across five dimensions (correctness, clarity and specificity, cognitive level, clinical relevance, explainability) using a 5-point Likert scale. Difficulty and discriminatory power were tested by practitioners. Participants were asked to identify the source of each MCQ. RESULTS: Automated MCQs were similar to clinician-generated ones in most dimensions. However, clinician-generated MCQs had a higher cognitive level (4.00±1.08) than those from GPT-4o (3.68±1.07) and Baichuan4 (3.7±1.13). 
Testing with 312 practitione...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1080/0142159x.2025.2513419","openalex_id":"https://openalex.org/W4411236346","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Chengdu Women's and Children's Central Hospital","China Medical University","Chinese Academy of Medical Sciences & Peking Union Medical College","Jiangnan University","Liaoning Cancer Hospital & Institute","Peking Union Medical College Hospital","People's Hospital of Xinjiang Uygur Autonomous Region","Shenzhen Maternity and Child Healthcare Hospital","Southern Medical University","Tencent (China)","University of Electronic Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C2778220009","display_name":"Cervical cancer","score":0.5773515701293945},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.49662405252456665},{"id":"https://openalex.org/C509550671","display_name":"Medical education","score":0.42992621660232544},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.42120838165283203},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3184727430343628},{"id":"https://openalex.org/C121608353","display_name":"Cancer","score":0.274011492729187},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.1664992868900299},{"id":"https://openalex.org/C126322002","display_name":"Internal medicine","score":0.14528095722198486}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411226903","title":"CSLAN: Cross-Species Latent Alignment Network for Trauma-Related Cell-Type 
Classification","url":"https://doi.org/10.1101/2025.06.09.655966","published":"2025-06-12","authors":["Rui Wu","Alan Shi","Yushi Liu","Alexandre Duprey","Ting Chen","Rui Li","Shuang Cao"],"abstract":"Abstract Understanding cellular identity and function across species is fundamental to deciphering conserved biological principles and translating insights from model organisms to human health. However, integrating single-cell transcriptomic data, particularly from resource-limited human studies, with comprehensive datasets from model organisms like mice remains a significant challenge due to evolutionary divergence and technical variability. Here, we introduce the Cross-Species Latent Alignment Network (CSLAN), a transfer learning framework that effectively bridges this gap for robust cell type classification. CSLAN employs a novel strategy involving focused feature selection on source (mouse) data followed by pretraining an encoder-decoder architecture. Crucially, for transfer to human data, only the encoder is fine-tuned while the latent space processor and the decoder, which encapsul...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.06.09.655966","openalex_id":"https://openalex.org/W4411226903","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Apple (United States)","Bentley University","Eli Lilly (United States)","Regenxbio (United States)","University of Cambridge","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C2777299769","display_name":"Type (biology)","score":0.5371300578117371},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.37849298119544983},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3583243489265442},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.333121120929718},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.19965553283691406},{"id":"https://openalex.org/C18903297","display_name":"Ecology","score":0.064541757106781}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmmg-a-massive-multidisciplinary-multi-tier-generation-benchmark-for-text-to-image-reasoning","title":"MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning","url":"https://www.microsoft.com/en-us/research/publication/mmmg-a-massive-multidisciplinary-multi-tier-generation-benchmark-for-text-to-image-reasoning/","published":"2025-06-11","authors":["Yuxuan Luo","Yuhui Yuan","Junwen Chen","Haonan Cai","Ziyi Yue","Yuwei Yang","Fatima Zohra Daha","Ji Li","Zhouhui Lian"],"abstract":"In this paper, we introduce knowledge image generation as a new task, alongside the Massive Multi-Discipline Multi-Tier Knowledge-Image Generation Benchmark (MMMG) to probe the reasoning capability of image generation models. Knowledge images have been central to human civilization and to the mechanisms of human learning -- a fact underscored by dual-coding theory and the picture-superiority effect. Generating such images is challenging, demanding multimodal reasoning that fuses world knowledge with pixel-level grounding into clear explanatory visuals. To enable comprehensive evaluation, MMMG offers 4,456 expert-validated (knowledge) image-prompt pairs spanning 10 disciplines, 6 educational levels, and diverse knowledge formats such as charts, diagrams, and mind maps. To eliminate confounding complexity during evaluation, we adopt a unified Knowledge Graph (KG) representation. 
Each KG ex...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","Computer Vision and Pattern Recognition","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:258","title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","url":"https://seed.bytedance.com/en/research/seedance-1-0-exploring-the-boundaries-of-video-generation-models","published":"2025-06-11","authors":["Seed Vision Team"],"abstract":"Notable advances in diffusion modeling have propelled rapid improvements in video generation, yet current foundational model still confront critical challenges in synergistically balancing prompt following, motion plausibility, and visual quality. 
In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precision and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient pre-training paradigm that enables multiple features or functions such as interleaved multimodal positional encoding, native multi-shot generation capacity, and multi-task modeling; (iii) carefully-designed post-training optimization leveraging fine-grained supervised fine-tuning, video-specific RLHF with multi-d...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computer Vision","Vision","arXiv","efficient","distillation"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-replication-to-redesign-exploring-pairwise-comparisons-for-llm-based-peer-review","title":"From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review","url":"https://www.microsoft.com/en-us/research/publication/from-replication-to-redesign-exploring-pairwise-comparisons-for-llm-based-peer-review/","published":"2025-06-11","authors":["Yaohui Zhang","Haijing Zhang","Wenlong Ji","Tianyu Hua","Nick Haber","Hancheng Cao","Weixin Liang"],"abstract":"The advent of large language models (LLMs) offers unprecedented opportunities to reimagine peer review beyond the constraints of traditional workflows. 
Despite these opportunities, prior efforts have largely focused on replicating traditional review workflows with LLMs serving as direct substitutes for human reviewers, while limited attention has been given to exploring new paradigms that fundamentally rethink how LLMs can participate in the academic review process. In this paper, we introduce and explore a novel mechanism that employs LLM agents to perform pairwise comparisons among manuscripts instead of individual scoring. By aggregating outcomes from substantial pairwise evaluations, this approach enables a more accurate and robust measure of relative manuscript quality. Our experiments demonstrate that this comparative approach significantly outperforms traditional rating-based meth...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414197657","title":"Deciding the Path: Leveraging Multi-Agent Systems for Solving Complex Tasks","url":"https://doi.org/10.1109/cvprw67362.2025.00405","published":"2025-06-11","authors":["Iman Abbasnejad","Xuefeng Liu","Atunu Roy"],"abstract":"We present a multi-agent framework that enhances the capabilities of LLMs through intelligent task distribution and resource optimization for solving complex problems. 
The framework uses a dynamic routing mechanism that automatically delegates queries to specialized agents, complemented by an efficient tool selection system that reduces computational complexity. This autonomous architecture eliminates the need for human intervention while maintaining high performance across diverse tasks. Through comprehensive empirical evaluation on three challenging benchmarks, our multi-agent framework outperforms existing state-of-the-art LLMs and even specialized systems. Our method achieves 90.29% accuracy on Math 401 (surpassing MathViz-E's 89.53%), 91.3% pass@1 on MBPP (exceeding QualityFlow's 88.53%), and obtains state-of-the-art valid efficiency and execution scores of 56.28% and 54.39% respect...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00405","openalex_id":"https://openalex.org/W4414197657","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM","efficient","agent","multi-agent"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7628999948501587},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6840999722480774},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5078999996185303},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41999998688697815},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39149999618530273},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.3815000057220459},{"id":"https://openalex.org/C74172769","display_name":"Routing (electronic design 
automation)","score":0.3626999855041504},{"id":"https://openalex.org/C29202148","display_name":"Resource allocation","score":0.3504999876022339}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:vdezqs2zudx8d9daotphvccz","title":"The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity","url":"https://machinelearning.apple.com/research/illusion-of-thinking","published":"2025-06-11","authors":["Parshin Shojaee","Iman Mirzadeh","Keivan Alizadeh","Maxwell Horton","Samy Bengio","Mehrdad Farajtabar"],"abstract":"Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4414197839","title":"Scaling On-Device GPU Inference for Large Generative Models","url":"https://doi.org/10.1109/cvprw67362.2025.00631","published":"2025-06-11","authors":["Jiuqiang Tang","Raman Sarokin","Ekaterina Ignasheva","Grant J. 
Jensen","Lin Chen","Ju Hyun Lee","Andrei Kulik","Matthias Grundmann"],"abstract":"Driven by the advancements in generative AI, large machine learning models have revolutionized domains such as image processing, audio synthesis, and speech recognition. While server-based deployments remain the locus of peak performance, the imperative for on-device inference, necessitated by privacy and efficiency considerations, persists. Recognizing GPUs as the on-device ML accelerator with the widest reach, we present ML Drift-an optimized framework that extends the capabilities of state-of-the-art GPU-accelerated inference engines. ML Drift enables on-device execution of generative AI workloads which contain 10 to 100x more parameters than existing on-device gener-ative AI models. ML Drift addresses intricate engineering challenges associated with cross-GPU API development, and ensures broad compatibility across mobile and desk-top/laptop platforms, thereby facilitating the deploym...","companies":["Google/DeepMind","Meta/FAIR"],"matched_orgs":["Google/DeepMind","Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00631","openalex_id":"https://openalex.org/W4414197839","cited_by_count":2,"quality_score":51,"matched_keywords":[],"author_affiliations":["BC Platforms (Finland)","Google (United States)","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.8235999941825867},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7975000143051147},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6381000280380249},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5364000201225281},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4602000117301941},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4447000026702881},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.414000004529953},{"id":"https://openalex.org/C46743427","display_name":"Inference engine","score":0.4050999879837036}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414198298","title":"Talk2Traffic: Interactive and Editable Traffic Scenario Generation for Autonomous Driving with Multimodal Large Language Model","url":"https://doi.org/10.1109/cvprw67362.2025.00364","published":"2025-06-11","authors":["Zihao Sheng","Zilin Huang","Yansong Qu","Yue Leng","Sikai Chen"],"abstract":"Deploying autonomous vehicles (AVs) requires testing in diverse and challenging scenarios to ensure safety and reliability, yet collecting real-world data remains prohibitively expensive. While simulation-based approaches offer costeffective alternatives, most existing methods lack sufficient support for intuitive, interactive editing of generated scenarios. This paper presents Talk2Traffic, a novel framework that leverages multimodal large language models (MLLMs) to enable interactive and editable traffic scenario generation. Talk2Traffic allows human users to generate various traffic scenarios through multimodal inputs (text, speech, and sketches). Our approach first employs an MLLM-based interpreter to extract structured representations from these inputs. 
These representations are then translated into executable Scenic code using a retrieval-augmented generation mechanism to reduce ha...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00364","openalex_id":"https://openalex.org/W4414198298","cited_by_count":1,"quality_score":46,"matched_keywords":["language model","retrieval"],"author_affiliations":["Google (United States)","Purdue University West Lafayette","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8223999738693237},{"id":"https://openalex.org/C160145156","display_name":"Executable","score":0.7972999811172485},{"id":"https://openalex.org/C122783720","display_name":"Interpreter","score":0.5074999928474426},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4796999990940094},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4738999903202057},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.4724000096321106},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.45590001344680786},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3862000107765198}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414198953","title":"Generative AI for Film Creation: A Survey of Recent Advances","url":"https://doi.org/10.1109/cvprw67362.2025.00623","published":"2025-06-11","authors":["Ruihan Zhang","Borou Yu","Jiajian Min","Yetong Xin","Zheng Wei","Juncheng Nemo Shi","Mingzhen Huang","Xianghao Kong","Nix Liu Xin","Shanshan Jiang","Praagya Bahuguna","Mark 
Chan"],"abstract":"Generative AI (GenAI) is transforming filmmaking, equipping artists with tools like text-to-image and image-to-video diffusion, neural radiance fields, avatar generation, and 3D synthesis. This paper examines the adoption of these technologies in filmmaking, analyzing workflows from recent AI-driven films to understand how GenAI contributes to character creation, aesthetic styling, and narration. We explore key strategies for maintaining character consistency, achieving stylistic coherence, and ensuring motion continuity. Additionally, we highlight emerging trends such as the growing use of 3D generation and the integration of real footage with AI-generated elements. Beyond technical advancements, we examine how GenAI is enabling new artistic expressions, from generating hard-to-shoot footage to dreamlike diffusion-based morphing effects, abstract visuals, and unworldly objects. We also....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00623","openalex_id":"https://openalex.org/W4414198953","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Buffalo State University","California Southern University","Communication University of China","DryScrub (United States)","Google (United States)","Harvard University Press","Hong Kong University of Science and Technology","Moscow Institute of Thermal Technology","New York University","Oncoscience (Germany)","Pratt Institute","REALITY Publishing (United States)","Universidad del Istmo","University at Buffalo, State University of New York","University of California, Santa Barbara","University of Southampton","University of Southern 
California"],"concepts":[{"id":"https://openalex.org/C50637493","display_name":"Morphing","score":0.6951000094413757},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6455000042915344},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6348999738693237},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5953999757766724},{"id":"https://openalex.org/C2777365542","display_name":"Avatar","score":0.5863999724388123},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5012000203132629},{"id":"https://openalex.org/C2780861071","display_name":"Character (mathematics)","score":0.44589999318122864},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.44190001487731934}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4414197127","title":"Exemplar Masking for Multimodal Incremental Learning","url":"https://doi.org/10.1109/cvprw67362.2025.00277","published":"2025-06-11","authors":["Yi-Lun Lee","Chen-Yu Lee","Wei-Chen Chiu","Yi‐Hsuan Tsai"],"abstract":"Multimodal incremental learning needs to digest the information from multiple modalities while concurrently learning new knowledge without forgetting the previously learned information. There are numerous challenges for this task, mainly including the larger storage size of multimodal data in exemplar-based methods and the computational requirement of finetuning on huge multimodal models. In this paper, we leverage the parameter-efficient tuning scheme to reduce the burden of fine-tuning and propose the exemplar masking framework to efficiently replay old knowledge. 
Specifically, the non-important tokens are masked based on the attention weights and the correlation across different modalities, significantly reducing the storage size of an exemplar and consequently saving more exemplars under the same memory buffer. Moreover, we design a multimodal data augmentation technique to diversify...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00277","openalex_id":"https://openalex.org/W4414197127","cited_by_count":1,"quality_score":46,"matched_keywords":["memory","efficient"],"author_affiliations":["ATM (Poland)","Google (United States)","National Yang Ming Chiao Tung University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8256000280380249},{"id":"https://openalex.org/C2777402240","display_name":"Masking (illustration)","score":0.7300000190734863},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.7181000113487244},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6223999857902527},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.60589998960495},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5368000268936157},{"id":"https://openalex.org/C2780735816","display_name":"Incremental learning","score":0.5163000226020813},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.48590001463890076}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414197407","title":"HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion 
Models","url":"https://doi.org/10.1109/cvprw67362.2025.00271","published":"2025-06-11","authors":["Xiaogang Peng","Yiming Xie","Zizhao Wu","Varun Jampani","Deqing Sun","Huaizu Jiang"],"abstract":"We address the problem of generating realistic 3D humanobject interactions (HOIs) driven by textual prompts. To this end, we take a modular design and decompose the complex task into simpler sub-tasks. We first develop a dualbranch diffusion model (DBDM) to generate both human and object motions conditioned on the input text, and encourage coherent motions by a cross-attention communication module between the human and object motion generation branches. We also develop an affordance prediction diffusion model (APDM) to predict the contacting area between the human and object during the interactions driven by the textual prompt. The APDM is independent of the results by the DBDM and thus can correct potential errors by the latter. Moreover, it stochastically generates the contacting points to diversify the generated motions. 
Finally, we incorporate the estimated contacting points into the...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00271","openalex_id":"https://openalex.org/W4414197407","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Capability Scotland","Google (United States)","Hangzhou Dianzi University","Northeastern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7462999820709229},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.7387999892234802},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.713699996471405},{"id":"https://openalex.org/C194995250","display_name":"Affordance","score":0.6786999702453613},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.6509000062942505},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.6079000234603882},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5830000042915344},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.550000011920929}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4411203106","title":"Stable Attribute Group Editing for Reliable Few-Shot Image Generation","url":"https://doi.org/10.1109/tcsvt.2025.3578670","published":"2025-06-11","authors":["Guanqi Ding","Xinzhe Han","Shuhui Wang","Xin Jin","Qingming Huang"],"abstract":"Few-shot image generation aims to generate data of an unseen category based on only a few samples. 
Apart from basic content generation, a bunch of downstream applications hopefully benefit from this task, such as low-data detection and few-shot classification. To achieve this goal, the generated images should guarantee category retention for classification beyond the visual quality and diversity. In our preliminary work, we present an “editing-based” framework, Attribute Group Editing (AGE), for reliable few-shot image generation, which largely improves the performance compared with existing methods that require re-training a GAN with limited data. Nevertheless, AGE’s performance on downstream classification is not as satisfactory as expected. Furthermore, existing generative models suffer from similar issues. This paper focuses on addressing the issue of universal class inconsistency in...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3578670","openalex_id":"https://openalex.org/W4411203106","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["China Aerospace Science and Technology Corporation","Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Computing Technology","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.6519170999526978},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6411994695663452},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.5071070194244385},{"id":"https://openalex.org/C2781311116","display_name":"Group (periodic table)","score":0.5032555460929871},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4822483956813812},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.47721606492996216},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.40751737356185913},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3571069538593292}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414198734","title":"How Good is my Video-LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs","url":"https://doi.org/10.1109/cvprw67362.2025.00349","published":"2025-06-11","authors":["Muhammad Uzair Khattak","Muhammad Ferjad Naeem","Jameel Hassan","Muzammal Naseer","Federico Tombari","Fahad Shahbaz Khan","Salman Khan"],"abstract":"Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks. These models have the potential to be deployed in real-world applications such as robotics, AI assistants, medical surgery, and autonomous vehicles. The widespread adoption of Video-LMMs in our daily lives underscores the importance of ensuring and evaluating their robust performance in mirroring human-like reasoning and interaction capabilities in complex, real-world contexts. However, existing benchmarks for Video-LMMs primarily focus on general video comprehension abilities and neglect assessing their reasoning capabilities over complex videos in the real-world context, and the robustness of these models through the lens of user prompts as text queries. 
In this paper, we present the Complex Video Reaso...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00349","openalex_id":"https://openalex.org/W4414198734","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)","Khalifa University of Science and Technology","Mohamed bin Zayed University of Artificial Intelligence"],"concepts":[{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.8198000192642212},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.8115000128746033},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7960000038146973},{"id":"https://openalex.org/C189645446","display_name":"Mirroring","score":0.6525999903678894},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5570999979972839},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.47699999809265137},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.33709999918937683},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.325300008058548}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414198523","title":"HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models","url":"https://doi.org/10.1109/cvprw67362.2025.00154","published":"2025-06-11","authors":["Erum Mushtaq","Zalan Fabian","Yavuz Faruk Bakman","Anil Ramakrishna","Mahdi Soltanolkotabi","A. 
Salman Avestimehr"],"abstract":"Assessing the reliability of Vision-Language Models (VLMs) is crucial in high-stakes applications. Uncertainty Estimation (UE) methods are widely used for this purpose. Most existing probability-based UE approaches rely on output probability distributions, aggregating token probabilities into a single uncertainty score using predefined functions. Another line of research leverages model hidden representations, training MLP-based models to predict uncertainty. However, these methods often fall short in capturing the complex semantic and visual relationships between tokens and struggle to identify biased probabilities influenced by language priors. Based on these observations, we propose HARMONY (Hidden Activation Representations and Model Output-aware uNcertaintY Estimation for Vision-Language Models), a transformer-based UE function that jointly leverages model hidden representations and...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00154","openalex_id":"https://openalex.org/W4414198523","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Southern California University for Professional Studies","University of Southern California"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7583000063896179},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.7031999826431274},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.5825999975204468},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5360999703407288},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.47999998927116394},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4641000032424927},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.4172999858856201},{"id":"https://openalex.org/C149441793","display_name":"Probability distribution","score":0.41499999165534973}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414197570","title":"D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition","url":"https://doi.org/10.1109/cvprw67362.2025.00160","published":"2025-06-11","authors":["Rupayan Mallick","Sibo Dong","Nataniel Ruiz","Sarah Adel Bargal"],"abstract":"Applications of diffusion models for visual tasks have been quite noteworthy. This paper targets making classification models more robust to occlusions for the task of object recognition by proposing a pipeline that utilizes a frozen diffusion model. Diffusion features have demonstrated success in image generation and image completion while understanding image context. Occlusion can be posed as an image completion problem by deeming the pixels of the occluder to be 'missing.' We hypothesize that such features can help hallucinate object visual features behind occluding objects, and hence we propose using them to enable models to become more occlusion robust. We design experiments to include input-based augmentations as well as feature-based augmentations. 
Input-based augmentations involve finetuning on images where the occluder pixels are inpainted, and feature-based augmentations involv...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw67362.2025.00160","openalex_id":"https://openalex.org/W4414197570","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Georgetown University","Google (United States)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.8535000085830688},{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.7802000045776367},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7766000032424927},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7300999760627747},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7229999899864197},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.5587000250816345},{"id":"https://openalex.org/C64876066","display_name":"Cognitive neuroscience of visual object recognition","score":0.4950000047683716},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.44940000772476196}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2506.10056","title":"Reward Models Enable Scalable Code Verification by Trading Accuracy for Throughput","url":"https://huggingface.co/papers/2506.10056","published":"2025-06-11","authors":["Gabriel Orlanski","Nicholas Roberts","Aws Albarghouthi","Frederic Sala"],"abstract":"The standard paradigm for solving coding tasks via large language models (LLMs) is to generate-then-rank 
programs, where the latter step uses a verifier in the ranking process. The growing consensus is that a comprehensive verifier (e.g., a full test suite) should be prioritized over an outcome reward model (ORM) whenever possible, with little consideration given to the trade-offs involved. We aim to challenge this assumption by systematically exploring the tradeoff between speed and accuracy. We find that ORMs play a crucial role in scaling verification through trading accuracy for speed, even when a comprehensive verifier is available. Their value becomes especially apparent when used in a generate-prune-then-rank approach, where a faster but less accurate verifier removes incorrect solutions prior to ranking -- leading to a system that is 11.65x faster while only being 8.33% less accu...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beast-efficient-tokenization-of-b-splines-encoded-action-sequences-for-imitation-learning","title":"BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning","url":"https://www.microsoft.com/en-us/research/publication/beast-efficient-tokenization-of-b-splines-encoded-action-sequences-for-imitation-learning/","published":"2025-06-10","authors":["Hongyi Zhou","Weiran Liao","Xi Huang","Yucheng Tang","Fabian Otto","Xiaogang Jia","Xinkai Jiang","Simon Hilber","Ge Li","Qian Wang","Ömer Erdinç Yagmurlu","Nils Blank"],"abstract":"We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact 
discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently ensures generating smooth trajectories without discontinuities between adjacent segments. We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a pretrained Vision-Language Model with an encoder-decoder architecture, demonstrating BEAST's compatibility...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","language model","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/enhancing-reasoning-capabilities-of-small-language-models-with-blueprints-and-prompt-template-search","title":"Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search","url":"https://www.microsoft.com/en-us/research/publication/enhancing-reasoning-capabilities-of-small-language-models-with-blueprints-and-prompt-template-search/","published":"2025-06-10","authors":["Dongge Han","Menglin Xia","Daniel Madrigal","Samuel 
Kessler","Ankur Mallick","Xuchao Zhang","Mirian Hipolito Garcia","Jin Xu","Victor Ruehle","Saravan Rajmohan"],"abstract":"Small language models (SLMs) offer promising and efficient alternatives to large language models (LLMs). However, SLMs' limited capacity restricts their reasoning capabilities and makes them sensitive to prompt variations. To address these challenges, we propose a novel framework that enhances SLM reasoning capabilities through LLM generated blueprints. The blueprints provide structured, high-level reasoning guides that help SLMs systematically tackle related problems. Furthermore, our framework integrates a prompt template search mechanism to mitigate the SLMs' sensitivity to prompt variations. Our framework demonstrates improved SLM performance across various tasks, including math (GSM8K), coding (MBPP), and logic reasoning (BBH). Our approach improves the reasoning capabilities of SLMs without increasing model size or requiring additional training, offering a lightweight and deploymen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vicrit-a-verifiable-reinforcement-learning-proxy-task-for-visual-perception-in-vlms","title":"ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in 
VLMs","url":"https://www.microsoft.com/en-us/research/publication/vicrit-a-verifiable-reinforcement-learning-proxy-task-for-visual-perception-in-vlms/","published":"2025-06-10","authors":["Xiyao Wang","Zhengyuan Yang","Chao Feng","Yongyuan Liang","Yuhang Zhou","Xiaoyu Liu","Ziyi Zang","Ming Li","Chung-Ching Lin","Kevin Lin","Linjie Li","Furong Huang"],"abstract":"Reinforcement learning (RL) has shown great effectiveness for fine-tuning large language models (LLMs) using tasks that are challenging yet easily verifiable, such as math reasoning or code generation. However, extending this success to visual perception in vision-language models (VLMs) has been impeded by the scarcity of vision-centric tasks that are simultaneously challenging and unambiguously verifiable. To this end, we introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Starting from a 200-word captions, we inject a single, subtle visual description error-altering a few words on objects, attributes, counts, or spatial relations-and task the model to pinpoint the corrupted span given the image and the modified caption. 
This formulation pr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:178","title":"Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation","url":"https://www.noahlab.com.hk/en/scientific_research/mem2ego-empowering-vision-language-models-with-global-to-ego-memory-for-long-horizon-embodied-navigation","published":"2025-06-10","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: CVPR 2025 Workshop Foundation Models Meet Embodied Agents. 
External paper link: https://arxiv.org/pdf/2502.14254","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Physical AI","CVPR 2025 Workshop Foundation Models Meet Embodied Agents","2025","memory"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"openalex:W4413147631","title":"DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos","url":"https://doi.org/10.1109/cvpr52734.2025.00193","published":"2025-06-10","authors":["Wenbo Hu","Xiangjun Gao","Xiaoyu Li","Sijie Zhao","Xiaodong Cun","Yong Zhang","Long Quan","Ying Shan"],"abstract":"Estimating video depth in open-world scenarios is challenging due to the diversity of videos in appearance, content motion, camera movement, and length. We present DepthCrafter, an innovative method for generating temporally consistent long depth sequences with intricate details for open-world videos, without requiring any supplementary information such as camera poses or optical flow. The generalization ability to open-world videos is achieved by training the video-to-depth model from a pretrained image-to-video diffusion model, through our meticulously designed three-stage training strategy. Our training approach enables the model to generate depth sequences with variable lengths at one time, up to 110 frames, and harvest both precise depth details and rich content diversity from realistic and synthetic datasets. 
We also propose an inference strategy that can process extremely long vid...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00193","openalex_id":"https://openalex.org/W4413147631","cited_by_count":31,"quality_score":67,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6101542711257935},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4549138844013214},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3270798921585083}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":31}},{"id":"openalex:W4413144929","title":"LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.01396","published":"2025-06-10","authors":["Shenghao Fu","Qize Yang","Qijie Mo","Junkai Yan","Xihan Wei","Jingke Meng","Xiaohua Xie","Wei‐Shi Zheng"],"abstract":"Recent open-vocabulary detectors achieve promising performance with abundant region-level annotated data. In this work, we show that an open-vocabulary detector co-training with a large language model by generating image-level detailed captions for each image can further improve performance. To achieve the goal, we first collect a dataset, GroundingCap-1M, wherein each image is accompanied by associated grounding labels and an image-level detailed caption. With this dataset, we finetune an open-vocabulary detector with training objectives including a standard grounding loss and a caption generation loss. 
We take advantage of a large language model to generate both region-level short captions for each region of interest and image-level long captions for the whole image. Under the supervision of the large language model, the resulting detector, LLMDet, outperforms the baseline by a clear m...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01396","openalex_id":"https://openalex.org/W4413144929","cited_by_count":17,"quality_score":58,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7686399221420288},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.6620064973831177},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5232369303703308},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4945966899394989},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4728931486606598},{"id":"https://openalex.org/C2984601542","display_name":"Vocabulary learning","score":0.41072535514831543},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.33410340547561646},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.2825375199317932}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":17}},{"id":"openalex:W4413147124","title":"Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.00847","published":"2025-06-10","authors":["Yuhao 
Dong","Zuyan Liu","Hailong Sun","Jingkang Yang","Weisheng Hu","Yongming Rao","Ziwei Liu"],"abstract":"Large Language Models (LLMs) demonstrate enhanced capabilities and reliability by reasoning more, evolving from Chain-of-Thought prompting to product-level solutions like OpenAI o1. Despite various efforts to improve LLM reasoning, high-quality long-chain reasoning data and optimized training pipelines still remain inadequately explored in vision-language tasks. In this paper, we present InsightV, an early effort to 1) scalably produce long and robust reasoning data for complex multi-modal tasks, and 2) an effective training pipeline to enhance the reasoning capabilities of multi-modal large language models (MLLMs). Specifically, to create long and structured reasoning data without human labor, we design a two-step pipeline with a progressive strategy to generate sufficiently long and diverse reasoning paths and a multi-granularity assessment method to ensure data quality. We observe tha...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00847","openalex_id":"https://openalex.org/W4413147124","cited_by_count":7,"quality_score":56,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["Nanyang Technological University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6791884303092957},{"id":"https://openalex.org/C2780878386","display_name":"Visual language","score":0.45322877168655396},{"id":"https://openalex.org/C199185054","display_name":"Chain (unit)","score":0.45063936710357666},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3989149034023285},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.38891899585723877},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.37203073501586914},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3660464882850647},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.2103341817855835}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4413146548","title":"ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.02325","published":"2025-06-10","authors":["Xubing Ye","Yukang Gan","Yixiao Ge","Xiaoping Zhang","Yansong Tang"],"abstract":"Large Vision Language Models (LVLMs) have achieved significant success across multi-modal tasks. However, the computational cost of processing long visual tokens can be prohibitively expensive on resource-limited devices. Previous methods have identified redundancy in visual tokens within the Large Language Model (LLM) decoder layers and have mitigated this by pruning tokens using a predefined or fixed ratio, thereby reducing computational over-head. Nonetheless, we observe that the impact of pruning ratio varies across different LLM layers and instances (image-prompt pairs). Therefore, it is essential to develop a layer-wise and instance-wise vision token pruning strategy to balance computational cost and model performance effectively. We propose ATP-LLaVA, a novel approach that adaptively determines instance-specific token pruning ratios for each LLM layer. 
Specifically, we introduce a...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02325","openalex_id":"https://openalex.org/W4413146548","cited_by_count":11,"quality_score":56,"matched_keywords":["LLM","language model"],"author_affiliations":["Tencent (China)","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.8136744499206543},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7130781412124634},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.6867562532424927},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4867284595966339},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4314520061016083},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4255959093570709},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.1765580177307129},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.06711426377296448}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4413147366","title":"MotionBench: Benchmarking and Improving Fine-Grained Video Motion Understanding for Vision Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.00791","published":"2025-06-10","authors":["Wenyi Hong","Yean Cheng","Zhuoyi Yang","Weihan Wang","Lefan Wang","Xiaotao Gu","Shiyu Huang","Yuxiao Dong","Jie Tang"],"abstract":"In recent years, vision language models (VLMs) have made significant advancements in video understanding. 
However, a crucial capability — fine-grained motion comprehension — remains under-explored in current benchmarks. To address this gap, we propose MotionBench, a comprehensive evaluation benchmark designed to assess the fine-grained motion comprehension of video understanding models. MotionBench evaluates models’ motion-level perception through six primary categories of motion-oriented question types and includes data collected from diverse sources, ensuring a broad representation of real-world video content. Experimental results reveal that existing VLMs perform poorly in understanding fine-grained motions. To enhance VLM’s ability to perceive fine-grained motion within a limited sequence length of LLM, we conduct extensive experiments reviewing VLM architectures optimized for video....","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00791","openalex_id":"https://openalex.org/W4413147366","cited_by_count":6,"quality_score":55,"matched_keywords":["LLM","efficient","compression"],"author_affiliations":["Tsinghua University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.7601521611213684},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7475346326828003},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5984874963760376},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5277916789054871},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4969013035297394},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.39713215827941895},{"id":"https://openalex.org/C162853370","display_name":"Marketing","score":0.0},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4413145624","title":"LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences","url":"https://doi.org/10.1109/cvpr52734.2025.00356","published":"2025-06-10","authors":["Hongyan Zhi","Peihao Chen","Junyan Li","Shuailei Ma","Xinyu Sun","Tianhang Xiang","Yinjie Lei","Mingkui Tan","Chuang Gan"],"abstract":"Research on 3D Vision-Language Models (3D-VLMs) is gaining increasing attention, which is crucial for developing embodied AI within 3D scenes, such as visual navigation and embodied question answering. Due to the high density of visual features, especially in large 3D scenes, accurately locating task-relevant visual information is challenging. Existing works attempt to segment all objects and consider their features as scene representations. However, these task-agnostic object features include much redundant information and missing details for the task-relevant area. To tackle these problems, we propose LSceneLLM, an adaptive framework that automatically identifies task-relevant areas by leveraging LLM’s visual preference for different tasks, followed by a plug-and-play scene magnifier module to capture fine-grained details in focused areas. 
Specifically, a dense token selector examines....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00356","openalex_id":"https://openalex.org/W4413145624","cited_by_count":8,"quality_score":53,"matched_keywords":["LLM","preference"],"author_affiliations":["Sichuan University","South China University of Technology","Tencent (China)","Universidad del Noreste","University of Massachusetts Amherst"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7512881755828857},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5257010459899902},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5101318955421448},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.4288368821144104},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3622868061065674}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4413147390","title":"JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration","url":"https://doi.org/10.1109/cvpr52734.2025.02084","published":"2025-06-10","authors":["Yunlong Lin","Zixu Lin","Haoyu Chen","Panwang Pan","Chenxin Li","Sixiang Chen","Kairun Wen","Yeying Jin","Wenbo Li","Xinghao Ding"],"abstract":"Vision-centric perception systems struggle with unpredictable and coupled weather degradations in the wild. Current solutions are often limited, as they either depend on specific degradation priors or suffer from significant domain gaps. 
To enable robust and autonomous operation in real-world conditions, we propose JarvisIR, a VLM-powered agent that leverages the VLM as a controller to manage multiple expert restoration models. To further enhance system robustness, reduce hallucinations, and improve generalizability in real-world adverse weather, JarvisIR employs a novel two-stage framework consisting of supervised fine-tuning and human feedback alignment. Specifically, to address the lack of paired data in real-world scenarios, the human feedback alignment enables the VLM to be fine-tuned effectively on large-scale real-world data in an unsupervised manner. To support the training and e...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02084","openalex_id":"https://openalex.org/W4413147390","cited_by_count":12,"quality_score":53,"matched_keywords":["agent"],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (Sweden)","Policy Innovation and Co-ordination Office","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.6580122113227844},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6241335868835449},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5936748385429382},{"id":"https://openalex.org/C106430172","display_name":"Image restoration","score":0.49524155259132385},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4512932300567627},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.43415695428848267},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.3240931034088135},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.224900484085083}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4413157571","title":"VoCo-LLaMA: Towards Vision Compression with Large Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.02777","published":"2025-06-10","authors":["Xubing Ye","Yukang Gan","Xiaoke Huang","Yixiao Ge","Yansong Tang"],"abstract":"Vision-Language Models (VLMs) have achieved remarkable success in various multi-modal tasks, but they are often bottlenecked by the limited context window and high computational cost of processing high-resolution image inputs and videos. Vision compression can alleviate this problem by reducing the vision token count. Previous approaches compress vision tokens with external modules and force LLMs to understand the compressed ones, leading to visual information loss. However, the LLMs’ understanding paradigm of vision tokens is not fully utilised in the compression learning process. We propose VoCo-LLaMA, the first approach to compress vision tokens using LLMs. By introducing Vision Compression tokens during the vision instruction tuning phase and leveraging attention distillation, our method distill how LLMs comprehend vision tokens into their processing of VoCo tokens. 
VoCo-LLaMA facili...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02777","openalex_id":"https://openalex.org/W4413157571","cited_by_count":7,"quality_score":52,"matched_keywords":["compression","distillation"],"author_affiliations":["Tencent (China)","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute","University of California, Santa Cruz"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7028902769088745},{"id":"https://openalex.org/C180016635","display_name":"Compression (physics)","score":0.4623931646347046},{"id":"https://openalex.org/C78548338","display_name":"Data compression","score":0.45918363332748413},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44092318415641785},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.416734516620636},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3394453823566437},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.0},{"id":"https://openalex.org/C159985019","display_name":"Composite material","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4413147584","title":"VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis","url":"https://doi.org/10.1109/cvpr52734.2025.01482","published":"2025-06-10","authors":["Enric Corona","Andrei Zanfir","Eduard Gabriel Băzăvan","Nikos Kolotouros","Thiemo Alldieck","Cristian Sminchisescu"],"abstract":"We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent 
generative diffusion models. Our method consists of 1) a stochastic human-to-3d-motion diffusion model, and 2) a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls. This supports the generation of high quality video of variable length, easily controllable through text or speech via high-level representations of human faces and bodies. In contrast to previous work, our method does not require training for each person, does not rely on face detection and cropping, generates the complete image (not just the face or the lips), and considers a broad spectrum of scenarios (e.g. visible torso or diverse subject identities) that are critical to correctly synthesize humans who communicate. We al...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01482","openalex_id":"https://openalex.org/W4413147584","cited_by_count":9,"quality_score":50,"matched_keywords":["personalization"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C2777365542","display_name":"Avatar","score":0.846877932548523},{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.7597191333770752},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7014868259429932},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5460562109947205},{"id":"https://openalex.org/C135641252","display_name":"Multimodal interaction","score":0.4225867986679077},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.42244040966033936},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.2500578761100769},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0473533570766449}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4413146177","title":"Recognition-Synergistic Scene Text Editing","url":"https://doi.org/10.1109/cvpr52734.2025.01223","published":"2025-06-10","authors":["Zhengyao Fang","Pengyuan Lyu","Jingjing Wu","Chengquan Zhang","Jun Yu","Guangming Lu","Wenjie Pei"],"abstract":"Scene text editing aims to modify text content within scene images while maintaining style consistency. Traditional methods achieve this by explicitly disentangling style and content from the source image and then fusing the style with the target content, while ensuring content consistency using a pre-trained recognition model. Despite notable progress, these methods suffer from complex pipelines, leading to suboptimal performance in complex scenarios. In this work, we introduce Recognition-Synergistic Scene Text Editing (RS-STE), a novel approach that fully exploits the intrinsic synergy of text recognition for editing. Our model seamlessly integrates text recognition with text editing within a unified framework, and leverages the recognition model’s ability to implicitly disentangle style and content while ensuring content consistency. 
Specifically, our approach employs a multi-modal p...","companies":["Tencent/Hunyuan","Baidu"],"matched_orgs":["Tencent/Hunyuan","Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01223","openalex_id":"https://openalex.org/W4413146177","cited_by_count":1,"quality_score":50,"matched_keywords":[],"author_affiliations":["Baidu (China)","Harbin Institute of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7355034351348877},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43842029571533203},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.39815962314605713},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36281752586364746},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.35364943742752075},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3410411477088928}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413145386","title":"Motion Prompting: Controlling Video Generation with Motion Trajectories","url":"https://doi.org/10.1109/cvpr52734.2025.00010","published":"2025-06-10","authors":["Daniel Geng","Charles Herrmann","Junhwa Hur","Forrester Cole","Serena Zhang","Tobias Pfaff","Tatiana López-Guevara","Yusuf Aytar","Michael Rubinstein","Chen Sun","Oliver Wang","Andrew Owens"],"abstract":"Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal 
compositions. To this end, we train a video generation model conditioned on spatiotemporally sparse or dense motion trajectories. In contrast to prior motion conditioning work, this flexible representation can encode any number of trajectories, object-specific or global scene motion, and temporally sparse motion; due to its flexibility we refer to this conditioning as motion prompts. While users may directly specify sparse trajectories, we also show how to translate high-level user requests into detailed, semi-dense motion prompts, a process we term motion prompt expansion. We demonstrate the versatility of our approach through various applications, in...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00010","openalex_id":"https://openalex.org/W4413145386","cited_by_count":13,"quality_score":50,"matched_keywords":[],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","University of Michigan–Ann Arbor"],"concepts":[{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.6712009906768799},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6519874334335327},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6236063241958618},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4789350926876068},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.4372750222682953},{"id":"https://openalex.org/C10161872","display_name":"Motion estimation","score":0.4276649057865143},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.36547547578811646},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.10789218544960022}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4413144741","title":"Cropper: Vision-Language Model for Image Cropping through In-Context Learning","url":"https://doi.org/10.1109/cvpr52734.2025.02793","published":"2025-06-10","authors":["Seung Hyun Lee","Ji‐Jun Jiang","Yiran Xu","Z. Li","Junjie Ke","Yinxiao Li","Junfeng He","Steven Hickson","Katie Datsenko","Sangpil Kim","Shuicheng Yan","Irfan Essa"],"abstract":"The goal of image cropping is to identify visually appealing crops in an image. Conventional methods are trained on specific datasets and fail to adapt to new requirements. Recent breakthroughs in large vision-language models (VLMs) enable visual in-context learning without explicit training. However, downstream tasks with VLMs remain under explored. In this paper, we propose an effective approach to leverage VLMs for image cropping. First, we propose an efficient prompt retrieval mechanism for image cropping to automate the selection of in-context examples. Second, we introduce an iterative refinement strategy to iteratively enhance the predicted crops. The proposed framework, we refer to as Cropper, is applicable to a wide range of cropping tasks, including free-form cropping, subject-aware cropping, and aspect ratio-aware cropping. 
Extensive experiments demonstrate that Cropper signif...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02793","openalex_id":"https://openalex.org/W4413144741","cited_by_count":1,"quality_score":50,"matched_keywords":["language model","retrieval","efficient"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C13558536","display_name":"Cropping","score":0.7523434162139893},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6633406281471252},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5915290117263794},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5510837435722351},{"id":"https://openalex.org/C183322885","display_name":"Context model","score":0.4455585181713104},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4320381283760071},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.422478586435318},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.10750561952590942}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413145886","title":"Active Data Curation Effectively Distills Large-Scale Multimodal Models","url":"https://doi.org/10.1109/cvpr52734.2025.01345","published":"2025-06-10","authors":["Vishaal Udandarao","Nikhil Parthasarathy","Muhammad Ferjad Naeem","Talfan Evans","Samuel Albanie","Federico Tombari","Yongqin Xian","Alessio Tonioni","Olivier J. 
Hénaff"],"abstract":"Knowledge distillation (KD) is the de facto standard for compressing large-scale multimodal models into smaller ones. Prior works have explored ever more complex KD strategies involving different objectives, teacher-ensembles, and weight inheritance. In this work, we explore an alternative, yet simple approach—active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-,data-and compute-configurations. Further, we find such an active curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and image-text retrieval tasks with upto 11% less inference FLO...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01345","openalex_id":"https://openalex.org/W4413145886","cited_by_count":1,"quality_score":50,"matched_keywords":["retrieval","efficient","distillation"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","TH Bingen University of Applied Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6701534986495972},{"id":"https://openalex.org/C91632574","display_name":"Data curation","score":0.6259573698043823},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5572813749313354},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.45037898421287537},{"id":"https://openalex.org/C67186912","display_name":"Data 
modeling","score":0.42480647563934326},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32133230566978455},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.20433369278907776},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.11638972163200378}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413157009","title":"Towards General Visual-Linguistic Face Forgery Detection","url":"https://doi.org/10.1109/cvpr52734.2025.01823","published":"2025-06-10","authors":["Ke Sun","Chen Shen","Taiping Yao","Ziyin Zhou","Jiayi Ji","Xiaoshuai Sun","Chia‐Wen Lin","Rongrong Ji"],"abstract":"Face manipulation techniques have achieved significant advances, presenting serious challenges to security and social trust. Recent works demonstrate that leveraging multimodal models can enhance the generalization and interpretability of face forgery detection. However, existing annotation approaches, whether through human labeling or direct Multimodal Large Language Model (MLLM) generation, often suffer from hallucination issues, leading to inaccurate text descriptions, especially for high-quality forgeries. To address this, we propose Face Forgery Text Generator (FFTG), a novel annotation pipeline that generates accurate text descriptions by leveraging forgery masks for initial region and type identification, followed by a comprehensive prompting strategy to guide MLLMs in reducing hallucination. 
We validate our approach through fine-tuning both CLIP with a three-branch training frame...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01823","openalex_id":"https://openalex.org/W4413157009","cited_by_count":8,"quality_score":49,"matched_keywords":["language model"],"author_affiliations":["National Tsing Hua University","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6874986290931702},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.6697874069213867},{"id":"https://openalex.org/C4641261","display_name":"Face detection","score":0.5382174849510193},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.484118789434433},{"id":"https://openalex.org/C31510193","display_name":"Facial recognition system","score":0.4465906322002411},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.42401671409606934},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3878959119319916},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.36114293336868286}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4413147815","title":"Number it: Temporal Grounding Videos like Flipping Manga","url":"https://doi.org/10.1109/cvpr52734.2025.01284","published":"2025-06-10","authors":["Yongliang Wu","Xinting Hu","Yuyang Sun","Yizhou Zhou","Wenbo Zhu","Fengyun Rao","Bernt Schiele","Xu Yang"],"abstract":"Video Large Language Models (Vid-LLMs) have made remarkable advancements in comprehending video content for QA 
dialogue. However, they struggle to extend this visual understanding to tasks requiring precise temporal localization, known as Video Temporal Grounding (VTG). To address this, we introduce Number-Prompt (NumPro), a novel method that empowers Vid-LLMs to bridge visual comprehension with temporal grounding by adding unique numerical identifiers to each video frame. Treating a video as a sequence of numbered frame images, NumPro transforms VTG into an intuitive process: flipping through manga panels in sequence. This allows Vid-LLMs to “read” event timelines, accurately linking visual content with cor responding temporal information. Our experiments demonstrate that NumPro significantly boosts VTG performance of top-tier Vid-LLMs without additional computational cost. Furthermore,...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01284","openalex_id":"https://openalex.org/W4413147815","cited_by_count":8,"quality_score":49,"matched_keywords":["retrieval"],"author_affiliations":["Max Planck Institute for Informatics","Tencent (China)","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6438214778900146},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.47753655910491943},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4474470913410187},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3506404757499695},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.32565855979919434},{"id":"https://openalex.org/C119599485","display_name":"Electrical 
engineering","score":0.12318956851959229},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.11423864960670471}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4413147351","title":"FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model","url":"https://doi.org/10.1109/cvpr52734.2025.01222","published":"2025-06-10","authors":["Jun Zhou","Jiahao Li","Zunnan Xu","Hanhui Li","Yiji Cheng","Fa-Ting Hong","Qin Lin","Qinglin Lu","Xiaodan Liang"],"abstract":"Currently, instruction-based image editing methods have made significant progress by leveraging the powerful cross-modal understanding capabilities of vision language models (VLMs). However, they still face challenges in three key areas: 1) complex scenarios; 2) semantic consistency; and 3) fine-grained editing. To address these issues, we propose FireEdit, an innovative Fine-grained Instruction-based image editing framework that exploits a REgion-aware VLM. FireEdit is designed to accurately comprehend user instructions and ensure effective control over the editing process. Specifically, we enhance the fine-grained visual perception capabilities of the VLM by introducing additional region tokens. Relying solely on the output of the LLM to guide the diffusion model may lead to suboptimal editing results. 
Therefore, we propose a Time-Aware Target Injection module and a Hybrid Visual Cross...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01222","openalex_id":"https://openalex.org/W4413147351","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","language model"],"author_affiliations":["Hong Kong University of Science and Technology","Sun Yat-sen University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8488825559616089},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5573479533195496},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4789865016937256},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.45782172679901123},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.4310167133808136},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4286862313747406}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4413147005","title":"Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer","url":"https://doi.org/10.1109/cvpr52734.2025.01964","published":"2025-06-10","authors":["Jiahao Cui","Hui Li","Yun Zhan","Han Lin Shang","Kaihui Cheng","Yuqi Ma","Shan Mu","Hang Zhou","Jingdong Wang","Siyu Zhu"],"abstract":"Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds. 
In this paper, we introduce the first application of a pretrained transformer-based video generative model that demonstrates strong generalization capabilities and generates highly dynamic, realistic videos for portrait animation, effectively addressing these challenges. The adoption of a new video backbone model makes previous U-Net-Based methods for identity maintenance, audio conditioning, and video extrapolation inapplicable. To address this limitation, we design an identity reference network consisting of a causal 3D VAE combined with a stacked series of transformer layers, ensuring consistent facial identity across video sequences. Additionally, we investig...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01964","openalex_id":"https://openalex.org/W4413147005","cited_by_count":11,"quality_score":48,"matched_keywords":[],"author_affiliations":["Baidu (China)","Fudan University"],"concepts":[{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.7301049828529358},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.690183162689209},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.6663106083869934},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4898940920829773},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4770990014076233},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.45067691802978516},{"id":"https://openalex.org/C69369342","display_name":"Computer animation","score":0.4301976263523102},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.41077613830566406}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4413156425","title":"Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.00400","published":"2025-06-10","authors":["Jin Wang","Chenghui Lv","Xian Li","Shichao Dong","H. Li","Kelu Yao","Chao Li","Wenqi Shao","Ping Luo"],"abstract":"Recently, the rapid development of AIGC has significantly boosted the diversities of fake media spread in the Internet, posing unprecedented threats to social security, politics, law, and etc. To detect the ever-increasingly diverse malicious fake media in the new era of AIGC, recent studies have proposed to exploit Large Vision Language Models (LVLMs) to design robust forgery detectors due to their impressive performance on a wide range of multimodal tasks. However, it still lacks a comprehensive benchmark designed to comprehensively assess LVLMs' discerning capabilities on forgery media. To fill this gap, we present Forensics-Bench, a new forgery detection evaluation benchmark suite to assess LVLMs across massive forgery detection tasks, requiring comprehensive recognition, location and reasoning capabilities on diverse forgeries. 
Forensics-Bench comprises 63, 292 meticulously curated....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00400","openalex_id":"https://openalex.org/W4413156425","cited_by_count":3,"quality_score":48,"matched_keywords":["politics","media"],"author_affiliations":["Alibaba Group (China)","Institute for Advanced Study","Megvii (China)","ShangHai JiAi Genetics & IVF Institute","Shanghai Artificial Intelligence Laboratory","University of Hong Kong","Vi Technology (United States)","Zhejiang Lab","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.8596704602241516},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7524331212043762},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7390440702438354},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45073097944259644},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.413247287273407},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.0},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.0},{"id":"https://openalex.org/C13280743","display_name":"Geodesy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413157265","title":"Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation","url":"https://doi.org/10.1109/cvpr52734.2025.02832","published":"2025-06-10","authors":["Ying Jin","Jinlong Peng","Qingdong He","Teng Hu","Jiafu Wu","Hao Chen","H Wang","Wenbing Zhu","Mingmin Chi","Jun Liu","Yabiao Wang"],"abstract":"The performance of anomaly 
inspection in industrial manufacturing is constrained by the scarcity of anomaly data. To overcome this challenge, researchers have started employing anomaly generation approaches to augment the anomaly dataset. However, existing anomaly generation methods suffer from limited diversity in the generated anomalies and struggle to achieve a seamless blending of this anomaly with the original image. Moreover, the generated mask is usually not aligned with the generated anomaly. In this paper, we overcome these challenges from a new perspective, simultaneously generating a pair of the overall image and the corresponding anomaly part. We propose DualAnoDiff, a novel diffusion-based few-shot anomaly image generation model, which can generate diverse and realistic anomaly images by using a dual-interrelated diffusion model, where one of them is employed to generate the...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02832","openalex_id":"https://openalex.org/W4413157265","cited_by_count":11,"quality_score":48,"matched_keywords":[],"author_affiliations":["Fudan University","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.5616708397865295},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.5566036701202393},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5158818960189819},{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.5010662078857422},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.48806679248809814},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.46770545840263367},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38566747307777405},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34326326847076416}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4413145306","title":"Vision-Language Models Do Not Understand Negation","url":"https://doi.org/10.1109/cvpr52734.2025.02757","published":"2025-06-10","authors":["Kumail Alhamoud","Shaden Alshammari","Yonglong Tian","Guohao Li","Philip Torr","Yoon Kim","Marzyeh Ghassemi"],"abstract":"Many practical vision-language applications require models that understand negation, e.g., when using natural language to retrieve images which contain certain objects but not others. Despite advancements in vision-language models (VLMs) through large-scale training, their ability to comprehend negation remains underexplored. This study addresses the question: how well do current VLMs understand negation? We introduce NegBench, a new benchmark designed to evaluate negation understanding across 18 task variations and 79k examples spanning image, video, and medical datasets. The benchmark consists of two core tasks designed to evaluate negation understanding in diverse multimodal settings: Retrieval with Negation and Multiple Choice Questions with Negated Captions. Our evaluation reveals that modern VLMs struggle significantly with negation, often performing at chance level. 
To address the...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02757","openalex_id":"https://openalex.org/W4413145306","cited_by_count":6,"quality_score":47,"matched_keywords":["retrieval"],"author_affiliations":["Moscow Institute of Thermal Technology","OpenAI (United States)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C2185349","display_name":"Negation","score":0.7312349677085876},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6900732517242432},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5582491159439087},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.455281138420105},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.40039533376693726},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32567447423934937},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.11227697134017944}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4413156792","title":"Flexible Frame Selection for Efficient Video Reasoning","url":"https://doi.org/10.1109/cvpr52734.2025.02707","published":"2025-06-10","authors":["Shyamal Buch","Arsha Nagrani","Anurag Arnab","Cordelia Schmid"],"abstract":"Video-language models have shown promise for addressing a range of multimodal tasks for video reasoning, such as video question-answering. However, the inherent computational challenges of processing long video data and increasing model sizes have led to standard approaches that are limited by the number of frames they can process. 
In this work, we propose the Flexible Frame Selector (FFS), a learnable policy model with a new flexible selection operation, that helps alleviate input context restrictions by enabling video-language models to focus on the most informative frames for the downstream multimodal task, without adding undue processing cost. Our method differentiates from prior work due to its learnability, efficiency, and flexibility. We verify the efficacy of our method on standard video-question answering and reasoning benchmarks, and observe our model can maintain or improve ba...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02707","openalex_id":"https://openalex.org/W4413156792","cited_by_count":2,"quality_score":47,"matched_keywords":["language model","efficient"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7783557772636414},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.7170110940933228},{"id":"https://openalex.org/C126042441","display_name":"Frame (networking)","score":0.615152895450592},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41036948561668396},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.1333582103252411}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413146877","title":"DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding","url":"https://doi.org/10.1109/cvpr52734.2025.00382","published":"2025-06-10","authors":["Wenhui Liao","Jiapeng 
Wang","Hongliang Li","Chengyu Wang","Jun Huang","Lianwen Jin"],"abstract":"Text-rich document understanding (TDU) requires comprehensive analysis of documents containing substantial textual content and complex layouts. While Multimodal Large Language Models (MLLMs) have achieved fast progress in this domain, existing approaches either demand significant computational resources or struggle with effective multi-modal integration. In this paper, we introduce DocLayLLM, an efficient multi-modal extension of LLMs specifically designed for TDU. By lightly integrating visual patch tokens and 2D positional tokens into LLMs’ input and encoding the document content using the LLMs themselves, we fully take advantage of the document comprehension capability of LLMs and enhance their perception of OCR information. We have also deeply considered the role of chain-of-thought (CoT) and innovatively proposed the techniques of CoT Pre-training and CoT Annealing. Our DocLayLLM ca...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00382","openalex_id":"https://openalex.org/W4413146877","cited_by_count":6,"quality_score":47,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C2778029271","display_name":"Extension (predicate logic)","score":0.7867993116378784},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7842930555343628},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7526113986968994},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49965405464172363},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4030842185020447},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.2777824401855469},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4413147045","title":"Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization","url":"https://doi.org/10.1109/cvpr52734.2025.00880","published":"2025-06-10","authors":["Zefeng Zhang","Hengzhu Tang","Jiawei Sheng","Zhenyu Zhang","Yi Ren","Zhenyang Li","Dawei Yin","Duohe Ma","Tingwen Liu"],"abstract":"Multimodal Large Language Models (MLLMs) excel in various tasks, yet often struggle with modality bias, where the model tends to rely heavily on a single modality and overlook critical information in other modalities, which leads to incorrect focus and generating irrelevant responses. In this paper, we propose using the paradigm of preference optimization to solve the modality bias problem, including RLAIF-V-Bias, a debiased preference optimization dataset, and a Noise-Aware Preference Optimization (NaPO) algorithm. Specifically, we first construct the dataset by introducing perturbations to reduce the informational content of certain modalities, compelling the model to rely on a specific modality when generating negative responses. 
To address the inevitable noise in automatically constructed data, we combine the noise-robust Mean Absolute Error (MAE) with the Binary Cross-Entropy (BCE)....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00880","openalex_id":"https://openalex.org/W4413147045","cited_by_count":6,"quality_score":47,"matched_keywords":["preference"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Information Engineering"],"concepts":[{"id":"https://openalex.org/C2779458634","display_name":"Debiasing","score":0.9164010882377625},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7326867580413818},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5613885521888733},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.5030571818351746},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4231262803077698},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.1090114414691925},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.09497922658920288},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.07584315538406372}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4413146941","title":"Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.00866","published":"2025-06-10","authors":["Xin Zhang","Yanzhao Zhang","Wen Xie","Mingxin Li","Ziqi Dai","Dingkun Long","Pengjun Xie","Meishan Zhang","Wenjie Li","Min Zhang"],"abstract":"Universal Multimodal Retrieval (UMR) 
aims to enable search across various modalities using a unified model, where queries and candidates can consist of pure text, images, or a combination of both. Previous work has attempted to adopt multimodal large language models (MLLMs) to realize UMR using only text data. However, our preliminary experiments demonstrate that more diverse multimodal training data can further unlock the potential of MLLMs. Despite its effectiveness, the existing multimodal training data is highly imbalanced in terms of modality, which motivates us to develop a training data synthesis pipeline and construct a large-scale, high-quality fused-modal training dataset. Based on the synthetic training data, we develop the General Multimodal Embedder (GME), an MLLM-based dense retriever designed for UMR. Furthermore, we construct a comprehensive UMR Benchmark (UMRB) to evalua...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00866","openalex_id":"https://openalex.org/W4413146941","cited_by_count":6,"quality_score":47,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Hong Kong Polytechnic University"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.8983972668647766},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7672445774078369},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.7561594247817993},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.506538987159729},{"id":"https://openalex.org/C4441509","display_name":"Multimodal therapy","score":0.4302184283733368},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.41978174448013306},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.39802318811416626},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.0640377402305603}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4413145855","title":"Towards Universal Soccer Video Understanding","url":"https://doi.org/10.1109/cvpr52734.2025.00785","published":"2025-06-10","authors":["Joe Rao","Haoning Wu","Hao Jiang","Ya Zhang","Yanfeng Wang","Weidi Xie"],"abstract":"As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world. This paper aims to develop a comprehensive multi-modal framework for soccer video understanding. Specifically, we make the following contributions in this paper: (i) we introduce SoccerReplay-1988, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches, with an automated annotation pipeline; (ii) we present an advanced soccer-specific visual encoder, MatchVision, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks; (iii) we conduct extensive experiments and ablation studies on event classification, commentary generation, and multi-view foul recognition. 
MatchVision demonstrates state-of-the-art performance on all of them, substantially outperforming existing models, which hi...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00785","openalex_id":"https://openalex.org/W4413145855","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6563873291015625},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40356823801994324},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.39837491512298584},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.38985273241996765},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.36104393005371094},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.33922773599624634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4413145329","title":"Show and Segment: Universal Medical Image Segmentation via In-Context Learning","url":"https://doi.org/10.1109/cvpr52734.2025.01940","published":"2025-06-10","authors":["Yunhe Gao","Di Liu","Zhuowei Li","Yunsheng Li","Dongdong Chen","Mu Zhou","Dimitris Metaxas"],"abstract":"Medical image segmentation remains challenging due to the vast diversity of anatomical structures, imaging modalities, and segmentation tasks. While deep learning has made significant advances, current approaches struggle to generalize as they require task-specific training or fine-tuning on unseen classes. 
We present Iris, a novel In-context Reference Image guided Segmentation framework that enables flexible adaptation to novel tasks through the use of reference examples without fine-tuning. At its core, Iris features a lightweight context task encoding module that distills task-specific information from reference context image-label pairs. This rich context embedding information is used to guide the segmentation of target objects. By decoupling task encoding from inference, Iris supports diverse strategies from one-shot inference and context example ensemble to object-level context exa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01940","openalex_id":"https://openalex.org/W4413145329","cited_by_count":5,"quality_score":46,"matched_keywords":["retrieval"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Rutgers Sexual and Reproductive Health and Rights"],"concepts":[{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.7365455627441406},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7227991819381714},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6447815895080566},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6211866140365601},{"id":"https://openalex.org/C65885262","display_name":"Scale-space segmentation","score":0.6069449186325073},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5959339737892151},{"id":"https://openalex.org/C25694479","display_name":"Segmentation-based object categorization","score":0.561972975730896},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.5093158483505249}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4413145089","title":"Language-Guided Image Tokenization for Generation","url":"https://doi.org/10.1109/cvpr52734.2025.01465","published":"2025-06-10","authors":["Kaiwen Zha","Lijun Yu","Alireza Fathi","David A. Ross","Cordelia Schmid","Dina Katabi","Xiuye Gu"],"abstract":"Image tokenization, the process of transforming raw image pixels into a compact low-dimensional latent representation, has proven crucial for scalable and efficient image generation. However, mainstream image tokenization methods generally have limited compression rates, making high-resolution image generation computationally expensive. To address this challenge, we propose to leverage language for efficient image tokenization, and we call our method Text-Conditioned Image Tokenization (TexTok). TexTok is a simple yet effective tokenization framework that leverages language to provide a compact, high-level semantic representation. 
By conditioning the tokenization process on descriptive text captions, TexTok simplifies semantic learning, allowing more learning capacity and token space to be allocated to capture fine-grained visual details, leading to enhanced reconstruction quality and hi...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01465","openalex_id":"https://openalex.org/W4413145089","cited_by_count":1,"quality_score":46,"matched_keywords":["efficient","compression"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7644318342208862},{"id":"https://openalex.org/C176982825","display_name":"Lexical analysis","score":0.6967939138412476},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4907549321651459},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4173051118850708},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.39663976430892944}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413146392","title":"FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression","url":"https://doi.org/10.1109/cvpr52734.2025.01358","published":"2025-06-10","authors":["Bo Tong","Bokai Lai","Yiyi Zhou","Gen Luo","Yunhang Shen","Ke Li","Xiaoshuai Sun","Rongrong Ji"],"abstract":"Despite a big leap forward in capability, multimodal large language models (MLLMs) tend to behave like a sloth in practical use, i.e., slow response and large latency. 
Recent efforts are devoted to building tiny MLLMs for better efficiency, but the plethora of visual tokens still used limit their actual speedup. In this paper, we propose a powerful and fast tiny MLLM called FlashSloth. Different from previous efforts, FlashSloth focuses on improving the descriptive power of visual tokens in the process of compressing their redundant semantics. In particular, FlashSloth introduces embedded visual compression designs to capture both visually salient and instruction-related image information, so as to achieving superior multimodal performance with fewer visual tokens. Extensive experiments are conducted to validate the proposed FlashSloth, and a bunch of tiny but strong MLLMs are also compr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01358","openalex_id":"https://openalex.org/W4413146392","cited_by_count":1,"quality_score":46,"matched_keywords":["memory","compression"],"author_affiliations":["InternetLab","Shanghai Artificial Intelligence Laboratory","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C69398868","display_name":"Lightning (connector)","score":0.7557097673416138},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7245551943778992},{"id":"https://openalex.org/C78548338","display_name":"Data compression","score":0.44000065326690674},{"id":"https://openalex.org/C180016635","display_name":"Compression (physics)","score":0.43331441283226013},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.25026339292526245},{"id":"https://openalex.org/C163258240","display_name":"Power (physics)","score":0.0},{"id":"https://openalex.org/C159985019","display_name":"Composite 
material","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413147576","title":"AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction","url":"https://doi.org/10.1109/cvpr52734.2025.01970","published":"2025-06-10","authors":["Lingteng Qiu","Shenhao Zhu","Qi Zuo","Xiaodong Gu","Yuan Dong","Junfei Zhang","Chao Xu","Zhe Li","Weihao Yuan","Liefeng Bo","Guanying Chen","Zilong Dong"],"abstract":"Generating animatable human avatars from a single image is essential for various digital human modeling applications. Existing 3D reconstruction methods often struggle to capture fine details in animatable models, while generative approaches for controllable animation, though avoiding explicit 3D modeling, suffer from viewpoint inconsistencies in extreme poses and computational inefficiencies. In this paper, we address these challenges by leveraging the power of generative models to produce detailed multi-view canonical pose images, which help resolve ambiguities in animatable human reconstruction. We then propose a robust method for 3D reconstruction of inconsistent images, enabling real-time rendering during inference. 
Specifically, we adapt a transformer-based video generation model to generate multi-view canonical pose images and normal maps, pretraining on a large-scale video datase...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01970","openalex_id":"https://openalex.org/W4413147576","cited_by_count":5,"quality_score":46,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C2777365542","display_name":"Avatar","score":0.6959220170974731},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.6820240616798401},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6029602885246277},{"id":"https://openalex.org/C61326573","display_name":"Gaussian process","score":0.545838475227356},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46472981572151184},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.45758405327796936},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4496572017669678},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3331717848777771}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4413147722","title":"ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.01339","published":"2025-06-10","authors":["Haixu Yin","Yuqiang Ren","Ke Yan","Shouhong Ding","Yongtao Hao"],"abstract":"Multimodal large language models (MLLMs) have demonstrated strong language 
understanding and generation capabilities, excelling in visual tasks like referring and grounding. However, due to task type limitations and dataset scarcity, existing MLLMs only ground objects present in images and cannot reject non-existent objects effectively, resulting in unreliable predictions. In this paper, we introduce ROD-MLLM, a novel MLLM for Reliable Object Detection using free-form language. We propose a query-based localization mechanism to extract low-level object features. By aligning global and object-level visual information with text space, we leverage the large language model (LLM) for high-level comprehension and final localization decisions, overcoming the language understanding limitations of normal detectors. To enhance language-based object detection, we design an automated data annotation...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01339","openalex_id":"https://openalex.org/W4413147722","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Tencent (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7739289999008179},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5041478872299194},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48066842555999756},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.45116686820983887}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413156721","title":"Lifelong Knowledge Editing for Vision Language Models with Low-Rank 
Mixture-of-Experts","url":"https://doi.org/10.1109/cvpr52734.2025.00883","published":"2025-06-10","authors":["Qizhou Chen","Chengyu Wang","Dakan Wang","Taolin Zhang","Wangyue Li","Xiaofeng He"],"abstract":"Model editing aims to correct inaccurate knowledge, update outdated information, and incorporate new data into Large Language Models (LLMs) without the need for retraining. This task poses challenges in lifelong scenarios where edits must be continuously applied for real-world applications. While some editors demonstrate strong robustness for lifelong editing in pure LLMs, Vision LLMs (VLLMs), which incorporate an additional vision modality, are not directly adaptable to existing LLM editors. In this paper, we propose LiveEdit, a Lifelong vision language model Edit to bridge the gap between lifelong LLM editing and VLLMs. We begin by training an editing expert generator to independently produce low-rank experts for each editing instance, with the goal of correcting the relevant responses of the VLLM. 
A hard filtering mechanism is developed to utilize visual semantic knowledge, thereby co...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00883","openalex_id":"https://openalex.org/W4413156721","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","East China Normal University","Hefei University of Technology","Shanghai Stock Exchange"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7349247336387634},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.6780825853347778},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46989914774894714},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.45178043842315674},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3299015164375305},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08293494582176208},{"id":"https://openalex.org/C114614502","display_name":"Combinatorics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413147469","title":"LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.02744","published":"2025-06-10","authors":["Fan-Yun Sun","Weiyu Liu","Siyi Gu","Dylan Lim","Goutam Bhat","Federico Tombari","Manling Li","Nick Haber","Jiajun Wu"],"abstract":"Spatial reasoning is a fundamental aspect of human cognition, enabling intuitive understanding and manipulation of objects in three-dimensional space. 
While foundation models demonstrate remarkable performance on some benchmarks, they still struggle with 3D reasoning tasks like arranging objects in space according to open-ended language instructions, particularly in dense and physically constrained environments. We introduce LayoutVLM, a framework and scene layout representation that exploits the semantic knowledge of Vision-Language Models (VLMs) and supports differentiable optimization to ensure physical plausibility. LayoutVLM employs VLMs to generate two mutually reinforcing representations from visually marked images, and a self-consistent decoding process to improve VLMs spatial planning. Our experiments show that LayoutVLM addresses the limitations of existing LLM and constraint-b...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02744","openalex_id":"https://openalex.org/W4413147469","cited_by_count":4,"quality_score":45,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","Stanford University"],"concepts":[{"id":"https://openalex.org/C202615002","display_name":"Differentiable function","score":0.7385901212692261},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6955353617668152},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4132440686225891},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3725709915161133},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.15805643796920776},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":4}},{"id":"openalex:W4413145434","title":"Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model","url":"https://doi.org/10.1109/cvpr52734.2025.00358","published":"2025-06-10","authors":["Benlin Liu","Yuhao Dong","Yiqin Wang","Zixian Ma","Yansong Tang","Luming Tang","Yongming Rao","Wei-Chiu Ma","Ranjay Krishna"],"abstract":"Multimodal language models (MLLMs) are increasingly being applied in real-world environments, necessitating their ability to interpret 3D spaces and comprehend temporal dynamics. Current methods often rely on specialized architectural designs or task-specific fine-tuning to achieve this. We introduce Coarse Correspondences, a simple lightweight method that enhances MLLMs’ spatial-temporal reasoning with 2D images as input, without modifying the architecture or requiring task-specific fine-tuning. Our method uses a lightweight tracking model to identify primary object correspondences between frames in a video or across different image viewpoints, and then conveys this information to MLLMs through visual prompting. 
We demonstrate that this simple training-free approach brings substantial gains to GPT4-V/O consistently on four benchmarks that require spatial-temporal reasoning, including +2...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00358","openalex_id":"https://openalex.org/W4413145434","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","memory"],"author_affiliations":["Cornell University","Tencent (China)","Tsinghua University","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.735775887966156},{"id":"https://openalex.org/C155911833","display_name":"Spatial intelligence","score":0.6123311519622803},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5237981081008911},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37789013981819153}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413146164","title":"NightAdapter: Learning a Frequency Adapter for Generalizable Night-time Scene Segmentation","url":"https://doi.org/10.1109/cvpr52734.2025.02220","published":"2025-06-10","authors":["Qi Bi","Jingjun Yi","Huimin Huang","Hao Zheng","Haolan Zhan","Yawen Huang","Yuexiang Li","Xian Wu","Yefeng Zheng"],"abstract":"Night-time scene segmentation is a critical yet challenging task in the real-world applications, primarily due to the complicated lighting conditions. However, existing methods lack sufficient generalization ability to unseen nighttime scenes with varying illumination. 
In light of this issue, we focus on investigating generalizable paradigms for night-time scene segmentation and propose an efficient fine-tuning scheme, dubbed NightAdapter, alleviating the domain gap across various scenes. Interestingly, different properties embedded in the day-time and night-time features can be characterized by the bands after discrete sine transform, which can be categorized into illumination-sensitive/-insensitive bands. Hence, our NightAdapter is powered by two appealing designs: (1) Illumination-Insensitive Band Adaptation that provides a foundation for understanding the prior, enhancing the robustn...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02220","openalex_id":"https://openalex.org/W4413146164","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["Monash University","Tencent (China)","University of Macau","Westlake University"],"concepts":[{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.7904145121574402},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7146406769752502},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5887579917907715},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5593909025192261},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.45467716455459595},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.37179213762283325},{"id":"https://openalex.org/C9390403","display_name":"Computer hardware","score":0.08307036757469177}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":3}},{"id":"openalex:W4413147340","title":"Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens","url":"https://doi.org/10.1109/cvpr52734.2025.02434","published":"2025-06-10","authors":["Kaihang Pan","Lin Wang","Zhongqi Yue","Tenglong Ao","Liyu Jia","Wei Zhao","Juncheng Li","Siliang Tang","Hanwang Zhang"],"abstract":"Recent endeavors in Multimodal Large Language Models (MLLMs) aim to unify visual comprehension and generation by combining LLM and diffusion models, the state-of-the-art in each task, respectively. Existing approaches rely on spatial visual tokens, where image patches are encoded and arranged according to a spatial order (e.g., raster scan). However, we show that spatial tokens lack the recursive structure inherent to languages, hence form an impossible language for LLM to master. In this paper, we build a proper visual language by leveraging diffusion timesteps to learn discrete, recursive visual tokens. Our proposed tokens recursively compensate for the progressive attribute loss in noisy images as timesteps increase, enabling the diffusion model to reconstruct the original image at any timestep. 
This approach allows us to effectively integrate the strengths of LLMs in autoregressive r...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02434","openalex_id":"https://openalex.org/W4413147340","cited_by_count":3,"quality_score":44,"matched_keywords":["LLM"],"author_affiliations":["Huawei German Research Center","Huawei Technologies (China)","Huawei Technologies (United States)","Nanyang Technological University","Peking University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7289530038833618},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7249135971069336},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.548583984375},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.523320198059082},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4597899913787842},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3641952872276306},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413144818","title":"DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes","url":"https://doi.org/10.1109/cvpr52734.2025.00576","published":"2025-06-10","authors":["Jinxiu Liu","Shaoheng Lin","Yinxiao Li","Ming–Hsuan Yang"],"abstract":"The increasing demand for immersive AR/VR applications and spatial intelligence has heightened the need to generate 
high-quality scene-level and 360° panoramic video. However, most video diffusion models are constrained by limited resolution and aspect ratio, which restricts their applicability to scene-level dynamic content synthesis. In this work, we propose DynamicScaler, addressing these challenges by enabling spatially scalable and panoramic dynamic scene synthesis that preserves coherence across panoramic scenes of arbitrary size. Specifically, we introduce a Offset Shifting Denoiser, facilitating efficient, synchronous, and coherent denoising panoramic dynamic scenes via a diffusion model with fixed resolution through a seamless rotating Window, which ensures seamless boundary transitions and consistency across the entire panoramic space, accommodating varying resolutions and aspe...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00576","openalex_id":"https://openalex.org/W4413144818","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8000555038452148},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6788496375083923},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5827853083610535},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.39641863107681274},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3800452947616577},{"id":"https://openalex.org/C111919701","display_name":"Operating 
system","score":0.09400349855422974}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413157185","title":"CrossOver: 3D Scene Cross-Modal Alignment","url":"https://doi.org/10.1109/cvpr52734.2025.00840","published":"2025-06-10","authors":["Sayan Sarkar","Ondřej Mikšík","Marc Pollefeys","Dániel Baráth","Iro Armeni"],"abstract":"Multi-modal 3D object understanding has gained significant attention, yet current approaches often assume complete data availability and rigid alignment across all modalities. We present CrossOver, a novel framework for cross-modal 3D scene understanding via flexible, scene-level modality alignment. Unlike traditional methods that require aligned modality data for every object instance, CrossOver learns a unified, modality-agnostic embedding space for scenes by aligning modalities – RGB images, point clouds, CAD models, floorplans, and text descriptions – with relaxed constraints and without explicit object semantics. Leveraging dimensionality-specific encoders, a multi-stage training pipeline, and emergent cross-modal behaviors, CrossOver supports robust scene retrieval and object localization, even with missing modalities. 
Evaluations on Scan-Net and 3RScan datasets show its superior p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00840","openalex_id":"https://openalex.org/W4413157185","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["ETH Zurich","Microsoft (United States)","Microsoft Research (United Kingdom)","Stanford University"],"concepts":[{"id":"https://openalex.org/C122507166","display_name":"Crossover","score":0.7875703573226929},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.680610179901123},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5835227370262146},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3306121826171875},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3227410912513733},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.10883167386054993},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413157060","title":"TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer","url":"https://doi.org/10.1109/cvpr52734.2025.02474","published":"2025-06-10","authors":["Jialun Liu","Jinbo Wu","Xiaobo Gao","Jiakui Hu","Bojun Xiong","Xing Liu","Chen Zhao","Hongbin Pei","Haocheng Feng","Yingying Li","Errui Ding","Jingdong Wang"],"abstract":"This paper introduces TexGarment, an efficient method for synthesizing high-quality, 3D-consistent garment textures in UV space. 
Traditional approaches based on 2D-to-3D mapping often suffer from 3D inconsistency, while methods learning from limited 3D data lack sufficient texture diversity. These limitations are particularly problematic in garment texture generation, where high demands exist for both detail and variety. To address these challenges, TexGarment leverages a pre-trained text-to-image diffusion Transformer model with robust generalization capabilities, introducing structural information to guide the model in generating 3D-consistent garment textures in a single inference step. Specifically, We utilize the 2D UV position map to guide the layout during the UV texture generation process, ensuring a coherent texture arrangement and enhancing it by integrating global 3D structura...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02474","openalex_id":"https://openalex.org/W4413157060","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Peking University","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6157979369163513},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.45235732197761536},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4133238196372986},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33111846446990967},{"id":"https://openalex.org/C49040817","display_name":"Optoelectronics","score":0.3213815689086914},{"id":"https://openalex.org/C119599485","display_name":"Electrical 
engineering","score":0.17157867550849915},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09895986318588257},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.08056133985519409}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413157446","title":"Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model","url":"https://doi.org/10.1109/cvpr52734.2025.01635","published":"2025-06-10","authors":["Yingying Fan","Qiong Yang","Kaisiyuan Wang","Hang Zhou","Yingying Li","Haocheng Feng","Errui Ding","Yu Wu","Jingdong Wang"],"abstract":"Current digital human studies focusing on lip-syncing and body movement are no longer sufficient to meet the growing industrial demand, while human video generation techniques that support interacting with real-world environments (e.g., objects) have not been well investigated. Despite human hand synthesis already being an intricate problem, generating objects in contact with hands and their interactions presents an even more challenging task, especially when the objects exhibit obvious variations in size and shape. To tackle these issues, we present a novel video reenactment framework focusing on Human-Object Interaction (HOI) via an adaptive Layout-instructed Diffusion model (Re-HOLD). Our key insight is to employ specialized layout representation for hands and objects, respectively. 
Such representations enable effective disentanglement of hand modeling and object adaptation to diverse...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01635","openalex_id":"https://openalex.org/W4413157446","cited_by_count":2,"quality_score":43,"matched_keywords":["memory"],"author_affiliations":["Baidu (China)","University of Science and Technology of China","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7714996933937073},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6317866444587708},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5884346961975098},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5427560806274414},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46575742959976196},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.40948712825775146},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.34921103715896606},{"id":"https://openalex.org/C44154836","display_name":"Simulation","score":0.32839587330818176}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413146707","title":"Perception Tokens Enhance Visual Reasoning in Multimodal Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.00363","published":"2025-06-10","authors":["Mahtab Bigverdi","Zelun Luo","Cheng-Yu Hsieh","Ethan Shen","Dongping Chen","Linda G. 
Shapiro","Ranjay Krishna"],"abstract":"Multimodal language models (MLMs) still face challenges in fundamental visual perception tasks where specialized models excel. Tasks requiring reasoning about 3D structures benefit from depth estimation, and reasoning about 2D object instances benefits from object detection. Yet, MLMs can not produce intermediate depth or boxes to reason over. Fine-tuning MLMs on relevant data doesn’t generalize well and outsourcing computation to specialized vision tools is too compute-intensive and memory-inefficient. To address this, we introduce Perception Tokens, intrinsic image representations designed to assist reasoning tasks where language is insufficient. Perception tokens act as auxiliary reasoning tokens, akin to chain-of-thought prompts in language models. For example, in a depth-related task, an MLM augmented with perception tokens can reason by generating a depth map as tokens, enabling it...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00363","openalex_id":"https://openalex.org/W4413146707","cited_by_count":2,"quality_score":43,"matched_keywords":["memory"],"author_affiliations":["Google (United States)","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7893873453140259},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.6532891392707825},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.48904213309288025},{"id":"https://openalex.org/C2780878386","display_name":"Visual language","score":0.4197947382926941},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39807185530662537},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3969497084617615},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.14654415845870972},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.11145249009132385}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413144850","title":"Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition","url":"https://doi.org/10.1109/cvpr52734.2025.01383","published":"2025-06-10","authors":["Zheda Mai","Ping Zhang","Cheng-Hao Tu","Hong-You Chen","Quang‐Huy Nguyen","Li Zhang","Wei‐Lun Chao"],"abstract":"Parameter-Efficient fine-tuning (PEFT) has attracted significant attention due to the growth of pre-trained model sizes and the need to fine-tune (FT) them for superior downstream performance. Despite a surge in new PEFT methods, a systematic study to understand their performance and suitable application scenarios is lacking, leaving questions like \"when to apply PEFT\" and \"which method to use\" largely unanswered, especially in visual recognition. In this paper, we conduct a unifying empirical study of representative PEFT methods with Vision Transformers. We systematically tune their hyperparameters to fairly compare their accuracy on downstream tasks. Our study offers a practical user guide and unveils several new insights. First, if tuned carefully, different PEFT methods achieve similar accuracy in the low-shot benchmark VTAB-1K. 
This includes simple approaches like FT the bias terms....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01383","openalex_id":"https://openalex.org/W4413144850","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Google (United States)","The Ohio State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6868295073509216},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5275161862373352},{"id":"https://openalex.org/C157524613","display_name":"Fine-tuning","score":0.446181982755661},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.33479607105255127},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.11533862352371216},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413144751","title":"Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input","url":"https://doi.org/10.1109/cvpr52734.2025.02111","published":"2025-06-10","authors":["Jian Wang","Rishabh Dabral","Diogo Luvizon","Zhe Cao","Lingjie Liu","Thabo Beeler","Christian Theobalt"],"abstract":"This work focuses on tracking and understanding human motion using consumer wearable devices, such as VR/AR headsets, smart glasses, cellphones, and smartwatches. These devices provide diverse, multi-modal sensor inputs, including egocentric images, and 1-3 sparse IMU sensors in varied combinations. Motion descriptions can also accompany these signals. 
The diverse input modalities and their intermittent availability pose challenges for consistent motion capture and understanding. In this work, we present Ego4o (o for omni), a new framework for simultaneous human motion capture and understanding from multi-modal egocentric inputs. This method maintains performance with partial inputs while achieving better results when multiple modalities are combined. First, the IMU sensor inputs, the optional egocentric image, and text description of human motion are encoded into the latent space of a m...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02111","openalex_id":"https://openalex.org/W4413144751","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["California University of Pennsylvania","Google (United States)","Max Planck Institute for Informatics"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7460948824882507},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7337651252746582},{"id":"https://openalex.org/C48007421","display_name":"Motion capture","score":0.7029004096984863},{"id":"https://openalex.org/C2986578859","display_name":"Human motion","score":0.5961622595787048},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5440572500228882},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.507597804069519},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47194522619247437},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3634030818939209}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"openalex:W4413147139","title":"Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression","url":"https://doi.org/10.1109/cvpr52734.2025.01682","published":"2025-06-10","authors":["Jinchang Xu","Shaokang Wang","Jintao Chen","Zhe Li","Peidong Jia","Fei Zhao","Guoqing Xiang","Zhijian Hao","Shanghang Zhang","Xiaodong Xie"],"abstract":"Leveraging the generative power of diffusion models, generative image compression has achieved impressive perceptual fidelity even at extremely low bitrates. However, current methods often neglect the non-uniform complexity of images, limiting their ability to balance global perceptual quality with local texture consistency and to allocate coding resources efficiently. To address this, we introduce the Map-guided Masking Realism Image Diffusion Codec (MRIDC), designed to optimize the trade- off between local distortion and global perceptual quality in extreme-low bitrate compression. MRIDC integrates a vector-quantized image encoder with a diffusion-based decoder. On the encoding side, we propose a Map-guided Latent Masking (MLM) module, which selectively masks elements in the latent space based on prior information, allowing adaptive resource allocation aligned with image complexity. 
On...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01682","openalex_id":"https://openalex.org/W4413147139","cited_by_count":2,"quality_score":43,"matched_keywords":["compression"],"author_affiliations":["Alibaba Group (China)","Peking University","Xidian University"],"concepts":[{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.7073892951011658},{"id":"https://openalex.org/C126780896","display_name":"Distortion (music)","score":0.7003231048583984},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5734055042266846},{"id":"https://openalex.org/C13481523","display_name":"Image compression","score":0.5548874139785767},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5481619238853455},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5474092364311218},{"id":"https://openalex.org/C180016635","display_name":"Compression (physics)","score":0.5254924893379211},{"id":"https://openalex.org/C78548338","display_name":"Data compression","score":0.5225214958190918}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413146645","title":"Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception","url":"https://doi.org/10.1109/cvpr52734.2025.01365","published":"2025-06-10","authors":["Yuanchen Wu","Lu Zhang","Hang Yao","Junlong Du","Ke Yan","Shouhong Ding","Yunsheng Wu","Xiaoqiang Li"],"abstract":"Large Vision-Language Models (LVLMs) have achieved impressive results across various cross-modal tasks. 
However, hallucinations, i.e., the models generating counterfactual responses, remain a challenge. Though recent studies have attempted to alleviate object perception hallucinations, they focus on the models’ response generation, and overlooking the task question itself. This paper discusses the vulnerability of LVLMs in solving counterfactual presupposition questions (CPQs), where the models are prone to accept the presuppositions of counterfactual objects and produce severe hallucinatory responses. To this end, we introduce \"Antidote\", a unified, synthetic data-driven post-training framework for mitigating both types of hallucination above. It leverages synthetic data to incorporate factual priors into questions to achieve self-correction, and decouple the mitigation process into a p...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01365","openalex_id":"https://openalex.org/W4413146645","cited_by_count":2,"quality_score":43,"matched_keywords":["preference"],"author_affiliations":["Shanghai University","Shanghai University of Engineering Science","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C21190884","display_name":"Presupposition","score":0.9040746688842773},{"id":"https://openalex.org/C108650721","display_name":"Counterfactual thinking","score":0.891122579574585},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.6727526187896729},{"id":"https://openalex.org/C2779365888","display_name":"Antidote","score":0.559131383895874},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5581668019294739},{"id":"https://openalex.org/C180747234","display_name":"Cognitive 
psychology","score":0.4622054100036621},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4602469801902771},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45754238963127136}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413156773","title":"PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning","url":"https://doi.org/10.1109/cvpr52734.2025.00619","published":"2025-06-10","authors":["Song Wang","Xiaolu Liu","Lingdong Kong","Jianyun Xu","Chunyong Hu","Gongfan Fang","Wentong Li","Jianke Zhu","Xinchao Wang"],"abstract":"Self-supervised representation learning for point cloud has demonstrated effectiveness in improving pre-trained model performance across diverse tasks. However, as pre-trained models grow in complexity, fully fine-tuning them for downstream applications demands substantial computational and storage resources. Parameter-efficient fine-tuning (PEFT) methods offer a promising solution to mitigate these resource requirements, yet most current approaches rely on complex adapter and prompt mechanisms that increase tunable parameters. In this paper, we propose PointLoRA, a simple yet effective method that combines low-rank adaptation (LoRA) with multi-scale token selection to efficiently fine-tune point cloud models. 
Our approach embeds LoRA layers within the most parameter-intensive components of point cloud transformers, reducing the need for tunable parameters while enhancing global feature....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00619","openalex_id":"https://openalex.org/W4413156773","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (Cayman Islands)","Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.714297890663147},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.6809279322624207},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.6792668104171753},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6428908109664917},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5160341262817383},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.503297746181488},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.49108123779296875},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.47648948431015015}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413144920","title":"Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction","url":"https://doi.org/10.1109/cvpr52734.2025.00873","published":"2025-06-10","authors":["Teng Hu","Jiangning Zhang","Ran Yi","Jieyu Weng","Yabiao Wang","Xianfang Zeng","Zhucun Xue","Lizhuang Ma"],"abstract":"Employing LLMs 
for visual generation has recently become a research focus. However, the existing methods primarily transfer the LLM architecture to visual generation but rarely investigate the fundamental differences between language and vision. This oversight may lead to suboptimal utilization of visual generation capabilities within the LLM framework. In this paper, we explore the characteristics of visual embedding space under the LLM framework and discover that the correlation between visual embeddings can help achieve more stable and robust generation results. We present IAR, an Improved AutoRegressive Visual Generation Method that enhances the training efficiency and generation quality of LLM-based visual generation models. Firstly, we propose a Codebook Rearrangement strategy that uses balanced k-means clustering algorithm to rearrange the visual codebook into clusters, ensuring h...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00873","openalex_id":"https://openalex.org/W4413144920","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.8391218185424805},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7319715023040771},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.7199724912643433},{"id":"https://openalex.org/C164866538","display_name":"Cluster (spacecraft)","score":0.5406342148780823},{"id":"https://openalex.org/C194657046","display_name":"STAR model","score":0.48658788204193115},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.39072054624557495},{"id":"https://openalex.org/C24338571","display_name":"Autoregressive integrated moving average","score":0.2522338330745697},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.22033804655075073}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413155968","title":"IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner","url":"https://doi.org/10.1109/cvpr52734.2025.00681","published":"2025-06-10","authors":["Yuyang Huang","Y. R. Chen","Li Ding","Xiaopeng Zhang","Wenrui Dai","Junni Zou","Hongkai Xiong","Qi Tian"],"abstract":"Controllability of video generation has been recently concerned in addition to the quality of generated videos. The main challenge to controllable video generation is to synthesize videos based on user-specified instance spatial locations and movement trajectories. However, existing methods suffer from a dilemma between the resource consumption, generation quality, and user controllability. As an efficient alternative to prohibitive training-based video generation, existing zero-shot video generation methods cannot generate high-quality and motion-consistent videos under the control of layouts and movement trajectories. In this paper, we propose a novel zero-shot method named IM-Zero that ameliorates instance-level motion controllable video generation with enhanced control accuracy, motion consistency, and richness of details to address this problem. 
Specifically, we first present a moti...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00681","openalex_id":"https://openalex.org/W4413155968","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.7798018455505371},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6002638339996338},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5534079074859619},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.5133280158042908},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.35407355427742004},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32058364152908325},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.05938008427619934},{"id":"https://openalex.org/C191897082","display_name":"Metallurgy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413147704","title":"Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.00801","published":"2025-06-10","authors":["Zhihang Liu","Chen-Wei Xie","Pandeng Li","Liming Zhao","Longxiang Tang","Yun Zheng","Chuanbin Liu","Hongtao Xie"],"abstract":"Recent Multi-modal Large Language Models (MLLMs) have been challenged by the computational overhead resulting from massive video frames, often alleviated through compression 
strategies. However, the visual content is not equally contributed to user instructions, existing strategies (e.g., average pool) inevitably lead to the loss of potentially useful information. To tackle this, we propose the Hybrid-level Instruction Injection Strategy for Conditional Token Compression in MLLMs (HICom), utilizing the instruction as a condition to guide the compression from both local and global levels. This encourages the compression to retain the maximum amount of user-focused information while reducing visual tokens to minimize computational burden. Specifically, the instruction condition is injected into the grouped visual tokens at the local level and the learnable tokens at the global level, and we...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00801","openalex_id":"https://openalex.org/W4413147704","cited_by_count":1,"quality_score":42,"matched_keywords":["compression"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Tsinghua University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7969846725463867},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7189128398895264},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5879356861114502},{"id":"https://openalex.org/C180016635","display_name":"Compression (physics)","score":0.5004849433898926},{"id":"https://openalex.org/C78548338","display_name":"Data compression","score":0.47183945775032043},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.22499826550483704},{"id":"https://openalex.org/C31258907","display_name":"Computer 
network","score":0.19793513417243958},{"id":"https://openalex.org/C159985019","display_name":"Composite material","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413146693","title":"HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation","url":"https://doi.org/10.1109/cvpr52734.2025.01483","published":"2025-06-10","authors":["Zunnan Xu","Zhen Yu","Zixiang Zhou","Jun Zhou","Xiaoyu Jin","Fa-Ting Hong","Xiaozhong Ji","Junwei Zhu","Chengfei Cai","Shiyu Tang","Qin Lin","Xiu Li"],"abstract":"We introduce HunyuanPortrait, a diffusion-based condition control method that employs implicit representations for highly controllable and lifelike portrait animation. Given a single portrait image as an appearance reference and video clips as driving templates, HunyuanPortrait can animate the character in the reference image by the facial expression and head pose of the driving videos. In our framework, we utilize pre-trained encoders to achieve the decoupling of portrait motion information and identity in videos. To do so, implicit representation is adopted to encode motion information and is employed as control signals in the animation phase. By leveraging the power of stable video diffusion as the main building block, we carefully design adapter layers to inject control signals into the denoising unet through attention mechanisms. 
These bring spatial richness of details and temporal....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01483","openalex_id":"https://openalex.org/W4413146693","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C162462552","display_name":"Portrait","score":0.7600662708282471},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7230520844459534},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.6861932277679443},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5021991729736328},{"id":"https://openalex.org/C69369342","display_name":"Computer animation","score":0.4740942418575287},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.4737066924571991},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.35967904329299927},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.2277112603187561}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4413157402","title":"HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization","url":"https://doi.org/10.1109/cvpr52734.2025.00300","published":"2025-06-10","authors":["Zitang Zhou","Ke Mei","Yu Lu","Tianyi Wang","Fengyun Rao"],"abstract":"This paper introduces HarmonySet, a comprehensive dataset designed to advance video-music understanding. 
HarmonySet consists of 48,328 diverse video-music pairs, annotated with detailed information on rhythmic synchronization, emotional alignment, thematic coherence, and cultural relevance. We propose a multi-step human-machine collaborative framework for efficient annotation, combining human insights with machine-generated descriptions to identify key transitions and assess alignment across multiple dimensions. Additionally, we introduce a novel evaluation framework with tasks and metrics to assess the multi-dimensional alignment of video and music, including rhythm, emotion, theme, and cultural context. Our extensive experiments demonstrate that HarmonySet, along with the proposed evaluation framework, significantly improves the ability of multimodal models to capture and analyze the...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00300","openalex_id":"https://openalex.org/W4413157402","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8061944246292114},{"id":"https://openalex.org/C2778562939","display_name":"Synchronization (alternating current)","score":0.7787145376205444},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3928207755088806},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.34651583433151245},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.12069374322891235},{"id":"https://openalex.org/C127162648","display_name":"Channel (broadcasting)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":1}},{"id":"openalex:W4413147003","title":"FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement","url":"https://doi.org/10.1109/cvpr52734.2025.01257","published":"2025-06-10","authors":["Ian Huang","Yanan Bao","Karen Truong","Howard Zhou","Cordelia Schmid","Leonidas Guibas","Alireza Fathi"],"abstract":"Scene generation with 3D assets presents a complex challenge, requiring both high-level semantic understanding and low-level geometric reasoning. While Multimodal Large Language Models (MLLMs) excel at semantic tasks, their application to 3D scene generation is hindered by their limited grounding on 3D geometry. In this paper, we investigate how to best work with MLLMs in an object placement task. Towards this goal, we introduce a novel framework, FirePlace, that applies existing MLLMs in (1) 3D geometric reasoning and the extraction of relevant geometric details from the 3D scene, (2) constructing and solving geometric constraints on the extracted low-level geometry, and (3) pruning for final placements that conform to common sense. 
By combining geometric reasoning with real-world understanding of MLLMs, our method can propose object placements that satisfy both geometric constraints as...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01257","openalex_id":"https://openalex.org/W4413147003","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Stanford University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6445118188858032},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5775781869888306},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42572951316833496},{"id":"https://openalex.org/C155911833","display_name":"Spatial intelligence","score":0.42066723108291626},{"id":"https://openalex.org/C199639397","display_name":"Engineering drawing","score":0.33580994606018066},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.14176154136657715}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413145815","title":"Diff-Palm: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models","url":"https://doi.org/10.1109/cvpr52734.2025.02455","published":"2025-06-10","authors":["Jianlong Jin","Chenglong Zhao","Ruixin Zhang","Sheng Shang","Jianqing Xu","Jingyun Zhang","Shaoming Wang","Yang Zhao","Shouhong Ding","Wei Jia","Yunsheng Wu"],"abstract":"Palmprint recognition is significantly limited by the lack of large-scale publicly available datasets. 
Previous methods have adopted Bézier curves to simulate the palm creases, which then serve as input for conditional GANs to generate realistic palmprints. However, without employing real data fine-tuning, the performance of the recognition model trained on these synthetic datasets would drastically decline, indicating a large gap between generated and real palmprints. This is primarily due to the utilization of an inaccurate palm crease representation and challenges in balancing intra-class variation with identity consistency. To address this, we introduce a polynomial-based palm crease representation that provides a new palm crease generation mechanism more closely aligned with the real distribution. We also propose the palm creases conditioned diffusion model with a novel intra-class....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02455","openalex_id":"https://openalex.org/W4413145815","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Hefei University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2778334786","display_name":"Variation (astronomy)","score":0.6522245407104492},{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.5860660076141357},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5302167534828186},{"id":"https://openalex.org/C94598645","display_name":"Palm","score":0.5291833281517029},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5126284956932068},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5014400482177734},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.45455044507980347},{"id":"https://openalex.org/C90119067","display_name":"Polynomial","score":0.44318172335624695}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4413157726","title":"DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation","url":"https://doi.org/10.1109/cvpr52734.2025.00727","published":"2025-06-10","authors":["Minghong Cai","Xiaodong Cun","Xiaoyu Li","Wenze Liu","Zhaoyang Zhang","Yong Zhang","Ying Shan","Xiangyu Yue"],"abstract":"Sora-like video generation models have achieved remarkable progress with a Multi-Modal Diffusion Transformer (MM-DiT) architecture. However, the current video generation models predominantly focus on single-prompt, struggling to generate coherent scenes with multiple sequential prompts that better reflect real-world dynamic scenarios. While some pioneering works have explored multi-prompt video generation, they face significant challenges including strict training data requirements, weak prompt following, and unnatural transitions. To address these problems, we propose DiTCtrl, a training-free multi-prompt video generation method under MM-DiT architectures for the first time. Our key idea is to take the multi-prompt video generation task as temporal video editing with smooth transitions. 
To achieve this goal, we first analyze MM-DiT’s attention mechanism, finding that the 3D full attenti...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00727","openalex_id":"https://openalex.org/W4413157726","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Great Lakes Research Group (United States)","Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6772758364677429},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6184595823287964},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6097286343574524},{"id":"https://openalex.org/C47446073","display_name":"Control theory (sociology)","score":0.3457513153553009},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.3176848292350769},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2099786400794983},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.19764915108680725},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.16357341408729553}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4413144859","title":"Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement","url":"https://doi.org/10.1109/cvpr52734.2025.00395","published":"2025-06-10","authors":["Qianhan Feng","Wenshuo Li","Tong Lin","Xinghao Chen"],"abstract":"Vision-Language Models (VLMs) bring powerful understanding and reasoning capabilities to multimodal tasks. 
Meanwhile, the great need for capable artificial intelligence on mobile devices also arises, such as the AI assistant software. Some efforts try to migrate VLMs to edge devices to expand their application scope. Simplifying the model structure is a common method, but as the model shrinks, the trade-off between performance and size becomes more and more difficult. Knowledge distillation (KD) can help models improve comprehensive capabilities without increasing size or data volume. However, most of the existing large model distillation techniques only consider applications on single-modal LLMs, or only use teachers to create new data environments for students. None of these methods takes into account the distillation of the most important cross-modal alignment knowledge in VLMs. We p...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00395","openalex_id":"https://openalex.org/W4413144859","cited_by_count":1,"quality_score":42,"matched_keywords":["distillation"],"author_affiliations":["Beijing Academy of Artificial Intelligence","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7386950254440308},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5490010380744934},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38229095935821533},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.12352433800697327},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413146686","title":"Advancing Myopia To Holism: Fully 
Contrastive Language-Image Pre-training","url":"https://doi.org/10.1109/cvpr52734.2025.02773","published":"2025-06-10","authors":["Haicheng Wang","Jin Chen","Weixiong Lin","Shuai Xiao","Mengting Chen","Yixuan Huang","Chang Liu","Mingshuai Yao","Jinsong Lan","Ying Chen","Qingwen Liu","Yanfeng Wang"],"abstract":"In rapidly evolving field of vision-language models (VLMs), contrastive language-image pre-training (CLIP) has made significant strides, becoming foundation for various downstream tasks. However, relying on one-to-one (image, text) contrastive paradigm to learn alignment from large-scale messy web data, CLIP faces a serious myopic dilemma, resulting in biases towards monotonous short texts and shallow visual expressivity. To overcome these issues, this paper advances CLIP into one novel holistic paradigm, by updating both diverse data and alignment optimization. To obtain colorful data with low cost, we use image-to-text captioning to generate multi-texts for each image, from multiple perspectives, granularities, and hierarchies. Two gadgets are proposed to encourage textual diversity. 
To match such (image, multi-texts) pairs, we modify the CLIP image encoder into multi-branch, and propo...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02773","openalex_id":"https://openalex.org/W4413146686","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Artificial Intelligence in Medicine (Canada)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C20155586","display_name":"Holism","score":0.728638768196106},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.64432692527771},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6084039211273193},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4294665455818176},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39845192432403564},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38023990392684937},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.37798184156417847},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3210332989692688}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413157580","title":"A<sup>T</sup>A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting","url":"https://doi.org/10.1109/cvpr52734.2025.01709","published":"2025-06-10","authors":["Yizhe Tang","Zhimin Sun","Yuzhen Du","Ran Yi","Guangben Lu","Teng Hu","Luying Li","Lizhuang Ma","Fangyuan Zou"],"abstract":"Image 
inpainting aims to fill the missing region of an image. Recently, there has been a surge of interest in foreground-conditioned background inpainting, a sub-task that fills the background of an image while the foreground subject and associated text prompt are provided. Existing background inpainting methods typically strictly preserve the subject’s original position from the source image, resulting in inconsistencies between the subject and the generated background. To address this challenge, we propose a new task, the \"Text-Guided Subject-Position Variable Background Inpainting\", which aims to dynamically adjust the subject position to achieve a harmonious relationship between the subject and the inpainted background, and propose the Adaptive Transformation Agent (A<sup>T</sup>A) for this task...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01709","openalex_id":"https://openalex.org/W4413157580","cited_by_count":1,"quality_score":42,"matched_keywords":["agent"],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C204241405","display_name":"Transformation (genetics)","score":0.7034579515457153},{"id":"https://openalex.org/C11727466","display_name":"Inpainting","score":0.6837555170059204},{"id":"https://openalex.org/C2777855551","display_name":"Subject (documents)","score":0.632100522518158},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.608890950679779},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6055124998092651},{"id":"https://openalex.org/C182365436","display_name":"Variable 
(mathematics)","score":0.5834970474243164},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5122734904289246},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3374169170856476}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413145974","title":"Visual Lexicon: Rich Image Features in Language Space","url":"https://doi.org/10.1109/cvpr52734.2025.01838","published":"2025-06-10","authors":["Xudong Wang","Xingyi Zhou","Alireza Fathi","Trevor Darrell","Cordelia Schmid"],"abstract":"We present Visual Lexicon, a novel visual language that encodes rich image information into the text space of vocabulary tokens while retaining intricate visual details that are often challenging to convey in natural language. Unlike traditional methods that prioritize either high-level semantics (e.g., CLIP) or pixel-level reconstruction (e.g., VAE), ViLex simultaneously captures rich semantic content and fine visual details, enabling high-quality image generation and comprehensive visual scene understanding. Through a self-supervised learning pipeline, ViLex generates tokens optimized for reconstructing input images using a frozen text-to-image (T2I) diffusion model, preserving the detailed information necessary for high-fidelity semantic-level reconstruction. 
As an image embedding in the language space, ViLex tokens leverage the compositionality of natural languages, allowing them to....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01838","openalex_id":"https://openalex.org/W4413145974","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Berkeley College","DeepMind (United Kingdom)","Google (United States)","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7685288190841675},{"id":"https://openalex.org/C2778121359","display_name":"Lexicon","score":0.765162467956543},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.5979804992675781},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5962187051773071},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5531248450279236},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5014743804931641},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.37689700722694397},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413147302","title":"Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence","url":"https://doi.org/10.1109/cvpr52734.2025.00094","published":"2025-06-10","authors":["Haolin Liu","Xiaohang Zhan","Zizheng Yan","Zhongjin Luo","Yuxin Wen","Xiaoguang Han"],"abstract":"Establishing character shape correspondence is a critical and 
fundamental task in computer vision and graphics, with diverse applications including re-topology, attribute transfer, and shape interpolation. Current dominant functional map methods, while effective in controlled scenarios, struggle in real situations with more complex challenges such as non-isometric shape discrepancies. In response, we revisit registration-for-correspondence methods and tap their potential for more stable shape correspondence estimation. To overcome their common issues including unstable deformations and the necessity for careful pre-alignment or high-quality initial 3D correspondences, we introduce Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence. We first re-purpose a foundation model for 2D character correspondence that ensures reliable and stable 2D mappings. Crucially, w...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00094","openalex_id":"https://openalex.org/W4413147302","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Shanghai Stock Exchange","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6264045238494873},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5794336199760437},{"id":"https://openalex.org/C166704113","display_name":"Image registration","score":0.550960123538971},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5102887153625488},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.4215797185897827},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.19010436534881592},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4413146201","title":"PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset","url":"https://doi.org/10.1109/cvpr52734.2025.01849","published":"2025-06-10","authors":["Jiazhen Liu","Yuhan Fu","Ruobing Xie","Runquan Xie","Xingwu Sun","Fengzong Lian","Zhan Kang","Xirong Li"],"abstract":"Multimodal Large Language Models (MLLMs) hallucinate, resulting in an emerging topic of visual hallucination evaluation (VHE). This paper contributes a ChatGPT-Prompted visual hallucination evaluation Dataset (PhD) for objective VHE at a large scale. The essence of VHE is to ask an MLLM questions about specific images to assess its susceptibility to hallucination. Depending on what to ask (objects, attributes, sentiment, etc.) and how the questions are asked, we structure PhD along two dimensions, i.e. task and mode. Five visual recognition tasks, ranging from low-level (object/attribute recognition) to middle-level (sentiment/position recognition and counting), are considered. Besides a normal visual QA mode, which we term PhD-base, PhD also asks questions with specious context (PhD-sec) or with incorrect context (PhD-icc), or with AI-generated counter common sense images (PhD-ccs). 
We....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01849","openalex_id":"https://openalex.org/W4413146201","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Renmin University of China","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6187433004379272},{"id":"https://openalex.org/C2908998935","display_name":"Visual Hallucination","score":0.5919169783592224},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4529615044593811},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34500786662101746},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2449110746383667},{"id":"https://openalex.org/C118552586","display_name":"Psychiatry","score":0.05360281467437744}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4413144693","title":"OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation","url":"https://doi.org/10.1109/cvpr52734.2025.00726","published":"2025-06-10","authors":["Hui Li","Miao Xu","Yun Zhan","Shan Mu","Jiaye Li","Kaihui Cheng","Yuxuan Chen","Tan Chen","Mao Ye","Jingdong Wang","Siyu Zhu"],"abstract":"Recent advancements in visual generation technologies have markedly increased the scale and availability of video datasets, which are crucial for training effective video generation models. However, a significant lack of high-quality, human-centric video datasets presents a challenge to progress in this field. 
To bridge this gap, we introduce OpenHumanVid, a large-scale and high-quality humancentric video dataset characterized by precise and detailed captions that encompass both human appearance and motion states, along with supplementary human motion conditions, including skeleton sequences and speech audio. To validate the efficacy of this dataset and the associated training strategies, we propose an extension of existing classical diffusion transformer architectures and conduct further pretraining of our models on the proposed dataset. Our findings yield two critical insights: First,....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00726","openalex_id":"https://openalex.org/W4413144693","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Baidu (China)","Fudan University","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7151398658752441},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6351538896560669},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5993331074714661},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3624236285686493},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.054861992597579956},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.051978737115859985},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.0},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":4}},{"id":"openalex:W4413146822","title":"Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos","url":"https://doi.org/10.1109/cvpr52734.2025.02247","published":"2025-06-10","authors":["Chiara Plizzari","Alessio Tonioni","Yongqin Xian","Achin Kulshrestha","Federico Tombari"],"abstract":"Understanding fine-grained temporal dynamics is crucial in egocentric videos, where continuous streams capture frequent, close-up interactions with objects. In this work, we bring to light that current egocentric video question-answering datasets often include questions that can be answered using only few frames or commonsense reasoning, without being necessarily grounded in the actual video. Our analysis shows that state-of-the-art Multi-Modal Large Language Models (MLLMs) on these benchmarks achieve remarkably high performance using just text or a single frame as input. To address these limitations, we introduce EgoTempo, a dataset specifically designed to evaluate temporal understanding in the egocentric domain. 
EgoTempo emphasizes tasks that require integrating information across the entire video, ensuring that models would need to rely on temporal patterns rather than static cues or...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02247","openalex_id":"https://openalex.org/W4413146822","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8503799438476562},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.664157509803772},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5712268948554993},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4453479051589966},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.37763819098472595},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.11487796902656555},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.10482850670814514},{"id":"https://openalex.org/C162853370","display_name":"Marketing","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4413147077","title":"MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling","url":"https://doi.org/10.1109/cvpr52734.2025.00747","published":"2025-06-10","authors":["Jian Yang","Dacheng Yin","Yizhou Zhou","Fengyun Rao","Wei Zhai","Yang Cao","Zheng-Jun Zha"],"abstract":"Recent advancements in multi-modal large language models have propelled the development of joint probabilistic models capable of both image understanding and generation. 
However, we have identified that recent methods suffer from loss of image information during understanding task, due to either image discretization or diffusion de-noising steps. To address this issue, we propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework. Unlike discretization line of method, MMAR takes in continuous-valued image tokens to avoid information loss in an efficient way. Differing from diffusion-based approaches, we disentangle the diffusion process from auto-regressive backbone model by employing a lightweight diffusion head on top each auto-regressed image patch embedding. In this way, when the model transits from image generation to understanding through text generation, t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00747","openalex_id":"https://openalex.org/W4413147077","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Shanghai Center for Brain Science and Brain-Inspired Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6365422010421753},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6248590350151062},{"id":"https://openalex.org/C81081738","display_name":"Lossless compression","score":0.5893568396568298},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.541307270526886},{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.4691140949726105},{"id":"https://openalex.org/C78548338","display_name":"Data compression","score":0.2051812708377838},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.20382419228553772},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.1887805461883545}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413146011","title":"Incorporating Dense Knowledge Alignment into Unified Multimodal Representation Models","url":"https://doi.org/10.1109/cvpr52734.2025.02768","published":"2025-06-10","authors":["Yuhao Cui","Xinxing Zu","Wenhua Zhang","Zhongzhou Zhao","Jinyang Gao"],"abstract":"Leveraging Large Language Models (LLMs) for text representation has achieved significant success, but the exploration of using Multimodal LLMs (MLLMs) for multimodal representation remains limited. Previous MLLM-based representation studies have primarily focused on unifying the embedding space while neglecting the importance of multi-modal alignment. As a result, their cross-modal retrieval performance falls markedly behind that of the CLIP series models. To address this, in our work, we 1) construct DeKon5M, a contrastive learning dataset enriched with dense multimodal knowledge, which efficiently enhances multimodal alignment capabilities in representation tasks. 2) design a framework for training unified representation on MLLMs. 
Building upon this unified representation framework and the dense knowledge dataset DeKon5M, we developed the Dense Knowledge Representation model DeKR on Qw...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02768","openalex_id":"https://openalex.org/W4413146011","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Ningbo Dahongying University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7094659209251404},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5550122261047363},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4208429455757141},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.0},{"id":"https://openalex.org/C94625758","display_name":"Politics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413155789","title":"GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model","url":"https://doi.org/10.1109/cvpr52734.2025.00373","published":"2025-06-10","authors":["Yue Han","Jiangning Zhang","Junwei Zhu","Runze Hou","Xiaozhong Ji","Chuming Lin","Xiaobin Hu","Zhucun Xue","Yong Liu"],"abstract":"Multimodal Language Learning Models (MLLMs) have shown remarkable performance in image understanding, generation, and editing, with recent advancements achieving pixel-level grounding with reasoning. However, these models for common objects struggle with fine-grained face understanding. 
In this work, we introduce the FacePlayGround-240K dataset, the first pioneering large-scale, pixel-grounded face caption and question-answer (QA) dataset that includes 240K images, 47 mask categories, 5.4M mask annotations, and 7.3M grounded regions, meticulously curated for alignment pretraining and instruction-tuning. We present the GroundingFace framework, specifically designed to enhance fine-grained face understanding. This framework significantly augments the capabilities of existing grounding models in face part segmentation, face attribute comprehension, while preserving general scene understandi...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00373","openalex_id":"https://openalex.org/W4413155789","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","Tsinghua University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7181538939476013},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.7025454044342041},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.5796988010406494},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.4705657660961151},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43119844794273376},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.36979103088378906},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34445250034332275},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3393239676952362}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413147649","title":"Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration","url":"https://doi.org/10.1109/cvpr52734.2025.00923","published":"2025-06-10","authors":["Yiyang Chen","Tianyu Ding","Lei Wang","Jing Huo","Yang Gao","Wenbin Li"],"abstract":"Few-shot Class-Incremental Learning (FSCIL) challenges models to adapt to new classes with limited samples, presenting greater difficulties than traditional class-incremental learning. While existing approaches rely heavily on visual models and require additional training during base or incremental phases, we propose a training-free framework that leverages pre-trained visual-language models like CLIP. At the core of our approach is a novel Bi-level Modality Calibration (BiMC) strategy. Our framework initially performs intra-modal calibration, combining LLM-generated fine-grained category descriptions with visual prototypes from the base session to achieve precise classifier estimation. This is further complemented by inter-modal calibration that fuses pre-trained linguistic knowledge with task-specific visual priors to mitigate modality-specific biases. 
To enhance prediction robustness,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00923","openalex_id":"https://openalex.org/W4413147649","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)","Nanjing University","University of Wollongong"],"concepts":[{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.7705507874488831},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.7431508898735046},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6766864657402039},{"id":"https://openalex.org/C165838908","display_name":"Calibration","score":0.669022798538208},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6065244078636169},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6026817560195923},{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.5937166810035706},{"id":"https://openalex.org/C2992734406","display_name":"One shot","score":0.48836228251457214}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413144718","title":"Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content","url":"https://doi.org/10.1109/cvpr52734.2025.02612","published":"2025-06-10","authors":["Rohit Kundu","Hao Xiong","Vishal Mohanty","Athula Balachandran","Amit K. Roy–Chowdhury"],"abstract":"Existing DeepFake detection techniques primarily focus on facial manipulations, such as face-swapping or lip-syncing. 
However, advancements in text-to-video (T2V) and image-to-video (I2V) generative models now allow fully AI-generated synthetic content and seamless background alterations, challenging face-centric detection methods and demanding more versatile approaches.To address this, we introduce the Universal Network for Identifying Tampered and synthEtic videos (UNITE) model, which, unlike traditional detectors, captures full-frame manipulations. UNITE extends detection capabilities to scenarios without faces, non-human subjects, and complex background modifications. It leverages a transformer-based architecture that processes domain-agnostic features extracted from videos via the SigLIP-So400M foundation model. Given limited datasets encompassing both facial/background alterations....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02612","openalex_id":"https://openalex.org/W4413144718","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Google (United States)","University of California, Riverside"],"concepts":[{"id":"https://openalex.org/C94915269","display_name":"Detector","score":0.6347380876541138},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.614963173866272},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.5896849632263184},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.5384902954101562},{"id":"https://openalex.org/C4641261","display_name":"Face detection","score":0.45642349123954773},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40688154101371765},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.3964095115661621},{"id":"https://openalex.org/C31510193","display_name":"Facial recognition system","score":0.29103735089302063}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413144994","title":"Scaling Inference Time Compute for Diffusion Models","url":"https://doi.org/10.1109/cvpr52734.2025.00241","published":"2025-06-10","authors":["Nanye Ma","Shangyuan Tong","Haolin Jia","Hexiang Hu","Yu-Chuan Su","Mingda Zhang","Xuan Yang","Yandong Li","Tommi Jaakkola","Xuhui Jia","Saining Xie"],"abstract":"Generative models have made significant impacts across various domains, largely due to their ability to scale during training by increasing data, computational resources, and model size, a phenomenon characterized by the scaling laws. Recent research has begun to explore inference-time scaling behavior in Large Language Models (LLMs), revealing how performance can further improve with additional computation during inference. Unlike LLMs, diffusion models inherently possess the flexibility to adjust inference-time computation via the number of denoising steps, although the performance gains typically flatten after a few dozen. In this work, we explore the inference-time scaling behavior of diffusion models beyond increasing denoising steps and investigate how the generation performance can further improve with increased computation. 
Specifically, we consider a search problem aimed at iden...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00241","openalex_id":"https://openalex.org/W4413144994","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Google (United States)","Moscow Institute of Thermal Technology","New York University"],"concepts":[{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.6834002733230591},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6767913699150085},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.652656614780426},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5705412030220032},{"id":"https://openalex.org/C121864883","display_name":"Statistical physics","score":0.47403669357299805},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.36662739515304565},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.25158393383026123},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.17062431573867798}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413147665","title":"Relative Pose Estimation through Affine Corrections of Monocular Depth Priors","url":"https://doi.org/10.1109/cvpr52734.2025.01557","published":"2025-06-10","authors":["Yifan Yu","Shaohui Liu","Rémi Pautrat","Marc Pollefeys","Viktor Larsson"],"abstract":"Monocular depth estimation (MDE) models have undergone significant advancements over recent years. 
Many MDE models aim to predict affine-invariant relative depth from monocular images, while recent developments in large-scale training and vision foundation models enable reasonable estimation of metric (absolute) depth. However, effectively leveraging these predictions for geometric vision tasks, in particular relative pose estimation, remains relatively under explored. While depths provide rich constraints for cross-view image alignment, the intrinsic noise and ambiguity from the monocular depth priors present practical challenges to improving upon classic keypoint-based solutions. In this paper, we develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities, covering both calibrated and uncalibrated conditions. We furth...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01557","openalex_id":"https://openalex.org/W4413147665","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["ETH Zurich","Lund University","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.7718809843063354},{"id":"https://openalex.org/C92757383","display_name":"Affine transformation","score":0.6950048208236694},{"id":"https://openalex.org/C65909025","display_name":"Monocular","score":0.6164867877960205},{"id":"https://openalex.org/C52102323","display_name":"Pose","score":0.5879950523376465},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5859907865524292},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5356215238571167},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.5132205486297607},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.45032161474227905}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413146486","title":"ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning","url":"https://doi.org/10.1109/cvpr52734.2025.00197","published":"2025-06-10","authors":["David Junhao Zhang","Roni Paiss","Shiran Zada","Nikhil Karnad","David Jacobs","Yael Pritch","Inbar Mosseri","Mike Zheng Shou","Neal Wadhwa","Nataniel Ruiz"],"abstract":"Recently, breakthroughs in video modeling have allowed for controllable camera trajectories in generated videos. However, these methods cannot be directly applied to user-provided videos that are not generated by a video model. In this paper, we present ReCapture, a method for generating new videos with novel camera trajectories from a single user-provided video. Our method allows us to re-generate the reference video, with all its existing scene motion, from vastly different angles and with cinematic camera motion. Notably, using our method we can also plausibly hallucinate parts of the scene that were not observable in the reference video. 
Our method works by (1) generating a noisy anchor video with a new camera trajectory using multiview diffusion models or depth-based point cloud rendering and then (2) regenerating the anchor video into a clean and temporally consistent reangled vide...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00197","openalex_id":"https://openalex.org/W4413146486","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Google (United States)","National University of Singapore"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.729835569858551},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6525238752365112},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5773778557777405},{"id":"https://openalex.org/C36528806","display_name":"Mark and recapture","score":0.5510200262069702},{"id":"https://openalex.org/C2778852477","display_name":"Video camera","score":0.5179558992385864},{"id":"https://openalex.org/C151211776","display_name":"Video capture","score":0.48659026622772217},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.36012616753578186},{"id":"https://openalex.org/C65483669","display_name":"Video processing","score":0.3002161979675293}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413145527","title":"LogiCzsl: Exploring Logic-induced Representation for Compositional Zero-shot Learning","url":"https://doi.org/10.1109/cvpr52734.2025.02821","published":"2025-06-10","authors":["Peng Wu","Xiankai Lu","Hao Hu","Yongqin Xian","Jianbing Shen","Wenguan 
Wang"],"abstract":"Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by learning the primitive concepts (i.e., attribute and object) from the training set. While recent works achieve impressive results in CZSL by leveraging large vision-language models like CLIP, they ignore the rich semantic relationships between primitive concepts and their compositions. In this work, we propose LogiCzsl, a novel logic-induced learning framework to explicitly model the semantic relationships. Our logic-induced learning framework formulates the relational knowledge constructed from large language models as a set of logic rules, and grounds them onto the training data. Our logic-induced losses are complementary to the widely used CZSL losses, therefore can be employed to inject the semantic information into any existing CZSL methods. Extensive experimental results show that our....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02821","openalex_id":"https://openalex.org/W4413145527","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Google (United States)","Shandong University","University of Macau","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5999624729156494},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5961390733718872},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.5662696361541748},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.5476682186126709},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40427935123443604},{"id":"https://openalex.org/C80444323","display_name":"Theoretical 
computer science","score":0.35415753722190857},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.09173542261123657},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413145006","title":"Zero-Shot Styled Text Image Generation, but Make It Autoregressive","url":"https://doi.org/10.1109/cvpr52734.2025.00741","published":"2025-06-10","authors":["Vittorio Pippi","Fabio Quattrini","Silvia Cascianelli","Alessio Tonioni","Rita Cucchiara"],"abstract":"Styled Handwritten Text Generation (HTG) has recently received attention from the computer vision and document analysis communities, which have developed several solutions, either GAN- or diffusion-based, that achieved promising results. Nonetheless, these strategies fail to generalize to novel styles and have technical constraints, particularly in terms of maximum output length and training efficiency. To overcome these limitations, in this work, we propose a novel framework for text image generation, dubbed Emuru. Our approach leverages a powerful text image representation model (a variational autoencoder) combined with an autoregressive Transformer. Our approach enables the generation of styled text images conditioned on textual content and style examples, such as specific fonts or handwriting styles. 
We train our model solely on a diverse, synthetic dataset of English text rendered i...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00741","openalex_id":"https://openalex.org/W4413145006","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","University of Modena and Reggio Emilia"],"concepts":[{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.7218726277351379},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.6764101386070251},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6021344661712646},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5636430382728577},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5580217242240906},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39750781655311584},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34536033868789673},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.18473777174949646}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413144323","title":"Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly","url":"https://doi.org/10.1109/cvpr52734.2025.00849","published":"2025-06-10","authors":["Yexin Liu","Zhengyang Liang","Yueze Wang","Xianfeng Wu","Feilong Tang","Muyang He","Jian Li","Zheng Liu","Harry Yang","Ser-Nam Lim","Bo Zhao"],"abstract":"Multimodal Large Language Models (MLLMs) have displayed remarkable performance in multi-modal tasks, particularly 
in visual comprehension. However, we reveal that MLLMs often generate incorrect answers even when they understand the visual content. To this end, we manually construct a benchmark with 12 categories and design evaluation metrics that assess the degree of error in MLLM responses even when the visual content is seemingly understood. Based on this benchmark, we test 15 leading MLLMs and analyze the distribution of attention maps and logits of some MLLMs. Our investigation identifies two primary issues: 1) most instruction tuning datasets predominantly feature questions that \"directly\" relate to the visual content, leading to a bias in MLLMs’ responses to other indirect questions, and 2) MLLMs’ attention to visual tokens is notably lower than to system and question tokens. We fu...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00849","openalex_id":"https://openalex.org/W4413144323","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Beijing Academy of Artificial Intelligence","Hong Kong University of Science and Technology","Medgar Evers College","Peking University","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2778732403","display_name":"Ignorance","score":0.7493677735328674},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.6485567688941956},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6030774116516113},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.333747923374176},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.32200610637664795},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.22328892350196838}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413145992","title":"The Power of Context: How Multimodality Improves Image Super-Resolution","url":"https://doi.org/10.1109/cvpr52734.2025.02155","published":"2025-06-10","authors":["Kangfu Mei","Hossein Talebi","Mojtaba Ardakani","Vishal M. Patel","Peyman Milanfar","Mauricio Delbracio"],"abstract":"Single-image super-resolution (SISR) remains challenging due to the inherent difficulty of recovering fine-grained details and preserving perceptual quality from low-resolution inputs. Existing methods often rely on limited image priors, leading to suboptimal results. We propose a novel approach that leverages the rich contextual information available in multiple modalities - including depth, segmentation, edges, and text prompts-to learn a powerful generative prior for SISR within a diffusion model framework. We introduce a flexible network architecture that effectively fuses multimodal information, accommodating an arbitrary number of input modalities without requiring significant modifications to the diffusion process. Crucially, we mitigate hallucinations, often introduced by text prompts, by using spatial information from other modalities to guide regional text-based conditioning. 
E...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02155","openalex_id":"https://openalex.org/W4413145992","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","Johns Hopkins University"],"concepts":[{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.9119172692298889},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.652055561542511},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5626306533813477},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4269983768463135},{"id":"https://openalex.org/C163258240","display_name":"Power (physics)","score":0.4125162959098816},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3960622549057007},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3562561571598053},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.11066114902496338}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413147871","title":"Real-IAD D<sup>3</sup>: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection","url":"https://doi.org/10.1109/cvpr52734.2025.01417","published":"2025-06-10","authors":["Wenbing Zhu","Lidong Wang","Ziqing Zhou","Chengjie Wang","Yurui Pan","Ruoyi Zhang","Zhuo Chen","Luya Cheng","Bin-Bin Gao","Jiangning Zhang","Zhenye Gan","Yuxie Wang"],"abstract":"The increasing complexity of industrial anomaly detection (IAD) has positioned multimodal detection methods as a focal area of machine vision research. 
However, dedicated multimodal datasets specifically tailored for IAD remain limited. Pioneering datasets like MVTec 3D have laid essential groundwork in multimodal IAD by incorporating RGB+3D data, but still face challenges in bridging the gap with real industrial environments due to limitations in scale and resolution. To address these challenges, we introduce Real-IAD D<sup xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">3</sup>, a high-precision multimodal dataset that uniquely incorporates an additional pseudo-3D modality generated through photometric stereo, alongside high-resolution RGB images and micrometer-level 3D point clouds. Real-IAD D<sup xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01417","openalex_id":"https://openalex.org/W4413147871","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Fudan University","Shanghai Jiao Tong University","Shanghai Ocean University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6381634473800659},{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.5044664144515991},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4870007634162903},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.30891579389572144},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.17226281762123108},{"id":"https://openalex.org/C26873012","display_name":"Condensed matter physics","score":0.0763871967792511}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"openalex:W4413156266","title":"OFER: Occluded Face Expression Reconstruction","url":"https://doi.org/10.1109/cvpr52734.2025.02513","published":"2025-06-10","authors":["Pratheba Selvaraju","Victoria Fernández Abrevaya","Timo Bolkart","Rick Akkerman","Tianyu Ding","Faezeh Amjadi","Ilya Zharkov"],"abstract":"Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions. In addition to fewer available observations, occlusions introduce an extra source of ambiguity where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature. In this paper we introduce OFER, a novel approach for single-image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces, even under strong occlusions. Specifically, we train two diffusion models to gener ate a shape and expression coefficients of face parametric model, conditioned on the input image. This approach captures the multi-modal nature of the problem, generating a distribution of solutions as output. 
However, to maintain consistency across diverse expressions, t...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02513","openalex_id":"https://openalex.org/W4413156266","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","Max Planck Institute for Intelligent Systems","Microsoft Research (United Kingdom)","University of Massachusetts Amherst"],"concepts":[{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.6880966424942017},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5438022017478943},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5174765586853027},{"id":"https://openalex.org/C90559484","display_name":"Expression (computer science)","score":0.4887792766094208},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4609091877937317},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.32648783922195435},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.10750514268875122},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.08347952365875244}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413146716","title":"Fingerprinting Denoising Diffusion Probabilistic Models","url":"https://doi.org/10.1109/cvpr52734.2025.02683","published":"2025-06-10","authors":["Huan Teng","Yuhui Quan","Chengyu Wang","Jun Huang","Hui Ji"],"abstract":"Diffusion models, especially denoising diffusion probabilistic models (DDPMs), are prevalent tools in generative AI, 
making their intellectual property (IP) protection increasingly important. Most existing IP protection methods for DDPMs are invasive, e.g., model watermarking, which alter model parameters and raise concerns about performance degradation, also with requirement for extra computational resources for retraining or fine-tuning. In this paper, we propose the first non-invasive fingerprinting scheme for DDPMs, requiring no parameter changes or fine-tuning, and keeping generation quality intact. We introduce a discriminative and robust fingerprint latent space based on the well-designed \"crossing route\" of noisy samples that span the performance border-zone of DDPMs, with only black-box access required for the diffusion denoiser in ownership verification. Extensive experiments d...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02683","openalex_id":"https://openalex.org/W4413146716","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","National University of Singapore","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.7499117255210876},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6587820053100586},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5668312311172485},{"id":"https://openalex.org/C163294075","display_name":"Noise reduction","score":0.5177280306816101},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46676015853881836},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.44213610887527466},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.05572423338890076},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413147583","title":"DistinctAD: Distinctive Audio Description Generation in Contexts","url":"https://doi.org/10.1109/cvpr52734.2025.01267","published":"2025-06-10","authors":["Bo Fang","Wenhao Wu","Qiangqiang Wu","Yuxin Song","Antoni B. Chan"],"abstract":"Audio Descriptions (ADs) aim to provide a narration of a movie in text form, describing non-dialogue-related narratives, such as characters, actions, or scene establishment. Automatic generation of ADs remains challenging due to: i) the domain gap between movie-AD data and existing data used to train vision-language models, and ii) the issue of contextual redundancy arising from highly similar neighboring visual clips in a long movie. In this work, we propose DistinctAD, a novel two-stage framework for generating ADs that emphasize distinctiveness to produce better narratives. To address the domain gap, we introduce a CLIP-AD adaptation strategy that does not require additional AD corpora, enabling more effective alignment between movie and AD modalities at both global and finegrained levels. 
In Stage-II, DistinctAD incorporates two key innovations: (i) a Contextual Expectation-Maximizat...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01267","openalex_id":"https://openalex.org/W4413147583","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Baidu (China)","City University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7227978706359863},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.32705995440483093}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413145709","title":"Pose Priors from Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.00668","published":"2025-06-10","authors":["Sanjay Subramanian","Evonne Ng","Lea Müller","Dan Klein","Shiry Ginosar","Trevor Darrell"],"abstract":"Language is often used to describe physical interaction, yet most 3D human pose estimation methods overlook this rich source of information. We bridge this gap by leveraging large multimodal models (LMMs) as priors for reconstructing contact poses, offering a scalable alternative to traditional methods that rely on human annotations or motion capture data. Our approach extracts contact-relevant descriptors from an LMM and translates them into tractable losses to constrain 3D human pose optimization. Despite its simplicity, our method produces compelling reconstructions for both two-person interactions and self-contact scenarios, accurately capturing the semantics of physical and social interactions. 
Our results demonstrate that LMMs can serve as powerful tools for contact prediction and pose estimation, offering an alternative to costly manual human annotations or motion capture data. Ou...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00668","openalex_id":"https://openalex.org/W4413145709","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.787297248840332},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7001221179962158},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.48260700702667236},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4527932405471802},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3815728425979614},{"id":"https://openalex.org/C107673813","display_name":"Bayesian probability","score":0.14755156636238098}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413146139","title":"ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models","url":"https://doi.org/10.1109/cvpr52734.2025.00398","published":"2025-06-10","authors":["J. Chen","Tianshu Zhang","Shiyu Huang","Y. 
Niu","Linfeng Zhang","Lijie Wen","Xuming Hu"],"abstract":"Despite the recent breakthroughs achieved by Large Vision Language Models (LVLMs) in understanding and responding to complex visual-textual contexts, their inherent hallucination tendencies limit their practical application in real-world scenarios that demand high levels of precision. Existing methods typically either fine-tune the LVLMs using additional data, which incurs extra costs in manual annotation and computational resources or perform comparisons at the decoding stage, which may eliminate useful language priors for reasoning while introducing inference time overhead. Therefore, we propose ICT, a lightweight, training-free method that calculates an intervention direction to shift the model’s focus towards different levels of visual information, enhancing its attention to high-level and fine-grained visual details. During the forward pass stage, the intervention is applied to the....","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00398","openalex_id":"https://openalex.org/W4413146139","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Chongqing University","Shanghai Jiao Tong University","Tsinghua University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6941937208175659},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6699603796005249},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6116488575935364},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5596024394035339},{"id":"https://openalex.org/C3019973339","display_name":"Object 
based","score":0.4888545870780945},{"id":"https://openalex.org/C2780665704","display_name":"Intervention (counseling)","score":0.43336546421051025},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.22317183017730713},{"id":"https://openalex.org/C118552586","display_name":"Psychiatry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413146732","title":"Generative Omnimatte: Learning to Decompose Video into Layers","url":"https://doi.org/10.1109/cvpr52734.2025.01168","published":"2025-06-10","authors":["Yao-Chih Lee","Erika Lu","Sarah Rumbley","Michal Geyer","Jia‐Bin Huang","Tali Dekel","Forrester Cole"],"abstract":"Given a video and a set of input object masks, an omnimatte method aims to decompose the video into semantically meaningful layers containing individual objects along with their associated effects, such as shadows and reflections. Existing omnimatte methods assume a static background or accurate pose and depth estimation and produce poor decompositions when these assumptions are violated. Furthermore, due to the lack of generative prior on natural videos, existing methods cannot complete dynamic occluded regions. We present a novel generative layered video decomposition framework to address the omnimatte problem. Our method does not assume a stationary scene or require camera pose or depth information and produces clean, complete layers, including convincing completions of occluded dynamic regions. 
Our core idea is to train a video diffusion model to identify and remove scene effects cau...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01168","openalex_id":"https://openalex.org/W4413146732","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","University of Maryland, College Park"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7432364225387573},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.66873699426651},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5440352559089661},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4283938407897949},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.38799965381622314},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3385758697986603}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413147391","title":"Boltzmann Attention Sampling for Image Analysis with Small Objects","url":"https://doi.org/10.1109/cvpr52734.2025.02417","published":"2025-06-10","authors":["Theodore Zhao","Sid Kiblawi","Naoto Usuyama","Ho Hin Lee","Sam Preston","Hoifung Poon","Mu Wei"],"abstract":"Detecting and segmenting small objects, such as lung nodules and tumor lesions, remains a critical challenge in image analysis. These objects often occupy less than 0.1% of an image, making traditional transformer architectures inefficient and prone to performance degradation due to redundant attention computations on irrelevant regions. 
Existing sparse attention mechanisms rely on rigid hierarchical structures, which are poorly suited for detecting small, variable, and uncertain object locations. In this paper, we propose BoltzFormer, a novel transformer-based architecture designed to address these challenges through dynamic sparse attention. BoltzFormer identifies and focuses attention on relevant areas by modeling uncertainty using a Boltzmann distribution with an annealing schedule. Initially, a higher temperature allows broader area sampling in early layers, when object location unc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02417","openalex_id":"https://openalex.org/W4413147391","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.6413552165031433},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5984971523284912},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5750471949577332},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4255502223968506},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3892281651496887},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.36245954036712646},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413146814","title":"Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion 
Model","url":"https://doi.org/10.1109/cvpr52734.2025.00571","published":"2025-06-10","authors":["Shengjun Zhang","Jinzhao Li","Fei Xin","Hao Liu","Yueqi Duan"],"abstract":"In this paper, we propose Scene Splatter, a momentum-based paradigm for video diffusion to generate generic scenes from single image. Existing methods, which employ video generation models to synthesize novel views, suffer from limited video length and scene inconsistency, leading to artifacts and distortions during further reconstruction. To address this issue, we construct noisy samples from original features as momentum to enhance video details and maintain scene consistency. However, for latent features with the perception field that spans both known and unknown regions, such latent-level momentum restricts the generative ability of video diffusion in unknown regions. Therefore, we further introduce the aforementioned consistent video as a pixel-level momentum to a directly generated video without momentum for better recovery of unseen regions. 
Our cascaded momentum enables video dif...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00571","openalex_id":"https://openalex.org/W4413146814","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6267820596694946},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.6113255620002747},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5909348130226135},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5145696997642517},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49284815788269043},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.43639662861824036},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.12736773490905762},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413156152","title":"Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning","url":"https://doi.org/10.1109/cvpr52734.2025.00362","published":"2025-06-10","authors":["Cheng Chen","Yunpeng Zhai","Yifan Zhao","Jinyang Gao","Bolin Ding","Jia Li"],"abstract":"In-context learning (ICL), a predominant trend in instruction learning, aims at enhancing the performance of large language models by providing clear task guidance and examples, improving their capability in task understanding and execution. 
This paper investigates ICL on Large Vision-Language Models (LVLMs) and explores the policies of multi-modal demonstration selection. Existing research efforts in ICL face significant challenges: First, they rely on pre-defined demonstrations or heuristic selecting strategies based on human intuition, which are usually inadequate for covering diverse task requirements, leading to sub-optimal solutions; Second, individually selecting each demonstration fails in modeling the interactions between them, resulting in information redundancy. Unlike these prevailing efforts, we propose a new exploration-exploitation reinforcement learning framework, which e...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00362","openalex_id":"https://openalex.org/W4413156152","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","State Key Laboratory of Virtual Reality Technology and Systems","Virtual Reality Medical Center"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7149620056152344},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6761036515235901},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6432598829269409},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.5271196961402893},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38108259439468384},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.15148422122001648},{"id":"https://openalex.org/C192562407","display_name":"Materials 
science","score":0.050741881132125854},{"id":"https://openalex.org/C191897082","display_name":"Metallurgy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413147230","title":"Low-Biased General Annotated Dataset Generation","url":"https://doi.org/10.1109/cvpr52734.2025.02338","published":"2025-06-10","authors":["Dengyang Jiang","Haoyu Wang","Lei Zhang","Wei Wei","Guang Dai","Mengmeng Wang","Jingdong Wang","Yanning Zhang"],"abstract":"Pre-training backbone networks on a general annotated dataset (e.g., ImageNet) that comprises numerous manually collected images with category annotations has proven to be indispensable for enhancing the generalization capacity of downstream visual tasks. However, those manually collected images often exhibit bias, which is non-transferable across either categories or domains, thus causing the model’s generalization capacity degeneration. To mitigate this problem, we present a low-biased general annotated dataset generation framework (lbGen). Instead of expensive manual collection, we aim at directly generating low-biased images with category annotations. To achieve this goal, we propose to leverage the advantage of a multimodal foundation model (e.g., CLIP), in terms of aligning images in a low-biased semantic space defined by language. 
Specifically, we develop a bi-level semantic align...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02338","openalex_id":"https://openalex.org/W4413147230","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Northwestern Polytechnical University","State Grid Corporation of China (China)","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6961176991462708},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37421277165412903}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413147659","title":"IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos","url":"https://doi.org/10.1109/cvpr52734.2025.01966","published":"2025-06-10","authors":["Yuan Li","Ziqian Bai","Feitong Tan","Zhaopeng Cui","Sean Fanello","Yinda Zhang"],"abstract":"We propose a novel 3D-aware diffusion-based method for generating photorealistic talking head videos directly from a single identity image and explicit control signals (e.g., expressions). Our method generates Multiplane Images (MPIs) that ensure geometric consistency, making them ideal for immersive viewing experiences like binocular videos for VR headsets. Unlike existing methods that often require a separate stage or joint optimization to reconstruct a 3D representation (such as NeRF or 3D Gaussians), our approach directly generates the final output through a single denoising process, eliminating the need for post-processing steps to render novel views efficiently. 
To effectively learn from monocular videos, we introduce a training mechanism that reconstructs the output MPI randomly in either the target or the reference camera space. This approach enables the model to simultaneously l...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01966","openalex_id":"https://openalex.org/W4413147659","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C65909025","display_name":"Monocular","score":0.7903577089309692},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7046289443969727},{"id":"https://openalex.org/C162462552","display_name":"Portrait","score":0.6635035276412964},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.6243317723274231},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.559450089931488},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5067985653877258},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4775179922580719},{"id":"https://openalex.org/C153349607","display_name":"Visual arts","score":0.1791573166847229}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413156575","title":"IDEA-Bench: How Far are Generative Models from Professional Designing?","url":"https://doi.org/10.1109/cvpr52734.2025.01728","published":"2025-06-10","authors":["Liang Chen","Lianghua Huang","Jing Fang","Huanzhang Dou","Wei Wang","Zhi-Fan Wu","Yupeng Shi","Junge Zhang","Xin Zhao","Yu Liu"],"abstract":"Recent advancements in image 
generation models enable the creation of high-quality images and targeted modifications based on textual instructions. Some models even support multimodal complex guidance and demonstrate robust task generalization capabilities. However, they still fall short of meeting the nuanced, professional demands of designers. To bridge this gap, we introduce IDEA-Bench, a comprehensive benchmark designed to advance image generation models toward applications with robust task generalization. IDEA-Bench comprises 100 professional image generation tasks and 275 specific cases, categorized into five major types based on the current capabilities of existing models. Furthermore, we provide a representative subset of 18 tasks with enhanced evaluation criteria to facilitate more nuanced and reliable evaluations using Multimodal Large Language Models (MLLMs). By assessing mode...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01728","openalex_id":"https://openalex.org/W4413156575","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Chinese Academy of Sciences","University of Science and Technology Beijing","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5774495005607605},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5644518136978149},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.20993450284004211}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413147067","title":"Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image 
Generation","url":"https://doi.org/10.1109/cvpr52734.2025.01723","published":"2025-06-10","authors":["Xiaoying Xing","Avinab Saha","Junfeng He","Susan Hao","Paul Vicol","Moonkyung Ryu","Gang Li","Sahil Singla","Sarah Young","Yinxiao Li","Feng Yang","Deepak Ramachandran"],"abstract":"Text-to-image (T2I) generation has made significant advances in recent years, but challenges still remain in the generation of perceptual artifacts, misalignment with complex prompts, and safety. The prevailing approach to address these issues involves collecting human feedback on generated images, training reward models to estimate human feedback, and then fine-tuning T2I models based on the reward models to align them with human preferences. However, while existing reward fine-tuning methods can produce images with higher rewards, they may change model behavior in unexpected ways. For example, fine-tuning for one quality aspect (e.g., safety) may degrade other aspects (e.g., prompt alignment), or may lead to reward hacking (e.g., finding a way to increase rewards without having the intended effect). 
In this paper, we propose Focus-N-Fix, the first region-aware fine-tuning method that t...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01723","openalex_id":"https://openalex.org/W4413147067","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.7564643621444702},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.690401554107666},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4825538992881775},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.34857216477394104},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3407345116138458},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.09729328751564026},{"id":"https://openalex.org/C120665830","display_name":"Optics","score":0.06820335984230042}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413157934","title":"Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis","url":"https://doi.org/10.1109/cvpr52734.2025.01697","published":"2025-06-10","authors":["Tongtong Su","Chengyu Wang","Bingyan Liu","Jun Huang","Dongming Lu"],"abstract":"In recent years, large text-to-video (T2V) synthesis models have garnered considerable attention for their abilities to generate videos from textual descriptions. 
However, achieving both high imaging quality and effective motion representation remains a significant challenge for these T2V models. Existing approaches often adapt pre-trained text-to-image (T2I) models to refine video frames, leading to issues such as flickering and artifacts due to inconsistencies across frames. In this paper, we introduce EVS, a training-free Encapsulated Video Synthesizer that composes T2I and T2V models to enhance both visual fidelity and motion smoothness of generated videos. Our approach utilizes a well-trained diffusion-based T2I model to refine low-quality video frames by treating them as out-of-distribution samples, effectively optimizing them with noising and denoising steps. Meanwhile, we employ....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01697","openalex_id":"https://openalex.org/W4413157934","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","South China University of Technology","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7303497791290283},{"id":"https://openalex.org/C40231798","display_name":"Composition (language)","score":0.5404682159423828},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4349972605705261},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4137107729911804},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.406076043844223},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.37762293219566345},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.33304721117019653},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3310731053352356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413145560","title":"DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding","url":"https://doi.org/10.1109/cvpr52734.2025.00850","published":"2025-06-10","authors":["Li Geng","Jinglin Xu","Yunzhen Zhao","Yuxin Peng"],"abstract":"Humans can effortlessly locate desired objects in cluttered environments, relying on a cognitive mechanism known as visual search to efficiently filter out irrelevant information and focus on task-related regions. Inspired by this process, we propose DyFo (Dynamic Focus), a training-free dynamic focusing visual search method that enhances fine-grained visual understanding in large multi-modal models (LMMs). Unlike existing approaches which require additional modules or data collection, DyFo leverages a bidirectional interaction between LMMs and visual experts, using a Monte Carlo Tree Search (MCTS) algorithm to simulate human-like focus adjustments. This enables LMMs to focus on key visual regions while filtering out irrelevant content, without introducing additional training caused by vocabulary expansion or the integration of specialized localization modules. 
Experimental results demon...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00850","openalex_id":"https://openalex.org/W4413145560","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Peking University","Tencent (China)","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6758862137794495},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5710462927818298},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.569575846195221},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.50440514087677},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3963415026664734},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.34911417961120605},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.34875690937042236},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.348184198141098}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413146254","title":"Context-Aware Multimodal Pretraining","url":"https://doi.org/10.1109/cvpr52734.2025.00403","published":"2025-06-10","authors":["Karsten Roth","Zeynep Akata","Dima Damen","Ivana Balažević","Olivier J. Hénaff"],"abstract":"Large-scale multimodal representation learning successfully optimizes for zero-shot transfer at test time. 
Yet the standard pretraining paradigm (contrastive learning on large amounts of image-text data) does not explicitly encourage representations to support few-shot adaptation. In this work, we propose a simple, but carefully designed extension to multimodal pretraining which enables representations to accommodate additional context. Using this objective, we show that vision-language models can be trained to exhibit significantly increased few-shot adaptation: across 21 downstream tasks, we find up to fourfold improvements in test-time sample efficiency, and average few-shot adaptation gains of over 5%, while retaining zero-shot generalization performance across model scales and training durations. In particular, equipped with simple, training-free, metric-based adaptation mechanisms,...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00403","openalex_id":"https://openalex.org/W4413146254","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Center for Integrated Protein Science Munich","DeepMind (United Kingdom)","Google (United States)","TH Bingen University of Applied Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7198923230171204},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.571654736995697},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3902572989463806},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38082417845726013},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3323739171028137},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.07038652896881104},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413146015","title":"Are Images Indistinguishable to Humans Also Indistinguishable to Classifiers?","url":"https://doi.org/10.1109/cvpr52734.2025.02681","published":"2025-06-10","authors":["Zebin You","Xinyu Zhang","Hanzhong Guo","Jingdong Wang","Chongxuan Li"],"abstract":"The ultimate goal of generative models is to perfectly capture the data distribution. For image generation, common metrics of visual quality (e.g., FID) and the perceived truthfulness of generated images seem to suggest that we are nearing this goal. However, through distribution classification tasks, we reveal that, from the perspective of neural network-based classifiers, even advanced diffusion models are still far from this goal. Specifically, classifiers are able to consistently and effortlessly distinguish real images from generated ones across various settings. Moreover, we uncover an intriguing discrepancy: classifiers can easily differentiate between diffusion models with comparable performance (e.g., U-ViTH vs. DiT-XL), but struggle to distinguish between models within the same family but of different scales (e.g., EDM2-XS vs. EDM2-XXL). 
Our methodology carries several importan...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.02681","openalex_id":"https://openalex.org/W4413146015","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Renmin University of China","University of Adelaide"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6075671911239624},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5811032056808472},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3761409819126129},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.37351804971694946}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2503.23733","title":"AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization","url":"http://arxiv.org/abs/2503.23733","published":"2025-06-10","authors":["Yiyang Du","Xiaochen Wang","Chi Chen","Jiabo Ye","Yiru Wang","Peng Li","Ming Yan","Ji Zhang","Fei Huang","Zhifang Sui","Maosong Sun","Yang Liu"],"abstract":"Recently, model merging methods have demonstrated powerful strengths in combining abilities on various tasks from multiple Large Language Models (LLMs). While previous model merging methods mainly focus on merging homogeneous models with identical architecture, they meet challenges when dealing with Multimodal Large Language Models (MLLMs) with inherent heterogeneous property, including differences in model architecture and the asymmetry in the parameter space. 
In this work, we propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles the challenges in three steps: mapping, merging and searching. Specifically, we first design mapping function between models to apply model merging on MLLMs with different architecture. Then we apply linear interpola...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.00879","openalex_id":"https://openalex.org/W4413144377","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Open Source Science Project","Peking University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7801439762115479},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5189954042434692},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4687037169933319},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.43367695808410645},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.09265786409378052}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413146073","title":"AMO Sampler: Enhancing Text Rendering with Overshooting","url":"https://doi.org/10.1109/cvpr52734.2025.01228","published":"2025-06-10","authors":["Xixi Hu","Keyang Xu","Bo Liu","Qiang Liu","Hongliang Fei"],"abstract":"Achieving precise alignment between textual instructions and generated images in text-to-image generation is a significant challenge, 
particularly in rendering written text within images. State-of-the-art models like Stable Diffusion 3 (SD3), Flux, and AuraFlow still struggle with accurate text depiction, resulting in misspelled or inconsistent text. We introduce a training-free method with minimal computational overhead that significantly enhances text rendering quality. Specifically, we introduce an overshooting sampler for pre-trained rectified flow (RF) models, by alternating between over-simulating the learned ordinary differential equation (ODE) and reintroducing noise. Compared to the Euler sampler, the overshooting sampler effectively introduces an extra Langevin dynamics term that can help correct the compounding error from successive Euler steps and therefore improve the text re...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52734.2025.01228","openalex_id":"https://openalex.org/W4413146073","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","The University of Texas at Austin"],"concepts":[{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.6781821250915527},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.663716197013855},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5220398902893066},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38706907629966736}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sws-self-aware-weakness-driven-problem-synthesis-in-reinforcement-learning-for-llm-reasoning","title":"SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning","url":"https://www.microsoft.com/en-us/research/publication/sws-self-aware-weakness-driven-problem-synthesis-in-reinforcement-learning-for-llm-reasoning/","published":"2025-06-09","authors":["Xiao Liang","Zhong-zhi Li","Yeyun Gong","Yang Wang","Hengyuan Zhang","Yelong Shen","Yingchun Wu","Weizhu Chen"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks, such as mathematical problem solving. A prerequisite for the scalability of RLVR is a high-quality problem set with precise and verifiable answers. However, the scarcity of well-crafted human-labeled math problems and limited-verification answers in existing distillation-oriented synthetic datasets limit their effectiveness in RL. Additionally, most problem synthesis strategies indiscriminately expand the problem set without considering the model's capabilities, leading to low efficiency in generating useful questions. To mitigate this issue, we introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation. 
Specifically, we define weaknesses....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01","LLM","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/code-researcher-deep-research-agent-for-large-systems-code-and-commit-history","title":"Code Researcher: Deep Research Agent for Large Systems Code and Commit History","url":"https://www.microsoft.com/en-us/research/publication/code-researcher-deep-research-agent-for-large-systems-code-and-commit-history/","published":"2025-06-09","authors":["Ramneet Singh∗","Sathvik Joel∗","Abhav Mehrotra","Nalin Wadhwa","Ramakrishna Bairi","Aditya Kanade","Nagarajan Natarajan"],"abstract":"ArXiv link: https://arxiv.org/abs/2506.11060 Large Language Model (LLM)-based coding agents have shown promising results on coding benchmarks, but their effectiveness on systems code remains underexplored. Due to the size and complexities of systems code, making changes to a systems codebase is a daunting task, even for humans. It requires researching about many pieces of context, derived from the large codebase and its massive commit history, before making changes. Inspired by the recent progress on deep research agents, we design the first deep research agent for code, called Code Researcher, and apply it to the problem of generating patches for mitigating crashes reported in systems code. 
Code Researcher performs multi-step reasoning about semantics, patterns, and commit history of code to gather sufficient context. The context is stored in a structured memory which is used for synthe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Tech Report","Artificial intelligence","Programming languages and software engineering","LLM","language model","memory","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411150224","title":"MA-FSAR: Multimodal Adaptation of CLIP for few-shot action recognition","url":"https://doi.org/10.1016/j.patcog.2025.111902","published":"2025-06-09","authors":["Jiazheng Xing","Jian Zhao","Chao Xu","Mengmeng Wang","Guang Dai","Yong Liu","Jingdong Wang","Xuelong Li"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.111902","openalex_id":"https://openalex.org/W4411150224","cited_by_count":13,"quality_score":50,"matched_keywords":[],"author_affiliations":["Baidu (China)","China Central Television","Northwestern Polytechnic University","Northwestern Polytechnical University","Robotics Research (United States)","State Grid Corporation of China (China)","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.7037543058395386},{"id":"https://openalex.org/C2780791683","display_name":"Action 
(physics)","score":0.6429985761642456},{"id":"https://openalex.org/C2987834672","display_name":"Action recognition","score":0.6183103919029236},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.548682451248169},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5106678605079651},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.497513085603714},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4161653220653534},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.33549267053604126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4411153558","title":"Measuring Sharpness of AI-Generated Meteorological Imagery","url":"https://doi.org/10.1175/aies-d-24-0083.1","published":"2025-06-09","authors":["Imme Ebert‐Uphoff","Lander Ver Hoef","John S. Schreck","Jason Stock","María J. Molina","Amy McGovern","Michael Yu","Bill Petzke","Kyle Hilburn","David Hall","David John Gagne","William F. Campbell"],"abstract":"Abstract AI-based algorithms are emerging in many meteorological applications that produce imagery as output, including for global weather forecasting models. However, the imagery produced by AI algorithms, especially by convolutional neural networks (CNNs), is often described as too blurry to look realistic, partly because CNNs tend to represent uncertainty as blurriness. This blurriness can be undesirable since it might obscure important meteorological features. More complex AI models, such as Generative AI models, produce images that appear to be sharper. However, improved sharpness may come at the expense of a decline in other performance criteria, such as standard forecast verification metrics. 
To navigate any trade-off between sharpness and other performance metrics it is important to quantitatively assess those other metrics along with sharpness. While there is a rich set of forec...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1175/aies-d-24-0083.1","openalex_id":"https://openalex.org/W4411153558","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Colorado State University","Cooperative Institute for Research in Environmental Sciences","NOAA Oceanic and Atmospheric Research","NSF National Center for Atmospheric Research","National Oceanic and Atmospheric Administration","Nvidia (United States)","University of Maryland, College Park","University of Oklahoma"],"concepts":[{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.5584001541137695},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.40801554918289185},{"id":"https://openalex.org/C39432304","display_name":"Environmental science","score":0.3896084427833557},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.388715922832489},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3288600444793701},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.3215453028678894},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.2505503296852112}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2506.08065","title":"Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions","url":"http://arxiv.org/abs/2506.08065","published":"2025-06-09","authors":["Ye Zhu","Duo Xu","Zhiwei Deng","Jonathan C. 
Tan","Olga Russakovsky"],"abstract":"We study Diffusion Schrödinger Bridge (DSB) models in the context of dynamical astrophysical systems, specifically tackling observational inverse prediction tasks within Giant Molecular Clouds (GMCs) for star formation. We introduce the Astro-DSB model, a variant of DSB with the pairwise domain assumption tailored for astrophysical dynamics. By investigating its learning process and prediction performance in both physically simulated data and in real observations (the Taurus B213 data), we present two main takeaways. First, from the astrophysical perspective, our proposed paired DSB method improves interpretability, learning efficiency, and prediction performance over conventional astrostatistical and other machine learning methods. Second, from the generative modeling perspective, probabilistic generative modeling reveals improvements over discriminative pixel-to-pixel modeling in Out-O...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2506.08065","openalex_id":"https://openalex.org/W4416740373","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Canada Research Chairs","Chalmers University of Technology","DeepMind (United Kingdom)","Google (United Kingdom)","Laboratoire d'Informatique de l'École Polytechnique","Princeton University","University of Toronto","University of Virginia","École Polytechnique"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6148999929428101},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.5648999810218811},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5619999766349792},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic 
logic","score":0.5564000010490417},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5382999777793884},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4839000105857849},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4740999937057495},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.445499986410141}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/songbloom-coherent-song-generation-via-interleaved-autoregressive-sketching-and-diffusion-refinement","title":"SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement","url":"https://www.microsoft.com/en-us/research/publication/songbloom-coherent-song-generation-via-interleaved-autoregressive-sketching-and-diffusion-refinement/","published":"2025-06-08","authors":["Chenyu Yang","Shuai Wang","Hangting Chen","Wei Tan","Jianwei Yu","Haizhou Li","Jianwei Yu"],"abstract":"Generating music with coherent structure, harmonious instrumental and vocal elements remains a significant challenge in song generation. Existing language models and diffusion-based methods often struggle to balance global coherence with local fidelity, resulting in outputs that lack musicality or suffer from incoherent progression and mismatched lyrics. This paper introduces $\\textbf{SongBloom}$, a novel framework for full-length song generation that leverages an interleaved paradigm of autoregressive sketching and diffusion-based refinement. SongBloom employs an autoregressive diffusion model that combines the high fidelity of diffusion models with the scalability of language models. 
Specifically, it gradually extends a musical sketch from short to long and refines the details from coarse to fine-grained. The interleaved generation paradigm effectively integrates prior semantic and aco...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Audio and Acoustics","Graphics and multimedia","Computer science","Engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mira-medical-time-series-foundation-model-for-real-world-health-data","title":"MIRA: Medical Time Series Foundation Model for Real-World Health Data","url":"https://www.microsoft.com/en-us/research/publication/mira-medical-time-series-foundation-model-for-real-world-health-data/","published":"2025-06-08","authors":["Hao Li","Bowen Deng","Chang Xu","Zhiyuan Feng","Viktor Schlegel","Yu-Hao Huang","Yizheng Sun","Jingyuan Sun","Kailai Yang","Yiyao Yu","Jiang Bian"],"abstract":"A unified foundation model for medical time series -- pretrained on open access and ethics board-approved medical corpora -- offers the potential to reduce annotation burdens, minimize model customization, and enable robust transfer across clinical institutions, modalities, and tasks, particularly in data-scarce or privacy-constrained environments. 
However, existing generalist time series foundation models struggle to handle medical time series data due to their inherent challenges, including irregular intervals, heterogeneous sampling rates, and frequent missing values. To address these challenges, we introduce MIRA, a unified foundation model specifically designed for medical time series forecasting. MIRA incorporates a Continuous-Time Rotary Positional Encoding that enables fine-grained modeling of variable time intervals, a frequency-specific mixture-of-experts layer that routes comp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computer science","foundation models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4414406005","title":"A-Core: A Novel Framework of Agentic AI in the 6G Core Network","url":"https://doi.org/10.1109/iccworkshops67674.2025.11162291","published":"2025-06-08","authors":["Wen Tong","Wei Huo","Thierry Lejkin","Joël Penhoat","Chenghui Peng","Carlos Eduardo Pereira","Fei Wang","Shizhou Wu","Yang Lu","Yuanming Shi"],"abstract":"With the rapid advancement of generative artificial intelligence (GenAI), its integration into next-generation wireless networks is poised to address challenges faced by various stakeholders. To facilitate GenAI’s interaction with dynamic environments, agentic AI and multi-agent systems are gaining prominence. 
This paper introduces a novel framework, named A-Core, to integrate agentic AI with 6G core networks (CNs), playing a pivotal role in managing connectivity and generating customized services. A-Core leverages a network-wide CN large model (CN-LM), generative network (GN) instances, and multiple collaborative GenAI-based agents with specialized roles for different stages of service creation to autonomously generate customized services. A case study on smart cities demonstrates A-Core’s potential to improve service delivery efficiency. Finally, we discuss key issues in design and imp...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccworkshops67674.2025.11162291","openalex_id":"https://openalex.org/W4414406005","cited_by_count":1,"quality_score":46,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Huawei Technologies (Canada)","Huawei Technologies (China)","Orange (France)","ShanghaiTech University"],"concepts":[{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6378999948501587},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6308000087738037},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.6219000220298767},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5837000012397766},{"id":"https://openalex.org/C2780378061","display_name":"Service (business)","score":0.4715000092983246},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.36079999804496765},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3529999852180481},{"id":"https://openalex.org/C108037233","display_name":"Wireless 
network","score":0.34450000524520874}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414405895","title":"LLM Prompt Engineering for IEEE 802.11 DCF Optimization","url":"https://doi.org/10.1109/iccworkshops67674.2025.11162196","published":"2025-06-08","authors":["Jiane Zuo","Qiao Lan","Ziyang Guo","Peng Liu","Jian Song"],"abstract":"The emergence of large-language models (LLMs) has catalyzed paradigm shifts in addressing engineering challenges. Although initial attempts have explored LLMs in wireless communications, existing approaches primarily focus on naïve prompts or fine-tuning for specific tasks such as protocol comprehension and resource allocation. In this paper, we introduce the first LLM reasoning framework specifically designed for optimizing the Distributed Coordination Function (DCF) in Wi-Fi networks. Our framework leverages inference-time computing to exploit the reasoning capabilities of LLMs, eliminating the need for task-specific fine-tuning. 
We instantiate our design in the context of low-delay channel access for next-generation Wireless Local Area Networks (WLANs), implementing reasoning pipelines with varying computational costs, including zero-shot, in-context learning (ICL) and chain-of-though...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccworkshops67674.2025.11162196","openalex_id":"https://openalex.org/W4414405895","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (Germany)","Huawei Technologies (United Kingdom)","Huawei Technologies (United States)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7462000250816345},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.732699990272522},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6047000288963318},{"id":"https://openalex.org/C2780385302","display_name":"Protocol (science)","score":0.5284000039100647},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.46389999985694885},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.4221000075340271},{"id":"https://openalex.org/C555944384","display_name":"Wireless","score":0.41850000619888306},{"id":"https://openalex.org/C127162648","display_name":"Channel (broadcasting)","score":0.3921000063419342}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411120769","title":"PHAnToM: Persona-Based Prompting Has an Effect on Theory-of-Mind Reasoning in Large Language 
Models","url":"https://doi.org/10.1609/icwsm.v19i1.35923","published":"2025-06-07","authors":["Gerard Yeo","Fiona Tan An Ting","Kokil Jaidka","Shaz Furniturewala","Wu Fanyou","Weijie Xu","Vinija Jain","Aman Chadha","Yan Liu","See Kiong Ng"],"abstract":"The use of LLMs in natural language reasoning has shown mixed results, sometimes rivaling or even surpassing human performance in simpler classification tasks while struggling with social-cognitive reasoning, a domain where humans naturally excel. These differences have been attributed to many factors, such as variations in prompting and the specific LLMs used. However, no reasons appear conclusive, and no clear mechanisms have been established in prior work. In this study, we empirically evaluate how role-playing persona-based prompting influences Theory-of-Mind (ToM) reasoning capabilities. Grounding our research in psychological theory, we found that, beyond the inherent variance in the complexity of reasoning tasks, ToM performance differences arise because of socially-motivated prompting differences. 
In an era where prompt engineering with role-play is a typical approach to adapt LL...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/icwsm.v19i1.35923","openalex_id":"https://openalex.org/W4411120769","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","Amazon (United States)","Birla Institute of Technology and Science, Pilani","National University of Singapore","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C313442","display_name":"Persona","score":0.7731675505638123},{"id":"https://openalex.org/C104293457","display_name":"Imaging phantom","score":0.7047714591026306},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4776429235935211},{"id":"https://openalex.org/C2779560602","display_name":"Theory of mind","score":0.4426553249359131},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.41876330971717834},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.40097132325172424},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.3911295235157013},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3740028738975525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"bytedance-seed:282","title":"BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning","url":"https://seed.bytedance.com/en/research/brite-bootstrapping-reinforced-thinking-process-to-enhance-language-model-reasoning","published":"2025-06-06","authors":["Han Zhong","Yutong Yin","Shenao Zhang","Xiaojun Xu","Yuanxin Liu","Yifei Zuo","Zhihan Liu","Boyi Liu","Sirui Zheng","Hongyi Guo","Liwei 
Wang","Mingyi Hong"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps. First, it generates high-quality rationales by approximating the optimal thinking process through reinforcement learning, using a novel reward shaping mechanism. Second, it enhances the base LLM by maximizing the joint probability of rationale generation with respect to the model’s parameters. Theoretically, we demonstrate BRiTE’s convergence at a rate of 1/T with T representing the number of iterations. Empirical evaluati...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Machine Learning","Responsible AI","ICML 2025","LLM","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:264","title":"Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning","url":"https://seed.bytedance.com/en/research/astra-toward-general-purpose-mobile-robots-via-hierarchical-multimodal-learning","published":"2025-06-06","authors":["Sheng Chen","Peiyu He","Jiaxin Hu","Ziyang Liu","Yansheng Wang","Tao Xu","Chi Zhang","Chongchong Zhang","Chao An","Shiyu Cai","Duo Cao","Kangping Chen"],"abstract":"Modern robot 
navigation systems encounter difficulties in diverse and complex indoor environments. Traditional approaches rely on multiple modules with small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture, Astra-Global and Astra-Local, for mobile robot navigation. Astra-Global, a multimodal LLM, processes vision and language inputs to perform self and goal localization using a hybrid topological-semantic graph as the global map, and outperforms traditional visual place recognition methods. Astra-Local, a multitask network, handles local path planning and odometry estimation. Its 4D spatial-temporal encoder, trained through self-supervised learning, generates robust 4D features for downstream tasks. The planning head utilizes flow matching and a novel masked ESDF loss to minimize col...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Robotics","arXiv","LLM"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4411086771","title":"The Synergy Between Data and Multi-Modal Large Language Models: A Survey From Co-Development Perspective","url":"https://doi.org/10.1109/tpami.2025.3576835","published":"2025-06-06","authors":["Zhen Qin","Daoyuan Chen","Wenhao Zhang","Liuyi Yao","Yilun Huang","Bolin Ding","Yaliang Li","Shuiguang Deng"],"abstract":"Recent years have witnessed the rapid development of large language models (LLMs). 
Multi-modal LLMs (MLLMs) extend modality from text to various domains, attracting widespread attention due to their diverse application scenarios. As LLMs and MLLMs rely on vast amounts of model parameters and data to achieve emergent capabilities, the importance of data is gaining increasing recognition. Reviewing recent data-driven works for MLLMs, we find that the development of models and data is not two separate paths but rather interconnected. Vaster and higher-quality data improve MLLM performance, while MLLMs, in turn, facilitate the development of data. The co-development of multi-modal data and MLLMs requires a clear view of 1) at which development stages of MLLMs specific data-centric approaches can be employed to enhance certain MLLM capabilities, and 2) how MLLMs, using these capabilities, can...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3576835","openalex_id":"https://openalex.org/W4411086771","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6835733652114868},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.6807107329368591},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5437877774238586},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5395821928977966},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4506344199180603},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.4274023175239563},{"id":"https://openalex.org/C115903868","display_name":"Software 
engineering","score":0.14309647679328918},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4414182345","title":"SSD-Based Intelligent Invigilation System for Multi-Modal Behavioral Features in Examination Scenarios","url":"https://doi.org/10.1145/3757749.3757782","published":"2025-06-06","authors":["Xinpeng Gao","Shengming Zhang","Run Chen","Weinan Wang","Jiajin Cao","Ningyuan Xu","Ran Tao","Zhihong Pan"],"abstract":"Against the background of low efficiency and lack of accuracy of traditional invigilation methods, this paper proposes a solution to design and implement an intelligent invigilation system to improve the level of examination invigilation intelligence. The system includes three user roles: administrators, invigilators, and candidates. Its functional modules include user management, data acquisition, and preprocessing. The core algorithm adopts a Single Shot MultiBox Detector(SSD) target detection algorithm, which is used to analyze and judge the abnormal behaviors of the candidates intelligently. 
The experimental results show that this intelligent invigilation system solves the problems of traditional invigilation well and effectively improves the intelligence level and management efficiency of exam invigilation.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3757749.3757782","openalex_id":"https://openalex.org/W4414182345","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Anhui Xinhua University","Guangdong Polytechnic Normal University","Guangzhou University","Huawei Technologies (China)","Jilin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6840999722480774},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5202999711036682},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.48910000920295715},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.44179999828338623},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4002000093460083},{"id":"https://openalex.org/C58328972","display_name":"Expert system","score":0.35899999737739563},{"id":"https://openalex.org/C56397880","display_name":"Intelligent decision support system","score":0.35679998993873596},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3174000084400177}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7143693055","title":"Overview of the NTCIR-18 Lifelog-6 Task","url":"https://repository.nii.ac.jp/records/2002046","published":"2025-06-06","authors":["Liting Zhou","Cathal Gurrin","Hsin-Hung Chen","Hideo Joho","Chenyang 
Lyu","Longyue Wang","Graham Healy","Ly Duyen Tran","Quang-Linh Tran","Hoang Bao Le","Duc-Tien Dang-Nguyen","Tianbo Ji"],"abstract":"NTCIR-18 marked the sixth iteration of the Lifelog task, which aims to advance research on multimodal lifelog organization, search, and access. This task builds on methodologies successfully deployed in previous NTCIR conferences. In this paper, we detail the test collection, outline the specific tasks, provide an overview of submissions, and present findings from the NTCIR-18 Lifelog-6 task. We conclude with recommendations for future developments in lifelog research.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.20736/0002002046","openalex_id":"https://openalex.org/W7143693055","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Dublin City University","Høyskolen Kristiania","Nantong University","University College Dublin","University of Tsukuba"],"concepts":[{"id":"https://openalex.org/C176168674","display_name":"Lifelog","score":0.9677000045776367},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7361000180244446},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7164999842643738},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.43689998984336853},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3734000027179718},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.3474000096321106},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3361000120639801},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.3001999855041504}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"bytedance-seed:257","title":"SeedEdit 3.0: Fast and High-Quality Generative Image Editing","url":"https://seed.bytedance.com/en/research/seededit-3-0-fast-and-high-quality-generative-image-editing","published":"2025-06-05","authors":["Peng Wang","Yichun Shi","Xiaochen Lian","Zhonghua Zhai","Xin Xia","Xuefeng Xiao","Weilin Huang","Jianchao Yang"],"abstract":"We introduce SeedEdit 3.0, in companion with our T2I model Seedream 3.0, which significantly improves over our previous SeedEdit versions in both aspects of edit instruction following and image content (e.g., ID/IP) preservation on real image inputs. Additional to model upgrading with T2I, in this report, we present several key improvements. First, we develop an enhanced data curation pipeline with a meta-info paradigm and meta-info embedding strategy that help mix images from multiple data sources. This allows us to scale editing data effectively, and meta information is helpfult to connect VLM with diffusion model more closely. Second, we introduce a joint learning pipeline for computing a diffusion loss and reward losses. 
Finally, we evaluate SeedEdit 3.0 on our testing benchmarks, for real/synthetic image editing, where it achieves a best trade-off between multiple aspects, yielding....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:y814uhc3y3fapjqkb7ybb9pl","title":"Beyond Text Compression: Evaluating Tokenizers Across Scales","url":"https://machinelearning.apple.com/research/beyond-text-compression","published":"2025-06-05","authors":["Jonas F. Lotz","António Vilarinho Lopes","Stephan Peitz","Hendra Setiawan","Leonardo Emili"],"abstract":"Tokenizer design significantly impacts language model performance, yet evaluating tokenizer quality remains challenging. While text compression has emerged as a common intrinsic metric, recent work questions its reliability as a quality indicator. We investigate whether evaluating tokenizers on smaller models (350M parameters) reliably predicts their impact at larger scales (2.7B parameters). 
Through experiments with established tokenizers from...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["language model","compression"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:f6316575663269c2","title":"Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models","url":"https://qwenlm.github.io/blog/qwen3-embedding/","published":"2025-06-05","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DISCORDWe release Qwen3 Embedding series, a new proprietary model of the Qwen model family. These models are specifically designed for text embedding, retrieval, and reranking tasks, built on the Qwen3 foundation model. Leveraging Qwen3’s robust multilingual text understanding capabilities, the series achieves state-of-the-art performance across multiple benchmarks for text embedding and reranking tasks. 
We have open-sourced this series of text embedding and reranking models under the Apache 2.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"apple:j9s286j6ahbodjronz1ckpux","title":"Improve Vision Language Model Chain-of-thought Reasoning","url":"https://machinelearning.apple.com/research/chain-of-thought","published":"2025-06-05","authors":["Ruohong Zhang","Bowen Zhang","Yanghao Li","Haotian Zhang","Zhiqing Sun","Zhe Gan","Yinfei Yang","Ruoming Pang","Yiming Yang"],"abstract":"Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes often relying on datasets dominated by short annotations with minimal rationales. In this work, we show that training VLM on short answers leads to poor generalization on reasoning tasks that require more detailed explanations. 
To address this limitation, we propose a two-stage...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:xf6wnk9jzkycj8ntmdfiwjms","title":"Proxy-FDA: Proxy-Based Feature Distribution Alignment for Fine-Tuning Vision Foundation Models Without Forgetting","url":"https://machinelearning.apple.com/research/proxy-fda","published":"2025-06-05","authors":["Chen Huang","Skyler Seto","Hadi Pouransari","Mehrdad Farajtabar","Raviteja Vemulapalli","Fartash Faghri","Oncel Tuzel","Barry-John Theobald","Josh Susskind"],"abstract":"Vision foundation models pre-trained on massive data encode rich representations of real-world concepts, which can be adapted to downstream tasks by fine-tuning. However, fine-tuning foundation models on one task often leads to the issue of concept forgetting on other tasks. Recent methods of robust fine-tuning aim to mitigate forgetting of prior knowledge without affecting the fine-tuning performance. 
Knowledge is often preserved by matching the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4411107668","title":"Generanno: A Genomic Foundation Model for Metagenomic Annotation","url":"https://doi.org/10.1101/2025.06.04.656517","published":"2025-06-05","authors":["Q. Li","Wei Wu","Yiheng Zhu","Fuli Feng","Jieping Ye","Zheng Wang"],"abstract":"The rapid growth of genomic and metagenomic data has underscored the pressing need for advanced computational tools capable of deciphering complex biological sequences. In this study, we introduce Generanno, a compact yet powerful genomic foundation model (GFM) specifically optimized for metagenomic annotation. Trained on an extensive dataset comprising 715 billion base pairs (bp) of prokaryotic DNA, Generanno employs a transformer encoder architecture with 500 million parameters, enabling bidirectional attention over sequences up to 8192 bp at single-nucleotide resolution. This design addresses key limitations of existing methods, including the inability of traditional Hidden Markov Models (HMMs) to handle fragmented DNA sequences from multi-species microbial communities, as well as the suboptimal tokenization schemes of existing GFMs that compromise fine-grained analysis. 
A...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.06.04.656517","openalex_id":"https://openalex.org/W4411107668","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C15151743","display_name":"Metagenomics","score":0.8853660821914673},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.720862627029419},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7173426151275635},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.5173863172531128},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43743857741355896},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3430684804916382},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.32129737734794617},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.3189937472343445}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/slm-s2st-a-multimodal-language-model-for-direct-speech-to-speech-translation","title":"SLM-S2ST: A multimodal language model for direct speech-to-speech translation","url":"https://www.microsoft.com/en-us/research/publication/slm-s2st-a-multimodal-language-model-for-direct-speech-to-speech-translation/","published":"2025-06-04","authors":["Yuxuan Hu","Haibin Wu","Ruchao Fan","Xiaofei Wang","Heng Lu","Yao Qian","Jinyu Li"],"abstract":"Speech-aware language models (LMs) have demonstrated capabilities in understanding 
spoken language while generating text-based responses. However, enabling them to produce speech output efficiently and effectively remains a challenge. In this paper, we present SLM-S2ST, a multimodal LM for direct speech-to-speech translation (S2ST), built on the open-source Phi4-MM model. SLM-S2ST extends its predecessor by generating translated speech using an audio transformer head that predicts audio tokens with a delay relative to text tokens, followed by a streaming vocoder for waveform synthesis. Our experimental results on the CVSS-C dataset demonstrate SLM-S2ST’s superior performance, significantly surpassing existing baseline models trained on the same dataset. Furthermore, when we scale up the training data and the model size, SLM-S2ST reaches on-par performance with the current SOTA model.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Audio and Acoustics","Human language technologies","Audio and Speech Processing","Engineering","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:291","title":"Sounding that Object: Interactive Object-Aware Image to Audio Generation","url":"https://seed.bytedance.com/en/research/sounding-that-object-interactive-object-aware-image-to-audio-generation","published":"2025-06-04","authors":["Tingle Li","Baihe Huang","Xiaobin Zhuang","Dongya Jia","Jiawei Chen","Yuping Wang","Zhuo Chen","Gopala Anumanchipalli","Yuxuan Wang"],"abstract":"Generating accurate sounds for complex audiovisual scenes is challenging, especially in the 
presence of multiple objects and sound sources. In this paper, we propose an interactive object-aware audio generation model that grounds sound generation in user-selected visual objects within images. Our method integrates object-centric learning into a conditional latent diffusion model, which learns to associate image regions with their corresponding sounds through multi-modal attention. At test time, our model employs image segmentation to allow users to interactively generate sounds at the object level. We theoretically validate that our attention mechanism functionally approximates test-time segmentation masks, ensuring the generated audio aligns with selected objects. Quantitative and qualitative evaluations show that our model outperforms baselines, achieving better alignment between objec...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Speech","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"arxiv:2506.03524","title":"Seed-Coder: Let the Code Model Curate Data for Itself","url":"https://huggingface.co/papers/2506.03524","published":"2025-06-04","authors":["ByteDance Seed","Yuyu Zhang","Jing Su","Yifan Sun","Chenguang Xi","Xia Xiao","Shen Zheng","Anxiang Zhang","Kaibo Liu","Daoguang Zan","Tao Sun","Jinhua Zhu"],"abstract":"Code data in large language model (LLM) pretraining is recognized crucial not only for code-related tasks but also for enhancing general intelligence of LLMs. 
Current open-source LLMs often heavily rely on human effort to produce their code pretraining data, such as employing hand-crafted filtering rules tailored to individual programming languages, or using human-annotated data to train quality filters. However, these approaches are inherently limited in scalability, prone to subjective biases, and costly to extend and maintain across diverse programming languages. To address these challenges, we introduce Seed-Coder, a series of open-source LLMs comprising base, instruct and reasoning models of 8B size, minimizing human involvement in data construction. Our code pretraining data is produced by a model-centric data pipeline, which predominantly leverages LLMs for scoring and filtering c...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":39,"matched_keywords":["LLM","language model","preference"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4411041472","title":"Advancing Conversational Diagnostic AI with Multimodal Reasoning","url":"https://doi.org/10.21203/rs.3.rs-6713863/v1","published":"2025-06-04","authors":["Ryutaro Tanno","Khaled Saab","Jan Freyberg","Chunjong Park","Tim Strother","Yong Cheng","Wei‐Hung Weng","David G. T. Barrett","David Stutz","Nenad Tomašev","Anil Palepu","Valentin Liévin"],"abstract":"Real-world clinical practice is inherently multimodal, relying on the synthesis of patient history with visual information such as medical imagery and clinical documents. 
Although large language models (LLMs) have shown promise in diagnostic dialogue, their evaluation has been largely restricted to text-only interactions, failing to capture the complexity of modern remote care delivery. Here we introduce a multimodal extension of the Articulate Medical Intelligence Explorer (multimodal AMIE), capable of gathering, interpreting and reasoning about multimodal data within a diagnostic conversation. To achieve this, we developed a state-aware dialogue framework that dynamically guides history-taking based on diagnostic uncertainty and evolving patient states, emulating the structured reasoning of experienced clinicians. We evaluated this updated, state-aware version of multimodal AMIE agains...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-6713863/v1","openalex_id":"https://openalex.org/W4411041472","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Google (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5402199625968933},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43299245834350586},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4117928445339203},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3928956389427185},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.27205705642700195}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4411053664","title":"Generative Migration Architectures: Accelerating Cloud-Native Data 
Integration Through AI Orchestration","url":"https://doi.org/10.32996/jcsts.2025.7.5.79","published":"2025-06-04","authors":["Ritesh Sinha"],"abstract":"The integration of artificial intelligence into cloud migration frameworks represents a paradigm shift in data engineering practices across enterprise ecosystems. Generative AI models embedded within migration toolchains demonstrate exceptional capability in predicting schema inconsistencies and autonomously resolving structural disparities between heterogeneous data sources. Serverless architectures leveraging event-driven processing create adaptable migration pipelines that dynamically scale with workload intensity, effectively eliminating traditional bottlenecks. The evolution toward AI-augmented migration provides measurable advantages in regulatory compliance through automated data classification and lineage tracking. Performance benchmarking mechanisms intrinsic to these frameworks enable continuous optimization of cloud resource allocation throughout the migration lifecycle. 
Emerg...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.32996/jcsts.2025.7.5.79","openalex_id":"https://openalex.org/W4411053664","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C199168358","display_name":"Orchestration","score":0.9140099287033081},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.738425076007843},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6611535549163818},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.642345666885376},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32760465145111084},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.1415334939956665},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.07435321807861328},{"id":"https://openalex.org/C153349607","display_name":"Visual arts","score":0.06042301654815674}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gui-actor-coordinate-free-visual-grounding-for-gui-agents","title":"GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents","url":"https://www.microsoft.com/en-us/research/publication/gui-actor-coordinate-free-visual-grounding-for-gui-agents/","published":"2025-06-03","authors":["Qianhui Wu","Kanzhi Cheng","Rui Yang","Chaoyun Zhang","Jianwei Yang","Huiqiang Jiang","Jian Mu","Baolin Peng","Bo Qiao","Reuben Tan","Si Qin","Lars Liden"],"abstract":"One of the principal challenges in building VLM-powered GUI agents is visual grounding—localizing the 
appropriate screen region for action execution based on both the visual content and the textual plans. Most existing work formulates this as a text-based coordinate generation task. However, these approaches suffer from several limitations: weak spatial-semantic alignment due to the lack of explicit spatial supervision; inability to handle ambiguous supervision targets, as single-point predictions penalize valid variations; and a mismatch between the dense nature of screen coordinates and the coarse, patch-level granularity of visual features extracted by models like Vision Transformers. In this paper, we propose GUI-Actor, a VLM-based method for coordinate-free GUI grounding. At its core, GUI-Actor introduces an attention-based action head that learns to align a dedicated token with all...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Human language technologies","AI agents","Deep learning","Multimodal Large Language Models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:q44h0rrsphi3j32p5ojs0l6g","title":"Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection","url":"https://machinelearning.apple.com/research/prompting-whisper","published":"2025-06-03","authors":["Griffin Dietz Smith","Dianna Yee","Jennifer King Chen","Leah Findlater"],"abstract":"Equal 
Contributors","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4410991821","title":"MMoFusion: Multi-modal co-speech motion generation with diffusion model","url":"https://doi.org/10.1016/j.patcog.2025.111774","published":"2025-06-03","authors":["Sen Wang","Jiangning Zhang","Xin Tan","Zhifeng Xie","Chengjie Wang","Lizhuang Ma"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.111774","openalex_id":"https://openalex.org/W4410991821","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["East China Normal University","Shanghai Jiao Tong University","Shanghai University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6896480321884155},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5939455628395081},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5919091701507568},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4539952874183655},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.41330116987228394},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3391512334346771},{"id":"https://openalex.org/C24890656","display_name":"Acoustics","score":0.3382720351219177},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.14696133136749268}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4410993406","title":"Pisces: A multi-modal data augmentation approach for drug combination synergy prediction","url":"https://doi.org/10.1016/j.xgen.2025.100892","published":"2025-06-03","authors":["Hanwen Xu","Jiacheng Lin","Addie Woicik","Zixuan Liu","Jianzhu Ma","Sheng Zhang","Hoifung Poon","Liewei Wang","Sheng Wang"],"abstract":"Drug combination therapy is promising for cancer treatment by reducing resistance and improving efficacy. Machine learning approaches to predicting drug combinations require massive training data. Here, we propose Pisces, a novel machine learning approach for drug combination synergy prediction. The key idea is to augment the sparse dataset by creating multiple views for each drug combination based on different modalities. We combined eight modalities of a drug to create 64 augmented views. By treating each augmented view as a separate instance, Pisces can process any number of drug modalities, circumventing the issue of missing modality. Pisces obtained state-of-the-art results on cell-line-based and xenograft-based drug synergy predictions and drug-drug interaction prediction. 
By interpreting Pisces's predictions using a genetic interaction network, we identified a breast cancer drug-s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.xgen.2025.100892","openalex_id":"https://openalex.org/W4410993406","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Mayo Clinic in Arizona","Microsoft (United States)","Tsinghua University","University of Illinois Urbana-Champaign","University of Washington"],"concepts":[{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.7196105122566223},{"id":"https://openalex.org/C2780035454","display_name":"Drug","score":0.6930801272392273},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6286287307739258},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5708970427513123},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.546349823474884},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5280448198318481},{"id":"https://openalex.org/C2994372470","display_name":"Cancer cell lines","score":0.43936699628829956},{"id":"https://openalex.org/C103637391","display_name":"Drug repositioning","score":0.4219932556152344}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4410964376","title":"Survey on Factuality in Large Language Models","url":"https://doi.org/10.1145/3742420","published":"2025-06-02","authors":["Cunxiang Wang","Xiaoze Liu","Yuanhao Yue","Qipeng Guo","Xiangkun Hu","Xiangru Tang","Tianhang Zhang","Cheng Jiayang","Yunzhi Yao","Xuming Hu","Zehan Qi","Wenyang Gao"],"abstract":"This survey addresses the 
crucial issue of factuality in Large Language Models (LLMs). As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital. We define the “factuality issue” as the probability of LLMs to produce content inconsistent with established facts. We first delve into the implications of these inaccuracies. Subsequently, we analyze the mechanisms through which LLMs store and process facts, seeking the primary causes of factual errors. Our discussion then transitions to methodologies for evaluating LLM factuality, emphasizing key metrics, benchmarks, and studies. We further explore strategies for enhancing LLM factuality. Our survey offers a structured guide for researchers aiming to fortify the factual reliability of LLMs. We consistently maintain and update the related open-source materials at https://github.com/wangcunxian...","companies":["Microsoft","Amazon"],"matched_orgs":["Microsoft","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1145/3742420","openalex_id":"https://openalex.org/W4410964376","cited_by_count":19,"quality_score":72,"matched_keywords":["LLM"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","Fudan University","Hong Kong University of Science and Technology","Microsoft (United States)","Microsoft Research (United Kingdom)","Purdue University West Lafayette","Seattle University","ShangHai JiAi Genetics & IVF Institute","Shanghai Artificial Intelligence Laboratory","Shanghai Innovative Research Center of Traditional Chinese Medicine","Tsinghua University","Westlake University","William & Mary","Williams (United States)","Yale University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8935166001319885},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4959631860256195},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4828560948371887}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"hf-org-paper:tencent:2506.01413","title":"Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models","url":"https://huggingface.co/papers/2506.01413","published":"2025-06-02","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","tencent"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"openalex:W4413918290","title":"Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach","url":"https://doi.org/10.1109/mipro65660.2025.11131790","published":"2025-06-02","authors":["Tvrtko Sternak","Davor Runje","Dorian Granoša","Chi Wang"],"abstract":"This paper introduces a novel framework for evaluating the security of large language models (LLMs) against prompt leakage-the exposure of system-level prompts or proprietary configurations-which we identify as a critical threat to secure LLM deployment. Leveraging a multi-agent system implemented using AG2 (formerly AutoGen), we design agentic teams tasked with probing and exploiting the target LLM to elicit its prompt. 
Inspired by cryptographic principles, we define a prompt leakage-safe system as one in which an attacker cannot distinguish between two agents: one initialized with an original prompt and the other with a prompt stripped of sensitive information. In such a system, the agents' outputs are indistinguishable, ensuring sensitive information remains secure. This framework establishes a rigorous standard for evaluating and designing secure LLMs, bridging the gap between automa...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mipro65660.2025.11131790","openalex_id":"https://openalex.org/W4413918290","cited_by_count":2,"quality_score":51,"matched_keywords":["LLM","agent","multi-agent"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","SPX Corporation (United States)","University of Zagreb"],"concepts":[{"id":"https://openalex.org/C2777042071","display_name":"Leakage (economics)","score":0.6582029461860657},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6475179195404053},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.37716424465179443},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3392292559146881},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32805711030960083},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.0},{"id":"https://openalex.org/C139719470","display_name":"Macroeconomics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4410942332","title":"Simulating Subjects: The Promise and Peril of Artificial Intelligence Stand-Ins for Social 
Agents and Interactions","url":"https://doi.org/10.1177/00491241251337316","published":"2025-06-02","authors":["Austin C. Kozlowski","James A. Evans"],"abstract":"Large language models (LLMs), through their exposure to massive collections of online text, learn to reproduce the perspectives and linguistic styles of diverse social and cultural groups. This capability suggests a powerful social scientific application—the simulation of empirically realistic, culturally situated human subjects. Synthesizing recent research in artificial intelligence and computational social science, we outline a methodological foundation for simulating human subjects and their social interactions. We then identify six characteristics of current models that are likely to impair the realistic simulation of human subjects: bias, uniformity, atemporality, disembodiment, linguistic cultures, and alien intelligence. For each of these areas, we discuss promising approaches for overcoming their associated shortcomings. 
Given the rate of change of these models, we advocate for....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1177/00491241251337316","openalex_id":"https://openalex.org/W4410942332","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Google (United States)","Santa Fe Institute","University of Chicago"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5262241363525391},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.49202772974967957},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.39111045002937317},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.37979745864868164}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/self-reflecting-large-language-models-a-hegelian-dialectical-approach-2","title":"Self-reflecting Large Language Models: A Hegelian Dialectical Approach","url":"https://www.microsoft.com/en-us/research/publication/self-reflecting-large-language-models-a-hegelian-dialectical-approach-2/","published":"2025-06-01","authors":["Sara Abdali","Can Goksen","Michael Solodko","Saeed Amizadeh","Julie E. Maybee","Kazuhito Koishida"],"abstract":"Investigating NLP through a philosophical lens has recently caught researchers’ eyes, as it bridges computational methods with classical schools of philosophy. 
This paper introduces a philosophical framework inspired by the Hegelian Dialectic to enable LLMs’ self-reflection, utilizing a self-dialectical approach to emulate internal critiques and synthesize new scientific ideas (spanning domains such as mathematics, physics, and more). Additionally, we explore the effect of generation temperature in LLMs by introducing a dynamic annealing approach, which encourages creativity in the early stages and gradually focuses on refinement and nuance, as well as a constant temperature strategy. Furthermore, we implement a Multi-Agent Majority Voting (MAMV) strategy to assess the validity and novelty of the generated ideas, which proves useful in the absence of domain experts. We also evaluate the...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Article (Journal)","Artificial intelligence","Human language technologies","Mathematics","Social sciences","Generative AI","NLP","Human-computer interaction","Computer science","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/what-does-success-look-like-catalyzing-meeting-intentionality-with-ai-assisted-prospective-reflection","title":"What Does Success Look Like?
Catalyzing Meeting Intentionality with AI-Assisted Prospective Reflection","url":"https://www.microsoft.com/en-us/research/publication/what-does-success-look-like-catalyzing-meeting-intentionality-with-ai-assisted-prospective-reflection/","published":"2025-06-01","authors":["Ava Elizabeth Scott","Lev Tankelevitch","Payod Panda","Rishi Vanukuru","Xinyue Chen","Sean Rintel"],"abstract":"Despite decades of HCI and Meeting Science research, complaints about ineffective meetings are still pervasive. We argue that meeting technologies lack support for prospective reflection, that is, thinking about why a meeting is needed and what might happen. To explore this, we designed a Meeting Purpose Assistant (MPA) technology probe to coach users to articulate their meeting’s purpose and challenges, and act accordingly. The MPA used Generative AI to support personalized and actionable prospective reflection across the diversity of meeting contexts. Using a participatory prompting methodology, 18 employees of a global technology company reflected with the MPA on upcoming meetings. Observed impacts were: clarifying meeting purposes, challenges, and success conditions; changing perspectives and flexibility; improving preparation and communication; and proposing changed plans. 
We also i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Human–computer interaction","1970-01-01","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-textual-gradients-via-sampling-based-momentum","title":"Scaling Textual Gradients via Sampling-Based Momentum","url":"https://www.microsoft.com/en-us/research/publication/scaling-textual-gradients-via-sampling-based-momentum/","published":"2025-06-01","authors":["Zixin Ding","Junyuan Hong","Jiachen T. Wang","Zinan Lin","Zhangyang Wang","Yuxin Chen"],"abstract":"As prompts play an increasingly critical role in large language models (LLMs), optimizing textual prompts has become a crucial challenge. The Textual Gradient Descent (TGD) framework has emerged as a promising data-driven approach that iteratively refines textual prompts using LLM-suggested updates (or textual gradients) over minibatches of training samples. In this paper, we empirically demonstrate that scaling the number of training examples initially improves but later degrades TGD's performance across multiple downstream NLP tasks. However, while data scaling improves results for most tasks, it also significantly increases the computational cost when leveraging LLMs.
To address this, we draw inspiration from numerical gradient descent and propose Textual Stochastic Gradient Descent with Momentum (TSGD-M) - a method that facilitates scalable in-context learning by reweighting prompt...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3786335.3813168","openalex_id":"https://openalex.org/W4414889978","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","Machine learning","1970-01-01","LLM"],"author_affiliations":["Microsoft","Microsoft (United States)","Princeton University","San Francisco Symphony","Santa Clara University","The University of Texas at Austin","University of Chicago"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reason-before-retrieve-one-stage-reflective-chain-of-thoughts-for-training-free-zero-shot-composed-image-retrieval","title":"Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval","url":"https://www.microsoft.com/en-us/research/publication/reason-before-retrieve-one-stage-reflective-chain-of-thoughts-for-training-free-zero-shot-composed-image-retrieval/","published":"2025-06-01","authors":["Yuanmin Tang","Xiaoting Qin","Jue Zhang","Jing Yu","Gaopeng Gou","Gang Xiong","Qingwei Lin 林庆维","Saravan Rajmohan","Dongmei Zhang","Qi Wu"],"abstract":"Composed Image Retrieval (CIR) aims to retrieve target images that closely resemble a reference image while integrating user-specified textual modifications, thereby capturing user intent more 
precisely. Existing training-free zero-shot CIR (ZS-CIR) methods often employ a two-stage process: they first generate a caption for the reference image and then use Large Language Models for reasoning to obtain a target description. However, these methods suffer from missing critical visual details and limited reasoning capabilities, leading to suboptimal retrieval performance. To address these challenges, we propose a novel, training-free one-stage method, One-Stage Reflective Chain-of-Thought Reasoning for ZS-CIR (OSrCIR), which employs Multimodal Large Language Models to retain essential visual information in a single-stage reasoning process, eliminating the information loss seen in two-stage m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/epfl-smart-kitchen-30-densely-annotated-cooking-dataset-with-3d-kinematics-to-challenge-video-and-language-models","title":"EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models","url":"https://www.microsoft.com/en-us/research/publication/epfl-smart-kitchen-30-densely-annotated-cooking-dataset-with-3d-kinematics-to-challenge-video-and-language-models/","published":"2025-06-01","authors":["Andy Bonnetto","Haozhe Qi","Franklin Leong","Matea Tashkovska","Mahdi Rad","S. Shokur","Friedhelm Hummel","S. 
Micera","Marc Pollefeys","Alexander Mathis"],"abstract":"Understanding behavior requires datasets that capture humans while carrying out complex tasks. The kitchen is an excellent environment for assessing human motor and cognitive function, as many complex actions are naturally exhibited in kitchens from chopping to cleaning. Here, we introduce the EPFL-Smart-Kitchen-30 dataset, collected in a noninvasive motion capture platform inside a kitchen environment. Nine static RGB-D cameras, inertial measurement units (IMUs) and one head-mounted HoloLens 2 headset were used to capture 3D hand, body, and eye movements. The EPFL-Smart-Kitchen-30 dataset is a multi-view action dataset with synchronized exocentric, egocentric, depth, IMUs, eye gaze, body and hand kinematics spanning 29.7 hours of 16 subjects cooking four different recipes. Action sequences were densely annotated with 33.78 action segments per minute. Leveraging this multi-modal dataset,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Biology","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/when-testing-ai-tests-us-safeguarding-mental-health-on-the-digital-frontlines","title":"When Testing AI Tests Us: Safeguarding Mental Health on the Digital
Frontlines","url":"https://www.microsoft.com/en-us/research/publication/when-testing-ai-tests-us-safeguarding-mental-health-on-the-digital-frontlines/","published":"2025-06-01","authors":["Sachin R. Pendse","Darren Gergle","Rachel Kornfield","Jonah Meyerhoff","David Mohr","Jina Suh","Annie Wescott","Casey Williams","Jessica Schleider"],"abstract":"The systematic testing of generative artificial intelligence (AI) models by collaborative teams and distributed individuals, often called red-teaming, is a core part of the infrastructure that ensures that AI models do not produce harmful content. Unlike past technologies, the black box nature of generative AI systems necessitates a uniquely interactional mode of testing, one in which individuals on red teams actively interact with the system, leveraging natural language to simulate malicious actors and solicit harmful outputs. This interactional labor done by red teams can result in mental health harms that are uniquely tied to the adversarial engagement strategies necessary to effectively red team. 
The importance of ensuring that generative AI models do not propagate societal or individual harm is widely recognized—one less visible foundation of end-to-end AI safety is also the protect...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-elements-to-design-a-layered-approach-for-automatic-graphic-design-composition","title":"From Elements to Design: A Layered Approach for Automatic Graphic Design Composition","url":"https://www.microsoft.com/en-us/research/publication/from-elements-to-design-a-layered-approach-for-automatic-graphic-design-composition/","published":"2025-06-01","authors":["Jiawei Lin","Shizhao Sun","Danqing Huang","Ting Liu","Ji Li","Jiang Bian"],"abstract":"In this work, we investigate automatic design composition from multimodal graphic elements. Although recent studies have developed various generative models for graphic design, they usually face the following limitations: they only focus on certain subtasks and are far from achieving the design composition task; they do not consider the hierarchical information of graphic designs during the generation process. To tackle these issues, we introduce the layered design principle into Large Multimodal Models (LMMs) and propose a novel approach, called LaDeCo, to accomplish this challenging task. 
Specifically, LaDeCo first performs layer planning for a given element set, dividing the input elements into different semantic layers according to their contents. Based on the planning results, it subsequently predicts element attributes that control the design composition in a layer-wise manner, and...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-search-engines-to-action-engines","title":"From Search Engines to Action Engines","url":"https://www.microsoft.com/en-us/research/publication/from-search-engines-to-action-engines/","published":"2025-06-01","authors":["Suman Nath","Ryen W. White","Fazle Faisal","Morris Sharp","R. Gruen","Lenin Ravindranath Sivalingam"],"abstract":"With generative artificial intelligence (AI), there is progress in moving from search results to AI-generated answers that synthesize and summarize content. 
Research on AI agents and artificial capable intelligence aims to reach the next frontier in information access: task completion.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/mc.2025.3556643","openalex_id":"https://openalex.org/W4410639283","cited_by_count":1,"quality_score":65,"matched_keywords":["Article (Journal)","Systems and networking","Computer science"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/i2vguard-safeguarding-images-against-misuse-in-diffusion-based-image-to-video-models","title":"I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models","url":"https://www.microsoft.com/en-us/research/publication/i2vguard-safeguarding-images-against-misuse-in-diffusion-based-image-to-video-models/","published":"2025-06-01","authors":["Dongnan Gui","Xun Guo","Wengang Zhou","Yan Lu"],"abstract":"Recent advances in image-to-video generation have enabled animation of still images and offered pixel-level controllability. While these models hold great potential to transform single images into vivid and dynamic videos, they also carry risks of misuse that could impact privacy, security, and copyright protection. This paper proposes a novel approach that applies imperceptible perturbations on images to degrade the quality of the generated videos, thereby protecting images from misuse in white-box image-to-video diffusion models. 
Specifically, we function our approach as an adversarial attack, incorporating spatial, temporal, and diffusion attack modules. The spatial attack shifts image features from their original distribution to a lower-quality target distribution, reducing visual fidelity. The temporal attack disrupts coherent motion by interfering with temporal attention maps that....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/examining-the-expanding-role-of-synthetic-data-throughout-the-ai-development-pipeline","title":"Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline","url":"https://www.microsoft.com/en-us/research/publication/examining-the-expanding-role-of-synthetic-data-throughout-the-ai-development-pipeline/","published":"2025-06-01","authors":["Shivani Kapania","Stephanie Ballard","Alex Kessler","Jennifer Wortman Vaughan"],"abstract":"Alongside the growth of generative AI, we are witnessing a surge in the use of synthetic data across all stages of the AI development pipeline. It is now common practice for researchers and practitioners to use one large generative model (which we refer to as an auxiliary model) to generate synthetic data that is used to train or evaluate another, reconfiguring AI workflows and reshaping the very nature of data. 
While scholars have raised concerns over the risks of synthetic data, policy guidance and best practices for its responsible use have not kept up with these rapidly evolving industry trends, in part because we lack a clear picture of current practices and challenges. Our work aims to address this gap. Through 29 interviews with AI practitioners and responsible AI experts, we examine the expanding role of synthetic data in AI development. Our findings reveal how auxiliary models a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autoformalizing-mathematical-statements-by-symbolic-equivalence-and-semantic-consistency","title":"Autoformalizing Mathematical Statements by Symbolic Equivalence and Semantic Consistency","url":"https://www.microsoft.com/en-us/research/publication/autoformalizing-mathematical-statements-by-symbolic-equivalence-and-semantic-consistency/","published":"2025-06-01","authors":["Zenan Li","Yifan Wu","Zhaoyu Li","Xinming Wei","Fan Yang","Xian Zhang","Xiaoxing Ma"],"abstract":"Autoformalization, the task of automatically translating natural language descriptions into a formal language, poses a significant challenge across various domains, especially in mathematics. Recent advancements in large language models (LLMs) have unveiled their promising capabilities to formalize even competition-level math problems. 
However, we observe a considerable discrepancy between pass@1 and pass@k accuracies in LLM-generated formalizations. To address this gap, we introduce a novel framework that scores and selects the best result from k autoformalization candidates based on two complementary self-consistency methods: symbolic equivalence and semantic consistency. Elaborately, symbolic equivalence identifies the logical homogeneity among autoformalization candidates using automated theorem provers, and semantic consistency evaluates the preservation of the original meaning by i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Manual","Artificial intelligence","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-effects-of-generative-ai-on-high-skilled-work-evidence-from-three-field-experiments-with-software-developers","title":"The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers","url":"https://www.microsoft.com/en-us/research/publication/the-effects-of-generative-ai-on-high-skilled-work-evidence-from-three-field-experiments-with-software-developers/","published":"2025-06-01","authors":["Zheyuan (Kevin) Cui","Mert Demirer","Sonia Jaffe","Leon Musolff","Sida Peng","Tobias Salz"],"abstract":"This study evaluates the impact of generative AI on software developer productivity via randomized controlled trials at Microsoft, Accenture, and an anonymous Fortune 100 company. 
These field experiments, run by the companies as part of their ordinary course of business, provided a random subset of developers with access to an AI-based coding assistant suggesting intelligent code completions. Though each experiment is noisy, when data is combined across three experiments and 4,867 developers, our analysis reveals a 26.08% increase (SE: 10.3%) in completed tasks among developers using the AI tool. Notably, less experienced developers had higher adoption rates and greater productivity gains.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1287/mnsc.2025.00535","openalex_id":"https://openalex.org/W7131867455","cited_by_count":1,"quality_score":61,"matched_keywords":["Article (Journal)","Economics"],"author_affiliations":["Microsoft","Massachusetts Institute of Technology","Microsoft (United States)","Princeton University","University of Pennsylvania"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-good-are-humans-at-detecting-ai-generated-images-learnings-from-an-experiment","title":"How good are humans at detecting AI-generated images? Learnings from an experiment","url":"https://www.microsoft.com/en-us/research/publication/how-good-are-humans-at-detecting-ai-generated-images-learnings-from-an-experiment/","published":"2025-06-01","authors":["Thomas Roca","Anthony Cintron Roman","Jehú Torres","Marcelo Duarte","Pengce Wang","Kevin White","Amit Misra","Juan M.
Lavista Ferres"],"abstract":"As AI-powered image generation improves, a key question is how well human beings can differentiate between \"real\" and AI-generated or modified images. Using data collected from the online game \"Real or Not Quiz.\", this study investigates how effectively people can distinguish AI-generated images from real ones. Participants viewed a randomized set of real and AI-generated images, aiming to identify their authenticity. Analysis of approximately 287,000 image evaluations by over 12,500 global participants revealed an overall success rate of only 62%, indicating a modest ability, slightly above chance. Participants were most accurate with human portraits but struggled significantly with natural and urban landscapes. These results highlight the inherent challenge humans face in distinguishing AI-generated visual content, particularly images without obvious artifacts or stylistic cues. This....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Unpublished","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411108312","title":"A review on synergizing knowledge graphs and large language models","url":"https://doi.org/10.1007/s00607-025-01499-8","published":"2025-06-01","authors":["Zhenyao Yang","Sha Yuan","Zhou Shao","Wenfa Li","Runzhou
Liu"],"abstract":"","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1007/s00607-025-01499-8","openalex_id":"https://openalex.org/W4411108312","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Columbia University","University of Science and Technology Beijing","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5995545387268066},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5562882423400879},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3496951460838318},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.341156005859375},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.30561721324920654},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.16310226917266846}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4410939617","title":"Multi Modal Transportation Path Selection of Coal based on Genetic Algorithm","url":"https://doi.org/10.12694/scpe.v26i4.4497","published":"2025-06-01","authors":["Jianjun Wu","Shusen Zhang","Ping Gong","Junyu Chen"],"abstract":"In order to solve the problem of selecting transportation routes and transfer nodes reasonably in the process of multimodal logistics distribution, the author proposes a coal transportation multimodal transportation path selection based on genetic algorithm. 
Firstly, this paper establishes an object function for routing according to the features of multi-modal transport, which has the minimum transport time, the minimum transport length and the minimum transport cost. Secondly, we design appropriate GA components, and get a multiobjective route optimal model for multimodal transport by using GA. Taking into account the high transportation costs of coal as a bulk commodity, a coal transportation multimodal transport path optimization model was constructed with the total transportation cost as the objective function of the model, and the minimum economic cost as the objective. At last, thi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.12694/scpe.v26i4.4497","openalex_id":"https://openalex.org/W4410939617","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Shanxi Coal Transportation and Sales Group (China)","Tianjin Municipal Engineering Design and Research Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7924145460128784},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7287102341651917},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6865886449813843},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.6333732604980469},{"id":"https://openalex.org/C8880873","display_name":"Genetic algorithm","score":0.6240870356559753},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.5216662883758545},{"id":"https://openalex.org/C126255220","display_name":"Mathematical 
optimization","score":0.41351601481437683},{"id":"https://openalex.org/C518851703","display_name":"Coal","score":0.4126441180706024}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:cbeb72acc08f7246","title":"SLIM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression","url":"https://research.nvidia.com/publication/2025-06_slim-one-shot-quantization-and-sparsity-low-rank-approximation-llm-weight","published":"2025-06","authors":["Mohammad Mozaffari","Amir Yazdanbakhsh","Maryam Mehri Dehnavi"],"abstract":"Official NVIDIA Research publication. ICML","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["ICML","LLM","compression","quantization"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:88e77fc0f568a0bc","title":"Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems","url":"https://research.nvidia.com/publication/2025-06_spec2rtl-agent-automated-hardware-code-generation-complex-specifications-using","published":"2025-06","authors":["Zhongzhi Yu","Mingjie Liu","Michael Zimmer","Yingyan (Celine) Lin","Yong Liu","Mark Haoxing Ren"],"abstract":"Official NVIDIA Research 
publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/iclad65226.2025.00013","openalex_id":"https://openalex.org/W4413145157","cited_by_count":5,"quality_score":65,"matched_keywords":["LLM","agent"],"author_affiliations":["NVIDIA","Cadence Design Systems (United States)","Georgia Institute of Technology","Nvidia (United Kingdom)","Nvidia (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=1"}},{"id":"official:2ce566ccc7342887","title":"Towards a VLM Benchmark for Simulated Robotics","url":"https://research.nvidia.com/publication/2025-06_towards-vlm-benchmark-simulated-robotics","published":"2025-06","authors":["Xuning Yang","Clemens Eppner","Valts Blukis","Peter Belcak","Stephen Tyree","Stan Birchfield","Fabio Ramos","Jonathan Tremblay"],"abstract":"Official NVIDIA Research publication. 
RSS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["RSS"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=1"}},{"id":"official:4ceb9556e0a8ddcc","title":"RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics","url":"https://research.nvidia.com/publication/2025-06_robospatial-teaching-spatial-understanding-2d-and-3d-vision-language-models","published":"2025-06","authors":["Chan Hee Song","Valts Blukis","Jonathan Tremblay","Stephen Tyree","Yu Su","Stan Birchfield"],"abstract":"Official NVIDIA Research publication. CVPR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["CVPR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:366c40242e0f75bb","title":"Make It Count: Text-to-Image Generation with an Accurate Number of Objects","url":"https://research.nvidia.com/publication/2025-06_make-it-count-text-image-generation-accurate-number-objects","published":"2025-06","authors":["Lital Binyamin","Yoad Tewel","Eran Hirsch","Royi Rassin","Gal Chechik"],"abstract":"Official NVIDIA Research publication. 
CVPR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["CVPR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=1"}},{"id":"official:20be115b4cedeedd","title":"A Generative AI Game Jam Case Study from October 2024","url":"https://research.nvidia.com/publication/2025-06_generative-ai-game-jam-case-study-october-2024","published":"2025-06","authors":["Josef Spjut"],"abstract":"Official NVIDIA Research publication. CVPR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["CVPR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:d54cefdd03d1b2ff","title":"Marco: Configurable Graph-Based Task Solving and Multi-AI Agents Framework for Hardware Design","url":"https://research.nvidia.com/publication/2025-06_marco-configurable-graph-based-task-solving-and-multi-ai-agents-framework","published":"2025-06","authors":["Chia-Tung (Mark) Ho","Jing Gong","Yunsheng Bai","Chenhui Deng","Mark Haoxing Ren","Brucek Khailany"],"abstract":"Official NVIDIA Research 
publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/covomix2-advancing-zero-shot-dialogue-generation-with-fully-non-autoregressive-flow-matching","title":"CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching","url":"https://www.microsoft.com/en-us/research/publication/covomix2-advancing-zero-shot-dialogue-generation-with-fully-non-autoregressive-flow-matching/","published":"2025-05-31","authors":["Leying Zhang","Yao Qian","Xiaofei Wang","Manthan Thakker","Dongmei Wang","Jianwei Yu","Haibin Wu","Yuxuan Hu","Jinyu Li","Yanmin Qian","Sheng Zhao"],"abstract":"Generating natural-sounding, multi-speaker dialogue is crucial for applications such as podcast creation, virtual agents, and multimedia content generation. However, existing systems struggle to maintain speaker consistency, model overlapping speech, and synthesize coherent conversations efficiently. In this paper, we introduce CoVoMix2, a fully non-autoregressive framework for zero-shot multi-talker dialogue generation. CoVoMix2 directly predicts mel-spectrograms from multi-stream transcriptions using a flow-matching-based generative model, eliminating the reliance on intermediate token representations. To better capture realistic conversational dynamics, we propose transcription-level speaker disentanglement, sentence-level alignment, and prompt-level random masking strategies. 
Our approach achieves state-of-the-art performance, outperforming strong baselines like MoonCast and Sesame i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Computer science","Engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmedagent-rl-optimizing-multi-agent-collaboration-for-multimodal-medical-reasoning","title":"MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning","url":"https://www.microsoft.com/en-us/research/publication/mmedagent-rl-optimizing-multi-agent-collaboration-for-multimodal-medical-reasoning/","published":"2025-05-30","authors":["Peng Xia","Jinglu Wang","Yibo Peng","Kaide Zeng","Xian Wu","Xiangru Tang","Hongtu Zhu","Yun Li","Shujie Liu","Yan Lu","Huaxiu Yao"],"abstract":"Medical Large Vision-Language Models (Med-LVLMs) have shown strong potential in multimodal diagnostic tasks. However, existing single-agent models struggle to generalize across diverse medical specialties, limiting their performance. Recent efforts introduce multi-agent collaboration frameworks inspired by clinical workflows, where general practitioners (GPs) and specialists interact in a fixed sequence. Despite improvements, these static pipelines lack flexibility and adaptability in reasoning. 
To address this, we propose MMedAgent-RL, a reinforcement learning (RL)-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments from multi-specialists and its own know...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Vision-language models","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/geovision-labeler-zero-shot-geospatial-classification-with-vision-and-language-models","title":"GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models","url":"https://www.microsoft.com/en-us/research/publication/geovision-labeler-zero-shot-geospatial-classification-with-vision-and-language-models/","published":"2025-05-30","authors":["Gilles Quentin Hacheme","Girmaw Abebe Tadesse","Caleb Robinson","Akram Zaytar","Rahul Dodhia","Juan M. Lavista Ferres"],"abstract":"Classifying geospatial imagery remains a major bottleneck for applications such as disaster response and land-use monitoring-particularly in regions where annotated data is scarce or unavailable. 
Existing tools (e.g., RS-CLIP) that claim zero-shot classification capabilities for satellite imagery nonetheless rely on task-specific pretraining and adaptation to reach competitive performance. We introduce GeoVision Labeler (GVL), a strictly zero-shot classification framework: a vision Large Language Model (vLLM) generates rich, human-readable image descriptions, which are then mapped to user-defined classes by a conventional Large Language Model (LLM). This modular, and interpretable pipeline enables flexible image classification for a large range of use cases. We evaluated GVL across three benchmarks-SpaceNet v7, UC Merced, and RESISC45. It achieves up to 93.2% zero-shot accuracy on the bi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Unpublished","Artificial intelligence","Computer vision","Ecology and environment","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/codesense-a-real-world-benchmark-and-dataset-for-code-semantic-reasoning","title":"CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning","url":"https://www.microsoft.com/en-us/research/publication/codesense-a-real-world-benchmark-and-dataset-for-code-semantic-reasoning/","published":"2025-05-30","authors":["Monoshi Kumar Roy","Simin Chen","Benjamin Steenhoek","Jinjun Peng","Gail E. 
Kaiser","Baishakhi Ray","Wei Le"],"abstract":"Understanding and reasoning about code semantics is essential for enhancing code LLMs'abilities to solve real-world software engineering (SE) tasks. Although several code reasoning benchmarks exist, most rely on synthetic datasets or educational coding problems and focus on coarse-grained reasoning tasks such as input/output prediction, limiting their effectiveness in evaluating LLMs in practical SE contexts. To bridge this gap, we propose CodeSense, the first benchmark that makes available a spectrum of fine-grained code reasoning tasks concerned with the software engineering of real-world code. We collected Python, C and Java software projects from real-world repositories. We executed tests from these repositories, collected their execution traces, and constructed a ground truth dataset for fine-grained semantic reasoning tasks. We then performed comprehensive evaluations on state-of-t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/structured-3d-latents-for-scalable-and-versatile-3d-generation","title":"Structured 3D Latents for Scalable and Versatile 3D Generation","url":"https://www.microsoft.com/en-us/research/publication/structured-3d-latents-for-scalable-and-versatile-3d-generation/","published":"2025-05-30","authors":["Jianfeng 
Xiang","Zelong Lv","Sicheng Xu","Yu Deng","Ruicheng Wang","Bowen Zhang","Dong Chen","Xin Tong","Jiaolong Yang"],"abstract":"We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent (SLAT) representation which allows decoding to different output formats, such as Radiance Fields, 3D Gaussians, and meshes. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model, comprehensively capturing both structural (geometry) and textural (appearance) information while maintaining flexibility during decoding. We employ rectified flow transformers tailored for SLAT as our 3D generation models and train models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects. Our model generates high-quality results with text or image conditions, significantly surpassing existing methods, including recent ones at similar scales. 
We showc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:stepfun-ai:2505.24862","title":"ViStoryBench: Comprehensive Benchmark Suite for Story Visualization","url":"https://huggingface.co/papers/2505.24862","published":"2025-05-30","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"openalex:W4410886919","title":"A Joint Learning of Force Feedback of Robotic Manipulation and Textual Cues for Granular Materials Classification","url":"https://doi.org/10.1109/lra.2025.3575322","published":"2025-05-30","authors":["Zeqing Zhang","Guanqi Chen","Wentao Chen","Ruixing Jia","Guanhua Chen","Liangjun Zhang","Jia Pan","Peng Zhou"],"abstract":"Granular materials (GMs) are formed by a collection of particles. Even if their visual representation is straightforward, it can be seriously affected in the visually constrained environment. 
Based on frequency features observed in force signals, this paper proposes a non-visual classifier, GmClass, using the force feedback in the robot-granules interaction. Specifically, we transform the force sequences into the frequency domain and integrate them with high-dimensional textual information into a two-branch architecture for multimodal supervised contrastive learning (MSCL). This method achieves an 84.10% classification accuracy, surpassing traditional supervised learning by 10% and outperforming supervised contrastive learning by more than 40%, demonstrating the positive impact of adding t...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2025.3575322","openalex_id":"https://openalex.org/W4410886919","cited_by_count":21,"quality_score":58,"matched_keywords":[],"author_affiliations":["Baidu (China)","Southern University of Science and Technology","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.6415302753448486},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5300202965736389},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4743974208831787},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4603756070137024},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.26934531331062317},{"id":"https://openalex.org/C66938386","display_name":"Structural engineering","score":0.07734143733978271}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":21}},{"id":"apple:psa80g5qk4ziowppfe178e8v","title":"SpeakStream: Streaming Text-to-Speech with Interleaved Data","url":"https://machinelearning.apple.com/research/speakstream-streaming","published":"2025-05-30","authors":["Richard He Bai","Zijin Gu","Tatiana Likhomanenko","Navdeep Jaitly"],"abstract":"With the increasing integration of speech front-ends and large language models (LLM),there is a need to explore architectures that integrate these modalities. While end-to-end models have been explored extensively, cascaded models that stream outputs from LLMs to TTS seem to be oddly under-explored, even though they are potentially much simpler.Using traditional text-to-speech systems to convert LLM outputs to audio, however, poses a technical...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:r6gry80d0ov50bgmgmqbtnwe","title":"World-Consistent Video Diffusion With Explicit 3D Modeling","url":"https://machinelearning.apple.com/research/world-consistent","published":"2025-05-30","authors":["Qihang Zhang","Shuangfei Zhai","Miguel Angel Bautista Martin","Kevin Miao","Alexander Toshev","Josh Susskind","Jiatao Gu"],"abstract":"As diffusion models dominating visual content generation, efforts have been made to adapt these models for multi-view image generation to create 3D content. Traditionally, these methods implicitly learn 3D consistency by generating only RGB frames, which can lead to artifacts and inefficiencies in training. 
In contrast, we propose generating Normalized Coordinate Space (NCS) frames alongside RGB frames. NCS frames capture each pixel's global...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/swe-bench-goes-live","title":"SWE-bench Goes Live!","url":"https://www.microsoft.com/en-us/research/publication/swe-bench-goes-live/","published":"2025-05-29","authors":["Linghao Zhang","Shilin He","Chaoyun Zhang","Yu Kang","Bowen Li","Chengxing Xie","J. Wang","Maoquan Wang","Yufan Huang","Shengyu Fu","Elsie Nallipogu","Qingwei Lin 林庆维"],"abstract":"The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a critical benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench and its variants have become standard in this domain, they suffer from key limitations: they have not been updated since their initial releases, cover a narrow set of repositories, and depend heavily on manual effort for instance construction and environment setup. These factors hinder scalability and introduce risks of overfitting and data contamination. In this work, we present SWE-bench-Live, a live-updatable benchmark designed to overcome these challenges. Our initial release consists of 1,319 tasks derived from real GitHub issues created since 2024, spanning 93 repositories. Each task is accompanied by a dedicated Docker image to ensure reproducible execution. 
Central to our benchmark is...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Computer science","large language models","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4410846180","title":"AI/ML curation of AI/ML training datasets","url":"https://doi.org/10.1117/12.3055515","published":"2025-05-29","authors":["S.K. Shukla","Dimitris A. Pados","Kavita Varma","George Sklivanitis","Elizabeth Serena Bentley","Michael J. Medley"],"abstract":"Input data quality is critically important during the training phase of AI/ML models. Anomalies/faults within the training data can profoundly influence decision boundaries established by the models, thus affecting their predictive accuracy post-training on operational datasets. In this work, we propose and describe in considerable implementation detail a novel methodology based on L1-norm dataset analysis and geometric principles, which aims to eliminate atypical data instances on a class-by-class basis prior to training. The suggested dataset curation procedure is entirely data driven (touch-free), unsupervised, and computationally efficient. 
Comprehensive experimental investigations conducted on real-world datasets are presented in this paper that illustrate the L1-norm dataset curation technique and demonstrate its effectiveness in protecting Supp...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/12.3055515","openalex_id":"https://openalex.org/W4410846180","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","Florida Atlantic University","SUNY Polytechnic Institute","United States Air Force Research Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6797002553939819},{"id":"https://openalex.org/C91632574","display_name":"Data curation","score":0.6186796426773071},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5675860047340393},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47872138023376465},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.27988940477371216},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.05747520923614502},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/trap-targeted-redirecting-of-agentic-preferences","title":"TRAP: Targeted Redirecting of Agentic Preferences","url":"https://www.microsoft.com/en-us/research/publication/trap-targeted-redirecting-of-agentic-preferences/","published":"2025-05-28","authors":["Hangoo Kang","Jehyeok Yeon","Gagandeep Singh"],"abstract":"Autonomous 
agentic AI systems powered by vision-language models (VLMs) are rapidly advancing toward real-world deployment, yet their cross-modal reasoning capabilities introduce new attack surfaces for adversarial manipulation that exploit semantic reasoning across modalities. Existing adversarial attacks typically rely on visible pixel perturbations or require privileged model or environment access, making them impractical for stealthy, real-world exploitation. We introduce TRAP, a generative adversarial framework that manipulates the agent's decision-making using diffusion-based semantic injections. Our method combines negative prompt-based degradation with positive semantic optimization, guided by a Siamese semantic network and layout-aware spatial masking. Without requiring access to model internals, TRAP produces visually natural images yet induces consistent selection biases in age...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","agentic AI","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/securing-ai-agents-with-information-flow-control","title":"Securing AI Agents with Information-Flow Control","url":"https://www.microsoft.com/en-us/research/publication/securing-ai-agents-with-information-flow-control/","published":"2025-05-28","authors":["Manuel Costa","Boris Köpf","Aashish Kolluri","Andrew Paverd","Mark Russinovich","Ahmed Salem","Shruti Tople","Lukas 
Wutschitz","Santiago Zanella-Béguelin"],"abstract":"As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach broadens the range of tasks that can be securely accomplished. A tutorial t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Security, privacy, and cryptography","AI agents","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/contextual-integrity-in-llms-via-reasoning-and-reinforcement-learning","title":"Contextual Integrity in LLMs via Reasoning and Reinforcement 
Learning","url":"https://www.microsoft.com/en-us/research/publication/contextual-integrity-in-llms-via-reasoning-and-reinforcement-learning/","published":"2025-05-28","authors":["Guangchen Lan","Huseyin Inan","Sahar Abdelnabi","Janardhan (Jana) Kulkarni","Lukas Wutschitz","Reza Shokri","Christopher G. Brinton","Robert Sim"],"abstract":"As the era of autonomous agents making decisions on behalf of users unfolds, ensuring contextual integrity (CI) -- what is the appropriate information to share while carrying out a certain task -- becomes a central question to the field. We posit that CI demands a form of reasoning where the agent needs to reason about the context in which it is operating. To test this, we first prompt LLMs to reason explicitly about CI when deciding what information to disclose. We then extend this approach by developing a reinforcement learning (RL) framework that further instills in models the reasoning necessary to achieve CI. Using a synthetic, automatically created, dataset of only $\\sim700$ examples but with diverse contexts and information disclosure norms, we show that our method substantially reduces inappropriate information disclosure while maintaining task performance across multiple model s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","AI agents","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4410827098","title":"Why human–AI relationships need socioaffective 
alignment","url":"https://doi.org/10.1057/s41599-025-04532-5","published":"2025-05-28","authors":["Hannah Rose Kirk","Iason Gabriel","Chris Summerfield","Bertie Vidgen","Scott A. Hale"],"abstract":"Humans strive to design safe AI systems that align with our goals and remain under our control. However, as AI capabilities advance, we face a new challenge: the emergence of deeper, more persistent relationships between humans and AI systems. We explore how increasingly capable AI agents may generate the perception of deeper relationships with users, especially as AI becomes more personalised and agentic. This shift, from transactional interaction to ongoing sustained social engagement with AI, necessitates a new focus on socioaffective alignment—how an AI system behaves within the social and psychological ecosystem co-created with its user, where preferences and perceptions evolve through mutual influence. Addressing these dynamics involves resolving key intrapersonal dilemmas, including balancing immediate versus long-term well-being, protecting autonomy, and managing AI companionship...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1057/s41599-025-04532-5","openalex_id":"https://openalex.org/W4410827098","cited_by_count":51,"quality_score":71,"matched_keywords":["long-term"],"author_affiliations":["Aedas (United Kingdom)","Contextual Change (United States)","Google (United Kingdom)","Google DeepMind (United Kingdom)","Research Institute in Science of Cyber Security","Takeda (United Kingdom)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44536155462265015},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4429781436920166},{"id":"https://openalex.org/C204321447","display_name":"Natural 
language processing","score":0.3205391764640808}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":51}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ui-evol-automatic-knowledge-evolving-for-computer-use-agents","title":"UI-Evol: Automatic Knowledge Evolving for Computer Use Agents","url":"https://www.microsoft.com/en-us/research/publication/ui-evol-automatic-knowledge-evolving-for-computer-use-agents/","published":"2025-05-28","authors":["Ziyun Zhang","Xinyi Liu","Xiaoyi Zhang","Jun Wang","Gang Chen","Yan Lu"],"abstract":"External knowledge has played a crucial role in the recent development of computer use agents. We identify a critical knowledge-execution gap: retrieved knowledge often fails to translate into effective real-world task execution. Our analysis shows even 90\\% correct knowledge yields only 41\\% execution success rate. To bridge this gap, we propose UI-Evol, a plug-and-play module for autonomous GUI knowledge evolution. UI-Evol consists of two stages: a Retrace Stage that extracts faithful objective action sequences from actual agent-environment interactions, and a Critique Stage that refines existing knowledge by comparing these sequences against external references. We conduct comprehensive experiments on the OSWorld benchmark with the state-of-the-art Agent S2. 
Our results demonstrate that UI-Evol not only significantly boosts task performance but also addresses a previously overlooked i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:suolrcpeu2os03kfx4o3frxx","title":"Interleaved Reasoning for Large Language Models via Reinforcement Learning","url":"https://machinelearning.apple.com/research/interleaved-reasoning","published":"2025-05-28","authors":["Roy Xie","David Qiu","Deepak Gopinath","Dong Lin","Yanchao Sun","Chong Wang","Saloni Potdar","Bhuwan Dhingra"],"abstract":"Long chain-of-thought (CoT) significantly enhances large language models' (LLM) reasoning capabilities. However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. 
We observe that models inherently possess the ability to perform interleaved...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:zpuu3err1gl114gi0ixqsx3o","title":"Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation","url":"https://machinelearning.apple.com/research/heart-rate-estimation","published":"2025-05-28","authors":["Jingping Nie","Tien Dung Tran","Karan Thakkar","Vasudha Kowtha","Jon Huang","Carlos Avendano","Erdin Azemi","Vikramjit Mitra"],"abstract":"Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs. However, there has been little exploration of the extent to which auscultation is encoded in these pre-trained FM representations. 
In this work, using a publicly available phonocardiogram...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7124268441","title":"Cultural Evolution of Cooperation among LLM Agents","url":"https://doi.org/10.65109/jnmb7739","published":"2025-05-28","authors":["Aron Vallinder","Edward Hughes"],"abstract":"Large language models (LLMs) provide a compelling foundation for building generally-capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of individual humans (e.g., AI assistants) or groups of humans (e.g., AI-accelerated corporations). At present, relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. In this paper, we examine whether a ''society'' of LLM agents can learn mutually beneficial social norms in the face of incentives to defect, a distinctive feature of human sociality that is arguably crucial to the success of civilization. In particular, we study the evolution of indirect reciprocity across generations of LLM agents playing a classic iterated Donor Game in which agents can observe the recent behavior of their peers. 
We find that the evolution of c...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/jnmb7739","openalex_id":"https://openalex.org/W7124268441","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["DeepMind (United Kingdom)","Google (United Kingdom)","Stockholm University"],"concepts":[{"id":"https://openalex.org/C169903001","display_name":"Reciprocity (cultural anthropology)","score":0.574400007724762},{"id":"https://openalex.org/C176544851","display_name":"Sociality","score":0.52920001745224},{"id":"https://openalex.org/C29122968","display_name":"Incentive","score":0.4747999906539917},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4440000057220459},{"id":"https://openalex.org/C79416737","display_name":"Social learning","score":0.4334000051021576},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.38510000705718994},{"id":"https://openalex.org/C2778334786","display_name":"Variation (astronomy)","score":0.3580999970436096},{"id":"https://openalex.org/C196187386","display_name":"Sociocultural evolution","score":0.32910001277923584}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7124240789","title":"Game of Thoughts: Iterative Reasoning in Game-Theoretic Domains with Large Language Models","url":"https://doi.org/10.65109/gzfu8152","published":"2025-05-28","authors":["Benjamin Kempinski","Ian Gemp","Kate Larson","Marc Lanctot","Yoram Bachrach","Tal Kachman"],"abstract":"We explore the strategic reasoning capabilities of large language models (LLMs). 
We first show that naively allowing LLMs to select actions in games can lead to sub-optimal and easily exploitable strategies. To address this limitation we propose several algorithms that guide LLMs to iteratively refine their action choices by simulating game outcomes in self-play, akin to cognitive hierarchy models used to characterize human thought processes in strategic settings. Our empirical results in several prominent resource allocation and auction settings indicate that our approach produces stronger and less exploitable strategies. Hence, emulating human decision-making models can enable us to improve the reasoning capabilities of LLMs in multiagent interactions.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/gzfu8152","openalex_id":"https://openalex.org/W7124240789","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (Canada)","Google (United Kingdom)","Google DeepMind (United Kingdom)","Meta (United Kingdom)","Radboud University Nijmegen","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6807000041007996},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.6173999905586243},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4837999939918518},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.4296000003814697},{"id":"https://openalex.org/C31170391","display_name":"Hierarchy","score":0.3862999975681305},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.3646000027656555},{"id":"https://openalex.org/C29202148","display_name":"Resource allocation","score":0.3582000136375427},{"id":"https://openalex.org/C188147891","display_name":"Cognitive 
science","score":0.35109999775886536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410814669","title":"FaceChain-MMID: Generating highly identity-consistent realistic portraits via dividing & merging multi-modal representations","url":"https://doi.org/10.1016/j.patcog.2025.111858","published":"2025-05-28","authors":["Chao Xu","Wang Fei","Cheng Yu","Baigui Sun","Jian Zhao"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.111858","openalex_id":"https://openalex.org/W4410814669","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","China Central Television","Northwestern Polytechnical University","Walsh University","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7250261902809143},{"id":"https://openalex.org/C162462552","display_name":"Portrait","score":0.6227407455444336},{"id":"https://openalex.org/C2778355321","display_name":"Identity (music)","score":0.599591851234436},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4994773864746094},{"id":"https://openalex.org/C46312422","display_name":"Communication","score":0.4212590754032135},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41979682445526123},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.380740761756897},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3693244457244873}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/veritrail-closed-domain-hallucination-detection-with-traceability","title":"VeriTrail: Closed-Domain Hallucination Detection with Traceability","url":"https://www.microsoft.com/en-us/research/publication/veritrail-closed-domain-hallucination-detection-with-traceability/","published":"2025-05-27","authors":["Dasha Metropolitansky","Jonathan Larson"],"abstract":"Even when instructed to adhere to source material, Language Models often generate unsubstantiated content – a phenomenon known as “closed-domain hallucination.” This risk is amplified in processes with multiple generative steps (MGS), compared to processes with a single generative step (SGS). However, due to the greater complexity of MGS processes, we argue that detecting hallucinations in their final outputs is necessary but not sufficient: it is equally important to trace where hallucinated content was likely introduced and how faithful content may have been derived from the source through intermediate outputs. To address this need, we present VeriTrail, the first closed-domain hallucination detection method designed to provide traceability for both MGS and SGS processes. 
We also introduce the first datasets to include all intermediate outputs as well as human annotations of final outp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computation and Language","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/unsupervised-post-training-for-multi-modal-llm-reasoning-via-grpo","title":"Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO","url":"https://www.microsoft.com/en-us/research/publication/unsupervised-post-training-for-multi-modal-llm-reasoning-via-grpo/","published":"2025-05-27","authors":["Lai Wei","Yuting Li","Chen Wang","Yue Wang","Linghe Kong","Weiran Huang","Lichao Sun"],"abstract":"Improving Multi-modal Large Language Models (MLLMs) in the post-training stage typically relies on supervised fine-tuning (SFT) or reinforcement learning (RL). However, these supervised methods require expensive and manually annotated multi-modal data--an ultimately unsustainable resource. While recent efforts have explored unsupervised post-training, their methods are complex and difficult to iterate. In this work, we are the first to investigate the use of GRPO, a stable and scalable online RL algorithm, for enabling continual self-improvement without any external supervision. We propose MM-UPT, a simple yet effective framework for unsupervised post-training of MLLMs. 
MM-UPT builds upon GRPO, replacing traditional reward signals with a self-rewarding mechanism based on majority voting over multiple sampled responses. Our experiments demonstrate that MM-UPT significantly improves the re...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Multimodal Large Language Models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/training-language-models-to-generate-quality-code-with-program-analysis-feedback","title":"Training Language Models to Generate Quality Code with Program Analysis Feedback","url":"https://www.microsoft.com/en-us/research/publication/training-language-models-to-generate-quality-code-with-program-analysis-feedback/","published":"2025-05-27","authors":["Feng Yao","Zilong Wang","Liyuan Liu","Junxia Cui","Li Zhong","Xiaohan Fu","Haohui Mai","Vish Krishnan","Jianfeng Gao","Jingbo Shang"],"abstract":"Code generation with large language models (LLMs), often termed vibe coding, is increasingly adopted in production but fails to ensure code quality, particularly in security (e.g., SQL injection vulnerabilities) and maintainability (e.g., missing type annotations). Existing methods, such as supervised fine-tuning and rule-based post-processing, rely on labor-intensive annotations or brittle heuristics, limiting their scalability and effectiveness. 
We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code using program analysis-guided feedback. Specifically, REAL integrates two automated signals: (1) program analysis detecting security or maintainability defects and (2) unit tests ensuring functional correctness. Unlike prior work, our framework is prompt-agnostic and reference-free, enabling scalable supervision without manual interven...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-metrics-evaluating-llms-effectiveness-in-culturally-nuanced-low-resource-real-world-scenarios","title":"Beyond Metrics: Evaluating LLMs' Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios","url":"https://www.microsoft.com/en-us/research/publication/beyond-metrics-evaluating-llms-effectiveness-in-culturally-nuanced-low-resource-real-world-scenarios/","published":"2025-05-27","authors":["Millicent Ochieng","Varun Gumma","Sunayana Sitaram","Jindong Wang","Vishrav Chaudhary","Keshet Ronen","Kalika Bali","Jacki O'Neill"],"abstract":"The deployment of Large Language Models (LLMs) in real-world applications presents both opportunities and challenges, particularly in multilingual and code-mixed communication settings. 
This research evaluates the performance of seven leading LLMs in sentiment analysis on a dataset derived from multilingual and code-mixed WhatsApp chats, including Swahili, English and Sheng. Our evaluation includes both quantitative analysis using metrics like F1 score and qualitative assessment of LLMs' explanations for their predictions. We find that, while Mistral-7b and Mixtral-8x7b achieved high F1 scores, they and other LLMs such as GPT-3.5-Turbo, Llama-2-70b, and Gemma-7b struggled with understanding linguistic and contextual nuances, as well as lack of transparency in their decision-making process as observed from their explanations. In contrast, GPT-4 and GPT-4-Turbo excelled in grasping diverse...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/text2grad-reinforcement-learning-from-natural-language-feedback","title":"Text2Grad: Reinforcement Learning from Natural Language Feedback","url":"https://www.microsoft.com/en-us/research/publication/text2grad-reinforcement-learning-from-natural-language-feedback/","published":"2025-05-27","authors":["Hanyang Wang","Lu Wang","Chaoyun Zhang","Tianjun Mao","Si Qin","Qingwei Lin 林庆维","Saravan Rajmohan","Dongmei Zhang"],"abstract":"Traditional RLHF optimizes language models with coarse, scalar rewards that mask the fine-grained 
reasons behind success or failure, leading to slow and opaque learning. Recent work augments RL with textual critiques through prompting or reflection, improving interpretability but leaving model parameters untouched. We introduce Text2Grad, a reinforcement-learning paradigm that turns free-form textual feedback into span-level gradients. Given human (or programmatic) critiques, Text2Grad aligns each feedback phrase with the relevant token spans, converts these alignments into differentiable reward signals, and performs gradient updates that directly refine the offending portions of the model's policy. This yields precise, feedback-conditioned adjustments instead of global nudges. Text2Grad is realized through three components: (1) a high-quality feedback-annotation pipeline that pairs crit...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:266","title":"PaSa: An LLM Agent for Comprehensive Academic Paper Search","url":"https://seed.bytedance.com/en/research/pasa-an-llm-agent-for-comprehensive-academic-paper-search","published":"2025-05-27","authors":["Yichen He","Guanhua Huang","Peiyuan Feng","Yuan Lin","Yuchen Zhang","Hang Li","Weinan E"],"abstract":"We introduce PaSa, an advanced Paper Search agent powered by large language models. 
PaSa can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholar queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark collecting real-world academic queries to assess PaSa performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4o for paraphrased queries, ChatGPT (search-enabled GPT-4o), GPT-o1, an...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["LLM","Responsible AI","ACL 2025","agent"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:b16wwhdotpgh2yl5tkkmjm82","title":"CtrlSynth: Controllable Image-Text Synthesis for Data-Efficient Multimodal Learning","url":"https://machinelearning.apple.com/research/controlled-synthesis","published":"2025-05-27","authors":["Qingqing Cao","Mahyar Najibi","Sachin Mehta"],"abstract":"Pretraining robust vision or multimodal foundation models (e.g., CLIP) relies on large-scale datasets that may be noisy, potentially misaligned, and have long-tail distributions. Previous works have shown promising results in augmenting datasets by generating synthetic samples. 
However, they only support domain-specific ad hoc use cases (e.g., either image or text only, but not both), and are limited in data diversity due to a lack of...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:fr6pxk0lifyc0ey99ap4oq67","title":"CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling","url":"https://machinelearning.apple.com/research/clip-up","published":"2025-05-27","authors":["Xinze Wang","Chen Chen","Yinfei Yang","Hong-You Chen","Bowen Zhang","Aditya Pal","Xiangxin Zhu","Xianzhi Du"],"abstract":"Mixture-of-Experts (MoE) models are crucial for scaling model capacity while controlling inference costs. While integrating MoE into multimodal models like CLIP improves performance, training these models is notoriously challenging and expensive. We propose CLIP-Upcycling (CLIP-UP), an efficient alternative training strategy that converts a pre-trained dense CLIP model into a sparse MoE architecture. 
Through extensive experimentation with various...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rstar-coder-scaling-competitive-code-reasoning-with-a-large-scale-verified-dataset","title":"rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset","url":"https://www.microsoft.com/en-us/research/publication/rstar-coder-scaling-competitive-code-reasoning-with-a-large-scale-verified-dataset/","published":"2025-05-26","authors":["Yifei Liu","Li Lyna Zhang","Yi Zhu","Bingcheng Dong","Xudong Zhou","Ning Shang","Fan Yang","Mao Yang"],"abstract":"Advancing code reasoning in large language models (LLMs) is fundamentally limited by the scarcity of high-difficulty datasets, especially those with verifiable input-output test cases necessary for rigorous solution validation at scale. We introduce rStar-Coder, which significantly improves LLM code reasoning capabilities by constructing a large-scale, verified dataset of 418K competition-level code problems, 580K long-reasoning solutions along with rich test cases of varying difficulty. 
This is achieved through three core contributions: (1) we curate competitive programming code problems and oracle solutions to synthesize new, solvable problems; (2) we introduce a reliable input-output test case synthesis pipeline that decouples the generation into a three-step input generation method and a mutual verification mechanism for effective output labeling; (3) we augment problems with high-qu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:896e8add59ccd1a5","title":"SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond","url":"https://huggingface.co/papers/2505.19641","published":"2025-05-26","authors":["MiniMax"],"abstract":"","companies":["MiniMax"],"matched_orgs":["MiniMax"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["MiniMax"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4410738052","title":"LLaFS++: Few-Shot Image Segmentation With Large Language Models","url":"https://doi.org/10.1109/tpami.2025.3573609","published":"2025-05-26","authors":["Lanyun Zhu","Tianrun Chen","Deyi 
Ji","Peng Xu","Jieping Ye","Jun Liu"],"abstract":"Despite the rapid advancements in few-shot segmentation (FSS), most of existing methods in this domain are hampered by their reliance on the limited and biased information from only a small number of labeled samples. This limitation inherently restricts their capability to achieve sufficiently high levels of performance. To address this issue, this paper proposes a pioneering framework named LLaFS++, which, for the first time, applies large language models (LLMs) into FSS and achieves notable success. LLaFS++ leverages the extensive prior knowledge embedded by LLMs to guide the segmentation process, effectively compensating for the limited information contained in the few-shot labeled samples and thereby achieving superior results. To enhance the effectiveness of the text-based LLMs in FSS scenarios, we present several innovative and task-specific designs within the LLaFS++ framework. Sp...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3573609","openalex_id":"https://openalex.org/W4410738052","cited_by_count":3,"quality_score":44,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Lancaster University","Singapore University of Technology and Design","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7216851711273193},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.6774560809135437},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6580568552017212},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6224656105041504},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.5969818830490112},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5798128843307495},{"id":"https://openalex.org/C65885262","display_name":"Scale-space segmentation","score":0.45344242453575134},{"id":"https://openalex.org/C63099799","display_name":"Image texture","score":0.4353331923484802}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2505.19731","title":"Proximal Point Nash Learning from Human Feedback","url":"http://arxiv.org/abs/2505.19731","published":"2025-05-26","authors":["Daniil Tiapkin","Daniele Calandriello","Denis Belomestny","Éric Moulines","Alexey Naumov","Kashif Rasul","Michal Vaľko","Pierre Ménard"],"abstract":"Traditional Reinforcement Learning from Human Feedback (RLHF) often relies on reward models, frequently assuming preference structures like the Bradley--Terry model, which may not accurately capture the complexities of real human preferences (e.g., intransitivity). Nash Learning from Human Feedback (NLHF) offers a more direct alternative by framing the problem as finding a Nash equilibrium of a game defined by these preferences. While many works study the Nash learning problem directly in the policy space, we instead consider it under a more realistic policy parametrization setting. We first analyze a simple self-play policy gradient method, which is equivalent to Online IPO. We establish high-probability last-iterate convergence guarantees for this method, but our analysis also reveals a possible stability limitation of the underlying dynamics. 
Motivated by this, we embed the self-play....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2505.19731","openalex_id":"https://openalex.org/W4414587654","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Centre de Mathématiques Appliquées de l'École polytechnique","Google (United Kingdom)","Google DeepMind (United Kingdom)","Hugging Face","Laboratoire de Mathématiques d'Orsay","Mohamed bin Zayed University of Artificial Intelligence","National Research University Higher School of Economics","Russian Academy of Sciences","Steklov Mathematical Institute","University of Duisburg-Essen","École Normale Supérieure de Lyon","École Polytechnique"],"concepts":[{"id":"https://openalex.org/C46814582","display_name":"Nash equilibrium","score":0.7105000019073486},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5979999899864197},{"id":"https://openalex.org/C126255220","display_name":"Mathematical optimization","score":0.5910999774932861},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5504999756813049},{"id":"https://openalex.org/C32407928","display_name":"Best response","score":0.4975000023841858},{"id":"https://openalex.org/C2777303404","display_name":"Convergence (economics)","score":0.4878000020980835},{"id":"https://openalex.org/C141824439","display_name":"Epsilon-equilibrium","score":0.43950000405311584},{"id":"https://openalex.org/C57869625","display_name":"Rate of convergence","score":0.37549999356269836}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2505.20081","title":"Inference-time Alignment in Continuous 
Space","url":"https://huggingface.co/papers/2505.20081","published":"2025-05-26","authors":["Yige Yuan","Teng Xiao","Li Yunfan","Bingbing Xu","Shuchang Tao","Yunqi Qiu","Huawei Shen","Xueqi Cheng"],"abstract":"Aligning large language models with human feedback at inference time has received increasing attention due to its flexibility. Existing methods rely on generating multiple responses from the base policy for search using a reward model, which can be considered as searching in a discrete response space. However, these methods struggle to explore informative candidates when the base policy is weak or the candidate set is small, resulting in limited effectiveness. In this paper, to address this problem, we propose Simple Energy Adaptation (SEA), a simple yet effective algorithm for inference-time alignment. In contrast to expensive search over the discrete space, SEA directly adapts original responses from the base policy toward the optimal one via gradient-based sampling in continuous latent space. 
Specifically, SEA formulates inference as an iterative optimization procedure on an energy fu...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/wina-weight-informed-neuron-activation-for-accelerating-large-language-model-inference","title":"WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference","url":"https://www.microsoft.com/en-us/research/publication/wina-weight-informed-neuron-activation-for-accelerating-large-language-model-inference/","published":"2025-05-25","authors":["Sihan Chen","Dan Zhao","Jongwoo Ko","Colby R. Banbury","Huiping Zhuang","Luming Liang","Tianyi Chen"],"abstract":"The growing computational demands of large language models (LLMs) make efficient inference and activation strategies increasingly critical. While recent approaches, such as Mixture-of-Experts (MoE), leverage selective activation but require specialized training, training-free sparse activation methods offer broader applicability and superior resource efficiency through their plug-and-play design. However, many existing methods rely solely on hidden state magnitudes to determine activation, resulting in high approximation errors and suboptimal inference accuracy. To address these limitations, we propose WINA (Weight Informed Neuron Activation), a novel, simple, and training-free sparse activation framework that jointly considers hidden state magnitudes and the column-wise ℓ2-norms of weight matrices. 
We show that this leads to a sparsification strategy that obtains optim...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/token-importance-guided-direct-preference-optimization","title":"Token-Importance Guided Direct Preference Optimization","url":"https://www.microsoft.com/en-us/research/publication/token-importance-guided-direct-preference-optimization/","published":"2025-05-25","authors":["Ning Yang","Hai Lin","Yibo Liu","Baoliang Tian","Guoqing Liu","Haijun Zhang"],"abstract":"Aligning Large Language Models (LLMs) with human preferences is crucial for safe and effective AI interactions. While popular methods like Direct Preference Optimization (DPO) have simplified alignment, they remain sensitive to data noise and overlook the differential importance of individual tokens. Existing token-level approaches often rely on probability prediction or simplistic weighting schemes to obtain token importance, which still cannot fully address these issues. To solve this problem, we propose the Token-Importance Guided Direct Preference Optimization (TI-DPO), a framework that achieves fine-grained semantic control through two synergistic innovations. 
First, we propose a novel hybrid weighting mechanism that combines gradient attribution with a Gaussian prior, ensuring both the accuracy and robustness of token importance scores. Second, we employ a triplet loss to provide s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","preference","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/point-rft-improving-multimodal-reasoning-with-visually-grounded-reinforcement-finetuning","title":"Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning","url":"https://www.microsoft.com/en-us/research/publication/point-rft-improving-multimodal-reasoning-with-visually-grounded-reinforcement-finetuning/","published":"2025-05-25","authors":["Minheng Ni","Zhengyuan Yang","Linjie Li","Chung-Ching Lin","Kevin Lin","Wangmeng Zuo","Lijuan Wang"],"abstract":"Recent advances in large language models have significantly improved textual reasoning through the effective use of Chain-of-Thought (CoT) and reinforcement learning. However, extending these successes to vision-language tasks remains challenging due to inherent limitations in text-only CoT, such as visual hallucinations and insufficient multimodal integration. In this paper, we introduce Point-RFT, a multimodal reasoning framework explicitly designed to leverage visually grounded CoT reasoning for visual document understanding. 
Our approach consists of two stages: First, we conduct format finetuning using a curated dataset of 71K diverse visual reasoning problems, each annotated with detailed, step-by-step rationales explicitly grounded to corresponding visual elements. Second, we employ reinforcement finetuning targeting visual document understanding. On ChartQA, our approach improves....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/grammars-of-formal-uncertainty-when-to-trust-llms-in-automated-reasoning-tasks","title":"Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks","url":"https://www.microsoft.com/en-us/research/publication/grammars-of-formal-uncertainty-when-to-trust-llms-in-automated-reasoning-tasks/","published":"2025-05-25","authors":["Debargha Ganguly","Vikash Singh","Sreehari Sankar","Biyao Zhang","Xuecen Zhang","Srinivasan Iyengar","Xiaotian Han","Amit Sharma","S. Kalyanaraman","Vipin Chaudhary"],"abstract":"Large language models (LLMs) show remarkable promise for democratizing automated reasoning by generating formal specifications. However, a fundamental tension exists: LLMs are probabilistic, while formal verification demands deterministic guarantees. 
This paper addresses this epistemological gap by comprehensively investigating failure modes and uncertainty quantification (UQ) in LLM-generated formal artifacts. Our systematic evaluation of five frontier LLMs reveals Satisfiability Modulo Theories (SMT) based autoformalization's domain-specific impact on accuracy (from +34.8% on logical tasks to -44.5% on factual ones), with known UQ techniques like the entropy of token probabilities failing to identify these errors. We introduce a probabilistic context-free grammar (PCFG) framework to model LLM outputs, yielding a refined uncertainty taxonomy. We find uncertainty signals are task-depende...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:277","title":"DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation","url":"https://seed.bytedance.com/en/research/ditar-diffusion-transformer-autoregressive-modeling-for-speech-generation","published":"2025-05-25","authors":["Dongya Jia","Zhuo Chen","Jiawei Chen","Chenpeng Du","Jian Wu","Jian Cong","Xiaobin Zhuang","Chumin Li","Zhen Wei","Yuping Wang","Yuxuan Wang"],"abstract":"Several recent studies have attempted to autoregressively generate continuous speech representations without discrete speech tokens by combining diffusion and autoregressive models, yet they often face challenges with excessive computational loads or suboptimal outcomes. 
In this work, we propose Diffusion Transformer Autoregressive Modeling (DiTAR), a patch-based autoregressive framework combining a language model with a diffusion transformer. This approach significantly enhances the efficacy of autoregressive models for continuous tokens and reduces computational demands. DiTAR utilizes a divide-and-conquer strategy for patch generation, where the language model processes aggregated patch embeddings and the diffusion transformer subsequently generates the next patch based on the output of the language model. For inference, we propose defining temperature as the time point of introducing...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Audio and Speech Processing","Speech","ICML 2025","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4411725541","title":"LPerceptual Quality Assessment of AI Generated Content Videos: a Dataset and Benchmark","url":"https://doi.org/10.1109/iscas56072.2025.11043899","published":"2025-05-25","authors":["Zhichao Zhang","Wei Sun","Xinyue Li","Yunhao Li","Jun Jia","Xiongkuo Min","Zicheng Zhang","Chunyi Li","Zhongpeng Ji","Fengyu Sun","Shangling Jui","Guangtao Zhai"],"abstract":"In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in large language model techniques. 
Thus, there is a great demand to explore the effectiveness of video quality assessment (VQA) models in evaluating the perceptual quality of AI-generated content (AIGC) videos and in optimizing video generation techniques. Therefore, in this paper, we try to systemically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. For the subjective perspective, we construct a Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by 6 video generation models using 468 carefully selected text prompts. We evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment, which....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iscas56072.2025.11043899","openalex_id":"https://openalex.org/W4411725541","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.8095135688781738},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7241427302360535},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.5909422039985657},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.528603732585907},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4449616074562073},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.3929326832294464},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08362576365470886},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.04015636444091797}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410717231","title":"RL-Finetuning of OpenAI o1-mini to Enhance Biomedical Reasoning","url":"https://doi.org/10.1101/2025.05.19.654988","published":"2025-05-24","authors":["Kyle Swanson","Yiqun T. Chen","Aaron Jaech","James Zou"],"abstract":"Abstract Recent breakthroughs in advanced reasoning large language models (LLMs), such as OpenAI’s o1, have achieved impressive results in domains like math and coding. However, it’s not clear how much this type of reasoning helps in solving biomedical problems that involve more domain specialized knowledge and open-ended reasoning. Across two biomedical domains—gene characterization and small molecule property prediction—we find that the commercially available o1-mini model does not consistently outperform non-reasoning LLMs like GPT-4o. This motivated us to explore how much we can improve o1-mini’s biomedical reasoning through reinforcement learning (RL) finetuning. We show that RL finetuning of o1-mini results in large improvements in performance on gene classification, where it surprisingly outperformed domain-specific state-of-the-art models on some tasks. 
The results are mixed for....","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.05.19.654988","openalex_id":"https://openalex.org/W4410717231","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Johns Hopkins Medicine","Johns Hopkins University","OpenAI (United States)","Stanford University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.569312572479248},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37301671504974365}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/personalized-safety-in-llms-a-benchmark-and-a-planning-based-agent-approach","title":"Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach","url":"https://www.microsoft.com/en-us/research/publication/personalized-safety-in-llms-a-benchmark-and-a-planning-based-agent-approach/","published":"2025-05-23","authors":["Yuchen Wu","Edward Sun","Kaijie Zhu","Jianxun Lian","José Hernández-Orallo","Aylin Caliskan","Jindong Wang"],"abstract":"Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics - such as factuality, bias, or toxicity - overlooking the fact that the same response may carry divergent risks depending on the user's background or condition. 
We introduce personalized safety to fill this gap and present PENGUIN - a benchmark comprising 14,000 scenarios across seven sensitive domains with both context-rich and context-free variants. Evaluating six leading LLMs, we demonstrate that personalized user information significantly improves safety scores by 43.2%, confirming the effectiveness of personalization in safety alignment. However, not all context attributes contribute equally to safet...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","personalized","personalization","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:281","title":"Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling","url":"https://seed.bytedance.com/en/research/over-tokenized-transformer-vocabulary-is-generally-worth-scaling","published":"2025-05-23","authors":["Hongzhi Huang","Defa Zhu","Banggu Wu","Yutao Zeng","Ya Wang","Qiyang Min","Xun Zhou"],"abstract":"Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce OverTokenized Transformers, a novel framework that decouples input and output vocabularies to improve language modeling performance. Specifically, our approach scales up input vocabularies to leverage multi-gram tokens. 
Through extensive experiments, we uncover a log-linear relationship between input vocabulary size and training loss, demonstrating that larger input vocabularies consistently enhance model performance, regardless of model size. Using a large input vocabulary, we achieve performance comparable to double-sized baselines with no additional cost. Our findings highlight the importance of tokenization in scaling laws and provide practical insight for tokenizer design, paving the way for more efficient and....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computation and Language","LLM","ICML 2025","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:7cd3fd976ee99d8d","title":"Addendum to OpenAI o3 and o4-mini system card: OpenAI o3 Operator","url":"https://openai.com/index/o3-o4-mini-system-card-addendum-operator-o3","published":"2025-05-23","authors":["OpenAI"],"abstract":"We are replacing the existing GPT-4o-based model for Operator with a version based on OpenAI o3. 
The API version will remain based on 4o.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reprompt-reasoning-augmented-reprompting-for-text-to-image-generation-via-reinforcement-learning","title":"RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/reprompt-reasoning-augmented-reprompting-for-text-to-image-generation-via-reinforcement-learning/","published":"2025-05-22","authors":["Ming-Kuan Wu","Lu Wang","Pu Zhao","Fangkai Yang","Jianjin Zhang","Jianfeng Liu","Yuefeng Zhan","Weihao Han","Hao Sun","Jiayi Ji","Xiaoshuai Sun","Qingwei Lin"],"abstract":"Despite recent progress in text-to-image (T2I) generation, existing models often struggle to faithfully capture user intentions from short and under-specified prompts. While prior work has attempted to enhance prompts using large language models (LLMs), these methods frequently generate stylistic or unrealistic content due to insufficient grounding in visual semantics and real-world composition. Inspired by recent advances in reasoning for language model, we propose RePrompt, a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. Instead of relying on handcrafted rules or stylistic rewrites, our method trains a language model to generate structured, self-reflective prompts by optimizing for image-level outcomes. 
The tailored reward models assess the generated images in terms of human preference, semantic alignment...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01","language model","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dynamic-risk-assessments-for-offensive-cybersecurity-agents","title":"Dynamic Risk Assessments for Offensive Cybersecurity Agents","url":"https://www.microsoft.com/en-us/research/publication/dynamic-risk-assessments-for-offensive-cybersecurity-agents/","published":"2025-05-22","authors":["Boyi Wei","Benedikt Stroebl","Jiacen Xu","Joie Zhang","Zhou Li","Peter Henderson"],"abstract":"Foundation models are increasingly becoming better autonomous programmers, raising the prospect that they could also automate dangerous offensive cyber-operations. Current frontier model audits probe the cybersecurity risks of such agents, but most fail to account for the degrees of freedom available to adversaries in the real world. In particular, with strong verifiers and financial incentives, agents for offensive cybersecurity are amenable to iterative improvement by would-be adversaries. 
We argue that assessments should take into account an expanded threat model in the context of cybersecurity, emphasizing the varying degrees of freedom that an adversary may possess in stateful and non-stateful environments within a fixed compute budget. We show that even with a relatively small compute budget (8 H100 GPU Hours in our study), adversaries can improve an agent's cybersecurity capabilit...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","Computer science","foundation models","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/deep-video-discovery-agentic-search-with-tool-use-for-long-form-video-understanding","title":"Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding","url":"https://www.microsoft.com/en-us/research/publication/deep-video-discovery-agentic-search-with-tool-use-for-long-form-video-understanding/","published":"2025-05-22","authors":["Xiaoyi Zhang","Zhaoyang Jia","Zongyu Guo","Jiahao Li","Bin Li","Houqiang Li","Yan Lu"],"abstract":"Long-form video understanding presents significant challenges due to extensive temporal-spatial complexity and the difficulty of question answering under such extended contexts. 
While Large Language Models (LLMs) have demonstrated considerable advancements in video analysis capabilities and long context handling, they continue to exhibit limitations when processing information-dense hour-long videos. To overcome such limitations, we propose the Deep Video Discovery agent to leverage an agentic search strategy over segmented video clips. Different from previous video agents manually designing a rigid workflow, our approach emphasizes the autonomous nature of agents. By providing a set of search-centric tools on multi-granular video database, our DVD agent leverages the advanced reasoning capability of LLM to plan on its current observation state, strategically selects tools, formulates ap...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","large language models","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:sga35zmxk3k0k5hv8flz036x","title":"SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models","url":"https://machinelearning.apple.com/research/sync-point-drop","published":"2025-05-22","authors":["Han-Byul Kim","Duc Hoang","Arnav Kundu","Mohammad Samragh","Minsik Cho"],"abstract":"With the rapid expansion in the scale of largelanguage models (LLMs), enabling efficient distributed inference across multiple computing units has become increasingly critical. 
However, communication overheads from popular distributed inference techniques such as Tensor Parallelism pose a significant challenge to achieve scalability and low latency. Therefore, we introduce a novel optimization technique, Sync-Point Drop (SPD), to reduce...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4410594649","title":"RLFE-IDS: A framework of Intrusion Detection System based on Retrieval Augmented Generation and Large Language Model","url":"https://doi.org/10.1016/j.comnet.2025.111341","published":"2025-05-22","authors":["Xuewei Li","Zengyang Zheng","Mankun Zhao","Yue Zhao","Lifeng Shi","Baoliang Wang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.comnet.2025.111341","openalex_id":"https://openalex.org/W4410594649","cited_by_count":8,"quality_score":53,"matched_keywords":["language model","retrieval"],"author_affiliations":["Alibaba Group (China)","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9391025304794312},{"id":"https://openalex.org/C35525427","display_name":"Intrusion detection system","score":0.7716983556747437},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4073391556739807},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3335176408290863},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.32857006788253784}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dion-distributed-orthonormalized-updates","title":"Dion: Distributed Orthonormalized Updates","url":"https://www.microsoft.com/en-us/research/publication/dion-distributed-orthonormalized-updates/","published":"2025-05-21","authors":["Kwangjun Ahn","Byron Xu","Natalie Abreu","John Langford"],"abstract":"Recent work has shown that orthonormal matrix updates speed up neural network optimization, improve training stability, and offer better hyperparameter transfer across model sizes. Applying these updates efficiently when model weights and optimizer states are sharded across a large-scale distributed LLM training system remains a major challenge. We introduce Dion (DIstributed OrthoNormalization), a scalable and communication-efficient orthonormalizing optimizer. Dion leverages low-rank approximation and decoupled momentum buffers, eliminating the need for full gradient synchronization while producing numerically equivalent results. It is compatible with simultaneous DDP, FSDP, and TP parallelism, and it computes an orthonormalized update without unsharding a full parameter matrix on any single device. 
We evaluate Dion on language models from 120M to 3B parameters and find that its benefi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","mathematics","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/path-attention-position-encoding-via-accumulating-householder-transformations","title":"PaTH Attention: Position Encoding via Accumulating Householder Transformations","url":"https://www.microsoft.com/en-us/research/publication/path-attention-position-encoding-via-accumulating-householder-transformations/","published":"2025-05-21","authors":["Songlin Yang","Yikang Shen","Kaiyue Wen","Shawn Tan","Mayank Mishra","Liliang Ren","Rameswar Panda","Yoon Kim"],"abstract":"The attention mechanism is a core primitive in modern large language models (LLMs) and AI more broadly. Since attention by itself is permutation-invariant, position encoding is essential for modeling structured domains such as language. Rotary position encoding (RoPE) has emerged as the de facto standard approach for position encoding and is part of many modern LLMs. However, in RoPE the key/query transformation between two elements in a sequence is only a function of their relative position and otherwise independent of the actual input. This limits the expressivity of RoPE-based transformers. 
This paper describes PaTH, a flexible data-dependent position encoding scheme based on accumulated products of Householder(like) transformations, where each transformation is data-dependent, i.e., a function of the input. We derive an efficient parallel algorithm for training through exploiting a c...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/aurora-a-foundation-model-for-the-earth-system","title":"A Foundation Model for the Earth System","url":"https://www.microsoft.com/en-us/research/publication/aurora-a-foundation-model-for-the-earth-system/","published":"2025-05-21","authors":["Cristian Bodnar","Wessel Bruinsma","Ana Lucic","Megan Stanley","Anna Allen","Johannes Brandstetter","Patrick Garvan","Maik Riechert","Jonathan Weyn","Haiyu Dong","Anna Vaughan","Jayesh Gupta"],"abstract":"Reliable forecasting of the Earth system is essential for mitigating natural disasters and supporting human progress. Traditional numerical models, although powerful, are extremely computationally expensive. Recent advances in artificial intelligence (AI) have shown promise in improving both predictive performance and efficiency, yet their potential remains underexplored in many Earth system domains. 
Here we introduce Aurora, a large-scale foundation model trained on more than one million hours of diverse geophysical data. Aurora outperforms operational forecasts in predicting air quality, ocean waves, tropical cyclone tracks and high-resolution weather, all at orders of magnitude lower computational cost. With the ability to be fine-tuned for diverse applications at modest expense, Aurora represents a notable step towards democratizing accurate and efficient Earth system predictions. Th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Atmospheric and Oceanic Physics","Machine learning","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:239","title":"MMaDA: Multimodal Large Diffusion Language Models","url":"https://seed.bytedance.com/en/research/mmada-multimodal-large-diffusion-language-models","published":"2025-05-21","authors":["Ling Yang","Ye Tian","Bowen Li","Xinchen Zhang","Ke Shen","Yunhai Tong","Mengdi Wang"],"abstract":"We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. The approach is distinguished by three key innovations: (i) MMaDA adopts a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components. 
This architecture ensures seamless integration and processing across different data types. (ii) We implement a mixed long chain-of-thought (CoT) fine-tuning strategy that curates a unified CoT format across modalities. By aligning reasoning processes between textual and visual domains, this strategy facilitates cold-start training for the final reinforcement learning (RL) stage, thereby enhancing the model's ability to handle complex tasks....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","LLM","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4410552044","title":"Touch100k: A large-scale touch-language-vision dataset for touch-centric multimodal representation","url":"https://doi.org/10.1016/j.inffus.2025.103305","published":"2025-05-21","authors":["Ning Cheng","Jinan Xu","Changhao Guan","Jing Gao","Weihao Wang","You Li","Fandong Meng","Jie Zhou","Bin Fang","Wenjuan Han"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.inffus.2025.103305","openalex_id":"https://openalex.org/W4410552044","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Beijing Jiaotong University","Beijing University of Posts and Telecommunications","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.7689170837402344},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6339234113693237},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5905792713165283},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4692673087120056},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46662652492523193},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.39812761545181274},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3246413469314575},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.10452446341514587}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"official:c2d684b864d8dc8c","title":"Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought","url":"https://huggingface.co/papers/2505.15431","published":"2025-05-21","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-agentic-economy","title":"The Agentic Economy","url":"https://www.microsoft.com/en-us/research/publication/the-agentic-economy/","published":"2025-05-20","authors":["David Rothschild","Markus Mobius","Jake Hofman","Eleanor Dillon","Daniel G. 
Goldstein","Nicole Immorlica","Sonia Jaffe","Brendan Lucier","Aleksandrs Slivkins","Matthew Vogel"],"abstract":"Generative AI has transformed human-computer interaction by enabling natural language interfaces and the emergence of autonomous agents capable of acting on users' behalf. While early applications have improved individual productivity, these gains have largely been confined to predefined tasks within existing workflows. We argue that the more profound economic impact lies in reducing communication frictions between consumers and businesses. This shift could reorganize markets, redistribute power, and catalyze the creation of new products and services. We explore the implications of an agentic economy, where assistant agents act on behalf of consumers and service agents represent businesses, interacting programmatically to facilitate transactions. A key distinction we draw is between unscripted interactions -- enabled by technical advances in natural language and protocol design -- and un...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Economics","AI agents","Autonomous agent","Computer science","Generative AI","Multi-agent system"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rd-agent-quant-a-multi-agent-framework-for-data-centric-factors-and-model-joint-optimization","title":"R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint 
Optimization","url":"https://www.microsoft.com/en-us/research/publication/rd-agent-quant-a-multi-agent-framework-for-data-centric-factors-and-model-joint-optimization/","published":"2025-05-20","authors":["Yuante Li","Xu Yang","Xiao Yang","Minrui Xu","Xisen Wang","Weiqing Liu","Jiang Bian"],"abstract":"Financial markets pose fundamental challenges for asset return prediction due to their high dimensionality, non-stationarity, and persistent volatility. Despite advances in large language models and multi-agent systems, current quantitative research pipelines suffer from limited automation, weak interpretability, and fragmented coordination across key components such as factor mining and model innovation. In this paper, we propose R&D-Agent for Quantitative Finance, in short RD-Agent(Q), the first data-centric multi-agent framework designed to automate the full-stack research and development of quantitative strategies via coordinated factor-model co-optimization. RD-Agent(Q) decomposes the quant process into two iterative stages: a Research stage that dynamically sets goal-aligned prompts, formulates hypotheses based on domain priors, and maps them to concrete tasks, and a Development st...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Economics","Computer science","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/online-scheduling-for-llm-inference-with-kv-cache-constraints","title":"Online Scheduling for LLM Inference with KV Cache Constraints","url":"https://www.microsoft.com/en-us/research/publication/online-scheduling-for-llm-inference-with-kv-cache-constraints/","published":"2025-05-20","authors":["Patrick Jaillet","Jiashuo Jiang","Konstantina Mellou","Marco Molinaro","Chara Podimata","Zijie Zhou"],"abstract":"Large Language Model (LLM) inference, where a trained model generates text one word at a time in response to user prompts, is a computationally intensive process requiring efficient scheduling to optimize latency and resource utilization. A key challenge in LLM inference is the management of the Key-Value (KV) cache, which reduces redundant computations but introduces memory constraints. In this work, we model LLM inference with KV cache constraints theoretically and propose a novel batching and scheduling algorithm that minimizes inference latency while effectively managing the KV cache’s memory.More specifically, we make the following contributions. 
First, to evaluate the performance of online algorithms for scheduling in LLM inference, we introduce a hindsight optimal benchmark, formulated as an integer program that computes the minimum total inference latency under full future inform...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Algorithms","Artificial intelligence","LLM","language model","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/steering-generative-models-with-experimental-data-for-protein-fitness-optimization","title":"Steering Generative Models with Experimental Data for Protein Fitness Optimization","url":"https://www.microsoft.com/en-us/research/publication/steering-generative-models-with-experimental-data-for-protein-fitness-optimization/","published":"2025-05-20","authors":["Jason Yang","Wenda Chu","Daniel Khalil","Raul Astudillo","Bruce Wittmann","Frances H. Arnold","Yisong Yue"],"abstract":"Protein fitness optimization involves finding a protein sequence that maximizes desired quantitative properties in a combinatorially large design space of possible sequences. Recent advances in steering protein generative models (e.g., diffusion models and language models) with labeled data offer a promising approach. 
However, most previous studies have optimized surrogate rewards and/or utilized large amounts of labeled data for steering, making it unclear how well existing methods perform and compare to each other in real-world optimization campaigns where fitness is measured through low-throughput wet-lab assays. In this study, we explore fitness optimization using small amounts (hundreds) of labeled sequence-fitness pairs and comprehensively evaluate strategies such as classifier guidance and posterior sampling for guiding generation from different discrete diffusion models of protei...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Biology","Computer science","Protein sequencing","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/prototypical-human-ai-collaboration-behaviors-from-llm-assisted-writing-in-the-wild","title":"Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild","url":"https://www.microsoft.com/en-us/research/publication/prototypical-human-ai-collaboration-behaviors-from-llm-assisted-writing-in-the-wild/","published":"2025-05-20","authors":["Sheshera Mysore","Debarati Das","Hancheng Cao","Bahar Sarrafzadeh"],"abstract":"As large language models (LLMs) are used in complex writing workflows, users engage in multi-turn interactions to steer generations to better fit their needs. 
Rather than passively accepting output, users actively refine, explore, and co-construct text. We conduct a large-scale analysis of this collaborative behavior for users engaged in writing tasks in the wild with two popular AI assistants, Bing Copilot and WildChat. Our analysis goes beyond simple task classification or satisfaction estimation common in prior work and instead characterizes how users interact with LLMs through the course of a session. We identify prototypical behaviors in how users interact with LLMs in prompts following their original request. We refer to these as Prototypical Human-AI Collaboration Behaviors (PATHs) and find that a small group of PATHs explain a majority of the variation seen in user-LLM interactio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:834","title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","url":"https://seed.bytedance.com/en/research/dapo-an-open-source-llm-reinforcement-learning-system-at-scale","published":"2025-05-20","authors":["Qiying Yu","Zheng Zhang","Ruofei Zhu","Yufeng Yuan","Xiaochen Zuo","Yu Yue","Weinan Dai","Tiantian Fan","Gaohong Liu","Lingjun Liu","Xin Liu","Haibin Lin"],"abstract":"Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. 
However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results. We propose the Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50 points on AIME 2024 using Qwen2.5-32B base model. Unlike previous works that withhold training details, we introduce four key techniques of our algorithm that make large-scale LLM RL a success. In addition, we open-source our training code, which is built on the verl framework, along with a carefully curated and processed dataset. These components of our open-sou...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine Learning","Computation and Language","LLM","NeurIPS 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4410536678","title":"Otter: A Multi-Modal Model With In-Context Instruction Tuning","url":"https://doi.org/10.1109/tpami.2025.3571946","published":"2025-05-20","authors":["Bo Li","Yuanhan Zhang","Liangyu Chen","Jinghao Wang","Fanyi Pu","Joshua Adrian Cahyono","Jingkang Yang","Chunyuan Li","Ziwei Liu"],"abstract":"Recent advances in Large Multimodal Models (LMMs) have unveiled great potential as visual assistants. However, most existing works focus on responding to individual instructions or using previous dialogues for contextual understanding. 
There is little discussion on employing both images and text as in-context examples to enhance the instruction following capability. To bridge this gap, we introduce the Otter model to leverage both textual and visual in-context examples for instruction tuning. Specifically, Otter builds upon Flamingo with Perceiver architecture, and has been instruction tuned for general purpose multi-modal assistant. Otter seamlessly processes multi-modal inputs, supporting modalities including text, multiple images, and dynamic video content. To support the training of Otter, we present the MIMIC-IT (MultI-Modal In-Context Instruction Tuning) dataset, which encompasses....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3571946","openalex_id":"https://openalex.org/W4410536678","cited_by_count":53,"quality_score":67,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Nanyang Institute of Technology","Nanyang Technological University"],"concepts":[{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6624470353126526},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6488584280014038},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6027466058731079},{"id":"https://openalex.org/C2776931103","display_name":"Otter","score":0.513295590877533},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5057523250579834},{"id":"https://openalex.org/C183322885","display_name":"Context 
model","score":0.4607715308666229},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.13453713059425354},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.11999925971031189}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":53}},{"id":"openalex:W4410536426","title":"MDKAT: Multimodal Decoupling With Knowledge Aggregation and Transfer for Video Emotion Recognition","url":"https://doi.org/10.1109/tcsvt.2025.3571534","published":"2025-05-20","authors":["Jian Wang","Chenglong Wang","Lin Guo","Shuchang Zhao","Dandan Wang","Shiqing Zhang","Xiaoming Zhao","Jun Yu","Yaowei Wang","Yi Yang","Siwei Ma","Qi Tian"],"abstract":"Multimodal Emotion Recognition (MER) leverages multiple input signals to identify the expressed emotions in user-generated data. Currently, effectively addressing both modality heterogeneity and homogeneity on MER tasks is a challenging issue due to the diversity of multimodal inputs in videos. To address this issue, this work proposes an efficient Multimodal Decoupling Method with Knowledge Aggregation and Transfer (MDKAT) for robust multimodal feature learning in emotional videos. MDKAT is consisted of three key steps: modality-independent feature extraction, modality-specific feature extraction, and multi-loss integration for decoupling. 
In these three steps, four crucial modules are individually designed to improve different aspects of multimodal learning on MER tasks, including a Cross-modal Feature Fusion (CFF) module for enhancing modality-independent features, an Adaptive Masked....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3571534","openalex_id":"https://openalex.org/W4410536426","cited_by_count":24,"quality_score":65,"matched_keywords":["efficient"],"author_affiliations":["Harbin Institute of Technology","Huawei Technologies (China)","Peking University","Peng Cheng Laboratory","Taizhou University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7021239399909973},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5493504405021667},{"id":"https://openalex.org/C205606062","display_name":"Decoupling (probability)","score":0.5230487585067749},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.41387754678726196},{"id":"https://openalex.org/C2776960227","display_name":"Knowledge transfer","score":0.4116722643375397},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.40284398198127747},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3449632525444031},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3227831721305847}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":24}},{"id":"bytedance-seed:238","title":"Emerging Properties in Unified Multimodal 
Pretraining","url":"https://seed.bytedance.com/en/research/emerging-properties-in-unified-multimodal-pretraining","published":"2025-05-20","authors":["Chaorui Deng","Deyao Zhu","Kunchang Li","Chenhui Gou","Feng Li","Zeyu Wang","Shu Zhong","Weihao Yu","Xiaonan Nie","Ziang Song","Guang Shi","Haoqi Fan"],"abstract":"Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. In this work, we introduce BAGEL, an open-source foundational model that natively supports multimodal understanding and generation. BAGEL is a unified, decoder-only model pretrained on trillions of tokens curated from large-scale interleaved text, image, video, and web data. When scaled with such diverse multimodal interleaved data, BAGEL exhibits emerging capabilities in complex multimodal reasoning. As a result, it significantly outperforms open-source unified models in both multimodal generation and understanding across standard benchmarks, while exhibiting advanced multimodal reasoning abilities such as free-form image manipulation, future frame prediction, 3D manipulation, and world navigation. 
In the hope of facilitating further opportunities for multimodal resear...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Multimodal","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4410536545","title":"LMM-VQA: Advancing Video Quality Assessment With Large Multimodal Models","url":"https://doi.org/10.1109/tcsvt.2025.3571788","published":"2025-05-20","authors":["Qihang Ge","Wei Sun","Yu Zhang","Yunhao Li","Zhongpeng Ji","Fengyu Sun","Shangling Jui","Xiongkuo Min","Guangtao Zhai"],"abstract":"The explosive growth of videos on streaming media platforms has underscored the urgent need for effective video quality assessment (VQA) algorithms to monitor and perceptually optimize the quality of streaming videos. However, VQA remains an extremely challenging task due to the diverse video content and the complex spatial and temporal distortions, thus necessitating more advanced methods to address these issues. Nowadays, large multimodal models (LMMs), such as GPT-4V, have exhibited strong capabilities for various visual understanding tasks, motivating us to leverage the powerful multimodal representation ability of LMMs to solve the VQA task. 
Therefore, we propose a Large Multi-...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3571788","openalex_id":"https://openalex.org/W4410536545","cited_by_count":14,"quality_score":63,"matched_keywords":["LLM","language model","media"],"author_affiliations":["East China Normal University","Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7146276235580444},{"id":"https://openalex.org/C103910844","display_name":"Video quality","score":0.5521456599235535},{"id":"https://openalex.org/C3020001037","display_name":"Quality assessment","score":0.5396945476531982},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.497974157333374},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4090225398540497},{"id":"https://openalex.org/C3018395757","display_name":"Evaluation methods","score":0.14488548040390015},{"id":"https://openalex.org/C200601418","display_name":"Reliability engineering","score":0.14105108380317688},{"id":"https://openalex.org/C21547014","display_name":"Operations management","score":0.0857267677783966}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"official:4517d8564c31e980","title":"Imagen 4 Model 
Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Imagen-4-Model-Card.pdf","published":"2025-05-20","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Imagen 4"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"openalex:W4410537723","title":"Implicit Multi-Behavior Generative Recommendation With Mixture of Quantization","url":"https://doi.org/10.1109/tkde.2025.3572014","published":"2025-05-20","authors":["Yuze Tan","Yanjie Gou","Kouying Xue","Shudong Huang","Yi Hu","Ivor W. Tsang","Jiancheng Lv"],"abstract":"Generative recommendation systems have recently seen a surge in interest, largely due to the promising advancements in generative AI. As a competitive solution for multi-behavior sequence recommendations, much of the recent research has concentrated on predicting the next item a user will likely interact with using a generative approach. However, these methods often 1). assign multiple residual quantization layers to obtain item codes, which leads to extra storage costs of more codebooks. And 2). explicitly utilize behavior sequences leading to longer sequences, potentially increasing the training time as well as inference time compared with original sequences. 
In response to these challenges, we introduce the Implicit ...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2025.3572014","openalex_id":"https://openalex.org/W4410537723","cited_by_count":3,"quality_score":48,"matched_keywords":["efficient","quantization"],"author_affiliations":["Agency for Science, Technology and Research","Chengdu University","Hong Kong Baptist University","Sichuan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7789410352706909},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.6034747958183289},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5066984295845032},{"id":"https://openalex.org/C199833920","display_name":"Vector quantization","score":0.4832344949245453},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4759775698184967},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.38477015495300293},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3276013135910034},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.2806321680545807}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/elephant-measuring-and-understanding-social-sycophancy-in-llms","title":"ELEPHANT: Measuring and 
understanding social sycophancy in LLMs","url":"https://www.microsoft.com/en-us/research/publication/elephant-measuring-and-understanding-social-sycophancy-in-llms/","published":"2025-05-19","authors":["Myra Cheng","Sunny Yu","Cinoo Lee","Pranav Khadpe","Lujain Ibrahim","Dan Jurafsky"],"abstract":"LLMs are known to exhibit sycophancy: agreeing with and flattering users, even at the cost of correctness. Prior work measures sycophancy only as direct agreement with users' explicitly stated beliefs that can be compared to a ground truth. This fails to capture broader forms of sycophancy such as affirming a user's self-image or other implicit beliefs. To address this gap, we introduce social sycophancy, characterizing sycophancy as excessive preservation of a user's face (their desired self-image), and present ELEPHANT, a benchmark for measuring social sycophancy in an LLM. Applying our benchmark to 11 models, we show that LLMs consistently exhibit high rates of social sycophancy: on average, they preserve user's face 45 percentage points more than humans in general advice queries and in queries describing clear user wrongdoing (from Reddit's r/AmITheAsshole). 
Furthermore, when prompted...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/text-generation-beyond-discrete-token-sampling","title":"Text Generation Beyond Discrete Token Sampling","url":"https://www.microsoft.com/en-us/research/publication/text-generation-beyond-discrete-token-sampling/","published":"2025-05-19","authors":["Yufan Zhuang","Liyuan Liu","Chandan Singh","Jingbo Shang","Jianfeng Gao"],"abstract":"In standard autoregressive generation, an LLM predicts the next-token distribution, samples a discrete token, and then discards the distribution, passing only the sampled token as new input. To preserve this distribution's rich information, we propose Mixture of Inputs (MoI), a training-free method for autoregressive generation. After generating a token following the standard paradigm, we construct a new input that blends the generated discrete token with the previously discarded token distribution. Specifically, we employ a Bayesian estimation method that treats the token distribution as the prior, the sampled token as the observation, and replaces the conventional one-hot vector with the continuous posterior expectation as the new model input. 
MoI allows the model to maintain a richer internal representation throughout the generation process, resulting in improved text quality and reas...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reward-reasoning-model","title":"Reward Reasoning Model","url":"https://www.microsoft.com/en-us/research/publication/reward-reasoning-model/","published":"2025-05-19","authors":["Jiaxin Guo","Zewen Chi","Li Dong","Qingxiu Dong","Xun Wu","Shaohan Huang","Furu Wei"],"abstract":"Reward models play a critical role in guiding large language models toward outputs that align with human expectations. However, an open challenge remains in effectively utilizing test-time compute to enhance reward model performance. In this work, we introduce Reward Reasoning Models (RRMs), which are specifically designed to execute a deliberate reasoning process before generating final rewards. Through chain-of-thought reasoning, RRMs leverage additional test-time compute for complex queries where appropriate rewards are not immediately apparent. To develop RRMs, we implement a reinforcement learning framework that fosters self-evolved reward reasoning capabilities without requiring explicit reasoning traces as training data. 
Experimental results demonstrate that RRMs achieve superior performance on reward modeling benchmarks across diverse domains. Notably, we show that RRMs can adapt...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:moonshotai:2505.13426","title":"G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning","url":"https://huggingface.co/papers/2505.13426","published":"2025-05-19","authors":["Moonshot/Kimi"],"abstract":"","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","moonshotai","language model"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"openalex:W4413945133","title":"KARMA: Augmenting Embodied AI Agents with Long-and-Short Term Memory Systems","url":"https://doi.org/10.1109/icra55743.2025.11128047","published":"2025-05-19","authors":["Zixuan Wang","Bo Yu","Junzhe Zhao","W. 
Sun","Sai Hou","Shuai Liang","Xing Hu","Yinhe Han","Yiming Gan"],"abstract":"Embodied AI agents responsible for executing interconnected, long-sequence household tasks often face difficulties with in-context memory, leading to inefficiencies and errors in task execution. To address this issue, we introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules, enhancing large language models (LLMs) for planning in embodied agents through memory-augmented prompting. KARMA distinguishes between long-term and short-term memory, with long-term memory capturing comprehensive 3D scene graphs as representations of the environment, while short-term memory dynamically records changes in objects' positions and states. This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning. Short-term memory employs strategies for effective and adaptive memory rep...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11128047","openalex_id":"https://openalex.org/W4413945133","cited_by_count":5,"quality_score":58,"matched_keywords":["memory","long-term","efficient","agent"],"author_affiliations":["Alibaba Group (China)","Beijing Institute of Technology","Institute of Art","Institute of Automation","Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.7905982136726379},{"id":"https://openalex.org/C547328371","display_name":"Karma","score":0.7756463289260864},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.731404185295105},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.6226587295532227},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.38649362325668335},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32397300004959106},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.1726107895374298},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.13003253936767578}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4413360557","title":"ARAG: Analysis and Retrieval Augmented Generation for Comprehensive Reasoning over Socioeconomic Data","url":"https://doi.org/10.1109/icde65448.2025.00368","published":"2025-05-19","authors":["Yixiong Xiao","Jingjia Cao","Yangxin Jiang","Jingbo Zhou"],"abstract":"Recent advancements in Large Language Models (LLMs) have significantly impacted the field of question answering systems, particularly with LLM-based data analysis and Retrieval-Augmented Generation (RAG). Yet, applying them independently has limited their effectiveness in scenarios that require a synthesis of both data analysis and contemporary information retrieval. To bridge this gap, we introduce the Analysis and Retrieval Augmented Generation (ARAG) framework, which integrates data analysis with the retrieval of up-to-date information. Based on the framework, we build a system to showcase how ARAG interprets the dynamics of socioeconomic indicators by examining correlated data and retrieving relevant information from news sources. The comparison of ARAG with ChatGPT Search and Perplexity showed that ARAG significantly outperformed them in delivering in-depth analytical insights. 
Moreo...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde65448.2025.00368","openalex_id":"https://openalex.org/W4413360557","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","retrieval","news"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7121959924697876},{"id":"https://openalex.org/C147077947","display_name":"Socioeconomic status","score":0.5889050364494324},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4470304846763611},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3346705436706543},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.05431815981864929},{"id":"https://openalex.org/C149923435","display_name":"Demography","score":0.0},{"id":"https://openalex.org/C2908647359","display_name":"Population","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413458163","title":"LLM + Vector Data: Coupling of Large Language Models with Vector Data Management for Enhancing Data Science","url":"https://doi.org/10.1109/icdew67478.2025.00018","published":"2025-05-19","authors":["Arijit Khan","Yuxiang Wang","Weixi Zhang","Yao Tian","M. TAMER ÖZSU"],"abstract":"The emergence of generative AI (GenAI) is a major driving force behind the modern data science ecosystem, a field that exploits data as the central asset for actionable insights. Analogously, GenAI is a form of artificial intelligence which learns from massive datasets to generate new data, showcasing human-like creativity in text, images to code, speech, and video. 
Two critical pillars of the GenAI technology are large language models (LLMs) and vector data. In particular, LLMs are a category of genAI models that emphasize on generating new text contents. On the other hand, there is also an upsurge of dense, high-dimensional, billion-scale vector data from deep learning models that embed complex data, e.g., text, multimedia, graphs, and tables into vector representations aiming to preserve semantic similarity. Since LLMs operate on vector data at various stages consisting of pre-trainin...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icdew67478.2025.00018","openalex_id":"https://openalex.org/W4413458163","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Aalborg University","Hangzhou Dianzi University","Hong Kong University of Science and Technology","Huawei Technologies (China)","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6790138483047485},{"id":"https://openalex.org/C92087593","display_name":"Vector (molecular biology)","score":0.5286949872970581},{"id":"https://openalex.org/C131584629","display_name":"Coupling (piping)","score":0.5003941059112549},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.4875761866569519},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.35263949632644653},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.20876353979110718},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.15665960311889648},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.09141376614570618}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413349539","title":"Boosting Accuracy and Efficiency for Vector Retrieval with Local Scaling Graph","url":"https://doi.org/10.1109/icde65448.2025.00032","published":"2025-05-19","authors":["Hongya Wang","Wenlong Wu","Cong Luo","Aobei Bian","Chunguang Meng","Ying Wu","Ji Sun"],"abstract":"Vector database systems have been gaining more and more attention in recent years with the prevalence of Large Language Models. As the most important algorithmic component behind vector database systems, nearest neighbor search has been studied for decades and various approaches are proposed for efficient vector retrieval. Among these proposals, the graph-based search paradigm is able to achieve desirable accuracy-efficiency tradeoff, and thus has been widely used in many industrial vector retrieval engines. In this paper, however, we claim that its efficiency is largely handicapped by two unnoticed performance issues - accuracy saturation and long-tail queries, especially when the number of links is limited. Through both empirical and theoretical analysis, we identify that the existence of antihubs is the root cause of these performance limitations. 
To mitigate the negative impact of a...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde65448.2025.00032","openalex_id":"https://openalex.org/W4413349539","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Donghua University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.8745496273040771},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.635105311870575},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.626523494720459},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4691639840602875},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.43125012516975403},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.34033316373825073},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.23579955101013184},{"id":"https://openalex.org/C2524010","display_name":"Geometry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413925307","title":"ASCENT: Autonomous Skill Learning Toward Complex Embodied Tasks With Foundation Models","url":"https://doi.org/10.1109/icra55743.2025.11127927","published":"2025-05-19","authors":["Haolin Wu","Yuecheng Liu","Junyi Dong","Heng Zhang","Sitong Mao","Hesheng Wang","Weigang Wu","Shunbo Zhou"],"abstract":"Collecting data from simulated scenarios for training robotic skills provides a safer and more controllable alternative to real-world environments. 
However, it demands considerable effort, including the manual construction of simulation environments, the careful design of tasks, and the challenge of obtaining effective trajectories. These limitations hinder the efficiency of data collection from simulated scenarios. In this paper, we leverage the prior knowledge of Large Language Models (LLMs) and Large Multimodal Models (LMMs) to generate simulated scenarios and embodied tasks. We introduce a novel framework, ASCENT (Autonomous Skill learning toward Complex Embodied tasks with fouNdaTion models), designed to efficiently accomplish these tasks and generate trajectory data. ASCENT features a fully autonomous skill learning mechanism based on AI agent. During task training, the AI agent id...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11127927","openalex_id":"https://openalex.org/W4413925307","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.859897255897522},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.8285629153251648},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6190809011459351},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.46578875184059143},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39654549956321716},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.3537447154521942},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2342437505722046},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.09064039587974548}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2504.09993","title":"AimTS: Augmented Series and Image Contrastive Learning for Time Series Classification","url":"https://arxiv.org/abs/2504.09993","published":"2025-05-19","authors":["Yuxuan Chen","Shanshan Huang","Yunyao Cheng","Peng Chen","Zhongwen Rao","Yang Shu","Bin Yang","Lujia Pan","Chenjuan Guo"],"abstract":"Time series classification (TSC) is an important task in time series analysis. Existing TSC methods mainly train on each single domain separately, suffering from a degradation in accuracy when the samples for training are insufficient in certain domains. The pre-training and fine-tuning paradigm provides a promising direction for solving this problem. However, time series from different domains are substantially divergent, which challenges the effective pre-training on multi-source data and the generalization ability of pre-trained models. To handle this issue, we introduce Augmented Series and Image Contrastive Learning for Time Series Classification (AimTS), a pre-training framework that learns generalizable representations from multi-source time series data. 
We propose a two-level prototype-based contrastive learning method to effectively utilize various augmentations in multi-source....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde65448.2025.00149","openalex_id":"https://openalex.org/W4413361291","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["Aalborg University","East China Normal University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.8187658190727234},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7025105357170105},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5391730070114136},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5162351131439209},{"id":"https://openalex.org/C75294576","display_name":"Contextual image classification","score":0.46879491209983826},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3778917193412781},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.050406038761138916},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4413925142","title":"SAS-Prompt: Large Language Models as Numerical Optimizers for Robot Self-Improvement","url":"https://doi.org/10.1109/icra55743.2025.11127882","published":"2025-05-19","authors":["Heni Ben Amor","Laura Graesser","Atıl Işçen","David B. 
D’Ambrosio","Saminda Abevruwan","Alex Bewley","Yifan Zhou","Kamalesh Kalirathinam","Swaroop Mishra","Pannag Sanketi"],"abstract":"We demonstrate the ability of large language models (LLMs) to perform iterative self-improvement of robot policies. An important insight of this paper is that LLMs have a built-in ability to perform (stochastic) numerical optimization and that this property can be leveraged for explainable robot policy search. Based on this insight, we introduce the SAS Prompt (Summarize, Analyze, Synthesize) – a single prompt that enables iterative learning and adaptation of robot behavior by combining the LLM's ability to retrieve, reason and optimize over previous robot traces in order to synthesize new, unseen behavior. Our approach can be regarded as an early example of a new family of explainable policy search methods that are entirely implemented within an LLM. We evaluate our approach both in simulation and on a real-robot table tennis task. Project website: sites.google.com/asu.edu/sas-llm/","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11127882","openalex_id":"https://openalex.org/W4413925142","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["Arizona State University","DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6838072538375854},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.5610151886940002},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2674984335899353}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"openalex:W4413917542","title":"Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-Tuning","url":"https://doi.org/10.1109/icra55743.2025.11127286","published":"2025-05-19","authors":["Zhiyu Huang","Xinshuo Weng","Maximilian Igl","Yuxiao Chen","Yulong Cao","Boris Ivanovic","Marco Pavone","Chen Lv"],"abstract":"Autonomous driving necessitates the ability to reason about future interactions between traffic agents and to make informed evaluations for planning. This paper introduces the Gen-Drive framework, which shifts from the traditional prediction and deterministic planning framework to a generation-then-evaluation planning paradigm. The framework employs a behavior diffusion model as a scene generator to produce diverse possible future scenarios, thereby enhancing the capability for joint interaction reasoning. To facilitate decision-making, we propose a scene evaluator (reward) model, trained with pairwise preference data collected through VLM assistance, thereby reducing human workload and enhancing scalability. Furthermore, we utilize an RL fine-tuning framework to improve the generation quality of the diffusion model, rendering it more effective for planning tasks. 
We conduct training and...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11127286","openalex_id":"https://openalex.org/W4413917542","cited_by_count":2,"quality_score":43,"matched_keywords":["preference"],"author_affiliations":["Nanyang Technological University","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.8929775953292847},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.6205715537071228},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.6111664772033691},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5968258380889893},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5947250127792358},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.49736812710762024},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40146803855895996},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.33080342411994934}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413360460","title":"M<sup>2</sup>oERank: Multi-Objective Mixture-of-Experts Enhanced Ranking for Satisfaction-Oriented Web Search","url":"https://doi.org/10.1109/icde65448.2025.00333","published":"2025-05-19","authors":["Yuchen Li","Hao Zhang","Yongqi Zhang","Xinyu Ma","Wenwen Ye","Naifei Song","Shuaiqiang Wang","Haoyi Xiong","Dawei Yin","Lei Chen"],"abstract":"Pre-trained language models (PLMs) have been successfully used to build high-performance ranking models for large-scale information retrieval systems. 
However, traditional PLM-based ranking approaches face two key challenges: (1) these models use both sparse and dense content (such as the query/title and content of documents) as inputs, which may require different attention allocations; and (2) traditional PLM-based ranking approaches have identified multiple objectives to gauge user satisfaction with ranking results, but integrating these objectives into the end-to-end training process and the subsequent feature updates and iterations usually involves significant computational resource overhead. In this paper, we propose a novel PLM-based ranking approach M<sup>2</sup>oE Rank, Multi-objective Mixtu...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde65448.2025.00333","openalex_id":"https://openalex.org/W4413360460","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.5893630385398865},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5821189880371094},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.38471755385398865},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3341369032859802}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413925802","title":"Ms.
NAMI: Multimodal Semantic Navigation on Relative Metric Intention Graph","url":"https://doi.org/10.1109/icra55743.2025.11128364","published":"2025-05-19","authors":["Shichao Zhai","Yuxiang Cui","Shuhao Ye","Xuan Yu","Sitong Mao","Shunbo Zhou","Rong Xiong","Yue Wang"],"abstract":"Embodied navigation in unknown environments presents the significant challenge of integrating tasks with multimodal goals into a unified framework. In this paper, we propose the Multimodal Semantic Navigation on Relative Metric Intention Graph (Ms. NAMI), a framework that integrates various navigation tasks with multimodal goals based on a relative topo-metric intention graph. A reinforcement learning based policy with a concise action space, consisting of frontier nodes and intention nodes, is designed to guide the agent to select reasonable sub-goals. A sparse reward design is introduced to reduce bias during training. Additionally, several engineering optimizations are implemented to enhance overall performance. 
The experimental results indicate that our method can achieve robust navigation performance in a variety of unknown environments.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11128364","openalex_id":"https://openalex.org/W4413925802","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Huawei Technologies (China)","University of Nottingham Ningbo China","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.749903678894043},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.585297703742981},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.49799442291259766},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.48678895831108093},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35019630193710327},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.3264356255531311},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.1620369851589203},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.04913333058357239}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413350111","title":"KnowTrans: Boosting Transferability of Data Preparation LLMs via Knowledge Augmentation","url":"https://doi.org/10.1109/icde65448.2025.00214","published":"2025-05-19","authors":["Yuhang Ge","Fengyu Li","Yuren Mao","Yanbo Yang","Congcong Ge","Zhaorun Chen","Jiang Long","Yunjun Gao"],"abstract":"Data Preparation (DP), which involves tasks such as 
data cleaning, imputation and integration, is a fundamental process in data-driven applications. Recently, Large Language Models (LLMs) fine-tuned for DP tasks, i.e., DP-LLMs, have achieved state-of-the-art performance. However, transferring DP-LLMs to novel datasets and tasks typically requires a substantial amount of labeled data, which is impractical in many real-world scenarios. To address this, we propose a knowledge augmentation framework for data preparation, dubbed KNOWTRANS. This framework allows DP-LLMs to be transferred to novel datasets and tasks with a few data points, significantly decreasing the dependence on extensive labeled data. KNOWTRANS comprises two components: Selective Knowledge Concentration and Automatic Knowledge Bridging. The first component re-uses knowledge from previously learned tasks, while the second au...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde65448.2025.00214","openalex_id":"https://openalex.org/W4413350111","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Cloud Computing Center","Huawei Technologies (China)","Huawei Technologies (United States)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.8547785878181458},{"id":"https://openalex.org/C61272859","display_name":"Transferability","score":0.8178342580795288},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5372822880744934},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38520121574401855},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.35903048515319824},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.2884514629840851},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.2845977544784546},{"id":"https://openalex.org/C140331021","display_name":"Logit","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410502410","title":"ID-centric Pre-training for Recommendation","url":"https://doi.org/10.1145/3735128","published":"2025-05-19","authors":["Yiqing Wu","Ruobing Xie","Zhao Zhang","Xu Zhang","Fuzhen Zhuang","Leyu Lin","Zhanhui Kang","Zhulin An","Yongjun Xu"],"abstract":"Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and represent items. However, these unique IDs are challenging to be transferred to new domains. With the thriving of pre-trained language model (PLM), some pioneer works adopt PLM for pre-trained recommendation, where modality information is considered universal across domains via PLM. Unfortunately, the behavioral information in ID embeddings is verified to currently dominate in recommendation compared to modality information and thus limits these models’ performance. In this work, we propose a novel ID-centric recommendation pre-training paradigm (IDP), which directly transfers informative ID embeddings learned in pre-training domains to item representations in new domains. 
Specifically, in pre-training stage, besides the ID-based sequential recommendation...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3735128","openalex_id":"https://openalex.org/W4410502410","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Beihang University","Chinese Academy of Sciences","Institute of Computing Technology","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8021122217178345},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.7144391536712646},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.6074157357215881},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4987339973449707},{"id":"https://openalex.org/C2776745293","display_name":"Thriving","score":0.4984006881713867},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.4376000463962555},{"id":"https://openalex.org/C100776233","display_name":"Bridge (graph theory)","score":0.4339069724082947},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40045350790023804}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413925321","title":"E2B: A Single Modality Point-Based Tracker with Event Cameras","url":"https://doi.org/10.1109/icra55743.2025.11127695","published":"2025-05-19","authors":["Hongwei Ren","Zhuo Li","Aiersi Tuerhong","Haobo Liu","Fei Liang","Yongxiang Feng","Wenhui Wang","Yaoyuan Wang","Ziyang Zhang","Weihua 
He","Bojun Cheng"],"abstract":"High-speed object tracking holds significant relevance across robotic domains, such as drones and autonomous driving. Compared to conventional cameras, event cameras are equipped with the ability to capture object motion information at exceptionally high temporal resolution with relatively low power consumption and remain immune from motion-blurring effects. Regrettably, many existing methods adopt a framebased approach by stacking events into Event Frame, which overlooks the sparsity and high temporal resolution of events. This approach is also reliant on the huge pre-training backbone and reaches a performance plateau but demands unrealistically large networks and high power consumption, rendering it impractical for real-time applications in battery-constrained robotic scenarios. In this paper, we propose an efficient and effective single-modality tracker using Point Cloud representati...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11127695","openalex_id":"https://openalex.org/W4413925321","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Hong Kong University of Science and Technology","Huawei Technologies (China)","Huawei Technologies (United Kingdom)","Huawei Technologies (United States)","Peking University","Tsinghua University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6733058094978333},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6697498559951782},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.6611701846122742},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle 
physics)","score":0.5482098460197449},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5313330888748169},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4753410816192627},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.09116187691688538},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.0693945586681366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413361194","title":"DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System","url":"https://doi.org/10.1109/icde65448.2025.00073","published":"2025-05-19","authors":["Xihong Yang","Heming Jing","Zixing Zhang","Jindong Wang","Hai Tao Niu","Shuaiqiang Wang","Lu Yu","Junfeng Wang","Dawei Yin","Xinwang Liu","En Zhu","Defu Lian"],"abstract":"Benefiting from the strong reasoning capabilities, Large language models (LLMs) have demonstrated remarkable performance in recommender systems. Various efforts have been made to distill knowledge from LLMs to enhance collaborative models, employing techniques like contrastive learning for representation alignment. In this work, we prove that directly aligning the representations of LLMs and collaborative models is suboptimal for enhancing downstream recommendation tasks performance, based on the information theorem. Consequently, the challenge of effectively aligning semantic representations between collaborative models and LLMs remains unresolved. Inspired by this viewpoint, we propose a novel plug-and-play alignment framework for LLMs and collaborative models. 
Specifically, we first disentangle the latent representations of both LLMs and collaborative models into specific and shared c...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde65448.2025.00073","openalex_id":"https://openalex.org/W4413361194","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","National University of Defense Technology","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8701962232589722},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7990915775299072},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5017123222351074},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43587830662727356},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3581366539001465}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413947024","title":"Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent","url":"https://doi.org/10.1109/icra55743.2025.11128234","published":"2025-05-19","authors":["Yuxiao Chen","Sander Tonkens","Marco Pavone"],"abstract":"Adept traffic models are critical to both real-time prediction/planning and closed-loop simulation for autonomous vehicles (AV). Key design objectives include accuracy, diverse multimodal behaviors, interpretability, and compatibility with other modules in the autonomy stack, e.g., the downstream planner. 
We present Categorical Traffic Transformer (CTT), a traffic model that outputs both continuous trajectory predictions and categorical predictions with clear semantic meanings (lane modes, homotopies, etc.). The most outstanding feature of CTT is its fully interpretable latent space, which enables direct supervision of the latent variables from the ground truth during training and avoids mode collapse completely. As a result, CTT can generate diverse behaviors conditioned on different semantic modes while significantly beating SOTA on prediction accuracy. In addition, CTT's ability to in...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11128234","openalex_id":"https://openalex.org/W4413947024","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Nvidia (United Kingdom)","Nvidia (United States)","Stanford University","University of California, San Diego"],"concepts":[{"id":"https://openalex.org/C5274069","display_name":"Categorical variable","score":0.8674048185348511},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5703659057617188},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5303331613540649},{"id":"https://openalex.org/C2985695025","display_name":"Road traffic","score":0.4336931109428406},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4104645550251007},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.39314085245132446},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.34936341643333435},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.32351869344711304}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4413360829","title":"Training Data Distribution Estimation for Optimized Pre-training Data Management","url":"https://doi.org/10.1109/icde65448.2025.00372","published":"2025-05-19","authors":["Hao Liang","Keshi Zhao","Yajie Yang","Bin Cui","Zenan Zhou","Wentao Zhang"],"abstract":"Large language models (LLMs) have demonstrated exceptional performance across a wide range of tasks and domains, with data preparation playing a critical role in achieving these results. Pretraining data typically combines information from multiple domains. To maximize performance when integrating data from various domains, determining the optimal data distribution is essential. However, state-of-the-art (SOTA) LLMs rarely disclose details about their pretraining data, making it difficult for researchers to identify ideal data distributions. In this paper, we introduce a new approach, data distribution estimation, which enables the automatic estimation of pretraining data distributions by analyzing the generated outputs of LLMs. We provide rigorous theoretical proofs, practical algorithms, and preliminary experimental results for data distribution estimation. 
Based on these findings, we....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde65448.2025.00372","openalex_id":"https://openalex.org/W4413360829","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beijing University of Posts and Telecommunications","Peking University"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.7554638385772705},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7118042707443237},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.6146757006645203},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.4728822112083435},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.45883530378341675},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.30782458186149597},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.10113173723220825},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09893891215324402}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413944816","title":"Dynamic Non-Prehensile Object Transport via Model-Predictive Reinforcement Learning","url":"https://doi.org/10.1109/icra55743.2025.11127521","published":"2025-05-19","authors":["Neel Jawale","Byron Boots","Balakumar Sundaralingam","Mohak Bhardwaj"],"abstract":"We investigate the problem of teaching a robot manipulator to perform dynamic non-prehensile object transport, also known as the ‘robot waiter’ task, from a limited set of real-world demonstrations. 
We propose an approach that combines batch reinforcement learning (RL) with modelpredictive control (MPC) by pretraining an ensemble of value functions from demonstration data, and utilizing them online within an uncertainty-aware MPC scheme to ensure robustness to limited data coverage. Our approach is straightforward to integrate with off-the-shelf MPC frameworks and enables learning solely from task space demonstrations with sparsely labeled transitions, while leveraging MPC to ensure smooth joint space motions and constraint satisfaction. We validate the proposed approach through extensive simulated and real-world experiments on a Franka Panda robot performing the robot waiter task and de...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11127521","openalex_id":"https://openalex.org/W4413944816","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","University of Washington"],"concepts":[{"id":"https://openalex.org/C136380597","display_name":"Prehensile tail","score":0.9387867450714111},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.8086352348327637},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6351313591003418},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4968266785144806},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.47521620988845825},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.06345409154891968},{"id":"https://openalex.org/C105702510","display_name":"Anatomy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4413926173","title":"Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models","url":"https://doi.org/10.1109/icra55743.2025.11128270","published":"2025-05-19","authors":["Chen Wang","Fei Xia","Wenhao Yu","Tingnan Zhang","Ruohan Zhang","C. Karen Liu","Li Fei-Fei","Jie Tan","Jacky Liang"],"abstract":"Learning to perform manipulation tasks from human videos is a promising approach for teaching robots. However, many manipulation tasks require changing control parameters during task execution, such as force, which visual data alone cannot capture. In this work, we leverage sensing devices such as armbands that measure human muscle activities and microphones that record sound, to capture the details in the human manipulation process, and enable robots to extract task plans and control parameters to perform the same task. To achieve this, we introduce Chain-of-Modality (CoM), a prompting strategy that enables Vision Language Models to reason about multimodal human demonstration data - videos coupled with muscle or audio signals. 
By progressively integrating information from each modality, CoM refines a task plan and generates detailed control parameters, enabling robots to perform manipul...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra55743.2025.11128270","openalex_id":"https://openalex.org/W4413926173","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","Stanford University"],"concepts":[{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.7503750324249268},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7175166606903076},{"id":"https://openalex.org/C199185054","display_name":"Chain (unit)","score":0.4859508275985718},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.46341556310653687},{"id":"https://openalex.org/C2780660688","display_name":"Multimodal learning","score":0.4556356966495514},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4467261731624603},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4346689283847809},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.4123196303844452}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2505.13439","title":"VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation","url":"https://huggingface.co/papers/2505.13439","published":"2025-05-19","authors":["Huawei Lin","Tong Geng","Zhaozhuo Xu","Weijie Zhao"],"abstract":"Autoregressive (AR) models have recently shown strong performance in image generation, where 
a critical component is the visual tokenizer (VT) that maps continuous pixel inputs to discrete token sequences. The quality of the VT largely defines the upper bound of AR model performance. However, current discrete VTs fall significantly behind continuous variational autoencoders (VAEs), leading to degraded image reconstructions and poor preservation of details and text. Existing benchmarks focus on end-to-end generation quality, without isolating VT performance. To address this gap, we introduce VTBench, a comprehensive benchmark that systematically evaluates VTs across three core tasks: Image Reconstruction, Detail Preservation, and Text Preservation, and covers a diverse range of evaluation scenarios. We systematically assess state-of-the-art VTs using a set of metrics to evaluate the quali...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmar-a-challenging-benchmark-for-deep-reasoning-in-speech-audio-music-and-their-mix","title":"MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix","url":"https://www.microsoft.com/en-us/research/publication/mmar-a-challenging-benchmark-for-deep-reasoning-in-speech-audio-music-and-their-mix/","published":"2025-05-18","authors":["Ziyang Ma","Yi Ma","Yanqiao Zhu","Chen Yang","Yi-Wen Chao","Ruiyang Xu","Wenxi Chen","Yuanzhe Chen","Zhuo Chen","Jian Cong","Kai Li","Keliang Li"],"abstract":"We introduce MMAR, a new benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs) across massive 
multi-disciplinary tasks. MMAR comprises 1,000 meticulously curated audio-question-answer triplets, collected from real-world internet videos and refined through iterative error corrections and quality checks to ensure high quality. Unlike existing benchmarks that are limited to specific domains of sound, music, or speech, MMAR extends them to a broad spectrum of real-world audio scenarios, including mixed-modality combinations of sound, music, and speech. Each question in MMAR is hierarchically categorized across four reasoning layers: Signal, Perception, Semantic, and Cultural, with additional sub-categories within each layer to reflect task diversity and complexity. To further foster research in this area, we annotate every question with a Chain-of-Tho...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Audio and Acoustics","Audio and Speech Processing","Computer science","Engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411213511","title":"Leveraging Generative AI for Actionable Insights in Cloud Computing: Innovations and Applications","url":"https://doi.org/10.56472/iccsaiml25-121","published":"2025-05-18","authors":["Pavan Nithin Mullapudi"],"abstract":"Generative AI (GenAI) has emerged as a transformative tool in cloud computing, enabling advanced predictive analytics, explainable decision-making, and context-aware recommendations. 
This paper synthesizes academic research and industry advancements to explore four critical applications of GenAI: (1) time series classification for customer growth and churn prediction, (2) explainability in machine learning propensity models, (3) retrieval-augmented generation (RAG) systems for augmented insights, and (4) domain-specific fine-tuning for action recommendations. Drawing on peer-reviewed studies, we demonstrate how transformer-based architectures achieve 89% accuracy in churn prediction, counterfactual explanations improve stakeholder trust by 41%, and RAG systems reduce hallucinations in cost-optimization tools by 16%. Challenges such as data quality, ethical governance, and real-time scala...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.56472/iccsaiml25-121","openalex_id":"https://openalex.org/W4411213511","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.8258399367332458},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7484716773033142},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6701762676239014},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5463680028915405},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2579665780067444},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.057982057332992554}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412030951","title":"Electric Motor Drive Anomaly Detection Using 
AutoGluon","url":"https://doi.org/10.1109/iemdc60492.2025.11061011","published":"2025-05-18","authors":["Ron Wang","Lizon Maharjan","Tausif Husain"],"abstract":"This paper proposes an anomaly detection method for electric motor drive using state-of-the-art convolutional neural network (CNN) based deep learning methods. As a first step, the motor controller time-series data are preprocessed and transformed into images, and thus the anomaly detection problem is formulated into an image classification problem. AutoGluon is specialized in automating the optimization process of model selection, network topology, and hyperparameter tuning. Under the hood, AutoGluon Multimodal (AutoMM) uses PyTorch Image Models (TIMM), which provides a wide variety of pre-trained image models. Different anomalies based on real-life data are labeled as multiple classes. A technique to automate the process of preparing and labeling large-scale train/validate/test data is introduced. AutoMM demonstrates strong capability in handling unbalanced datasets. 
In the end, the pr...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iemdc60492.2025.11061011","openalex_id":"https://openalex.org/W4412030951","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5328325033187866},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4434199035167694},{"id":"https://openalex.org/C176871988","display_name":"Electric motor","score":0.4427802562713623},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.21721410751342773},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.20085939764976501},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1689348816871643}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adaem-an-adaptively-and-automated-extensible-measurement-of-llms-value-difference","title":"AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference","url":"https://www.microsoft.com/en-us/research/publication/adaem-an-adaptively-and-automated-extensible-measurement-of-llms-value-difference/","published":"2025-05-17","authors":["Shitong Duan","Xiaoyuan Yi","Peng Zhang","Dongkuan Xu","Jing Yao","Tun Lu","Ning Gu","Xing Xie"],"abstract":"Assessing Large Language Models (LLMs)' underlying value differences enables comprehensive comparison of their misalignment, cultural adaptability, and biases. 
Nevertheless, current value measurement datasets face the informativeness challenge: with often outdated, contaminated, or generic test questions, they can only capture the shared value orientations among different LLMs, leading to saturated and thus uninformative results. To address this problem, we introduce AdAEM, a novel, self-extensible assessment framework for revealing LLMs' inclinations. Distinct from previous static benchmarks, AdAEM can automatically and adaptively generate and extend its test questions. This is achieved by probing the internal value boundaries of a diverse set of LLMs developed across cultures and time periods in an in-context optimization manner. The optimization process theoretically maximizes an info...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:240","title":"Model Merging in Pre-training of Large Language Models","url":"https://seed.bytedance.com/en/research/model-merging-in-pre-training-of-large-language-models","published":"2025-05-17","authors":["Yunshui Li","Yiyuan Ma","Shen Yan","Chaoyi Zhang","Jing Liu","Jianqiao Lu","Ziwen Xu","Mengzhao Chen","Minrui Wang","Shiyi Zhan","Jin Ma","Xunhao Lai"],"abstract":"Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored. 
In this paper, we present a comprehensive investigation of model merging techniques during the pre-training process. Through extensive experiments with both dense and Mixture-of-Experts (MoE) architectures ranging from millions to over 100 billion parameters, we demonstrate that merging checkpoints trained with constant learning rates not only achieves significant performance improvements but also enables accurate prediction of annealing behavior. These improvements lead to both more efficient model development and significantly lower training costs. Our detailed ablation studies on merging strategies and hyperparameters provide new insights into the underlying mechanisms while uncovering novel applications. Through comp...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/semantic-caching-of-contextual-summaries-for-efficient-question-answering-with-language-models","title":"Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models","url":"https://www.microsoft.com/en-us/research/publication/semantic-caching-of-contextual-summaries-for-efficient-question-answering-with-language-models/","published":"2025-05-16","authors":["Camille Couturier","Spyridon (Spyros) Mastorakis","Haiying Shen","Saravan Rajmohan","Victor Ruehle"],"abstract":"Large Language Models (LLMs) are increasingly deployed across edge and cloud platforms for real-time question-answering and 
retrieval-augmented generation. However, processing lengthy contexts in distributed systems incurs high computational overhead, memory usage, and network bandwidth. This paper introduces a novel semantic caching approach for storing and reusing intermediate contextual summaries, enabling efficient information reuse across similar queries in LLM-based QA workflows. Our method reduces redundant computations by up to 50-60% while maintaining answer accuracy comparable to full document processing, as demonstrated on NaturalQuestions, TriviaQA, and a synthetic ArXiv dataset. This approach balances computational cost and response quality, critical for real-time AI assistants.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","LLM","memory","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/chain-of-model-learning-for-language-model","title":"Chain-of-Model Learning for Language Model","url":"https://www.microsoft.com/en-us/research/publication/chain-of-model-learning-for-language-model/","published":"2025-05-16","authors":["Kaitao Song","Xiaohua Wang","Xu Tan","Huiqiang Jiang","Chengruidong Zhang","Yongliang Shen","LU Cen","Li","Zifan Song","Caihua Shan","Yansen Wang","Kan Ren"],"abstract":"In this paper, we propose a novel learning paradigm, termed Chain-of-Model (CoM), which incorporates the causal relationship into the hidden states of each layer as a chain style, 
thereby introducing great scaling efficiency in model training and inference flexibility in deployment. We introduce the concept of Chain-of-Representation (CoR), which formulates the hidden states at each layer as a combination of multiple sub-representations (i.e., chains) at the hidden dimension level. In each layer, each chain from the output representations can only view all of its preceding chains in the input representations. Consequently, the model built upon CoM framework can progressively scale up the model size by increasing the chains based on the previous models (i.e., chains), and offer multiple sub-models at varying sizes for elastic inference by using different chain numbers. Based on this princ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:bb7235e759a7e7ac","title":"Addendum to o3 and o4-mini system card: Codex","url":"https://openai.com/index/o3-o4-mini-codex-system-card-addendum","published":"2025-05-16","authors":["OpenAI"],"abstract":"Codex is a cloud-based coding agent. Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering. 
codex-1 was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adheres precisely to instructions, and iteratively runs tests until passing results are achieved.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Safety","agent"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:y3xvvugv23mp8l9t7qf0leov","title":"Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs","url":"https://machinelearning.apple.com/research/english-accent","published":"2025-05-16","authors":["Yanzhu Guo§","Simone Conia¶","Zelin Zhou","Min Li","Saloni Potdar","Henry Xiao"],"abstract":"Current Large Language Models (LLMs) are predominantly designed with English as the primary language, and even the few that are multilingual tend to exhibit strong English-centric biases. Much like speakers who might produce awkward expressions when learning a second language, LLMs often generate unnatural outputs in non-English languages, reflecting English-centric patterns in both vocabulary and grammar. 
Despite the importance of this issue,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4414271538","title":"Generative AI for Creating Immersive Learning Environments: Virtual Reality and Beyond","url":"https://doi.org/10.1109/assic64892.2025.11158626","published":"2025-05-16","authors":["Rahul Vadisetty","Anand Polamarasetti","Mahesh Kumar Goyal","Sateesh Kumar Rongali","Sameer kumar Prajapati","Jinal Bhanubhai Butani"],"abstract":"Generative Artificial Intelligence (AI) revolutionizes immersive educational spaces with dynamic, personalized, and interactive experiences. In this article, Generative AI addresses its role in Virtual and Augmented Realities through automated creation, personalized learning pathways, and heightened engagement. With Generative AI, educational simulations can adapt to learner performance, produce interactive characters, and present real-time feedback through models such as Generative Adversarial Networks (GANs) and Transformer-based AI. Considering its potential, computational limitations, ethics, and authentic content concerns must be considered. In its examination, current implementations, benefits, and impediments, such as AI-powered flexible learning, are discussed in detail in this work. 
In conclusion, Generative AI's role in changing immersive instruction and opening doors for amplif...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/assic64892.2025.11158626","openalex_id":"https://openalex.org/W4414271538","cited_by_count":3,"quality_score":44,"matched_keywords":["personalized"],"author_affiliations":["Andhra University","Google (United States)","Judson University","University of North Carolina at Charlotte","Wayne State University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.8495000004768372},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.698199987411499},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5961999893188477},{"id":"https://openalex.org/C153715457","display_name":"Augmented reality","score":0.5281999707221985},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4903999865055084},{"id":"https://openalex.org/C194969405","display_name":"Virtual reality","score":0.4781000018119812},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4706999957561493},{"id":"https://openalex.org/C184408114","display_name":"Generative Design","score":0.4050999879837036}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4414272007","title":"Cloud-Based Immersive Learning: The Role of Virtual Reality, Big Data, and Generative AI in Transformative Education Experiences","url":"https://doi.org/10.1109/assic64892.2025.11158636","published":"2025-05-16","authors":["Rahul Vadisetty","Anand Polamarasetti","Mahesh Kumar Goyal","Sateesh Kumar Rongali","Srichand Prajapati","Jinal 
Bhanubhai Butani"],"abstract":"Immersive learning transforms education by integrating Virtual Reality (VR), Big Data, and Generative Artificial Intelligence (AI) in cloud environments. This work discusses these technologies' contribution towards increased engagement, personalized learning, and recall through flexible and interactive experiences. Realistic simulations in a secure environment, real-time analysis via Big Data, and dynamically personalized information via Generative AI make immersive learning a reality. Nevertheless, scalability, security, and ease of integration are yet to be addressed. This article proposes an integrated model for cloud-based immersive learning, comparing conventional and AI-facilitated approaches through experimental evaluation. Besides, technical, ethical, and legislative considerations and future directions for inquiry are addressed. In conclusion, with its potential for personalized...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/assic64892.2025.11158636","openalex_id":"https://openalex.org/W4414272007","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Andhra University","Google (United States)","Judson University","University of North Carolina at Charlotte","Wayne State University"],"concepts":[{"id":"https://openalex.org/C70587473","display_name":"Transformative learning","score":0.7967000007629395},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6079999804496765},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6065999865531921},{"id":"https://openalex.org/C38775462","display_name":"Transformational leadership","score":0.5012000203132629},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.48410001397132874},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4742000102996826},{"id":"https://openalex.org/C194969405","display_name":"Virtual reality","score":0.44429999589920044},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3573000133037567}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/words-that-unite-the-world-a-unified-framework-for-deciphering-central-bank-communications-globally","title":"Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally","url":"https://www.microsoft.com/en-us/research/publication/words-that-unite-the-world-a-unified-framework-for-deciphering-central-bank-communications-globally/","published":"2025-05-14","authors":["Agam Shah","Siddhant Sukhani","Huzaifa Pardawala","Saketh Budideti","Riya Bhadani","Rudra Gopal","Siddhartha Somani","Michael Galarnyk","Soungmin Lee","Arnav Hiray","Akshar Ravichandran","Eric Kim"],"abstract":"Central banks around the world play a crucial role in maintaining economic stability. Deciphering policy implications in their communications is essential, especially as misinterpretations can disproportionately impact vulnerable populations. To address this, we introduce the World Central Banks (WCB) dataset, the most comprehensive monetary policy corpus to date, comprising over 380k sentences from 25 central banks across diverse geographic regions, spanning 28 years of historical data. After uniformly sampling 1k sentences per bank (25k total) across all available years, we annotate and review each sentence using dual annotators, disagreement resolutions, and secondary expert reviews. 
We define three tasks: Stance Detection, Temporal Classification, and Uncertainty Estimation, with each sentence annotated for all three. We benchmark seven Pretrained Language Models (PLMs) and nine Larg...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Economics","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4410371768","title":"Continual Learning of Large Language Models: A Comprehensive Survey","url":"https://doi.org/10.1145/3735633","published":"2025-05-14","authors":["Haizhou Shi","Zihao Xu","Hengyi Wang","Weiyi Qin","Wenyuan Wang","Yibin Wang","Zifeng Wang","Sayna Ebrahimi","Hao Wang"],"abstract":"The challenge of effectively and efficiently adapting statically pre-trained Large Language Models (LLMs) to ever-evolving data distributions remains predominant. When tailored for specific needs, pre-trained LLMs often suffer from significant performance degradation in previous knowledge domains—a phenomenon known as “catastrophic forgetting” . While extensively studied in the Continual Learning (CL) community, this problem presents new challenges in the context of LLMs. In this survey, we provide a comprehensive overview and detailed discussion of the current research progress on LLMs within the context of CL. 
Besides the introduction of the preliminary knowledge, this survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning) , i.e., continual...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1145/3735633","openalex_id":"https://openalex.org/W4410371768","cited_by_count":34,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Rutgers, The State University of New Jersey"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9042792320251465},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4185764491558075},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3985273838043213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":34}},{"id":"hf-org-paper:deepseek-ai:2505.09343","title":"Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures","url":"https://huggingface.co/papers/2505.09343","published":"2025-05-14","authors":["DeepSeek"],"abstract":"The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale. 
This paper presents an in-depth analysis of the DeepSeek-V3/R1 model architecture and its AI infrastructure, highlighting key innovations such as Multi-head Latent Attention (MLA) for enhanced memory efficiency, Mixture of Experts (MoE) architectures for optimized computation-communication trade-offs, FP8 mixed-precision training to unlock the full potential of hardware capabilities, and a Multi-Plane Network Topology to minimize cluster-level network overhead. Building on the hardw...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","deepseek-ai","memory","efficient"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"hf-org-paper:Qwen:2505.09388","title":"Qwen3 Technical Report","url":"https://huggingface.co/papers/2505.09388","published":"2025-05-14","authors":["Alibaba/Qwen"],"abstract":"In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework. 
This eliminates the need to switch between different models--such as chat-optimized models (e.g., GPT-4o) and dedicated reasoning models (e.g., QwQ-32B)--and enables dynamic mode switching based on user queries or chat templates. Meanwhile, Qwen3 introduces a thinking budget mechanism, allowing users to allocate computational resources adaptively during infer...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","Qwen","agent"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/Qwen/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generating-full-field-evolution-of-physical-dynamics-from-irregular-sparse-observations","title":"Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations","url":"https://www.microsoft.com/en-us/research/publication/generating-full-field-evolution-of-physical-dynamics-from-irregular-sparse-observations/","published":"2025-05-13","authors":["Panqi Chen","Yifan Sun","Lei Cheng","Yang Yang","Weichang Li","Yang Liu","Weiqing Liu","Jiang Bian","Shikai Fang"],"abstract":"Modeling and reconstructing multidimensional physical dynamics from sparse and off-grid observations presents a fundamental challenge in scientific research. Recently, diffusion-based generative modeling shows promising potential for physical simulation. However, current approaches typically operate on on-grid data with preset spatiotemporal resolution, but struggle with the sparsely observed and continuous nature of real-world physical dynamics. 
To fill the gaps, we present SDIFT, Sequential DIffusion in Functional Tucker space, a novel framework that generates full-field evolution of physical dynamics from irregular sparse observations. SDIFT leverages the functional Tucker model as the latent space representer with proven universal approximation property, and represents observations as latent functions and Tucker core sequences. We then construct a sequential diffusion model with temp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:236","title":"Seed1.5-VL Technical Report","url":"https://seed.bytedance.com/en/research/seed1-5-vl-technical-report","published":"2025-05-13","authors":["Seed Multimodal Team"],"abstract":"We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving the state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. 
Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across dive...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["LLM","Multimodal","arXiv","agent"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4410322513","title":"The role of generative AI and hybrid feedback in improving L2 writing skills: a comparative study","url":"https://doi.org/10.1080/17501229.2025.2503890","published":"2025-05-13","authors":["Zhihui Zhang","Scott Aubrey","Xiaomeng Huang","Thomas K. F. 
Chiu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1080/17501229.2025.2503890","openalex_id":"https://openalex.org/W4410322513","cited_by_count":34,"quality_score":67,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6634896397590637},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.4675762951374054},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.46281731128692627},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4420726001262665},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.36767151951789856},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2714877724647522},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":34}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/aiopslab-a-holistic-framework-for-evaluating-ai-agents-for-enabling-autonomous-cloud","title":"AIOpsLab: A Holistic Framework for Evaluating AI Agents for Enabling Autonomous Cloud","url":"https://www.microsoft.com/en-us/research/publication/aiopslab-a-holistic-framework-for-evaluating-ai-agents-for-enabling-autonomous-cloud/","published":"2025-05-12","authors":["Yinfang Chen","Manish Shetty","Gagan Somashekar","Minghua Ma","Yogesh Simmhan","Jonathan Mace","Chetan Bansal","Rujia Wang","Saravan Rajmohan"],"abstract":"AI for IT Operations (AIOps) aims to automate complex operational tasks, such 
as fault localization and root cause analysis, to reduce human workload and minimize customer impact. While traditional DevOps tools and AIOps algorithms often focus on addressing isolated operational tasks, recent advances in Large Language Models (LLMs) and AI agents are revolutionizing AIOps by enabling end-to-end and multitask automation. This paper envisions a future where AI agents autonomously manage operational tasks throughout the entire incident lifecycle, leading to self-healing cloud systems, a paradigm we term AgentOps. Realizing this vision requires a comprehensive framework to guide the design, development, and evaluation of these agents. To this end, we present AIOPSLAB, a framework that not only deploys microservice cloud environments, injects faults, generates workloads, and exports telemetry....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Systems and networking","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:996098f270731f2f","title":"MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker 
Encoder","url":"https://huggingface.co/papers/2505.07916","published":"2025-05-12","authors":["MiniMax"],"abstract":"","companies":["MiniMax"],"matched_orgs":["MiniMax"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["MiniMax"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4411337932","title":"SoK: Watermarking for AI-Generated Content","url":"https://doi.org/10.1109/sp61157.2025.00178","published":"2025-05-12","authors":["Xuandong Zhao","Sam Gunn","Miranda Christ","Jaiden Fairoze","Andrés Fábrega","Nicholas Carlini","Sanjam Garg","Sanghyun Hong","Milad Nasr","Florian Tramèr","Somesh Jha","Lei Li"],"abstract":"As the outputs of generative AI (GenAl) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI -generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAl, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of water-marking techniques for GenAl, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. 
P...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/sp61157.2025.00178","openalex_id":"https://openalex.org/W4411337932","cited_by_count":11,"quality_score":48,"matched_keywords":[],"author_affiliations":["Berkeley College","Carnegie Mellon University","Columbia University","Cornell University","ETH Zurich","Google (United States)","Google DeepMind (United Kingdom)","Oregon State University","UC San Diego Health System","University of California, Berkeley","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C150817343","display_name":"Digital watermarking","score":0.889220118522644},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6899242997169495},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.6271241903305054},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.355500191450119},{"id":"https://openalex.org/C108827166","display_name":"Internet privacy","score":0.3415434956550598},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.10696837306022644},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.09318122267723083},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4411337856","title":"Supporting Human Raters with the Detection of Harmful Content Using Large Language Models","url":"https://doi.org/10.1109/sp61157.2025.00082","published":"2025-05-12","authors":["Kurt Thomas","Patrick Gage Kelley","David Tao","Sarah Meiklejohn","Owen Vallis","Shunwen Tan","Blaž 
Bratanič","Felipe Tiengo Ferreira","Vijay Eranti","Elie Bursztein"],"abstract":"In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 user comments, we demonstrate that LLMs can achieve 90 % accuracy when compared to human verdicts. We explore how to best leverage these capabilities, proposing five design patterns that integrate LLMs with human rating, such as pre-filtering non-violative content, detecting potential errors in human rating, or surfacing critical context to support human rating. We outline how to support all of these design patterns using a single, optimized prompt. Beyond these synthetic experiments, we share how piloting our proposed techniques in a real-world review queue yielded a 41.5% improvement in optimizing available human rater...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/sp61157.2025.00082","openalex_id":"https://openalex.org/W4411337856","cited_by_count":6,"quality_score":47,"matched_keywords":["election"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7069799304008484},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.6638155579566956},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5847144722938538},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.44725966453552246},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3884970545768738},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.09724223613739014},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4410294680","title":"ResDecode: Accelerating Large Language Models Inference via Residual Decoding Heads","url":"https://doi.org/10.26599/bdma.2024.9020074","published":"2025-05-12","authors":["Ziqian Zeng","Jiahong Yu","Qianshi Pang","Zihao Wang","Huiping Zhuang","Yu Fan","Hongen Shao","Xiaofeng Zou"],"abstract":"Large language Models (LLMs) have immense potential to enhance the capabilities of Cyber-Physical-Social Intelligence (CPSI) systems, enabling them to better engage with complex cyber, physical, and social environments. However, the high inference latency of LLMs, which is inherited from the autoregressive decoding process, hinders their wide application in CPSI systems. To address this challenge, current approaches have incorporated speculative decoding to enable parallel prediction of multiple subsequent tokens, thereby achieving inference acceleration. Nevertheless, the accuracy of these decoding heads falls short of the autoregressive decoding approach. In light of these limitations, we propose ResDecode, a novel speculative decoding method characterized by its efficient and accurate decoding heads. 
Within the lightweight draft model, we propose a residual decoding head to compensate...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.26599/bdma.2024.9020074","openalex_id":"https://openalex.org/W4410294680","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["Hong Kong University of Science and Technology","Huawei Technologies (China)","South China University of Technology","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C155512373","display_name":"Residual","score":0.7401943206787109},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.7046432495117188},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.702434778213501},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.571358859539032},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.412275493144989},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.358278751373291},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3465504050254822},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.23862957954406738}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-pre-trained-autoregressive-diffusion-transformer","title":"Generative Pre-trained Autoregressive Diffusion Transformer","url":"https://www.microsoft.com/en-us/research/publication/generative-pre-trained-autoregressive-diffusion-transformer/","published":"2025-05-11","authors":["Yuan Zhang","Jiacheng Jiang","Guoqing 
Ma","Zhiying Lu","Bo Wang","Haoyang Huang","Jian-min Yuan","Nan Duan"],"abstract":"In this work, we present GPDiT, a Generative Pre-trained Autoregressive Diffusion Transformer that unifies the strengths of diffusion and autoregressive modeling for long-range video synthesis, within a continuous latent space. Instead of predicting discrete tokens, GPDiT autoregressively predicts future latent frames using a diffusion loss, enabling natural modeling of motion dynamics and semantic consistency across frames. This continuous autoregressive framework not only enhances generation quality but also endows the model with representation capabilities. Additionally, we introduce a lightweight causal attention variant and a parameter-free rotation-based time-conditioning mechanism, improving both the training and inference efficiency. Extensive experiments demonstrate that GPDiT achieves strong performance in video generation quality, video representation ability, and few-shot lea...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:262","title":"Understanding Stragglers in Large Model Training Using What-if Analysis","url":"https://seed.bytedance.com/en/research/understanding-stragglers-in-large-model-training-using-what-if-analysis","published":"2025-05-09","authors":["Jinkun Lin","Ziheng Jiang","Zuquan Song","Sida Zhao","Menghan Yu","Zhanghan Wang","Chenyuan Wang","Zuocheng 
Shi","Xiang Shi","Wei Jia","Zherui Liu","Shuguang Wang"],"abstract":"Large language model (LLM) training is one of the most demanding distributed computations today, often requiring thousands of GPUs with frequent synchronization across machines. Such a workload pattern makes it susceptible to stragglers, where the training can be stalled by few slow workers. At ByteDance we find stragglers are not trivially always caused by hardware failures, but can arise from multiple complex factors. This work aims to present a comprehensive study on the straggler issues in LLM training, using a five-month trace collected from our ByteDance LLM training cluster. The core methodology is what-if analysis that simulates the scenario without any stragglers and contrasts with the actual case. We use this method to study the following questions: (1) how often do stragglers affect training jobs, and what effect do they have on job performance; (2) do stragglers exhibit tempo...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Cluster Computing","Infrastructures","OSDI 2025","LLM","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cost-effective-low-latency-vector-search-with-azure-cosmos-db","title":"Cost-Effective, Low Latency Vector Search with Azure Cosmos DB","url":"https://www.microsoft.com/en-us/research/publication/cost-effective-low-latency-vector-search-with-azure-cosmos-db/","published":"2025-05-09","authors":["Nitish Upreti","Krishnan Sundaram","Hari Sudan 
Sundar","Samer Boshra","Balachandar Perumalswamy","S. Atri","Martin Chisholm","Revti Raman Singh","Greg Yang","Subramanyam Pattipaka","Tamara Hass","Nitesh Dudhey"],"abstract":"Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems. This has motivated a new class of specialized vector databases that optimize for vector search quality and cost. Instead, we argue that a scalable, high-performance, and cost-efficient vector search system can be built inside a cloud-native operational database like Azure Cosmos DB while leveraging the benefits of a distributed database such as high availability, durability, and scale. We do this by deeply integrating DiskANN, a state-of-the-art vector indexing library, inside Azure Cosmos DB NoSQL. This system uses a single vector index per partition stored in existing index trees, and kept in sync with underlying data. 
It supports < 20ms query latency over an index spanning....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Algorithms","Computer science","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:h5gbvcp98zi6unv3ybc9mu2k","title":"Matrix3D: Large Photogrammetry Model All-in-One","url":"https://machinelearning.apple.com/research/large-photogrammetry-model","published":"2025-05-09","authors":["Yuanxun Lu","Jingyang Zhang","Tian Fang","Jean–Daniel Nahmias","Yanghai Tsin","Long Quan","Xun Cao","Yao Yao","Shiwei Li"],"abstract":"We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis using just the same model. Matrix3D utilizes a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. 
The key to Matrix3D’s large-scale multi-modal training lies in the incorporation of a mask learning...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4411584410","title":"DST-GFN: A Dual-Stage Transformer Network with Gated Fusion for Pairwise User Preference Prediction in Dialogue Systems","url":"https://doi.org/10.1109/aemcse65292.2025.11042684","published":"2025-05-09","authors":["Kowei Shih","Zhenghao Deng","Xiang Chen","Yuanzhe Zhang","Li Zhang"],"abstract":"User preference prediction is important for customizing responses in large language models (LLMs) for dialogue systems. This paper presents DST-GFN (Dual-Stage TransformerGated Fusion Network), a model made to predict preferences from two responses generated by LLMs. DST-GFN uses a DualStage Transformer Encoder, a Gated Fusion Block (GFB), and a Hierarchical Contextual Fusion (HCF) layer to find connections and differences between responses. The model processes prompt-response pairs and the relationship between the two responses with two encoders. A gating method then combines the outputs. The final prediction is made using a Softmax layer. It applies a weighted cross-entropy loss and $L 2$ regularization to reduce class imbalance and overfitting. Tests show that DST-GFN performs better than models like BERT, LSTM, and GRU. 
Its key parts, the Dual-Stage Encoder and Gated Fusion Block, ar...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/aemcse65292.2025.11042684","openalex_id":"https://openalex.org/W4411584410","cited_by_count":5,"quality_score":46,"matched_keywords":["preference"],"author_affiliations":["Amazon (United States)","Boston University","Harvard University","Massachusetts General Hospital","University of California, Irvine"],"concepts":[{"id":"https://openalex.org/C184898388","display_name":"Pairwise comparison","score":0.8202286958694458},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.6254445910453796},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.5485256314277649},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.547662079334259},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5418340563774109},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5157304406166077},{"id":"https://openalex.org/C146357865","display_name":"Stage (stratigraphy)","score":0.44915786385536194},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.444367915391922}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4410242491","title":"Autoregressive Temporal Modeling for Advanced Tracking-by-Diffusion","url":"https://doi.org/10.1007/s11263-025-02439-x","published":"2025-05-09","authors":["Pha Nguyen","Rishi Madhok","Bhiksha Raj","Khoa Luu"],"abstract":"Object tracking is a widely studied computer vision task with video and instance analysis applications. 
While paradigms such as tracking-by-regression,-detection,-attention have advanced the field, generative modeling offers new potential. Although some studies explore the generative process in instance-based understanding tasks, they rely on prediction refinement in the coordinate space rather than the visual domain. Instead, this paper presents Tracking-by-Diffusion, a novel paradigm for object tracking in video, leveraging visual generative models via the perspective of autoregressive models. This paradigm demonstrates broad applicability across point, box, and mask modalities while uniquely enabling textual guidance. We present DIFTracker, a framework that utilizes iterative latent variable diffusion models to redefine tracking as a next-frame reconstruction task. Our approach unique...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-025-02439-x","openalex_id":"https://openalex.org/W4410242491","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Microsoft (United States)","University of Arkansas at Fayetteville"],"concepts":[{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.7676832675933838},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5909332036972046},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5389874577522278},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5144729614257812},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.491792231798172},{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.47018516063690186},{"id":"https://openalex.org/C194657046","display_name":"STAR 
model","score":0.4102151095867157},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.3159031271934509}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llms-get-lost-in-multi-turn-conversation","title":"LLMs Get Lost In Multi-Turn Conversation","url":"https://www.microsoft.com/en-us/research/publication/llms-get-lost-in-multi-turn-conversation/","published":"2025-05-08","authors":["Philippe Laban","Hiroaki Hayashi","Yingbo Zhou","Jennifer Neville"],"abstract":"Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange. Although analysis of LLM conversation logs has confirmed that underspecification occurs frequently in user instructions, LLM evaluation has predominantly focused on the single-turn, fully-specified instruction setting. In this work, we perform large-scale simulation experiments to compare LLM performance in single- and multi-turn settings. Our experiments confirm that all the top open- and closed-weight LLMs we test exhibit significantly lower performance in multi-turn conversations than single-turn, with an average drop of 39% across six generation tasks. 
Analysis of 200,000+ simulated conversations decomposes th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4410637047","title":"Tutorial on Landing Generative AI in Industrial Social and E-commerce Recsys","url":"https://doi.org/10.1145/3701716.3715871","published":"2025-05-08","authors":["Da Xu","Danqing Zhang","Chuanwei Ruan","Lingling Zheng","Bo Yang","Guangyu Yang","Shuyuan Xu","Haixun Wang"],"abstract":"Over the past two years, generative AI (GenAI) has evolved rapidly, influencing interdisciplinary fields including social and e-commerce Recsys.Despite several exciting research advances, landing GenAI innovations in real-world Recsys remains challenging due to the sophistication of modern industrial product and systems.Our tutorial begins with a brief overview of industrial Recsys and GenAI fundamentals (including LLMOps), followed by the ongoing efforts and opportunities to enhance existing Recsys data and model with foundation models.We then explore how GenAI's curation and reasoning capabilities can be integrated into Recsys-for example, by repurposing raw content, incorporating external knowledge for display and creative optimization, and generating personalized insights/explanations to foster transparency and trust.Following this, the tutorial highlights how AI agents can reshape 
R...","companies":["Microsoft","Amazon"],"matched_orgs":["Microsoft","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715871","openalex_id":"https://openalex.org/W4410637047","cited_by_count":5,"quality_score":58,"matched_keywords":["personalized"],"author_affiliations":["Amazon (United States)","LinkedIn (United States)","Menlo School","Microsoft (United States)","Teikoku Pharma (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5973069667816162},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.536454975605011},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4674241244792938},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3739736080169678},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3698245882987976}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410636941","title":"A Responsible and Extendable Context-Aware Recommender System","url":"https://doi.org/10.1145/3701716.3715164","published":"2025-05-08","authors":["Xiangmin Zhou","Lei Chen","Chengkun He","Junfeng Wu","Weiyi Zhou","Jie Shao","Yanchun Zhang"],"abstract":"Context-aware social media recommendation has been important in many applications such as e-commerce and entertainment. However, existing systems consider pre-specified contexts and cannot well handle user preferences, which negatively affects the recommendation quality and efficiency, and causes them not extendable to various applications. In this demo, we design RECARS, the first responsible and extendable context-aware recommender system. 
RECARS is designed with novel techniques, including efficient data organization over MongoDB and Apache Flink, and effective responsible recommendation generation that supports the system interactions with users. It allows users to perform iteratively refining and explaining the results by active learning and large language model (LLM). We demonstrate the usage of RECARS via YouTube.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715164","openalex_id":"https://openalex.org/W4410636941","cited_by_count":1,"quality_score":54,"matched_keywords":["LLM","language model","media","efficient"],"author_affiliations":["Hong Kong University of Science and Technology","Microsoft (United States)","RMIT University","Seattle University","Victoria University"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.9014343023300171},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7990155220031738},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6359628438949585},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4441670775413513},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.42180323600769043},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3563989996910095},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411549842","title":"Large Language Model for E-Commerce 
Workshop","url":"https://doi.org/10.1145/3701716.3717864","published":"2025-05-08","authors":["Haoyu Han","Xianfeng Tang","Jing‐Zhi Huang","Monica Cheng","Omar Alonso","Hanqing Lu","Zhen Li","Chen Luo","Dawei Yin","Jiliang Tang"],"abstract":"Large Language Models (LLMs) are transforming E-Commerce by powering numerous applications such as product recommendation, search, classification, question answering, and advertising. Their growing adoption in real-world systems highlights their potential, yet challenges remain in ensuring accuracy, efficiency, fairness, and privacy. This workshop aims to bring together researchers and industry practitioners to explore the opportunities and limitations of LLMs in e-commerce. Through discussions on model design, algorithmic advancements, and practical deployment strategies, the workshop seeks to foster collaboration, bridge the gap between academia and industry, and drive innovation in applying LLMs to E-Commerce.","companies":["Amazon","Baidu"],"matched_orgs":["Amazon","Baidu"],"company_groups":["company_us","company_china"],"company_regions":["US","China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3717864","openalex_id":"https://openalex.org/W4411549842","cited_by_count":0,"quality_score":53,"matched_keywords":["language model"],"author_affiliations":["Amazon (United States)","Baidu (China)","Michigan State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7429687976837158},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4751845896244049},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37483540177345276},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.32790428400039673},{"id":"https://openalex.org/C115903868","display_name":"Software 
engineering","score":0.32710927724838257}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410637127","title":"RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning","url":"https://doi.org/10.1145/3701716.3715508","published":"2025-05-08","authors":["Jian Xu","Sichun Luo","Xiangyu Chen","Haoming Huang","Hanxu Hou","Linqi Song"],"abstract":"Large Language Models (LLMs) have been integrated into recommendation systems to enhance user behavior comprehension. The Retrieval Augmented Generation (RAG) technique is further incorporated into these systems to retrieve more relevant items and improve system performance. However, existing RAG methods rely primarily on textual semantics and often fail to incorporate the most relevant items, limiting the effectiveness of the systems.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715508","openalex_id":"https://openalex.org/W4410637127","cited_by_count":5,"quality_score":50,"matched_keywords":["language model","retrieval"],"author_affiliations":["Alibaba Group (China)","City College of Dongguan University of Technology","City University of Hong Kong","City University of Hong Kong, Shenzhen Research Institute","Dongguan University of Technology","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8401556015014648},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5828117728233337},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5725656747817993},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.5689454078674316},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5246332287788391},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.46492624282836914},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.4116674065589905},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.32112643122673035}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"arxiv:2505.07105","title":"Knowledge Distillation for Enhancing Walmart E-commerce Search Relevance Using Large Language Models","url":"http://arxiv.org/abs/2505.07105","published":"2025-05-08","authors":["Hongwei Shang","Nguyen Vo","N. Sudhakar Yadav","Tian Zhang","Ajit Puthenputhussery","Xunfan Cai","S. Chen","Prijith Chandran","Changsung Kang"],"abstract":"Ensuring the products displayed in e-commerce search results are relevant to users' queries is crucial for improving the user experience. With their advanced semantic understanding, deep learning models have been widely used for relevance matching in search tasks. While large language models (LLMs) offer superior ranking capabilities, it is challenging to deploy LLMs in real-time systems due to the high-latency requirements. To leverage the ranking power of LLMs while meeting the low-latency demands of production systems, we propose a novel framework that distills a high-performing LLM into a more efficient, low-latency student model. To help the student model learn more effectively from the teacher model, we first train the teacher LLM as a classification model with soft targets. 
Then, we train the student model to capture the relevance margin between pairs of products for a given query...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3701716.3715242","openalex_id":"https://openalex.org/W4410636959","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","efficient","distillation"],"author_affiliations":["Amazon (United States)","Walmart (United States)"],"concepts":[{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.8288050293922424},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7289395928382874},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.6325825452804565},{"id":"https://openalex.org/C78597825","display_name":"E-commerce","score":0.5247021913528442},{"id":"https://openalex.org/C2779532271","display_name":"Relevance feedback","score":0.4374256730079651},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4235420823097229},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.40413832664489746},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3517269492149353}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2502.02988","title":"Training an LLM-as-a-Judge Model: Pipeline, Insights, and Practical Lessons","url":"http://arxiv.org/abs/2502.02988","published":"2025-05-08","authors":["Renjun Hu","Cheng Yi","Libin Meng","Jiaxin Xia","Yi Zong","Xing Shi","Wei Lin"],"abstract":"The rapid advancement of large language models (LLMs) has opened new possibilities for their adoption as evaluative judges. 
This paper introduces Themis, a fine-tuned LLM judge that delivers sophisticated context-aware evaluations. We provide a comprehensive overview of the development pipeline for Themis, highlighting its scenario-dependent evaluation prompts and two novel methods for controlled instruction generation. These designs enable Themis to effectively distill evaluative skills from teacher models, while retaining flexibility for continuous development. We introduce two human-labeled benchmarks for meta-evaluation, demonstrating that Themis can achieve high alignment with human preferences in an economical manner. Additionally, we explore insights into the LLM-as-a-judge paradigm, revealing nuances in performance and the varied effects of reference answers. Notably, we observe....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3701716.3715265","openalex_id":"https://openalex.org/W4407212961","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","distillation"],"author_affiliations":["Alibaba Group (China)","East China Normal University","Fudan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7719640731811523},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.7397087812423706},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.711883544921875},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6566354632377625},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.5463956594467163},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.5116344094276428},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.38279759883880615},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.3416292667388916}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4410637834","title":"Generative Large Recommendation Models: Emerging Trends in LLMs for Recommendation","url":"https://doi.org/10.1145/3701716.3715865","published":"2025-05-08","authors":["Hao Wang","Wei Guo","Liuna Zhang","Jin Yao Chin","Yufei Ye","Huifeng Guo","Yong Liu","Defu Lian","Ruiming Tang","Enhong Chen"],"abstract":"In the era of information overload, recommendation systems play a pivotal role in filtering data and delivering personalized content. Recent advancements in feature interaction and user behavior modeling have significantly enhanced the recall and ranking processes of these systems. With the rise of large language models (LLMs), new opportunities have emerged to further improve recommendation systems. This tutorial explores two primary approaches for integrating LLMs: LLMs-enhanced recommendations, which leverage the reasoning capabilities of general LLMs, and generative large recommendation models, which focus on scaling and sophistication. While the former has been extensively covered in existing literature, the latter remains underexplored. 
This tutorial aims to fill this gap by providing a comprehensive overview of generative large recommendation models, including their recent advance...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715865","openalex_id":"https://openalex.org/W4410637834","cited_by_count":6,"quality_score":47,"matched_keywords":["personalized"],"author_affiliations":["Huawei Technologies (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6128605604171753},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5790238380432129},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5035609602928162},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4300926625728607},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3374013900756836}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4410636854","title":"Few-shot LLM Synthetic Data with Distribution Matching","url":"https://doi.org/10.1145/3701716.3715245","published":"2025-05-08","authors":["Jiyuan Ren","Zhaocheng Du","Zhihao Wen","Qinglin Jia","Sunhao Dai","Chuhan Wu","Zhenhua Dong"],"abstract":"As large language models (LLMs) advance, their ability to perform in-context learning and few-shot language generation has improved significantly. This has spurred using LLMs to produce high-quality synthetic data to enhance the performance of smaller models like online retrievers or weak LLMs. 
However, LLM-generated synthetic data often differs from the real data in key language attributes (e.g., styles, tones, content proportions, etc.). As a result, mixing these synthetic data directly with real data may distort the original data distribution, potentially hindering performance improvements. To solve this, we introduce SynAlign: a synthetic data generation and filtering framework based on key attribute distribution matching. Before generation, SynAlign employs an uncertainty tracker surrogated by the Gaussian Process model to iteratively select data clusters distinct from selected ones...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715245","openalex_id":"https://openalex.org/W4410636854","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","efficient"],"author_affiliations":["Huawei Technologies (China)","Renmin University of China","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.7226340770721436},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6375576257705688},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5991864800453186},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.5397303104400635},{"id":"https://openalex.org/C2992734406","display_name":"One shot","score":0.4882180988788605},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3327788710594177},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.14884477853775024},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09094884991645813}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4410636968","title":"GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task","url":"https://doi.org/10.1145/3701716.3717661","published":"2025-05-08","authors":["Ning Ding","Yehui Tang","Zhongqian Lv","Chao Xu","Kai Han","Yunhe Wang"],"abstract":"The upsurge in pre-trained large models started by ChatGPT has swept across the entire deep learning community. Such powerful models demonstrate advanced generative ability and multimodal understanding capability, which quickly set new state of the arts on a variety of benchmarks. The pre-trained LLM usually plays the role as a universal AI model that can conduct various tasks like article analysis and image comprehension. However, due to the prohibitively high memory and computational cost of implementing such a large model, the conventional models (such as CNN and ViT) are still essential for many visual perception tasks. In this paper, we propose to enhance the representation ability of ordinary vision models on perception tasks (e.g., image classification) by taking advantage of the off-the-shelf large pre-trained models. We present a new learning framework, dubbed GPT4Image, where the know...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3717661","openalex_id":"https://openalex.org/W4410636968","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","memory"],"author_affiliations":["Huawei 
Technologies (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7709550261497498},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7405248880386353},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.6982365846633911},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5354939103126526},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.40338897705078125},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.39619773626327515},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.33134663105010986},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.16695624589920044}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410636510","title":"Beyond Retrieval: Generating Narratives in Conversational Recommender Systems","url":"https://doi.org/10.1145/3701716.3717531","published":"2025-05-08","authors":["Krishna Sayana","Raghavendra Vasudeva","Yuri Vasilevski","Su Kun","Liam Hebert","J. Pine","Hubert Pham","Ambarish Jash","Sukhdeep Sodhi"],"abstract":"Large Language Models (LLMs) have shown remarkable progress in generating human-quality text and engaging in complex reasoning. This presents a unique opportunity to revolutionize conversational recommender systems by enabling them to generate rich, engaging and personalized narratives that go beyond recommendations. However, the lack of suitable datasets limits research in this area. 
This paper addresses this challenge by making two key contributions.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3717531","openalex_id":"https://openalex.org/W4410636510","cited_by_count":1,"quality_score":46,"matched_keywords":["personalized","retrieval"],"author_affiliations":["Google (United States)","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8448330760002136},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7992719411849976},{"id":"https://openalex.org/C199033989","display_name":"Narrative","score":0.737409234046936},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6382255554199219},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4186781048774719},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4068465828895569},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3448147773742676},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.324840247631073}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410636370","title":"Towards Explainable Search Results in E-commerce","url":"https://doi.org/10.1145/3701716.3715264","published":"2025-05-08","authors":["Xianyang Tian","Xiang Xu","Chao Wang","Tong Ruan","Baohua Wu","Maofei Que","Shenghua Ni","Zhuoran Zhuang","Jingping Liu"],"abstract":"Search result explanations are essential in E-commerce, helping users understand the relevance of the returned results. 
Existing methods primarily focus on explaining relevance based on either product content or behavioral data. However, we argue that combining both content and behavior data can provide more comprehensive and accurate explanations. In this paper, we propose a novel approach to generate relevance explanations. First, we utilize the content data to train a domain-specific large language model (LLM) that generates relevance labels and reasoning processes for queries and items. Then, we introduce the BehaviorRAG framework to retrieve behavioral data related to queries and items, allowing the model to generate explainable reasons for their relevance. Finally, the LLM integrates outputs from both the content- and behavior-based modules to produce a final explanation. To evalua...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715264","openalex_id":"https://openalex.org/W4410636370","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","East China University of Science and Technology","Shanghai University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6824923157691956},{"id":"https://openalex.org/C78597825","display_name":"E-commerce","score":0.4397630989551544},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.37620025873184204},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.3654058277606964}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410638153","title":"EKD4Rec: Ensemble Knowledge Distillation from LLM-based Models to Traditional Sequential 
Recommenders","url":"https://doi.org/10.1145/3701716.3715527","published":"2025-05-08","authors":["Yue Wang","Dingyi Zhang","Haoyu Wenren","Yue Wang","Yingming Li"],"abstract":"In this paper, we propose a new ensemble knowledge distillation method for distilling knowledge from LLM-based recommendation (teacher) models to traditional light-weight sequential recommendation (student) models. In particular, instead of using one single teacher model, the averaged prediction from multiple teachers is employed as the soft targets of knowledge distillation. Further, only the top K soft labels of teachers' output distribution are sampled for distillation to make it more focused on the corresponding high-ranked items. Extensive experiments on three public datasets show the effectiveness of the proposed ensemble knowledge distillation for sequential recommendation (EKD4Rec).","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715527","openalex_id":"https://openalex.org/W4410638153","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","distillation"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6892104148864746},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.616100013256073},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4821738600730896},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4275363087654114},{"id":"https://openalex.org/C43617362","display_name":"Chromatography","score":0.08611199259757996},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.06889057159423828}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411549478","title":"RLG-RAG: Guiding the Knowledge Retrieval and Evaluation in Retrieval-Augmented Generation Framework by Reasoning Logic","url":"https://doi.org/10.1145/3701716.3715554","published":"2025-05-08","authors":["Kehan Xu","Kun Zhang","Wei Huang","Jingyuan Li","Yuanzhuo Wang"],"abstract":"The knowledge retrieval, integration, and evaluation processes in the RAG method lack the guidance of reasoning logic, leading to ongoing challenges in maintaining factual consistency. To address these issues, this paper proposes the RLG-RAG framework, which constructs a reasoning graph based on user queries to guide the knowledge retrieval, integration, and evaluation processes. By fully representing the reasoning logic of RAG, RLG-RAG dynamically models and integrates knowledge relationships during retrieval and defines a precise scope of relevant knowledge through sufficiency evaluation. This reduces inference-irrelevant knowledge that large language models may obtain. Experimental analyses on accuracy, factual consistency, and robustness demonstrate that RLG-RAG resists interference and provides accurate, factually consistent answers. 
The project URL is https://doi.org/10.5281/zenodo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715554","openalex_id":"https://openalex.org/W4411549478","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Beijing Technology and Business University","Chinese Academy of Sciences","Institute of Computing Technology","Tencent (China)","Yanshan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7327203750610352},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4666431248188019},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4521210789680481},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4514819085597992}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410636401","title":"FLARE: Fusing Language Models and Collaborative Architectures for Recommender Enhancement","url":"https://doi.org/10.1145/3701716.3717554","published":"2025-05-08","authors":["Liam Hebert","Marialena Kyriakidi","Hubert Pham","Krishna Sayana","J. Pine","Sukhdeep Sodhi","Ambarish Jash"],"abstract":"Recent proposals in recommender systems represent items with their textual description, using a large language model. They show better results on standard benchmarks compared to an item ID-only model, such as Bert4Rec. 
In this work, we revisit the often-used Bert4Rec baseline and show that with further tuning, Bert4Rec significantly outperforms previously reported numbers, and in some datasets, is competitive with state-of-the-art models.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3717554","openalex_id":"https://openalex.org/W4410636401","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Google (United States)","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7628373503684998},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.6911147236824036},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35101163387298584},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3394334316253662},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.2671715021133423}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410636419","title":"Pre-train and Fine-tune: Recommenders as Large Models","url":"https://doi.org/10.1145/3701716.3715255","published":"2025-05-08","authors":["Zhenhao Jiang","Chenghao Chen","Hao Feng","Yu Yang","Jin Liu","Jie Zhang","Jia Jia","Ning Hu"],"abstract":"In reality, users have different interests in different periods, regions, scenes, etc. Such changes in interest are so drastic that they are difficult to be captured by recommenders. Existing multi-domain learning can alleviate this problem. 
However, the structure of the industrial recommendation system is complex, the amount of data is huge, and the training cost is extremely high, so it is difficult to modify the structure of the industrial recommender and re-train it. To fill this gap, we consider recommenders as large pre-trained models and fine-tune them. We first propose the theory of the information bottleneck for fine-tuning and present an explanation for the fine-tuning technique in recommenders. To tailor for recommendation, we design an information-aware adaptive kernel (IAK) technique to fine-tune the pre-trained recommender. Specifically, we define fine-tuning as two phases:...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715255","openalex_id":"https://openalex.org/W4410636419","cited_by_count":0,"quality_score":41,"matched_keywords":["compression"],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong, Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6739566326141357},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3626363277435303}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410637826","title":"Joint Modeling in Deep Recommender Systems","url":"https://doi.org/10.1145/3701716.3715867","published":"2025-05-08","authors":["Pengyue Jia","Jingtong Gao","Yejing Wang","Yuhao Wang","Xiaopeng Li","Qidong Liu","Yichao Wang","Bo Chen","Huifeng Guo","Ruiming Tang"],"abstract":"In the current digital era, Deep Recommender Systems (DRS) are essential for navigating and tailoring online content to individual preferences. 
However, conventional approaches that rely primarily on a single recommendation task, scenario, data modality, or user behavior are increasingly inadequate for capturing users' complex and evolving preferences. This limitation highlights the need for joint modeling approaches that integrate multiple tasks, scenarios, modalities, and behaviors within the recommendation process, enhancing recommendation precision, efficiency, and personalization. In this tutorial, we aim to give a comprehensive survey on the recent progress of the joint modeling methods in recommendations, which includes multi-task, multi-scenario, multi-modal, and multi-behavior modeling. This work will provide academic researchers and industry professionals with a thorough unders...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715867","openalex_id":"https://openalex.org/W4410637826","cited_by_count":0,"quality_score":41,"matched_keywords":["personalization"],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8766503930091858},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.789539098739624},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.7385530471801758},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3970944881439209},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3003077507019043},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.08653563261032104},{"id":"https://openalex.org/C170154142","display_name":"Architectural 
engineering","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410636851","title":"Hierarchical Prompt Decision Transformer: Improving Few-Shot Policy Generalization with Global and Adaptive Guidance","url":"https://doi.org/10.1145/3701716.3715233","published":"2025-05-08","authors":["Zhe Wang","Haozhu Wang","Yanjun Qi"],"abstract":"Decision transformers recast reinforcement learning as a conditional sequence generation problem, offering a simple but effective alternative to traditional value or policy-based methods. A recent key development in this area is the integration of prompting in decision transformers to facilitate few-shot policy generalization. However, current methods mainly use static prompt segments to guide rollouts, limiting their ability to provide context-specific guidance. Addressing this, we introduce a hierarchical prompting approach enabled by retrieval augmentation. Our method learns two layers of soft tokens as guiding prompts: (1) global tokens encapsulating task-level information about trajectories, and (2) adaptive tokens that deliver focused, timestep-specific instructions. 
The adaptive tokens are dynamically retrieved from a curated set of demonstration segments, ensuring context-aware g...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715233","openalex_id":"https://openalex.org/W4410636851","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)","University of Virginia"],"concepts":[{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6963692307472229},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6598922610282898},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5825244188308716},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.48984381556510925},{"id":"https://openalex.org/C2992734406","display_name":"One shot","score":0.42954856157302856},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42917120456695557},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.16475266218185425},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.12971463799476624}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411549659","title":"Generative Prompting for Complex Product Retrieval","url":"https://doi.org/10.1145/3701716.3715546","published":"2025-05-08","authors":["Nan Xi","Jingjing Meng","Yitian Chen","Chaosheng Dong","Yan Gao","Yi Sun","Junsong 
Yuan"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715546","openalex_id":"https://openalex.org/W4411549659","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)","University at Buffalo, State University of New York"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.687522828578949},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5962414741516113},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.5364108681678772},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.46303993463516235},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44636768102645874},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44006362557411194},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.10056918859481812},{"id":"https://openalex.org/C2524010","display_name":"Geometry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410636378","title":"Enhancing Web Service Anomaly Detection via Fine-grained Multi-modal Association and Frequency Domain Analysis","url":"https://doi.org/10.1145/3701716.3715221","published":"2025-05-08","authors":["Xixuan Yang","Xin Huang","Chiming Duan","Tong Jia","Shandong Dong","Ying Li","Gang Huang"],"abstract":"Anomaly detection is crucial for ensuring the stability and reliability of web service systems. 
Logs and metrics contain multiple information that can reflect the system's operational state and potential anomalies. Thus, existing anomaly detection methods use logs and metrics to detect web service systems' anomalies through data fusion approaches. They associate logs and metrics using coarse-grained time window alignment and capture the normal patterns of system operation through reconstruction. However, these methods have two issues that limit their performance in anomaly detection. First, due to asynchrony between logs and metrics, coarse-grained time window alignment cannot achieve a precise association between the two modalities. Second, reconstruction-based methods suffer from severe overgeneralization problems, resulting in anomalies being accurately reconstructed. In this paper, w...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715221","openalex_id":"https://openalex.org/W4410636378","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Nanyang Normal University","Peking University"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6068559288978577},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5920848846435547},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.581332802772522},{"id":"https://openalex.org/C19118579","display_name":"Frequency domain","score":0.49140775203704834},{"id":"https://openalex.org/C35578498","display_name":"Web service","score":0.4571717381477356},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.22661477327346802},{"id":"https://openalex.org/C124101348","display_name":"Data 
mining","score":0.2065967321395874},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.13837352395057678}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4410636862","title":"DMLoRA: Dynamic Multi-Subspace Low-Rank Adaptation","url":"https://doi.org/10.1145/3701716.3715489","published":"2025-05-08","authors":["Cong Jiang","Fangzhi Zhu","Xiaowei Chen","Junxiong Zhu","Bo Zheng","Yifeng Wang","Zheng Zhang"],"abstract":"As one of the most widely adopted parameter-efficient fine-tuning (PEFT) techniques, LoRA and its variants have garnered significant attention for their ability to avoid additional inference costs. However, the standard LoRA struggles to fully match the expressive capacity of fully fine-tuned models due to inherent approximation errors and the limited flexibility of rank-level component weights. In this work, we propose a novel Dynamic Multi-subspace LoRA (DMLoRA), which partitions high-dimensional input features into multiple subspaces, each optimized with dynamic rank-level weighting. By employing dynamic weights within fine-grained subspaces, DMLoRA effectively reduces the number of fine-tuned parameters while enhancing the flexibility of rank-level representations. 
Moreover, we present a rigorous theoretical analysis to demonstrate that the proposed subspace-induced dynamic LoRA achi...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715489","openalex_id":"https://openalex.org/W4410636862","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C32834561","display_name":"Subspace topology","score":0.7625223398208618},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.6917290091514587},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.5912162661552429},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5635834336280823},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2947436273097992},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2848644256591797},{"id":"https://openalex.org/C114614502","display_name":"Combinatorics","score":0.06612977385520935},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.05627468228340149}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411549758","title":"FuXi-α: Scaling Recommendation Model with Feature Interaction Enhanced Transformer","url":"https://doi.org/10.1145/3701716.3715448","published":"2025-05-08","authors":["Yufei Ye","Wei Guo","Jin Yao Chin","Hao Wang","Hong Zhu","Xi Lin","Yuyang Ye","Yong Liu","Ruiming Tang","Defu Lian","Enhong Chen"],"abstract":"Inspired by scaling laws and large language models, research on large-scale recommendation models has gained significant attention. 
Recent advancements have shown that expanding sequential recommendation models to large-scale recommendation models can be an effective strategy. Current state-of-the-art sequential recommendation models primarily use self-attention mechanisms for explicit feature interactions among items, while implicit interactions are managed through Feed-Forward Networks (FFNs). However, these models often inadequately integrate temporal and positional information, either by adding them to attention weights or by blending them with latent representations, which limits their expressive power. A recent model, HSTU, further reduces the focus on implicit feature interactions, constraining its performance. We propose a new model called FuXi-α to address these issues. This mod...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701716.3715448","openalex_id":"https://openalex.org/W4411549758","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.7249374389648438},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6921626329421997},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6485313177108765},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4581555128097534},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3634079098701477},{"id":"https://openalex.org/C119599485","display_name":"Electrical 
engineering","score":0.17780891060829163},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.16795450448989868},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.12078768014907837}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4410115678","title":"<i>InsightLens</i>: Augmenting LLM-Powered Data Analysis With Interactive Insight Management and Navigation","url":"https://doi.org/10.1109/tvcg.2025.3567131","published":"2025-05-06","authors":["Luoxuan Weng","Xingbo Wang","Junyu Lu","Yingchaojie Feng","Yihan Liu","Haozhe Feng","Danqing Huang","Wei Chen"],"abstract":"The proliferation of large language models (LLMs) has revolutionized the capabilities of natural language interfaces (NLIs) for data analysis. LLMs can perform multi-step and complex reasoning to generate data insights based on users' analytic intents. However, these insights often entangle with an abundance of contexts in analytic conversations such as code, visualizations, and natural language explanations. This hinders efficient recording, organization, and navigation of insights within the current chat-based LLM interfaces. In this paper, we first conduct a formative study with eight data analysts to understand their general workflow and pain points of insight management during LLM-powered data analysis. Accordingly, we introduce InsightLens, an interactive system to overcome such challenges. 
Built upon an LLM-agent-based framework that automates insight recording and organization al...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2025.3567131","openalex_id":"https://openalex.org/W4410115678","cited_by_count":11,"quality_score":60,"matched_keywords":["LLM","efficient","agent"],"author_affiliations":["Cornell University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7726360559463501},{"id":"https://openalex.org/C172367668","display_name":"Data visualization","score":0.5278387069702148},{"id":"https://openalex.org/C1668388","display_name":"Data management","score":0.491986483335495},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4770714044570923},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4396825134754181},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.4049065411090851},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3598495125770569},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3461496829986572}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4410120638","title":"Graph Machine Learning in the Era of Large Language Models (LLMs)","url":"https://doi.org/10.1145/3732786","published":"2025-05-06","authors":["Shijie Wang","Jiani Huang","Zhikai Chen","Yu Song","Wenzhuo Tang","Haitao Mao","Wenqi Fan","Hui Liu","Xiaorui Liu","Dawei Yin","Qing Li"],"abstract":"Graphs play an important role in representing complex relationships in various domains like social networks, 
knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graphs. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications, such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML’s generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3732786","openalex_id":"https://openalex.org/W4410120638","cited_by_count":13,"quality_score":54,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","Hong Kong Polytechnic University","Michigan State University","North Carolina State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8562272787094116},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5533105731010437},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5283775329589844},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4377608299255371},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.42540615797042847},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer 
science","score":0.1927584409713745}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"bytedance-seed:285","title":"MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design","url":"https://seed.bytedance.com/en/research/mxmoe-mixed-precision-quantization-for-moe-with-accuracy-and-performance-co-design","published":"2025-05-05","authors":["Haojie Duanmu","Xiuhong Li","Zhihang Yuan","Size Zheng","Jiangfei Duan","Xingcheng Zhang","Dahua Lin"],"abstract":"Mixture-of-Experts (MoE) models face deployment challenges due to their large parameter counts and computational demands. We explore quantization for MoE models and highlight two key insights: 1) linear blocks exhibit varying quantization sensitivity, and 2) divergent expert activation frequencies create heterogeneous computational characteristics. Based on these observations, we introduce MxMoE, a mixed-precision optimization framework for MoE models that considers both algorithmic and system perspectives. MxMoE navigates the design space defined by parameter sensitivity, expert activation dynamics, and hardware resources to derive efficient mixed-precision configurations. Additionally, MxMoE automatically generates optimized mixed-precision Group-GEMM kernels, enabling parallel execution of GEMMs with different precisions. 
Evaluations show that MxMoE outperforms existing methods, achie...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Machine Learning","Infrastructures","ICML 2025","efficient","quantization"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4412081930","title":"LLM Interpretability: Tracing How LLMs Answer Factual Queries and Math Questions","url":"https://doi.org/10.1109/cai64502.2025.00151","published":"2025-05-05","authors":["Shweta Agrawal","He Nan Tony Li","Lucas Lu"],"abstract":"In this paper, we study how LLMs like GPT store and retrieve facts to answer factual queries. Using a pretrained GPT-2 model, we evaluate factual accuracy on knowledge datasets, math questions, and hand-crafted prompts, employing metrics such as weighted first token accuracy, F1 score for token overlap, perplexity, and BLEU. We leverage interpretability techniques like attention visualization, logit-lens analysis, and causal tracing to identify layers responsible for knowledge retrieval. 
To enhance the model's factual and mathematical capabilities, we implement prompt engineering, retrieval-augmented generation (RAG), and fine-tuning, comparing their impact on accuracy and fact retrieval mechanisms.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cai64502.2025.00151","openalex_id":"https://openalex.org/W4412081930","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Google (United States)","Stanford University"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.8964607119560242},{"id":"https://openalex.org/C138673069","display_name":"Tracing","score":0.6158256530761719},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4919576346874237},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.3827543258666992},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.33165475726127625},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.25549712777137756},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.1731698215007782}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412082463","title":"Generative AI to Enhance Situational Awareness and Collaboration: A Co-Worker for Tech Support in Enterprise Environments","url":"https://doi.org/10.1109/cai64502.2025.00250","published":"2025-05-05","authors":["Yasemin Karaca","Gözde Sarsar","Hüseyin Doğan","Stephen Giff"],"abstract":"Generative AI (GenAl) is revolutionizing enterprise operations by enhancing Situational Awareness (SA) and decision-making. 
This paper integrates insights from the OODA Loop and the Agent Teaming Situation Awareness (ATSA) framework to explore human-AI team (HAT) collaboration in tech support. Key themes include Shared Situational Awareness (SSA), Shared Mental Models (SMMs), Belief-Desire-Intention (BDI), and Explainable AI (XAI). Emphasizing trust and transparency, the paper proposes a hybrid system to address challenges in achieving trust in AI systems. Findings demonstrate how GenAl improves productivity while fostering adaptability and trust in dynamic enterprise environments.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cai64502.2025.00250","openalex_id":"https://openalex.org/W4412082463","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Bournemouth University","Google (United States)"],"concepts":[{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.6689271330833435},{"id":"https://openalex.org/C145804949","display_name":"Situation awareness","score":0.6499490737915039},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5622304677963257},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5579177141189575},{"id":"https://openalex.org/C9114305","display_name":"Situational ethics","score":0.5241342186927795},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.42793500423431396},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.28593164682388306},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.21682488918304443}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-safer-pretraining-analyzing-and-filtering-harmful-content-in-webscale-datasets-for-responsible-llms","title":"Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs","url":"https://www.microsoft.com/en-us/research/publication/towards-safer-pretraining-analyzing-and-filtering-harmful-content-in-webscale-datasets-for-responsible-llms/","published":"2025-05-04","authors":["Sai Krishna Mendu","Harish Yenala","Aditi Gulati","Shanu Kumar","Parag Agrawal"],"abstract":"Large language models (LLMs) have become integral to various real-world applications, leveraging massive, web-sourced datasets like Common Crawl, C4, and FineWeb for pretraining. While these datasets provide linguistic data essential for high-quality natural language generation, they often contain harmful content, such as hate speech, misinformation, and biased narratives. Training LLMs on such unfiltered data risks perpetuating toxic behaviors, spreading misinformation, and amplifying societal biases which can undermine trust in LLM-driven applications and raise ethical concerns about their use. This paper presents a large-scale analysis of inappropriate content across these datasets, offering a comprehensive taxonomy that categorizes harmful webpages into Topical and Toxic based on their intent. 
We also introduce a prompt evaluation dataset, a high-accuracy Topical and Toxic Prompt (TT...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/optimizing-chain-of-thought-reasoners-via-gradient-variance-minimization-in-rejection-sampling-and-rl","title":"Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL","url":"https://www.microsoft.com/en-us/research/publication/optimizing-chain-of-thought-reasoners-via-gradient-variance-minimization-in-rejection-sampling-and-rl/","published":"2025-05-04","authors":["Jiarui Yao","Yifan Hao","Hanning Zhang","Hanze Dong","Wei Xiong","Nan Jiang","Tong Zhang"],"abstract":"Chain-of-thought (CoT) reasoning in large language models (LLMs) can be formalized as a latent variable problem, where the model needs to generate intermediate reasoning steps. While prior approaches such as iterative reward-ranked fine-tuning (RAFT) have relied on such formulations, they typically apply uniform inference budgets across prompts, which fails to account for variability in difficulty and convergence behavior. This work identifies the main bottleneck in CoT training as inefficient stochastic gradient estimation due to static sampling strategies. 
We propose GVM-RAFT, a prompt-specific Dynamic Sample Allocation Strategy designed to minimize stochastic gradient variance under a computational budget constraint. The method dynamically allocates computational resources by monitoring prompt acceptance rates and stochastic gradient norms, ensuring that the resulting gradient varianc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4410068027","title":"Transforming Mental Health Care with Autonomous LLM Agents at the Edge","url":"https://doi.org/10.1145/3715014.3724073","published":"2025-05-04","authors":["Sijie Ji","Xinzhe Zheng","Wei Gao","Mani Srivastava"],"abstract":"The integration of Large Language Models (LLMs) with mobile devices is set to transform mental health care accessibility and quality. This paper introduces MindGuard, an autonomous LLM agent that utilizes mobile sensor data and engages in proactive, personalized conversations while ensuring user privacy through local processing. Unlike traditional mental health AI tools, MindGuard enables real-time, context-aware interventions by dynamically adapting to users' emotional and physiological states. 
The real-world implementation demonstrates its effectiveness with the ultimate goal of creating an accessible, scalable, and personalized mental healthcare ecosystem for anyone with smart mobile devices.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715014.3724073","openalex_id":"https://openalex.org/W4410068027","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","personalized","agent"],"author_affiliations":["Amazon (United States)","California Institute of Technology","National University of Singapore","University of California, Los Angeles"],"concepts":[{"id":"https://openalex.org/C162307627","display_name":"Enhanced Data Rates for GSM Evolution","score":0.5699977874755859},{"id":"https://openalex.org/C2992545881","display_name":"Mental health care","score":0.5009007453918457},{"id":"https://openalex.org/C134362201","display_name":"Mental health","score":0.49956178665161133},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.47659438848495483},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4587784707546234},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3515123724937439},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.32737571001052856},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.1970483362674713}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410067994","title":"mmET: mmWave Radar-Based Eye Tracking on Smart Glasses","url":"https://doi.org/10.1145/3715014.3722050","published":"2025-05-04","authors":["Ruichun Ma","Yasuo Morimoto","John S. 
Ho","Sam Shiu","Jiang Zhu"],"abstract":"With the growing popularity of VR and AR devices, eye tracking has become a critical user interface and input modality for on-device AI agents. However, a compact, power-efficient, and robust eye tracking solution for AR/smart glasses remains an unsolved challenge. In this paper, we present mmET, the first mmWave radar-based eye tracking system on glasses. Our system, implemented as a pair of prototype glasses, utilizes sub-1cm mmWave radars placed near the eyes. The radars transmit FMCW signals and capture the reflections from the eyes and surrounding skin as the system input. To refine gaze estimation accuracy and data efficiency, we propose several novel methods: (1) concatenating multiple chirps and beamforming with learnable weights to improve resolution, (2) a novel neural network architecture to enhance robustness against remounting, (3) pretraining with contrastive loss to enable...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715014.3722050","openalex_id":"https://openalex.org/W4410067994","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Meta (United States)","Yale University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.610818088054657},{"id":"https://openalex.org/C554190296","display_name":"Radar","score":0.6077849864959717},{"id":"https://openalex.org/C32283439","display_name":"Radar tracker","score":0.5299326777458191},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.47910287976264954},{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.4600823223590851},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3374660015106201},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3360201418399811},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.2722512483596802}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410067997","title":"MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT","url":"https://doi.org/10.1145/3715014.3722053","published":"2025-05-04","authors":["Xiaomin Ouyang","Jason Wu","Tomoyoshi Kimura","Yihan Lin","Gunjan Verma","Tarek Abdelzaher","Mani Srivastava"],"abstract":"Multimodal sensing systems are increasingly prevalent in various real-world applications. Most existing multimodal learning approaches heavily rely on training with a large amount of synchronized, complete multimodal data. However, such a setting is impractical in real-world IoT sensing applications where data is typically collected by distributed nodes with heterogeneous data modalities, and is also rarely labeled. In this paper, we propose MMBind, a new data binding approach for multimodal learning on distributed and heterogeneous IoT data. The key idea of MMBind is to construct a pseudo-paired multimodal dataset for model training by binding data from disparate sources and incomplete modalities through a sufficiently descriptive shared modality. 
We also propose a weighted contrastive learning approach to handle domain shifts among disparate data, coupled with an adaptive multimodal le...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715014.3722053","openalex_id":"https://openalex.org/W4410067997","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Amazon (United States)","DEVCOM Army Research Laboratory","Hong Kong University of Science and Technology","United States Army Combat Capabilities Development Command","University of California, Los Angeles","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7834908962249756},{"id":"https://openalex.org/C81860439","display_name":"Internet of Things","score":0.49932408332824707},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41492339968681335},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.38906678557395935},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.38796135783195496},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3494683504104614},{"id":"https://openalex.org/C149635348","display_name":"Embedded system","score":0.19374078512191772}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4410067979","title":"LeakyFeeder: In-Air Gesture Control Through Leaky Acoustic Waves","url":"https://doi.org/10.1145/3715014.3722054","published":"2025-05-04","authors":["Yongjie Yang","Tao Chen","Zhenlin An","Shirui Cao","Xiaoran Fan","Longfei Shangguan"],"abstract":"We present LeakyFeeder, a mobile application that 
explores the acoustic signals leaked from headphones to reconstruct gesture motions around the ear for fine-grained gesture control. To achieve this goal, LeakyFeeder repurposes the speaker and a single feedforward microphone on active noise cancellation (ANC) headphones as a SONAR system, using inaudible frequency-modulated continuous-wave (FMCW) signals to track gesture reflections for accurate sensing. Since this single-receiver SONAR system is unable to differentiate reflection angles and further disentangle signal reflections from different gesture parts, we draw on principles of multi-modal learning to frame gesture motion reconstruction as a multi-modal translation task and propose a deep learning-based approach to fill the information gap between low-dimensional FMCW ranging readings and high-dimensional 3D hand movements. We impl...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715014.3722054","openalex_id":"https://openalex.org/W4410067979","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)","University of Massachusetts Amherst","University of Pittsburgh"],"concepts":[{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.777029812335968},{"id":"https://openalex.org/C24890656","display_name":"Acoustics","score":0.6747025847434998},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5203627943992615},{"id":"https://openalex.org/C204723758","display_name":"Acoustic wave","score":0.4996066093444824},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.41808009147644043},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.2961510717868805},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.17208024859428406}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410068136","title":"Detecting Context Shifts in the Human Experience Using Multimodal Foundation Models","url":"https://doi.org/10.1145/3715014.3724037","published":"2025-05-04","authors":["Iris Nguyen","Liying Han","Burke Dambly","Alireza Kazemi","Marina Kogan","Cory S. Inman","Mani Srivastava","Luis Antonio Ribot García"],"abstract":"Detecting context shifts in human experience is critical for applications in cognitive modeling, human-AI interaction, and adaptive neurotechnology. However, formalizing and identifying these shifts in real-world settings remains challenging due to annotation inconsistencies, data sparsity, and the multimodal nature of human perception.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715014.3724037","openalex_id":"https://openalex.org/W4410068136","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of California, Los Angeles","University of Utah"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6853231191635132},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6364649534225464},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6272914409637451},{"id":"https://openalex.org/C183322885","display_name":"Context model","score":0.44778382778167725},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4348292350769043},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3112751245498657},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.07882213592529297},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.07151859998703003}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/efficient-vocabulary-free-fine-grained-visual-recognition-in-the-age-of-multimodal-llms","title":"Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs","url":"https://www.microsoft.com/en-us/research/publication/efficient-vocabulary-free-fine-grained-visual-recognition-in-the-age-of-multimodal-llms/","published":"2025-05-02","authors":["Hari Chandana Kuchibhotla","Sai Srinivas Kancheti","Abbavaram Gowtham Reddy","Vineeth N Balasubramanian"],"abstract":"Fine-grained Visual Recognition (FGVR) involves distinguishing between visually similar categories, which is inherently challenging due to subtle inter-class differences and the need for large, expert-annotated datasets. In domains like medical imaging, such curated datasets are unavailable due to issues like privacy concerns and high annotation costs. In such scenarios lacking labeled data, an FGVR model cannot rely on a predefined set of training labels, and hence has an unconstrained output space for predictions. We refer to this task as Vocabulary-Free FGVR (VF-FGVR), where a model must predict labels from an unconstrained output space without prior label information. While recent Multimodal Large Language Models (MLLMs) show potential for VF-FGVR, querying these models for each test input is impractical because of high costs and prohibitive inference times. 
To address these limitati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer vision","Human language technologies","Computer science","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4410513106","title":"Collecting Qualitative Data at Scale with Large Language Models: A Case Study","url":"https://doi.org/10.1145/3710947","published":"2025-05-02","authors":["Alejandro Cuevas","Jennifer V. Scurrell","Eva Maxfield Brown","Jason Entenmann","Madeleine I. G. Daepp"],"abstract":"Chatbots have shown promise as tools to scale qualitative data collection. Recent advances in Large Language Models (LLMs) could accelerate this process by allowing researchers to easily deploy sophisticated interviewing chatbots. We test this assumption by conducting a large-scale user study (n=399) evaluating 3 different chatbots, two of which are LLM-based and a baseline which employs hard-coded questions. We evaluate the results with respect to participant engagement and experience, established metrics of chatbot quality grounded in theories of effective communication, and a novel scale evaluating ''richness'' or the extent to which responses capture the complexity and specificity of the social context under study. 
We find that, while the chatbots were able to elicit high-quality responses based on established evaluation metrics, the responses rarely capture participants' specific mo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3710947","openalex_id":"https://openalex.org/W4410513106","cited_by_count":7,"quality_score":52,"matched_keywords":["LLM","personalized"],"author_affiliations":["Carnegie Mellon University","ETH Zurich","Microsoft (United States)","University of Washington"],"concepts":[{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5811710357666016},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5321649312973022},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.45690834522247314},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3646507263183594},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.2103506624698639},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.13257035613059998}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4410539484","title":"Summaries, Highlights, and Action Items: Design, Implementation and Evaluation of an LLM-powered Meeting Recap System","url":"https://doi.org/10.1145/3711074","published":"2025-05-02","authors":["Sumit Asthana","Sagi Hilleli","Pengcheng He","Aaron Halfaker"],"abstract":"Meetings play a critical infrastructural role in coordinating work. 
The recent surge of hybrid and remote meetings in computer-mediated spaces has led to new problems (e.g., more time spent in less engaging meetings) and new opportunities (e.g., automated transcription/captioning and recap support). Advances in dialogue summarization offer the potential for improving post-meeting experiences, but fixed-length summaries often fail to meet diverse needs, such as quick overviews or detailed insights. To address these gaps, we use cognitive science and discourse theories to conceptualize two recap designs: important highlights and a structured, hierarchical minutes view, targeting complementary recap needs. We operationalize these representations into high-fidelity prototypes using dialogue summarization. Finally, we evaluate the representations' effectiveness with seven users in the context...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3711074","openalex_id":"https://openalex.org/W4410539484","cited_by_count":9,"quality_score":50,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (Israel)","Microsoft (United States)","University of Michigan"],"concepts":[{"id":"https://openalex.org/C9354725","display_name":"Operationalization","score":0.7269495129585266},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5979288816452026},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.577494204044342},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.5062021613121033},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.46293869614601135},{"id":"https://openalex.org/C111226992","display_name":"Teamwork","score":0.459847092628479},{"id":"https://openalex.org/C2780791683","display_name":"Action 
(physics)","score":0.45216453075408936},{"id":"https://openalex.org/C2780876879","display_name":"Meaning (existential)","score":0.44435378909111023}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4410032230","title":"A decade of gender bias in machine translation","url":"https://doi.org/10.1016/j.patter.2025.101257","published":"2025-05-02","authors":["Beatrice Savoldi","Jasmijn Bastings","Luisa Bentivogli","Eva Vanmassenhove"],"abstract":"Gender bias in machine translation (MT) has been studied for over a decade, a time marked by societal, linguistic, and technological shifts. With the early optimism for a quick solution in mind, we review over 100 studies on the topic and uncover a more complex reality-one that resists a simple technical fix. While we identify key trends and advancements, persistent gaps remain. We argue that there is no simple technical solution to bias. Building on insights from our review, we examine the growing prominence of large language models and discuss the challenges and opportunities they present in the context of gender bias and translation. 
By doing so, we hope to inspire future work in the field to break with past limitations and to be less focused on a technical fix; more user-centric, multilingual, and multiculturally diverse; more personalized; and better grounded in real-world needs.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1016/j.patter.2025.101257","openalex_id":"https://openalex.org/W4410032230","cited_by_count":6,"quality_score":47,"matched_keywords":["personalized"],"author_affiliations":["Fondazione Bruno Kessler","Google (United States)","Tilburg University"],"concepts":[{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5715330839157104},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.46929270029067993},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.46643733978271484},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42162996530532837},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38345620036125183},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3755798935890198},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.3360576033592224},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.13006898760795593}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/response-wide-shut-surprising-observations-in-basic-vision-language-model-capabilities","title":"Response Wide Shut: Surprising Observations in Basic Vision Language Model 
Capabilities","url":"https://www.microsoft.com/en-us/research/publication/response-wide-shut-surprising-observations-in-basic-vision-language-model-capabilities/","published":"2025-05-01","authors":["Shivam Chandhok","Wan-Cyuan Fan","Vered Shwartz","Vineeth N Balasubramanian","Leonid Sigal"],"abstract":"Vision-Language Models (VLMs) have emerged as general purpose tools for addressing a variety of complex computer vision problems. Such models have been shown to be highly capable, but, at the same time, also lacking some basic visual understanding skills. In this paper, we set out to understand the limitations of SoTA VLMs on fundamental visual tasks: object classification, understanding spatial arrangement, and ability to delineate individual object instances (through counting), by constructing a series of tests that probe which components of design, specifically, maybe lacking. Importantly, we go significantly beyond the current benchmarks, that simply measure final performance of VLM, by also comparing and contrasting it to performance of probes trained directly on features obtained from visual encoder (image embeddings), as well as intermediate vision-language projection used to brid...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Human language technologies","Computer science","human language technologies","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/media-bias-detector-designing-and-implementing-a-tool-for-real-time-selection-and-framing-bias-analysis-in-news-coverage","title":"Media Bias Detector: Designing and Implementing a Tool for Real-Time Selection and Framing Bias Analysis in News Coverage","url":"https://www.microsoft.com/en-us/research/publication/media-bias-detector-designing-and-implementing-a-tool-for-real-time-selection-and-framing-bias-analysis-in-news-coverage/","published":"2025-05-01","authors":["Jennifer Wang","Samar Haider","Amir Tohidi","Anushkaa Gupta","Yuxuan Zhang","Christopher Callison-Burch","David Rothschild","Duncan J Watts"],"abstract":"Mainstream media, through their decisions on what to cover and how to frame the stories they cover, can mislead readers without using outright falsehoods. Therefore, it is crucial to have tools that expose these editorial choices underlying media bias. In this paper, we introduce the Media Bias Detector, a tool for researchers, journalists, and news consumers. By integrating large language models, we provide near real-time granular insights into the topics, tone, political lean, and facts of news articles aggregated to the publisher level. We assessed the tool's impact by interviewing 13 experts from journalism, communications, and political science, revealing key insights into usability and functionality, practical applications, and AI's role in powering media bias tools. We explored this in more depth with a follow-up survey of 150 news consumers. 
This work highlights opportunities for...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","political","journalism","news","media"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/good-things-come-in-small-packages-should-we-adopt-lite-gpus-in-ai-infrastructure","title":"Good things come in small packages: Should we build AI clusters with Lite-GPUs?","url":"https://www.microsoft.com/en-us/research/publication/good-things-come-in-small-packages-should-we-adopt-lite-gpus-in-ai-infrastructure/","published":"2025-05-01","authors":["Burcu Canakci","Junyi Liu","Xingbo Wu","Nathanael Cheriere","Paolo Costa","Sergey Legtchenko","Dushyanth Narayanan","Ant Rowstron"],"abstract":"To match the blooming demand of generative AI workloads, GPU designers have so far been trying to pack more and more compute and memory into single complex and expensive packages. However, there is growing uncertainty about the scalability of individual GPUs and thus AI clusters, as state-of-the-art GPUs are already displaying packaging, yield, and cooling limitations. We propose to rethink the design and scaling of AI clusters through efficiently-connected large clusters of Lite-GPUs, GPUs with single, small dies and a fraction of the capabilities of larger GPUs. 
We think recent advances in co-packaged optics can enable distributing AI workloads onto many Lite-GPUs through high bandwidth and efficient communication. In this paper, we present the key benefits of Lite-GPUs on manufacturing cost, blast radius, yield, and power efficiency; and discuss systems opportunities and challenges ar...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Hardware and devices","Systems and networking","Computer science","1970-01-01","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/it-was-80-me-20-ai-seeking-authenticity-in-co-writing-with-large-language-models","title":"\"It was 80% me, 20% AI\": Seeking Authenticity in Co-Writing with Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/it-was-80-me-20-ai-seeking-authenticity-in-co-writing-with-large-language-models/","published":"2025-05-01","authors":["Angel Hsing-Chi Hwang","Q. Vera Liao","Su Lin Blodgett","Alexandra Olteanu","Adam Trischler"],"abstract":"Given the rising proliferation and diversity of AI writing assistance tools, especially those powered by large language models (LLMs), both writers and readers may have concerns about the impact of these tools on the authenticity of writing work. 
We examine whether and how writers want to preserve their authentic voice when co-writing with AI tools and whether personalization of AI writing support could help achieve this goal. We conducted semi-structured interviews with 19 professional writers, during which they co-wrote with both personalized and non-personalized AI writing-support tools. We supplemented writers’ perspectives with opinions from 30 avid readers about the written work co-produced with AI collected through an online survey. Our findings illuminate conceptions of authenticity in human-AI co-creation, which focus more on the process and experience of constructing creators’....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Human-computer interaction","Social sciences","personalized","personalization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/allhands-ask-me-anything-on-large-scale-verbatim-feedback-via-large-language-models","title":"AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/allhands-ask-me-anything-on-large-scale-verbatim-feedback-via-large-language-models/","published":"2025-05-01","authors":["Chaoyun Zhang","Zicheng Ma","Yuhao Wu","Shilin He","Si Qin","Minghua Ma","Xiaoting Qin","Yu Kang","Yuyi Liang","Xiaoyu Gou","Yajie Xue","Qingwei Lin 林庆维"],"abstract":"Verbatim feedback constitutes a valuable 
repository of user experiences, opinions, and requirements essential for software development. Effectively and efficiently extracting valuable insights from such data poses a challenging task. This paper introduces Allhands , an innovative analytic framework designed for large-scale feedback analysis through a natural language interface, leveraging large language models (LLMs). Allhands adheres to a conventional feedback analytic workflow, initially conducting classification and topic modeling on the feedback to convert them into a structurally augmented format, incorporating LLMs to enhance accuracy, robustness, generalization, and user-friendliness. Subsequently, an LLM agent is employed to interpret users' diverse questions in natural language on feedback, translating them into Python code for execution, and delivering comprehensive multi-modal...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/shifcon-enhancing-non-dominant-language-capabilities-with-a-shift-based-multilingual-contrastive-framework","title":"ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive 
Framework","url":"https://www.microsoft.com/en-us/research/publication/shifcon-enhancing-non-dominant-language-capabilities-with-a-shift-based-multilingual-contrastive-framework/","published":"2025-05-01","authors":["Dongdong Zhang"],"abstract":"Although fine-tuning Large Language Models (LLMs) with multilingual data can rapidly enhance the multilingual capabilities of LLMs, they still exhibit a performance gap between the dominant language (e.g., English) and non-dominant ones due to the imbalance of training data across languages. To further enhance the performance of non-dominant languages, we propose ShifCon, a Shift-based multilingual Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one. Specifically, it shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters. The enriched representations are then shifted back into their original language subspace before generation. 
Moreover, we introduce a subspace distance metric to pinpoint the optimal layer are...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/chain-of-reasoning-towards-unified-mathematical-reasoning-in-large-language-models-via-a-multi-paradigm-perspective","title":"Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective","url":"https://www.microsoft.com/en-us/research/publication/chain-of-reasoning-towards-unified-mathematical-reasoning-in-large-language-models-via-a-multi-paradigm-perspective/","published":"2025-05-01","authors":["Yiyao Yu","Yuxiang Zhang","Dongdong Zhang","Xiao Liang","Hengyuan Zhang","Xingxing Zhang","Mahmoud Khademi","Hany Awadalla","Junjie Wang","Yujiu Yang","Furu Wei"],"abstract":"Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet often rely on single-paradigm reasoning, limiting their effectiveness across diverse tasks. We introduce Chain-of-Reasoning (CoR), a novel unified framework integrating multiple reasoning paradigms — Natural Language Reasoning (NLR), Algorithmic Reasoning (AR), and Symbolic Reasoning (SR) — to enable synergistic collaboration. CoR generates multiple potential answers via different reasoning paradigms and synthesizes them into a coherent final solution. 
We propose a Progressive Paradigm Training (PPT) strategy for models to progressively master these paradigms, leading to CoR-Math-7B. Experimental results demonstrate that CoR-Math-7B significantly outperforms current SOTA models, achieving up to a 41.0% absolute improvement over GPT-4o in theorem proving and a 15.0% improvement over RL-based methods on....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:2fbd9d921e2b0a5d","title":"Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty","url":"https://deepmind.google/research/publications/121578/","published":"2025-05-01","authors":["Google/DeepMind"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind publications page https://deepmind.google/research/publications/"}},{"id":"openalex:W4410012115","title":"Constraining multimodal distribution for domain adaptation in stereo matching","url":"https://doi.org/10.1016/j.patcog.2025.111727","published":"2025-05-01","authors":["Zhelun Shen","Zhuo 
Li","Chenming Wu","Zhibo Rao","Lina Liu","Yuchao Dai","Liangjun Zhang"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.111727","openalex_id":"https://openalex.org/W4410012115","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Baidu (China)","Institute of Computing Technology","Nanchang Hangkong University","Northwestern Polytechnical University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6285033822059631},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6114873290061951},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5907257199287415},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.5573898553848267},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5486737489700317},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.49945545196533203},{"id":"https://openalex.org/C2776434776","display_name":"Domain adaptation","score":0.4856337606906891},{"id":"https://openalex.org/C110121322","display_name":"Distribution (mathematics)","score":0.4340147376060486}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"official:35bf98e383a0e356","title":"Claude Sonnet 4 and Opus 4 System Card","url":"https://www-cdn.anthropic.com/07b2a3f9902ee19fe39a36ca638e5ae987bc64dd.pdf","published":"2025-05","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Sonnet 4 and Opus 
4.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Sonnet 4 and Opus 4"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ashabot-an-llm-powered-chatbot-to-support-the-informational-needs-of-community-health-workers","title":"ASHABot: An LLM-Powered Chatbot to Support the Informational Needs of Community Health Workers","url":"https://www.microsoft.com/en-us/research/publication/ashabot-an-llm-powered-chatbot-to-support-the-informational-needs-of-community-health-workers/","published":"2025-04-30","authors":["Pragnya Ramjee","Mehak Chhokar","Bhuvan Sachdeva","Mahendra Meena","Hamid Abdullah","Aditya Vashistha","Ruchit Nagar","Mohit Jain"],"abstract":"Community health workers (CHWs) provide last-mile healthcare services but face challenges due to limited medical knowledge and training. This paper describes the design, deployment, and evaluation of ASHABot, an LLM-powered, experts-in-the-loop, WhatsApp-based chatbot to address the information needs of CHWs in India. Through interviews with CHWs and their supervisors and log analysis, we examine factors affecting their engagement with ASHABot, and ASHABot's role in addressing CHWs' informational needs. We found that ASHABot provided a private channel for CHWs to ask rudimentary and sensitive questions they hesitated to ask supervisors. CHWs trusted the information they received on ASHABot and treated it as an authoritative resource. 
CHWs' supervisors expanded their knowledge by contributing answers to questions ASHABot failed to answer, but were concerned about demands on their workload...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Medical, health and genomics","Chatbot","Computer science","Human–computer interaction","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/you-only-prefill-once-combining-cached-knowledge-for-large-language-model-serving-with-cacheblend","title":"CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion","url":"https://www.microsoft.com/en-us/research/publication/you-only-prefill-once-combining-cached-knowledge-for-large-language-model-serving-with-cacheblend/","published":"2025-04-30","authors":["Jiayi Yao","Hanchen Li","Yuhan Liu","Siddhant Ray","Yihua Cheng","Qizheng Zhang","Kuntai Du","Shan Lu","Junchen Jiang"],"abstract":"Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of the long LLM inputs, one can pre-compute the KV cache of a text and re-use the KV cache when the context is reused as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, which makes precomputed KV caches not directly usable since they ignore the text’s cross-attention with the preceding texts. 
Thus, the benefits of reusing KV caches remain largely unrealized.This paper tackles just one question: when an LLM input contains multiple text chunks, how to quickly combine their precomputed KV caches in order to achieve the same generation quality as the expensive full prefill (i.e., without reusing KV cache)? We present CacheBlend, a scheme that reuses the pre-computed KV caches, regardless prefix or not,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","Machine learning","1970-01-01","LLM","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dreamgarden-a-designer-assistant-for-growing-games-from-a-single-prompt","title":"DreamGarden: A Designer Assistant for Growing Games from a Single Prompt","url":"https://www.microsoft.com/en-us/research/publication/dreamgarden-a-designer-assistant-for-growing-games-from-a-single-prompt/","published":"2025-04-30","authors":["Sam Earle","Samyak Parajuli","Andrzej Banburski-Fahey"],"abstract":"Coding assistants are increasingly leveraged in game design, both generating code and making high-level plans. To what degree can these tools align with developer workflows, and what new modes of human-computer interaction can emerge from their use? We present DreamGarden, an AI system capable of assisting with the development of diverse game environments in Unreal Engine. 
At the core of our method is an LLM-driven planner, capable of breaking down a single, high-level prompt -- a dream, memory, or imagined scenario provided by a human user -- into a hierarchical action plan, which is then distributed across specialized submodules facilitating concrete implementation. This system is presented to the user as a garden of plans and actions, both growing independently and responding to user intervention via seed prompts, pruning, and feedback. Through a user study, we explore design implicat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rapgen-an-approach-for-fixing-code-inefficiencies-in-zero-shot-2","title":"RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot","url":"https://www.microsoft.com/en-us/research/publication/rapgen-an-approach-for-fixing-code-inefficiencies-in-zero-shot-2/","published":"2025-04-30","authors":["Spandan Garg","Roshanak Zilouchian Moghaddam","Neel Sundaresan"],"abstract":"Performance bugs are non-functional bugs that can even manifest in well-tested commercial products. Fixing these performance bugs is an important yet challenging problem. In this work, we address this challenge and present a new approach called Retrieval-Augmented Prompt Generation (RAPGen). 
Given a code snippet with a performance issue, RAPGen first retrieves a prompt instruction from a pre-constructed knowledge-base of previous performance bug fixes and then generates a prompt using the retrieved instruction. It then uses this prompt on a Large Language Model in zero-shot to generate a fix. We compare our approach with the various prompt variations and state of the art methods in the task of performance bug fixing. Our empirical evaluation shows that RAPGen can generate performance improvement suggestions equivalent or better than a developer in $\\sim 60 {\\%}$ of the cases, getting $\\s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Computer science","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/canvil-designerly-adaptation-for-llm-powered-user-experiences","title":"Canvil: Designerly Adaptation for LLM-Powered User Experiences","url":"https://www.microsoft.com/en-us/research/publication/canvil-designerly-adaptation-for-llm-powered-user-experiences/","published":"2025-04-30","authors":["K. Feng","Q. Vera Liao","Ziang Xiao","Jennifer Wortman Vaughan","Amy Zhang","David W. McDonald"],"abstract":"Advancements in large language models (LLMs) are sparking a proliferation of LLM-powered user experiences (UX). 
In product teams, designers often craft UX to meet user needs, but it is unclear how they engage with LLMs as a novel design material. Through a formative study with 12 designers, we find that designers seek a translational process that enables design requirements to shape and be shaped by LLM behavior, motivating a need for designerly adaptation to facilitate this translation. We then built Canvil, a Figma widget that operationalizes designerly adaptation. We used Canvil as a probe to study designerly adaptation in a group-based design study (6 groups, N=17), finding that designers constructively iterated on both adaptation approaches and interface designs to enhance end-user interaction with LLMs. Furthermore, designers identified promising collaborative workflows for designe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-on-my-shoulder-supporting-emotional-labor-in-front-office-roles-with-an-llm-based-empathetic-coworker","title":"AI on My Shoulder: Supporting Emotional Labor in Front-Office Roles with an LLM-based Empathetic Coworker","url":"https://www.microsoft.com/en-us/research/publication/ai-on-my-shoulder-supporting-emotional-labor-in-front-office-roles-with-an-llm-based-empathetic-coworker/","published":"2025-04-30","authors":["V. D. Swain","Qiuyue","Jash R. 
Parekh","Yechan Jeon","Roy Zimmerman","Mary Czerwinski","Jina Suh","Varun Mishra","Koustuv Saha","Javier Hernandez"],"abstract":"Client-Service Representatives (CSRs) are vital to organizations. Frequent interactions with disgruntled clients, however, disrupt their mental well-being. To help CSRs regulate their emotions while interacting with uncivil clients, we designed Care-Pilot, an LLM-powered assistant, and evaluated its efficacy, perception, and use. Our comparative analyses between 665 human and Care-Pilot-generated support messages highlight Care-Pilot's ability to adapt to and demonstrate empathy in various incivility incidents. Additionally, 143 CSRs assessed Care-Pilot's empathy as more sincere and actionable than human messages. Finally, we interviewed 20 CSRs who interacted with Care-Pilot in a simulation exercise. They reported that Care-Pilot helped them avoid negative thinking, recenter thoughts, and humanize clients; showing potential for bridging gaps in coworker support. 
Yet, they also noted dep...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sonora-human-ai-co-creation-of-3d-audio-worlds-and-its-impact-on-anxiety-and-cognitive-load","title":"Sonora: Human-AI Co-Creation of 3D Audio Worlds and its Impact on Anxiety and Cognitive Load","url":"https://www.microsoft.com/en-us/research/publication/sonora-human-ai-co-creation-of-3d-audio-worlds-and-its-impact-on-anxiety-and-cognitive-load/","published":"2025-04-30","authors":["Fernanda De La Torre","Javier Hernandez","Andrew D. Wilson","Judith Amores"],"abstract":"Soundscapes are widely used for relaxation, but their potential for personalized, navigable experiences remains under-explored. To address this, we developed Sonora, an AI tool that enables real-time generation of synthetic, spatialized soundscapes, allowing users to navigate immersive auditory environments and customize soundscapes using voice commands. Sonora's architecture integrates audio diffusion models and LLMs within Unity. A between-subjects study with 32 participants investigated its effects on anxiety and user experience, compared to a control condition involving passive listening to a soundscape. Participants who interacted with Sonora reported higher entertainment than the control group. 
A positive correlation was found between state anxiety and user requests for Sonora, suggesting anxious users engaged more. Participants with moderate to high trait anxiety experienced signi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/intent-tagging-exploring-micro-prompting-interactions-for-supporting-granular-human-genai-co-creation-workflows","title":"Intent Tagging: Exploring Micro-Prompting Interactions for Supporting Granular Human-GenAI Co-Creation Workflows","url":"https://www.microsoft.com/en-us/research/publication/intent-tagging-exploring-micro-prompting-interactions-for-supporting-granular-human-genai-co-creation-workflows/","published":"2025-04-30","authors":["Frederic Gmeiner","Nicolai Marquardt","Michael Bentley","Hugo Romat","Michel Pahud","David Brown","Asta Roseway","Nikolas Martelaro","Kenneth Holstein","Ken Hinckley","Nathalie Henry Riche"],"abstract":"Despite Generative AI (GenAI) systems' potential for enhancing content creation, users often struggle to effectively integrate GenAI into their creative workflows. 
Core challenges include misalignment of AI-generated content with user intentions (intent elicitation and alignment), user uncertainty around how to best communicate their intents to the AI system (prompt formulation), and insufficient flexibility of AI systems to support diverse creative workflows (workflow flexibility). Motivated by these challenges, we created IntentTagger: a system for slide creation based on the notion of Intent Tags - small, atomic conceptual units that encapsulate user intent - for exploring granular and non-linear micro-prompting interactions for Human-GenAI co-creation workflows. Our user study with 12 participants provides insights into the value of flexibly expressing intent across varying levels of...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/effects-of-llm-based-search-on-decision-making-speed-accuracy-and-overreliance","title":"Effects of LLM-based Search on Decision Making: Speed, Accuracy, and Overreliance","url":"https://www.microsoft.com/en-us/research/publication/effects-of-llm-based-search-on-decision-making-speed-accuracy-and-overreliance/","published":"2025-04-30","authors":["Sofia Eleni Spatharioti","David Rothschild","Daniel G. 
Goldstein","Jake Hofman"],"abstract":"Recent advances in large language models (LLMs) are transforming online applications, including search tools that accommodate complex natural language queries and provide direct responses. There are, however, concerns about the veracity of LLM-generated content due to potential for LLMs to \"hallucinate\". In two online experiments, we examined how LLM-based search affects behavior compared to traditional search and explored ways to reduce overreliance on incorrect LLM-based output. Participants assigned to LLM-based search completed tasks more quickly, with fewer but more complex queries, and reported a more satisfying experience. While decision accuracy was comparable when the LLM was correct, users overrelied on incorrect information when the model erred. In a second experiment, a color-coded highlighting system helped users detect errors, improving decision accuracy without affecting o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/social-by-nature-how-socio-tecture-shapes-the-work-of-smbs-and-considerations-for-reimagining-collaborative-human-ai-systems","title":"Social by Nature: How Socio-tecture Shapes the Work of SMBs and Considerations for Reimagining Collaborative Human-AI 
Systems","url":"https://www.microsoft.com/en-us/research/publication/social-by-nature-how-socio-tecture-shapes-the-work-of-smbs-and-considerations-for-reimagining-collaborative-human-ai-systems/","published":"2025-04-30","authors":["Elizabeth Ankrah","Kagonya Awori","Stephanie Nyairo","Mercy Muchai","Millicent Ochieng","Mark Kariuki","Gillian R Hayes","Jacki O'Neill"],"abstract":"Globally, small and medium-sized businesses (SMBs) have had to adapt to rapid digital changes, a shift accelerated by the COVID-19 pandemic. In Kenya, this transition has involved a significant move towards digital management tools. While many had already experienced marked digitalization over the last few decades, they completed this work differently from their European and North American counterparts. This study explores how Kenyan SMBs continue to navigate these changes and considers the potential of Generative AI in this context. Applying the concept of socio-tecture—which emphasizes social networks, relational business practices, and employees as knowledge producers—we analyze how these elements influence SMB operations in Nairobi. 
We highlight how socio-tecture affects business performance and growth, and discuss how an Afro-centric strengths-based approach might offer unique oppor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/need-help-designing-proactive-ai-assistants-for-programming","title":"Need Help? Designing Proactive AI Assistants for Programming","url":"https://www.microsoft.com/en-us/research/publication/need-help-designing-proactive-ai-assistants-for-programming/","published":"2025-04-30","authors":["Valerie Chen","Alan Zhu","Sebastian Zhao","Hussein Mozannar","David Sontag","Ameet Talwalkar"],"abstract":"While current chat-based AI assistants primarily operate reactively, responding only when prompted by users, there is significant potential for these systems to proactively assist in tasks without explicit invocation, enabling a mixed-initiative interaction. This work explores the design and implementation of proactive AI assistants powered by large language models. We first outline the key design considerations for building effective proactive assistants. As a case study, we propose a proactive chat-based programming assistant that automatically provides suggestions and facilitates their integration into the programmer's code. 
The programming context provides a shared workspace enabling the assistant to offer more relevant suggestions. We conducted a randomized experimental study examining the impact of various design elements of the proactive assistant on programmer productivity and us...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flexcad-unified-and-versatile-controllable-cad-generation-with-fine-tuned-large-language-models","title":"FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/flexcad-unified-and-versatile-controllable-cad-generation-with-fine-tuned-large-language-models/","published":"2025-04-30","authors":["Zhanwei Zhang","Shizhao Sun","Wenxiao Wang","Deng Cai","Jiang Bian (jiabia)"],"abstract":"Recently, there is a growing interest in creating computer-aided design (CAD) models based on user intent, known as controllable CAD generation. Existing work offers limited controllability and needs separate models for different types of control, reducing efficiency and practicality. To achieve controllable generation across all CAD construction hierarchies, such as sketch-extrusion, extrusion, sketch, face, loop and curve, we propose FlexCAD, a unified model by fine-tuning large language models (LLMs). 
First, to enhance comprehension by LLMs, we represent a CAD model as a structured text by abstracting each hierarchy as a sequence of text tokens. Second, to address various controllable generation tasks in a unified model, we introduce a hierarchy-aware masking strategy. Specifically, during training, we mask a hierarchy-aware field in the CAD text with a mask token. This field, compose...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2504.21801","title":"DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition","url":"https://huggingface.co/papers/2504.21801","published":"2025-04-30","authors":["DeepSeek"],"abstract":"We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. 
The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduc...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","deepseek-ai","language model"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/phi-4-reasoning-technical-report","title":"Phi-4-reasoning Technical Report","url":"https://www.microsoft.com/en-us/research/publication/phi-4-reasoning-technical-report/","published":"2025-04-30","authors":["Marah Abdin","Sahaj Agarwal","Sahaj Agarwal","Ahmed Awadallah","Vidhisha Balachandran","Harkirat Behl","Lingjiao Chen (lingjiaochen)","Gustavo de Rosa","Suriya Gunasekar","Mojan Javaheripi","Neel Joshi","Piero Kauffmann"],"abstract":"We introduce Phi-4-reasoning, a 14-billion parameter reasoning model that achieves strong performance on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on carefully curated set of “teachable” prompts–selected for the right level of complexity and diversity–and reasoning demonstrations generated using o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectively leverage inference time compute. We further develop Phi-4-reasoning-plus, a variant enhanced through a short phase of outcome-based reinforcement learning that offers higher performance by generating longer reasoning traces. 
Across a wide range of reasoning tasks, both models outperform significantly larger open-weight models such as DeepSeekR1-Distill-Llama-70B model and approach the performance levels of full DeepSeek R1 model. Our comprehensive evaluations span benchmarks in math and scient...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Article (Journal)","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409963643","title":"ShuffleInfer: Disaggregate LLM Inference for Mixed Downstream Workloads","url":"https://doi.org/10.1145/3732941","published":"2025-04-30","authors":["C.-C. Hu","Heyang Huang","Liangliang Xu","Xusheng Chen","Chenxi Wang","Xu Jiang","Shuang Chen","Hao Feng","Sa Wang","Yungang Bao","Ninghui Sun","Yizhou Shan"],"abstract":"Transformer-based large language model (LLM) inference serving is now the backbone of many cloud services. LLM inference consists of a prefill phase and a decode phase. However, existing LLM deployment practices often overlook the distinct characteristics of these phases, leading to significant interference. To mitigate interference, our insight is to carefully schedule and group inference requests based on their characteristics. We realize this idea in ShuffleInfer through three pillars. First, it partitions prompts into fixed-size chunks so that the accelerator always runs close to its computation-saturated limit. Second, it disaggregates prefill and decode instances so each can run independently. 
Finally, it uses a smart two-level scheduling algorithm augmented with predicted resource usage to avoid decode scheduling hotspots. Results show that ShuffleInfer improves time-to-first-toke...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3732941","openalex_id":"https://openalex.org/W4409963643","cited_by_count":12,"quality_score":57,"matched_keywords":["LLM","language model"],"author_affiliations":["Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Computing Technology","State Key Laboratory of Computer Architecture","Xidian University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7669060230255127},{"id":"https://openalex.org/C2776207758","display_name":"Downstream (manufacturing)","score":0.6719676852226257},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6364374160766602},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.25620144605636597},{"id":"https://openalex.org/C21547014","display_name":"Operations management","score":0.0},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4410781058","title":"Evidence from counterfactual tasks supports emergent analogical reasoning in large language models","url":"https://doi.org/10.1093/pnasnexus/pgaf135","published":"2025-04-30","authors":["Taylor W. Webb","Keith J. Holyoak","Hongjing Lu"],"abstract":"A major debate has recently arisen concerning whether large language models (LLMs) have developed an emergent capacity for analogical reasoning. 
While some recent work has highlighted the strong zero-shot performance of these systems on a range of text-based analogy tasks, often rivaling human performance, other work has challenged these conclusions, citing evidence from so-called \"counterfactual\" tasks-tasks that are modified so as to decrease similarity with materials that may have been present in the language models' training data. Here, we report evidence that language models are also capable of generalizing to these new counterfactual task variants when they are augmented with the ability to write and execute code. The results further corroborate the emergence of a capacity for analogical reasoning in LLMs and argue against claims that this capacity depends on simple mimicry of the....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/pnasnexus/pgaf135","openalex_id":"https://openalex.org/W4410781058","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of California, Los Angeles"],"concepts":[{"id":"https://openalex.org/C108650721","display_name":"Counterfactual thinking","score":0.9112628102302551},{"id":"https://openalex.org/C521332185","display_name":"Analogy","score":0.784031093120575},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6072176694869995},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5071423053741455},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.5051751732826233},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.45920756459236145},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.4514518976211548},{"id":"https://openalex.org/C18762648","display_name":"Work 
(physics)","score":0.4200114905834198}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4410465386","title":"Intelligent AI Agents for Fraud and Abuse Detection: Leveraging Machine Learning, NLP, and Behavioural Analytics for Enhanced Security","url":"https://doi.org/10.14445/23488387/ijcse-v12i4p103","published":"2025-04-30","authors":["Anirban Majumder"],"abstract":"Fraud and abuse in financial transactions, healthcare claims, and digital interactions pose significant challenges to organizations worldwide. The conventional rule-based detection approaches are often limited in adapting to evolving fraudulent tactics. This paper explores the development of intelligent AI agents for fraud and abuse detection, leveraging Machine Learning (ML), Natural Language Processing (NLP), and Behavioural Analytics to enhance security and risk mitigation. To generate models, the AI solution incorporates supervised and unsupervised Machine learning models for finding deviations through anomaly detection, NLP approaches for text-based fraud identification and behavioural analytics to permit recognition of deviations in user activity. 
Leveraging these state-of-the-art approaches helps the system to enable real-time detection and prevention of fraudulent activities, enh...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.14445/23488387/ijcse-v12i4p103","openalex_id":"https://openalex.org/W4410465386","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.7263196706771851},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6780638694763184},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5470935702323914},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42325618863105774},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.41439980268478394},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38244345784187317}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tapas-thermal-and-power-aware-scheduling-for-llm-inference-in-cloud-platforms","title":"TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms","url":"https://www.microsoft.com/en-us/research/publication/tapas-thermal-and-power-aware-scheduling-for-llm-inference-in-cloud-platforms/","published":"2025-04-29","authors":["Jovan Stojkovic","Chaojie Zhang","Íñigo Goiri","Esha Choukse","Haoran Qiu","Rodrigo Fonseca","Josep Torrellas","Ricardo Bianchini"],"abstract":"The rising demand for generative large language models (LLMs) poses challenges for thermal and power management 
in cloud datacenters. Traditional techniques often are inadequate for LLM inference due to the fine-grained, millisecond-scale execution phases, each with distinct performance, thermal, and power profiles. Additionally, LLM inference workloads are sensitive to various configuration parameters (e.g., model parallelism, size, and quantization) that involve trade-offs between performance, temperature, power, and output quality. Moreover, clouds often co-locate SaaS and IaaS workloads, each with different levels of visibility and flexibility. We propose TAPAS, a thermal- and power-aware framework designed for LLM inference clusters in the cloud. TAPAS enhances cooling and power oversubscription capabilities, reducing the total cost of ownership (TCO) while effectively handling emer...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3676641.3716025","openalex_id":"https://openalex.org/W4408894415","cited_by_count":13,"quality_score":89,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM","quantization"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Illinois Urbana-Champaign"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interactive-debugging-and-steering-of-multi-agent-ai-systems","title":"Interactive Debugging and Steering of Multi-Agent AI Systems","url":"https://www.microsoft.com/en-us/research/publication/interactive-debugging-and-steering-of-multi-agent-ai-systems/","published":"2025-04-29","authors":["Will Epperson","Gagan Bansal","Victor 
Dibia","Adam Fourney (adamfo)","Jack Gerrits","Erkang (Eric) Zhu","Saleema Amershi"],"abstract":"Fully autonomous teams of LLM-powered AI agents are emerging that collaborate to perform complex tasks for users. What challenges do developers face when trying to build and debug these AI agent teams? In formative interviews with five AI agent developers, we identify core challenges: difficulty reviewing long agent conversations to localize errors, lack of support in current tools for interactive debugging, and the need for tool support to iterate on agent configuration. Based on these needs, we developed an interactive multi-agent debugging tool, AGDebugger, with a UI for browsing and sending messages, the ability to edit and reset prior agent messages, and an overview visualization for navigating complex message histories. In a two-part user study with 14 participants, we identify common user strategies for steering agents and highlight the importance of interactive message resets for...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/3dgen-ai-assisted-generation-of-provably-correct-binary-format-parsers","title":"3DGen: AI-Assisted Generation of Provably Correct Binary Format 
Parsers","url":"https://www.microsoft.com/en-us/research/publication/3dgen-ai-assisted-generation-of-provably-correct-binary-format-parsers/","published":"2025-04-29","authors":["Sarah Fakhoury","Markus Kuppe","Shuvendu Lahiri","Tahina Ramananandro","Nikhil Swamy"],"abstract":"Improper parsing of attacker-controlled input is a leading source of software security vulnerabilities, especially when programmers transcribe informal format descriptions in RFCs into efficient parsing logic in low-level, memory unsafe languages. Several researchers have proposed formal specification languages for data formats from which efficient code can be extracted. However, distilling informal requirements into formal specifications is challenging and, despite their benefits, new, formal languages are hard for people to learn and use. In this work, we present 3DGen, a framework that makes use of AI agents to transform mixed informal input, including natural language documents (i.e., RFCs) and example inputs into format specifications in a language called 3D. 
To support humans in understanding and trusting the generated specifications, 3DGen uses symbolic methods to also synthesize....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icse55347.2025.00173","openalex_id":"https://openalex.org/W4411551600","cited_by_count":2,"quality_score":82,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","AI agents","Computer science","memory","efficient"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-instruments-embodying-prompts-as-instruments-to-abstract-reflect-graphical-interface-commands-as-general-purpose-tools","title":"AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools","url":"https://www.microsoft.com/en-us/research/publication/ai-instruments-embodying-prompts-as-instruments-to-abstract-reflect-graphical-interface-commands-as-general-purpose-tools/","published":"2025-04-29","authors":["Nathalie Henry Riche","Anna Offenwanger","Frederic Gmeiner","David Brown","Hugo Romat","Michel Pahud","Nicolai Marquardt","Kori Inkpen","Ken Hinckley"],"abstract":"Chat-based prompts respond with verbose linear-sequential texts, making it difficult to explore and refine ambiguous intents, back up and reinterpret, or shift directions in creative AI-assisted design work. 
AI-Instruments instead embody\"prompts\"as interface objects via three key principles: (1) Reification of user-intent as reusable direct-manipulation instruments; (2) Reflection of multiple interpretations of ambiguous user-intents (Reflection-in-intent) as well as the range of AI-model responses (Reflection-in-response) to inform design\"moves\"towards a desired result; and (3) Grounding to instantiate an instrument from an example, result, or extrapolation directly from another instrument. Further, AI-Instruments leverage LLM's to suggest, vary, and refine new instruments, enabling a system that goes beyond hard-coded functionality by generating its own instrumental controls from conte...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/iterative-self-tuning-llms-for-enhanced-jailbreaking-capabilities","title":"Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities","url":"https://www.microsoft.com/en-us/research/publication/iterative-self-tuning-llms-for-enhanced-jailbreaking-capabilities/","published":"2025-04-29","authors":["Chung-En Sun","Xiaodong Liu","Weiwei Yang","Tsui-Wei Weng","Hao Cheng","Aidan San","Michel Galley","Jianfeng Gao"],"abstract":"Recent research has shown that Large Language Models (LLMs) are vulnerable to automated jailbreak attacks, where 
adversarial suffixes crafted by algorithms appended to harmful queries bypass safety alignment and trigger unintended responses. Current methods for generating these suffixes are computationally expensive and have low Attack Success Rates (ASR), especially against well-aligned models like Llama2 and Llama3. To overcome these limitations, we introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. Moreover, it exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3. Beyon...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/productmeta-an-interactive-system-for-metaphorical-product-design-ideation-with-multimodal-large-language-models","title":"ProductMeta: An Interactive System for Metaphorical Product Design Ideation with Multimodal Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/productmeta-an-interactive-system-for-metaphorical-product-design-ideation-with-multimodal-large-language-models/","published":"2025-04-29","authors":["Qinyi Zhou","Jie 
Deng","Yu Liu","Yun Wang","Yan Xia","Yang Ou","Zhicong Lu","Sai Ma","Scarlett Li","Yingqing Xu"],"abstract":"Product metaphors, which involve creating products that convey meaning through metaphorical associations, are a powerful tool in product design. However, according to our formative study, novice designers often struggle to establish coherent links between target and source, to manage the complexity of diverse mapping possibilities and to balance product usability with metaphorical expression. To address these challenges, we introduce ProductMeta, a creativity support tool designed to support novice designers in exploring and developing metaphorical product designs. ProductMeta incorporates domain knowledge and decomposes the design process into iterative modules and framework-based interfaces, fostering both divergent and convergent thinking. Through user studies, we demonstrate that ProductMeta enables novice designers to generate diverse and contextually relevant design ideas by facili...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:329034461703f4bc","title":"Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin 
Scripts","url":"https://deepmind.google/research/publications/114003/","published":"2025-04-29","authors":["Google/DeepMind"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind publications page https://deepmind.google/research/publications/"}},{"id":"official:79a5124b02349915","title":"Qwen3: Think Deeper, Act Faster","url":"https://qwenlm.github.io/blog/qwen3/","published":"2025-04-29","authors":["Alibaba/Qwen"],"abstract":"QWEN CHAT GitHub Hugging Face ModelScope Kaggle DEMO DISCORDIntroduction Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, etc., when compared to other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. 
Additionally, the small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B with 10 times of activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-early-adopters-use-of-ai-driven-multi-agent-systems-to-inform-human-agent-interaction-design-insights-from-industry-practice","title":"Exploring Early Adopters' Use of AI Driven Multi-Agent Systems to Inform Human-Agent Interaction Design: Insights from Industry Practice","url":"https://www.microsoft.com/en-us/research/publication/exploring-early-adopters-use-of-ai-driven-multi-agent-systems-to-inform-human-agent-interaction-design-insights-from-industry-practice/","published":"2025-04-28","authors":["Suchismita Naik","Amanda Snellinger","Austin L. Toombs","Scott Saponas","Amanda K. Hall"],"abstract":"This case study explores the experiences of Microsoft employees, who are early adopters of multi-agent generative AI systems, as they experiment with these technologies to design, test, and deploy new tools attempting to bridge the gap between existing Microsoft products and emerging AI capabilities. Thirteen developers and creators participated in 60-minute semi-structured interviews to elicit their challenges, use cases, and lessons learned from their experimentation with multi-agent AI frameworks. A thematic qualitative analysis process was conducted to analyze the interview data. 
Participants reported building multi-agent AI tools to address tasks in team collaboration, productivity, customer support, creative processes, and security. Strategies for managing complexity, enhancing transparency, and balancing agent autonomy with human oversight were found to be important human-agent in...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3706599.3706693","openalex_id":"https://openalex.org/W4409720082","cited_by_count":2,"quality_score":86,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","agentic AI","Human–computer interaction","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft","Indiana University Bloomington","Microsoft (United States)","Purdue University West Lafayette"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/local-prompt-optimization","title":"Local Prompt Optimization","url":"https://www.microsoft.com/en-us/research/publication/local-prompt-optimization/","published":"2025-04-28","authors":["Yash Jain","Vishal Chowdhary"],"abstract":"In recent years, the use of prompts to guide the output of Large Language Models have increased dramatically. However, even the best of experts struggle to choose the correct words to stitch up a prompt for the desired task. To solve this, LLM driven prompt optimization emerged as an important problem. Existing prompt optimization methods optimize a prompt globally, where in all the prompt tokens have to be optimized over a large vocabulary while solving a complex task. 
The large optimization space (tokens) leads to insufficient guidance for a better prompt. In this work, we introduce Local Prompt Optimization (LPO) that integrates with any general automatic prompt engineering method. We identify the optimization tokens in a prompt and nudge the LLM to focus only on those tokens in its optimization step. We observe remarkable performance improvements on Math Reasoning (GSM8k and MultiAri...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Technology for emerging markets","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/fostering-appropriate-reliance-on-large-language-models-the-role-of-explanations-sources-and-inconsistencies","title":"Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies","url":"https://www.microsoft.com/en-us/research/publication/fostering-appropriate-reliance-on-large-language-models-the-role-of-explanations-sources-and-inconsistencies/","published":"2025-04-28","authors":["Sunnie S. Y. Kim","Jennifer Wortman Vaughan","Q. Vera Liao","Tania Lombrozo","Olga Russakovsky"],"abstract":"Large language models (LLMs) can produce erroneous responses that sound fluent and convincing, raising the risk that users will rely on these responses as if they were correct. Mitigating such overreliance is a key challenge. 
Through a think-aloud study in which participants use an LLM-infused application to answer objective questions, we identify several features of LLM responses that shape users' reliance: explanations (supporting details for answers), inconsistencies in explanations, and sources. Through a large-scale, pre-registered, controlled experiment (N=308), we isolate and study the effects of these features on users' reliance, accuracy, and other measures. We find that the presence of explanations increases reliance on both correct and incorrect responses. However, we observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsiste...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reinforcement-learning-for-reasoning-in-large-language-models-with-one-training-example","title":"Reinforcement Learning for Reasoning in Large Language Models with One Training Example","url":"https://www.microsoft.com/en-us/research/publication/reinforcement-learning-for-reasoning-in-large-language-models-with-one-training-example/","published":"2025-04-28","authors":["Yiping Wang","Qing Yang","Zhiyuan Zeng","Liliang Ren","Liyuan Liu","Baolin Peng","Hao Cheng","Xuehai He","Kuan Wang","Jianfeng Gao","Weizhu Chen","Shuohang Wang"],"abstract":"We show that 
reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the mathematical reasoning capabilities of large language models (LLMs). Applying RLVR to the base model Qwen2.5-Math-1.5B, we identify a single example that elevates model performance on MATH500 from 36.0% to 73.6%, and improves the average performance across six common mathematical reasoning benchmarks from 17.6% to 35.7%. This result matches the performance obtained using the 1.2k DeepScaleR subset (MATH500: 73.6%, average: 35.9%), which includes the aforementioned example. Furthermore, RLVR with only two examples even slightly exceeds these results (MATH500: 74.8%, average: 36.6%). Similar substantial improvements are observed across various models (Qwen2.5-Math-7B, Llama3.2-3B-Instruct, DeepSeek-R1-Distill-Qwen-1.5B), RL algorithms (GRPO and PPO), and dif...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-taxonomy-of-linguistic-expressions-that-contribute-to-anthropomorphism-of-language-technologies","title":"A Taxonomy of Linguistic Expressions That Contribute To Anthropomorphism of Language 
Technologies","url":"https://www.microsoft.com/en-us/research/publication/a-taxonomy-of-linguistic-expressions-that-contribute-to-anthropomorphism-of-language-technologies/","published":"2025-04-28","authors":["Alicia DeVrio","Myra Cheng","Lisa Egede","Alexandra Olteanu","Su Lin Blodgett"],"abstract":"Recent attention to anthropomorphism -- the attribution of human-like qualities to non-human objects or entities -- of language technologies like LLMs has sparked renewed discussions about potential negative impacts of anthropomorphism. To productively discuss the impacts of this anthropomorphism and in what contexts it is appropriate, we need a shared vocabulary for the vast variety of ways that language can be anthropomorphic. In this work, we draw on existing literature and analyze empirical cases of user interactions with language technologies to develop a taxonomy of textual expressions that can contribute to anthropomorphism. We highlight challenges and tensions involved in understanding linguistic anthropomorphism, such as how all language is fundamentally human and how efforts to characterize and shift perceptions of humanness in machines can also dehumanize certain humans. 
We di...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409855810","title":"Large Language Model-Based Workflow for Optimizing Offset Well Data Analysis and Generating Well Design Risk Profiles","url":"https://doi.org/10.4043/35607-ms","published":"2025-04-28","authors":["Peter Kowalchuk","A. Grotte","S. Brandsberg‐Dahl","Varad Sabharwal","Uwe Jensen"],"abstract":"Abstract Today's workflows for designing and approving new wells depend on modern well engineering practices that integrate input from multiple disciplines like subsurface, engineering, and drilling. Effective execution and planning within these workflows involve complex data analysis including data from historical events from nearby- and offset-wells. Accessing and preparing all this data requires extensive data reviews, that when relying on the historically mainly manual processes and tools, can be both time-consuming and resource intensive. To achieve a step change in efficiency there is a need for new, highly automated engineering solutions to streamline and enhance the end-to-end workflow efficiency. To tackle this challenge, we present an approach that employs new Large Language Models (LLMs) to effectively automate the review process for well engineers. 
Our LLM-based solution enab...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.4043/35607-ms","openalex_id":"https://openalex.org/W4409855810","cited_by_count":3,"quality_score":56,"matched_keywords":["LLM","language model","retrieval","efficient"],"author_affiliations":["Aker BP (Norway)","Halliburton (United States)","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7357558608055115},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6540364027023315},{"id":"https://openalex.org/C175291020","display_name":"Offset (computer science)","score":0.5577742457389832},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.4737996459007263},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4341747462749481},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.308184951543808},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.27046555280685425}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"official:96c9a561e4ad0aea","title":"LlamaFirewall: An open source guardrail system for building secure AI agents","url":"https://ai.meta.com/research/publications/llamafirewall-an-open-source-guardrail-system-for-building-secure-ai-agents/","published":"2025-04-28","authors":["Sahana Chennabasappa","Cyrus Nikolaidis","Daniel Song","Stephanie Ding","Shengye Wan","Rashnil Chaturvedi","James Crnkovich","Beto de Paola","Lauren Deason","Nicholas Doucette","Dominik Gabi","Alekhya Gampa"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=5"}},{"id":"openalex:W4411600904","title":"SecureGenAI: A Standardized Framework for Authentication and Provenance in AI-Generated Images using Blockchain-Enhanced Watermarking","url":"https://doi.org/10.1109/ickecs65700.2025.11035329","published":"2025-04-28","authors":["Dinesh Besiahgari"],"abstract":"The escalating popularity of Generative AI (GenAI) for image creation has raised challenges surrounding authenticity, ownership, and misuse. This paper proposes SecureGenAI, a revolutionary framework designed to tackle those issues, which incorporates invisible watermarks into the genesis of the images produced by the AI model to allow for seamless verification that is tamper-proof. SecureGenAI contrasts with more traditional methods based on watermarking, which usually add a watermark to an image after it has been created. SecureGenAI implements Discrete Wavelet Transform (DWT) watermarking in the AI model itself, allowing for greater resistance to adversarial techniques used to remove watermarks after image generation. 
Moreover, the framework combines the advantages of DWT watermarking with blockchain to store and authenticate the watermark data, making sure that watermarks cannot be a...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ickecs65700.2025.11035329","openalex_id":"https://openalex.org/W4411600904","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C150817343","display_name":"Digital watermarking","score":0.8856422305107117},{"id":"https://openalex.org/C2779687700","display_name":"Blockchain","score":0.8772374391555786},{"id":"https://openalex.org/C2780049196","display_name":"Provenance","score":0.7297242283821106},{"id":"https://openalex.org/C148417208","display_name":"Authentication (law)","score":0.6942465901374817},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.675504207611084},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.3329535722732544},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.2559816241264343},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2191455364227295}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reinforcement-learning-from-automatic-feedback-for-high-quality-unit-test-generation","title":"Reinforcement Learning from Automatic Feedback for High-Quality Unit Test 
Generation","url":"https://www.microsoft.com/en-us/research/publication/reinforcement-learning-from-automatic-feedback-for-high-quality-unit-test-generation/","published":"2025-04-27","authors":["Benjamin Steenhoek","Michele Tufano","Neel Sundaresan","Alexey Svyatkovskiy"],"abstract":"Software testing is a crucial but time-consuming aspect of software development, and recently, Large Language Models (LLMs) have gained popularity for automated test case generation. However, because LLMs are trained on vast amounts of open-source code, they often generate test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we propose Reinforcement Learning from Static Quality Metrics (RLSQM), wherein we utilize Reinforcement Learning to generate high-quality unit tests based on static analysis-based quality metrics. First, we analyzed LLM-generated tests and show that LLMs frequently do generate undesirable test smells -- up to 37% of the time. Then, we implemented lightweight static analysis-based reward model and trained LLMs using this reward model to optimize for five code quality metrics. 
Our experimental results....","companies":["Microsoft","Google/DeepMind"],"matched_orgs":["Microsoft","Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/deeptest66595.2025.00011","openalex_id":"https://openalex.org/W4411203706","cited_by_count":14,"quality_score":98,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","1970-01-01","LLM"],"author_affiliations":["Microsoft","DeepMind (United Kingdom)","Google (United States)","Microsoft (Germany)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dear-diary-a-randomized-controlled-trial-of-generative-ai-coding-tools-in-the-workplace","title":"Dear Diary: A Randomized Controlled Trial of Generative AI Coding Tools in the Workplace","url":"https://www.microsoft.com/en-us/research/publication/dear-diary-a-randomized-controlled-trial-of-generative-ai-coding-tools-in-the-workplace/","published":"2025-04-27","authors":["Jenna Butler","Jina Suh","Sankeerti Haniyur","Constance Hadley"],"abstract":"Generative AI coding tools are relatively new and their impact on developers extends beyond traditional coding metrics, influencing beliefs about work and developers' roles in the workplace. This study aims to illuminate developers' preexisting beliefs about generative AI tools, their self-perceptions, and how regular use of these tools may alter these beliefs. 
Using a mixed methods approach, including surveys, a randomized controlled trial, and a three-week diary study, we explored the real-world application of generative AI tools within a large multinational software company. We found that the introduction and sustained use of generative AI coding tools significantly increases developers' perceptions of these tools as both useful and enjoyable. However, developers' views on the trustworthiness of AI-generated code remained unchanged. We also discovered unexpected uses of these tools, s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4413360318","title":"Dialogagent: An Auto-Engagement Agent for Code Question Answering Data Production","url":"https://doi.org/10.1109/icse-seip66354.2025.00012","published":"2025-04-27","authors":["Xiaoyun Liang","Jingyi Ren","Jiaxing Qi","Chao Peng","Bo Jiang"],"abstract":"Large Language Models (LLMs) have become increasingly integral to enhancing developer productivity, particularly in code generation, comprehension, and repair tasks. However, fine-tuning these models with high-quality, real-world data is challenging due to privacy concerns and the lack of accessible, labeled datasets. 
In this paper, we present DialogAgent, an automated tool for generating synthetic training data that closely mimics real developer interactions within Integrated Development Environments (IDEs). DialogAgent enables the production of diverse, high-fidelity query-response pairs by simulating multi-turn dialogues and contextual behaviors observed in real-world programming scenarios. The tool significantly reduces the reliance on manual data generation, increasing efficiency by 4.8 times compared to traditional methods. Our experiments and online deployment demonstrate substant...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icse-seip66354.2025.00012","openalex_id":"https://openalex.org/W4413360318","cited_by_count":1,"quality_score":42,"matched_keywords":["agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7012858986854553},{"id":"https://openalex.org/C2778348673","display_name":"Production (economics)","score":0.6569308042526245},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5518152713775635},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.5146437883377075},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.379766047000885},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.37701416015625},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.35507991909980774},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3424373269081116}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":1}},{"id":"openalex:W4409833122","title":"RAG-Driven multiple assertions generation with large language models","url":"https://doi.org/10.1007/s10664-025-10641-1","published":"2025-04-26","authors":["Zhuang Liu","Hailong Wang","Tongtong Xu","Bei Wang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10664-025-10641-1","openalex_id":"https://openalex.org/W4409833122","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.602647066116333},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.420812726020813},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4026113748550415}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4409787802","title":"GenAI applications of vision-language models for semiconductor defect classification","url":"https://doi.org/10.1117/12.3064772","published":"2025-04-26","authors":["Ting-Hung Lin","Hung‐Jen Chen","Yuan-Chung Wei","Sung-Po Yang","Yung‐Lun Lin","Parthasarathy Sriram","Guan-Hong Liou","Yu-Chieh Huang","Yi-Hsuan Chiu","P. T. Lai","Yiyi Wang","Mark Peng"],"abstract":"Visual Language Models (VLMs) combine advanced image/video recognition with the dialog capabilities of Large Language Models (LLMs), bringing AI applications closer to human-like intelligence. 
In the field of semiconductor imaging, this is an industry-first case study demonstrating the integration of VLMs for detection and classification—paving the way for entirely new solutions in semiconductor imaging. Since the advent of deep learning, CNN-based models have helped the semiconductor industry achieve success in anomaly classification, including optical and e-beam inspection, defect distribution maps, and even lithographic quality assessment. However, building a robust model typically requires thousands of images to reach high accuracy, and retraining is necessary whenever new products, processes, or defect types emerge—especially challenging during the initial stages of data collection....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/12.3064772","openalex_id":"https://openalex.org/W4409787802","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","Taiwan Semiconductor Manufacturing Company (Taiwan)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6143994331359863},{"id":"https://openalex.org/C108225325","display_name":"Semiconductor","score":0.4176316559314728},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4099154472351074},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33940285444259644},{"id":"https://openalex.org/C49040817","display_name":"Optoelectronics","score":0.24974164366722107},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.2198488712310791}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"bytedance-seed:270","title":"ShadowKV: KV Cache in 
Shadows for High-Throughput Long-Context LLM Inference","url":"https://seed.bytedance.com/en/research/shadowkv-kv-cache-in-shadows-for-high-throughput-long-context-llm-inference","published":"2025-04-25","authors":["Hanshi Sun","Li-Wen Chang","Wenlei Bao","Size Zheng","Ningxin Zheng","Xin Liu","Harry Dong","Yuejie Chi","Beidi Chen"],"abstract":"With the widespread deployment of long-context large language models (LLMs), there has been a growing demand for efficient support of high-throughput inference. However, as the key-value (KV) cache expands with the sequence length, the increasing memory footprint and the need to access it for each token generation both result in low throughput when serving long-context LLMs. While various dynamic sparse attention methods have been proposed to speed up inference while maintaining generation quality, they either fail to sufficiently reduce GPU memory consumption or introduce significant decoding latency by offloading the KV cache to the CPU. We present ShadowKV, a high-throughput long-context LLM inference system that stores the low-rank key cache and offloads the value cache to reduce the memory footprint for larger batch sizes and longer sequences. 
To minimize decoding latency, ShadowKV....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Machine Learning","Infrastructures","ICML 2025","LLM","memory","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-flood-extent-forecasting-evaluating-a-weather-foundation-model-and-u-net-for-flood-forecasting","title":"Towards Flood Extent Forecasting: Evaluating a Weather Foundation Model and U-Net for Flood Forecasting","url":"https://www.microsoft.com/en-us/research/publication/towards-flood-extent-forecasting-evaluating-a-weather-foundation-model-and-u-net-for-flood-forecasting/","published":"2025-04-25","authors":["Samuel Chege Maina","Eric Wanjau"],"abstract":"This study explores a data-driven approach that combines flood forcing factors from observation and reanalysis datasets, antecedent flood extent maps, and deep learning to forecast daily flood extents in Rwanda. We extend the architecture used in ClimaX (transformer weather and climate foundation model), investigate its pretrained representations for flood forecasting, and compare performance against a U-Net baseline. Our results demonstrate that a ClimaX variant trained from scratch with a linear projection decoder outperforms the U-Net and other ClimaX variants, highlighting its potential as an effective tool for flood extent forecasting. 
This work underscores the potential of data-driven deep learning models for flood extent forecasting with implications for improving disaster preparedness and flood risk assessment in vulnerable regions. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Technology for emerging markets","Climate forecast","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lawflow-collecting-and-simulating-lawyers-thought-processes","title":"LawFlow: Collecting and Simulating Lawyers' Thought Processes","url":"https://www.microsoft.com/en-us/research/publication/lawflow-collecting-and-simulating-lawyers-thought-processes/","published":"2025-04-25","authors":["Debarati Das","Khanh Chi Le","R. Parkar","Karin de Langis","Brendan Madson","Chad M. Berryman","Robin M. Willis","Daniel H. Moses","Brett McDonnell","Dan Schwarcz","Dongyeop Kang"],"abstract":"Legal practitioners, particularly those early in their careers, face complex, high-stakes tasks that require adaptive, context-sensitive reasoning. While AI holds promise in supporting legal work, current datasets and models are narrowly focused on isolated subtasks and fail to capture the end-to-end decision-making required in real-world practice. To address this gap, we introduce LawFlow, a dataset of complete end-to-end legal workflows collected from trained law students, grounded in real-world business entity formation scenarios. 
Unlike prior datasets focused on input-output pairs or linear chains of thought, LawFlow captures dynamic, modular, and iterative reasoning processes that reflect the ambiguity, revision, and client-adaptive strategies of legal practice. Using LawFlow, we compare human and LLM-generated workflows, revealing systematic differences in structure, reasoning flex...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Human language technologies","Social sciences","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:moonshotai:2504.18425","title":"Kimi-Audio Technical Report","url":"https://huggingface.co/papers/2504.18425","published":"2025-04-25","authors":["Moonshot/Kimi"],"abstract":"We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input and discrete tokens as output, and develop a chunk-wise streaming detokenizer based on flow matching. We curate a pre-training dataset that consists of more than 13 million hours of audio data covering a wide range of modalities including speech, sound, and music, and build a pipeline to construct high-quality and diverse post-training data. 
Initialized from a pre-trained LLM, Kimi-Audio is continual pre-trained on both audio and text data with several carefully designed tasks, and then fine-tune...","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","moonshotai","LLM"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/secom-on-memory-construction-and-retrieval-for-personalized-conversational-agents","title":"SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents","url":"https://www.microsoft.com/en-us/research/publication/secom-on-memory-construction-and-retrieval-for-personalized-conversational-agents/","published":"2025-04-24","authors":["Zhuoshi Pan","Qianhui Wu","Huiqiang Jiang","Xufang Luo","Hao Cheng","Dongsheng Li","Yuqing Yang","Chin-Yew Lin","H. Vicky Zhao","Lili Qiu","Jianfeng Gao"],"abstract":"To deliver coherent and personalized experiences in long-term conversations, existing approaches typically perform retrieval augmented response generation by constructing memory banks from conversation history at either the turn-level, session-level, or through summarization. In this paper, we present two key findings: (1) The granularity of memory unit matters: Turn-level, session-level, and summarization-based methods each exhibit limitations in both memory retrieval accuracy and the semantic quality of the retrieved content. 
(2) Prompt compression methods, such as LLMLingua-2, can effectively serve as a denoising mechanism, enhancing memory retrieval accuracy across different granularities. Building on these insights, we propose SeCom, a method that constructs the memory bank at segment level by introducing a conversation Segmentation model that partitions long-term conversations in...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","AI agents","large language models","1970-01-01","personalized","memory","long-term","retrieval","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tales-text-adventure-learning-environment-suite","title":"TALES: Text Adventure Learning Environment Suite","url":"https://www.microsoft.com/en-us/research/publication/tales-text-adventure-learning-environment-suite/","published":"2025-04-24","authors":["Christopher Zhang Cui","Xingdi Yuan","Zhang Xiao","Prithviraj Ammanabrolu","Marc-Alexandre Côté"],"abstract":"Reasoning is an essential skill to enable Large Language Models (LLMs) to interact with the world. As tasks become more complex, they demand increasingly sophisticated and diverse reasoning capabilities for sequential decision-making, requiring structured reasoning over the context history to determine the next best action.
We introduce TALES, a diverse collection of synthetic and human-written text-adventure games designed to challenge and evaluate diverse reasoning capabilities. We present results over a range of LLMs, open- and closed-weights, performing a qualitative analysis on the top performing models. Despite an impressive showing on synthetic games, even the top LLM-driven agents fail to achieve 15% on games designed for human enjoyment. Code and visualization of the experiments can be found at https://microsoft.github.io/tale-suite.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:93","title":"Let the Code LLM Edit Itself When You Edit the Code","url":"https://seed.bytedance.com/en/research/let-the-code-llm-edit-itself-when-you-edit-the-code","published":"2025-04-24","authors":["Zhenyu He","Jun Zhang","Shengjie Luo","Jingjing Xu","Zhi Zhang","Di He"],"abstract":"In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM needs to re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequence length is long. 
Simply encoding the edited subsequence and integrating it to the original KV cache meets the temporal confusion problem, leading to significantly worse performance. We address this efficiency and accuracy trade-off by introducing Positional Integrity Encoding (PIE). Building upon the rotary positional encoding, PIE first removes the rotary matrices in the Key cache that introduce temporal confusion and then reapplies the correct rotary m...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["NLP","Infrastructures","ICLR 2025","LLM","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/walk-the-talk-measuring-the-faithfulness-of-large-language-model-explanations","title":"Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations","url":"https://www.microsoft.com/en-us/research/publication/walk-the-talk-measuring-the-faithfulness-of-large-language-model-explanations/","published":"2025-04-24","authors":["Katie Matton","Robert Osazuwa Ness","John Guttag","Emre Kiciman"],"abstract":"Large language models (LLMs) are capable of generating plausible explanations of how they arrived at an answer to a question. However, these explanations can misrepresent the model's \"reasoning\" process, i.e., they can be unfaithful. This, in turn, can lead to over-trust and misuse. We introduce a new approach for measuring the faithfulness of LLM explanations. First, we provide a rigorous definition of faithfulness.
Since LLM explanations mimic human explanations, they often reference high-level concepts in the input question that purportedly influenced the model. We define faithfulness in terms of the difference between the set of concepts that the LLM's explanations imply are influential and the set that truly are. Second, we present a novel method for estimating faithfulness that is based on: (1) using an auxiliary LLM to modify the values of concepts within model inputs to create r...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409747853","title":"Prototyping with Prompts: Emerging Approaches and Challenges in Generative AI Design for Collaborative Software Teams","url":"https://doi.org/10.1145/3706598.3713166","published":"2025-04-24","authors":["Hari Subramonyam","Divy Thakkar","Andrew Ku","Jürgen Dieber","A. K. 
Sinha"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3706598.3713166","openalex_id":"https://openalex.org/W4409747853","cited_by_count":16,"quality_score":53,"matched_keywords":[],"author_affiliations":["Google (United States)","Stanford University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7665024995803833},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.6172729134559631},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5633831024169922},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5326606631278992},{"id":"https://openalex.org/C2780395129","display_name":"Rapid prototyping","score":0.49234211444854736},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.4689732789993286},{"id":"https://openalex.org/C184408114","display_name":"Generative Design","score":0.4352272152900696},{"id":"https://openalex.org/C2776697782","display_name":"Software prototyping","score":0.426433801651001}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4409748984","title":"Tap&Say: Touch Location-Informed Large Language Model for Multimodal Text Correction on Smartphones","url":"https://doi.org/10.1145/3706598.3713376","published":"2025-04-24","authors":["Maozheng Zhao","Michael Xuelin Huang","Nathan G Huang","Shanqing Cai","Henry Huang","Michael Huang","Shumin Zhai","I. V. Ramakrishnan","Xiaojun Bi"],"abstract":"layer that integrates the tap location into the LLM's attention mechanism, enabling it to utilize the tap location for text correction. 
We fine-tuned the touch location-informed LLM on synthetic touch locations and correction commands, achieving significantly higher correction accuracy than the state-of-the-art method VT [45]. A 16-person user study demonstrated that Tap&Say outperforms VT [45] with 16.4% shorter task completion time and 47.5% fewer keyboard clicks and is preferred by users.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3706598.3713376","openalex_id":"https://openalex.org/W4409748984","cited_by_count":3,"quality_score":48,"matched_keywords":["LLM","language model"],"author_affiliations":["Google (United States)","Harvard University Press","IDEA Public Schools","Stony Brook University","The University of Texas at Austin"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6665773391723633},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5433658957481384},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4798581004142761},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3738235831260681},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3545961081981659},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.34530705213546753},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3415634334087372},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34012508392333984}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409749036","title":"Online-EYE: Multimodal Implicit Eye Tracking Calibration for 
XR","url":"https://doi.org/10.1145/3706598.3713461","published":"2025-04-24","authors":["Baosheng James Hou","Lucy Abramyan","Prasanthi Gurumurthy","Haley Adams","Ivana Tosic Rodgers","Eric J. Gonzalez","Khushman Patel","Andrea Colaço","Ken Pfeuffer","Hans Gellersen","Karan Ahuja","Mar González-Franco"],"abstract":"Unlike other inputs for extended reality (XR) that work out of the box, eye tracking typically requires custom calibration per user or session. We present a multimodal inputs approach for implicit calibration of eye tracker in VR, leveraging UI interaction for continuous, background calibration. Our method analyzes gaze data alongside controller interaction with UI elements, and employing ML techniques it continuously refines the calibration matrix without interrupting users from their current tasks. Potentially eliminating the need for explicit calibration. We demonstrate the accuracy and effectiveness of this implicit approach across various tasks and real time applications achieving comparable eye tracking accuracy to native, explicit calibration. 
While our evaluation focuses on VR and controller-based interactions, we anticipate the broader applicability of this approach to various X...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3706598.3713461","openalex_id":"https://openalex.org/W4409749036","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Aarhus University","Google (United States)","Lancaster University","Northwestern University","Seattle University"],"concepts":[{"id":"https://openalex.org/C56461940","display_name":"Eye tracking","score":0.7732782363891602},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7072815299034119},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5936177968978882},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.574665904045105},{"id":"https://openalex.org/C165838908","display_name":"Calibration","score":0.5729780793190002},{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.5289528369903564},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.12007519602775574},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.10983270406723022}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4412567916","title":"Generalized Signature Method for Multivariate Time Series: A Data-Driven Framework for Feature Extraction","url":"https://doi.org/10.1109/amathe65477.2025.11081364","published":"2025-04-24","authors":["Praval Panwar"],"abstract":"The signature method, rooted in controlled differential equation theory, offers a robust approach to feature extraction from 
multimodal sequential data, widely applicable in data science. This study presents a generalized signature framework that unifies existing variations, categorizing them into augmentations, windows, transforms, and rescalings. By integrating these methods, we develop new combinations to optimize feature extraction for multivariate time series analysis. An extensive empirical evaluation on 26 datasets identifies which configurations yield the best predictive performance, leading to a canonical pipeline for the generalized signature method. This optimized approach achieves state-of-the-art accuracy on benchmark problems in multivariate time series classification, offering a powerful tool for data scientists working with complex sequential data.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/amathe65477.2025.11081364","openalex_id":"https://openalex.org/W4412567916","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7354099154472351},{"id":"https://openalex.org/C161584116","display_name":"Multivariate statistics","score":0.7031455039978027},{"id":"https://openalex.org/C2779696439","display_name":"Signature (topology)","score":0.6503783464431763},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.6222822666168213},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.5865686535835266},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.5845898985862732},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.48815208673477173},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.47505733370780945}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409720032","title":"Human Subjects Research in the Age of Generative AI: Opportunities and Challenges of Applying LLM-Simulated Data to HCI Studies","url":"https://doi.org/10.1145/3706599.3716299","published":"2025-04-23","authors":["Angel Hsing‐Chi Hwang","Michael S. Bernstein","S. Shyam Sundar","Renwen Zhang","Manoel Horta Ribeiro","Yingdan Lu","Serina Chang","Tongshuang Wu","Aimei Yang","Dmitri Williams","Joon Sung Park","Katherine Ognyanova"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3706599.3716299","openalex_id":"https://openalex.org/W4409720032","cited_by_count":3,"quality_score":44,"matched_keywords":["LLM"],"author_affiliations":["Carnegie Mellon University","Johns Hopkins University","Microsoft (United States)","National University of Singapore","Northwestern University","Pennsylvania State University","Princeton University","Rutgers, The State University of New Jersey","Stanford University","Sungkyunkwan University","Toyota Industries (United States)","Toyota Research Institute","University of California, Berkeley","University of Southern California"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.698245644569397},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5248008966445923},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40051645040512085},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.39159929752349854},{"id":"https://openalex.org/C188147891","display_name":"Cognitive 
science","score":0.32870936393737793},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.26036888360977173}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409720331","title":"Generative AI for medical education: Insights from a case study with medical students and an AI tutor for clinical reasoning","url":"https://doi.org/10.1145/3706599.3721208","published":"2025-04-23","authors":["Amy Wang","Roma Ruparel","Anna Iurchenko","Paul Jhun","Julie Anne Séguin","P. D. Strachan","Renee Wong","Alan Karthikesalingam","Yossi Matias","Avinatan Hassidim","Dale R. Webster","Christopher Semturs"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3706599.3721208","openalex_id":"https://openalex.org/W4409720331","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Google (Israel)","Google (United States)","McMaster University"],"concepts":[{"id":"https://openalex.org/C2778371403","display_name":"TUTOR","score":0.888997495174408},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7255130410194397},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6019706130027771},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5463848114013672},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37638112902641296},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3346762955188751},{"id":"https://openalex.org/C509550671","display_name":"Medical 
education","score":0.32662099599838257},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.15411153435707092}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2504.16353","title":"Transformer-Based Extraction of Statutory Definitions from the U.S. Code","url":"http://arxiv.org/abs/2504.16353","published":"2025-04-23","authors":["Arpana Hosabettu","Harsh Shah"],"abstract":"Automatic extraction of definitions from legal texts is critical for enhancing the comprehension and clarity of complex legal corpora such as the United States Code (U.S.C.). We present an advanced NLP system leveraging transformer-based architectures to automatically extract defined terms, their definitions, and their scope from the U.S.C. We address the challenges of automatically identifying legal definitions, extracting defined terms, and determining their scope within this complex corpus of over 200,000 pages of federal statutory law. Building upon previous feature-based machine learning methods, our updated model employs domain-specific transformers (Legal-BERT) fine-tuned specifically for statutory texts, significantly improving extraction accuracy. 
Our work implements a multi-stage pipeline that combines document structure analysis with state-of-the-art language models to process...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2504.16353","openalex_id":"https://openalex.org/W4415065682","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Cornell University","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7688000202178955},{"id":"https://openalex.org/C2777146004","display_name":"CLARITY","score":0.6360999941825867},{"id":"https://openalex.org/C195807954","display_name":"Information extraction","score":0.6021000146865845},{"id":"https://openalex.org/C2777206241","display_name":"Paragraph","score":0.5504000186920166},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.504800021648407},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4871000051498413},{"id":"https://openalex.org/C2778012447","display_name":"Scope (computer science)","score":0.48089998960494995},{"id":"https://openalex.org/C8797682","display_name":"XML","score":0.4350000023841858}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"bytedance-seed:248","title":"Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation","url":"https://seed.bytedance.com/en/research/prompting-depth-anything-for-4k-resolution-accurate-metric-depth-estimation","published":"2025-04-22","authors":["Haotong Lin","Sida Peng","Jingxiao Chen","Songyou Peng","Jiaming Sun","Minghuan Liu","Hujun Bao","Jiashi Feng","Xiaowei Zhou","Bingyi Kang"],"abstract":"Prompts play a 
critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation. Our approach sets new state-of-the-arts on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, inc...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","CVPR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4409657360","title":"G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation","url":"https://doi.org/10.1145/3696410.3714727","published":"2025-04-22","authors":["Yuhan Li","Xinni Zhang","Linhao Luo","Heng Chang","Yuxiang Ren","Irwin King","Jia Li"],"abstract":"Explainable recommendation has demonstrated significant advantages in informing users about the logic behind recommendations, thereby increasing system transparency, effectiveness, and trustworthiness. 
To provide personalized and interpretable explanations, existing works often combine the generation capabilities of large language models (LLMs) with collaborative filtering (CF) information. CF information extracted from the user-item interaction graph captures the user behaviors and preferences, which is crucial for providing informative explanations. However, due to the complexity of graph structure, effectively extracting the CF information from graphs still remains a challenge. Moreover, existing methods often struggle with the integration of extracted CF information with LLMs due to its implicit representation and the modality gap between graph structures and natural language explana...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714727","openalex_id":"https://openalex.org/W4409657360","cited_by_count":14,"quality_score":63,"matched_keywords":["language model","personalized","retrieval"],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)","Monash University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7982950806617737},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5594637393951416},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.557222306728363},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.49434542655944824},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4699324071407318},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.400684118270874},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer 
science","score":0.20118963718414307}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4409671337","title":"TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision","url":"https://doi.org/10.1145/3696410.3714940","published":"2025-04-22","authors":["Yunyi Zhang","Ruozhen Yang","Xueqiang Xu","Rui Li","Jinfeng Xiao","Jiaming Shen","Jiawei Han"],"abstract":"Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy, which is a fundamental web text mining task with broad applications such as web content analysis and semantic indexing. Most earlier works focus on fully or semi-supervised methods that require a large amount of human annotated data which is costly and time-consuming to acquire. To alleviate human efforts, in this paper, we work on hierarchical text classification with a minimal amount of supervision: using the sole class name of each node as the only supervision. Recently, large language models (LLM) have shown competitive performance on various tasks through zero-shot prompting, but this method performs poorly in the hierarchical setting because it is ineffective to include the large and structured label space in a prompt. 
On the other hand, previous weakly-supervised hierarchic...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714940","openalex_id":"https://openalex.org/W4409671337","cited_by_count":19,"quality_score":60,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)","University of Illinois Urbana-Champaign","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7124136686325073},{"id":"https://openalex.org/C58642233","display_name":"Taxonomy (biology)","score":0.6300689578056335},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5382823944091797},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4293237626552582},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3304515779018402},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.12840476632118225},{"id":"https://openalex.org/C59822182","display_name":"Botany","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"arxiv:2502.09058","title":"Unleashing the Power of Large Language Model for Denoising Recommendation","url":"http://arxiv.org/abs/2502.09058","published":"2025-04-22","authors":["Shuyao Wang","Zhi Zheng","Yongduo Sui","Hui Xiong"],"abstract":"Recommender systems are crucial for personalizing user experiences but often depend on implicit feedback data, which can be noisy and misleading. Existing denoising studies involve incorporating auxiliary information or learning strategies from interaction data. 
However, they struggle with the inherent limitations of external knowledge and interaction data, as well as the non-universality of certain predefined assumptions, hindering accurate noise identification. Recently, large language models (LLMs) have gained attention for their extensive world knowledge and reasoning abilities, yet their potential in enhancing denoising in recommendations remains underexplored. In this paper, we introduce LLaRD, a framework leveraging LLMs to improve denoising in recommender systems, thereby boosting overall recommendation performance. Specifically, LLaRD generates denoising-related knowledge by fir...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3696410.3714758","openalex_id":"https://openalex.org/W4407571719","cited_by_count":10,"quality_score":59,"matched_keywords":["LLM","language model","preference"],"author_affiliations":["Hong Kong University of Science and Technology","Tencent (China)","University of Hong Kong","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7358646988868713},{"id":"https://openalex.org/C2780513914","display_name":"Bottleneck","score":0.6558101773262024},{"id":"https://openalex.org/C163294075","display_name":"Noise reduction","score":0.6190010905265808},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.5842354893684387},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5510405898094177},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5453561544418335},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.5093070268630981},{"id":"https://openalex.org/C557471498","display_name":"Recommender 
system","score":0.48683875799179077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4409671924","title":"SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs","url":"https://doi.org/10.1145/3696410.3714768","published":"2025-04-22","authors":["Ben Liu","Jihai Zhang","Fangquan Lin","Cheng Yang","Min Peng","Wotao Yin"],"abstract":"Recent advancements have highlighted that Large Language Models (LLMs) are prone to hallucinations when solving complex reasoning problems, leading to erroneous results. To tackle this issue, researchers incorporate Knowledge Graphs (KGs) to improve the reasoning ability of LLMs. However, existing methods face two limitations: 1) they typically assume that all answers to the questions are contained in KGs, neglecting the incompleteness issue of KGs, and 2) they treat the KG as a static repository and overlook the implicit logical reasoning structures inherent in KGs. In this paper, we introduce SymAgent, an innovative neural-symbolic agent framework that achieves collaborative augmentation between KGs and LLMs. 
We conceptualize KGs as dynamic environments and transform complex reasoning tasks into a multi-step interactive process, enabling KGs to participate deeply in the reasoning proce...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714768","openalex_id":"https://openalex.org/W4409671924","cited_by_count":9,"quality_score":58,"matched_keywords":["LLM","efficient","agent"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Bellevue Hospital Center","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7452584505081177},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5850988626480103},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5338718891143799},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.4802396893501282},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3357110619544983},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.33200111985206604},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.10602489113807678}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4409671937","title":"Filtering Discomforting Recommendations with Large Language Models","url":"https://doi.org/10.1145/3696410.3714850","published":"2025-04-22","authors":["Jiahao Liu","Yiyang Shao","Peng Zhang","Dongsheng Li","Hansu Gu","Chao Chen","Longzhi Du","Tun Lu","Ning Gu"],"abstract":"Personalized algorithms can inadvertently expose users to discomforting recommendations, 
potentially triggering negative consequences. The subjectivity of discomfort and the black-box nature of these algorithms make it challenging to effectively identify and filter such content. To address this, we first conducted a formative study to understand users' practices and expectations regarding discomforting recommendation filtering. Then, we designed a Large Language Model (LLM)-based tool named DiscomfortFilter, which constructs an editable preference profile for a user and helps the user express filtering needs through conversation to mask discomforting preferences within the profile. Based on the edited profile, DiscomfortFilter facilitates the discomforting recommendations filtering in a plug-and-play manner, maintaining flexibility and transparency. The constructed preference profile imp...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714850","openalex_id":"https://openalex.org/W4409671937","cited_by_count":5,"quality_score":58,"matched_keywords":["LLM","language model","personalized","preference"],"author_affiliations":["Alibaba Group (China)","Fudan University","Independent Sector","Microsoft Research Asia (China)","Seattle University","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.666932225227356},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36961281299591064},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3427661657333374}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"arxiv:2502.14735","title":"EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous 
Behavior-Semantic Integration","url":"http://arxiv.org/abs/2502.14735","published":"2025-04-22","authors":["Minjie Hong","Yan Xia","Zehan Wang","Jieming Zhu","Ye Wang","Sihang Cai","Xiaoda Yang","Quanyu Dai","Zhenhua Dong","Zhimeng Zhang","Zhou Zhao"],"abstract":"Large language models (LLMs) are increasingly leveraged as foundational backbones in the development of advanced recommender systems, offering enhanced capabilities through their extensive knowledge and reasoning. Existing llm-based recommender systems (RSs) often face challenges due to the significant differences between the linguistic semantics of pre-trained LLMs and the collaborative semantics essential for RSs. These systems use pre-trained linguistic semantics but learn collaborative semantics from scratch via the llm-Backbone. However, LLMs are not designed for recommendations, leading to inefficient collaborative learning, weak result correlations, and poor integration of traditional RS features. To address these challenges, we propose EAGER-LLM, a decoder-only llm-based generative recommendation framework that integrates endogenous and exogenous behavioral and semantic informati...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3696410.3714933","openalex_id":"https://openalex.org/W4407814501","cited_by_count":8,"quality_score":53,"matched_keywords":["LLM","efficient"],"author_affiliations":["Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7907909154891968},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5529482960700989},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5346471667289734},{"id":"https://openalex.org/C75165309","display_name":"Search 
engine indexing","score":0.506102979183197},{"id":"https://openalex.org/C2385561","display_name":"RSS","score":0.49661093950271606},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.4600737690925598},{"id":"https://openalex.org/C110903229","display_name":"Semantic integration","score":0.42789506912231445},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4085369110107422}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4409671189","title":"Beyond Utility: Evaluating LLM as Recommender","url":"https://doi.org/10.1145/3696410.3714759","published":"2025-04-22","authors":["Chumeng Jiang","Jiayin Wang","Weizhi Ma","Charles L. A. Clarke","Shuai Wang","Chuhan Wu","Min Zhang"],"abstract":"With the rapid development of Large Language Models (LLMs), recent studies employed LLMs as recommenders to provide personalized information services for distinct users. Despite efforts to improve the accuracy of LLM-based recommendation models, relatively little attention is paid to beyond-utility dimensions. Moreover, there are unique evaluation aspects of LLM-based recommendation models, which have been largely ignored. To bridge this gap, we explore four new evaluation dimensions and propose a multidimensional evaluation framework. The new evaluation dimensions include: 1) history length sensitivity, 2) candidate position bias, 3) generation-involved performance, and 4) hallucinations. All four dimensions have the potential to impact performance, but are largely unnecessary for consideration in traditional systems. 
Using this multidimensional evaluation framework, along with traditio...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714759","openalex_id":"https://openalex.org/W4409671189","cited_by_count":8,"quality_score":53,"matched_keywords":["LLM","personalized"],"author_affiliations":["Huawei Technologies (China)","Tsinghua University","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8068557381629944},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7604442834854126},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.32840996980667114}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4409657210","title":"Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents","url":"https://doi.org/10.1145/3696410.3714825","published":"2025-04-22","authors":["Zhengliang Shi","Shen Gao","Lingyong Yan","Yue Feng","Xiuyi Chen","Zhumin Chen","Dawei Yin","Suzan Verberne","Zhaochun Ren"],"abstract":"Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, enabling them to solve practical tasks. Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning. However, this manual process requires domain expertise and struggles to scale to large toolsets. Additionally, these methods rely heavily on ad-hoc inference techniques or special tokens to integrate free-form LLM generation with tool-calling actions, limiting the LLM's flexibility in handling
diverse tool specifications and integrating multiple tools. In this work, we propose AutoTools, a framework that enables LLMs to automate the tool-use workflow. Specifically, the LLM automatically transforms tool documentation into callable functions, verifying syntax...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714825","openalex_id":"https://openalex.org/W4409657210","cited_by_count":10,"quality_score":51,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","Leiden University","Shandong University","University of Birmingham","University of Electronic Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.745453953742981},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4415709376335144},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3977513313293457},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3596630096435547},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3474547266960144},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.3212655186653137}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4409657425","title":"Self-Calibrated Listwise Reranking with Large Language Models","url":"https://doi.org/10.1145/3696410.3714658","published":"2025-04-22","authors":["Ruiyang Ren","Yuhao Wang","Kun Zhou","Wayne Xin Zhao","Wenjie Wang","Jing Liu","Ji-Rong Wen","Tat‐Seng Chua"],"abstract":"Large language models (LLMs), with advanced linguistic capabilities, have been employed in
reranking tasks through a sequence-to-sequence approach. In this paradigm, multiple passages are reranked in a listwise manner and a textual reranked permutation is generated. However, due to the limited context window of LLMs, this reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets. This not only increases computational costs but also restricts the LLM from fully capturing all the comparison information for all candidates. To address these challenges, we propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking. To achieve it, we first propose the relevance-aware listwise reranking framework, which incorporates explicit list-view relevance scores to improve reranking efficienc...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714658","openalex_id":"https://openalex.org/W4409657425","cited_by_count":7,"quality_score":48,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","National University of Singapore","Renmin University of China","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7181506156921387},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.44942474365234375},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.29542165994644165}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4409734799","title":"UKB-MDRMF: a multi-disease risk and multimorbidity framework based on UK biobank 
data","url":"https://doi.org/10.1038/s41467-025-58724-3","published":"2025-04-22","authors":["Yukang Jiang","Bingxin Zhao","Xiaopu Wang","Borui Tang","Huiyang Peng","Zidan Luo","Yue Shen","Zheng Wang","Zhiwen Jiang","Jie Wang","Jieping Ye","Xueqin Wang"],"abstract":"The rapid accumulation of biomedical cohort data presents opportunities to explore disease mechanisms, risk factors, and prognostic markers. However, current research often has a narrow focus, limiting the exploration of risk factors and inter-disease correlations. Additionally, fragmented processes and time constraints can hinder comprehensive analysis of the disease landscape. Our work addresses these challenges by integrating multimodal data from the UK Biobank, including basic, lifestyle, measurement, environment, genetic, and imaging data. We propose UKB-MDRMF, a comprehensive framework for predicting and assessing health risks across 1560 diseases. Unlike single disease models, UKB-MDRMF incorporates multimorbidity mechanisms, resulting in superior predictive accuracy, with all disease types showing improved performance in risk assessment. 
By jointly predicting and assessing multip...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-025-58724-3","openalex_id":"https://openalex.org/W4409734799","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","University of North Carolina at Chapel Hill","University of Pennsylvania","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C116567970","display_name":"Biobank","score":0.9614253044128418},{"id":"https://openalex.org/C2779134260","display_name":"Disease","score":0.6926370859146118},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5620917677879333},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.523127019405365},{"id":"https://openalex.org/C2909273474","display_name":"Multimorbidity","score":0.5179556608200073},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.4855046272277832},{"id":"https://openalex.org/C12174686","display_name":"Risk assessment","score":0.45818769931793213},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.4195594787597656}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4409657081","title":"<i>ImageScope:</i> Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning","url":"https://doi.org/10.1145/3696410.3714777","published":"2025-04-22","authors":["Pengfei Luo","Jingbo Zhou","Tong Xu","Yuan Xia","Linli Xu","Enhong Chen"],"abstract":"With the proliferation of images in online content, language-guided image retrieval (LGIR) has emerged as a research hotspot over 
the past decade, encompassing a variety of subtasks with diverse input forms. While the development of large multimodal models (LMMs) has significantly facilitated these tasks, existing approaches often address them in isolation, requiring the construction of separate systems for each task. This not only increases system complexity and maintenance costs, but also exacerbates challenges stemming from language ambiguity and complex image content, making it difficult for retrieval systems to provide accurate and reliable results. To this end, we propose ImageScope, a training-free, three-stage framework that leverages collective reasoning to unify LGIR tasks. The key insight behind the unification lies in the compositional nature of language, which transforms div...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714777","openalex_id":"https://openalex.org/W4409657081","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.778475821018219},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5450525283813477},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.4870636761188507},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4776969254016876},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.45304372906684875},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3448057472705841}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":4}},{"id":"openalex:W4410089370","title":"TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy","url":"https://doi.org/10.1145/3696410.3714863","published":"2025-04-22","authors":["Yiqun Chen","Qi Liu","Yi Zhang","Weiwei Sun","Xinyu Ma","Wei Yang","Daiting Shi","Jiaxin Mao","Dawei Yin"],"abstract":"Large Language Models (LLMs) are increasingly employed in zero-shot documents ranking, yielding commendable results. However, several significant challenges still persist in LLMs for ranking: (1) LLMs are constrained by limited input length, precluding them from processing a large number of documents simultaneously; (2) The output document sequence is influenced by the input order of documents, resulting in inconsistent ranking outcomes; (3) Achieving a balance between cost and ranking performance is challenging. To tackle these issues, we introduce a novel documents ranking method called TourRank, which is inspired by sport tournaments such as the FIFA World Cup.
Specifically, we 1) overcome the limitation in input length and reduce the ranking latency by incorporating a multi-stage grouping strategy similar to the parallel group stage of sport tournaments; 2) improve the ranking perf...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714863","openalex_id":"https://openalex.org/W4410089370","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Baidu (China)","Carnegie Mellon University","Renmin University of China","University of Southern California"],"concepts":[{"id":"https://openalex.org/C136975688","display_name":"Tournament","score":0.9415677785873413},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7868239283561707},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.7237472534179688},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5400480031967163},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47764456272125244},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.45630568265914917},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41578373312950134},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08068162202835083}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4409657366","title":"Evaluating Robustness of LLMs on Crisis-Related Microblogs across Events, Information Types, and Linguistic Features","url":"https://doi.org/10.1145/3696410.3714511","published":"2025-04-22","authors":["Muhammad Imran","Abdul Wahab Ziaullah","Kai 
Chen","Ferda Ofli"],"abstract":"The widespread use of microblogging platforms like X (formerly Twitter) during disasters provides real-time information to governments and response authorities. However, the data from these platforms is often noisy, requiring automated methods to filter relevant information. Traditionally, supervised machine learning models have been used, but they lack generalizability. In contrast, Large Language Models (LLMs) show better capabilities in understanding and processing natural language out of the box. This paper provides a detailed analysis of the performance of six well-known LLMs in processing disaster-related social media data from a large-set of real-world events. Our findings indicate that while LLMs, particularly GPT-4o and GPT-4, offer better generalizability across different disasters and information types, most LLMs face challenges in processing flood-related data, show minimal i...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714511","openalex_id":"https://openalex.org/W4409657366","cited_by_count":2,"quality_score":43,"matched_keywords":["media"],"author_affiliations":["Hamad bin Khalifa University","OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C143275388","display_name":"Microblogging","score":0.8316982984542847},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7345640063285828},{"id":"https://openalex.org/C518677369","display_name":"Social media","score":0.6758613586425781},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5496665835380554},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40995001792907715},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3392814099788666},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3277227282524109},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.17488592863082886}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2502.16077","title":"ESANS: Effective and Semantic-Aware Negative Sampling for Large-Scale Retrieval Systems","url":"http://arxiv.org/abs/2502.16077","published":"2025-04-22","authors":["Haibo Xing","Kanefumi Matsuyama","Hao Deng","Jinxin Hu","Yu Zhang","Xiaoyi Zeng"],"abstract":"Industrial recommendation systems typically involve a two-stage process: retrieval and ranking, which aims to match users with millions of items. In the retrieval stage, classic embedding-based retrieval (EBR) methods depend on effective negative sampling techniques to enhance both performance and efficiency. However, existing techniques often suffer from false negatives, high cost for ensuring sampling quality and semantic information deficiency. To address these limitations, we propose Effective and Semantic-Aware Negative Sampling (ESANS), which integrates two key components: Effective Dense Interpolation Strategy (EDIS) and Multimodal Semantic-Aware Clustering (MSAC). EDIS generates virtual samples within the low-dimensional embedding space to improve the diversity and density of the sampling distribution while minimizing computational costs. 
MSAC refines the negative sampling distri...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3696410.3714600","openalex_id":"https://openalex.org/W4409657373","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7689799070358276},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5662914514541626},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.422260582447052},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.42034560441970825},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.09129133820533752},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.05957096815109253},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.0},{"id":"https://openalex.org/C106131492","display_name":"Filter (signal processing)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410089537","title":"Explainable Multi-Modality Alignment for Transferable Recommendation","url":"https://doi.org/10.1145/3696410.3714733","published":"2025-04-22","authors":["Shenghao Yang","Weizhi Ma","Zhiqiang Guo","Min Zhang","Haiyang Wu","Junjie Zhai","Chunhui Zhang","Yuekui Yang"],"abstract":"With the development of multi-modal modeling techniques, recent sequential recommender systems enhance transferability by incorporating cross-domain universal multi-modal data, e.g., text and image. 
Existing methods typically adopt pairwise alignment to alleviate the gap between modalities. However, this alignment paradigm has limitations on explainability, consistency, and expansibility, resulting in suboptimal performance. This paper proposes a novel Explainable multi-modality Alignment method for transferable Recommender systems, i.e., EARec. Specifically, we design a two-stage framework to achieve explainable modality alignment in the source domain and recommendation based on aligned modality representations in the target domain. In the first stage, we adopt a generative task to align various modalities in parallel to a shared anchor with explainable meaning. All modalities share th...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3696410.3714733","openalex_id":"https://openalex.org/W4410089537","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.7123500108718872},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6839703321456909},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.29261088371276855}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/new-employee-copilot-usage-insights-into-productivity-and-socialization","title":"New employee Copilot usage: Insights into productivity and
socialization","url":"https://www.microsoft.com/en-us/research/publication/new-employee-copilot-usage-insights-into-productivity-and-socialization/","published":"2025-04-21","authors":["Mihaela Vorvoreanu","Sydney Graham","Amy Heger","Shipi Dhanorkar","Kathleen Walker"],"abstract":"This report summarizes a mixed-methods study examining how a new generation of employees interacts with the generative AI assistant Microsoft Copilot for productivity when acclimating to a professional environment. Through a series of surveys, interviews, and a diary study, 125 Microsoft interns in a variety of roles provided insights into their usage and effects of Copilot in their new-employee roles. Top findings: We observed an association between frequency of Copilot use and workplace integration. Interns who used Copilot more frequently felt better socialized and identified with their teams more strongly. Greater usage of Copilot correlated with an increasingly favorable perception of the AI assistant. The most frequent and valued Copilot use cases were Information retrieval, writing assistance, and coding assistance. 
Participants highlighted multiple ways in which Copilot helped th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Tech Report","Artificial intelligence","Human-computer interaction","Human–computer interaction","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agentic-reasoning-and-tool-integration-for-llms-via-reinforcement-learning","title":"Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/agentic-reasoning-and-tool-integration-for-llms-via-reinforcement-learning/","published":"2025-04-21","authors":["Joykirat Singh","Raghav Magazine","Yash Pandya","Akshay Nambi"],"abstract":"Large language models (LLMs) have achieved remarkable progress in complex reasoning tasks, yet they remain fundamentally limited by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often demands dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. In this work, we introduce ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration for LLMs.
ARTIST enables models to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains, leveraging outcome-based RL to learn robust strategies for tool use and environment interaction without requiring step-level supervision. Extensive experiments on mathematical reasoning....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Reinforcement learning","reinforcement learning agents"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409688984","title":"Abstract 3762: DEL-AI: Proteome-wide <i>in silico</i> screening of multi-billion compound libraries using machine learning foundation models","url":"https://doi.org/10.1158/1538-7445.am2025-3762","published":"2025-04-21","authors":["Elena L. Cáceres","Cristiana Carpinteiro","Manuel Sokolov Ravasqueira","Mridula Bontha","Brandon Bravo","Graham J. Carlson","Jim Davis","Ketki Dhamnaskar","John Eichenseer","Telmo Felgueira","Jamie Furneisen","Heta A. Gandhi"],"abstract":"DNA-encoded libraries (DEL) are a transformative technology in small-molecule discovery and the engine driving Nurix’s small molecule and degrader pipelines for novel targets. DEL platforms allow efficient screening of libraries of billions of unique molecules against a diverse set of biological targets. This approach generates datasets well suited for machine learning (ML) methods trained to identify patterns and relationships, which then guides the design of novel compounds.
Reported methods have focused on the development of ML models built from experimental datasets and thus require a successful DEL screen as a prerequisite. However, executing high-quality DEL screens is an extremely resource intensive process limited by physical factors such as protein production and availability. As such, the scale at which experimental screens and their derivative ML models can be applied...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1158/1538-7445.am2025-3762","openalex_id":"https://openalex.org/W4409688984","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","Nurix (United States)"],"concepts":[{"id":"https://openalex.org/C2775905019","display_name":"In silico","score":0.7397027015686035},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5907226800918579},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.45482707023620605},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3627811372280121},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3357812166213989},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.2532244324684143},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.09990474581718445},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.08296239376068115}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ufo2-the-desktop-agentos","title":"UFO2: The Desktop 
AgentOS","url":"https://www.microsoft.com/en-us/research/publication/ufo2-the-desktop-agentos/","published":"2025-04-20","authors":["Chaoyun Zhang","He Huang","Chiming Ni","Jian Mu","Si Qin","Shilin He","Lu Wang","Fangkai Yang","Pu Zhao","Chao Du","Liqun Li","Yu Kang"],"abstract":"Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution. We present UFO2, a multiagent AgentOS for Windows desktops that elevates CUAs into practical, system-level automation. UFO2 features a centralized HostAgent for task decomposition and coordination, alongside a collection of application-specialized AppAgent equipped with native APIs, domain-specific knowledge, and a unified GUI--API action layer. This architecture enables robust task execution while preserving modularity and extensibility. 
A hybrid control detection pipeline fuses Windows UI Automation (UIA) with vision-based parsing to support diverse....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Systems and networking","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/longitudinal-study-on-social-and-emotional-use-of-ai-conversational-agent","title":"Longitudinal Study on Social and Emotional Use of AI Conversational Agent","url":"https://www.microsoft.com/en-us/research/publication/longitudinal-study-on-social-and-emotional-use-of-ai-conversational-agent/","published":"2025-04-18","authors":["Mohit Chandra","Javier Hernandez","Gonzalo Ramos","Mahsa Ershadi","Ananya Bhattacharjee","Judith Amores","Ebele Okoli","Ann Paradiso","Shahed Warreth","Jina Suh"],"abstract":"Development in digital technologies has continuously reshaped how individuals seek and receive social and emotional support. While online platforms and communities have long served this need, the increased integration of general-purpose conversational AI into daily lives has introduced new dynamics in how support is provided and experienced. Existing research has highlighted both benefits (e.g., wider access to well-being resources) and potential risks (e.g., over-reliance) of using AI for support seeking. 
In this five-week, exploratory study, we recruited 149 participants divided into two usage groups: a baseline usage group (BU, n=60) that used the internet and AI as usual, and an active usage group (AU, n=89) encouraged to use one of four commercially available AI tools (Microsoft Copilot, Google Gemini, PI AI, ChatGPT) for social and emotional interactions. Our analysis revealed sign...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409566559","title":"Zero-shot evaluation reveals limitations of single-cell foundation models","url":"https://doi.org/10.1186/s13059-025-03574-x","published":"2025-04-18","authors":["Kasia Z. Kedzierska","Lorin Crawford","Ava P. Amini","Alex X. Lu"],"abstract":"Foundation models such as scGPT and Geneformer have not been rigorously evaluated in a setting where they are used without any further training (i.e., zero-shot). Understanding the performance of models in zero-shot settings is critical to applications that exclude the ability to fine-tune, such as discovery settings where labels are unknown. Our evaluation of the zero-shot performance of Geneformer and scGPT suggests that, in some cases, these models may face reliability challenges and could be outperformed by simpler methods. 
Our findings underscore the importance of zero-shot evaluations in development and deployment of foundation models in single-cell research.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1186/s13059-025-03574-x","openalex_id":"https://openalex.org/W4409566559","cited_by_count":52,"quality_score":67,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7267223000526428},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.7177968621253967},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6158739924430847},{"id":"https://openalex.org/C2992734406","display_name":"One shot","score":0.600570023059845},{"id":"https://openalex.org/C3019835501","display_name":"Single shot","score":0.5091914534568787},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.495552659034729},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.42112863063812256},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.41101711988449097}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":52}},{"id":"apple:akxy1o0dk7n7j3j7gykx0c5x","title":"FastVLM: Efficient Vision encoding for Vision Language Models","url":"https://machinelearning.apple.com/research/fastvlm-efficient-vision-encoding","published":"2025-04-18","authors":["Pavan Kumar Anasosalu Vasu","Fartash Faghri","Chun-Liang Li","Cem Koc","Nate True","Albert Antony","Gokul Santhanam","James Gabriel","Peter Grasch","Oncel 
Tuzel","Hadi Pouransari"],"abstract":"Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as ViTs become inefficient at high resolutions due to the large number of tokens and high encoding latency. At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4409584646","title":"Artificial Intelligence for Software Engineering: The Journey So Far and the Road Ahead","url":"https://doi.org/10.1145/3719006","published":"2025-04-18","authors":["Iftekhar Ahmed","Aldeida Aleti","Haipeng Cai","Alexander Chatzigeorgiou","Pinjia He","Xing Hu","Mauro Pezzè","Denys Poshyvanyk","Xin Xia"],"abstract":"Artificial intelligence and recent advances in deep learning architectures, including transformer networks and large language models, change the way people think and act to solve problems. Software engineering, as an increasingly complex process to design, develop, test, deploy, and maintain large-scale software systems for solving real-world challenges, is profoundly affected by many revolutionary artificial intelligence tools in general and machine learning in particular. 
In this roadmap for artificial intelligence in software engineering, we highlight the recent deep impact of artificial intelligence on software engineering by discussing successful stories of applications of artificial intelligence to classic and new software development challenges. We identify the new challenges that the software engineering community has to address in the coming years to successfully apply artificia...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3719006","openalex_id":"https://openalex.org/W4409584646","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)","Monash University","University at Buffalo, State University of New York","University of California, Irvine","University of Macedonia","Università della Svizzera italiana","William & Mary","Williams (United States)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7714911103248596},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.5005397796630859},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.44747892022132874},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3515121340751648},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.11616009473800659}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4409531882","title":"Biomedical Natural Language Processing in the Era of Large Language 
Models","url":"https://doi.org/10.1146/annurev-biodatasci-103123-095406","published":"2025-04-17","authors":["Naoto Usuyama","Cliff Wong","Sheng Zhang","Tristan Naumann","Hoifung Poon"],"abstract":"Biomedicine has rapidly digitized over recent decades, from genomic sequencing to electronic medical records. Now, the rise of large language models (LLMs) is driving a generative artificial intelligence (AI) revolution in natural language processing (NLP). Together, these trends create unprecedented possibilities to optimize patient care and accelerate biomedical discovery. Biomedical NLP already boosts productivity by automating labor-intensive tasks such as knowledge extraction and medical abstraction. Emerging approaches promise creativity gain, surpassing standard healthcare practices and uncovering emergent capabilities through Web-scale biomedical knowledge and population-level patient data. However, LLMs remain prone to hallucinations and omissions, and ensuring compliance and safety is vital in order to do no harm. 
Incorporating diverse modalities such as imaging and genomics is...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1146/annurev-biodatasci-103123-095406","openalex_id":"https://openalex.org/W4409531882","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C66782513","display_name":"Biomedicine","score":0.6225965023040771},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5582075119018555},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5302621722221375},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.491359144449234},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.47337907552719116},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.4687417149543762},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44366657733917236},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.4423852562904358}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"openalex:W4410809961","title":"Optimizing AI Model Training Costs with Stratified Sampling and Self-Adaptive Testing","url":"https://doi.org/10.1109/incacct65424.2025.11011465","published":"2025-04-17","authors":["Srihith Chennareddy","Vamshi Krishna Appala"],"abstract":"The data processing and generation of datasets which are required for training Large Language Models (LLMs) and Generative AI (GenAI) applications have become costly these days. 
This paper presents an approach to optimize the cost of generating model training datasets by combining a stratified sampling algorithm integrated with a self-adaptive, comprehensive test suite framework. The methodology proposed in this paper will reduce the size of training datasets without impacting model performance and data integrity. This helps to reduce the cost of AI model training. Our empirical analyses demonstrate that our proposed approach reduces the cost of generating datasets up to 60%, without losing model accuracy and performances. It provides a way for efficient use of compute and storage resources for generation of high quality datasets for AI model training.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/incacct65424.2025.11011465","openalex_id":"https://openalex.org/W4410809961","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Bellevue College","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C49898467","display_name":"Stratified sampling","score":0.7295265197753906},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7056081295013428},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5040208697319031},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.4681278467178345},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43317508697509766},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.3609856367111206},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.2258170247077942},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1336199939250946}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:4cc1e7023b06b3c4","title":"OpenAI o3 and o4-mini System Card","url":"https://openai.com/index/o3-o4-mini-system-card","published":"2025-04-16","authors":["OpenAI"],"abstract":"OpenAI o3 and OpenAI o4-mini combine state-of-the-art reasoning with full tool capabilities—web browsing, Python, image and file analysis, image generation, canvas, automations, file search, and memory.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Publication","memory"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:aqeq068ofo3x985hh0t1rzfb","title":"Scaling Laws for Native Multimodal Models","url":"https://machinelearning.apple.com/research/scaling-laws-native-multimodal-models","published":"2025-04-16","authors":["Mustafa Shukor","Enrico Fini","Victor Guilherme Turrisi da Costa","Matthieu Cord","Joshua Susskind","Alaaeldin El-Nouby"],"abstract":"Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders to LLMs and continuing multimodal training. 
While such approaches exhibit remarkable sample efficiency, it remains an open question whether such late-fusion architectures are inherently superior. In this work,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:vyforu9td719v698ae0pdhuh","title":"Scaling Diffusion Language Models via Adaptation from Autoregressive Models","url":"https://machinelearning.apple.com/research/scaling-diffusion-language-models","published":"2025-04-16","authors":["Shansan Gong","Shivam Agarwal","Yizhe Zhang","Jiacheng Ye","Lin Zheng","Mukai Li","Chenxin An","Peilin Zhao§","Wei Bi§","Jiawei Han","Hao Peng","Lingpeng Kong"],"abstract":"Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. 
Given the prevalence of open-source AR...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:i10ub9cf83wkbdkx9a5ep0om","title":"DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation","url":"https://machinelearning.apple.com/research/dart-denoising-autoregressive-transformer","published":"2025-04-16","authors":["Jiatao Gu","Yuyang Wang","Yizhe Zhang","Qihang Zhang","Dinghuai Zhang§","Navdeep Jaitly","Josh Susskind","Shuangfei Zhai"],"abstract":"Diffusion models have become the dominant approach for visual generation. They are trained by denoising a Markovian process which gradually adds noise to the input. We argue that the Markovian property limits the model's ability to fully utilize the generation trajectory, leading to inefficiencies during training and inference. 
In this paper, we propose DART, a transformer-based model that unifies autoregressive (AR) and diffusion within a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4409483037","title":"D3T: Dual-Domain Diffusion Transformer in Triplanar Latent Space for 3D Incomplete-View CT Reconstruction","url":"https://doi.org/10.1007/s11263-025-02426-2","published":"2025-04-16","authors":["Xuhui Liu","Hongmin Li","Zhi Qiao","Yawen Huang","Xi Liu","Juan Zhang","Zhen Qian","Xiantong Zhen","Baochang Zhang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-025-02426-2","openalex_id":"https://openalex.org/W4409483037","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Beihang University","Tencent (China)","United Imaging Healthcare (China)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6159883141517639},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5745493173599243},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.541320264339447},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5104231238365173},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.5093844532966614},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.38874879479408264},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.33508455753326416},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.08361703157424927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ui-e2i-synth-advancing-gui-grounding-with-large-scale-instruction-synthesis","title":"UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis","url":"https://www.microsoft.com/en-us/research/publication/ui-e2i-synth-advancing-gui-grounding-with-large-scale-instruction-synthesis/","published":"2025-04-15","authors":["Xinyi Liu","Xiaoyi Zhang","Ziyun Zhang","Yan Lu"],"abstract":"Recent advancements in Large Vision-Language Models are accelerating the development of Graphical User Interface (GUI) agents that utilize human-like vision perception capabilities to enhance productivity on digital devices. Compared to approaches predicated on GUI metadata, which are platform-dependent and vulnerable to implementation variations, vision-based approaches offer broader applicability. In this vision-based paradigm, the GUI instruction grounding, which maps user instruction to the location of corresponding element on the given screenshot, remains a critical challenge, particularly due to limited public training dataset and resource-intensive manual instruction data annotation. In this paper, we delve into unexplored challenges in this task including element-to-screen ratio, unbalanced element type, and implicit instruction. 
To address these challenges, we introduce a large-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:209","title":"Seedream 3.0 Technical Report","url":"https://seed.bytedance.com/en/research/seedream-3-0-technical-report","published":"2025-04-15","authors":["Seed Vision Team"],"abstract":"We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions. Specifically, the advancements of Seedream 3.0 stem from improvements across the entire pipeline, from data construction to model deployment. At the data stratum, we double the dataset using a defect-aware training paradigm and a dual-axis collaborative data-sampling framework. Furthermore, we adopt several effective techniques such as mixed-resolution training, cross-modality RoPE, representation alignment loss, and resolution-aware timestep sampling in the pre-training phase. 
During the post-training stage, we utilize diversified aesthetic c...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:moonshotai:2504.11354","title":"Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning","url":"https://huggingface.co/papers/2504.11354","published":"2025-04-15","authors":["Moonshot/Kimi"],"abstract":"We introduce Kimina-Prover Preview, a large language model that pioneers a novel reasoning-driven exploration paradigm for formal theorem proving, as showcased in this preview release. Trained with a large-scale reinforcement learning pipeline from Qwen2.5-72B, Kimina-Prover demonstrates strong performance in Lean 4 proof generation by employing a structured reasoning pattern we term formal reasoning pattern. This approach allows the model to emulate human problem-solving strategies in Lean, iteratively generating and refining proof steps. Kimina-Prover sets a new state-of-the-art on the miniF2F benchmark, reaching 80.7% with pass@8192. 
Beyond improved benchmark performance, our work yields several key insights: (1) Kimina-Prover exhibits high sample efficiency, delivering strong results even with minimal sampling (pass@1) and scaling effectively with computational budget, stemming from....","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","moonshotai","language model"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"openalex:W4409474101","title":"Hadamard Product in Deep Learning: Introduction, Advances and Challenges","url":"https://doi.org/10.1109/tpami.2025.3560423","published":"2025-04-15","authors":["Grigorios G. Chrysos","Yongtao Wu","Razvan Pascanu","Philip H. S. Torr","Volkan Cevher"],"abstract":"While convolution and self-attention mechanisms have dominated architectural design in deep learning, this survey examines a fundamental yet understudied primitive: the Hadamard product. Despite its widespread implementation across various applications, the Hadamard product has not been systematically analyzed as a core architectural primitive. We present the first comprehensive taxonomy of its applications in deep learning, identifying four principal domains: higher-order correlation, multimodal data fusion, dynamic representation modulation, and efficient pairwise operations. The Hadamard product's ability to model nonlinear interactions with linear computational complexity makes it particularly valuable for resource-constrained deployments and edge computing scenarios. 
We demonstrate its natural applicability in multimodal fusion tasks, such as visual question answering, and its effec...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3560423","openalex_id":"https://openalex.org/W4409474101","cited_by_count":19,"quality_score":60,"matched_keywords":["efficient"],"author_affiliations":["Google DeepMind (United Kingdom)","University of Oxford","University of Wisconsin–Madison","École Polytechnique Fédérale de Lausanne"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6354537606239319},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6131657361984253},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.585163414478302},{"id":"https://openalex.org/C60292330","display_name":"Hadamard transform","score":0.47372546792030334},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.4170716106891632},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.36750856041908264},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3501574695110321},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.20054486393928528}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"official:0c269634a392d7f5","title":"Gemini 2.0 Flash Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-0-Flash-Model-Card.pdf","published":"2025-04-15","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 2.0 Flash"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"apple:odfn2tao4y4fzdnw056m2gny","title":"TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization","url":"https://machinelearning.apple.com/research/tis-dpo-importance-sampling","published":"2025-04-15","authors":["law Liu","Felix Bai","Zhiyun Lu","Yanchao Sun","Xiang Kong","Simon Wang","Jiulong Shan","Lijie Wen","Philip S. Yu§","Meng Cao"],"abstract":"Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. 
In this work, we propose that the optimal data...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:fglzg4c2780s6ufgxdkia7wt","title":"EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing","url":"https://machinelearning.apple.com/research/ec-dit","published":"2025-04-15","authors":["Haotian Sun","Tao Lei","Bowen Zhang","Yanghao Li","Haoshuo Huang","Ruoming Pang","Bo Dai","Nan Du"],"abstract":"Diffusion transformers have been widely adopted for text-to-image synthesis. While scaling these models up to billions of parameters shows promise, the effectiveness of scaling beyond current sizes remains underexplored and challenging. By explicitly exploiting the computational heterogeneity of image generations, we develop a new family of Mixture-of-Experts (MoE) models (EC-DIT) for diffusion transformers with expert-choice routing. 
EC-DIT...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4409473464","title":"Role of Generative Artificial Intelligence in Personalized Medicine: A Systematic Review","url":"https://doi.org/10.7759/cureus.82310","published":"2025-04-15","authors":["Aashish Mishra","Anirban Majumder","Dheeraj Kommineni","Christopher Joseph","Tanay Chowdhury","Sathish Krishna Anumula"],"abstract":"Precision medicine presents challenges in data collection, cost, and privacy as it tailors treatments to each patient's unique genetic and clinical profile. With its ability to produce realistic and confidential patient data, generative artificial intelligence (AI) offers a promising avenue that could revolutionize patient-centric healthcare. This systematic review aims to assess the role of generative AI in personalized medicine. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we searched PubMed, Web of Science, Scopus, CINAHL, and Google Scholar, identifying 549 studies. After removing duplicates and applying eligibility criteria, 27 studies were found relevant and were included in this systematic review. 
Generative adversarial networks (GANs) were the most commonly used models (16 studies), followed by variational autoencoders (VAE...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.7759/cureus.82310","openalex_id":"https://openalex.org/W4409473464","cited_by_count":10,"quality_score":51,"matched_keywords":["personalized"],"author_affiliations":["Amazon (United States)","Eastern Kentucky University","IBM (United States)","Systems Analytics (United States)","University of Pittsburgh"],"concepts":[{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.9338504076004028},{"id":"https://openalex.org/C32220436","display_name":"Personalized medicine","score":0.5632597208023071},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4766722023487091},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4052163362503052},{"id":"https://openalex.org/C60644358","display_name":"Bioinformatics","score":0.22744956612586975},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.0},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4409476112","title":"Rethinking Natural Language Generation with Layer-Wise Multi-View Decoding","url":"https://doi.org/10.1145/3729536","published":"2025-04-15","authors":["Fenglin Liu","Xuancheng Ren","Guangxiang Zhao","Chenyu You","Sherry Ma","Xian Wu","Wei Fan","Xu Sun"],"abstract":"In natural language generation, language models, particularly those based on decoder-only architectures as in popular Large Language Models (LLMs), have demonstrated impressive performance across a wide 
range of tasks. However, encoder-decoder architectures remain highly effective for tasks involving non-text data, such as images and time-series data. The decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, this might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding for improved encoder-decoder language models, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3729536","openalex_id":"https://openalex.org/W4409476112","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["KLA (United States)","Peking University","Tencent (China)","University of Oxford","Yale University"],"concepts":[{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.6701598167419434},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.6263404488563538},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.572592556476593},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5224265456199646},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.43900030851364136},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39242029190063477},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.386566698551178},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.23088741302490234}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410918015","title":"The Strategic Selection of Machine Learning Models: A Comparative Analysis of Dedicated Models versus Large Language Models","url":"https://doi.org/10.37745/ejcsit.2013/vol13n319298","published":"2025-04-15","authors":["Anupam Chansarkar"],"abstract":"This article presents a comprehensive analysis of the strategic considerations in choosing between dedicated machine learning models and Large Language Models (LLMs) for various applications. The article examines the performance metrics, resource requirements, and cost-benefit relationships of both approaches through multiple case studies, including inventory optimization and content generation scenarios. Through empirical evidence and comparative analysis, the article demonstrates that while LLMs offer remarkable versatility in handling diverse tasks, dedicated ML models often provide superior performance and resource efficiency for specialized applications. 
The article highlights the importance of aligning technological choices with specific use cases and operational requirements, providing organizations with a framework for making informed decisions about their machine learning implem...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.37745/ejcsit.2013/vol13n319298","openalex_id":"https://openalex.org/W4410918015","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6085079312324524},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5689122080802917},{"id":"https://openalex.org/C93959086","display_name":"Model selection","score":0.5309127569198608},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4613805413246155},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.40982913970947266},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37070873379707336}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"bytedance-seed:830","title":"VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks","url":"https://seed.bytedance.com/en/research/vapo-efficient-and-reliable-reinforcement-learning-for-advanced-reasoning-tasks","published":"2025-04-14","authors":["Yu Yue","Yufeng Yuan","Qiying Yu","Xiaochen Zuo","Ruofei Zhu","Wenyuan Xu","Jiaze Chen","Chengyi Wang","TianTian Fan","Zhengyin Du","Xiangpeng Wei","Xiangyu Yu"],"abstract":"We present VAPO, Value-based Augmented Proximal Policy Optimization framework for reasoning models., a 
novel framework tailored for reasoning models within the value-based paradigm. Benchmarked the AIME 2024 dataset, VAPO, built on the Qwen 32B pre-trained model, attains a state-of-the-art score of 60.4. In direct comparison under identical experimental settings, VAPO outperforms the previously reported results of DeepSeek-R1-Zero-Qwen-32B and DAPO by more than 10 points. The training process of VAPO stands out for its stability and efficiency. It reaches state-of-the-art performance within a mere 5,000 steps. Moreover, across multiple independent runs, no training crashes occur, underscoring its reliability. This research delves into long chain-of-thought (long-CoT) reasoning using a value-based reinforcement learning framework. We pinpoint three key challenges that plague value-based m...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Artificial Intelligence","LLM","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:b4c1e2226ea3c28b","title":"Autoregressive Distillation of Diffusion Transformers","url":"https://ai.meta.com/research/publications/autoregressive-distillation-of-diffusion-transformers/","published":"2025-04-14","authors":["Yeongmin Kim","Sotiris Anagnostidis","Yuming Du","Edgar Schoenfeld","Jonas Kohler","Markos Georgopoulos","Albert Pumarola","Ali Thabet","Artsiom Sanakoyeu"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Graphics","distillation"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=6"}},{"id":"apple:mphp064nufoa2thl7m1wpod0","title":"FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations","url":"https://machinelearning.apple.com/research/focallens","published":"2025-04-14","authors":["Cheng-Yu Hsieh","Pavan Kumar Anasosalu Vasu","Fartash Faghri","Raviteja Vemulapalli","Chun-Liang Li","Ranjay Krishna","Oncel Tuzel","Hadi Pour Ansari"],"abstract":"This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2504.10462","title":"The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer","url":"https://huggingface.co/papers/2504.10462","published":"2025-04-14","authors":["Weixian Lei","Jiacong Wang","Haochen Wang","Xiangtai Li","Jun Hao Liew","Jiashi Feng","Zilong Huang"],"abstract":"This paper introduces SAIL, a single transformer unified multimodal large language model (MLLM) that integrates raw pixel encoding and 
language decoding within a singular architecture. Unlike existing modular MLLMs, which rely on a pre-trained vision transformer (ViT), SAIL eliminates the need for a separate vision encoder, presenting a more minimalist architecture design. Instead of introducing novel architectural components, SAIL adapts mix-attention mechanisms and multimodal positional encodings to better align with the distinct characteristics of visual and textual modalities. We systematically compare SAIL's properties-including scalability, cross-modal information flow patterns, and visual representation capabilities-with those of modular MLLMs. By scaling both training data and model size, SAIL achieves performance comparable to modular MLLMs. Notably, the removal of pretrained Vi...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["language model"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/almtokenizer-a-low-bitrate-and-semantic-rich-audio-codec-tokenizer-for-audio-language-modeling","title":"ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling","url":"https://www.microsoft.com/en-us/research/publication/almtokenizer-a-low-bitrate-and-semantic-rich-audio-codec-tokenizer-for-audio-language-modeling/","published":"2025-04-13","authors":["Dongchao Yang","Songxiang Liu","Haohan Guo","Jiankun Zhao","Yuanyuan Wang","Helin Wang","Zeqian Ju","Xubo Liu","Xueyuan Chen","Xu Tan","Xixin Wu","Helen M. 
Meng"],"abstract":"Recent advancements in audio language models have underscored the pivotal role of audio tokenization, which converts audio signals into discrete tokens, thereby facilitating the application of language model architectures to the audio domain. In this study, we introduce ALMTokenizer, a novel low-bitrate and semantically rich audio codec tokenizer for audio language models. Prior methods, such as Encodec, typically encode individual audio frames into discrete tokens without considering the use of context information across frames. Unlike these methods, we introduce a novel query-based compression strategy to capture holistic information with a set of learnable query tokens by explicitly modeling the context information across frames. This design not only enables the codec model to capture more semantic information but also encodes the audio signal with fewer token sequences. Additionally,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Computer science","1970-01-01","language model","compression","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409703144","title":"Artificial intelligence -driven surgery: Revolutionizing precision and reducing risk in the operating room","url":"https://doi.org/10.70593/978-93-49307-41-4_5","published":"2025-04-13","authors":["Venkatesh Ganti"],"abstract":"Artificial intelligence (AI) evolves and penetrates deeper into the sphere of the most technologically 
advanced surgical medical instruments. Practice and even technological imperatives stimulate the development of artificial intelligence in this direction. Precision surgical navigation systems through augmented reality and robots integrate AI to get a live online surgical scene prediction. Backbone neural networks for training are based on a real-time calculated volumetric surgical scene from pre-, ongoing, and post-surgical procedure state image data fuse which consider physiological observations. Learned multi-modal representation net drastically extracts image to image inconsistency reducing features from merged observations for prediction network.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.70593/978-93-49307-41-4_5","openalex_id":"https://openalex.org/W4409703144","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4000338912010193},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33383995294570923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/causal-integration-of-chemical-structures-improves-representations-of-microscopy-images-for-morphological-profiling","title":"Causal integration of chemical structures improves representations of microscopy images for morphological profiling","url":"https://www.microsoft.com/en-us/research/publication/causal-integration-of-chemical-structures-improves-representations-of-microscopy-images-for-morphological-profiling/","published":"2025-04-12","authors":["Yemin Yu","Neil 
Tenenholtz","Lester Mackey","Ying Wei","David Alvarez-Melis","Ava P. Amini","Alex Lu"],"abstract":"Recent advances in self-supervised deep learning have improved our ability to quantify cellular morphological changes in high-throughput microscopy screens, a process known as morphological profiling. However, most current methods only learn from images, despite many screens being inherently multimodal, as they involve both a chemical or genetic perturbation as well as an image-based readout. We hypothesized that incorporating chemical compound structure during self-supervised pre-training could improve learned representations of images in high-throughput microscopy screens. We introduce a representation learning framework, MICON (Molecular-Image Contrastive Learning), that models chemical compounds as treatments that induce counterfactual transformations of cell phenotypes. MICON significantly outperforms classical hand-crafted features such as CellProfiler and existing deep-learning-ba...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Medical, health and genomics","Computer science","deep learning models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409383105","title":"RSGPT: A remote sensing vision language model and benchmark","url":"https://doi.org/10.1016/j.isprsjprs.2025.03.028","published":"2025-04-12","authors":["Yuan Hu","Jianlong Yuan","Congcong Wen","Xiao‐Nan Lu","Yu Liu","Li 
Xiang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.isprsjprs.2025.03.028","openalex_id":"https://openalex.org/W4409383105","cited_by_count":79,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","New York University Abu Dhabi","Peking University","University of Chinese Academy of Sciences","University of Reading"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6976097226142883},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6030993461608887},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.5317341685295105},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43882936239242554},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.366890549659729},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.3033864498138428},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.25462788343429565}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":79}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/audio-entailment-assessing-deductive-reasoning-for-audio-understanding","title":"Audio Entailment: Assessing Deductive Reasoning for Audio Understanding","url":"https://www.microsoft.com/en-us/research/publication/audio-entailment-assessing-deductive-reasoning-for-audio-understanding/","published":"2025-04-11","authors":["Soham Deshmukh","Shuo Han","Hazim T. 
Bukhari","Benjamin Elizalde","Hannes Gamper","Rita Singh","Bhiksha Raj"],"abstract":"Recent literature uses language to build foundation models for audio. These Audio–Language Models (ALMs) are trained on a vast number of audio–text pairs and show remarkable performance in tasks including Text-to-Audio Retrieval, Captioning, and Question Answering. However, their ability to engage in more complex open-ended tasks, like Interactive Question-Answering, requires proficiency in logical reasoning—a skill not yet benchmarked. We introduce the novel task of Audio Entailment to evaluate an ALM’s deductive reasoning ability. This task assesses whether a text description (hypothesis) of audio content can be deduced from an audio recording (premise), with potential conclusions being entailment, neutral, or contradiction, depending on the sufficiency of the evidence. We create two datasets for this task with audio recordings sourced from two audio captioning datasets—AudioCaps and C...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Audio signal processing","Generative AI","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409362682","title":"Improving Retrieval Augmented Language Model with Self-Reasoning","url":"https://doi.org/10.1609/aaai.v39i24.34743","published":"2025-04-11","authors":["Yuan Xia","Jingbo Zhou","Zhenhui Shi","Jun Chen","Haifeng Huang"],"abstract":"The Retrieval-Augmented Language Model (RALM) 
has demonstrated remarkable performance on knowledge-intensive tasks by integrating external knowledge during inference, which mitigates the factual hallucinations inherited in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly in terms of reliability and traceability. Specifically, the irrelevant document retrieval may result in unhelpful responses or even deteriorate the performance of LLMs, while the lack of appropriate citations in outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework involves constructing self-reasoning trajectories thro...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34743","openalex_id":"https://openalex.org/W4409362682","cited_by_count":13,"quality_score":62,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6167262196540833},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6010940074920654},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.43521809577941895},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42117515206336975},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3676404654979706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":13}},{"id":"openalex:W4409368316","title":"Learning to Prompt with Text Only Supervision for Vision-Language Models","url":"https://doi.org/10.1609/aaai.v39i4.32444","published":"2025-04-11","authors":["Muhammad Uzair Khattak","Muhammad Ferjad Naeem","Muzammal Naseer","Luc Van Gool","Federico Tombari"],"abstract":"Foundational vision-language models like CLIP are emerging as a promising paradigm in vision due to their excellent generalization. However, adapting these models for downstream tasks while maintaining their generalization remains challenging. In literature, one branch of methods adapts CLIP by learning prompts using images. While effective, these methods often rely on image-label data, which is not always practical, and struggle to generalize to new datasets due to overfitting on few-shot source data. Another approach explores training-free methods by generating class captions from large language models (LLMs) and performing prompt ensembling, but these methods often produce static, class-specific prompts that cannot be transferred to new classes and incur additional costs by generating LLM descriptions for each class separately. 
In this work, we aim to combine the strengths of both app...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i4.32444","openalex_id":"https://openalex.org/W4409368316","cited_by_count":16,"quality_score":61,"matched_keywords":["LLM","efficient"],"author_affiliations":["Google (United States)","Khalifa University of Science and Technology","Moscow Banking Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4801977872848511},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4063046872615814},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3988545536994934},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.39461052417755127},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3583498001098633},{"id":"https://openalex.org/C119767625","display_name":"Optometry","score":0.3233552873134613},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.17927291989326477},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.11560121178627014}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4409365090","title":"LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation","url":"https://doi.org/10.1609/aaai.v39i11.33327","published":"2025-04-11","authors":["Qidong Liu","Xian Wu","Wanyu Wang","Yejing Wang","Yuanshao Zhu","Xiangyu Zhao","Feng Tian","Yefeng Zheng"],"abstract":"Sequential Recommender Systems (SRS), which model a user's interaction history to predict the next item of interest, are widely used 
in various applications. However, existing SRS often struggle with low-popularity items, a challenge known as the long-tail problem. This issue leads to reduced serendipity for users and diminished profits for sellers, ultimately harming the overall system. Large Language Model (LLM) has the ability to capture semantic relationships between items, independent of their popularity, making them a promising solution to this problem. In this paper, we introduce LLMEmb, a novel method leveraging LLM to generate item embeddings that enhance SRS performance. To bridge the gap between general-purpose LLM and the recommendation domain, we propose a Supervised Contrastive Fine-Tuning (SCFT) approach. This approach includes attribute-level data augmentation and a tailo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i11.33327","openalex_id":"https://openalex.org/W4409365090","cited_by_count":14,"quality_score":59,"matched_keywords":["LLM","language model"],"author_affiliations":["City University of Hong Kong","Tencent (China)","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.7105039358139038},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6308714747428894},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.6297147870063782},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4946593940258026},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3689388930797577},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3686116635799408},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.31234586238861084},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.06756043434143066}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4409363066","title":"Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation","url":"https://doi.org/10.1609/aaai.v39i24.34747","published":"2025-04-11","authors":["Derong Xu","Xinhang Li","Ziheng Zhang","Zhenxi Lin","Zhihong Zhu","Zhi Zheng","Xian Wu","Xiangyu Zhao","Tong Xu","Enhong Chen"],"abstract":"Large Language Models (LLMs) demonstrate remarkable capabilities, yet struggle with hallucination and outdated knowledge when tasked with complex knowledge reasoning, resulting in factually incorrect outputs. Previous studies have attempted to mitigate it by retrieving factual knowledge from large-scale knowledge graphs (KGs) to assist LLMs in logical reasoning and prediction of answers. However, this kind of approach often introduces noise and irrelevant data, especially in situations with extensive context from multiple knowledge aspects. In this way, LLM attention can be potentially mislead from question and relevant information. In our study, we introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. 
The Amar framewo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34747","openalex_id":"https://openalex.org/W4409363066","cited_by_count":14,"quality_score":59,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Center for Excellence in Brain Science and Intelligence Technology","City University of Hong Kong","Peking University","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.746051549911499},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6833714246749878},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5958728790283203},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4583219587802887},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4001249670982361},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3957408666610718},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37521472573280334},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.17059648036956787}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4409362648","title":"ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering","url":"https://doi.org/10.1609/aaai.v39i24.34703","published":"2025-04-11","authors":["Yakun Song","Zhuo Chen","Xiaofei Wang","Ziyang Ma","Xie Chen"],"abstract":"The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, 
has achieved remarkable progress in the field of zero-shot audio generation. However, existing methods still have some limitations: 1) repetitions, transpositions, and omissions in the output synthesized speech due to limited alignment constraints between audio and phoneme tokens; 2) challenges of fine-grained control over the synthesized speech with autoregressive (AR) language model; 3) infinite silence generation due to the nature of AR-based decoding, especially under the greedy strategy. To alleviate these issues, we propose ELLA-V, a simple but efficient LM-based zero-shot text-to-speech (TTS) framework, which enables fine-grained control over synthesized audio at the phoneme level. The key to ELLA-V is interleaving sequences of acoustic and phoneme tokens, where phoneme tokens appear ahead o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34703","openalex_id":"https://openalex.org/W4409362648","cited_by_count":9,"quality_score":54,"matched_keywords":["language model","efficient"],"author_affiliations":["Microsoft (United States)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.6308985948562622},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.565626323223114},{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.5445088148117065},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.39483585953712463},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.34438854455947876},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.3294941186904907},{"id":"https://openalex.org/C9390403","display_name":"Computer 
hardware","score":0.13093805313110352},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.08176758885383606}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4409362992","title":"ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization","url":"https://doi.org/10.1609/aaai.v39i21.34443","published":"2025-04-11","authors":["Weibo Zhao","Yubin Shi","Xinyu Lyu","W. Sui","Li Shen","Yong Li"],"abstract":"Quantization stands as a pivotal technique for large language model (LLM) serving, yet it poses significant challenges particularly in achieving effective low-bit quantization. The limited numerical mapping makes the quantized model produce a non-trivial error, bringing out intolerable performance degration. This paper is anchored in the basic idea of model compression objectives, and delves into the layer-wise error distribution of LLMs during post-training quantization. Subsequently, we introduce ASER, an algorithm consisting of (1) Error Reconstruction: low-rank compensation for quantization error with LoRA-style matrices constructed by whitening SVD; (2) Activation Smoothing: outlier extraction to gain smooth activation and better error compensation. ASER is capable of quantizing typical LLMs to low-bit ones, particularly preserving accuracy even in W4A8 per-channel setup. 
Experiment...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i21.34443","openalex_id":"https://openalex.org/W4409362992","cited_by_count":1,"quality_score":54,"matched_keywords":["LLM","language model","compression","quantization"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C3770464","display_name":"Smoothing","score":0.8176727294921875},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.5676578879356384},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.527278482913971},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4319134056568146},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41872894763946533},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.37115511298179626},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.32355791330337524},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.21719303727149963}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409346669","title":"Frozen Language Models Are Gradient Coherence Rectifiers in Vision Transformers","url":"https://doi.org/10.1609/aaai.v39i2.32176","published":"2025-04-11","authors":["Lichen Bai","Zixuan Xiong","Hai Lin","Guangwei Xu","Xiangjin Xie","Rui Guo","Zhanhui Kang","Haitao Zheng","Hong‐Gee Kim"],"abstract":"Large language models (LLMs) have demonstrated remarkable performance in multimodal tasks even with frozen LLM Block and only a few trainable parameters. 
However, the underlying mechanisms of how LLMs enhance multimodal performance remains unclear. In this work, we focus on the phenomenon that ``Merely concatenating a frozen LLM block to the Vision Transformer (ViT) encoder can yield significant performance enhancements. Moreover, the choice of LLM block and insertion position can have a substantial impact, leading to varying degrees of improvement''. We analyze the optimization of the training process from the perspective of gradient dynamics and find that frozen LLM blocks act as gradient coherence rectifiers, aligning the gradients of different samples more closely during training. Furthermore, we demonstrate that the representation similarity between the inserted LLM block and the ad...","companies":["Alibaba/Qwen","Tencent/Hunyuan"],"matched_orgs":["Alibaba/Qwen","Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i2.32176","openalex_id":"https://openalex.org/W4409346669","cited_by_count":0,"quality_score":53,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Cloud Computing Center","Seoul National University","Tencent (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6463073492050171},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4747600853443146},{"id":"https://openalex.org/C2778818243","display_name":"Optical coherence tomography","score":0.4720505177974701},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.42970675230026245},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3367288410663605},{"id":"https://openalex.org/C120665830","display_name":"Optics","score":0.27208781242370605},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.2384921908378601},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.23207435011863708}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:d0fx0cb2miflqzmebn9x1wne","title":"Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics","url":"https://machinelearning.apple.com/research/talking-turns","published":"2025-04-11","authors":["Siddhant Arora","Zhiyun Lu","Chung-Cheng Chiu","Ruoming Pang","Shinji Watanabe"],"abstract":"The recent wave of audio foundation models (FMs) could provide new capabilities for conversational modeling. However, there have been limited efforts to evaluate these audio FMs comprehensively on their ability to have natural and interactive conversations. 
To engage in meaningful conversation with the end user, we would want the FMs to additionally perform a fluent succession of turns without too much overlapping speech or long stretches of...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:t1i0gesfl03p7zjpdfccdvbh","title":"MM-Ego: Towards Building Egocentric Multimodal LLMs","url":"https://machinelearning.apple.com/research/mm-ego","published":"2025-04-11","authors":["Hanrong Ye","Haotian Zhang","Erik Daxberger","Lin Chen","Zongyu Lin","Bowen Zhang","Haoxuan You","Dan Xu","Zhe Gan","Jiasen Lu","Yinfei Yang"],"abstract":"This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. To achieve this goal, we work on three fronts. First, as there is a lack of QA data for egocentric video understanding, we automatically generate 7M high-quality QA samples for egocentric videos ranging from 30 seconds to one hour long in Ego4D based on human-annotated data. 
This is one of the largest egocentric QA datasets....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:mfe70l0e1cfgh91zs1z1z2up","title":"Language Models Know More Than They Show: Exploring Hallucinations From the Model's Viewpoint","url":"https://machinelearning.apple.com/research/exploring-hallucinations","published":"2025-04-11","authors":["Hadas Orgad","Michael Toker","Zorik Gekhman","Roi Reichart","Idan Szpektor§","Hadas Kotek","Yonatan Belinkov"],"abstract":"Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures, collectively referred to as \"hallucinations\". Recent studies have demonstrated that LLMs' internal states encode information regarding the truthfulness of their outputs, and that this information can be utilized to detect errors. 
In this work, we show that the internal representations of LLMs encode much more information about...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4409365603","title":"MultiBooth: Towards Generating All Your Concepts in an Image from Text","url":"https://doi.org/10.1609/aaai.v39i10.33187","published":"2025-04-11","authors":["Chenyang Zhu","Kai Li","Yue Ma","Chunming He","Xiu Li"],"abstract":"This paper introduces MultiBooth, a method that generates images from texts containing various concepts from users.Despite diffusion models bringing significant advancements for customized text-to-image generation, existing methods often struggle with multi-concept scenarios due to low concept fidelity and high inference cost. MultiBooth addresses these issues by dividing the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase. During the single-concept learning phase, we employ a multi-modal image encoder and an efficient concept encoding technique to learn a concise and discriminative representation for each concept. In the multi-concept integration phase, we use bounding boxes to define the generation area for each concept within the cross-attention map. 
This method enables the creation of individual concepts within t...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i10.33187","openalex_id":"https://openalex.org/W4409365603","cited_by_count":10,"quality_score":51,"matched_keywords":["efficient"],"author_affiliations":["BC Platforms (Finland)","Duke University","Hong Kong University of Science and Technology","Meta (United States)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5621088743209839},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5516418814659119},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3813190162181854},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.37513959407806396},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.27214619517326355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4409366228","title":"Coherency Improved Explainable Recommendation via Large Language Model","url":"https://doi.org/10.1609/aaai.v39i11.33329","published":"2025-04-11","authors":["Shijie Liu","Ruixin Ding","Wei Lü","Jun Wang","Mo Yu","Xiaoming Shi","Wei Zhang"],"abstract":"Explainable recommender systems are designed to elucidate the explanation behind each recommendation, enabling users to comprehend the underlying logic. Previous works perform rating prediction and explanation generation in a multi-task manner. However, these works suffer from incoherence between predicted ratings and explanations. 
To address the issue, we propose a novel framework that employs a large language model (LLM) to generate a rating, transforms it into a rating vector, and finally generates an explanation based on the rating vector and user-item information. Moreover, we propose utilizing publicly available LLMs and pre-trained sentiment analysis models to automatically evaluate the coherence without human annotations. Extensive experimental results on three datasets of explainable recommendation show that the proposed framework is effective, outperforming state-of-the-art bas...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i11.33329","openalex_id":"https://openalex.org/W4409366228","cited_by_count":6,"quality_score":51,"matched_keywords":["LLM","language model"],"author_affiliations":["East China Normal University","Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5852291584014893},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.536972165107727},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37137818336486816}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4409346666","title":"Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models","url":"https://doi.org/10.1609/aaai.v39i1.32068","published":"2025-04-11","authors":["Lingzhi Wang","Xingshan Zeng","Jinsong Guo","Kam‐Fai Wong","Georg Gottlob"],"abstract":"This paper explores Machine Unlearning (MU), an emerging field that is gaining increased attention due to concerns about neural models unintentionally 
remembering personal or sensitive information. We present SeUL, a novel method that enables selective and fine-grained unlearning for language models. Unlike previous work that employs a fully reversed training objective in unlearning, SeUL minimizes the negative impact on the capability of language models, particularly in terms of generation. Furthermore, we introduce two innovative evaluation metrics, sensitive extraction likelihood (S-EL) and sensitive memorization accuracy (S-MA), specifically designed to assess the effectiveness of forgetting sensitive information. In support of the unlearning framework, we propose efficient automatic online and offline sensitive span annotation methods. The online selection method, based on language....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i1.32068","openalex_id":"https://openalex.org/W4409346666","cited_by_count":5,"quality_score":50,"matched_keywords":["LLM","efficient"],"author_affiliations":["Chinese University of Hong Kong","Harbin Institute of Technology","Huawei Technologies (China)","University of Calabria","Unlimited Group (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.7136144638061523},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5403391122817993},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4426991045475006},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37988847494125366},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3553723692893982},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2695775330066681},{"id":"https://openalex.org/C180747234","display_name":"Cognitive 
psychology","score":0.22974884510040283}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4409365604","title":"ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming","url":"https://doi.org/10.1609/aaai.v39i10.33201","published":"2025-04-11","authors":["Jiedong Zhuang","Lu Lu","Yong Dai","Rui Hu","Jian Chen","Qiang Liu","Haoji Hu"],"abstract":"Multimodal large language models (MLLMs) enhance their perceptual capabilities by integrating visual and textual information. However, processing the massive number of visual tokens incurs a significant computational cost. Existing analysis of the MLLM attention mechanisms remains shallow, leading to coarse-grain token pruning strategies that fail to effectively balance speed and accuracy. In this paper, we conduct a comprehensive investigation of MLLM attention mechanisms with LLaVA. We find that numerous visual tokens and partial attention computations are redundant during the decoding process. Based on this insight, we propose Spatial-Temporal Visual Token Trimming (ST3), a framework designed to accelerate MLLM inference without retraining. 
ST3 consists of two primary components: 1) Progressive Visual Token Pruning (PVTP), which eliminates inattentive visual tokens across layers, and....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i10.33201","openalex_id":"https://openalex.org/W4409365604","cited_by_count":1,"quality_score":50,"matched_keywords":["language model","memory","efficient"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C56951928","display_name":"Trimming","score":0.9239757061004639},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6274383664131165},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.5985503196716309},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35010647773742676},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.21977463364601135},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.06389370560646057}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409363267","title":"RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs","url":"https://doi.org/10.1609/aaai.v39i24.34738","published":"2025-04-11","authors":["Jiaxing Wu","Ning Lin","Luyang Liu","Harrison Lee","Neo Wu","Chao Wang","Sushant Prakash","Shawn O’Banion","Bradley Green","Jun Xie"],"abstract":"LLM-powered personalization agent systems employ Large Language Models (LLMs) to predict users’ behavior from their past activities. 
However, their effectiveness often hinges on the ability to effectively leverage extensive, long user historical data due to its inherent noise and length of such data. Existing pre-trained LLMs may generate summaries that are concise but lack the necessary context for downstream tasks, hindering their utility in personalization systems. To address these challenges, we introduce Reinforcement Learning from Prediction Feedback (RLPF). RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF effectively distills extensive user history data while preserving essential information for downstream tasks. Our empirical evaluation demonstr...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34738","openalex_id":"https://openalex.org/W4409363267","cited_by_count":1,"quality_score":50,"matched_keywords":["LLM","personalization","agent"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.9460076689720154},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7049418687820435},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.6129609942436218},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5759372115135193},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36758923530578613},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.25609874725341797},{"id":"https://openalex.org/C77805123","display_name":"Social 
psychology","score":0.14427971839904785}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409362354","title":"Aligning Language Models Using Follow-up Likelihood as Reward Signal","url":"https://doi.org/10.1609/aaai.v39i24.34776","published":"2025-04-11","authors":["Chen Zhang","Dading Chong","Feng Jiang","Chengguang Tang","Anningzhe Gao","Guohua Tang","Haizhou Li"],"abstract":"In natural human-to-human conversations, participants often receive feedback signals from one another based on their follow-up reactions. These reactions can include verbal responses, facial expressions, changes in emotional state, and other non-verbal cues. Similarly, in human-machine interactions, the machine can leverage the user's follow-up utterances as feedback signals to assess whether it has appropriately addressed the user's request. Therefore, we propose using the likelihood of follow-up utterances as rewards to differentiate preferred responses from less favored ones, without relying on human or commercial LLM-based preference annotations. Our proposed reward mechanism, ``Follow-up Likelihood as Reward\" (FLR), matches the performance of strong reward models trained on large-scale human or GPT-4 annotated data on 8 pairwise-preference and 4 rating-based benchmarks. 
Building upo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34776","openalex_id":"https://openalex.org/W4409362354","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","preference"],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","National University of Singapore","Peking University","Shenzhen Research Institute of Big Data","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5433531403541565},{"id":"https://openalex.org/C49781872","display_name":"Maximum likelihood","score":0.5371174812316895},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.5329346060752869},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4865249991416931},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.47102150321006775},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3831067681312561},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3659694790840149},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.35162168741226196}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409368518","title":"Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine","url":"https://doi.org/10.1609/aaai.v39i4.32394","published":"2025-04-11","authors":["Xiaoshuang Huang","Lingdong Shen","Jia Liu","Fangxin Shang","Hongxiang Li","Haifeng Huang","Yehui Yang"],"abstract":"In recent years, Multimodal Large Language Models (MLLM) have 
achieved notable advancements, demonstrating the feasibility of developing an intelligent biomedical assistant. However, current biomedical MLLMs predominantly focus on image-level understanding and restrict interactions to textual commands, thus limiting their capability boundaries and the flexibility of usage. In this paper, we introduce a novel end-to-end multimodal large language model for the biomedical domain, named MedPLIB, which possesses pixel-level understanding. Excitingly, it supports visual question answering (VQA), arbitrary pixel-level prompts (points, bounding boxes, and free-form shapes), and pixel-level grounding. We propose a novel Mixture-of-Experts (MoE) multi-stage training strategy, which divides MoE into separate training phases for a visual-language expert model and a pixel-grounding expert model, foll...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i4.32394","openalex_id":"https://openalex.org/W4409368518","cited_by_count":7,"quality_score":48,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Peking University"],"concepts":[{"id":"https://openalex.org/C66782513","display_name":"Biomedicine","score":0.9133447408676147},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5153433680534363},{"id":"https://openalex.org/C4441509","display_name":"Multimodal therapy","score":0.4801703989505768},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3890268802642822},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.36530232429504395},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.30767929553985596},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.10589209198951721},{"id":"https://openalex.org/C542102704","display_name":"Psychotherapist","score":0.08968988060951233}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4409366283","title":"Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models","url":"https://doi.org/10.1609/aaai.v39i9.33073","published":"2025-04-11","authors":["Guosheng Zhang","Keyao Wang","Haixiao Yue","Ajian Liu","Gang Zhang","Kun Yao","Errui Ding","Jingdong Wang"],"abstract":"Face Anti-Spoofing (FAS) is essential for ensuring the security and reliability of facial recognition systems. Most existing FAS methods are formulated as binary classification tasks, providing confidence scores without interpretation. They exhibit limited generalization in out-of-domain scenarios, such as new environments or unseen spoofing types. In this work, we introduce a multimodal large language model (MLLM) framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS), which transforms the FAS task into an interpretable visual question answering (VQA) paradigm. Specifically, we propose a Spoof-aware Captioning and Filtering (SCF) strategy to generate high-quality captions for FAS images, enriching the model's supervision with natural language interpretations. 
To mitigate the impact of noisy captions during training, we develop a Lopsided Language Model (L-LM) loss function t...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i9.33073","openalex_id":"https://openalex.org/W4409366283","cited_by_count":7,"quality_score":48,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.7468552589416504},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.6395537853240967},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.58575040102005},{"id":"https://openalex.org/C167900197","display_name":"Spoofing attack","score":0.5263760685920715},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.500314474105835},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44987431168556213},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3501763343811035},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32695311307907104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4409366251","title":"CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities","url":"https://doi.org/10.1609/aaai.v39i8.32914","published":"2025-04-11","authors":["Tao Wu","Yong Zhang","Xintao Wang","Xianpan Zhou","Guangcong Zheng","Zhongang Qi","Ying Shan","Xi Li"],"abstract":"Customized video generation aims to generate high-quality videos guided by text prompts and subject's 
reference images. However, since it is only trained on static images, the fine-tuning process of subject learning disrupts abilities of video diffusion models (VDMs) to combine concepts and generate motions. To restore these abilities, some methods use additional video similar to the prompt to fine-tune or guide the model. This requires frequent changes of guiding videos and even re-tuning of the model when generating different motions, which is very inconvenient for users. In this paper, we propose CustomCrafter, a novel framework that preserves the model's motion generation and conceptual combination abilities without additional video and fine-tuning to recovery. For preserving conceptual combination ability, we design a plug-and-play module to update few parameters in VDMs, enhancing....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i8.32914","openalex_id":"https://openalex.org/W4409366251","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Tencent (China)","Zhejiang University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C40231798","display_name":"Composition (language)","score":0.6417307257652283},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5846364498138428},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5544072985649109},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3712470531463623},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35865795612335205},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.3475472927093506},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.33155521750450134},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.10271531343460083}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4409367532","title":"Image Conductor: Precision Control for Interactive Video Synthesis","url":"https://doi.org/10.1609/aaai.v39i5.32533","published":"2025-04-11","authors":["Yaowei Li","Xintao Wang","Zhaoyang Zhang","Zhouxia Wang","Ziyang Yuan","Liangbin Xie","Ying Shan","Yuexian Zou"],"abstract":"Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. An well-cultivated training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further eliminate motion ambiguity from ill-posed trajectories, we introduce a camera-free guidance technique during inference process, enhancing object movements while eliminating camera transitions. 
Additionally, we develop a trajectory-oriented....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i5.32533","openalex_id":"https://openalex.org/W4409367532","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Nanyang Technological University","Peking University","Tencent (China)","Tsinghua University","University of Macau"],"concepts":[{"id":"https://openalex.org/C34800285","display_name":"Conductor","score":0.5412430763244629},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5280638337135315},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4765385091304779},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.4147055447101593},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4139041602611542},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41090816259384155},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.32085850834846497},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1448790431022644}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4409367025","title":"Follow-Your-Click: Open-domain Regional Image Animation via Motion Prompts","url":"https://doi.org/10.1609/aaai.v39i6.32643","published":"2025-04-11","authors":["Yue Ma","Yingqing He","Hongfa Wang","Andong Wang","Leqi Shen","Chenyang Qi","Jixuan Ying","Chengfei Cai","Zhifeng Li","Heung‐Yeung Shum","Wei Liu","Qifeng Chen"],"abstract":"Despite recent advances in image-to-video generation, better controllability and local 
animation are less explored. Most existing image-to-video methods are not locally aware and tend to move the entire scene. However, human artists may need to control the movement of different objects or regions. Additionally, current I2V methods require users not only to describe the target motion but also to provide redundant detailed descriptions of frame contents.These two issues hinder the practical utilization of current I2V tools. In this paper, we propose a practical framework, named Follow-Your-Click, to achieve image animation with a simple user click (for specifying what to move) and a motion prompt (for specifying how to move). Technically, we propose the first-frame masking strategy, which significantly improves the video generation quality, and a motion-augmented module equipped with a mot...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i6.32643","openalex_id":"https://openalex.org/W4409367025","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Tencent (China)","Tsinghua University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.7327224016189575},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.679813802242279},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.6000375151634216},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5685400366783142},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5542383790016174},{"id":"https://openalex.org/C104114177","display_name":"Motion 
(physics)","score":0.5208123922348022},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.47642216086387634},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38834547996520996}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4409367035","title":"Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering","url":"https://doi.org/10.1609/aaai.v39i5.32513","published":"2025-04-11","authors":["Peize Li","Qingyi Si","Peng Fu","Zheng Lin","Yan Wang"],"abstract":"Retrieval-based multi-image question answering (QA) task involves retrieving multiple question-related images and synthesizing these images to generate an answer. Conventional \"retrieve-then-answer\" pipelines often suffer from cascading errors because the training objective of QA fails to optimize the retrieval stage. To address this issue, we propose a novel method to effectively introduce and reference retrieved information into the QA. Given the image set to be retrieved, we employ a multimodal large language model (visual perspective) and a large language model (textual perspective) to obtain multimodal hypothetical summary in question-form and description-form. 
By combining visual and textual perspectives, MHyS captures image content more specifically and replaces real images in retrieval, which eliminates the modality gap by transforming into text-to-text retrieval and helps improv...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i5.32513","openalex_id":"https://openalex.org/W4409367035","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Beijing Academy of Artificial Intelligence","Huawei Technologies (China)","Institute of Information Engineering","Jilin University","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.8568544387817383},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6557115912437439},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5604817271232605},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35626882314682007}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409356503","title":"Efficient Text-Guided 3D-Aware Generation With Score Distillation on 3D Distribution","url":"https://doi.org/10.1109/tcsvt.2025.3559931","published":"2025-04-11","authors":["Yiji Cheng","Fei Yin","Xiaoke Huang","Xintong Yu","Jiaxiang Liu","Shikun Feng","Yujiu Yang","Yansong Tang"],"abstract":"Text-to-3D generation enables the creation of 3D content with infinite possibilities. 
Existing methods typically involve training 3D generative models, which suffer from poor semantic alignment due to the scarcity of paired 3D data, or optimizing a 3D representation with 2D diffusion guidance, resulting in slow inference, low diversity, and Janus problems. In this paper, we introduce InstantDreamer, a model designed for text-guided 3D-aware generation in a single forward pass without requiring paired training datasets, thereby enhancing efficiency. To accomplish this, we extend score distillation to learn a 3D-aware semantics distribution. We distill priors from diffusion models into a 3D-aware generator, amortizing the optimization time required for new prompts and eliminating the necessity of paired training data. We equip the generator with hierarchical semantics conditioning, explici...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3559931","openalex_id":"https://openalex.org/W4409356503","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","distillation"],"author_affiliations":["Baidu (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6491278409957886},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.5260172486305237},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4062984585762024},{"id":"https://openalex.org/C178790620","display_name":"Organic chemistry","score":0.0},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409346427","title":"Attribution Analysis Meets Model 
Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit","url":"https://doi.org/10.1609/aaai.v39i2.32215","published":"2025-04-11","authors":["Qizhou Chen","Taolin Zhang","Chengyu Wang","Xiaofeng He","Dakan Wang","Tingting Liu"],"abstract":"Model editing aims to correct outdated or erroneous knowledge in large models without costly retraining. Recent research discovered that the mid-layer representation of the subject's final token in a prompt has a strong influence on factual predictions, and developed Large Language Model (LLM) editing techniques based on this observation. However, for Vision-LLMs (VLLMs), how visual representations impact the predictions from a decoder-only language model remains largely unexplored. To the best of our knowledge, model editing for VLLMs has not been extensively studied in the literature. In this work, we employ the contribution allocation and noise perturbation methods to measure the contributions of visual representations for token predictions. 
Our attribution analysis shows that visual representations in mid-to-later layers that are highly relevant to the prompt contribute significantly...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i2.32215","openalex_id":"https://openalex.org/W4409346427","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","East China Normal University","Shanghai Stock Exchange"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6121305823326111},{"id":"https://openalex.org/C143299363","display_name":"Attribution","score":0.5707967877388},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43495553731918335},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.42221683263778687},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4026801884174347},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.335663378238678},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2342117726802826},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.06195443868637085}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409362519","title":"A New Formula for Sticker Retrieval: Reply with Stickers in Multi-Modal and Multi-Session Conversation","url":"https://doi.org/10.1609/aaai.v39i24.34720","published":"2025-04-11","authors":["Bingbing Wang","Yiming Du","Bin Liang","Zhixin Bai","Min Yang","Baojun Wang","Kam‐Fai Wong","Ruifeng Xu"],"abstract":"Stickers are widely used 
in online chatting, which can vividly express someone's intention, emotion, or attitude. Existing conversation research typically retrieves stickers based on a single session or the previous textual information, which can not adapt to the multi-modal and multi-session nature of the real-world conversation. To this end, we introduce MultiChat, a new dataset for sticker retrieval facing the multi-modal and multi-session conversation, comprising 1,542 sessions, featuring 50,192 utterances and 2,182 stickers. Based on the created dataset, we propose a novel Intent-Guided Sticker Retrieval (IGSR) framework that retrieves stickers for multi-modal and multi-session conversation history drawing support from intent learning. Specifically, we introduce sticker attributes to better leverage the sticker information in multi-modal conversation, which are incorporated with utt...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34720","openalex_id":"https://openalex.org/W4409362519","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","retrieval"],"author_affiliations":["Chinese Academy of Sciences","Chinese University of Hong Kong","Harbin Institute of Technology","Huawei Technologies (China)","Peng Cheng Laboratory","Shenzhen Institute of Information Technology","Shenzhen Institutes of Advanced Technology"],"concepts":[{"id":"https://openalex.org/C2779182362","display_name":"Session (web analytics)","score":0.8936327695846558},{"id":"https://openalex.org/C2777200299","display_name":"Conversation","score":0.8578700423240662},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7477393746376038},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.4661129117012024},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.33573806285858154},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.33476656675338745},{"id":"https://openalex.org/C46312422","display_name":"Communication","score":0.30673953890800476},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.15325412154197693}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409368430","title":"Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models","url":"https://doi.org/10.1609/aaai.v39i4.32424","published":"2025-04-11","authors":["Rui Jiang","Xinghe Fu","Guangcong Zheng","Teng Li","Taiping Yao","Xi Li"],"abstract":"The rapid advancement of pretrained text-driven diffusion models has significantly enriched applications in image generation and editing. However, as the demand for personalized content editing increases, new challenges emerge especially when dealing with arbitrary objects and complex scenes. Existing methods usually mistakes mask as the object shape prior, which struggle to achieve a seamless integration result. The mostly used inversion noise initialization also hinders the identity consistency towards the target object. To address these challenges, we propose a novel training-free framework that formulates personalized content editing as the optimization of edited images in the latent space, using diffusion models as the energy function guidance conditioned by reference text-image pairs. 
A coarse-to-fine strategy is proposed that employs text energy guidance at the early stage to achi...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i4.32424","openalex_id":"https://openalex.org/W4409368430","cited_by_count":3,"quality_score":44,"matched_keywords":["personalized"],"author_affiliations":["Tencent (China)","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.6642442941665649},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6049088835716248},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.5794069170951843},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5065594911575317},{"id":"https://openalex.org/C186370098","display_name":"Energy (signal processing)","score":0.47658440470695496},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.42420095205307007},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.10720053315162659},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.06754621863365173}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409381814","title":"TRANSFORMER EXPLAINER: Interactive Learning of Text-Generative Models","url":"https://doi.org/10.1609/aaai.v39i28.35347","published":"2025-04-11","authors":["Aeree Cho","Grace Kim","Alexander Karpekov","Alec Helbling","Zijie J. Wang","Seongmin Lee","Benjamin Hoover","Duen Horng Chau"],"abstract":"Transformers have revolutionized machine learning, yet their inner workings remain opaque to many. 
We present TRANSFORMER EXPLAINER, an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. Our tool helps users understand complex Transformer concepts by integrating a model overview and smooth transitions across abstraction levels of math operations and model structures. It runs a live GPT-2 model locally in the user’s browser, empowering users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together to predict the next tokens. 125,000 users have used our open-source tool at https://poloclub.github.io/ transformer-explainer/.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i28.35347","openalex_id":"https://openalex.org/W4409381814","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Georgia Institute of Technology","OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.7107386589050293},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6466158032417297},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5400465726852417},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3240083158016205},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.22350311279296875},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.17585089802742004},{"id":"https://openalex.org/C119599485","display_name":"Electrical 
engineering","score":0.14632520079612732},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4409363101","title":"SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor","url":"https://doi.org/10.1609/aaai.v39i24.34750","published":"2025-04-11","authors":["Chenyu Yang","Shuai Wang","Hangting Chen","Jianwei Yu","Wei Tan","Rongzhi Gu","Yaoxun Xu","Yizhi Zhou","Haina Zhu","Haizhou Li"],"abstract":"The emergence of novel generative modeling paradigms, particularly audio language models, has significantly advanced the field of song generation. Although state-of-the-art models are capable of synthesizing both vocals and accompaniment tracks up to several minutes long concurrently, research about partial adjustments or editing of existing songs is still underexplored, which allows for more flexible and effective production. In this paper, we present SongEditor, the first song editing paradigm that introduces the editing capabilities into language-modeling song generation approaches, facilitating both segment-wise and track-wise modifications. SongEditor offers the flexibility to adjust lyrics, vocals, and accompaniments, as well as synthesizing songs from scratch. 
The core components of SongEditor include a music tokenizer, an autoregressive language model, and a diffusion generator,....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34750","openalex_id":"https://openalex.org/W4409363101","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Nanjing University","Shanghai Jiao Tong University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.7185652852058411},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6710616946220398},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6452215909957886},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5394028425216675},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.37472009658813477},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3227877914905548},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.15770289301872253},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.13194867968559265}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4409367006","title":"GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation","url":"https://doi.org/10.1609/aaai.v39i6.32621","published":"2025-04-11","authors":["Jiawei Lu","YingPeng Zhang","Zengjun Zhao","He Wang","Kun Zhou","Tianjia Shao"],"abstract":"Large-scale text-guided image diffusion models have 
demonstrated remarkable results in text-to-image (T2I) generation. However, applying these models to synthesize textures for 3D geometries remains challenging due to the domain gap between 2D images and textures on a 3D surface. Early works that used a projecting-inpainting approach managed to preserve generation diversity, but often resulted in noticeable artifacts and style inconsistencies. While recent methods have attempted to address these inconsistencies, they often introduce other issues, such as blurring, over-saturation, or over-smoothing. To overcome these challenges, we propose a novel text-to-texture synthesis framework that takes advantage of pre-trained diffusion models. We introduce a local attention reweighing mechanism in the self-attention layers to guide the model in focusing on spatial-correlated patches across diffe...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i6.32621","openalex_id":"https://openalex.org/W4409367006","cited_by_count":2,"quality_score":43,"matched_keywords":["distillation"],"author_affiliations":["Tencent (China)","University College London","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2781195486","display_name":"Texture (cosmology)","score":0.6958737373352051},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5424171090126038},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43168407678604126},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.42975935339927673},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3664582371711731},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.1001158356666565},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.06421121954917908},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.04469561576843262}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4409370024","title":"Simplifying Control Mechanism in Text-to-Image Diffusion Models","url":"https://doi.org/10.1609/aaai.v39i3.32309","published":"2025-04-11","authors":["Zhida Feng","Li Chen","Yuenan Sun","Jiaxiang Liu","Shikun Feng"],"abstract":"ControlNet has significantly advanced controllable image generation by integrating dense conditions (such as depth and canny edges) with text-to-image diffusion models. However, ControlNet's integration requires an additional amount nearly equal to half of the base diffusion model's parameters, making it inefficient. To address this, we introduce Simple-ControlNet, an efficient and streamlined network for controllable text-to-image generation. It employs a single-scale projection layer to incorporate condition information into the denoising U-Net. It is supplemented by Low-Rank Adapter (LoRA) parameters to facilitate condition learning. Impressively, Simple-ControlNet requires fewer than 3 million parameters for the control mechanism, substantially less than the 300 million needed by ControlNet. 
Our extensive experiments confirm that Simple-ControlNet matches and surpasses ControlNet's p...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i3.32309","openalex_id":"https://openalex.org/W4409370024","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Wuhan University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.7226681113243103},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5218043327331543},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5069823265075684},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.49902844429016113},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.49583396315574646},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3331529200077057},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.16667672991752625},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.13194307684898376}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409369745","title":"Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints","url":"https://doi.org/10.1609/aaai.v39i3.32265","published":"2025-04-11","authors":["Yong Dai","Jian Li","Jiedong Zhuang","Xian Zhang","Wankou Yang"],"abstract":"Multi-task visual grounding involves the simultaneous execution of localization and segmentation in images based on textual expressions. 
The majority of advanced methods predominantly focus on transformer-based multimodal fusion, aiming to extract robust multimodal representations. However, ambiguity between referring expression comprehension (REC) and referring image segmentation (RIS) is error-prone, leading to inconsistencies between multi-task predictions. Besides, insufficient multimodal understanding directly contributes to biased target perception. To overcome these challenges, we propose a Coarse-to-fine Consistency Constraints Visual Grounding architecture (C3VG), which integrates implicit and explicit modeling approaches within a two-stage framework. Initially, query and pixel decoders are employed to generate preliminary detection and segmentation outputs, a process referred t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i3.32265","openalex_id":"https://openalex.org/W4409369745","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Southeast University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6404867768287659},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6216817498207092},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5684939622879028},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.4321862459182739},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3460395932197571},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.13172972202301025},{"id":"https://openalex.org/C201995342","display_name":"Systems 
engineering","score":0.10675168037414551},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.05346974730491638}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4409366044","title":"Local Conditional Controlling for Text-to-Image Diffusion Models","url":"https://doi.org/10.1609/aaai.v39i10.33139","published":"2025-04-11","authors":["Yibo Zhao","Peng Liang","Yang Yang","Zekai Luo","Hengjia Li","Yao Chen","Zheng Yang","Xiaofei He","Wei Zhao","Qinglin Lu","Wei Liu","Boxi Wu"],"abstract":"Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This controlling process is globally operated on the entire image, which limits the flexibility of control regions. In this paper, we explore a novel and practical task setting: local control. It focuses on controlling specific local region according to user-defined image conditions, while the remaining regions are only conditioned by the original text prompt. However, it is non-trivial to achieve it. The naive manner of directly adding local conditions may lead to the local control dominance problem, which forces the model to focus on the controlled region and neglect object generation in other regions. 
To mitigate this problem, we propose Regio...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i10.33139","openalex_id":"https://openalex.org/W4409366044","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["State Key Laboratory of Chemical Engineering","Tencent (China)","Xidian University","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5154174566268921},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4983024597167969},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.44243040680885315},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38891875743865967},{"id":"https://openalex.org/C149782125","display_name":"Econometrics","score":0.37540701031684875},{"id":"https://openalex.org/C121864883","display_name":"Statistical physics","score":0.34259238839149475},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2822719216346741},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.17717695236206055}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4409347878","title":"Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators","url":"https://doi.org/10.1609/aaai.v39i24.34751","published":"2025-04-11","authors":["Dingkang Yang","Dongling Xiao","Jinjie Wei","M. H. 
Li","Zhaoyu Chen","Ke Li","Lihua Zhang"],"abstract":"Despite their remarkable capabilities, Large Language Models (LLMs) are prone to generate responses that contradict verifiable facts, i.e., unfaithful hallucination content. Existing efforts generally focus on optimizing model parameters or editing semantic representations, which compromise the internal factual knowledge of target LLMs. In addition, hallucinations typically exhibit multifaceted patterns in downstream tasks, limiting the model's holistic performance across tasks. In this paper, we propose a Comparator-driven Decoding-Time (CDT) framework to alleviate the response hallucination. Firstly, we construct hallucinatory and truthful comparators with multi-task fine-tuning samples. In this case, we present an instruction prototype-guided mixture of experts strategy to enhance the ability of the corresponding comparators to capture different hallucination or truthfulness patterns....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34751","openalex_id":"https://openalex.org/W4409347878","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.7264828085899353},{"id":"https://openalex.org/C155745195","display_name":"Comparator","score":0.5877843499183655},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5661431550979614},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4044739305973053},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.37109410762786865},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.17030644416809082},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1187523901462555},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.05242478847503662}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4409368270","title":"FlexiTex: Enhancing Texture Generation via Visual Guidance","url":"https://doi.org/10.1609/aaai.v39i4.32415","published":"2025-04-11","authors":["Dadong Jiang","Xianghui Yang","Zibo Zhao","Sheng Zhang","Jiaao Yu","Zeqiang Lai","Shaoxiong Yang","Chunchao Guo","Xiaobo Zhou","Zhihui Ke"],"abstract":"Recent texture generation methods achieve impressive results due to the powerful generative prior they leverage from large-scale text-to-image diffusion models. However, abstract textual prompts are limited in providing global textural or shape information, which results in the texture generation methods producing blurry or inconsistent patterns. To tackle this, we present FlexiTex, embedding rich information via visual guidance to generate a high-quality texture. The core of FlexiTex is the Visual Guidance Enhancement module, which incorporates more specific information from visual guidance to reduce ambiguity in the text prompt and preserve high-frequency details. 
To further enhance the visual guidance, we introduce a Direction-Aware Adaptation module that automatically designs direction prompts based on different camera poses, avoiding the Janus problem and maintaining semantically gl...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i4.32415","openalex_id":"https://openalex.org/W4409368270","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tianjin University"],"concepts":[{"id":"https://openalex.org/C2781195486","display_name":"Texture (cosmology)","score":0.6800576448440552},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4764012396335602},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4559086859226227},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4087224304676056},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.13843536376953125}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4409347657","title":"Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance","url":"https://doi.org/10.1609/aaai.v39i19.34285","published":"2025-04-11","authors":["Wenhao Sun","Xuemei Dong","Benlei Cui","Jingqun Tang"],"abstract":"Recently, diffusion models have emerged as promising newcomers in the field of generative models, shining brightly in image generation. However, when employed for object removal tasks, they still encounter issues such as generating random artifacts and the incapacity to repaint foreground object areas with appropriate content after removal. 
To tackle these problems, we propose Attentive Eraser, a tuning-free method to empower pre-trained diffusion models for stable and effective object removal. Firstly, in light of the observation that the self-attention maps influence the structure and shape details of the generated images, we propose Attention Activation and Suppression (ASS), which re-engineers the self-attention mechanism within the pre-trained diffusion models based on the given mask, thereby prioritizing the background over the foreground object during the reverse generation proces...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i19.34285","openalex_id":"https://openalex.org/W4409347657","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang Gongshang University"],"concepts":[{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5520093441009521},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5075739622116089},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.48674941062927246},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32885709404945374},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.24597501754760742},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4415624956","title":"ANFM: Adaptable Node Representation Modeling in Smart Contracts for Vulnerability Detection Using Graph-Based 
Approaches","url":"https://doi.org/10.1109/miccis66057.2025.00008","published":"2025-04-11","authors":["Zekai Chen","Shujie Liang","Jiahui Huang","Teng Huang"],"abstract":"The growing security concerns surrounding smart contracts, driven by the substantial assets they manage and the immutable nature of blockchain technology, have heightened the need for effective vulnerability detection methods. Traditional rule-based approaches are often hindered by high false-positive rates and lengthy detection times. While recent research has explored deep learning techniques, these often overlook the subtle semantic information embedded in the codebase of smart contracts. To introduce this, we put forward ANFM, a framework that integrates both the semantic and structural aspects of smart contracts. ANFM leverages data flow and control flow information to construct a graph representation, which is further enhanced by a large language model to capture the intricate semantic details of the nodes. This enables a deeper understanding of the contract's code, improving detec...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/miccis66057.2025.00008","openalex_id":"https://openalex.org/W4415624956","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Guangzhou University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7918999791145325},{"id":"https://openalex.org/C95713431","display_name":"Vulnerability (computing)","score":0.5698000192642212},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.45899999141693115},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.4375},{"id":"https://openalex.org/C27458966","display_name":"Control 
flow graph","score":0.4345000088214874},{"id":"https://openalex.org/C167063184","display_name":"Vulnerability assessment","score":0.421099990606308},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4207000136375427},{"id":"https://openalex.org/C51929080","display_name":"Codebase","score":0.40380001068115234}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409367222","title":"SpotActor: Training-Free Layout-Controlled Consistent Image Generation","url":"https://doi.org/10.1609/aaai.v39i7.32831","published":"2025-04-11","authors":["Jiahao Wang","Caixia Yan","Weizhan Zhang","Haonan Lin","Mengmeng Wang","Guang Dai","Tieliang Gong","Hao Sun","Jingdong Wang"],"abstract":"Text-to-image diffusion models significantly enhance the efficiency of artistic creation with high-fidelity image generation. However, in typical application scenarios like comic book production, they can neither place each subject into its expected spot nor maintain the consistent appearance of each subject across images. For these issues, we pioneer a novel task, Layout-to-Consistent-Image (L2CI) generation, which produces consistent and compositional images in accordance with the given layout conditions and text prompts. To accomplish this challenging task, we present a new formalization of dual energy guidance with optimization in a dual semantic-latent space and thus propose a training-free pipeline, SpotActor, which features a layout-conditioned optimizing stage and a consistent sampling stage. 
In the optimizing stage, we innovate a nuanced layout energy function to mimic the atten...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i7.32831","openalex_id":"https://openalex.org/W4409367222","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Baidu (China)","China Telecom","China Telecom (China)","State Grid Corporation of China (China)","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.7461507320404053},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.58732008934021},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5567062497138977},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3681563138961792},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3510705232620239},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.10061603784561157},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.04442301392555237}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409369752","title":"OpenVIS: Open-vocabulary Video Instance Segmentation","url":"https://doi.org/10.1609/aaai.v39i3.32338","published":"2025-04-11","authors":["Pinxue Guo","Hao Huang","Peiyang He","Xuefeng Liu","Tianjun Xiao","Wenqiang Zhang"],"abstract":"Open-vocabulary Video Instance Segmentation (OpenVIS) can simultaneously detect, segment, and track arbitrary object categories in a video, without being constrained to categories seen during training. 
In this work, we propose InstFormer, a carefully designed framework for the OpenVIS task that achieves powerful open-vocabulary capabilities through lightweight fine-tuning with limited-category data. InstFormer begins with the open-world mask proposal network, encouraged to propose all potential instance class-agnostic masks by the contrastive instance margin loss. Next, we introduce InstCLIP, adapted from pre-trained CLIP with Instance Guidance Attention, which encodes open-vocabulary instance tokens efficiently. These instance tokens not only enable open-vocabulary classification but also offer strong universal tracking capabilities. Furthermore, to prevent the tracking module from bein...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i3.32338","openalex_id":"https://openalex.org/W4409369752","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Fudan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6319634914398193},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5499831438064575},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.5317766666412354},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44554251432418823},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3526085615158081},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3488990068435669},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.1826292872428894},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409346387","title":"MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls","url":"https://doi.org/10.1609/aaai.v39i2.32183","published":"2025-04-11","authors":["Yuxuan Bian","Ailing Zeng","Xuan Ju","Xian Liu","Zhaoyang Zhang","Wei Liu","Qiang Xu"],"abstract":"Whole-body multimodal motion generation, controlled by text, speech, or music, has numerous applications including video generation and character animation. However, employing a unified model to process different condition modalities presents two main challenges: motion distribution drifts across different tasks (e.g., co-speech gestures and text-driven daily actions) and the complex optimization of mixed conditions with varying granularities (e.g., text and audio). In this paper, we propose MotionCraft, a unified diffusion transformer that crafts whole-body motion with plug-and-play multimodal control. Our framework employs a coarse-to-fine training strategy, starting with the text-to-motion semantic pre-training, followed by the multimodal low-level control adaptation. 
To effectively learn and transfer motion knowledge across different distributions, we design MC-Attn for parallel mode...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i2.32183","openalex_id":"https://openalex.org/W4409346387","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.6365448236465454},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.41616207361221313},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3472249209880829},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.23083797097206116}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409367204","title":"Learning Fine-Grained Alignment for Aerial Vision-Dialog Navigation","url":"https://doi.org/10.1609/aaai.v39i7.32758","published":"2025-04-11","authors":["Yifei Su","Dong An","Kehan Chen","Weichen Yu","Bo Ning","Yonggen Ling","Yan Huang","Liang Wang"],"abstract":"Aerial Vision-Dialog Navigation (AVDN) is a new task that requires drones to navigate to a target location based on human-robot dialog history. This paper focuses on the critical fine-grained cross-modal alignment problem in AVDN, requiring the drone to align language entities with visual landmarks in top-down views. To achieve this, we first construct a Fine-Grained AVDN (FG-AVDN) dataset via a semi-automatic annotation pipeline, providing diverse multimodal annotations at the entity-landmark level. 
Based on this, a novel Fine-grained Entity-Landmark Alignment (FELA) method is proposed to learn the cross-modal alignment explicitly. Concretely, FELA first boosts the drone's visual understanding with a precise semantic grid representation, which captures the environmental semantics and spatial structure simultaneously. Subsequently, to learn the entity-landmark alignment, we devise cross-...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i7.32758","openalex_id":"https://openalex.org/W4409367204","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Institute of Automation","Mohamed bin Zayed University of Artificial Intelligence","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C173853756","display_name":"Dialog box","score":0.8267581462860107},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5705540180206299},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5547981262207031},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49187254905700684},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.09893080592155457}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409366839","title":"Instruct Where the Model Fails: Generative Data Augmentation via Guided Self-contrastive Fine-tuning","url":"https://doi.org/10.1609/aaai.v39i6.32640","published":"2025-04-11","authors":["Weijian Ma","Ruoxin Chen","Ke-Yue Zhang","Shuang Wu","Shouhong Ding"],"abstract":"Data augmentation is expected to bring about unseen features of training set, enhancing the model’s 
ability to generalize in situations where data is limited. Generative image models trained on large web-crawled datasets such as LAION are known to produce images with stereotypes and imperceptible bias when used to augment training data, owing to dataset misalignment and the generator’s ignorance of the downstream model. We improve downstream task awareness in generated images by proposing a task-aware fine-tuning strategy that actively detects failures of downstream task in the target model to fine-tune the generation process between epochs. The dynamic fine-tuning strategy is achieved by (1) inspecting misalignment between generated data and original data via VLM captioners and (2) adjusts both prompts and diffusion model so that the strategy dynamically guides the generator by focusing...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i6.32640","openalex_id":"https://openalex.org/W4409366839","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6879926323890686},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5175718069076538},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.49055665731430054},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39101892709732056},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.37672749161720276},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3763059377670288},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3450850248336792},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.10847669839859009}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409367039","title":"ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation","url":"https://doi.org/10.1609/aaai.v39i8.32908","published":"2025-04-11","authors":["Mengyang Wu","Yuzhi Zhao","Jialun Cao","Mingjie Xu","Zhongming Jiang","Xuehui Wang","Qinbin Li","Guangneng Hu","Shengchao Qin","Chi‐Wing Fu"],"abstract":"Controversial contents largely inundate the Internet, infringing various cultural norms and child protection standards. Traditional Image Content Moderation (ICM) models fall short in producing precise moderation decisions for diverse standards, while recent multimodal large language models (MLLMs), when adopted to general rule-based ICM, often produce classification and explanation results that are inconsistent with human moderators. Aiming at flexible, explainable, and accurate ICM, we design a novel rule-based dataset generation pipeline, decomposing concise human-defined rules and leveraging well-designed multi-stage prompts to enrich short explicit image annotations. Our ICM-Instruct dataset includes detailed moderation explanation and moderation Q-A pairs. 
Built upon it, we create our ICM-Assistant model in the framework of rule-based ICM, making it readily applicable in real pract...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i8.32908","openalex_id":"https://openalex.org/W4409367039","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Hong Kong University of Science and Technology","Huawei Technologies (China)","Huawei Technologies (United States)","Huazhong University of Science and Technology","Shanghai Jiao Tong University","Xidian University"],"concepts":[{"id":"https://openalex.org/C93225998","display_name":"Moderation","score":0.9255561232566833},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6213878989219666},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5423156023025513},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.5079415440559387},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.417038232088089},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.371279776096344},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3301910161972046},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.15513285994529724}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409367190","title":"Feature Denoising Diffusion Model for Blind Image Quality Assessment","url":"https://doi.org/10.1609/aaai.v39i5.32530","published":"2025-04-11","authors":["Xudong Li","Yan Zhang","Yunhang Shen","Ke 
Li","Runze Hu","Xiawu Zheng","Sicheng Zhao"],"abstract":"Blind Image Quality Assessment (BIQA) aims to evaluate image quality in line with human perception, without reference benchmarks. Currently, deep learning BIQA methods typically depend on using features from high-level tasks for transfer learning. However, the inherent differences between BIQA and these high-level tasks inevitably introduce noise into the quality-aware features. In this paper, we take an initial step toward exploring the diffusion model for feature denoising in BIQA, namely Perceptual Feature Diffusion for IQA (PFD-IQA), which aims to remove noise from quality-aware features. Specifically, 1) we propose a Perceptual Prior Discovery and Aggregation module to establish two auxiliary tasks to discover potential low-level features in images that are used to aggregate perceptual textual prompt conditions for the diffusion model. 2) we propose a Perceptual Conditional Feature....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i5.32530","openalex_id":"https://openalex.org/W4409367190","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Beijing Institute of Technology","Tencent (China)","Tsinghua University","Xiamen University"],"concepts":[{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6664227843284607},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5654201507568359},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5642478466033936},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5477076768875122},{"id":"https://openalex.org/C163294075","display_name":"Noise 
reduction","score":0.514914870262146},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.48206931352615356},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47398486733436584},{"id":"https://openalex.org/C2983327147","display_name":"Image denoising","score":0.45470768213272095}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409370043","title":"TC-LLaVA: Rethinking the Transfer of LLava from Image to Video Understanding with Temporal Considerations","url":"https://doi.org/10.1609/aaai.v39i3.32317","published":"2025-04-11","authors":["Mingze Gao","Jingyu Liu","Mingda Li","Jiangtao Xie","Qingbin Liu","Kevin Zhao","Xi Chen","Hui Xiong"],"abstract":"Multimodal Large Language Models (MLLMs) have significantly improved performance across various image-language applications. Recently, there has been a growing interest in adapting image pre-trained MLLMs for video-related tasks. However, most efforts concentrate on enhancing the vision encoder and projector components, while the core part, Large Language Models (LLMs), remains comparatively under-explored. In this paper, we propose two strategies to enhance the model's capability in video understanding tasks by improving inter-layer attention computation in LLMs. Specifically, the first approach focuses on the enhancement of Rotary Position Embedding (RoPE) with Temporal-Aware Dual RoPE, which introduces temporal position information to strengthen the MLLM's temporal modeling capabilities while preserving the relative position relationships of both visual and text tokens. 
The second app...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i3.32317","openalex_id":"https://openalex.org/W4409370043","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Dalian University of Technology","Hong Kong University of Science and Technology","Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5347596406936646},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4942534267902374},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3992037773132324},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3436182737350464}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409367485","title":"StructSR: Refuse Spurious Details in Real-World Image Super-Resolution","url":"https://doi.org/10.1609/aaai.v39i5.32532","published":"2025-04-11","authors":["Yachao Li","Dong Liang","Tianyu Ding","Sheng-Jun Huang"],"abstract":"Diffusion-based models have shown great promise in real-world image super-resolution (Real-ISR), but often generate content with structural errors and spurious texture details due to the empirical priors and illusions of these models. To address this issue, we introduce StructSR, a simple, effective, and plug-and-play method that enhances structural fidelity and suppresses spurious details for diffusion-based Real-ISR. StructSR operates without the need for additional fine-tuning, external model priors, or high-level semantic knowledge. 
At its core is the Structure-Aware Screening (SAS) mechanism, which identifies the image with the highest structural similarity to the low-resolution (LR) input in the early inference stage, allowing us to leverage it as a historical structure knowledge to suppress the generation of spurious details. By intervening in the diffusion inference process, Stru...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i5.32532","openalex_id":"https://openalex.org/W4409367485","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Nanjing University of Aeronautics and Astronautics"],"concepts":[{"id":"https://openalex.org/C97256817","display_name":"Spurious relationship","score":0.9479453563690186},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5216077566146851},{"id":"https://openalex.org/C138268822","display_name":"Resolution (logic)","score":0.5113298296928406},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4699048399925232},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45471808314323425},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4106217324733734},{"id":"https://openalex.org/C39432304","display_name":"Environmental science","score":0.39726048707962036},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.34717240929603577}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409363532","title":"Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation 
Model","url":"https://doi.org/10.1609/aaai.v39i18.34124","published":"2025-04-11","authors":["Huan Ma","Yan Zhu","Changqing Zhang","Peilin Zhao","Baoyuan Wu","Long-Kai Huang","Qinghua Hu","Bingzhe Wu"],"abstract":"Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data. However, these models also display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of ``decision shortcuts'' that hinder their generalization capabilities. In this work, we find that the CLIP model possesses a rich set of features, encompassing both desired invariant causal features and undesired decision shortcuts. Moreover, the underperformance of CLIP on downstream tasks originates from its inability to effectively utilize pre-trained features in accordance with specific task requirements. To address this challenge, we propose a simple yet effective method, Spurious Feature Eraser (SEraser), to alleviate the decision shortcuts by erasing the spurious feat...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i18.34124","openalex_id":"https://openalex.org/W4409363532","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tianjin University"],"concepts":[{"id":"https://openalex.org/C97256817","display_name":"Spurious relationship","score":0.7864649295806885},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.7106829881668091},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6903613805770874},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.6122372150421143},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.5890636444091797},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.531801164150238},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45900604128837585},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3512284755706787}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409365985","title":"Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model","url":"https://doi.org/10.1609/aaai.v39i9.33054","published":"2025-04-11","authors":["Yuan Xu","Li Zhou","Zenghui Sun","Zikun Zhou","Jinsong Lan"],"abstract":"Large Multimodal Models (LMMs) have significantly progressed by extending large language models. Building on this progress, the latest developments in LMMs demonstrate the ability to generate dense pixel-wise segmentation by integrating segmentation models. Despite the innovations, existing works’ textual responses and segmentation masks remain at the instance level, showing limited ability to perform fine-grained understanding and segmentation even provided with detailed textual cues. To overcome this limitation, we introduce a Multi-Granularity Large Multimodal Model (MGLMM), which is capable of seamlessly adjusting the granularity of Segmentation and Captioning (SegCap) following user instructions, from panoptic SegCap to fine-grained SegCap. We name such a new task Multi-Granularity Segmentation and Captioning (MGSC). 
Observing the lack of a benchmark for model training and evaluatio...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i9.33054","openalex_id":"https://openalex.org/W4409365985","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Hong Kong Polytechnic University","Peng Cheng Laboratory"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9077484607696533},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.8445205688476562},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7315212488174438},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.688671886920929},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5339303016662598},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48689761757850647},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.09591111540794373},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.08182856440544128}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409349265","title":"How Do Position Encodings Affect Length Generalization? 
Case Studies On In-Context Function Learning","url":"https://doi.org/10.1609/aaai.v39i23.34637","published":"2025-04-11","authors":["Di-Nan Lin","Yao Jui-Feng","Kuo-Chen Wu","Hao Xu","Chun-lin Huang","Hung‐Yu Kao"],"abstract":"The capability of In-Context Learning (ICL) is crucial for large language models to generalize across a wide range of tasks. By utilizing prompts, these models can accurately predict outcomes for previously unseen tasks without necessitating retraining. However, this generalization ability does not extend to the length of the inputs; the effectiveness of ICL likely diminishes with excessively long inputs, resulting in errors in the generated text. To investigate this issue, we propose a study using a dataset of In-Context functions to understand the operational mechanisms of Transformer models in ICL and length generalization. We generated data using regression and Boolean functions and employed meta-learning techniques to endow the model with ICL capabilities. Our experimental results indicate that position encodings can significantly mitigate length generalization issues, with the most...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i23.34637","openalex_id":"https://openalex.org/W4409349265","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","National Cheng Kung University","National Tsing Hua University"],"concepts":[{"id":"https://openalex.org/C2776035688","display_name":"Affect (linguistics)","score":0.8134952783584595},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6623890399932861},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6458706259727478},{"id":"https://openalex.org/C14036430","display_name":"Function 
(biology)","score":0.524410605430603},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.5215182900428772},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4197568893432617},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3942066431045532},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34127289056777954}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409366274","title":"HFF-Tracker: A Hierarchical Fine-grained Fusion Tracker for Referring Multi-Object Tracking","url":"https://doi.org/10.1609/aaai.v39i10.33143","published":"2025-04-11","authors":["Zeyong Zhao","Yanchao Hao","Minghao Zhang","Qingbin Liu","Bo Li","Dianbo Sui","S. L. He","Xi Chen"],"abstract":"Referring Multi-Object Tracking (RMOT) aims to track multiple objects based on a provided language expression. Although prior studies have sought to accomplish this by integrating an textual module into the multi-object tracker, these methods combine text and image features in a basic way, neglecting the importance of text features. In this study, we propose a Hierarchical Fine-grained text-image Fusion tracker, named HFF-Tracker, which can perform fine-grained fusion of pixel-level visual features and text features across various semantic levels. Specifically, we have devised a Hierarchical Multi-Modal Fusion (HMMF) module to merge text and image features at an early stage in a hierarchical and detailed manner. The Text-Guided Decoder (TGD) is designed to provide the query with prior semantic information during the decoding process. 
Additionally, we have crafted a Text-Guided Prediction...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i10.33143","openalex_id":"https://openalex.org/W4409366274","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Institute of Automation","Shandong Institute of Automation","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.7182579040527344},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6760733127593994},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.666773796081543},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.6086487770080566},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5772258639335632},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5550814867019653},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.45348119735717773},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.11700260639190674}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409363046","title":"FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles","url":"https://doi.org/10.1609/aaai.v39i24.34786","published":"2025-04-11","authors":["Tianhao Zhang","Jiawei Zhang","Jun Wang","Xinyuan Qian","Xu-Cheng Yin"],"abstract":"Humans can perceive speakers’ characteristics (e.g., identity, gender, personality and emotion) by their appearance, which are generally aligned to their voice 
style. Recently, vision-driven Text-to-speech ( TTS ) scholars grounded their investigations on real-person faces, thereby restricting effective speech synthesis from applying to vast potential usage scenarios with diverse characters and image styles. To solve this issue, we introduce a novel FaceSpeak approach. It extracts salient identity characteristics and emotional representations from a wide variety of image styles. Meanwhile, it mitigates the extraneous information (e.g., background, clothing, and hair color, etc.), resulting in synthesized speech closely aligned with a character’s persona. Furthermore, to overcome the scarcity of multi-modal TTS data, we have devised an innovative dataset, namely Expressive Multi-Modal TTS...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v39i24.34786","openalex_id":"https://openalex.org/W4409363046","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C162462552","display_name":"Portrait","score":0.5755087733268738},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5254879593849182},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.4705025851726532},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45380690693855286},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.41097235679626465},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.389001727104187},{"id":"https://openalex.org/C46312422","display_name":"Communication","score":0.3640950322151184},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32480570673942566}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:moonshotai:2504.07491","title":"Kimi-VL Technical Report","url":"https://huggingface.co/papers/2504.07491","published":"2025-04-10","authors":["Moonshot/Kimi"],"abstract":"We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-turn agent tasks (e.g., OSWorld), matching flagship models. Furthermore, it exhibits remarkable capabilities across diverse challenging vision language tasks, including college-level image and video comprehension, OCR, mathematical reasoning, and multi-image understanding. In comparative evaluations, it effectively competes with cutting-edge efficient VLMs such as GPT-4o-mini, Qwen2.5-VL-7B, and Gemma-3-12B-IT, while surpassing GPT-4o in several key domains. 
Kimi-VL also advances in processing l...","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["HuggingFace org papers","moonshotai","LLM","language model","efficient","agent"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/enhancing-large-language-model-performance-with-gradient-based-parameter-selection","title":"Enhancing Large Language Model Performance with Gradient-Based Parameter Selection","url":"https://www.microsoft.com/en-us/research/publication/enhancing-large-language-model-performance-with-gradient-based-parameter-selection/","published":"2025-04-10","authors":["Haoling Li","Xin Zhang","Xiao Liu","Yeyun Gong","Yifan Wang","Qi Chen","Peng Cheng"],"abstract":"Large language models (LLMs) have revolutionized numerous fields of research, driving significant advancements in natural language processing, machine translation, and beyond. Although the extensive number of parameters contributes a lot to the great success, existing studies indicate that not all model parameters hold equal importance, which further leads to redundancy during the parameter update process. Recent works for reducing redundant parameter updates for LLMs either lack task-specific data information, may leading to suboptimal model performance, or discard transformer components or insignificant parameters, limiting the model's scalability across different tasks and potentially compromising the LLM structure. 
To address these issues and further enhance the performance of LLMs, we propose Gradient-Mask Tuning (GMT), a method that selectively updates parameters based on gradient....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:s7as0vevpqzyf5h9m7gqdhes","title":"MicroNN: An On-device Disk Resident Updatable Vector Database","url":"https://machinelearning.apple.com/research/micronn-on-device","published":"2025-04-10","authors":["Jeffrey Pound","Floris Chabert","Arjun Bhushan","Ankur Goswami","Anil Pacaci","Shihabur Rahman Chowdhury"],"abstract":"Nearest neighbour search over dense vector collections has important applications in information retrieval, retrieval augmented generation (RAG), and content ranking. Performing efficient search over large vector collections is a well studied problem with many existing approaches and open source implementations. 
However, most state-of-the-art systems are generally targeted towards scenarios using large servers with an abundance of memory, static...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["memory","retrieval","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"bytedance-seed:208","title":"Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning","url":"https://seed.bytedance.com/en/research/seed-thinking-v1-5-advancing-superb-reasoning-models-with-reinforcement-learning","published":"2025-04-10","authors":["Jiaze Chen","TianTian Fan","Xin Liu","Lingjun Liu","Zhiqi Lin","Mingxuan Wang","Chengyi Wang","Xiangpeng Wei","Wenyuan Xu","Yufeng Yuan","Yu Yue","Lin Yan"],"abstract":"We introduce Seed-Thinking-v1.5, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed-Thinking-v1.5 is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. 
As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. External...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","https://github.com/ByteDance-Seed/Seed-Thinking-v1.5"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:f18c0454a7bd11e9","title":"Gemini 2.0 Flash-Lite Model Card","url":"https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-0-Flash-Lite-Model-Card.pdf","published":"2025-04-10","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemini 2.0 Flash-Lite"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"apple:cm1h37eyl7jf8a203xnkz00g","title":"Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms","url":"https://machinelearning.apple.com/research/ferret-ui-2","published":"2025-04-10","authors":["Zhangheng Li","Keen You","Haotian Zhang","Di Feng","Harsh Agrawal","Xiujun Li","Mohana Prasad Sathya Moorthy","Jeff Nichols","Yinfei 
Yang","Zhe Gan"],"abstract":"Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. Building on the foundation of Ferret-UI, Ferret-UI...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:plb923umgma0xpwuxqavrxn2","title":"Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling","url":"https://machinelearning.apple.com/research/task-adaptive","published":"2025-04-10","authors":["Apple"],"abstract":"Specialist language models (LMs) focus on a specific task or domain on which they often outperform generalist LMs of the same size. However, the specialist data needed to pretrain these models is only available in limited amount for most tasks. In this work, we build specialist models from large generalist training sets instead. We adjust the training distribution of the generalist data with guidance from the limited domain-specific data. 
We...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:fzku7bapihltshgtimguc6a5","title":"RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data","url":"https://machinelearning.apple.com/research/relative-contrastive-learning","published":"2025-04-10","authors":["Maxwell A. Xu","Jaya Narain","Gregory Darnell","Haraldur Hallgrímsson","Hyewon Jeong","Darren Forde","Richard Fineman","Karthik J. Raghuram","James M. Rehg","Shirley Ren"],"abstract":"We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. First, a learnable distance measure is trained to capture motif similarity and domain-specific semantic information such as rotation invariance. 
Then, the learned distance provides a measurement of semantic similarity between a pair of accelerometry time-series, which we use to train our...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:uxrvapoyvj3gbwu0v1kkrztp","title":"No Need to Talk: Asynchronous Mixture of Language Models","url":"https://machinelearning.apple.com/research/no-need-to-talk","published":"2025-04-10","authors":["Anastasiia Filippova","Angelos Katharopoulos","David Grangier","Ronan Collobert"],"abstract":"We introduce SmallTalk LM, an innovative method for training a mixture of language models in an almost asynchronous manner. Each model of the mixture specializes in distinct parts of the data distribution, without the need of high-bandwidth communication between the nodes training each model. At inference, a lightweight router directs a given sequence to a single expert, according to a short prefix. 
This inference scheme naturally uses a fraction...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:kjce27wtu48qn811d54rmism","title":"Do LLMs Know Internally When They Follow Instructions?","url":"https://machinelearning.apple.com/research/do-llms-know-internally","published":"2025-04-10","authors":["Juyeon Heo","Christina Heinze-Deml","Oussama Elachqar","Kwan Ho Ryan Chan","Shirley Ren","Udhay Nallasamy","Andy Miller","Jaya Narain"],"abstract":"Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines. However, LLMs often fail to follow even simple and clear instructions. To improve instruction-following behavior and prevent undesirable outputs, a deeper understanding of how LLMs’ internal states relate to these outcomes is required. 
In this work, we investigate whether LLMs...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4409322573","title":"Screening for anemia using multi-modal machine learning models on smartphones: protocol for a comparative accuracy study in rural India","url":"https://doi.org/10.1101/2025.04.10.25325591","published":"2025-04-10","authors":["Shrey Desai","Kaushal Jesalpura","Sujay Kakarmath","Mayank Daswani","Sethuraman Venkatraman","Jim Taylor","Shruti Gadgil","Marc Wilson","Rajroshan Sawhney","Anaita Singh","Raghu Pullakhandam","Matthew Thompson"],"abstract":"Abstract Introduction Anemia, or low blood hemoglobin (Hb), affects one third of the world population, and is particularly prevalent in women and children in lower resource settings. However, screening for anemia is limited by the availability of accurate, easy to use, lower cost and non-invasive methods. We aim to generate research to support potential development of a point of care test to detect anemia using smartphone images of conjunctivae, tongue and nail beds as well as photoplethysmogram (PPG) signals, and comparing their accuracy to a standard laboratory assay, and, a point of care Hb assay. Methods & Analysis Cross-sectional comparative accuracy study of Adults and children (>1 year) presenting to hospital and outpatient care at SEWA Rural, a non-governmental organization providing healthcare to a rural, tribal population in Gujarat, India. 
Patients whose clinician has requeste...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.04.10.25325591","openalex_id":"https://openalex.org/W4409322573","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Gates Foundation","Google (United States)","Gujarat National Law University","National Institute of Nutrition","Society for Education Welfare and Action Rural"],"concepts":[{"id":"https://openalex.org/C2780385302","display_name":"Protocol (science)","score":0.7098461389541626},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6956397891044617},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5785539746284485},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41823461651802063},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3792448043823242},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.33469778299331665},{"id":"https://openalex.org/C142724271","display_name":"Pathology","score":0.09241354465484619},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.06525194644927979}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2504.13914","title":"Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning","url":"https://huggingface.co/papers/2504.13914","published":"2025-04-10","authors":["ByteDance Seed","Jiaze Chen","Tiantian Fan","Xin Liu","Lingjun Liu","Zhiqi Lin","Mingxuan Wang","Chengyi Wang","Xiangpeng Wei","Wenyuan Xu","Yufeng Yuan","Yu Yue"],"abstract":"We introduce Seed1.5-Thinking, capable of reasoning through thinking before 
responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial li...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sota-with-less-mcts-guided-sample-selection-for-data-efficient-visual-reasoning-self-improvement","title":"SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement","url":"https://www.microsoft.com/en-us/research/publication/sota-with-less-mcts-guided-sample-selection-for-data-efficient-visual-reasoning-self-improvement/","published":"2025-04-09","authors":["Xiyao Wang","Zhengyuan Yang","Chao Feng","Hongjin Lu","Linjie Li","Chung-Ching Lin","Kevin Lin","Furong Huang","Lijuan Wang"],"abstract":"We introduce ThinkLite-VL, a family of visual reasoning models that achieve state-of-the-art (SoTA) 
performance using an order of magnitude fewer training samples, relying purely on reinforcement fine-tuning (RFT) self-improvement without any knowledge distillation. Our central insight is that sample difficulty critically influences RFT effectiveness: appropriately challenging examples can drive substantial reasoning improvements, even in low-data regimes. However, quantifying sample difficulty in a reliable and scalable manner remains non-trivial. To address this, we repurpose Monte Carlo Tree Search (MCTS) to measure sample difficulty via the number of reasoning iterations a vision-language model (VLM) requires to solve each instance. This MCTS-based selection procedure identifies samples that induce deeper reasoning while remaining solvable, allowing us to filter a high-quality subset...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","visual reasoning","1970-01-01","language model","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409283601","title":"Towards conversational diagnostic artificial intelligence","url":"https://doi.org/10.1038/s41586-025-08866-7","published":"2025-04-09","authors":["Tao Tu","Mike Schaekermann","Anil Palepu","Khaled Saab","Jan Freyberg","Ryutaro Tanno","Amy Wang","Brenna Li","Mohamed Amin","Yong Cheng","Elahe Vedadi","Nenad Tomašev"],"abstract":". 
The study included 159 case scenarios from providers in Canada, the United Kingdom and India, 20 primary care physicians compared to AMIE, and evaluations by specialist physicians and patient-actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 30 out of 32 axes according to the specialist physicians and 25 out of 26 axes according to the patient-actors. Our research has several limitations and should be interpreted with caution. Clinicians used synchronous text chat, which permits large-scale LLM-patient interactions, but this is unfamiliar in clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41586-025-08866-7","openalex_id":"https://openalex.org/W4409283601","cited_by_count":212,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C120060458","display_name":"Milestone","score":0.841292142868042},{"id":"https://openalex.org/C2779885105","display_name":"Empathy","score":0.5808547735214233},{"id":"https://openalex.org/C2778000598","display_name":"Objective structured clinical examination","score":0.5180037617683411},{"id":"https://openalex.org/C2777352838","display_name":"Excellence","score":0.49357688426971436},{"id":"https://openalex.org/C3020132585","display_name":"Diagnostic accuracy","score":0.466916561126709},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.4360809326171875},{"id":"https://openalex.org/C509550671","display_name":"Medical education","score":0.4340459108352661},{"id":"https://openalex.org/C2984752397","display_name":"Primary 
care","score":0.42946887016296387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":212}},{"id":"openalex:W4410609240","title":"Position: Contextual Confidence and Generative AI","url":"https://doi.org/10.1109/satml64287.2025.00022","published":"2025-04-09","authors":["Shrey Jain","Zoë Hitzig","Pamela Mishkin"],"abstract":"Generative AI models perturb the foundations of effective human communication. They present new challenges to contextual confidence, disrupting participants' ability to identify the authentic context of communication and their ability to protect communication from reuse and recombination outside its intended context. In this paper, we describe strategies ‒ tools, technologies and policies ‒ that aim to stabilize communication in the face of these challenges. The strategies we discuss fall into two broad categories. Containment strategies aim to reassert context in environments where it is currently threatened ‒ a reaction to the context-free expectations and norms established by the internet. 
Mobilization strategies, by contrast, view the rise of generative AI as an opportunity to proactively set new and higher expectations around privacy and authenticity in mediated communication.","companies":["OpenAI","Microsoft"],"matched_orgs":["OpenAI","Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/satml64287.2025.00022","openalex_id":"https://openalex.org/W4410609240","cited_by_count":0,"quality_score":49,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7001925706863403},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.692855715751648},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.5792191028594971},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5658016204833984},{"id":"https://openalex.org/C78780964","display_name":"Position paper","score":0.5430320501327515},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36670446395874023},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.11789187788963318},{"id":"https://openalex.org/C10138342","display_name":"Finance","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409309502","title":"Multi-view Intent Learning and Alignment with Large Language Models for Session-based Recommendation","url":"https://doi.org/10.1145/3719344","published":"2025-04-09","authors":["Shutong Qiao","Wei Zhou","Junhao Wen","Chen Gao","Qun Luo","Peixuan Chen","Yong Li"],"abstract":"Session-based recommendation (SBR) methods often rely on user behavior 
data, which can struggle with the sparsity of session data, limiting performance. Researchers have identified that beyond behavioral signals, rich semantic information in item descriptions is crucial for capturing hidden user intent. While Large Language Models (LLMs) offer new ways to leverage this semantic data, the challenges of session anonymity, short-sequence nature, and high LLM training costs have hindered the development of a lightweight, efficient LLM framework for SBR. To address the above challenges, we propose an LLM-enhanced SBR framework that integrates semantic and behavioral signals from multiple views. This two-stage framework leverages the strengths of both LLMs and traditional SBR models while minimizing training costs. In the first stage, we use multi-view prompts to infer latent user intentions a...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3719344","openalex_id":"https://openalex.org/W4409309502","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","efficient"],"author_affiliations":["Chongqing University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2779182362","display_name":"Session (web analytics)","score":0.8532490730285645},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6654281616210938},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36747485399246216},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36662915349006653},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.23472890257835388}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":4}},{"id":"openalex:W4410609246","title":"Learning with User-Level Differential Privacy Under Fixed Compute Budgets","url":"https://doi.org/10.1109/satml64287.2025.00055","published":"2025-04-09","authors":["Zachary Charles","Arun Ganesh","Ryan M. McKenna","H. Brendan McMahan","Nicole Mitchell","Krishna Pillutla","Keith Rush"],"abstract":"We investigate practical and scalable algorithms for training machine learning models with user-level differential privacy (DP) in order to provably safeguard all the examples contributed by each user. Motivated by the application of large language model (LLM) fine-tuning, we analyze algorithms under fixed compute budgets, especially large budget settings. We study two variants of DP-SGD with: (1) example-level sampling (ELS) and per-example gradient clipping, and (2) user-level sampling (ULS) and per-user gradient clipping. We derive a novel user-level DP accountant that allows us to compute provably tight privacy guarantees for ELS. We show that for fixed compute and privacy budgets, ULS generally yields better results than ELS, especially when each user has a diverse collection of examples and the compute budget is large. 
We validate our findings through experiments in synthetic mean....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/satml64287.2025.00055","openalex_id":"https://openalex.org/W4410609246","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Google (United States)","Indian Institute of Technology Madras"],"concepts":[{"id":"https://openalex.org/C23130292","display_name":"Differential privacy","score":0.8800572752952576},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7458261847496033},{"id":"https://openalex.org/C93226319","display_name":"Differential (mechanical device)","score":0.49692562222480774},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.19290167093276978},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.0709187388420105},{"id":"https://openalex.org/C146978453","display_name":"Aerospace engineering","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412404456","title":"Development of a GI Cancer Diagnostic Chatbot Based on RAG Model","url":"https://doi.org/10.1109/otcon65728.2025.11070521","published":"2025-04-09","authors":["Aaditya Vijayvargiya","Anurag Kaushish","Kanika Rawat","Mithun Kumar","Gagan Deep Singh"],"abstract":"Gastrointestinal (GI) cancer involves the digestive tract and is not easy to detect because it comes with vague signs and intricate medical reports. The present study puts forward a new use of RoBERTa large, a transformer language model, and Retrieval-Augmented Generation (RAG). 
Our model exhibits important advancements with three innovations: Domain-specific embeddings employing medically fine-tuned RoBERTa (Robustly Optimized BERT Pretraining Approach), dynamic knowledge incorporation of latest research through RAG (Retrieval-Augmented Generation) architecture, and hybrid ranking leveraging semantic and keyword-based relevance. Experimental results establish 51% cosine similarity in test set responses with retaining clinical coherence.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/otcon65728.2025.11070521","openalex_id":"https://openalex.org/W4412404456","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Dehradun Institute of Technology University","Google (United States)","University of Petroleum and Energy Studies"],"concepts":[{"id":"https://openalex.org/C2779041454","display_name":"Chatbot","score":0.8522035479545593},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6473498344421387},{"id":"https://openalex.org/C121608353","display_name":"Cancer","score":0.5241742134094238},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37415650486946106},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.33046871423721313},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.2048085331916809},{"id":"https://openalex.org/C126322002","display_name":"Internal medicine","score":0.11949428915977478}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409280128","title":"The Capability of Large Language Models to Measure and Differentiate Psychiatric Conditions 
Through O-Shot Learning","url":"https://doi.org/10.1016/j.biopsych.2025.02.165","published":"2025-04-09","authors":["Isaac R. Galatzer‐Levy","Daniel McDuff","Matteo Malgaroli"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.biopsych.2025.02.165","openalex_id":"https://openalex.org/W4409280128","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","New York University"],"concepts":[{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.7832052707672119},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.4969041645526886},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.48902449011802673},{"id":"https://openalex.org/C2992734406","display_name":"One shot","score":0.4206767678260803},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40881600975990295},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4038616418838501},{"id":"https://openalex.org/C118552586","display_name":"Psychiatry","score":0.3463976979255676},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34125378727912903}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412404596","title":"Leveraging Multilingual Pre-trained Models for Indian Language Summarization and Factual Incorrectness Detection","url":"https://doi.org/10.1109/otcon65728.2025.11070502","published":"2025-04-09","authors":["Kanav Gupta","Jaanhvi Saxena","Panshul Saxena","Mithun Kumar","Piyush Chauhan"],"abstract":"Text summarization for Indian languages remains 
an underexplored area in the NLP community. This scientific research contributes its share of work in this aspect by participating in the third edition of the ILSUM shared task, which focuses on multi-Indian-language summarization, i.e., Hindi, Gujarati, and Tamil, and factual inaccuracy detection in machine-generated cross-lingual summaries. This study aspires to accomplish two main tasks: first, to build robust summarization models that can deal with the linguistically complex scenarios of code-mixing and script-mixing and second, to automatically detect factual inconsistencies in cross-lingual summaries. The proposed methodology uses mBART, a pre-trained sequence-to-sequence model fine-tuned to generate fixed-length summaries by adapting to language-specific characteristics. In addition, we perform fine-tuning on BART-based models to cla...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/otcon65728.2025.11070502","openalex_id":"https://openalex.org/W4412404596","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Symbiosis International University","University of Petroleum and Energy Studies"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.8495738506317139},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8002361059188843},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6439883708953857},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5879946351051331},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41530776023864746},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.33746105432510376}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:tencent:2504.05812","title":"Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization","url":"https://huggingface.co/papers/2504.05812","published":"2025-04-08","authors":["Tencent/Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","tencent","LLM"],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/tencent/papers"}},{"id":"hf-org-paper:stepfun-ai:2504.06263","title":"OmniSVG: A Unified Scalable Vector Graphics Generation Model","url":"https://huggingface.co/papers/2504.06263","published":"2025-04-08","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"apple:rh5rhn0u8w1rtfkvwlqxd3bv","title":"Revisit Large-Scale Image–Caption Data in Pre-training Multimodal Foundation 
Models","url":"https://machinelearning.apple.com/research/large-scale-image-caption","published":"2025-04-08","authors":["Zhengfeng Lai","Vasileios Saveris","Chen Chen","Hong-You Chen","Haotian Zhang","Bowen Zhang","Juan Lao Tebar","Wenze Hu","Zhe Gan","Peter Grasch","Meng Cao","Yinfei Yang"],"abstract":"Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. Notably, the role of synthetic captions and their interaction with original web-crawled AltTexts in pre-training is still unclear. Additionally, different multimodal foundation models may have distinct preferences for specific caption formats while the efforts of studying the optimal captions for each foundation...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:qvht5hlwzlfdiemnv1pi8jnl","title":"Do LLMs Estimate Uncertainty Well in Instruction-Following?","url":"https://machinelearning.apple.com/research/estimate-uncertainty-well","published":"2025-04-08","authors":["Juyeon Heo","Miao Xiong","Christina Heinze-Deml","Jaya Narain"],"abstract":"Large language models (LLMs) could be valuable personal AI agents across various domains, provided they can precisely follow user instructions. However, recent studies have shown significant limitations in LLMs’ instruction-following capabilities, raising concerns about their reliability in high-stakes applications. Accurately estimating LLMs’ uncertainty in adhering to instructions is critical to mitigating deployment risks. 
We present, to our...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/waber-evaluating-reliability-and-efficiency-of-web-agents-with-existing-benchmarks","title":"WABER: Evaluating Reliability and Efficiency of Web Agents with Existing Benchmarks","url":"https://www.microsoft.com/en-us/research/publication/waber-evaluating-reliability-and-efficiency-of-web-agents-with-existing-benchmarks/","published":"2025-04-07","authors":["Su Kara","Fazle Faisal","Suman Nath"],"abstract":"Most existing web agent benchmarks evaluate agents solely based on their task completion rate, excluding other crucial aspects of agent behavior that impact their usability and deployability in real-world. We propose incorporating two important metrics into a web agent benchmark: reliability that assesses how consistently the agent completes tasks despite transient web unreliability that are common in the wild, and efficiency that measures the speed and cost-effectiveness of the agent's task completion. Developing new benchmarks to measure these metrics would take significant efforts. To address this, we introduce a novel network proxy-based solution called WABER, which enables the evaluation of these two metrics on existing agents and benchmarks without requiring any modifications to them. 
This allows agent developers to adopt it effortlessly on any agent and benchmark, with zero develo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Systems and networking","1970-01-01","LLM","memory","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409208764","title":"Generative compositor for few-shot visual information extraction","url":"https://doi.org/10.1016/j.patcog.2025.111624","published":"2025-04-07","authors":["Zhibo Yang","Wei Hua","Sibo Song","Cong Yao","Yingying Zhu","Wenqing Cheng","Xiang Bai"],"abstract":"Visual Information Extraction (VIE), aiming at extracting structured information from visually rich document images , plays a pivotal role in document processing. Considering various layouts, semantic scopes, and languages, VIE encompasses an extensive range of types, potentially numbering in the thousands. However, many of these types suffer from a lack of training data , which poses significant challenges. In this paper, we propose a novel generative model , named Generative Compositor, to address the challenge of few-shot VIE. The Generative Compositor is a hybrid pointer-generator network that emulates the operations of a compositor by retrieving words from the source text and assembling them based on the provided prompts. Furthermore, three pre-training strategies are employed to enhance the model’s perception of spatial context information. 
Besides, a prompt-aware resampler is spec...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.111624","openalex_id":"https://openalex.org/W4409208764","cited_by_count":1,"quality_score":46,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Alibaba Group (China)","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.7360970973968506},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6912256479263306},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6805274486541748},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5293372869491577},{"id":"https://openalex.org/C195807954","display_name":"Information extraction","score":0.5165573358535767},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.5138002038002014},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.49370941519737244},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.46727171540260315}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409199977","title":"Generative Prompt Controlled Diffusion for weakly supervised semantic segmentation","url":"https://doi.org/10.1016/j.neucom.2025.130103","published":"2025-04-06","authors":["Wangyu Wu","Tianhong Dai","Zhenhong Chen","Xiaowei Huang","Fei Ma","Jimin Xiao"],"abstract":"Weakly supervised semantic segmentation (WSSS), aiming to train segmentation models solely using image-level labels, has received significant attention. 
Existing approaches mainly concentrate on creating high-quality pseudo labels by utilizing existing images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel approach called Generative Prompt Controlled Diffusion (GPCD) for data augmentation. This approach enhances the current labeled datasets by augmenting them with a variety of images, achieved through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the necessary control info...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neucom.2025.130103","openalex_id":"https://openalex.org/W4409199977","cited_by_count":28,"quality_score":65,"matched_keywords":[],"author_affiliations":["Imperial College London","Microsoft (United States)","University of Liverpool","Xi’an Jiaotong-Liverpool University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7278056144714355},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.715222954750061},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.68709796667099},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6860702037811279},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5119841694831848},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5106683373451233},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.42497727274894714},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.37071895599365234}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":28}},{"id":"openalex:W4410771269","title":"3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering","url":"https://doi.org/10.1109/icasspw65056.2025.11011170","published":"2025-04-06","authors":["Meenakshi Krishnan","Liam Fowl","Ramani Duraiswami"],"abstract":"Differentiable 3D Gaussian splatting has emerged as an efficient and flexible rendering technique for representing complex scenes from a collection of 2D views and enabling high-quality real-time novel-view synthesis. However, its reliance on photometric losses can lead to imprecisely reconstructed geometry and extracted meshes, especially in regions with high curvature or fine detail. We propose a novel regularization method using the gradients of a signed distance function estimated from the Gaussians, to improve the quality of rendering while also extracting a surface mesh. The regularizing normal supervision facilitates better rendering and mesh reconstruction, which is crucial for downstream applications in video generation, animation, AR-VR and gaming. We demonstrate the effectiveness of our approach on datasets such as Mip-NeRF360, Tanks and Temples, and Deep-Blending. 
Our method....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icasspw65056.2025.11011170","openalex_id":"https://openalex.org/W4410771269","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Google (United States)","University of Maryland, College Park"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7635542154312134},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.6796379685401917},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.5725551843643188},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.518842339515686},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38733816146850586},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"apple:qufwrfv696cgdpuvyqi0ovbi","title":"SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators","url":"https://machinelearning.apple.com/research/seedlm-compressing","published":"2025-04-04","authors":["Rasoul Shafipour","David Harrison","Maxwell Horton","Jeffrey Marker","Houman Bedayat","Sachin Mehta","Mohammad Rastegari","Mahyar Najibi","Saman Naderiparizi"],"abstract":"Large Language Models (LLMs) have transformed natural language processing, but face significant challenges in widespread deployment due to their high runtime cost. 
In this paper, we introduce SeedLM, a novel post-training compression method that uses seeds of a pseudo-random generator to encode and compress model weights. Specifically, for each block of weights, we find a seed that is fed into a Linear Feedback Shift Register (LFSR) during...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","compression"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4409149729","title":"From Missteps to Mastery: Enhancing Low-Resource Dense Retrieval through Adaptive Query Generation","url":"https://doi.org/10.1145/3690624.3709225","published":"2025-04-04","authors":["Zhenyu Tong","Chuan Qin","Chuyu Fang","Kaichun Yao","Xi Chen","Jingshuai Zhang","Chen Zhu","Hengshu Zhu"],"abstract":"Document retrieval, designed to recall query-relevant documents from expansive collections, is essential for information-seeking tasks, such as web search and open-domain question-answering. Advances in representation learning and pretrained language models (PLMs) have driven a paradigm shift from traditional sparse retrieval methods to more effective dense retrieval approaches, forging enhanced semantic connections between queries and documents and establishing new performance benchmarks. However, reliance on extensive annotated document-query pairs limits their competitiveness in low-resource scenarios. Recent research efforts employing the few-shot capabilities of large language models (LLMs) and prompt engineering for synthetic data generation have emerged as a promising solution. 
Nonetheless, these approaches are hindered by the generation of lower-quality data within the convention...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709225","openalex_id":"https://openalex.org/W4409149729","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Computer Network Information Center","Institute of Software","University of Chinese Academy of Sciences","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7533507347106934},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.45328807830810547},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.38359349966049194},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.341320663690567},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.09648275375366211}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4409154675","title":"Towards Speaker-Unknown Emotion Recognition in Conversation via Progressive Contrastive Deep Supervision","url":"https://doi.org/10.1109/taffc.2025.3558222","published":"2025-04-04","authors":["Siyuan Shen","Feng Liu","Hanyang Wang","Aimin Zhou"],"abstract":"Emotion recognition in conversation has attained increasing attention for perceiving user emotion in practical conversational applications. Conversational utterances spoken alternately by different speakers inspire most studies to leverage speaker information based on golden speaker labels. 
In this work, we challenge the existing paradigm of utilizing available speaker labels with a more realistic scenario, where the speaker identity of each utterance is unknown during inference. We propose Progressive Contrastive Deep Supervision for multimodal emotion recognition in conversation (PCDS), incorporating speaker diarization and emotion recognition into one unified framework. To facilitate joint task learning, we inject speaker and emotion bias into the network progressively via contrastive deep supervision, with the task-irrelevant contrast being the intermediate transition. To obtain expl...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taffc.2025.3558222","openalex_id":"https://openalex.org/W4409154675","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Baidu (China)","East China Normal University","Midea Group (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C2777200299","display_name":"Conversation","score":0.8647710084915161},{"id":"https://openalex.org/C2777438025","display_name":"Emotion recognition","score":0.666415810585022},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.5842658281326294},{"id":"https://openalex.org/C133892786","display_name":"Speaker recognition","score":0.566802442073822},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5598301291465759},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4086947739124298},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4071533679962158},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.36755135655403137}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4409157766","title":"LLM-Eraser: Optimizing Large Language Model Unlearning through Selective Pruning","url":"https://doi.org/10.1145/3690624.3709312","published":"2025-04-04","authors":["Shengming Zhang","Le Zhang","Jingbo Zhou","Zhi Zheng","Hui Xiong"],"abstract":"We focus on unlearning unwanted knowledge in autoregressive large language models (LLMs) through pruning. Our goal is to selectively remove undesirable information (e.g., harmful responses, privacy-sensitive data) while ensuring the preservation of desirable knowledge (e.g., positive responses and objective facts). Previous approaches use gradient ascent (GA) over undesired knowledge to inversely optimize LLMs, which compromises the model's performance on desired knowledge. To address this limitation, we introduce a novel two-stage approach, named LLM-Eraser, for selectively identifying and editing parameters specifically associated with undesirable knowledge. LLM-Eraser operates in two stages: localization and unlearning. During the localization stage, we utilize neuron scores and trainable soft masks to identify parameters crucial to the undesired knowledge. 
In the unlearning stage, we...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709312","openalex_id":"https://openalex.org/W4409157766","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","language model"],"author_affiliations":["Baidu (China)","Chinese Academy of Medical Sciences & Peking Union Medical College","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.6648519039154053},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6542066335678101},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4866939187049866},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.47828564047813416},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4433518648147583},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3287421464920044},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0},{"id":"https://openalex.org/C6557445","display_name":"Agronomy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4409157935","title":"Scaling the Vocabulary of Non-autoregressive Models for Fast Generative Retrieval","url":"https://doi.org/10.1145/3690624.3709330","published":"2025-04-04","authors":["Ravisri Valluri","Akash Kumar Mohankumar","Kushal Dave","Amit Prakash Singh","Jian Jiao","Manik Varma","Gaurav Sinha"],"abstract":"Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advancements in Autoregressive 
(AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, limiting their practical applicability. This paper investigates fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. While standard NAR models alleviate latency and cost concerns, they exhibit a significant drop in retrieval performance (compared to AR models) due to their inability to capture dependencies between target tokens. To address this, we question the conventional choice of limiting the target token space to solely words or sub-words. We propose PIXNAR, a novel approach that expands the target voc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709330","openalex_id":"https://openalex.org/W4409157935","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Bellevue Hospital Center","Microsoft (United States)","Microsoft Research (India)"],"concepts":[{"id":"https://openalex.org/C159877910","display_name":"Autoregressive model","score":0.7640945911407471},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6711968183517456},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.6631384491920471},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6534433364868164},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.5714050531387329},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5527726411819458},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4811874032020569},{"id":"https://openalex.org/C167966045","display_name":"Generative 
model","score":0.43331441283226013}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409149181","title":"Personalized Language Model Learning on Text Data Without User Identifiers","url":"https://doi.org/10.1145/3690624.3709211","published":"2025-04-04","authors":["Yucheng Ding","Yangwenjian Tan","Xiangyu Liu","Chaoyue Niu","Fandong Meng","Jie Zhou","Ning Liu","Fan Wu","Guihai Chen"],"abstract":"In many practical natural language applications, user data are highly sensitive, requiring anonymous uploads of text data from mobile devices to the cloud without user identifiers. However, the absence of user identifiers restricts the ability of cloud-based language models to provide personalized services, which are essential for catering to diverse user needs. The trivial method of replacing an explicit user identifier with a static user embedding as model input still compromises data anonymization. In this work, we propose to let each mobile device maintain a user-specific distribution to dynamically generate user embeddings, thereby breaking the one-to-one mapping between an embedding and a specific user. 
We further theoretically demonstrate that to prevent the cloud from tracking users via uploaded embeddings, the local distributions of different users should either be derived from....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709211","openalex_id":"https://openalex.org/W4409149181","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","personalized"],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8585273623466492},{"id":"https://openalex.org/C154504017","display_name":"Identifier","score":0.8090847134590149},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.45796847343444824},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.45513761043548584},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4413049817085266},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.43967264890670776},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3764265775680542},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.37061548233032227}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2504.03624","title":"Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models","url":"https://huggingface.co/papers/2504.03624","published":"2025-04-04","authors":["NVIDIA","Aaron Blakeman","Aarti Basant","Abhinav Khattar","Adithya Renduchintala","Akhiad Bercovich","Aleksander Ficek","Alexis Bjorlin","Ali 
Taghibakhshi","Amala Sanjay Deshmukh","Ameya Sunil Mahabaleshwarkar","Andrew Tao"],"abstract":"As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3× faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":43,"matched_keywords":["memory","efficient","compression","distillation"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4409149623","title":"RankElectra: Semi-supervised Pre-training of Learning-to-Rank Electra for Web-scale Search","url":"https://doi.org/10.1145/3690624.3709395","published":"2025-04-04","authors":["Yuchen Li","Haoyi Xiong","Yongqi Zhang","Jiang Bian","Tianhao Peng","Xuhong Li","Shuaiqiang Wang","Linghe Kong","Dawei Yin"],"abstract":"While representation learning has been used to boost the performance of Learning-to-Rank (LTR) models through distilling key features for webpage ranking, the weak 
supervision signals extracted from users' sparse click-through data lead to inadequate representation of query-webpage pairs for ranking score prediction. Recent studies in generative LTR pre-training demonstrate the feasibility of incorporating reconstruction loss for enhanced ranking score prediction. However, LTR is after all a regression task and it might be reasonable to find an alternate route that pre-trains LTR models with discriminative losses. Following the success of Electra in representation learning for natural language processing (NLP), this work proposes RankElectra that pre-trains the LTR model as a discriminator module inside a generative learning framework. Specifically, RankElectra first structures sparsely-a...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709395","openalex_id":"https://openalex.org/W4409149623","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Baidu (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.638481616973877},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5881949067115784},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5516749024391174},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.5478333234786987},{"id":"https://openalex.org/C86037889","display_name":"Learning to rank","score":0.43639498949050903},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4113468527793884},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.37593579292297363},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.3201017379760742}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4409158187","title":"Large Vison-Language Foundation Model in Baidu AIGC Image Advertising","url":"https://doi.org/10.1145/3690624.3709401","published":"2025-04-04","authors":["Zhipeng Jin","Wen Tao","Yafei Li","Yi Yang","Cong Han","Shuanglong Li","Lin Liu"],"abstract":"Recent advances in generative artificial intelligence have revolutionized information retrieval and content generation, opening up new opportunities for the e-commerce industry. Alignment learning between small models and parallel corpora cannot meet current needs. The success of ChatGPT demonstrates that large models need to first establish a fundamental understanding, and then utilize high-quality corpora for generation. Having a large model foundation is indispensable. In this paper, we establish a fundamental 10B multimodal model foundation for multimodal generation tasks and propose a scene-based alignment learning approach called conditional sample supervised fine-tuning for downstream generation tasks. Meanwhile, diffusion models are known to be vulnerable to outliers in training data. 
To address this, we utilize an alternative diffusion loss function that preserves the high quali...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709401","openalex_id":"https://openalex.org/W4409158187","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6871321797370911},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6498763561248779},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4710127115249634},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.43352681398391724},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3870498538017273},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37694719433784485},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3606170117855072},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.35794901847839355}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409158046","title":"Towards Web-scale Recommendations with LLMs: From Quality-aware Ranking to Candidate Generation","url":"https://doi.org/10.1145/3690624.3709413","published":"2025-04-04","authors":["J.J. 
Shah","Iman Barjasteh","Amey Barapatre","Rana Forsati","Gang Luo","Fan Wu","Yuan Fang","Xue Deng","Blake Shepard","Ronak Shah","Linjun Yang","Hongzhi Li"],"abstract":"Explore Further @ Bing is a webpage-to-webpage recommendation product, enhancing the search experience on Bing by surfacing engaging webpage recommendations tied to the search result URLs. In this paper, we present our approach for leveraging Large Language Models (LLMs) for enhancing our web-scale recommendation system. We describe the development and validation of our LLM-powered recommendation quality metric RecoDCG. We discuss our core techniques for utilizing LLMs to make our ranking stage quality-aware. Furthermore, we detail Q' recall, a recall path that enhances our system's candidate generation stage by leveraging LLMs to produce complementary and engaging recommendation candidates. We also address how we optimize our system for multiple objectives, balancing recommendation quality with click metrics. We deploy our work to production, achieving a significant improvement in recom...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709413","openalex_id":"https://openalex.org/W4409158046","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.6507467031478882},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.629567563533783},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5768446326255798},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5264514088630676},{"id":"https://openalex.org/C136764020","display_name":"World Wide 
Web","score":0.3426308035850525},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3270471692085266},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.30761876702308655},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.09369602799415588}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409158359","title":"Conditional Generative Modeling for High-dimensional Marked Temporal Point Processes","url":"https://doi.org/10.1145/3690624.3709258","published":"2025-04-04","authors":["Zheng Dong","Zekai Fan","Shixiang Zhu"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709258","openalex_id":"https://openalex.org/W4409158359","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Carnegie Mellon University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7139493227005005},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7003744840621948},{"id":"https://openalex.org/C88871306","display_name":"Point process","score":0.6435706615447998},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.5055446624755859},{"id":"https://openalex.org/C3019722297","display_name":"High dimensional","score":0.453105628490448},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4442563056945801},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4073812961578369},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.17835035920143127}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409158118","title":"SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation","url":"https://doi.org/10.1145/3690624.3709417","published":"2025-04-04","authors":["Maying Shen","Nadine Chang","Sifei Liu","José M. Alvarez"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3690624.3709417","openalex_id":"https://openalex.org/W4409158118","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6645851135253906},{"id":"https://openalex.org/C24552861","display_name":"Data assimilation","score":0.5565111637115479},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5192868113517761},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.4908176362514496},{"id":"https://openalex.org/C75649859","display_name":"Assimilation (phonology)","score":0.4149511754512787},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.27159082889556885},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.0869123637676239},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.08219170570373535}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"bytedance-seed:251","title":"ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation","url":"https://seed.bytedance.com/en/research/vicas-a-dataset-for-combining-holistic-and-pixel-level-video-understanding-using-captions-with-grounded-segmentation","published":"2025-04-03","authors":["Ali Athar","Xueqing Deng","Liang-Chieh Chen"],"abstract":"Recent advances in multimodal large language models (MLLMs) have expanded research in video understanding, primarily focusing on high-level tasks such as video captioning and question-answering. Meanwhile, a smaller body of work addresses dense, pixel-precise segmentation tasks, which typically involve category-guided or referral-based object segmentation. Although both directions are essential for developing models with human-level video comprehension, they have largely evolved separately, with distinct benchmarks and architectures. This paper aims to unify these efforts by introducing ViCaS, a new dataset containing thousands of challenging videos, each annotated with detailed, human-written captions and temporally consistent, pixel-accurate masks for multiple objects with phrase grounding. 
Our benchmark evaluates models on both holistic/high-level understanding and language-guided, pi...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","CVPR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:205","title":"Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving","url":"https://seed.bytedance.com/en/research/multi-swe-bench-a-multilingual-benchmark-for-issue-resolving","published":"2025-04-03","authors":["Daoguang Zan","Zhirong Huang","Wei Liu","Hanwu Chen","Linhao Zhang","Shulin Xin","Lu Chen","Qi Liu","Xiaojian Zhong","Aoyan Li","Siyao Liu","Yongsheng Xiao"],"abstract":"The task of issue resolving is to modify a codebase to generate a patch that addresses a given issue. However, existing benchmarks, such as SWE-bench, focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across diverse software ecosystems. To address this, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering Java, TypeScript, JavaScript, Go, Rust, C, and C++. It includes a total of 1,632 high-quality instances, which were carefully annotated from 2,456 candidates by 68 expert annotators, ensuring that the benchmark can provide an accurate and reliable evaluation. Based on Multi-SWE-bench, we evaluate a series of state-of-the-art models using three representative methods (Agentless, SWE-agent, and OpenHands) and present a comprehensive analysis with key empirical insights. 
In addition, we launch a Multi...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","arXiv","agent"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:deepseek-ai:2504.02495","title":"Inference-Time Scaling for Generalist Reward Modeling","url":"https://huggingface.co/papers/2504.02495","published":"2025-04-03","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"bytedance-seed:833","title":"Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback","url":"https://seed.bytedance.com/en/research/exploring-data-scaling-trends-and-effects-in-reinforcement-learning-from-human-feedback","published":"2025-04-02","authors":["Wei Shen","Guanlin Liu","Zheng Wu","Ruofei Zhu","Qingping Yang","Chao Xin","Yu Yue","Lin Yan"],"abstract":"Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning large language models with human preferences. 
While recent research has focused on algorithmic improvements, the importance of prompt-data construction has been overlooked. This paper addresses this gap by exploring data-driven bottlenecks in RLHF performance scaling, particularly reward hacking and decreasing response diversity. We introduce a hybrid reward system combining reasoning task verifiers (RTV) and a generative reward model (GenRM) to mitigate reward hacking. We also propose a novel prompt-selection method, Pre-PPO, to maintain response diversity and enhance learning effectiveness. Additionally, we find that prioritizing mathematical and coding tasks early in RLHF training significantly improves performance. Experiments across two model sizes validate our methods' effectiveness and scalability. Results s...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine Learning","LLM","NeurIPS 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:ma66vmk48lwxxs8jmdmtccbs","title":"Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization","url":"https://machinelearning.apple.com/research/mutual-reinforcement-llm-dialogue","published":"2025-04-02","authors":["Yen-Ju Lu","Ting-Yao Hu","Hema Swetha Koppula","Hadi Pouransari","Jen-Hao Rick Chang","Yin Xia","Xiang Kong","Qi Zhu","Simon Wang","Oncel Tuzel","Raviteja Vemulapalli"],"abstract":"In this work, we propose Mutual Reinforcing Data Synthesis (MRDS) within LLMs to improve few-shot dialogue summarization task. 
Unlike prior methods that require external knowledge, we mutually reinforce the LLM's dialogue synthesis and summarization capabilities, allowing them to complement each other during training and enhance overall performances. The dialogue synthesis capability is enhanced by directed preference optimization with...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:fngcym18vuvw9yv6wzus2w1k","title":"From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs","url":"https://machinelearning.apple.com/research/dense-to-dynamic","published":"2025-04-02","authors":["Kumari Nishu","Sachin Mehta","Samira Abnar","Mehrdad Farajtabar","Maxwell Horton","Mahyar Najibi","Moin Nabi","Minsik Cho","Devang Naik"],"abstract":"Training large language models (LLMs) for different inference constraints is computationally expensive, limiting control over efficiency-accuracy trade-offs. Moreover, once trained, these models typically process tokens uniformly, regardless of their complexity, leading to static and inflexible behavior. 
In this paper, we introduce a post-training optimization framework, DynaMoE, that adapts a pre-trained dense LLM to a token-difficulty-driven...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:bt6zepa375d0grpl1xfwn3gd","title":"Universally Instance-Optimal Mechanisms for Private Statistical Estimation","url":"https://machinelearning.apple.com/research/universally-instance-optimal-mechanisms","published":"2025-04-02","authors":["Hilal Asi","John C. Duchi","Saminul Haque","Zewei Li","Feng Ruan"],"abstract":"We consider the problem of instance-optimal statistical estimation under the constraint of differential privacy where mechanisms must adapt to the difficulty of the input dataset. We prove a new instance-specific lower bound using a new divergence and show it characterizes the local minimax optimal rates for private statistical estimation. 
We propose two new mechanisms that are universally instance-optimal for general estimation problems up to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4409147443","title":"Estimating strawberry weight for grading by picking robot with point cloud completion and multimodal fusion network","url":"https://doi.org/10.1038/s41598-025-92641-1","published":"2025-04-02","authors":["Yiming Chen","Wei Wang","Junchao Chen","Jizhou Deng","Yuanping Xiang","博人 高橋","Xinghui Zhu","Changyun Li"],"abstract":"Strawberry grading by picking robots can eliminate the manual classification, reducing labor costs and minimizing the damage to the fruit. Strawberry size or weight is a key factor in grading, with accurate weight estimation being crucial for proper classification. In this paper, we collected 1521 sets of strawberry RGB-D images using a depth camera and manually measured the weight and size of the strawberries to construct a training dataset for the strawberry weight regression model. To address the issue of incomplete depth images caused by environmental interference with depth cameras, this study proposes a multimodal point cloud completion method specifically designed for symmetrical objects, leveraging RGB images to guide the completion of depth images in the same scene. 
The method follows a process of locating strawberry pixel regions, calculating centroid coordinates, determining t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-025-92641-1","openalex_id":"https://openalex.org/W4409147443","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Hunan Agricultural University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.777633011341095},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6943633556365967},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.6744922399520874},{"id":"https://openalex.org/C82990744","display_name":"RGB color model","score":0.6729616522789001},{"id":"https://openalex.org/C12267149","display_name":"Support vector machine","score":0.46825820207595825},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.442269504070282},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.41818729043006897}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4409079480","title":"NLP verification: towards a general methodology for certifying robustness","url":"https://doi.org/10.1017/s0956792525000099","published":"2025-04-02","authors":["Marco Casadio","Tanvi Dinkar","Ekaterina Komendantskaya","Luca Arnaboldi","Matthew L. Daggitt","Omri Isac","Guy Katz","Verena Rieser","Oliver Lemon"],"abstract":"Abstract Machine learning has exhibited substantial success in the field of natural language processing (NLP). 
For example, large language models have empirically proven to be capable of producing text of high complexity and cohesion. However, at the same time, they are prone to inaccuracies and hallucinations. As these systems are increasingly integrated into real-world applications, ensuring their safety and reliability becomes a primary concern. There are safety critical contexts where such models must be robust to variability or attack and give guarantees over their output. Computer vision had pioneered the use of formal verification of neural networks for such scenarios and developed common verification standards and pipelines, leveraging precise formal reasoning about geometric properties of data manifolds. In contrast, NLP verification methods have only recently appeared in the li...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1017/s0956792525000099","openalex_id":"https://openalex.org/W4409079480","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Hebrew University of Jerusalem","The University of Western Australia","University of Birmingham","University of Southampton"],"concepts":[{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.8112791776657104},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6868054866790771},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5052459836006165},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4648882746696472},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.3504689335823059},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.05441740155220032},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tools-for-thought-research-and-design-for-understanding-protecting-and-augmenting-human-cognition-with-generative-ai","title":"Tools for Thought: Research and Design for Understanding, Protecting, and Augmenting Human Cognition with Generative AI","url":"https://www.microsoft.com/en-us/research/publication/tools-for-thought-research-and-design-for-understanding-protecting-and-augmenting-human-cognition-with-generative-ai/","published":"2025-04-01","authors":["Lev Tankelevitch","Elena L. Glassman","Jessica He","Majeed Kazemitabaar","Aniket Kittur","Mina Lee","Srishti Palani","Advait Sarkar","Gonzalo Ramos","Yvonne Rogers","Hari Subramonyam"],"abstract":"CHI 2025 Workshop on Tools for Thought: Research and Design for Understanding, Protecting, and Augmenting Human Cognition with Generative AI. The workshop is part of the ACM (Association for Computing Machinery) CHI conference on Human Factors in Computing Systems. The workshop takes place Saturday, April 26, 2025 — 9AM-5:50PM JST — Yokohama, Japan. CHI takes place in Yokohama, Japan, from 26 April to 1 May 2025. Workshop abstract: We invite researchers, designers, practitioners, and provocateurs to explore what it means to understand and shape the impact of Generative AI (GenAI) on human cognition. GenAI radically widens the scope and capability of automation for work, learning, and creativity. 
While impactful, it also changes workflows and the quality of thinking involved, raising questions about its effects on cognition, including critical thinking and learning. Yet, GenAI also offers opp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3706599.3706745","openalex_id":"https://openalex.org/W4409720586","cited_by_count":12,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Human-computer interaction","1970-01-01","personalized"],"author_affiliations":["Microsoft","Carnegie Mellon University","Harvard University Press","Microsoft (United States)","Microsoft Research (United Kingdom)","Stanford University","Tableau Software (United States)","University College London","University of Chicago","University of Toronto"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tecofes-text-column-featurization-using-semantic-analysis","title":"TeCoFeS: Text Column Featurization using Semantic Analysis","url":"https://www.microsoft.com/en-us/research/publication/tecofes-text-column-featurization-using-semantic-analysis/","published":"2025-04-01","authors":["Ananya Singha","Mukul Singh","Ashish Tiwari","Sumit Gulwani","Vu Le","Chris Parnin"],"abstract":"Extracting insights from text columns can be challenging and time-intensive. Existing methods for topic modeling and feature extraction are based on syntactic features and often overlook the semantics. 
We introduce the semantic text column featurization problem, and present a scalable approach for automatically solving it. We extract a small sample smartly, use a large language model (LLM) to label only the sample, and then lift the labeling to the whole column using text embeddings. We evaluate our approach by turning existing text classification benchmarks into semantic categorization benchmarks. Our approach performs better than baselines and naive use of LLMs. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.findings-naacl.392","openalex_id":"https://openalex.org/W4411120184","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Programming languages and software engineering","Computation and Language","Computer science","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/steering-large-language-models-between-code-execution-and-textual-reasoning","title":"Steering Large Language Models between Code Execution and Textual Reasoning","url":"https://www.microsoft.com/en-us/research/publication/steering-large-language-models-between-code-execution-and-textual-reasoning/","published":"2025-04-01","authors":["Yongchao Chen","Harsh Jhamtani","Srinagesh Sharma","Chuchu Fan","Chi Wang"],"abstract":"While a lot of recent research focuses on enhancing the textual reasoning capabilities of Large Language Models (LLMs) by 
optimizing the multi-agent framework or reasoning chains, several benchmark tasks can be solved with 100% success through direct coding, which is more scalable and avoids the computational overhead associated with textual iterating and searching. Textual reasoning has inherent limitations in solving tasks with challenges in math, logics, optimization, and searching, which is unlikely to be solved by simply scaling up the model and data size. The recently released OpenAI GPT Code Interpreter and multi-agent frameworks such as AutoGen have demonstrated remarkable proficiency of integrating code generation and execution to solve complex tasks using LLMs. However, based on our experiments on 7 existing popular methods for steering code/text generation in both single- and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/an-empirical-study-of-validating-synthetic-data-for-formula-generation","title":"An Empirical Study of Validating Synthetic Data for Formula Generation","url":"https://www.microsoft.com/en-us/research/publication/an-empirical-study-of-validating-synthetic-data-for-formula-generation/","published":"2025-04-01","authors":["Usneek Singh","J. 
Cambronero","Sumit Gulwani","Aditya Kanade","Anirudh Khatry","Vu Le","Mukul Singh","Gust Verbruggen"],"abstract":"Large language models (LLMs) can be leveraged to help with writing formulas in spreadsheets, but resources on these formulas are scarce, impacting both the base performance of pre-trained models and limiting the ability to fine-tune them. Given a corpus of formulas, we can use a(nother) model to generate synthetic natural language utterances for fine-tuning. However, it is important to validate whether the NL generated by the LLM is indeed accurate to be beneficial for fine-tuning. In this paper, we provide empirical results on the impact of validating these synthetic training examples with surrogate objectives that evaluate the accuracy of the synthetic annotations. We demonstrate that validation improves performance over raw data across four models (2 open and 2 closed weight). Interestingly, we show that although validation tends to prune more challenging examples, it increases the co...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Programming languages and software engineering","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ufo-a-ui-focused-agent-for-windows-os-interaction","title":"UFO: A UI-Focused Agent for Windows OS 
Interaction","url":"https://www.microsoft.com/en-us/research/publication/ufo-a-ui-focused-agent-for-windows-os-interaction/","published":"2025-04-01","authors":["Chaoyun Zhang","Liqun Li","Shilin He","Xu Zhang","Bo Qiao","Si Qin","Minghua Ma","Yu Kang","Qingwei Lin 林庆维","Saravan Rajmohan","Dongmei Zhang","Qi Zhang"],"abstract":"We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a dual-agent framework to meticulously observe and analyze the graphical user interface (GUI) and control information of Windows applications. This enables the agent to seamlessly navigate and operate within individual applications and across them to fulfill user requests, even when spanning multiple applications. The framework incorporates a control interaction module, facilitating action grounding without human intervention and enabling fully automated execution. Consequently, UFO transforms arduous and time-consuming processes into simple tasks achievable solely through natural language commands. 
We conducted testing of UFO across 9 popular Windows applications, encompassing a variety of scenarios reflective of users'....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Systems and networking","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ruag-learned-rule-augmented-generation-for-large-language-models","title":"RuAG: Learned-rule-augmented Generation for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/ruag-learned-rule-augmented-generation-for-large-language-models/","published":"2025-04-01","authors":["Yudi Zhang","Pei Xiao","Lu Wang","Chaoyun Zhang","Meng Fang","Yali Du","Yevgeniy Puzyrev","Randolph Yao","Si Qin","Qingwei Lin 林庆维","Mykola Pechenizkiy","Dongmei Zhang"],"abstract":"In-context learning (ICL) and Retrieval-Augmented Generation (RAG) have gained attention for their ability to enhance LLMs' reasoning by incorporating external knowledge but suffer from limited contextual window size, leading to insufficient information injection. To this end, we propose a novel framework to automatically distill large volumes of offline data into interpretable first-order logic rules, which are injected into LLMs to boost their reasoning capabilities. 
Our method begins by formulating the search process relying on LLMs' commonsense, where LLMs automatically define head and body predicates. Then, we apply Monte Carlo Tree Search (MCTS) to address the combinational searching space and efficiently discover logic rules from data. The resulting logic rules are translated into natural language, allowing targeted knowledge injection and seamless integration into LLM prompts for...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","1970-01-01","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/proving-olympiad-inequalities-by-synergizing-llms-and-symbolic-reasoning","title":"Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning","url":"https://www.microsoft.com/en-us/research/publication/proving-olympiad-inequalities-by-synergizing-llms-and-symbolic-reasoning/","published":"2025-04-01","authors":["Zenan Li","Zhaoyu Li","Wen Tang","Xian Zhang","Yuan Yao","Xujie Si","Fan Yang","Kaiyu Yang","Xiaoxing Ma"],"abstract":"Large language models (LLMs) can prove mathematical theorems formally by generating proof steps (\\textit{a.k.a.} tactics) within a proof system. However, the space of possible tactics is vast and complex, while the available training data for formal proofs is limited, posing a significant challenge to LLM-based tactic generation. 
To address this, we introduce a neuro-symbolic tactic generator that synergizes the mathematical intuition learned by LLMs with domain-specific insights encoded by symbolic methods. The key aspect of this integration is identifying which parts of mathematical reasoning are best suited to LLMs and which to symbolic methods. While the high-level idea of neuro-symbolic integration is broadly applicable to various mathematical problems, in this paper, we focus specifically on Olympiad inequalities (Figure 1). We analyze how humans solve these problems and distill th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/performance-aware-llm-load-balancer-for-mixed-workloads","title":"Performance Aware LLM Load Balancer for Mixed Workloads","url":"https://www.microsoft.com/en-us/research/publication/performance-aware-llm-load-balancer-for-mixed-workloads/","published":"2025-04-01","authors":["Kunal Jain","A. Parayil","Ankur Mallick","Esha Choukse","Xiaoting Qin","Jue Zhang","Íñigo Goiri","Rujia Wang","Chetan Bansal","Victor Ruehle","Anoop Kulkarni","Steve Kofsky"],"abstract":"Large Language Model (LLM) workloads consist of distinct prefill and decode phases, each with unique compute and memory requirements that should be considered when routing input queries across cluster instances. 
However, existing load-balancing algorithms treat these workloads as monolithic jobs, ignoring the differences between the two phases. This oversight leads to suboptimal query distribution and increased response latency. In our work, we first characterize the factors affecting response latency during LLM inference. We show that balancing inference requests across available LLM instances can improve end-to-end latency more than simply optimizing the instance-level scheduler. Motivated by these findings, we propose a heuristic-guided, reinforcement learning-based router for data-driven, workload-aware scheduling. Our router distributes queries across LLM instances by using a trainable resp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/make-some-noise-towards-llm-audio-reasoning-and-generation-using-sound-tokens","title":"Make Some Noise: Towards LLM audio reasoning and generation using sound tokens","url":"https://www.microsoft.com/en-us/research/publication/make-some-noise-towards-llm-audio-reasoning-and-generation-using-sound-tokens/","published":"2025-04-01","authors":["Shivam Mehta","Nebojsa Jojic","Hannes Gamper"],"abstract":"Integrating audio comprehension and generation into large language models (LLMs) remains challenging due to the continuous nature of audio and the resulting high sampling 
rates. Here, we introduce a novel approach that combines Variational Quantization with Conditional Flow Matching to convert audio into ultra-low bitrate discrete tokens of 0.23 kbps, allowing for seamless integration with text tokens in LLMs. We fine-tuned a pretrained text-based LLM using Low-Rank Adaptation (LoRA) to assess its effectiveness in achieving true multimodal capabilities, i.e., audio comprehension and generation. Our tokenizer outperforms a traditional VQ-VAE across various datasets with diverse acoustic events. Despite the substantial loss of fine-grained details through audio tokenization, our multimodal LLM trained with discrete tokens achieves competitive results in audio comprehension with state-of-the...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp49660.2025.10888809","openalex_id":"https://openalex.org/W4408354829","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Audio and Acoustics","Audio and Speech Processing","Generative AI","LLM","quantization"],"author_affiliations":["Microsoft","KTH Royal Institute of Technology","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-the-evaluator-measuring-llms-adherence-to-task-evaluation-instructions","title":"Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions","url":"https://www.microsoft.com/en-us/research/publication/evaluating-the-evaluator-measuring-llms-adherence-to-task-evaluation-instructions/","published":"2025-04-01","authors":["Bhuvanashree 
Murugadoss","Christian Poelitz","Ian Drosos","Vu Le","Nick McKenna","Carina Suzana Negreanu","Chris Parnin","Advait Sarkar"],"abstract":"LLMs-as-a-judge is a recently popularized method which replaces human judgements in task evaluation with automatic evaluation using LLMs. Due to widespread use of RLHF (Reinforcement Learning from Human Feedback), state-of-the-art LLMs like GPT4 and Llama3 are expected to have strong alignment with human preferences when prompted for a quality judgement, such as the coherence of a text. While this seems beneficial, it is not clear whether the assessments by an LLM-as-a-judge constitute only an evaluation based on the instructions in the prompts, or reflect its preference for high-quality data similar to its fine-tune data. To investigate how much influence prompting the LLMs-as-a-judge has on the alignment of AI judgements to human judgements, we analyze prompts with increasing levels of instructions about the target quality of an evaluation, for several LLMs-as-a-judge. 
Further, we comp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01","LLM","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/concept-distillation-from-strong-to-weak-models-via-hypotheses-to-theories-prompting","title":"Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting","url":"https://www.microsoft.com/en-us/research/publication/concept-distillation-from-strong-to-weak-models-via-hypotheses-to-theories-prompting/","published":"2025-04-01","authors":["Emmanuel Aboah Boateng","Cassiano Becker","Nabiha Asghar","Kabir Walia","Ashwin Srinivasan","Ehi Nosakhare","Soundararajan Srinivasan","Victor Dibia"],"abstract":"Hand-crafting high quality prompts to optimize the performance of language models is a complicated and labor-intensive process. Furthermore, when migrating to newer, smaller, or weaker models (possibly due to latency or cost gains), prompts need to be updated to re-optimize the task performance. We propose Concept Distillation (CD), an automatic prompt optimization technique for enhancing weaker models on complex tasks. 
CD involves: (1) collecting mistakes made by weak models with a base prompt (initialization), (2) using a strong model to generate reasons for these mistakes and create rules/concepts for weak models (induction), and (3) filtering these rules based on validation set performance and integrating them into the base prompt (deduction/verification). We evaluated CD on NL2Code and mathematical reasoning tasks, observing significant performance boosts for small and weaker langua...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/causal-order-the-key-to-leveraging-imperfect-experts-in-causal-inference","title":"Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference","url":"https://www.microsoft.com/en-us/research/publication/causal-order-the-key-to-leveraging-imperfect-experts-in-causal-inference/","published":"2025-04-01","authors":["Aniket Vashishtha","Abbavaram Gowtham Reddy","Abhinav Kumar","Saketh Bachu","Vineeth N Balasubramanian","Amit Sharma"],"abstract":"Large Language Models (LLMs) have recently been used as experts to infer causal graphs, often by repeatedly applying a pairwise prompt that asks about the causal relationship of each variable pair. 
However, such experts, including human domain experts, cannot distinguish between direct and indirect effects given a pairwise prompt. Therefore, instead of the graph, we propose that causal order be used as a more stable output interface for utilizing expert knowledge. When querying a perfect expert with a pairwise prompt, we show that the inferred graph can have significant errors whereas the causal order is always correct. In practice, however, LLMs are imperfect experts and we find that pairwise prompts lead to multiple cycles and do not yield a valid order. Hence, we propose a prompting strategy that introduces an auxiliary variable for every variable pair and instructs the LLM to avoid c...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Causal inference","Machine learning","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-assistance-for-memory-safety","title":"LLM Assistance for Memory Safety","url":"https://www.microsoft.com/en-us/research/publication/llm-assistance-for-memory-safety/","published":"2025-04-01","authors":["Nausheen Mohammed","Akash Lal","Aseem Rastogi","Rahul Sharma","Subhajit Roy"],"abstract":"Memory safety violations in low-level code, written in languages like C, continues to remain one of the major sources of software vulnerabilities. One method of removing such violations by construction is to port C code to a safe C dialect. 
Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead. This porting, however, is a manual process that imposes significant burden on the programmer and, hence, there has been limited adoption of this technique. The task of porting not only requires inferring annotations, but may also need refactoring/rewriting of the code to make it amenable to such annotations. In this paper, we use Large Language Models (LLMs) towards addressing both these concerns. We show how to harness LLM capabilities to do complex code reasoning as well as rewriting of large codebases. We also present a novel framework for whole-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","1970-01-01","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/execution-guided-within-prompt-search-for-programming-by-example","title":"Execution-guided within-prompt search for programming-by-example","url":"https://www.microsoft.com/en-us/research/publication/execution-guided-within-prompt-search-for-programming-by-example/","published":"2025-04-01","authors":["Gust Verbruggen","Ashish Tiwari","Mukul Singh","Vu Le","Sumit Gulwani"],"abstract":"Large language models (LLMs) can generate code from examples without being limited to a DSL, but they lack search, as sampled programs are independent. 
In this paper, we use an LLM as a policy that generates lines of code and then join these lines of code to let the LLM implicitly estimate the value of each of these lines in its next iteration. We further guide the policy and value estimation by executing each line and annotating it with its results on the given examples. This allows us to search for programs within a single (expanding) prompt until a sound program is found, by letting the policy reason in both the syntactic (code) and semantic (execution) space. We evaluate within-prompt search on straight-line Python code generation using five benchmarks across different domains (strings, lists, and arbitrary Python programming problems). We show that the model uses the execution results to guide...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ecoact-economic-agent-determines-when-to-register-what-action","title":"EcoAct: Economic Agent Determines When to Register What Action","url":"https://www.microsoft.com/en-us/research/publication/ecoact-economic-agent-determines-when-to-register-what-action/","published":"2025-04-01","authors":["Shaokun Zhang","Jieyu Zhang","Dujian Ding","Mirian Hipolito Garcia","Ankur Mallick","Daniel Madrigal","Menglin Xia","Victor Ruehle","Qingyun Wu","Chi Wang"],"abstract":"Recent advancements have enabled Large Language Models (LLMs) 
to function as agents that can perform actions using external tools. This requires registering, i.e., integrating tool information into the LLM context prior to taking actions. Current methods indiscriminately incorporate all candidate tools into the agent's context and retain them across multiple reasoning steps. This process remains opaque to LLM agents and is not integrated into their reasoning procedures, leading to inefficiencies due to increased context length from irrelevant tools. To address this, we introduce EcoAct, a tool using algorithm that allows LLMs to selectively register tools as needed, optimizing context use. By integrating the tool registration process into the reasoning procedure, EcoAct reduces computational costs by over 50% in multiple steps reasoning tasks while maintaining performance, as demonstrate...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/chatbench-from-static-benchmarks-to-human-ai-evaluation","title":"ChatBench: From Static Benchmarks to Human-AI Evaluation","url":"https://www.microsoft.com/en-us/research/publication/chatbench-from-static-benchmarks-to-human-ai-evaluation/","published":"2025-04-01","authors":["Serina Chang","Ashton Anderson","Jake Hofman"],"abstract":"With the rapid adoption of LLM-based chatbots, there is a pressing need to evaluate what humans and LLMs can 
achieve together. However, standard benchmarks, such as MMLU, measure LLM capabilities in isolation (i.e., \"AI-alone\"). Here, we design and conduct a user study to convert MMLU questions into user-AI conversations, by seeding the user with the question and having them carry out a conversation with the LLM to answer their question. We release ChatBench , a new dataset with AI-alone, user-alone, and user-AI data for 396 questions and two LLMs, including 144K answers and 7,336 user-AI conversations. We find that AI-alone accuracy fails to predict user-AI accuracy, with significant differences across multiple subjects (math, physics, and moral reasoning), and we analyze the user-AI conversations to provide insight into how they diverge from AI-alone benchmarks. Finally, we show that f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Human–computer interaction","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/boosting-large-language-model-for-speech-synthesis-an-empirical-study","title":"Boosting Large Language Model for Speech Synthesis: An Empirical Study","url":"https://www.microsoft.com/en-us/research/publication/boosting-large-language-model-for-speech-synthesis-an-empirical-study/","published":"2025-04-01","authors":["Hongkun Hao","Long Zhou","Shujie Liu","Jinyu Li","Shujie Hu","Rui Wang","Furu Wei"],"abstract":"Large language models (LLMs) have made significant 
advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. Nevertheless, most of the previous work focuses on prompting LLMs with perception abilities like auditory comprehension, and the effective approach for augmenting LLMs with speech synthesis capabilities remains ambiguous. In this paper, we conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech, by combining pre-trained LLM LLaMA/OPT and text-to-speech synthesis model VALL-E. We compare three integration methods between LLMs and speech synthesis models, including directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E using LLMs as a powerful text encoder. Experimental results show that, using LoRA method to fine-tune....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Human language technologies","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rustassistant-using-llms-to-fix-compilation-errors-in-rust-code","title":"RustAssistant: Using LLMs to Fix Compilation Errors in Rust Code","url":"https://www.microsoft.com/en-us/research/publication/rustassistant-using-llms-to-fix-compilation-errors-in-rust-code/","published":"2025-04-01","authors":["Pantazis Deligiannis","Akash Lal","Nikita Mehrotra","Rishi Poddar","Aseem Rastogi"],"abstract":"The Rust programming 
language, with its safety guarantees, has established itself as a viable choice for low-level systems programming language over the traditional, unsafe alternatives like C/C++. These guarantees come from a strong ownership-based type system, as well as primitive support for features like closures, pattern matching, etc., that make the code more concise and amenable to reasoning. These unique Rust features also pose a steep learning curve for programmers. This paper presents a tool called RustAssistant that leverages the emergent capabilities of Large Language Models (LLMs) to automatically suggest fixes for Rust compilation errors. RustAssistant uses a careful combination of prompting techniques as well as iteration between an LLM and the Rust compiler to deliver high accuracy of fixes. RustAssistant is able to achieve an impressive peak accuracy of roughly 74% on rea...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/irokobench-a-new-benchmark-for-african-languages-in-the-age-of-large-language-models","title":"IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/irokobench-a-new-benchmark-for-african-languages-in-the-age-of-large-language-models/","published":"2025-04-01","authors":["David Ifeoluwa 
Adelani","Jessica Ojo","Israel Abebe Azime","Jian Yun Zhuang","Jesujoba O. Alabi","Xuanli He","Millicent Ochieng","Sara Hooker","Andiswa Bukula","En-Shiun Annie Lee","Chiamaka Chukwuneke","Happy Buzaaba"],"abstract":"Despite the widespread adoption of Large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench -- a human-translated benchmark dataset for 17 typologically-diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based question answering (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and six proprietary LLMs. 
Our evaluation reveals a significant performance gap between high-reso...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/irl-dittos-embodied-multimodal-ai-agent-interactions-in-open-spaces","title":"IRL Dittos: Embodied Multimodal AI Agent Interactions in Open Spaces","url":"https://www.microsoft.com/en-us/research/publication/irl-dittos-embodied-multimodal-ai-agent-interactions-in-open-spaces/","published":"2025-04-01","authors":["Seonghee Lee","Denae Ford","John Tang","Sasa Junuzovic","Asta Roseway","Ed Cutrell","Kori Inkpen"],"abstract":"We introduce the In Real Life (IRL) Ditto, an AI-driven embodied agent designed to represent remote colleagues in shared office spaces, creating opportunities for real-time exchanges even in their absence. IRL Ditto offers a unique hybrid experience by allowing in-person colleagues to encounter a digital version of their remote teammates, initiating greetings, updates, or small talk as they might in person. Our research question examines: How can the IRL Ditto influence interactions and relationships among colleagues in a shared office space? Through a four-day study, we assessed IRL Ditto's ability to strengthen social ties by simulating presence and enabling meaningful interactions across different levels of social familiarity. 
We find that enhancing social relationships depended deeply on the foundation of the relationship participants had with the source of the IRL Ditto. This study....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-how-llms-capture-and-represent-domain-specific-knowledge","title":"Exploring How LLMs Capture and Represent Domain-Specific Knowledge","url":"https://www.microsoft.com/en-us/research/publication/exploring-how-llms-capture-and-represent-domain-specific-knowledge/","published":"2025-04-01","authors":["Mirian Hipolito Garcia","Camille Couturier","Daniel Madrigal","Ankur Mallick","Anastasios Kyrillidis","Robert Sim","Victor Ruehle","Saravan Rajmohan"],"abstract":"We study whether Large Language Models (LLMs) inherently capture domain specific nuances in natural language. Our experiments probe the domain sensitivity of LLMs by examining their ability to distinguish queries from different domains using hidden states generated during the prefill phase. We reveal latent domain-related trajectories that indicate the model’s internal recognition of query domains. We also study the robustness of these domain representations to variations in prompt styles and sources. 
Our approach leverages these representations for model selection, mapping the LLM that best matches the domain trace of the input query (i.e., the model with the highest performance on similar traces). Our findings show that LLMs can differentiate queries for related domains, and that the fine-tuned model is not always the most accurate. Unlike previous work, our interpretations apply to bo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","LLMs Inference","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/i-am-the-one-and-only-your-cyber-bff-understanding-the-impact-of-genai-requires-understanding-the-impact-of-anthropomorphic-ai","title":"\"I Am the One and Only, Your Cyber BFF\": Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI","url":"https://www.microsoft.com/en-us/research/publication/i-am-the-one-and-only-your-cyber-bff-understanding-the-impact-of-genai-requires-understanding-the-impact-of-anthropomorphic-ai/","published":"2025-04-01","authors":["Myra Cheng","Alicia DeVrio","Lisa Egede","Su Lin Blodgett","Alexandra Olteanu"],"abstract":"Many state-of-the-art generative AI (GenAI) systems are increasingly prone to anthropomorphic behaviors, i.e., to generating outputs that are perceived to be human-like. 
While this has led to scholars increasingly raising concerns about possible negative impacts such anthropomorphic AI systems can give rise to, anthropomorphism in AI development, deployment, and use remains vastly overlooked, understudied, and underspecified. In this perspective, we argue that we cannot thoroughly map the social impacts of generative AI without mapping the social impacts of anthropomorphic AI, and outline a call to action.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Social sciences"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/shifting-work-patterns-with-generative-ai","title":"Shifting Work Patterns with Generative AI","url":"https://www.microsoft.com/en-us/research/publication/shifting-work-patterns-with-generative-ai/","published":"2025-04-01","authors":["Eleanor Dillon","Sonia Jaffe","Nicole Immorlica","Christopher Stanton"],"abstract":"We present evidence on how generative AI changes the work patterns of knowledge workers using data from a 6-month-long, cross-industry, randomized field experiment. Half of the 6,000 workers in the study received access to a generative AI tool integrated into the applications they already used for emails, document creation, and meetings. 
We find that access to the AI tool during the first year of its release primarily impacted behaviors that could be changed independently and not behaviors that required coordination to change: workers who used the tool spent 3 fewer hours, or 25% less time on email each week (intent to treat estimate is 1.4 hours) and seemed to complete documents moderately faster, but did not significantly change time spent in meetings.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.2139/ssrn.5216904","openalex_id":"https://openalex.org/W4409817367","cited_by_count":1,"quality_score":65,"matched_keywords":["Unpublished","Economics","Generative AI"],"author_affiliations":["Microsoft","Harvard University","Microsoft (United States)","Microsoft Research (India)","Microsoft Research (United Kingdom)","Microsoft Research New England (United States)","National Bureau of Economic Research"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:204","title":"Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?","url":"https://seed.bytedance.com/en/research/recitation-over-reasoning-how-cutting-edge-language-models-can-fail-on-elementary-school-level-reasoning-problems","published":"2025-04-01","authors":["Kai Yan","Yufei Xu","Zhengyin Du","Xuesong Yao","Zheyu Wang","Xiaowen Guo","Jiecao Chen"],"abstract":"The rapid escalation from elementary school-level to frontier problems of the difficulty for LLM benchmarks in recent years have weaved a miracle for researchers that we are only inches away from surpassing human intelligence. 
However, is the LLMs' remarkable reasoning ability indeed comes from true intelligence by human standards, or are they simply reciting solutions witnessed during training at an Internet level? To study this problem, we propose RoR-Bench, a novel, multi-modal benchmark for detecting LLM's recitation behavior when asked simple reasoning problems but with conditions subtly shifted, and conduct empirical analysis on our benchmark. Surprisingly, we found existing cutting-edge LLMs unanimously exhibits extremely severe recitation behavior; by changing one phrase in the condition, top models such as OpenAI-o1 and DeepSeek-R1 can suffer 60%. External paper link: https://ar...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Core Machine Learning","LLM","https://arxiv.org/abs/2504.00509"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/early-impacts-of-m365-copilot","title":"Early Impacts of M365 Copilot","url":"https://www.microsoft.com/en-us/research/publication/early-impacts-of-m365-copilot/","published":"2025-04-01","authors":["Eleanor Dillon","Sonia Jaffe","Sida Peng","Alexia Cambon"],"abstract":"Advances in generative AI have rapidly expanded the potential of computers to perform or assist in a wide array of tasks traditionally performed by humans. We analyze a large, real-world randomized experiment of over 6,000 workers at 56 firms to present some of the earliest evidence on how these technologies are changing the way knowledge workers do their jobs. 
We find substantial time savings on common core tasks across a wide range of industries and occupations: workers who make use of this technology spent half an hour less reading email each week and completed documents 12% faster. Despite the newness of the technology, nearly 40% of workers who were given access to the tool used it regularly in their work throughout the 6-month study.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Tech Report","Artificial intelligence","Economics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409042270","title":"A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings","url":"https://doi.org/10.1038/s41467-025-58344-x","published":"2025-04-01","authors":["Juan Manuel Zambrano Chaves","Shih-Cheng Huang","Yanbo Xu","Hanwen Xu","Naoto Usuyama","Sheng Zhang","Fei Wang","Yujia Xie","Mahmoud Khademi","Ziyi Yang","Hany Awadalla","Julia Gong"],"abstract":"Large foundation models show promise in biomedicine but face challenges in clinical use due to performance gaps, accessibility, cost, and lack of scalable evaluation. Here we show that open-source small multimodal models can bridge these gaps in radiology by generating free-text findings from chest X-ray images. Our data-centric approach leverages 697K curated radiology image-text pairs to train a specialized, domain-adapted chest X-ray encoder. We integrate this encoder with pre-trained language models via a lightweight adapter that aligns image and text modalities. 
To enable robust, clinically relevant evaluation, we develop and validate CheXprompt, a GPT-4-based metric for assessing factual accuracy aligned with radiologists' evaluations. Benchmarked with CheXprompt and other standard factuality metrics, LLaVA-Rad (7B) achieves state-of-the-art performance, outperforming much larger m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-025-58344-x","openalex_id":"https://openalex.org/W4409042270","cited_by_count":24,"quality_score":61,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Stanford University","University of California, Davis","University of California, San Francisco","University of Southern California","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7622295022010803},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6844998598098755},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5480128526687622},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5121943354606628},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.4773186147212982},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.4690715968608856},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4457305073738098},{"id":"https://openalex.org/C19527891","display_name":"Medical physics","score":0.3575892150402069}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":24}},{"id":"arxiv:2504.00573","title":"Training a Utility-based Retriever Through Shared Context Attribution 
for Retrieval-Augmented Language Models","url":"https://huggingface.co/papers/2504.00573","published":"2025-04-01","authors":["Yilong Xu","Jinhua Gao","Xiaoming Yu","Yuanhai Xue","Baolong Bi","Huawei Shen","Xueqi Cheng"],"abstract":"Retrieval-Augmented Language Models boost task performance, owing to the retriever that provides external knowledge. Although crucial, the retriever primarily focuses on semantics relevance, which may not always be effective for generation. Thus, utility-based retrieval has emerged as a promising topic, prioritizing passages that provides valid benefits for downstream tasks. However, due to insufficient understanding, capturing passage utility accurately remains unexplored. This work proposes SCARLet, a framework for training utility-based retrievers in RALMs, which incorporates two key factors, multi-task generalization and inter-passage interaction. First, SCARLet constructs shared context on which training data for various tasks is synthesized. This mitigates semantic bias from context differences, allowing retrievers to focus on learning task-specific utility for better task generali...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["retrieval"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"official:a8e330f52afc9d61","title":"Minitron-SSM: Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning","url":"https://research.nvidia.com/publication/2025-04_minitron-ssm-efficient-hybrid-language-model-compression-through-group-aware","published":"2025-04","authors":["Ali Taghibakhshi","Sharath Turuvekere Sreenivas","Saurav Muralidharan","Marcin Chochowski","Yashaswi 
Karnati","Raviraj Joshi","Ameya Sunil Mahabaleshwarkar","Zijia Chen","Yoshi Suhara","Oluwatobi Olabiyi","Daniel Korzekwa","Mostofa Patwary"],"abstract":"Official NVIDIA Research publication. NeurIPS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["NeurIPS","language model","efficient","compression"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"hf-org-paper:stepfun-ai:2504.15281","title":"StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians","url":"https://huggingface.co/papers/2504.15281","published":"2025-04","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"official:9fce80fdbaee434d","title":"UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation","url":"https://research.nvidia.com/publication/2025-04_uniwav-towards-unified-pre-training-speech-representation-learning-and","published":"2025-04","authors":["Alexander H. Liu","Sang-gil Lee","Huck Yang","Yuan Gong","Frank Wang","James R. 
Glas","Rafael Valle"],"abstract":"Official NVIDIA Research publication. ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:b7af4d2dce2a6eba","title":"Towards Neural Scaling Laws for Time Series Foundation Models","url":"https://research.nvidia.com/publication/2025-04_towards-neural-scaling-laws-time-series-foundation-models","published":"2025-04","authors":["Qingren Yao","Huck Yang","Renhe Jiang","Ming Jin","Shirui Pan"],"abstract":"Official NVIDIA Research publication. ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:8cbd1118e5aecfea","title":"LongVILA: Scaling Long-Context Visual Language Models for Long Videos","url":"https://research.nvidia.com/publication/2025-04_longvila-scaling-long-context-visual-language-models-long-videos","published":"2025-04","authors":["Yukang Chen","Fuzhao Xue","Dacheng Li","Qinghao Hu","Ligeng Zhu","Xiuyu Li","Yunhao Fang","Haotian Tang","Shang Yang","Zhijian Liu","Ethan He","Hongxu Yin"],"abstract":"Official NVIDIA Research 
publication. ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:613125509347fa3b","title":"Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models,","url":"https://research.nvidia.com/publication/2025-04_lightning-fast-image-inversion-and-editing-text-image-diffusion-models","published":"2025-04","authors":["Dvir Samuel","Barak Meiri","Haggai Maron","Yoad Tewel","Nir Darshan","Shai Avidan","Gal Chechik","Rami Ben-Ari"],"abstract":"Official NVIDIA Research publication. ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:c29d82690aca4ce6","title":"Latent Action Pretraining from Videos","url":"https://research.nvidia.com/publication/2025-04_latent-action-pretraining-videos","published":"2025-04","authors":["Seonghyeon Ye","Joel Jang","Byeongguk Jeon","Sejune Joo","Jianwei Yang","Baolin Peng","Ajay Mandlekar","Reuben Tan","Yu-Wei Chao","Yuchen Lin","Lars Liden","Kimin Lee"],"abstract":"Official NVIDIA Research publication. 
ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:3d402d2c73dbafe7","title":"Hymba: A Hybrid-head Architecture for Small Language Models","url":"https://research.nvidia.com/publication/2025-04_hymba-hybrid-head-architecture-small-language-models","published":"2025-04","authors":["Xin Dong","Yonggan Fu","Shizhe Diao","Wonmin Byeon","Zijia Chen","Ameya Sunil Mahabaleshwarkar","Shih-Yang Liu","Matthijs Van keirsbilck","Min-Hung Chen","Yoshi Nishi","Yingyan Celine Lin","Jan Kautz"],"abstract":"Official NVIDIA Research publication. 
ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:7196de95886e5b1f","title":"Audio Large Language Models Can Be Descriptive Speech Quality Evaluators","url":"https://research.nvidia.com/publication/2025-04_audio-large-language-models-can-be-descriptive-speech-quality-evaluators","published":"2025-04","authors":["Chen Chen","Yuchen Hu","Siyin Wang","Helin Wang","Zhehuai Chen","Chao Zhang","Huck Yang","EngSiong Chng"],"abstract":"Official NVIDIA Research publication. ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"official:c4513f9c4f5e7175","title":"Sionna RT: Technical Report","url":"https://research.nvidia.com/publication/2025-04_sionna-rt-technical-report","published":"2025-04","authors":["Fayçal Aït Aoudia","Jakob Hoydis","Merlin Nimier-David","Sebastian Cammerer","Alex Keller"],"abstract":"Official NVIDIA Research 
publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/misaligned-roles-misplaced-images-structural-input-perturbations-expose-multimodal-alignment-blind-spots","title":"Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots","url":"https://www.microsoft.com/en-us/research/publication/misaligned-roles-misplaced-images-structural-input-perturbations-expose-multimodal-alignment-blind-spots/","published":"2025-03-31","authors":["Erfan Shayegani","G. M. Shahariar","Sara Abdali","Lei Yu","Nael B. Abu-Ghazaleh","Yue Dong"],"abstract":"Multimodal Language Models (MMLMs) typically undergo post-training alignment to prevent harmful content generation. However, these alignment stages focus primarily on the assistant role, leaving the user role unaligned, and stick to a fixed input prompt structure of special tokens, leaving the model vulnerable when inputs deviate from these expectations. We introduce Role-Modality Attacks (RMA), a novel class of adversarial attacks that exploit role confusion between the user and assistant and alter the position of the image token to elicit harmful outputs. Unlike existing attacks that modify query content, RMAs manipulate the input structure without altering the query itself. 
We systematically evaluate these attacks across multiple Vision Language Models (VLMs) on eight distinct settings, showing that they can be composed to create stronger adversarial prompts, as also evidenced by thei...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Multimodal Large Language Models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/new-initiatives-at-icmi-mentorship-in-focus","title":"New Initiatives at ICMI: Mentorship in Focus","url":"https://www.microsoft.com/en-us/research/publication/new-initiatives-at-icmi-mentorship-in-focus/","published":"2025-03-31","authors":["Catharine Oertel","Laura Cabrera-Quiros","Hayley Hung","Mohammad Soleymani","Sean Andrist","Dimosthenis Kontogiorgos","Chirag Raman"],"abstract":"In its 2024 iteration, the general chairs of SIGCHI's International Conference on Multimodal Interaction (ICMI) decided to launch a New Initiatives programme at the conference. The goal was to give new ideas a place to form within the ICMI community, so that not only seasoned members had a say, but also, newcomers could voice their needs and find an academic home they would like to return to and contribute to for many years. 
Ultimately, the question motivating this programme was: How can we make ICMI an even more welcoming place for newcomers, early career researchers, and people from diverse backgrounds and perspectives?","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/inference-time-scaling-for-complex-tasks-where-we-stand-and-what-lies-ahead","title":"Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead","url":"https://www.microsoft.com/en-us/research/publication/inference-time-scaling-for-complex-tasks-where-we-stand-and-what-lies-ahead/","published":"2025-03-31","authors":["Vidhisha Balachandran","Jingya Chen","Lingjiao Chen","Shivam Garg","Neel Joshi","Yash Lara","John Langford","Besmira Nushi","Vibhav Vineet","Yue Wu","Safoora Yousefi"],"abstract":"Inference-time scaling can enhance the reasoning capabilities of large language models (LLMs) on complex problems that benefit from step-by-step problem solving. Although lengthening generated scratchpads has proven effective for mathematical tasks, the broader impact of this approach on other tasks remains less clear. In this work, we investigate the benefits and limitations of scaling methods across nine state-of-the-art models and eight challenging tasks, including math and STEM reasoning, calendar planning, NP-hard problems, navigation, and spatial reasoning. 
We compare conventional models (e.g., GPT-4o) with models fine-tuned for inference-time scaling (e.g., o1) through evaluation protocols that involve repeated model calls, either independently or sequentially with feedback. These evaluations approximate lower and upper performance bounds and potential for future performance impro...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Tech Report","Artificial intelligence","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409069799","title":"Rate–Distortion–Perception Trade-Off in Information Theory, Generative Models, and Intelligent Communications","url":"https://doi.org/10.3390/e27040373","published":"2025-03-31","authors":["Xueyan Niu","Bo Bai","Nian Guo","Weixi Zhang","Wei Han"],"abstract":"Traditional rate-distortion (RD) theory examines the trade-off between the average length of the compressed representation of a source and the additive distortions of its reconstruction. The rate-distortion-perception (RDP) framework, which integrates the perceptual dimension into the RD paradigm, has garnered significant attention due to recent advancements in machine learning, where perceptual fidelity is assessed by the divergence between input and reconstruction distributions. In communication systems where downstream tasks involve generative modeling, high perceptual fidelity is essential, despite distortion constraints. 
However, while zero distortion implies perfect realism, the converse is not true, highlighting an imbalance in the significance of distortion and perceptual constraints. This article clarifies that incorporating perceptual constraints does not decrease the necessary...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/e27040373","openalex_id":"https://openalex.org/W4409069799","cited_by_count":4,"quality_score":45,"matched_keywords":["compression"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6212784051895142},{"id":"https://openalex.org/C126780896","display_name":"Distortion (music)","score":0.5945969820022583},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5697528123855591},{"id":"https://openalex.org/C52622258","display_name":"Information theory","score":0.47995880246162415},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4786040484905243},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.47736525535583496},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.4539310038089752},{"id":"https://openalex.org/C125112378","display_name":"Randomness","score":0.4376394748687744}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4410356867","title":"GANDALF: A LLM-based approach to map bark beetle outbreaks in semantic stories of Sentinel-2 images","url":"https://doi.org/10.1145/3672608.3707751","published":"2025-03-31","authors":["Vincenzo Pasquadibisceglie","Vito Recchia","Annalisa Appice","Donato Malerba","Giuseppe Fiameni"],"abstract":"Huge spruce 
forest areas have been damaged by massive bark beetle outbreaks across Europe during the past few years. Hence, forest health management requires large-scale inventory of bark beetle outbreaks to plan actions for promptly mitigating forest tree dieback. Deep learning techniques have recently achieved amazing results in imagery semantic segmentation tasks by dominating the recent research for mapping bark beetle outbreaks in Sentinel-2 images of forest areas. In addition, due to the impressive performance of Large Language Models (LLMs) in natural language understanding and generation tasks, LLMs have started attracting attention in multiple fields. In this paper, we describe GANDALF: an approach that leverages the potential of LLMs for mapping bark beetle outbreaks in Sentinel-2 images of forest areas. Specifically, we take advantage of the rich context of textual data to tra...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3672608.3707751","openalex_id":"https://openalex.org/W4410356867","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Nvidia (United States)","University of Bari Aldo Moro"],"concepts":[{"id":"https://openalex.org/C2779751432","display_name":"Bark beetle","score":0.6540380716323853},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6452000141143799},{"id":"https://openalex.org/C133446333","display_name":"Bark (sound)","score":0.42144161462783813},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4127483665943146},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3228100538253784},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.1788652241230011},{"id":"https://openalex.org/C97137747","display_name":"Forestry","score":0.13901913166046143}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409210030","title":"Generative AI: A new frontier for agric extension service in Africa - revolutionizing farmer information access","url":"https://doi.org/10.51594/csitrj.v6i2.1870","published":"2025-03-31","authors":["Jennifer Bakowaa Sarfo","Samuel Aremora","Mary Opeyemi Adebote","Kehinde M. Balogun","Abayomi Taiwo Fashina"],"abstract":"Extension services in agriculture are essential in the dissemination of critical information and sustainable agricultural techniques to farmers in Africa. Nevertheless, such services tend to be limited by their reach, language constraints, and lack of context-specific information. With the potential to generate new and context-specific content, Generative Artificial Intelligence (AI) offers a revolutionary means to overcome these constraints and improve agricultural knowledge transfer. This paper investigates the future of Generative AI, with components including Natural Language Processing (NLP), content generation (text, picture, audio, video), context-specific information delivery to individuals, and AI-driven chatbots in reshaping African agricultural extension. 
We survey applications of AI worldwide in agriculture with examples of success stories along with drawing inferences applic...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.51594/csitrj.v6i2.1870","openalex_id":"https://openalex.org/W4409210030","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Austin Peay State University","Google (United States)","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C2778571376","display_name":"Frontier","score":0.8589065074920654},{"id":"https://openalex.org/C2778029271","display_name":"Extension (predicate logic)","score":0.7647293210029602},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6826838254928589},{"id":"https://openalex.org/C2780378061","display_name":"Service (business)","score":0.626309871673584},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.4104400873184204},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.35528457164764404},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.334758460521698},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2480916976928711}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"official:7f106913e60f19ff","title":"Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation","url":"https://ai.meta.com/research/publications/through-the-mask-mask-based-motion-trajectories-for-image-to-video-generation/","published":"2025-03-30","authors":["Guy Yariv","Yuval Kirstain","Amit Zohar","Shelly Sheynin","Yaniv Taigman","Yossef (Yossi) Adi","Sagie Benaim","Adam Polyak"],"abstract":"Official 
Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=6"}},{"id":"openalex:W4409078275","title":"The Integration of NLP and Computer Vision: Advanced Frameworks for Multi-Modal Content Understanding","url":"https://doi.org/10.32628/cseit25112708","published":"2025-03-30","authors":["Manish Kumar Keshri"],"abstract":"Content understanding systems leveraging Natural Language Processing (NLP) and Computer Vision (CV) have revolutionized how machines interpret and analyze multimodal information across diverse applications. This article explores the technologies driving advancements in content analysis, from text embedding techniques such as BERT to image and video representation methods including CNN-based approaches and Vision Transformers. It examines the challenges of processing diverse languages and regional contexts in a multimodal framework, alongside methodologies for collecting and preparing high-quality training data. The discussion covers various fusion architectures for integrating information across modalities, training approaches for multimodal classifiers, and evaluation frameworks to ensure model effectiveness. 
As these technologies continue to evolve, the integration of NLP and CV promis...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.32628/cseit25112708","openalex_id":"https://openalex.org/W4409078275","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7191754579544067},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7175427675247192},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6301620602607727},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5508548617362976},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.5119222402572632},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3227742314338684},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.04292804002761841},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409277072","title":"Modeling Chinese EFL learners’ intention to use generative AI for L2 writing through an integrated model of the TAM and TTF","url":"https://doi.org/10.1007/s10639-025-13505-9","published":"2025-03-29","authors":["Hu Xing","Gong 
Wen"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10639-025-13505-9","openalex_id":"https://openalex.org/W4409277072","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["City University of Macau","Huawei Technologies (China)","Lingnan Normal University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.596830427646637},{"id":"https://openalex.org/C16443162","display_name":"Educational technology","score":0.5556365251541138},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5072786211967468},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.46348100900650024},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4465583562850952},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.44197311997413635},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.43397045135498047},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3359454274177551}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"official:257d4ba58d7ce6d6","title":"QVQ-Max: Think with Evidence","url":"https://qwenlm.github.io/blog/qvq-max-preview/","published":"2025-03-28","authors":["Alibaba/Qwen"],"abstract":"QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORDIntroduction Last December, we launched QVQ-72B-Preview as an exploratory model, but it had many issues. Today, we are officially releasing the first version of QVQ-Max, our visual reasoning model. 
This model can not only “understand” the content in images and videos but also analyze and reason with this information to provide solutions. From math problems to everyday questions, from programming code to artistic creation, QVQ-Max has demonstrated impressive capabilities.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4408930350","title":"Learning Human Feedback from Large Language Models for Content Quality-aware Recommendation","url":"https://doi.org/10.1145/3727144","published":"2025-03-28","authors":["Huili Wang","Chuhan Wu","Yongfeng Huang","Tao Qi"],"abstract":"Recommender systems are widely employed to mitigate information overload by tailoring online content to individual preferences. Existing recommendation methods typically focus on optimizing the relevance between candidate item content and user historical behaviors. However, these methods often neglect the quality of recommended content, which can negatively affect user experience and hinder the long-term growth of platforms. In fact, addressing this issue is particularly challenging, as signal on content quality feedback is typically sparse in the user interaction data (e.g., clicks) commonly used for model training. In this article, we propose a human feedback alignment framework for recommender system (HFAR), which leverages well-aligned large language models to simulate human feedback on content quality to enhance recommendation. 
Specifically, we propose a multi-task learning-based kn...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3727144","openalex_id":"https://openalex.org/W4408930350","cited_by_count":2,"quality_score":43,"matched_keywords":["long-term"],"author_affiliations":["Beijing University of Posts and Telecommunications","Huawei Technologies (China)","Huawei Technologies (Sweden)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6430959701538086},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5662410259246826},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.5289673209190369},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38722318410873413},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3869783580303192},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.07360661029815674},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.0},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2502.07578","title":"PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference","url":"http://arxiv.org/abs/2502.07578","published":"2025-03-27","authors":["Yufeng Gu","Alireza Khadem","Sumanth Umesh","Ning Liang","Xavier Servot","Onur Mutlu","Ravishankar Iyer","Reetuparna Das"],"abstract":"Large Language Model (LLM) inference uses an autoregressive manner to generate one token at a time, which exhibits notably lower 
operational intensity compared to earlier Machine Learning (ML) models such as encoder-only transformers and Convolutional Neural Networks. At the same time, LLMs possess large parameter sizes and use key-value caches to store context information. Modern LLMs support context windows with up to 1 million tokens to generate versatile text, audio, and video content. A large key-value cache unique to each prompt requires a large memory capacity, limiting the inference batch size. Both low operational intensity and limited batch size necessitate a high memory bandwidth. However, contemporary hardware systems for ML model deployment, such as GPUs and TPUs, are primarily optimized for compute throughput. This mismatch challenges the efficient deployment of advanced LL...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3676641.3716267","openalex_id":"https://openalex.org/W4407425079","cited_by_count":24,"quality_score":77,"matched_keywords":["LLM","language model","memory","efficient"],"author_affiliations":["ETH Zurich","Google (United States)","University of Michigan"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.860276460647583},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5841017961502075},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5013415813446045},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.4930705726146698},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.48407331109046936},{"id":"https://openalex.org/C2776257435","display_name":"Bandwidth (computing)","score":0.45931676030158997},{"id":"https://openalex.org/C2779343474","display_name":"Context 
(archaeology)","score":0.44197148084640503},{"id":"https://openalex.org/C157764524","display_name":"Throughput","score":0.4367714822292328}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":24}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/debug-gym-a-text-based-environment-for-interactive-debugging","title":"debug-gym: A Text-Based Environment for Interactive Debugging","url":"https://www.microsoft.com/en-us/research/publication/debug-gym-a-text-based-environment-for-interactive-debugging/","published":"2025-03-27","authors":["Xingdi Yuan","Morgane M Moss","Charbel Feghali","Chinmay Singh","Darya Moldavskaya","Drew MacPhee","Lucas Caccia","Matheus Pereira","Minseon Kim","Alessandro Sordoni","Marc-Alexandre Côté"],"abstract":"Large Language Models (LLMs) are increasingly relied upon for coding tasks, yet in most scenarios it is assumed that all relevant information can be either accessed in context or matches their training data. We posit that LLMs can benefit from the ability to interactively explore a codebase to gather the information relevant to their task. To achieve this, we present a textual environment, namely debug-gym, for developing LLM-based agents in an interactive coding setting. Our environment is lightweight and provides a preset of useful tools, such as a Python debugger (pdb), designed to facilitate an LLM-based agent's interactive debugging. 
Beyond coding and debugging tasks, this approach can be generalized to other tasks that would benefit from information-seeking behavior by an LLM agent.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computer science","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4408903581","title":"Relax: Composable Abstractions for End-to-End Dynamic Machine Learning","url":"https://doi.org/10.1145/3676641.3716249","published":"2025-03-27","authors":["Ruihang Lai","Junru Shao","S.M. Feng","Steven Lyubomirsky","Bohan Hou","Wuwei Lin","Zihao Ye","Hongyi Jin","Yuchen Jin","Jiawei Liu","Li-Jie Jin","Yaxing Cai"],"abstract":"Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven the demand for their universal deployment across a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and external library calls in a single representation. Relax also introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program, enabling dynamic shape-aware cross-level optimizations. 
We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on LLMs show that Relax delivers performance co...","companies":["OpenAI","NVIDIA"],"matched_orgs":["OpenAI","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3676641.3716249","openalex_id":"https://openalex.org/W4408903581","cited_by_count":8,"quality_score":57,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Netflix (United States)","Nvidia (United States)","OpenAI (United States)","Seattle University","Shanghai Jiao Tong University","University of Illinois Urbana-Champaign","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8142867088317871},{"id":"https://openalex.org/C74296488","display_name":"End-to-end principle","score":0.7060506939888},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4532164931297302},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.3812362253665924},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.26450079679489136}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"official:ec90e941c596e259","title":"Qwen2.5 Omni: See, Hear, Talk, Write, Do It All!","url":"https://qwenlm.github.io/blog/qwen2.5-omni/","published":"2025-03-27","authors":["Alibaba/Qwen"],"abstract":"QWEN CHAT HUGGING FACE MODELSCOPE DASHSCOPE GITHUB PAPER DEMO DISCORDWe release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. 
Designed for comprehensive multimodal perception, it seamlessly processes diverse inputs including text, images, audio, and video, while delivering real-time streaming responses through both text generation and natural speech synthesis. To try the latest model, feel free to visit Qwen Chat and choose Qwen2.5-Omni-7B. The model is now openly available on Hugging Face, ModelScope, DashScope,and GitHub, with technical documentation available in our Paper.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4408894550","title":"K <scp>lotski</scp> : Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline","url":"https://doi.org/10.1145/3676641.3716261","published":"2025-03-27","authors":["Zhiyuan Fang","Yuegui Huang","Zicong Hong","Yufeng Lyu","Wuhui Chen","Yue Yu","Yu Fan","Zibin Zheng"],"abstract":"Mixture of Experts (MoE), with its distinctive sparse structure, enables the scaling of language models up to trillions of parameters without significantly increasing computational costs. However, the substantial parameter size presents a challenge for inference, as the expansion in GPU memory cannot keep pace with the growth in parameters. 
Although offloading techniques utilise memory from the CPU and disk and parallelise the I/O and computation for efficiency, the computation for each expert in MoE models is often less than the I/O, resulting in numerous bubbles in the pipeline.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3676641.3716261","openalex_id":"https://openalex.org/W4408894550","cited_by_count":2,"quality_score":47,"matched_keywords":["memory","efficient"],"author_affiliations":["Hong Kong University of Science and Technology","Huawei Technologies (China)","Peng Cheng Laboratory","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.7522091865539551},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6906799077987671},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6811458468437195},{"id":"https://openalex.org/C58328972","display_name":"Expert system","score":0.45628082752227783},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3193354606628418},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.2006680965423584}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408974494","title":"Bi-VLDoc: bidirectional vision-language modeling for visually-rich document understanding","url":"https://doi.org/10.1007/s10032-025-00518-w","published":"2025-03-27","authors":["Chuwei Luo","Guozhi Tang","Qi Zheng","Cong Yao","Lianwen Jin","Chenliang Li","Yang Xue","Luo 
Si"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10032-025-00518-w","openalex_id":"https://openalex.org/W4408974494","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6082913875579834},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.43371865153312683},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3891637921333313},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3874852657318115},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3298233151435852}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4408841099","title":"NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning","url":"https://doi.org/10.1109/tpami.2025.3554559","published":"2025-03-26","authors":["Bingqian Lin","Yunshuang Nie","Ziming Wei","Jiaqi Chen","Shikui Ma","Jianhua Han","Hang Xu","Xiaojun Chang","Xiaodan Liang"],"abstract":"Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. 
However, their predominant use in an offline manner usually suffers from substantial domain gap between the VLN task and the LLM training corpus. This paper proposes a novel strategy called Navigational Chain-of-Thought (NavCoT), where we fulfill parameter-efficient in-domain training to enable self-guided navigational decision, leading to a significant mitigation of the domain gap in a cost-effective manner. Specifically, at each timestep, the LLM is prompted to forecast the navigational chain-of-thought by: 1) acting as a world model t...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3554559","openalex_id":"https://openalex.org/W4408841099","cited_by_count":26,"quality_score":75,"matched_keywords":["LLM","efficient","agent"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (Sweden)","Quanta Computer (China)","Sun Yat-sen University","University of Hong Kong","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.82793128490448},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7353742718696594},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6996966600418091},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43712037801742554},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.35928115248680115},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3372272253036499}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":26}},{"id":"apple:gxf6hs7yle77a4mkigj1p4wk","title":"ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities","url":"https://machinelearning.apple.com/research/toolsandbox-stateful-conversational-llm-benchmark","published":"2025-03-26","authors":["Jiarui Lu","Thomas Holleis","Yizhe Zhang","Bernhard Aumayer","Feng Nan","Felix Bai","Shuang Ma","Shen Ma","Mengyu Li","Guoli Yin","Zirui Wang","Ruoming Pang"],"abstract":"Recent large language models (LLMs) advancements sparked a growing research interest in tool assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on either evaluating over stateless web services (RESTful API), based on a single turn user prompt, or an off-policy dialog trajectory, ToolSandbox includes stateful tool execution, implicit state dependencies...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4408868638","title":"Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero","url":"https://doi.org/10.1073/pnas.2406675122","published":"2025-03-26","authors":["Lisa Schut","Nenad Tomašev","Thomas McGrath","Demis Hassabis","Ulrich Paquet","Been Kim"],"abstract":"AI systems have attained superhuman performance across various domains. If the hidden knowledge encoded in these highly capable systems can be leveraged, human knowledge and performance can be advanced. 
Yet, this internal knowledge is difficult to extract. Due to the vast space of possible internal representations, searching for meaningful new conceptual knowledge can be like finding a needle in a haystack. Here, we introduce a method that extracts new chess concepts from AlphaZero, an AI system that mastered chess via self-play without human supervision. Our method excavates vectors that represent concepts from AlphaZero's internal representations using convex optimization, and filters the concepts based on teachability (whether the concept is transferable to another AI agent) and novelty (whether the concept contains information not present in human chess games). These steps ensure tha...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1073/pnas.2406675122","openalex_id":"https://openalex.org/W4408868638","cited_by_count":4,"quality_score":45,"matched_keywords":["agent"],"author_affiliations":["Google (United States)","Google DeepMind (United Kingdom)","Sunfire (Germany)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.6888227462768555},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6680716276168823},{"id":"https://openalex.org/C2778738651","display_name":"Novelty","score":0.5540892481803894},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5267595648765564},{"id":"https://openalex.org/C105409693","display_name":"Human intelligence","score":0.42861780524253845},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.41002869606018066},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.35317742824554443},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.3432864844799042}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408922224","title":"IR-GPT: AI Foundation Models to Optimize Interventional Radiology","url":"https://doi.org/10.1007/s00270-024-03945-0","published":"2025-03-26","authors":["Jacqueline Brenner","James T Anibal","Lindsey Hazen","Miranda J. Song","Hannah Huth","Daguang Xu","Sheng Xu","Bradford J. Wood"],"abstract":"Foundation artificial intelligence (AI) models are capable of complex tasks that involve text, medical images, and many other types of data, but have not yet been customized for procedural medicine. This report reviews prior work in deep learning related to interventional radiology (IR), identifying barriers to generalization and deployment at scale. Moreover, this report outlines the potential design of an \"IR-GPT\" foundation model to provide a unified platform for AI in IR, including data collection, annotation, and training methods-while also contextualizing challenges and highlighting potential downstream applications.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1007/s00270-024-03945-0","openalex_id":"https://openalex.org/W4408922224","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["National Institutes of Health","National Institutes of Health Clinical Center","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6844308376312256},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6398090124130249},{"id":"https://openalex.org/C108583219","display_name":"Deep 
learning","score":0.6349406242370605},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.6128379106521606},{"id":"https://openalex.org/C513090587","display_name":"Interventional radiology","score":0.5420668125152588},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5186666250228882},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.450950562953949},{"id":"https://openalex.org/C126838900","display_name":"Radiology","score":0.4249931275844574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cheap-permutation-testing","title":"Cheap Permutation Testing","url":"https://www.microsoft.com/en-us/research/publication/cheap-permutation-testing/","published":"2025-03-25","authors":["Carles Domingo-Enrich","Raaz Dwivedi","Lester Mackey"],"abstract":"Permutation tests are a popular choice for distinguishing distributions and testing independence, due to their exact, finite-sample control of false positives and their minimax optimality when paired with U-statistics. However, standard permutation tests are also expensive, requiring a test statistic to be computed hundreds or thousands of times to detect a separation between distributions. In this work, we offer a simple approach to accelerate testing: group your datapoints into bins and permute only those bins. For U and V-statistics, we prove that these cheap permutation tests have two remarkable properties. First, by storing appropriate sufficient statistics, a cheap test can be run in time comparable to evaluating a single test statistic. Second, cheap permutation power closely approximates standard permutation power. 
As a result, cheap tests inherit the exact false positive control...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation","Machine learning","mathematics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:b4d247ef2eff6e29","title":"Addendum to GPT-4o System Card: 4o image generation","url":"https://openai.com/index/gpt-4o-image-generation-system-card-addendum","published":"2025-03-25","authors":["OpenAI"],"abstract":"4o image generation is a new, significantly more capable image generation approach than our earlier DALL·E 3 series of models. It can create photorealistic output. 
It can take images as inputs and transform them.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W4413556753","title":"Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes","url":"https://doi.org/10.1109/3dv66043.2025.00093","published":"2025-03-25","authors":["Thomas Wimmer","Michael Oechsle","Michael Niemeyer","Federico Tombari"],"abstract":"State-of-the-art novel view synthesis methods achieve impressive results for multi-view captures of static 3D scenes. However, the reconstructed scenes still lack “liveliness,“ a key component for creating engaging 3D experiences. Recently, novel video diffusion models generate realistic videos with complex motion and enable animations of 2D images, however they cannot naively be used to animate 3D scenes as they lack multi-view consistency. To breathe life into the static world, we propose Gaussians2Life, a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation. Our key idea is to leverage powerful video diffusion models as the generative component of our model and to combine these with a robust technique to lift 2D videos into meaningful 3D motion. 
We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/3dv66043.2025.00093","openalex_id":"https://openalex.org/W4413556753","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)","Technical University of Munich"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7784041166305542},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.6462119221687317},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.5446431636810303},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5145139694213867},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.46175265312194824},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44571754336357117},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"apple:a4oxk9oimmegqp9443mmcavu","title":"UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing","url":"https://machinelearning.apple.com/research/univg-diffusion-model","published":"2025-03-24","authors":["Tsu-Jui Fu","Yusu Qian","Chen Chen","Wenze Hu","Zhe Gan","Yinfei Yang"],"abstract":"Text-to-Image (T2I) diffusion models have shown impressive results in generating visually compelling images following user prompts. 
Building on this, various methods further fine-tune the pre-trained T2I model for specific tasks. However, this requires separate model architectures, training designs, and multiple parameter sets to handle different tasks. In this paper, we introduce UniVG, a generalist diffusion model capable of supporting a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:d9816292466e4f18","title":"Qwen2.5-VL-32B: Smarter and Lighter","url":"https://qwenlm.github.io/blog/qwen2.5-vl-32b/","published":"2025-03-24","authors":["Alibaba/Qwen"],"abstract":"QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORDIntroduction At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention and positive feedback from the community. Building on the Qwen2.5-VL series, we continued to optimize the model using reinforcement learning and open-sourced the new VL model with the beloved 32B parameter scale under the Apache 2.0 license — Qwen2.5-VL-32B-Instruct. 
Compared to the previously released Qwen2.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/overcoming-vocabulary-mismatch-vocabulary-agnostic-teacher-guided-language-modeling","title":"Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling","url":"https://www.microsoft.com/en-us/research/publication/overcoming-vocabulary-mismatch-vocabulary-agnostic-teacher-guided-language-modeling/","published":"2025-03-23","authors":["Haebin Shin","Lei Ji","Xiao Liu","Yeyun Gong"],"abstract":"Using large teacher models to guide the training of smaller student models has become the prevailing paradigm for efficient and effective learning. However, vocabulary mismatches between teacher and student language models pose significant challenges in language modeling, resulting in divergent token sequences and output distributions. To overcome these limitations, we propose Vocabulary-agnostic Teacher Guided Language Modeling (VocAgnoLM), a novel approach that bridges the gap caused by vocabulary mismatch through two key methods: (1) Token-level Lexical Alignment, which aligns token sequences across mismatched vocabularies, and (2) Teacher Guided Loss, which leverages the loss of teacher model to guide effective student training. We demonstrate its effectiveness in language modeling with 1B student model using various 7B teacher models with different vocabularies. 
Notably, with Qwen2....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large teacher models","Machine learning","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:243","title":"SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration","url":"https://seed.bytedance.com/en/research/seedvr-seeding-infinity-in-diffusion-transformer-towards-generic-video-restoration","published":"2025-03-22","authors":["Jianyi Wang","Zhijie Lin","Meng Wei","Yang Zhao","Ceyuan Yang","Fei Xiao","Chen Change Loy","Lu Jiang"],"abstract":"Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild. Despite recent advances in diffusion-based restoration, these methods often face limitations in generation capability and sampling efficiency. In this work, we present SeedVR, a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. The core design of SeedVR lies in the shifted window attention that facilitates effective restoration on long video sequences. SeedVR further supports variable-sized windows near the boundary of both spatial and temporal dimensions, overcoming the resolution constraints of traditional window attention. 
Equipped with contemporary practices, including causal video autoencoder, mixed image and video training, and progressive training, SeedVR achieves highl...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","CVPR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/efficient-intent-based-filtering-for-multi-party-conversations-using-knowledge-distillation-from-llms","title":"Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs","url":"https://www.microsoft.com/en-us/research/publication/efficient-intent-based-filtering-for-multi-party-conversations-using-knowledge-distillation-from-llms/","published":"2025-03-21","authors":["Reem Gody","Mohamed Abdelghaffar","M. Jabreel","Ahmed Tawfik"],"abstract":"Large language models (LLMs) have showcased remarkable capabilities in conversational AI, enabling open-domain responses in chat-bots, as well as advanced processing of conversations like summarization, intent classification, and insights generation. However, these models are resource-intensive, demanding substantial memory and computational power. To address this, we propose a cost-effective solution that filters conversational snippets of interest for LLM processing, tailored to the target downstream application, rather than processing every snippet. 
In this work, we introduce an innovative approach that leverages knowledge distillation from LLMs to develop an intent-based filter for multi-party conversations, optimized for compute power constrained environments. Our method combines different strategies to create a diverse multi-party conversational dataset, that is annotated with the....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Article (Journal)","Artificial intelligence","Human language technologies","Computer science","LLM","memory","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4408723695","title":"Emission Factor Recommendation for Life Cycle Assessments with Generative AI","url":"https://doi.org/10.1021/acs.est.4c12667","published":"2025-03-21","authors":["Bharathan Balaji","Fahimeh Ebrahimi","Nina G. G. Domingo","Venkata Sai Gargeya Vunnava","Abu-Zaher Faridee","Soma Ramalingam","Shikha Gupta","Anran Wang","Harsh Gupta","Domenic Belcastro","Kellen Axten","Jeremie Hakian"],"abstract":"Accurately quantifying greenhouse gas (GHG) emissions is crucial for organizations to measure and mitigate their environmental impact. Life cycle assessment (LCA) estimates the environmental impacts throughout a product's entire lifecycle, from raw material extraction to end-of-life. Measuring the emissions outside a product owner's control is challenging, and practitioners rely on emission factors (EFs)─estimations of GHG emissions per unit of activity─to model and estimate indirect impacts. 
However, the current practice of manually selecting appropriate EFs from databases is time-consuming and error-prone and requires expertise. We present an AI-assisted method leveraging natural language processing and machine learning to automatically recommend EFs with human-interpretable justifications. Our algorithm can assist experts by providing a ranked list of EFs or operating in a fully autom...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1021/acs.est.4c12667","openalex_id":"https://openalex.org/W4408723695","cited_by_count":14,"quality_score":51,"matched_keywords":[],"author_affiliations":["Amazon (Germany)","Amazon (United States)","University of British Columbia"],"concepts":[{"id":"https://openalex.org/C2781039887","display_name":"Factor (programming language)","score":0.5451086163520813},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5246500968933105},{"id":"https://openalex.org/C2778706760","display_name":"Life-cycle assessment","score":0.5125565528869629},{"id":"https://openalex.org/C39432304","display_name":"Environmental science","score":0.41715970635414124},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.37391895055770874},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.325401246547699},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.08891752362251282},{"id":"https://openalex.org/C139719470","display_name":"Macroeconomics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":14}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/deduce-deductive-consistency-as-a-frame-work-to-evaluate-llm-reasoning","title":"DEDUCE: Deductive Consistency as a Frame Work to Evaluate LLM Reasoning","url":"https://www.microsoft.com/en-us/research/publication/deduce-deductive-consistency-as-a-frame-work-to-evaluate-llm-reasoning/","published":"2025-03-20","authors":["Atharva Pandey","Kshitij Dubey","Rahul Sharma","Amit Sharma"],"abstract":"Despite great performance on Olympiad-level reasoning problems, frontier large language models can still struggle on high school math. We study the nature of language models’ (LM) reasoning by analyzing their chain-of-thought traces. To avoid memorization issues, we present a framework that can evaluate reasoning of LMs over novel, perturbed versions of benchmark problems. Formally, we compare LMs to ideal deductive reasoners that given a set of premises, can provide valid conclusions over any number of reasoning hops. To assess reasoning performance beyond final accuracy, we introduce deductive consistency, a metric that evaluates the correctness of system’s reasoning across varying input premise lengths and the number of solution hops. Using this metric, we examine potential explanations for language models’ failures on novel problems. 
Through experiments on GSM8K and a synthetic datas...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","Machine learning","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/re-imagine-symbolic-benchmark-synthesis-for-reasoning-evaluation","title":"Re-Imagine: Symbolic Benchmark Synthesis for Reasoning Evaluation","url":"https://www.microsoft.com/en-us/research/publication/re-imagine-symbolic-benchmark-synthesis-for-reasoning-evaluation/","published":"2025-03-20","authors":["Xinnuo Xu","Rachel Lawrence","Kshitij Dubey","Atharva Pandey","Fabian Falck","Risa Ueno","Aditya Nori","Rahul Sharma","Amit Sharma","Javier González"],"abstract":"Recent Large Language Models (LLMs) have reported high accuracy on reasoning benchmarks. However, it is still unclear whether the observed results arise from true “reasoning” or from statistical recall of the training set. Inspired by the ladder of causation (Pearl, 2009) and its three levels (associations, interventions and counterfactuals), this paper introduces RE-IMAGINE: a framework to characterize a hierarchy of reasoning ability in LLMs, alongside an automated pipeline to generate problem variations at different levels of the hierarchy. By altering problems in an intermediate symbolic representation, RE-IMAGINE generates arbitrarily many problems that are not solvable using memorization alone. 
Moreover, the framework is general and can work across reasoning domains, including math, code, and logic. We demonstrate our framework on four widely-used benchmarks to evaluate several fam...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","Machine learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:216","title":"Multi-Reward as Condition for Instruction-based Image Editing","url":"https://seed.bytedance.com/en/research/multi-reward-as-condition-for-instruction-based-image-editing","published":"2025-03-20","authors":["Xin Gu","Ming Li","Libo Zhang","Fan Chen","Longyin Wen","Tiejian Luo","Sijie Zhu"],"abstract":"High-quality training triplets (instruction, original image, edited image) are essential for instruction-based image editing. Predominant training datasets (e.g., InsPix2Pix) are created using text-to-image generative models (e.g., Stable Diffusion, DALL-E) which are not trained for image editing. Accordingly, these datasets suffer from inaccurate instruction following, poor detail preserving, and generation artifacts. In this paper, we propose to address the training data quality issue with multi-perspective reward data instead of refining the ground-truth image quality. 
1) we first design a quantitative metric system based on best-in-class LVLM (Large Vision Language Model), i.e., GPT-4o in our case, to evaluate the generation quality from 3 perspectives, namely, instruction following, detail preserving, and generation quality. For each perspective, we collected quantitative score in 0...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","ICLR 2025","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4408652593","title":"Wenwang: Toward Effectively Generating Code Beyond Standalone Functions via Generative Pre-trained Models","url":"https://doi.org/10.1145/3725213","published":"2025-03-20","authors":["Hao Yu","Bo Shen","J. Y. Zhang","Shaoxin Lin","Lin Li","Guangtai Liang","Ying Li","Qianxiang Wang","Tao Xie"],"abstract":"Code generation models based on the pre-training and fine-tuning paradigm have been increasingly attempted by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. After being pre-trained on a large-scale corpus of code, a model is further fine-tuned with datasets specifically for the target downstream task, e.g., generating code from natural language description. The target code being generated can be classified into two types: a standalone function, i.e., a function that invokes or accesses only built-in functions and standard libraries, and a non-standalone function, i.e., a function that invokes or accesses user-defined functions or third-party libraries. 
To effectively generate code especially non-standalone functions (largely ignored by existing work), in this article, we present Wenwang, an approach to improving the capabili...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3725213","openalex_id":"https://openalex.org/W4408652593","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Huawei Technologies (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.853313684463501},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6687873005867004},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5733562707901001},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4416294991970062},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.39569753408432007},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3703630566596985},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408684125","title":"Introduction to the Special Issue on Large Language Models, Conversational Systems, and Generative AI in Health - Part 2","url":"https://doi.org/10.1145/3723454","published":"2025-03-20","authors":["Jiayu Zhou","Manas Gaur","Amir M. 
Rahmani","Sharath Chandra Guntuku","Xiaofan Jiang","Tristan Naumann"],"abstract":"Dialogue systems are designed to offer human users social support or functional services through natural language interactions. Traditional conversation research has put significant emphasis on a system’s response-ability, including its capacity to understand dialogue context and generate appropriate responses. However, the key element of proactive behavior—a crucial aspect of intelligent conversations—is often overlooked in these studies. Proactivity empowers conversational agents to lead conversations towards achieving pre-defined targets or fulfilling specific goals on the system side. Proactive dialogue systems are equipped with advanced techniques to handle complex tasks, requiring strategic and motivational interactions, thus representing a significant step towards artificial general intelligence. Motivated by the necessity and challenges of building proactive dialogue systems, we....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3723454","openalex_id":"https://openalex.org/W4408684125","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["California University of Pennsylvania","Columbia University","Microsoft (United States)","University of California, Irvine","University of Maryland, Baltimore County","University of Michigan","University of Pennsylvania"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7905921936035156},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5601692199707031},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.5066617131233215},{"id":"https://openalex.org/C188147891","display_name":"Cognitive 
science","score":0.44170263409614563},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3437036871910095},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3404727876186371},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3026810884475708},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.13429588079452515}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"hf-org-paper:stepfun-ai:2503.14935","title":"FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding","url":"https://huggingface.co/papers/2503.14935","published":"2025-03-19","authors":["StepFun"],"abstract":"","companies":["StepFun"],"matched_orgs":["StepFun"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","stepfun-ai"],"author_affiliations":["StepFun"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/stepfun-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/planrag-efficient-test-time-planning-for-retrieval-augmented-generation","title":"PlanRAG: Efficient Test-Time Planning for Retrieval Augmented Generation","url":"https://www.microsoft.com/en-us/research/publication/planrag-efficient-test-time-planning-for-retrieval-augmented-generation/","published":"2025-03-18","authors":["Prakhar Verma","Sukruta Prakash Midigeshi","Gaurav Sinha","Arno Solin","Nagarajan Natarajan","Amit Sharma"],"abstract":"We introduce PlanRAG, a novel framework that enables structured multi-hop 
reasoning in retrieval-augmented generation (RAG) through test-time reasoning plan generation. While existing approaches such as ReAct maintain reasoning chains within the language model's context window, we observe that this often leads to plan fragmentation and execution failures. Our key insight is that by isolating the reasoning plan as a directed acyclic graph (DAG) outside the LM's working memory, we can enable (1) systematic exploration of reasoning paths, (2) atomic subqueries enabling precise retrievals and grounding, and (3) efficiency through parallel execution and bounded context window utilization. Moreover, PlanRAG's modular design allows it to be integrated with existing RAG methods, thus providing a practical solution to improve current RAG systems. On standard multi-hop reasoning benchmarks, PlanRA...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01","language model","memory","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:186","title":"Hyper-Connections","url":"https://seed.bytedance.com/en/research/hyper-connections","published":"2025-03-18","authors":["Defa Zhu","Hongzhi Huang","Zihao Huang","Yutao Zeng","Yunyao Mao","Banggu Wu","Qiyang Min","Xun Zhou"],"abstract":"We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. 
This approach specifically addresses common drawbacks observed in residual connection variants, such as the seesaw effect between gradient vanishing and representation collapse. Theoretically, hyper-connections allow the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. We conduct experiments focusing on the pre-training of large language models, including dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments conducted on vision tasks also demonstrate similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems. External paper link: https://arxiv.org/ab...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","ICLR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4408563990","title":"RMR: A Relative Membership Risk Measure for Machine Learning Models","url":"https://doi.org/10.1109/tdsc.2025.3551921","published":"2025-03-18","authors":["Li Bai","Haibo Hu","Qingqing Ye","Jianliang Xu","Jin Li","Chengfang Fang","Jie Shi"],"abstract":"Privacy leakage poses a significant threat when machine learning foundation models trained on private data are released. One such threat is membership inference attacks (MIA), which determine whether a specific example was included in a model's training set. This article shifts focus from developing new MIA algorithms to measuring a model's risk under MIA. 
We introduce a novel metric, Relative Membership Risk (RMR), which assesses a model's MIA vulnerability from a comparative standpoint. RMR calculates the difference in prediction loss for training examples relative to a predefined reference model, enabling risk comparison across models without needing to delve into details like training strategy, architecture, or data distribution. We also explore the selection of the reference model and show that using a high-risk reference model enhances the accuracy of the RMR measure. To identify t...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tdsc.2025.3551921","openalex_id":"https://openalex.org/W4408563990","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Guangzhou University","Hong Kong Baptist University","Hong Kong Polytechnic University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.7100317478179932},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.695676326751709},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42667126655578613},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42616182565689087},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.30031776428222656}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4410536869","title":"Compressing Inside Generating: A Latent Domain Codec for AI-Generated Images","url":"https://doi.org/10.1109/dcc62719.2025.00051","published":"2025-03-18","authors":["Yuxu Chen","Zhenhao Sun","Yuliang Huang","Lei 
Deng","Wei Han","Bo Bai","Shiqi Wang"],"abstract":"Latent diffusion models (LDMs) have emerged as a prominent framework for image generation, consisting of a diffusion model $\\mathcal{M}$ and a VAE decoder $\\mathcal{D}$. High-quality image generation models are large and computationally intensive. As a result, image generation is typically performed on cloud servers, with the generated images then transmitted to edge devices.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/dcc62719.2025.00051","openalex_id":"https://openalex.org/W4410536869","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Central Research Institute","City University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.7624000310897827},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7614887952804565},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.49935150146484375},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.4489603638648987},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.44252875447273254},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4276471734046936},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.41490232944488525},{"id":"https://openalex.org/C9390403","display_name":"Computer 
hardware","score":0.20171892642974854}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2503.14734","title":"GR00T N1: An Open Foundation Model for Generalist Humanoid Robots","url":"https://huggingface.co/papers/2503.14734","published":"2025-03-18","authors":["NVIDIA","Johan Bjorck","Fernando Castañeda","Nikita Cherniadev","Xingye Da","Runyu Ding","Linxi \"Jim\" Fan","Yu Fang","Dieter Fox","Fengyuan Hu","Spencer Huang","Joel Jang"],"abstract":"General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy in the human world. A robot foundation model, trained on massive and diverse data sources, is essential for enabling the robots to reason about novel situations, robustly handle real-world variability, and rapidly learn new tasks. To this end, we introduce GR00T N1, an open foundation model for humanoid robots. GR00T N1 is a Vision-Language-Action (VLA) model with a dual-system architecture. The vision-language module (System 2) interprets the environment through vision and language instructions. The subsequent diffusion transformer module (System 1) generates fluid motor actions in real time. Both modules are tightly coupled and jointly trained end-to-end. 
We train GR00T N1 with a heterogeneous mixtu...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2503.14492","title":"Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control","url":"https://huggingface.co/papers/2503.14492","published":"2025-03-18","authors":["NVIDIA","Hassan Abu Alhaija","Jose Alvarez","Maciej Bala","Tiffany Cai","Tianshi Cao","Liz Cha","Joshua Chen","Mike Chen","Francesco Ferroni","Sanja Fidler","Dieter Fox"],"abstract":"We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge. In the design, the spatial conditional scheme is adaptive and customizable. It allows weighting different conditional inputs differently at different spatial locations. This enables highly controllable world generation and finds use in various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy to achieve real-time world generation with an NVIDIA GB200 NVL72 rack. 
To help accelerate research development in the field, we open-source our models and code at...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2503.15558","title":"Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning","url":"https://huggingface.co/papers/2503.15558","published":"2025-03-18","authors":["NVIDIA","Alisson Azzolini","Hannah Brandon","Prithvijit Chattopadhyay","Huayu Chen","Jinju Chu","Yin Cui","Jenna Diamond","Yifan Ding","Francesco Ferroni","Rama Govindaraju","Jinwei Gu"],"abstract":"Physical AI systems need to perceive, understand, and perform complex actions in the physical world. In this paper, we present the Cosmos-Reason1 models that can understand the physical world and generate appropriate embodied decisions (e.g., next step action) in natural language through long chain-of-thought reasoning processes. We begin by defining key capabilities for Physical AI reasoning, with a focus on physical common sense and embodied reasoning. To represent physical common sense, we use a hierarchical ontology that captures fundamental knowledge about space, time, and physics. For embodied reasoning, we rely on a two-dimensional ontology that generalizes across different physical embodiments. Building on these capabilities, we develop two multimodal large language models, Cosmos-Reason1-8B and Cosmos-Reason1-56B. 
We curate data and train our models in four stages: vision pre-tr...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"official:12420a646f9fe153","title":"reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs","url":"https://ai.meta.com/research/publications/rewordbench-benchmarking-and-improving-the-robustness-of-reward-models-with-transformed-inputs/","published":"2025-03-17","authors":["Zhaofeng Wu","Michihiro Yasunaga","Andrew Cohen","Yoon Kim","Asli Celikyilmaz","Marjan Ghazvininejad"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=6"}},{"id":"openalex:W4408500218","title":"Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models","url":"https://doi.org/10.1002/sta4.70054","published":"2025-03-17","authors":["David Selby","Yuichiro Iwashita","Kai Spriestersbach","Mohammad Saad","Dennis Bappert","Archana Warrier","Sumantrak Mukherjee","Koichi Kise","Sebastian J. 
Vollmer"],"abstract":"ABSTRACT Large language models (LLMs) have been extensively studied for their ability to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less well understood. Here, we explore the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid two data analysis tasks: elicitation of prior distributions for Bayesian models and imputation of missing data. We introduce a framework that leverages LLMs to enhance Bayesian workflows by eliciting expert‐like prior knowledge and imputing missing data. Tested on diverse datasets, this approach can improve predictive accuracy and reduce data requirements, offering significant potential in healthcare, environmental science and engineering applications. We discuss the implications and challenges of treating LLMs as ‘experts’.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/sta4.70054","openalex_id":"https://openalex.org/W4408500218","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)","German Research Centre for Artificial Intelligence","Osaka Metropolitan University","Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau","University of Kaiserslautern"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6681249141693115},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4585639238357544},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4561024308204651},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35268107056617737},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.34736156463623047}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reasoning-elicitation-in-language-models-via-counterfactual-feedback","title":"Reasoning Elicitation in Language Models via Counterfactual Feedback","url":"https://www.microsoft.com/en-us/research/publication/reasoning-elicitation-in-language-models-via-counterfactual-feedback/","published":"2025-03-15","authors":["Alihan Hüyük","Xinnuo Xu","Jacqueline Maasch","Aditya Nori","Javier González"],"abstract":"Despite the increasing effectiveness of language models, their reasoning capabilities remain underdeveloped. In particular, causal reasoning through counterfactual question answering is lacking. This work aims to bridge this gap. We first derive novel metrics that balance accuracy in factual and counterfactual questions, capturing a more complete view of the reasoning abilities of language models than traditional factual-only based metrics. Second, we propose several fine-tuning approaches that aim to elicit better reasoning mechanisms, in the sense of the proposed metrics. Finally, we evaluate the performance of the fine-tuned language models in a variety of realistic scenarios. 
In particular, we investigate to what extent our fine-tuning approaches systemically achieve better generalization with respect to the base models in several problems that require, among others, inductive and de...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4408476544","title":"Multi-Source Domain Adaptation by Causal-Guided Adaptive Multimodal Diffusion Networks","url":"https://doi.org/10.1007/s11263-025-02401-x","published":"2025-03-15","authors":["Ziyun Cai","Yawen Huang","Tengfei Zhang","Yefeng Zheng","Dong Yue"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-025-02401-x","openalex_id":"https://openalex.org/W4408476544","cited_by_count":16,"quality_score":53,"matched_keywords":[],"author_affiliations":["Nanjing University of Posts and Telecommunications","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6371498107910156},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.6366931200027466},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5466948747634888},{"id":"https://openalex.org/C2776434776","display_name":"Domain 
adaptation","score":0.5285372138023376},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5266540050506592},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.49127525091171265},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.43209075927734375},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.33580994606018066}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/api-agents-vs-gui-agents-divergence-and-convergence","title":"API Agents vs. GUI Agents: Divergence and Convergence","url":"https://www.microsoft.com/en-us/research/publication/api-agents-vs-gui-agents-divergence-and-convergence/","published":"2025-03-14","authors":["Chaoyun Zhang","Shilin He","Liqun Li","Si Qin","Yu Kang","Qingwei Lin 林庆维","Dongmei Zhang"],"abstract":"Large language models (LLMs) have evolved beyond simple text generation to power software agents that directly translate natural language commands into tangible actions. While API-based LLM agents initially rose to prominence for their robust automation capabilities and seamless integration with programmatic endpoints, recent progress in multimodal LLM research has enabled GUI-based LLM agents that interact with graphical user interfaces in a human-like manner. Although these two paradigms share the goal of enabling LLM-driven task automation, they diverge significantly in architectural complexity, development workflows, and user interaction models. This paper presents the first comprehensive comparative study of API-based and GUI-based LLM agents, systematically analyzing their divergence and potential convergence. 
We examine key dimensions and highlight scenarios in which hybrid approa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Aritificial Intelligence","Computer science","Software system","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:225","title":"ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance","url":"https://seed.bytedance.com/en/research/classdiffusion-more-aligned-personalization-tuning-with-explicit-class-guidance","published":"2025-03-14","authors":["Jiannan Huang","Jun Hao Liew","Hanshu Yan","Yuyang Yin","Yao Zhao","Humphrey Shi","Yunchao Wei"],"abstract":"Recent text-to-image customization works have proven successful in generating images of given concepts by fine-tuning diffusion models on a few examples. However, tuning-based methods inherently tend to overfit the concepts, resulting in failure to create the concept under multiple conditions (e.g., headphone is missing when generating \"a dog wearing a headphone\"). Interestingly, we notice that the base model before fine-tuning exhibits the capability to compose the base concept with other elements (e.g., \"a dog wearing a headphone\"), implying that the compositional ability only disappears after personalization tuning. 
We observe a semantic shift in the customized concept after fine-tuning, indicating that the personalized concept is not aligned with the original concept, and further show through theoretical analyses that this semantic shift leads to increased difficulty in sampling the....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computer Vision","Vision","ICLR 2025","personalized","personalization"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"arxiv:2503.11346","title":"AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation","url":"http://arxiv.org/abs/2503.11346","published":"2025-03-14","authors":["Fengyu Li","Yilin Li","Junhao Zhu","Lu Chen","Yanfei Zhang","Jia Zhou","Hui Zu","Jingwen Zhao","Yunjun Gao"],"abstract":"Huawei has always been committed to exploring the AI application in historical research. Biography generation, as a specialized form of abstractive summarization, plays a crucial role in historical research but faces unique challenges that existing large language models (LLMs) struggle to address. These challenges include maintaining stylistic adherence to historical writing conventions, ensuring factual fidelity, and handling fragmented information across multiple documents. We present AIstorian, a novel end-to-end agentic system featured with a knowledge graph (KG)-powered retrieval-augmented generation (RAG) and anti-hallucination multi-agents. 
Specifically, AIstorian introduces an in-context learning based chunking strategy and a KG-based index for accurate and efficient reference retrieval. Meanwhile, AIstorian orchestrates multi-agents to conduct on-the-fly hallucination detection....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2503.11346","openalex_id":"https://openalex.org/W4417285506","cited_by_count":0,"quality_score":57,"matched_keywords":["preference","retrieval","efficient","agent","multi-agent"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United States)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C203357204","display_name":"Chunking (psychology)","score":0.7335000038146973},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5997999906539917},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5489000082015991},{"id":"https://openalex.org/C520712124","display_name":"Biography","score":0.5037000179290771},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4871000051498413},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.413100004196167},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.3702999949455261},{"id":"https://openalex.org/C74672266","display_name":"Language acquisition","score":0.33889999985694885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:h2yh3rl8ao9osmmjvmioiycp","title":"Exploring Prediction Targets in Masked Pre-Training for Speech Foundation 
Models","url":"https://machinelearning.apple.com/research/exploring-prediction-targets","published":"2025-03-14","authors":["Li-Wei Chen","Takuya Higuchi","He Bai","Ahmed Hussen Abdelaziz","Alexander Rudnicky","Shinji Watanabe","Tatiana Likhomanenko","Barry-John Theobald","Zakaria Aldeneh"],"abstract":"Speech foundation models, such as HuBERT and its variants, are pre-trained on large amounts of unlabeled speech data and then used for a range of downstream tasks. These models use a masked prediction objective, where the model learns to predict information about masked input segments from the unmasked context. The choice of prediction targets in this framework impacts their performance on downstream tasks. For instance, models pre-trained with...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4408441875","title":"Deformable Graph Transformer","url":"https://doi.org/10.1109/tpami.2025.3550281","published":"2025-03-14","authors":["Jinyoung Park","Seongjun Yun","Hyeonjin Park","Jaewoo Kang","Jisu Jeong","Kyung-Min Kim","Jung-Woo Ha","Hyunwoo J. Kim"],"abstract":"Transformer-based models have recently shown success in representation learning on graph-structured data beyond natural language processing and computer vision. However, the success is limited to small-scale graphs due to the drawbacks of full dot-product attention on graphs such as the quadratic complexity with respect to the number of nodes and message aggregation from enormous irrelevant nodes. 
To address these issues, we propose Deformable Graph Transformer (DGT) that performs sparse attention via dynamically selected relevant nodes for efficiently handling large-scale graphs with a linear complexity in the number of nodes. Specifically, our framework first constructs multiple node sequences with various criteria to consider both structural and semantic proximity. Then, combining with our learnable Katz Positional Encodings, the sparse attention is applied to the node sequences for l...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3550281","openalex_id":"https://openalex.org/W4408441875","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Korea Advanced Institute of Science and Technology","Korea University","Naver (South Korea)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6001606583595276},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5693214535713196},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.41749411821365356},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4101705849170685},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3909594416618347},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.17038163542747498},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.13712483644485474},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":7}},{"id":"openalex:W4416874201","title":"Machine Learning Pre-trained Language Models for English-French Neural Machine Translation using Topsis","url":"https://doi.org/10.1109/inc465408.2025.11256193","published":"2025-03-14","authors":["Arnav Jain","Sai Priyamka Kotha","Satish Bhambri","Harshit Kohli"],"abstract":"Neural machine translation (NMT) has revolutionized cross lingual machine learning models and yet a robust evaluation framework is still necessary for model selection. This research implements a multi-parameter decision-making framework for assessing five pre-trained NMT models using the TOPSIS methodology. The framework utilizes both qualitative metrics like BLEU and ROUGE scores as well as operational parameters (model size and computational latency) to assess the models. Further, by utilizing the TOPSIS method, models are ranked objectively by measuring how close they are to the best possible option. This study introduces a data-backed way to effectively compare NMT models, making it easy to choose the right model for real world translation work.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/inc465408.2025.11256193","openalex_id":"https://openalex.org/W4416874201","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Thapar Institute of Engineering & Technology","University of Denver","Walmart (United States)"],"concepts":[{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.8536999821662903},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7670000195503235},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7440000176429749},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.7035999894142151},{"id":"https://openalex.org/C51566761","display_name":"TOPSIS","score":0.680400013923645},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5382000207901001},{"id":"https://openalex.org/C135784402","display_name":"Evaluation of machine translation","score":0.4763000011444092},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.36899998784065247}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evidence-aggregator-ai-reasoning-applied-to-rare-disease-diagnostics","title":"Evidence Aggregator: AI reasoning applied to rare disease diagnostics","url":"https://www.microsoft.com/en-us/research/publication/evidence-aggregator-ai-reasoning-applied-to-rare-disease-diagnostics/","published":"2025-03-13","authors":["Hope Twede","Ashley Conard","Lynn Pais","Samantha Brye","Emily O’Heir","Greg Smith","Ron Paulsen","Christina A. Austin-Tse","Alex Bloemendal","Cas Simons","Scott Saponas","Miah Wander"],"abstract":"Retrieving, reviewing, and synthesizing technical information can be time-consuming andchallenging, particularly when requiring specialized expertise, as is the case of variant assessmentfor rare disease diagnostics. To address this challenge, we developed the Evidence Aggregator(EvAgg), a generative AI tool designed for rare disease diagnosis that systematically extractsrelevant information from the scientific literature for any human gene. EvAgg provides a thoroughand current summary of observed genetic variants and their associated clinical features, enablingrapid synthesis of evidence concerning gene-disease relationships. 
EvAgg demonstrates strongbenchmark performance, achieving 97% recall in identifying relevant papers, 92% recall indetecting instances of genetic variation within those papers, and ~80% accuracy in extractingindividual case and variant-level content (e.g. zygosity,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Medical, health and genomics","Medical diagnosis","Medicine"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4408402975","title":"The Evidence Aggregator: AI reasoning applied to rare disease diagnostics","url":"https://doi.org/10.1101/2025.03.10.642480","published":"2025-03-13","authors":["Hope Twede","Lynn Pais","Samantha J. Bryen","Emily O’Heir","Gregory G. Smith","Ron Paulsen","Christina A Austin-Tse","Alex Bloemendal","Cas Simons","Amanda K. Hall","Scott Saponas","Jeremiah Wander"],"abstract":"Abstract Variant assessment of rare disease diagnostics depends on using domain knowledge in the time- consuming process of retrieving, reviewing, and synthesizing clinical and technical information. To address these challenges, we developed the Evidence Aggregator (EvAgg), an open-source, generative-AI-based tool designed for rare disease diagnosis that systematically extracts relevant information from the scientific literature for any human gene. EvAgg provides a thorough and current summary of observed genetic variants and their associated clinical features, enabling rapid synthesis of evidence concerning gene-disease relationships. 
We constructed an expert-curated dataset and evaluated EvAgg’s performance. EvAgg achieves 92% recall in identifying relevant papers, 96% recall in detecting instances of genetic variation within those papers, and ∼80% accuracy in extracting individual cas...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.03.10.642480","openalex_id":"https://openalex.org/W4408402975","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Broad Institute","Garvan Institute of Medical Research","Mass General Brigham","Massachusetts General Hospital","Microsoft (United States)","Murdoch Children's Research Institute"],"concepts":[{"id":"https://openalex.org/C180505990","display_name":"News aggregator","score":0.8502172231674194},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4162839651107788},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3754960298538208},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.0764683187007904}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408404750","title":"Attribute-Centric Compositional Text-to-Image Generation","url":"https://doi.org/10.1007/s11263-025-02371-0","published":"2025-03-13","authors":["Yuren Cong","Martin Renqiang Min","Li Erran Li","Bodo Rosenhahn","Michael Ying Yang"],"abstract":"Abstract Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. 
To tackle this challenge, we propose ACTIG , an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and o...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-025-02371-0","openalex_id":"https://openalex.org/W4408404750","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Bath"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.61316978931427},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5669187903404236},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5296242833137512},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5281602740287781},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.4455220103263855},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.37873080372810364},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3483653962612152},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34565508365631104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":3}},{"id":"arxiv:2503.09949","title":"UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?","url":"https://huggingface.co/papers/2503.09949","published":"2025-03-13","authors":["Yuanxin Liu","Rui Zhu","Shuhuai Ren","Jiacong Wang","Haoyuan Guo","Xu Sun","Lu Jiang"],"abstract":"With the rapid growth of video generative models (VGMs), it is essential to develop reliable and comprehensive automatic metrics for AI-generated videos (AIGVs). Existing methods either use off-the-shelf models optimized for other tasks or rely on human assessment data to train specialized evaluators. These approaches are constrained to specific evaluation aspects and are difficult to scale with the increasing demands for finer-grained and more comprehensive evaluations. To address this issue, this work investigates the feasibility of using multimodal large language models (MLLMs) as a unified evaluator for AIGVs, leveraging their strong visual perception and language understanding capabilities. To evaluate the performance of automatic metrics in unified AIGV evaluation, we introduce a benchmark called UVE-Bench. 
UVE-Bench collects videos generated by state-of-the-art VGMs and provides p...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["preference"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4408353548","title":"Exploring Large Language Models for Knowledge Graph Completion","url":"https://doi.org/10.1109/icassp49660.2025.10889242","published":"2025-03-12","authors":["Liang Yao","Jiazhen Peng","Chengsheng Mao","Yuan Luo"],"abstract":"Knowledge graphs play a vital role in numerous artificial intelligence tasks, yet they frequently face the issue of incompleteness. In this study, we explore utilizing Large Language Models (LLM) for knowledge graph completion. We consider triples in knowledge graphs as text sequences and introduce an innovative framework called Knowledge Graph LLM (KG-LLM) to model these triples. Our technique employs entity and relation descriptions of a triple as prompts and utilizes the response for predictions. Experiments on various benchmark knowledge graphs demonstrate that our method attains state-of-the-art performance in tasks such as triple classification and relation prediction. 
We also find that fine-tuning relatively smaller models (e.g., LLaMA-7B, ChatGLM-6B) outperforms recent ChatGPT and GPT-4.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889242","openalex_id":"https://openalex.org/W4408353548","cited_by_count":37,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Northwestern University","Sun Yat-sen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7524710893630981},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5776391625404358},{"id":"https://openalex.org/C2779538338","display_name":"Completion (oil and gas wells)","score":0.4500119090080261},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.44788074493408203},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4095459282398224},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35593557357788086},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.29075151681900024},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.06795081496238708}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":37}},{"id":"openalex:W4408352607","title":"Speech Recognition with LLMs Adapted to Disordered Speech Using Reinforcement Learning","url":"https://doi.org/10.1109/icassp49660.2025.10888006","published":"2025-03-12","authors":["Chirag Nagpal","Subhashini Venugopalan","Jimmy Tobin","Marilyn Ladewig","Katherine Heller","Katrin Tomanek"],"abstract":"We introduce a large language model (LLM) capable of 
processing speech inputs and show that tuning it further with reinforcement learning on human preference (RLHF) enables it to adapt better to disordered speech than traditional fine-tuning. Our method replaces low-frequency text tokens in an LLM’s vocabulary with audio tokens and enables the model to recognize speech by fine-tuning it on speech with transcripts. We then use RL with rewards based on syntactic and semantic accuracy measures generalizing the LLM further to recognize disordered speech. While the resulting LLM does not outperform existing systems for speech recognition we find that tuning with reinforcement learning using custom rewards leads to substantially better performance than supervised fine-tuning of the language model, specifically when adapting to speech in a different setting. This presents a compelling alternati...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888006","openalex_id":"https://openalex.org/W4408352607","cited_by_count":4,"quality_score":53,"matched_keywords":["LLM","language model","preference"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6294870376586914},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6180742383003235},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5847197771072388},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.4379716217517853},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34048861265182495},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3284892439842224},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.20998308062553406},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.05290922522544861}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408354307","title":"Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0","url":"https://doi.org/10.1109/icassp49660.2025.10888990","published":"2025-03-12","authors":["Zhiyong Wang","Ruibo Fu","Zhengqi Wen","Jianhua Tao","Xiaopeng Wang","Yuankun Xie","Xin Qi","Shuchen Shi","Yi Lu","Yukun Liu","Chenxing Li","Xuefei Liu"],"abstract":"Speech synthesis technology has posed a serious threat to speaker verification systems. Currently, the most effective fake audio detection methods utilize pretrained models, and integrating features from various layers of pretrained model further enhances detection performance. However, most of the previously proposed fusion methods require fine-tuning the pretrained models, resulting in excessively long training times and hindering model iteration when facing new speech synthesis technology. To address this issue, this paper proposes a feature fusion method based on the Mixture of Experts, which extracts and integrates features relevant to fake audio detection from layer features, guided by a gating network based on the last layer feature, while freezing the pretrained model. 
Experiments conducted on the ASVspoof2019 and ASVspoof2021 datasets demonstrate that the proposed method achieve...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888990","openalex_id":"https://openalex.org/W4408354307","cited_by_count":16,"quality_score":53,"matched_keywords":[],"author_affiliations":["Institute of Automation","Shanghai Polytechnic University","Tencent (China)","Tsinghua University","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6185110807418823},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5765817761421204},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.4455462694168091},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3185986280441284},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4408352440","title":"Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference","url":"https://doi.org/10.1109/icassp49660.2025.10888202","published":"2025-03-12","authors":["Edresson Casanova","Ryan Langman","Paarth Neekhara","Shehzeen Hussain","Jason Li","Subhankar Ghosh","Ante Jukić","Sang‐Gil Lee"],"abstract":"Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modeling techniques to audio data. 
However, audio codecs often operate at high frame rates, resulting in slow training and inference, especially for autoregressive models. To address this challenge, we present the Low Frame-rate Speech Codec (LFSC): a neural audio codec that leverages finite scalar quantization and adversarial training with large speech language models to achieve high-quality audio compression with a 1.89 kbps bitrate and 21.5 frames per second. We demonstrate that our novel codec can make the inference of LLM-based text-to-speech models around three times faster while improving intelligibility and producing quality comparable to previous models.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888202","openalex_id":"https://openalex.org/W4408352440","cited_by_count":3,"quality_score":52,"matched_keywords":["LLM","compression","quantization"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.901165246963501},{"id":"https://openalex.org/C177067256","display_name":"Adaptive Multi-Rate audio codec","score":0.8846442699432373},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8209570646286011},{"id":"https://openalex.org/C75217168","display_name":"Codec2","score":0.7837722897529602},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6824185252189636},{"id":"https://openalex.org/C13895895","display_name":"Speech coding","score":0.6582825779914856},{"id":"https://openalex.org/C108699837","display_name":"PSQM","score":0.6563190817832947},{"id":"https://openalex.org/C204201278","display_name":"Voice activity detection","score":0.5537570118904114}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":3}},{"id":"openalex:W4408353790","title":"Full-text Error Correction for Chinese Speech Recognition with Large Language Model","url":"https://doi.org/10.1109/icassp49660.2025.10890161","published":"2025-03-12","authors":["Zhiyuan Tang","Dong Wang","Shen Huang","Shidong Shang"],"abstract":"Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR). However, most research focuses on utterances from short-duration speech recordings, which are the predominant form of speech data for supervised ASR training. This paper investigates the effectiveness of LLMs for error correction in full-text generated by ASR systems from longer speech recordings, such as transcripts from podcasts, news broadcasts, and meetings. First, we develop a Chinese dataset for full-text error correction, named ChFT, utilizing a pipeline that involves text-to-speech synthesis, ASR, and error-correction pair extractor. 
This dataset enables us to correct errors across contexts, including both full-text and segment, and to address a broader range of error types, such as punctuation restoration and inverse text normalization, thus making the...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890161","openalex_id":"https://openalex.org/W4408353790","cited_by_count":3,"quality_score":52,"matched_keywords":["LLM","language model","news"],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8275356292724609},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.7509950995445251},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6717191934585571},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6349527835845947},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5633007884025574},{"id":"https://openalex.org/C103088060","display_name":"Error detection and correction","score":0.46585613489151},{"id":"https://openalex.org/C2983812711","display_name":"Text recognition","score":0.4244476556777954},{"id":"https://openalex.org/C3018824978","display_name":"Error analysis","score":0.4177432358264923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4408353705","title":"EagerLog: Active Learning Enhanced Retrieval Augmented Generation for Log-based Anomaly Detection","url":"https://doi.org/10.1109/icassp49660.2025.10888663","published":"2025-03-12","authors":["Chiming Duan","Tong Jia","Yong Yang","Guiyang Liu","Jinbu 
Liu","Huxing Zhang","Qi Zhou","Ying Li","Gang Huang"],"abstract":"Logs record essential information about system operations and serve as a critical source for anomaly detection, which has generated growing research interest. Utilizing large language models (LLMs) within a retrieval-augmented generation (RAG) framework for log-based anomaly detection is an effective approach due to its strong generalization capabilities and efficient few-shot performance. However, the effectiveness of this method hinges on the quality of the knowledge source, which can be impacted by noise and changes within the software systems. Facing these problems, in this paper, we propose a novel log-based anomaly detection method named EagerLog, employing active learning to choose the logs for humans to label, thereby adding them to the knowledge source, thus enhancing the knowledge source and maintaining its quality. Our experiments on three open datasets (BGL, Thunderbird, Zook...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888663","openalex_id":"https://openalex.org/W4408353705","cited_by_count":6,"quality_score":51,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Alibaba Group (China)","Beijing Academy of Artificial Intelligence","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7005580067634583},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5401754379272461},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3701680302619934},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.32773557305336}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4408353241","title":"Preference Alignment Improves Language Model-Based TTS","url":"https://doi.org/10.1109/icassp49660.2025.10890510","published":"2025-03-12","authors":["Jinchuan Tian","Chunlei Zhang","Jiatong Shi","Hao Zhang","Jianwei Yu","Shinji Watanabe","Dong Yu"],"abstract":"Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based systems offer competitive performance to their counterparts. Further optimization can be achieved through preference alignment algorithms, which adjust LMs to align with the preferences of reward models, enhancing the desirability of the generated content. This study presents a thorough empirical evaluation of how preference alignment algorithms, particularly Direct Preference Optimization (DPO), enhance LM-based TTS. With a 1.15B parameter LM-based TTS model, we demonstrate that preference alignment consistently improves intelligibility, speaker similarity, and proxy subjective evaluation scores, with the latter two metrics surpassing even human speech in certain evaluations. 
We also show preference alignment is applicable to low-resource scenarios and effectively generalized to out-of-domain applicati...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890510","openalex_id":"https://openalex.org/W4408353241","cited_by_count":4,"quality_score":49,"matched_keywords":["language model","preference"],"author_affiliations":["Carnegie Mellon University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7455988526344299},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.4841959774494171},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4541049897670746},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4204499423503876},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4109998345375061},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.0759023129940033},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.06539109349250793}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408353035","title":"InstantSpeech: Instant Synchronous Text-to-Speech Synthesis for LLM-driven Voice Chatbots","url":"https://doi.org/10.1109/icassp49660.2025.10890120","published":"2025-03-12","authors":["Muyang Du","Chuan Liu","Junjie Lai"],"abstract":"Chatbots powered by large language models (LLMs) offer natural, human-like interactions. 
However, traditional text-to-speech (TTS) models paired with LLMs typically wait for the entire sentence to be generated before starting synthesis, leading to increased response latency. Although word-by-word speech synthesis models have been proposed to address this issue, they still face challenges, such as relying on autoregressive architectures to maintain smooth transitions between words or conditioning on auxiliary features from LLM for naturalness. To overcome these limitations, we introduce InstantSpeech, a novel low-latency synchronous speech synthesis model. InstantSpeech employs a fully parallel architecture, combining a causal transformer-based acoustic model with a causal convolution-based vocoder, enabling it to start streaming speech synthesis immediately after the LLM generates the in...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890120","openalex_id":"https://openalex.org/W4408353035","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","distillation"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7517886161804199},{"id":"https://openalex.org/C2779432360","display_name":"Instant","score":0.70116126537323},{"id":"https://openalex.org/C2985487447","display_name":"Instant messaging","score":0.6599740386009216},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5954996347427368},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.5845544338226318},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.434037446975708},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.4299493730068207},{"id":"https://openalex.org/C136764020","display_name":"World Wide 
Web","score":0.19700339436531067}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408355411","title":"Large Language Model Should Understand Pinyin for Chinese ASR Error Correction","url":"https://doi.org/10.1109/icassp49660.2025.10887651","published":"2025-03-12","authors":["Yuang Li","Xiaosong Qiao","Xiaofeng Zhao","Huan Zhao","Wei Tang","Min Zhang","Hao Yang"],"abstract":"Large language models (LLMs) can enhance automatic speech recognition (ASR) systems through generative error correction (GEC). In this paper, we propose Pinyin-enhanced GEC (PY-GEC), which leverages Pinyin—the phonetic representation of Mandarin Chinese—as supplementary information to improve Chinese ASR error correction. Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference. Additionally, we introduce a multitask training approach involving conversion tasks between Pinyin and text to align their feature spaces. Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input. 
More importantly, we provide intuitive explanations for the effectiveness of PY-GEC and multitask training from two aspects: 1) increased attention weight on Pinyin features; and 2) alig...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10887651","openalex_id":"https://openalex.org/W4408355411","cited_by_count":5,"quality_score":46,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2781095461","display_name":"Pinyin","score":0.9662773609161377},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8174457550048828},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6344050765037537},{"id":"https://openalex.org/C103088060","display_name":"Error detection and correction","score":0.5398305058479309},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48676395416259766},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4539785087108612},{"id":"https://openalex.org/C3018428822","display_name":"Chinese language","score":0.4172631502151489},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3392001688480377}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4408354084","title":"CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models","url":"https://doi.org/10.1109/icassp49660.2025.10889170","published":"2025-03-12","authors":["Yimin Wang","Dehong Gao","Bin Li","Rujiao Long","Yi Lei","Xiaoyan Cai","Libin Yang","Jinxia Zhang","Shanqing Yu","Qi Xuan"],"abstract":"The impressive 
performance of Large Language Model (LLM) has prompted researchers to develop Multi-modal LLM (MLLM), which has shown great potential for various multi-modal tasks. However, current MLLM often struggles to effectively address fine-grained multi-modal challenges. We argue that this limitation is closely linked to the models’ visual grounding capabilities. The restricted spatial awareness and perceptual acuity of visual encoders frequently lead to interference from irrelevant background information in images, causing the models to overlook subtle but crucial details. As a result, achieving fine-grained regional visual comprehension becomes difficult. In this paper, we break down multi-modal understanding into two stages, from Coarse to Fine (CoF). In the first stage, we prompt the MLLM to locate the approximate area of the answer. In the second stage, we further enhance the....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889170","openalex_id":"https://openalex.org/W4408354084","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","Northwestern Polytechnical University","Southeast University","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7413160800933838},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.722237229347229},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4902358651161194},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4338744282722473},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.40135183930397034},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.18640556931495667},{"id":"https://openalex.org/C159985019","display_name":"Composite material","score":0.07360753417015076}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408352352","title":"MotionComposer: Enhancing Rhythmic Music Generation with Adaptive Retrieval Reference","url":"https://doi.org/10.1109/icassp49660.2025.10889094","published":"2025-03-12","authors":["Jinting Wang","Li Liu","Jun Wang"],"abstract":"With the rise of the AIGC era, rhythmic music generation has extensive applications, particularly with the surge in motion video creation. However, generating music that is rhythmically synchronized and stylistically aligned with motion video presents significant challenges. Although existing methods have made progress, they still face difficulties in producing high-quality long-term music, particularly when addressing complex rhythmic patterns and maintaining style-consistent musical chords. In this work, we present MotionComposer, a novel retrieval-augmented, easy-to-hard training approach designed to enhance rhythmic music generation. By leveraging the inherent alignment between motion rhythms and music beats, we first tackle the simpler task of beat prediction with BeatNet, which predicts music beats by analyzing motion patterns. 
To address the complex musical chord generation, we pr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889094","openalex_id":"https://openalex.org/W4408352352","cited_by_count":0,"quality_score":45,"matched_keywords":["long-term","retrieval"],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7285225987434387},{"id":"https://openalex.org/C135343436","display_name":"Rhythm","score":0.7148469686508179},{"id":"https://openalex.org/C2777946086","display_name":"Music information retrieval","score":0.5873817801475525},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4937210977077484},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.33022984862327576},{"id":"https://openalex.org/C558565934","display_name":"Musical","score":0.1972900629043579},{"id":"https://openalex.org/C24890656","display_name":"Acoustics","score":0.08978152275085449},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408353321","title":"Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance","url":"https://doi.org/10.1109/icassp49660.2025.10887608","published":"2025-03-12","authors":["Xuchan Bao","Judith Yue Li","Zhong Wan","Kun Su","Timo I. Denk","Joonseok Lee","Dima Kuzmin","Fei Sha"],"abstract":"Modern music retrieval systems often rely on fixed representations of user preferences, limiting their ability to capture users’ diverse and uncertain retrieval needs. 
To address this limitation, we introduce Diff4Steer, a novel generative retrieval framework that employs lightweight diffusion models to synthesize diverse seed embeddings representing potential directions for music exploration. Unlike deterministic methods that map user query to a single point in embedding space, Diff4Steer provides a statistical prior on the target modality (audio) for retrieval, effectively capturing the uncertainty and multi-faceted nature of user preferences. Furthermore, Diff4Steer can be steered by image or text inputs, enabling more flexible and controllable music discovery combined with nearest neighbor search. Our framework outperforms deterministic regression methods and LLM-based generative ret...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10887608","openalex_id":"https://openalex.org/W4408353321","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Google (United States)","University of Toronto"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7410731315612793},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6650985479354858},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5082008838653564},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4909816086292267},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4349513351917267},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4116849899291992},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.37351536750793457},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408352485","title":"\"I’ve Heard of You!\": Generate Spoken Named Entity Recognition Data for Unseen Entities","url":"https://doi.org/10.1109/icassp49660.2025.10888066","published":"2025-03-12","authors":["Jiawei Yu","Xiang Geng","Yuang Li","Mengxin Ren","Wei Tang","Jiahuan Li","Zhibin Lan","Min Zhang","Hao Yang","Shujian Huang","Jinsong Su"],"abstract":"Spoken named entity recognition (NER) aims to identify named entities from speech, playing an important role in speech processing. New named entities appear every day, however, annotating their Spoken NER data is costly. In this paper, we demonstrate that existing Spoken NER systems perform poorly when dealing with previously unseen named entities. To tackle this challenge, we propose a method for generating Spoken NER data based on a named entity dictionary (NED) to reduce costs. Specifically, we first use a large language model (LLM) to generate sentences from the sampled named entities and then use a text-to-speech (TTS) system to generate the speech. Furthermore, we introduce a noise metric to filter out noisy data. To evaluate our approach, we release a novel Spoken NER benchmark along with a corresponding NED containing 8,853 entities. 
Experiment results show that our method achiev...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888066","openalex_id":"https://openalex.org/W4408352485","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Huawei Technologies (China)","Nanjing University","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7671536803245544},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6168199777603149},{"id":"https://openalex.org/C2779135771","display_name":"Named-entity recognition","score":0.5569109916687012},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5350205898284912},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5166664123535156},{"id":"https://openalex.org/C2777889803","display_name":"Named entity","score":0.45438870787620544},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.39933642745018005},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.09345388412475586}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408355071","title":"Towards a Single ASR Model That Generalizes to Disordered Speech","url":"https://doi.org/10.1109/icassp49660.2025.10888895","published":"2025-03-12","authors":["Jimmy Tobin","Katrin Tomanek","Subhashini Venugopalan"],"abstract":"This study investigates the impact of integrating a dataset of disordered speech recordings (~1,000 hours) into the fine-tuning of a near state-of-the-art ASR baseline system. 
Contrary to what one might expect, despite the data being less than 1% of the training data of the ASR system, we find a considerable improvement in disordered speech recognition accuracy. Specifically, we observe a 33% improvement on prompted speech, and a 26% improvement on a newly gathered spontaneous, conversational dataset of disordered speech. Importantly, there is no significant performance decline on standard speech recognition benchmarks. Further, we observe that the proposed tuning strategy helps close the gap between the baseline system and personalized models by 64% highlighting the significant progress as well as the room for improvement. Given the substantial benefits of our findings, this experiment....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888895","openalex_id":"https://openalex.org/W4408355071","cited_by_count":3,"quality_score":44,"matched_keywords":["personalized"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7142094969749451},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5730959177017212},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4374564290046692},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3377044200897217}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4408345989","title":"Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection","url":"https://doi.org/10.1109/icassp49660.2025.10888334","published":"2025-03-12","authors":["H Wang","Qingdong He","Jinlong 
Peng","Hao Yang","Mingmin Chi","Yabiao Wang"],"abstract":"Open-vocabulary detection (OVD) aims to detect objects beyond a predefined set of categories. As a pioneering model incorporating the YOLO series into OVD, YOLO-World is well-suited for scenarios prioritizing speed and efficiency. However, its performance is hindered by its neck feature fusion mechanism, which causes the quadratic complexity and the limited guided receptive fields. To address these limitations, we present Mamba-YOLO-World, a novel YOLO-based OVD model employing the proposed MambaFusion Path Aggregation Network (MambaFusion-PAN) as its neck architecture. Specifically, we introduce an innovative State Space Model-based feature fusion mechanism consisting of a Parallel-Guided Selective Scan algorithm and a Serial-Guided Selective Scan algorithm with linear complexity and globally guided receptive fields. It leverages multi-modal input sequences and mamba hidden states to gu...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888334","openalex_id":"https://openalex.org/W4408345989","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Fudan University","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5446507334709167},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.4603635370731354},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.34892046451568604},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.070333331823349},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4408346425","title":"FUELVISION: A multimodal data fusion and multimodel ensemble algorithm for wildfire fuels mapping","url":"https://doi.org/10.1016/j.jag.2025.104436","published":"2025-03-12","authors":["Riyaaz Uddien Shaik","Mohamad Alipour","Eric Rowell","Bharathan Balaji","Adam C. Watts","Ertuǧrul Taciroğlu"],"abstract":"Accurate assessment of fuel conditions is a prerequisite for fire ignition and behavior prediction, and risk management. The method proposed herein leverages diverse data sources – including L8 optical imagery, S1 (C-band) Synthetic Aperture Radar (SAR) imagery, PL (L-band) SAR imagery, and terrain features – to capture comprehensive information about fuel types and distributions. An ensemble model was trained to predict landscape-scale fuels – such as the ’Scott and Burgan 40’ – using the as-received Forest Inventory and Analysis (FIA) field survey plot data obtained from the USDA Forest Service. However, this basic approach yielded relatively poor results due to the inadequate amount of training data. Pseudo-labeled and fully synthetic datasets were developed using generative AI approaches to address the limitations of ground truth data availability. 
These synthetic datasets were used....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.jag.2025.104436","openalex_id":"https://openalex.org/W4408346425","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Desert Research Institute","US Forest Service","University of California, Los Angeles","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5209820866584778},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.48862677812576294},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.44300711154937744},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.41678646206855774},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.3913491368293762},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.38740038871765137},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.37200647592544556},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3557432293891907}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4408352707","title":"MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion","url":"https://doi.org/10.1109/icassp49660.2025.10888988","published":"2025-03-12","authors":["Zechao Zhan","Dehong Gao","Jinxia Zhang","Jiale Huang","Yang Hu","Xin Wang"],"abstract":"Text-guided image editing model has achieved great success in general domain. 
However, directly applying these models to the fashion domain may encounter two issues: (1) Inaccurate localization of editing region; (2) Weak editing magnitude. To address these issues, the MADiff model is proposed. Specifically, to more accurately identify editing region, the MaskNet is proposed, in which the foreground region, densepose and mask prompts from large language model are fed into a lightweight UNet to predict the mask for editing region. To strengthen the editing magnitude, the Attention-Enhanced Diffusion Model is proposed, where the noise map, attention map, and the mask from MaskNet are fed into the proposed Attention Processor to produce a refined noise map. By integrating the refined noise map into the diffusion model, the edited image can better align with the target prompt. Given the abse...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888988","openalex_id":"https://openalex.org/W4408352707","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Northwestern Polytechnical University","Southeast University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7239584922790527},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5414604544639587},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5018789768218994},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44404205679893494},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4180070161819458},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.37097039818763733},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3210465908050537},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408354617","title":"Implanting Robust Watermarks in Latent Diffusion Models for Video Generation","url":"https://doi.org/10.1109/icassp49660.2025.10888991","published":"2025-03-12","authors":["Xiaohang Liu","Heng Chang","Jinfu Wei","Lei Zhu","Emily Liu","Likun Li","Shiji Zhou","Chengyuan Li","Di Xu","Wei Gao"],"abstract":"In the dynamic realm of digital media, latent diffusion models (LDM) have revolutionized the generation of videos, surpassing the capabilities of traditional generative models. This paper presents Stable Video Signature, a pioneering watermarking framework for LDM in video generation. Addressing the pressing need for copyright and model protection, our approach is the first to implant watermarks directly into the generation process of LDM based on video through a novel two-stage process. We first encode watermarks into the video’s latent space embedding, ensuring a holistic temporal decoding mechanism of LDM. Then watermark is integrated into the LDM’s decoder. In this process, our method can maintain frame consistency, preserving the quality and robustness of generated videos during watermark implantation. 
We further show that the framework embeds watermarks seamlessly into LDM, maintai...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888991","openalex_id":"https://openalex.org/W4408354617","cited_by_count":2,"quality_score":43,"matched_keywords":["media"],"author_affiliations":["Huawei Technologies (China)","Peking University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6524027585983276},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5141459107398987},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44412752985954285},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40420401096343994},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408345868","title":"DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models","url":"https://doi.org/10.1109/icassp49660.2025.10890208","published":"2025-03-12","authors":["Weihao Wu","Zhiwei Lin","Yixuan Zhou","Jingbei Li","Rui Niu","Qinghua Wu","Songjun Cao","Long Ma","Zhiyong Wu"],"abstract":"Conversational speech synthesis (CSS) aims to synthesize both contextually appropriate and expressive speech, and considerable efforts have been made to enhance the understanding of conversational context. However, existing CSS systems are limited to deterministic prediction, overlooking the diversity of potential responses. 
Moreover, they rarely employ language model (LM)-based TTS backbones, limiting the naturalness and quality of synthesized speech. To address these issues, in this paper, we propose DiffCSS, an innovative CSS framework that leverages diffusion models and an LM-based TTS backbone to generate diverse, expressive, and contextually coherent speech. A diffusion-based context-aware prosody predictor is proposed to sample diverse prosody embeddings conditioned on multimodal conversational context. Then a prosody-controllable LM-based TTS backbone is developed to synthesize h...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890208","openalex_id":"https://openalex.org/W4408345868","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7563056945800781},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.5981019735336304},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5390043258666992},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.446217805147171},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33239293098449707},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.052910178899765015},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408354287","title":"An Ensemble Approach to Short-form Video Quality Assessment Using 
Multimodal LLM","url":"https://doi.org/10.1109/icassp49660.2025.10888524","published":"2025-03-12","authors":["Wen Wen","Yilin Wang","Neil Birkbeck","Balu Adsumilli"],"abstract":"The rise of short-form videos, characterized by diverse content, editing styles, and artifacts, poses substantial challenges for learning-based blind video quality assessment (BVQA) models. Multimodal large language models (MLLMs), renowned for their superior generalization capabilities, present a promising solution. This paper focuses on effectively leveraging a pretrained MLLM for short-form video quality assessment, regarding the impacts of pre-processing and response variability, and insights on combining the MLLM with BVQA models. We first investigated how frame pre-processing and sampling techniques influence the MLLM’s performance. Then, we introduced a lightweight learning-based ensemble method that adaptively integrates predictions from the MLLM and state-of-the-art BVQA models. Our results demonstrated superior generalization performance with the proposed ensemble approach. 
Fur...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888524","openalex_id":"https://openalex.org/W4408354287","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["City University of Hong Kong","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7618979811668396},{"id":"https://openalex.org/C103910844","display_name":"Video quality","score":0.5670751929283142},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5476764440536499},{"id":"https://openalex.org/C3020001037","display_name":"Quality assessment","score":0.5144691467285156},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45882439613342285},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3582730293273926},{"id":"https://openalex.org/C200601418","display_name":"Reliability engineering","score":0.15260529518127441},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09054774045944214}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408352669","title":"SPT: Sequence Prompt Transformer for Interactive Image Segmentation","url":"https://doi.org/10.1109/icassp49660.2025.10888197","published":"2025-03-12","authors":["Senlin Cheng","Haopeng Sun","Tao Xie","Hangyue Zhao","Yiqiang Chen","Bolei Xu","Xiaobo Li"],"abstract":"Interactive segmentation aims to extract objects of interest from an image based on user-provided clicks. In real-world applications, there is often a need to segment a series of images featuring the same target object. 
However, existing methods typically process one image at a time, failing to consider the sequential nature of the images. To overcome this limitation, we propose a novel method called Sequence Prompt Transformer (SPT), the first to utilize sequential image information for interactive segmentation. Our model comprises two key components: (1) Sequence Prompt Transformer (SPT) for acquiring information from sequence of images, clicks and masks to improve accurate. (2) Topk Prompt Selection (TPS) selects precise prompts for SPT to further enhance the segmentation effect. Additionally, we create the ADE20K-Seq benchmark to better evaluate model performance. We evaluate our app...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888197","openalex_id":"https://openalex.org/W4408352669","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6681420207023621},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.6056339144706726},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5545173287391663},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5501275658607483},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5278016328811646},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.4927230179309845},{"id":"https://openalex.org/C2778112365","display_name":"Sequence 
(biology)","score":0.4795389175415039},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.13048601150512695}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4408354733","title":"Investigating Numerical Translation with Large Language Models","url":"https://doi.org/10.1109/icassp49660.2025.10887726","published":"2025-03-12","authors":["Wei Tang","Jiawei Yu","Yuang Li","Yanqing Zhao","Weidong Zhang","Wei Feng","Min Zhang","Hao Yang"],"abstract":"The inaccurate translation of numbers can lead to significant security issues, ranging from financial setbacks to medical inaccuracies. While large language models (LLMs) have made significant advancements in machine translation, their capacity for translating numbers has not been thoroughly explored. This study focuses on evaluating the reliability of LLM-based machine translation systems when handling numerical data. In order to systematically test the numerical translation capabilities of currently open source LLMs, we have constructed a numerical translation dataset between Chinese and English based on real business data, encompassing ten types of numerical translation. Experiments on the dataset indicate that errors in numerical translation are a common issue, with most open-source LLMs faltering when faced with our test scenarios. 
Especially when it comes to numerical types involvi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10887726","openalex_id":"https://openalex.org/W4408354733","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7557010650634766},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6338497400283813},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49036312103271484},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42237991094589233},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0},{"id":"https://openalex.org/C105580179","display_name":"Messenger RNA","score":0.0},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408351991","title":"Identity-Preserving Audio-Driven Holistic Human Motion Video Generation","url":"https://doi.org/10.1109/icassp49660.2025.10890615","published":"2025-03-12","authors":["Haiwei Xue","Zhensong Zhang","Minglei Li","Zonghong Dai","Zhiyong Wu"],"abstract":"Generating realistic human motion videos is a pivotal challenge in advancing human-computer interaction. While existing approaches often focus on generating either head or gesture movements from audio, they lack unified control over full-body motion, frequently producing low-resolution and blurred outputs. 
Additionally, these methods struggle to maintain character identity throughout the generated content. In this paper, we introduce a novel framework that generates photorealistic, personalized human motion videos from audio by decoupling identity features. We integrate both visual features and voice timbre to enhance the preservation of character identity. Our approach follows a four-stage paradigm: (1) frame generation, (2) identity feature customization, (3) audio-motion modeling, and (4) motion-video rendering. Through the collaborative modeling of audio-motion and motion-video stage...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890615","openalex_id":"https://openalex.org/W4408351991","cited_by_count":1,"quality_score":42,"matched_keywords":["personalized"],"author_affiliations":["Fudan University","Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6269788146018982},{"id":"https://openalex.org/C2778355321","display_name":"Identity (music)","score":0.5227891802787781},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.43134886026382446},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3757800757884979},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3553942143917084},{"id":"https://openalex.org/C107038049","display_name":"Aesthetics","score":0.12032526731491089},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.09110823273658752}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408354316","title":"Full-Reference Point Cloud Quality 
Assessment with Multimodal Large Language Models","url":"https://doi.org/10.1109/icassp49660.2025.10889736","published":"2025-03-12","authors":["Ryosuke Watanabe","Tomoaki Konno","Hiroshi Sankoh","Bryan Tanaka","Tatsuya Kobayashi"],"abstract":"Point cloud quality frequently degrades during various processes, such as scanning, compression, and transmission. Hence, reliable Point Cloud Quality Assessment (PCQA) methods are essential for detecting and mitigating the degradation in 3D applications. This paper proposes an accurate full-reference PCQA method that leverages Multimodal Large Language Models (MLLMs). The proposed method utilizes responses generated by MLLMs to assess point cloud quality. We introduce three innovative PCQA metrics derived from MLLMs: 1) response similarity score, 2) relative quality response score, and 3) absolute quality response score. In addition, we integrate these MLLM-based scores with conventional PCQA metrics using support vector regression to improve accuracy. 
Experimental results demonstrate that the average Pearson’s Linear Correlation Coefficient (PLCC) and Spearman’s Rank-Order Correlation....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889736","openalex_id":"https://openalex.org/W4408354316","cited_by_count":1,"quality_score":42,"matched_keywords":["compression"],"author_affiliations":["Google (United States)","KDDI Research (Japan)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7775396108627319},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.6391443610191345},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.535443127155304},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.481361448764801},{"id":"https://openalex.org/C3020001037","display_name":"Quality assessment","score":0.44652605056762695},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4421542286872864},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3887197971343994},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32525867223739624}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408356768","title":"Fine-portraitist: Visualizing the Speaker’s Face Portrait during Speech Listening","url":"https://doi.org/10.1109/icassp49660.2025.10889904","published":"2025-03-12","authors":["Jinting Wang","Li Liu","Jun Wang"],"abstract":"Speech-to-portrait generation (S2P) plays a crucial role in speech-driven, human-centered creative content generation, aiming to synthesize a 
speaker’s face portrait with identity consistency from a given speech clip. However, existing S2P methods can typically only preserve attribute consistency, e.g., gender and age, while failing to capture the more important part-appearance consistency due to the coarse speech-face correlation. In this work, we propose Fine-portraitist, a novel retrieval-augmented, easy-to-hard generation framework designed to tackle this problem. Specifically, Fine-portraitist enhances identity consistency in S2P through two key innovations: 1) We first explore the fine-grained speech-face correlation by decomposing the face portrait into speech-related and speech-unrelated parts. Based on this, we propose a two-stage, diffusion-based pipeline to progressively achie...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889904","openalex_id":"https://openalex.org/W4408356768","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C177291462","display_name":"Active listening","score":0.7449431419372559},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7017759084701538},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6781643629074097},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.6252378821372986},{"id":"https://openalex.org/C162462552","display_name":"Portrait","score":0.5326520800590515},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.42755240201950073},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.32325202226638794},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.16632598638534546}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408355874","title":"Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis","url":"https://doi.org/10.1109/icassp49660.2025.10890055","published":"2025-03-12","authors":["Zehai Tu","Guangyan Zhang","Yiting Lu","Adaeze Adigwe","Simon King","Yiwen Guo"],"abstract":"Tokenising continuous speech into sequences of discrete tokens and modelling them with language models (LMs) has led to significant success in text-to-speech (TTS) synthesis. Despite these models can generate speech with high quality and naturalness, their synthesised samples can still suffer from artefacts, mispronunciation, word repeating, etc. In this paper, we argue these undesirable properties could partly be caused by the randomness of sampling-based strategies during the autoregressive decoding of LMs. Therefore, we look at maximization-based decoding approaches and propose Temporal Repetition Aware Diverse Beam Search (TRAD-BS) to find the most probable sequences of the generated speech tokens. 
Experiments with two recent LM-based TTS models demonstrate that our proposed maximisation-based decoding strategy generates speech with fewer mispronunciations and improved speaker consis...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890055","openalex_id":"https://openalex.org/W4408355874","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","University of Edinburgh"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7995055913925171},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.6245802640914917},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5693684220314026},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5108366012573242},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4603523910045624},{"id":"https://openalex.org/C19889080","display_name":"Beam search","score":0.4141945242881775},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4069986939430237},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.19181609153747559}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408353267","title":"Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing","url":"https://doi.org/10.1109/icassp49660.2025.10890428","published":"2025-03-12","authors":["Hao Zhang","Felix Stahlberg","Shankar Kumar"],"abstract":"Large Language Models (LLMs) excel at rewriting tasks such as text style 
transfer and grammatical error correction. While there is considerable overlap between the inputs and outputs in these tasks, the decoding cost still increases with output length, regardless of the amount of overlap. By leveraging the overlap between the input and the output, Kaneko and Okazaki [1] proposed model-agnostic edit span representations to compress the rewrites to save computation. They reported an output length reduction rate of nearly 80% with minimal accuracy impact in four rewriting tasks. In this paper, we propose alternative edit phrase representations inspired by phrase-based statistical machine translation. We systematically compare our phrasal representations with their span representations. We apply the LLM rewriting model to the task of Automatic Speech Recognition (ASR) post editing and show t...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890428","openalex_id":"https://openalex.org/W4408353267","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7931170463562012},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.482021689414978},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38406604528427124},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.33905160427093506},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33364611864089966}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408353418","title":"Generative 
Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration","url":"https://doi.org/10.1109/icassp49660.2025.10888830","published":"2025-03-12","authors":["Pin-Jui Ku","Alexander H. Liu","Roman Korostik","Sung-Feng Huang","Szu‐Wei Fu","Ante Jukić"],"abstract":"This paper proposes a generative pretraining foundation model for high-quality speech restoration tasks. By directly operating on complex-valued short-time Fourier transform coefficients, our model does not rely on any vocoders for time-domain signal reconstruction. As a result, our model simplifies the synthesis process and removes the quality upper-bound introduced by any mel-spectrogram vocoder compared to prior work SpeechFlow. The proposed method is evaluated on multiple speech restoration tasks, including speech denoising, bandwidth extension, codec artifact removal, and target speaker extraction. In all scenarios, finetuning our pretrained model results in superior performance over strong baselines. Notably, in the target speaker extraction task, our model outperforms existing systems, including those leveraging SSL-pretrained encoders like WavLM. 
The code and the pretrained check...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888830","openalex_id":"https://openalex.org/W4408353418","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7049757838249207},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6683413982391357},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5477131009101868},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.4759119153022766},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.46282559633255005},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4305141568183899},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36958324909210205},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.059277892112731934}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408355697","title":"FashionFAE: Fine-grained Attributes Enhanced Fashion Vision-Language Pre-training","url":"https://doi.org/10.1109/icassp49660.2025.10889957","published":"2025-03-12","authors":["Jiale Huang","Dehong Gao","Jinxia Zhang","Zechao Zhan","Yang Hu","Xin Wang"],"abstract":"Large-scale Vision-Language Pre-training (VLP) has demonstrated remarkable success in the general domain. 
However, in the fashion domain, items are distinguished by fine-grained attributes such as texture and material, which are crucial for tasks such as retrieval. Existing models often fail to take advantage of these fine-grained attributes from both text and image modalities. To address the above issue, we propose a novel approach for the fashion domain, Fine-grained Attributes Enhanced VLP (FashionFAE), which focuses on the detailed characteristics of the fashion data. An attribute-emphasized text prediction task is proposed to predict fine-grained attributes of the items. This forces the model to focus on the salient attributes from the text modality. In addition, a novel attribute-promoted image reconstruction task is proposed, which further enhances the fine-grained ability of the....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889957","openalex_id":"https://openalex.org/W4408355697","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Northwestern Polytechnical University","Southeast University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7114655375480652},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5399141311645508},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43417778611183167},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3642502427101135},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.334127813577652},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.32758450508117676},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408354762","title":"Exploring Inter-Variate and Long-Term Dependencies to Boost Multivariate Time Series Forecasting","url":"https://doi.org/10.1109/icassp49660.2025.10890320","published":"2025-03-12","authors":["Xi Ding","Yifan He","Shuigeng Zhou","Guiyang Liu","Qi Zhou"],"abstract":"Multivariate Time Series Forecasting (MTSF) is a critical task in various domains, and Large Language Models (LLMs) for MTSF have recently received considerable attention. Despite significant progress in large-scale time series models, particularly in fine-tuning pre-trained LLMs for MTSF, there are still limitations with existing works. First, multivariate time series (MTS) are often handled as multiple independent univariate inputs and processed separately across different variates, which neglects the dependencies between the variates. Second, most existing approaches employ Mean Squared Error (MSE) as loss function, which evaluates error at each time point separately, ignoring long-term dependencies. To address these limitations, this paper explores inter-variate and long-term dependencies to boost MTSF performance. 
We propose a temporal channel adapter to capture inter-variate relati...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890320","openalex_id":"https://openalex.org/W4408354762","cited_by_count":0,"quality_score":41,"matched_keywords":["long-term"],"author_affiliations":["Alibaba Group (China)","Fudan University"],"concepts":[{"id":"https://openalex.org/C141547133","display_name":"Random variate","score":0.8699499368667603},{"id":"https://openalex.org/C161584116","display_name":"Multivariate statistics","score":0.7452243566513062},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.6945455074310303},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6629194021224976},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.6214985251426697},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.6142482757568359},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3916531503200531},{"id":"https://openalex.org/C149782125","display_name":"Econometrics","score":0.34103041887283325}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408345630","title":"Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer","url":"https://doi.org/10.1109/icassp49660.2025.10890309","published":"2025-03-12","authors":["Siyuan Hou","Shansong Liu","Ruibin Yuan","Wei Xue","Ying Shan","Man Zhao","Chao Zhang"],"abstract":"Despite the significant progress in controllable music generation and editing, challenges remain in the quality and length of generated music due to the use of Mel-spectrogram 
representations and UNet-based model structures. To address these limitations, we propose a novel approach using a Diffusion Transformer (DiT) augmented with an additional control branch using ControlNet. This allows for long-form and variable-length music generation and editing controlled by text and melody prompts. For more precise and fine-grained melody control, we introduce a novel top-k constant-Q Transform representation as the melody prompt, reducing ambiguity compared to previous representations (e.g., chroma), particularly for music with multiple tracks or a wide range of pitch values. To effectively balance the control signals from text and melody prompts, we adopt a curriculum learning strategy that pro...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890309","openalex_id":"https://openalex.org/W4408345630","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6647696495056152},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6377647519111633},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5397208333015442},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.42622435092926025},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34997284412384033},{"id":"https://openalex.org/C119599485","display_name":"Electrical 
engineering","score":0.21056395769119263},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.10885852575302124},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.10050216317176819}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408355873","title":"AMuSE: Attentive Multilingual Speech Encoding for Zero-Prior ASR","url":"https://doi.org/10.1109/icassp49660.2025.10890819","published":"2025-03-12","authors":["Ashutosh Varshney","Debmalya Chakrabarty","Akshat Jaiswal","Harish Arsikere","Abhinav Jain","Swayambhu Nath Ray","Frederick Weber","Anand Mohan","Prantik Sen","Garima Lalwani","Sambuddha Bhattacharya","Sri Garimella"],"abstract":"Multilingual ASR offers training, deployment and overall performance benefits, but models trained via simple data pooling are known to suffer from cross-lingual interference. Oracle language information (exact-prior) and language-specific parameters are usually leveraged to overcome this, but such approaches cannot enable seamless, truly multilingual experiences. Existing methods try to overcome this limitation by relying on inferred language information or language agnostic mixture-of-experts, but they incur additional runtime complexity and/or training cost in addition to being less effective in streaming scenarios. 
Building on previous studies where models were trained to handle mixed-prior (knowledge that the underlying language belongs to a known group), we propose Attentive Multilingual Speech Encoding (AMuSE), a training framework designed to match exact-prior performance even in....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890819","openalex_id":"https://openalex.org/W4408355873","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7256332635879517},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.7118097543716431},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6624228954315186},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.6608055830001831},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4025987684726715},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3483922481536865},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.17325663566589355},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408351875","title":"DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance","url":"https://doi.org/10.1109/icassp49660.2025.10887583","published":"2025-03-12","authors":["Cong Wang","Jiaxi Gu","Panwen Hu","Yuanfan Guo","Xiao Dong","Hang Xu","Xiaodan Liang"],"abstract":"Image-to-video 
generation, which aims to generate a video starting from a given reference image, has drawn great attention. Existing methods frequently integrate semantic information from images or simply concatenate images, which often leads to low fidelity and flickering in the generated videos. To tackle these problems, we propose a high-fidelity image-to-video generation method by devising a frame retention branch based on a pre-trained video diffusion model, named DreamVideo. Our DreamVideo perceives the reference image via convolution layers and concatenates the features with the noisy latents as model input. By this means, the details of the reference image can be preserved to the greatest extent. In addition, by incorporating the designed double-condition classifier-free guidance, DreamVideo can generate high-quality videos of different actions by providing varying prompt texts.....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10887583","openalex_id":"https://openalex.org/W4408351875","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Huawei Technologies (China)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.724153995513916},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.527032196521759},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.511405885219574},{"id":"https://openalex.org/C113364801","display_name":"High fidelity","score":0.497090607881546},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.47157755494117737},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4505057632923126},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.41877004504203796},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.32395026087760925}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4408352766","title":"DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions","url":"https://doi.org/10.1109/icassp49660.2025.10889767","published":"2025-03-12","authors":["Weidong Chen","Shan Yang","Guangzhi Li","Xixin Wu"],"abstract":"Controlling text-to-speech (TTS) systems to synthesize speech with the prosodic characteristics expected by users has attracted much attention. To achieve controllability, current studies focus on two main directions: (1) using reference speech as prosody prompt to guide speech synthesis, and (2) using natural language descriptions to control the generation process. However, finding reference speech that exactly contains the prosody that users want to synthesize takes a lot of effort. Description-based guidance in TTS systems can only determine the overall prosody, which has difficulty in achieving fine-grained prosody control over the synthesized speech. In this paper, we propose DrawSpeech, a sketch-conditioned diffusion model capable of generating speech based on any prosody sketches drawn by users. 
Specifically, the prosody sketches are fed to DrawSpeech to provide a rough indication...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889767","openalex_id":"https://openalex.org/W4408352766","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7799662351608276},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.6566270589828491},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6269228458404541},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.5778880715370178},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4268278181552887},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3043747544288635}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4408352069","title":"AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions","url":"https://doi.org/10.1109/icassp49660.2025.10888303","published":"2025-03-12","authors":["Yuanyuan Wang","Hangting Chen","Dongchao Yang","Zhiyong Wu","Xixin Wu"],"abstract":"Current Text-to-audio (TTA) models mainly use coarse text descriptions as inputs to generate audio, which hinders models from generating audio with fine-grained control of content and style. Some studies try to improve the granularity by incorporating additional frame-level conditions or control networks. 
However, this usually leads to complex system design and difficulties due to the requirement for reference frame-level conditions. To address these challenges, we propose AudioComposer, a novel TTA generation framework that relies solely on natural language descriptions (NLDs) to provide both content specification and style control information. To further enhance audio generative modeling, we employ flow-based diffusion transformers with the cross-attention mechanism to incorporate text descriptions effectively into audio generation processes, which can not only simultaneously consider....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888303","openalex_id":"https://openalex.org/W4408352069","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7669222354888916},{"id":"https://openalex.org/C2776187449","display_name":"Natural language generation","score":0.5758915543556213},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.5279570817947388},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4458954334259033},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.26468658447265625},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.0632219910621643},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4408352642","title":"Span Attention for 
Entity-Consistent Task-Oriented Dialogue Response Generation","url":"https://doi.org/10.1109/icassp49660.2025.10887791","published":"2025-03-12","authors":["Jiale Chen","Xuelian Dong","Wenxiu Xie","Tao Gong","Fu Lee Wang","Tianyong Hao"],"abstract":"Task-oriented dialogue systems have recently gained increasing attention due to their capability of using natural language to fulfill specific user demands, such as restaurant reservation and hotel booking. Recent works directly model task-oriented dialogue response as a text generation task. However, these methods, utilizing generated response tokens as an attention query to obtain the vanilla attention distribution over an entire knowledge base, frequently lead to an entity inconsistency in final response generation. To tackle this problem, we propose a novel attention mechanism called span attention and a novel model named Span Attention GEnerator (SAGE). The span attention computes an attention score between a query vector and each knowledge record vector instead of computing a vanilla attention score among word vectors, which consisted of dialogue context and knowledge base. 
For eff...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10887791","openalex_id":"https://openalex.org/W4408352642","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Google (United States)","South China Normal University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6973807215690613},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6204888820648193},{"id":"https://openalex.org/C2778753569","display_name":"Span (engineering)","score":0.5568491220474243},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3481086492538452},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.32445088028907776},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1351882517337799},{"id":"https://openalex.org/C66938386","display_name":"Structural engineering","score":0.0639500617980957},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.06204703450202942}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408352258","title":"STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment","url":"https://doi.org/10.1109/icassp49660.2025.10890132","published":"2025-03-12","authors":["Yong Ren","Chenxing Li","Manjie Xu","Wei Liang","Yu Gu","Rilin Chen","Dong Yu"],"abstract":"Visual and auditory perception are two crucial ways humans experience the world. 
Text-to-video generation has made remarkable progress over the past year, but the absence of harmonious audio in generated video limits its broader applications. In this paper, we propose Semantic and Temporal Aligned Video-to-Audio (STA-V2A), an approach that enhances audio generation from videos by extracting both local temporal and global semantic video features and combining these refined video features with text as cross-modal guidance. To address the issue of information redundancy in videos, we propose an onset prediction pretext task for local temporal feature extraction and an attentive pooling module for global semantic feature extraction. To supplement the insufficient semantic information in videos, we propose a Latent Diffusion Model with Text-to-Audio priors initialization and cross-modal guida...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890132","openalex_id":"https://openalex.org/W4408352258","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Beijing Institute of Technology","KLA (United States)","Seattle University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7966454029083252},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.43081721663475037},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.370268315076828}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408345957","title":"M-MoE: Mixture of Mixture-of-Expert Model for CTC-based Streaming Multilingual 
ASR","url":"https://doi.org/10.1109/icassp49660.2025.10887679","published":"2025-03-12","authors":["Songjun Cao","Xiong Wang","Yike Zhang","Xiaoming Zhang","Long Ma"],"abstract":"The Mixture-of-Expert (MoE) structure has been effectively utilized in multilingual ASR tasks. However, the potential of external language information remains underutilized. In this paper, we introduce the Mixture of MoE (M-MoE) structure, featuring multiple language-specific MoEs and a language-unknown MoE. The language-unknown MoE reuses experts from language-specific MoEs. Inputs with language IDs are directed to language-specific MoEs, while those without IDs go to the language-unknown MoE. We propose a two-stage training method for the M-MoE-based model. Our unified model structure is suitable for streaming ASR tasks in both language-known and language-unknown scenarios. Experiments on a three-language dataset show that compared to the Conformer baseline, our model achieves an average of 12% and 9% relative improvement in language-known and language-unknown scenarios. 
Compared to th...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10887679","openalex_id":"https://openalex.org/W4408345957","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6264612078666687},{"id":"https://openalex.org/C61224824","display_name":"Mixture model","score":0.4134834110736847},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3664942979812622},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36542749404907227},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.32521766424179077}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408354639","title":"Bone Conducted Signal Guided Speech Enhancement For Voice Assistant on Earbuds","url":"https://doi.org/10.1109/icassp49660.2025.10889416","published":"2025-03-12","authors":["Jens Heitkaemper","Joe Caroselli","Max McKinnon","Arun Narayanan","Nathan Howard"],"abstract":"In this work we present a multi-modal, streaming enhancement network to improve speech recognition for voice assistants on earbuds. The proposed model is guided by a bone conducted signal (BCS) to separate the interfering sources from the target speaker signal. We train the model on a simulated speech enhancement training set with a simulated BCS and finetune it on a small earbuds specific training set, consisting of about 6 hours of speech. 
To account for distorted BCS the enhancement module is complemented by a voice activity-based decision to discard the enhanced output for BCS without speech information. A possibility to preprocess the BCS to account for the low-pass characteristic of the bone conduction is evaluated to lower the required transmission bandwidth from the earbuds to the recognition device. The results show that the BCS bandwidth can be reduced to 500 Hz with only small...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889416","openalex_id":"https://openalex.org/W4408354639","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6834025382995605},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6527251601219177},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.495456337928772},{"id":"https://openalex.org/C2776182073","display_name":"Speech enhancement","score":0.4593003988265991},{"id":"https://openalex.org/C204201278","display_name":"Voice activity detection","score":0.449171781539917},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.44868409633636475},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.1588495373725891},{"id":"https://openalex.org/C163294075","display_name":"Noise reduction","score":0.08147889375686646}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408353434","title":"Loss-Aware Curriculum Learning for Chinese Grammatical Error 
Correction","url":"https://doi.org/10.1109/icassp49660.2025.10889130","published":"2025-03-12","authors":["Ding Zhang","Yangning Li","Lichen Bai","Hao Zhang","Yinghui Li","Haiye Lin","Hai-Tao Zheng","Xin Su","Zifei Shan"],"abstract":"Chinese grammatical error correction (CGEC) aims to detect and correct errors in the input Chinese sentences. Recently, Pre-trained Language Models (PLMS) have been employed to improve the performance. However, current approaches ignore that correction difficulty varies across different instances and treat these samples equally, enhancing the challenge of model learning. To address this problem, we propose a multi-granularity Curriculum Learning (CL) framework. Specifically, we first calculate the correction difficulty of these samples and feed them into the model from easy to hard batch by batch. Then Instance-Level CL is employed to help the model optimize in the appropriate direction automatically by regulating the loss function. Extensive experimental results and comprehensive analyses of various datasets prove the effectiveness of our method.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889130","openalex_id":"https://openalex.org/W4408353434","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Peng Cheng Laboratory","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7261325120925903},{"id":"https://openalex.org/C47177190","display_name":"Curriculum","score":0.5557446479797363},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49734142422676086},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4572067856788635},{"id":"https://openalex.org/C3018824978","display_name":"Error analysis","score":0.43071550130844116},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.19034436345100403},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.11342406272888184},{"id":"https://openalex.org/C19417346","display_name":"Pedagogy","score":0.10642799735069275}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408353088","title":"DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech","url":"https://doi.org/10.1109/icassp49660.2025.10889461","published":"2025-03-12","authors":["Xin Qi","Ruibo Fu","Zhengqi Wen","Tao Wang","Chunyu Qiang","Jianhua Tao","Chenxing Li","Yi Lu","Shuchen Shi","Zhiyong Wang","Xiaopeng Wang","Yuankun Xie"],"abstract":"In recent years, speech diffusion models have advanced rapidly. Alongside the widely used U-Net architecture, transformer-based models such as the Diffusion Transformer (DiT) have also gained attention. However, current DiT speech models treat Mel spectrograms as general images, which overlooks the specific acoustic properties of speech. To address these limitations, we propose a method called Directional Patch Interaction for Text-to-Speech (DPI-TTS), which builds on DiT and achieves fast training without compromising accuracy. Notably, DPI-TTS employs a low-to-high frequency, frame-by-frame progressive inference approach that aligns more closely with acoustic properties, enhancing the naturalness of the generated speech. Additionally, we introduce a fine-grained style temporal modeling method that further improves speaker style similarity. 
Experimental results demonstrate that our meth...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889461","openalex_id":"https://openalex.org/W4408353088","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Institute of Automation","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7041667103767395},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6130313873291016},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.552083432674408},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3324007987976074},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3305777907371521},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.04949057102203369},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408355491","title":"Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising","url":"https://doi.org/10.1109/icassp49660.2025.10888199","published":"2025-03-12","authors":["Yunlong Yuan","Yuanfan Guo","Chunwei Wang","Songcen Xu","Li Zhang"],"abstract":"Recent advances in diffusion models have greatly improved text-driven video generation. However, training models for long video generation demands significant computational power and extensive data, leading most video diffusion models to be limited to a small number of frames. 
Existing training-free methods that attempt to generate long videos using pre-trained short video diffusion models often struggle with issues such as insufficient motion dynamics and degraded video fidelity. In this paper, we present Brick-Diffusion, a novel, training-free approach capable of generating long videos of arbitrary length. Our method introduces a brick-to-wall denoising strategy, where the latent is denoised in segments, with a stride applied in subsequent iterations. This process mimics the construction of a staggered brick wall, where each brick represents a denoised segment, enabling communication b...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10888199","openalex_id":"https://openalex.org/W4408355491","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Fudan University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C154226666","display_name":"Brick","score":0.8851272463798523},{"id":"https://openalex.org/C163294075","display_name":"Noise reduction","score":0.5764096975326538},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5121839046478271},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.3617522120475769},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3366246819496155},{"id":"https://openalex.org/C159985019","display_name":"Composite material","score":0.20719844102859497},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.1877894103527069},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.08035171031951904}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":1}},{"id":"openalex:W4408352448","title":"TAGMO: Temporal Control Audio Generation for Multiple Visual Objects Without Training","url":"https://doi.org/10.1109/icassp49660.2025.10889172","published":"2025-03-12","authors":["Xinyu Zhang","Keyu Fan","Yiran Wang","Yingshan Liang","Jiasheng Lu","Zhimin Du","Qingyang Shi","Peiwu Qin"],"abstract":"With the great popularity of Sora, video-based audio generation has become indispensable. While numerous video-to-audio generation models have emerged, they frequently face difficulties including semantic incompatibilities and synchronization problems, especially in situations with multiple objects. To address these difficulties, we introduce TAGMO, a novel training-free audio generation method that offers precise time control for multi-object video scenarios. Our approach first employs object detection to obtain the class labels and temporal labels of each object, which are then structured and utilized as control conditions within a latent diffusion model (LDM) to generate multi-object audio. 
Additionally, we innovatively design a time mask based on the corresponding temporal labels and integrate it into the denoising process of the pre-trained audio generation model to achieve accurate...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889172","openalex_id":"https://openalex.org/W4408352448","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8178567886352539},{"id":"https://openalex.org/C3017588708","display_name":"Audio visual","score":0.713320255279541},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.7046158313751221},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5329521298408508},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.5253089070320129},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43517446517944336},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.36234474182128906},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.35576069355010986}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408355855","title":"Speech Few-Shot Learning for Language Learners’ Speech Recognition","url":"https://doi.org/10.1109/icassp49660.2025.10890741","published":"2025-03-12","authors":["Jian Cheng","Sam Nguyen"],"abstract":"This paper reports how speech recognition accuracy can be improved using the speech few-shot 
in-context learning capabilities of a multimodal foundation model when applied to the speech of language learners. Our proposed method, which combines speech few-shot with context prompting, demonstrates significant improvements in recognizing language learners’ accented speech. Evaluations on data from an L2 English test set with accented speech produced a 33.1% relative WER reduction and improved target word recall from 89% to 97% compared to the Gemini 1.5 Pro baseline. Notably, speech few-shot alone contributes a 9.8% relative WER reduction beyond the gains from context prompting. These results underscore the importance of incorporating domain knowledge from both speech and text modalities within in-context learning, suggesting that speech few-shot in-context learning offers a viable alternat...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890741","openalex_id":"https://openalex.org/W4408355855","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8132961988449097},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6507302522659302},{"id":"https://openalex.org/C504749915","display_name":"Speech technology","score":0.4892265796661377},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4674950838088989},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4224480986595154},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.40125447511672974}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4408353846","title":"Knowledge Transfer Across Modalities for Weakly Supervised Point Cloud Semantic Segmentation","url":"https://doi.org/10.1109/icassp49660.2025.10890346","published":"2025-03-12","authors":["Zihan Wang","Yunhang Shen","Mengtian Li","Ke Li","Xing Sun","Shaohui Lin","Lizhuang Ma"],"abstract":"Current weakly supervised point cloud semantic segmentation struggles with insufficient utilization of limited annotations in unimodal representation learning due to the sparse and textureless nature of point clouds. In this work, we leverage cross-modality information by transferring knowledge from image and text sources to the point cloud network. The intuition is that images contribute rich texture, color, and discriminative information, complementing point clouds to boost semantic segmentation performance. To reduce extensive computational resources for cross-modality fusion, we introduce the Multi-Scale Deformable Knowledge Transfer, an innovative training scheme that optimizes and extends the one-to-one mapping to flexible one-to-many relations between multi-modal data. 
Furthermore, we employ pre-trained image-text models to generate pseudo labels for point clouds and construct pos...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890346","openalex_id":"https://openalex.org/W4408353846","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["East China Normal University","Shanghai University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.7541955709457397},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7457585334777832},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.70808345079422},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6083489060401917},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.475710928440094},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4657272696495056},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.42086318135261536},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3581724166870117}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408352304","title":"EasyControl: Adding Control to Video Diffusion for Controllable Video Generation and Interpolation","url":"https://doi.org/10.1109/icassp49660.2025.10889997","published":"2025-03-12","authors":["Cong Wang","Jiaxi Gu","Panwen Hu","Xiao Dong","Yuanfan Guo","Hang Xu","Xiaodan Liang"],"abstract":"The diffusion model is widely leveraged for either controllable video generation or video interpolation. 
As each field has its task-specific problems, it is difficult to merely develop a single model for completing both tasks simultaneously. Moreover, most existing works only support image conditions and necessitate redesigning the model structure to accommodate other types of conditions. Even so, they still face frame flickering issues when using the image as the condition due to the strong alignment of image pixels. To tackle these problems, in this work, we are the first to propose a unified diffusion framework, EasyControl, for both tasks of controllable video generation and interpolation with different types of conditions. The proposed EasyControl introduces a condition adapter to extract the condition features, which is then injected into an interchangeable fundamental text-to-vide...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10889997","openalex_id":"https://openalex.org/W4408352304","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Huawei Technologies (China)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C137800194","display_name":"Interpolation (computer graphics)","score":0.6818643808364868},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6669436693191528},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.43684422969818115},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.337218701839447},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.30698394775390625},{"id":"https://openalex.org/C104114177","display_name":"Motion 
(physics)","score":0.06656280159950256},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408347257","title":"CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition","url":"https://doi.org/10.1109/icassp49660.2025.10890869","published":"2025-03-12","authors":["He Wang","Xucheng Wan","Naijun Zheng","Kai Liu","Huan Zhou","Guojian Li","Lei Xie"],"abstract":"Code-switching automatic speech recognition (ASR) aims to transcribe speech that contains two or more languages accurately. To better capture language-specific speech representations and address language confusion in code-switching ASR, the mixture-of-experts (MoE) architecture and an additional language diarization (LD) decoder are commonly employed. However, most researches remain stagnant in simple operations like weighted summation or concatenation to fuse language-specific speech representations, leaving significant opportunities to explore the enhancement of integrating language bias information. In this paper, we introduce CAMEL, a cross-attention-based MoE and language bias approach for code-switching ASR. Specifically, after each MoE layer, we fuse language-specific speech representations with cross-attention, leveraging its strong contextual modeling abilities. 
Additionally, we...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp49660.2025.10890869","openalex_id":"https://openalex.org/W4408347257","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Northwestern Polytechnical University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7818828225135803},{"id":"https://openalex.org/C18552078","display_name":"Code-switching","score":0.7287548780441284},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5879093408584595},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.45317795872688293},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4383500814437866},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4170164167881012},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.41060349345207214},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.1422075629234314}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/yue-scaling-open-foundation-models-for-long-form-music-generation","title":"YuE: Scaling Open Foundation Models for Long-Form Music Generation","url":"https://www.microsoft.com/en-us/research/publication/yue-scaling-open-foundation-models-for-long-form-music-generation/","published":"2025-03-10","authors":["Ruibin Yuan","Hanfeng Lin","Shuyue Guo","Ge Zhang","Jiahao Pan","Yongyi Zang","Haohe Liu","Yiming Liang","Wenye Ma","Xingjian Du","Xinrun 
Du","Zhen Ye"],"abstract":"We tackle the task of long-form music generation--particularly the challenging \\textbf{lyrics-to-song} problem--by introducing YuE, a family of open foundation models based on the LLaMA2 architecture. Specifically, YuE scales to trillions of tokens and generates up to five minutes of music while maintaining lyrical alignment, coherent musical structure, and engaging vocal melodies with appropriate accompaniment. It achieves this through (1) track-decoupled next-token prediction to overcome dense mixture signals, (2) structural progressive conditioning for long-context lyrical alignment, and (3) a multitask, multiphase pre-training recipe to converge and generalize. In addition, we redesign the in-context learning technique for music generation, enabling versatile style transfer (e.g., converting Japanese city pop into an English rap while preserving the original accompaniment) and bidire...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:198","title":"Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model","url":"https://seed.bytedance.com/en/research/seedream-2-0-a-native-chinese-english-bilingual-image-generation-foundation-model","published":"2025-03-10","authors":["Lixue Gong","Xiaoxia Hou","Fanshi Li","Liang Li","Xiaochen Lian","Fei Liu","Liyang Liu","Wei Liu","Wei Lu","Yichun Shi","Shiqi 
Sun","Yu Tian"],"abstract":"Rapid advancement of diffusion models has catalyzed remarkable progress in the field of image generation. However, prevalent models such as Flux, SD3.5 and Midjourney, still grapple with issues like model bias, limited text rendering capabilities, and insufficient understanding of Chinese cultural nuances. To address these limitations, we present Seedream 2.0, a native Chinese-English bilingual image generation foundation model that excels across diverse dimensions, which adeptly manages text prompt in both Chinese and English, supporting bilingual image generation and text rendering. We develop a powerful data system that facilitates knowledge integration, and a caption system that balances the accuracy and richness for image description. Particularly, Seedream is integrated with a self-developed bilingual large language model as a text encoder, allowing it to learn native knowledge dir...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","arXiv","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4408281006","title":"GO-NeRF: Generating Objects in Neural Radiance Fields for Virtual Reality Content Creation","url":"https://doi.org/10.1109/tvcg.2025.3549558","published":"2025-03-10","authors":["Peng Dai","Feitong Tan","Xin Yu","Yifan Peng","Yinda Zhang","Xiaojuan Qi"],"abstract":"Virtual environments (VEs) are pivotal for virtual, augmented, and mixed reality systems. 
Despite advances in 3D generation and reconstruction, the direct creation of 3D objects within an established 3D scene (represented as NeRF) for novel VE creation remains a relatively unexplored domain. This process is complex, requiring not only the generation of high-quality 3D objects but also their seamless integration into the existing scene. To this end, we propose a novel pipeline featuring an intuitive interface, dubbed GO-NeRF. Our approach takes text prompts and user-specified regions as inputs and leverages the scene context to generate 3D objects within the scene. We employ a compositional rendering formulation that effectively integrates the generated 3D objects into the scene, utilizing optimized 3D-aware opacity maps to avoid unintended modifications to the original scene. Furthermore...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2025.3549558","openalex_id":"https://openalex.org/W4408281006","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Google (United States)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C194969405","display_name":"Virtual reality","score":0.7853562831878662},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.768897294998169},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5723229646682739},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5097672343254089},{"id":"https://openalex.org/C23690007","display_name":"Radiance","score":0.45261436700820923},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.44712433218955994},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.41242915391921997},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4109541177749634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-coding-assistants-on-developers-who-are-visually-impaired","title":"The Impact of Generative AI Coding Assistants on Developers Who Are Visually Impaired","url":"https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-coding-assistants-on-developers-who-are-visually-impaired/","published":"2025-03-09","authors":["Claudia Flores-Saviaga","Ben Hanrahan","Kashif Imteyaz","Steven Clarke","Saiph Savage"],"abstract":"The rapid adoption of generative AI in software development has impacted the industry, yet its effects on developers with visual impairments remain largely unexplored. To address this gap, we used an Activity Theory framework to examine how developers with visual impairments interact with AI coding assistants. For this purpose, we conducted a study where developers who are visually impaired completed a series of programming tasks using a generative AI coding assistant. We uncovered that, while participants found the AI assistant beneficial and reported significant advantages, they also highlighted accessibility challenges. Specifically, the AI coding assistant often exacerbated existing accessibility barriers and introduced new challenges. 
For example, it overwhelmed users with an excessive number of suggestions, leading developers who are visually impaired to express a desire for AI tim...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3706598.3714008","openalex_id":"https://openalex.org/W4409749482","cited_by_count":12,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Computer science","Generative AI","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United Kingdom)","Microsoft (United States)","Northeastern University","Universidad Nacional Autónoma de México"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/distillm-2-a-contrastive-approach-boosts-the-distillation-of-llms","title":"DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs","url":"https://www.microsoft.com/en-us/research/publication/distillm-2-a-contrastive-approach-boosts-the-distillation-of-llms/","published":"2025-03-09","authors":["Jongwoo Ko","Tianyi Chen","Sungnyun Kim","Tianyu Ding","Luming Liang","Ilya Zharkov","SeYoung Yun"],"abstract":"Despite the success of distillation in large language models (LLMs), most prior work applies identical loss functions to both teacher- and student-generated data. These strategies overlook the synergy between loss formulations and data types, leading to a suboptimal performance boost in student models. 
To address this, we propose DistiLLM-2, a contrastive approach that simultaneously increases the likelihood of teacher responses and decreases that of student responses by harnessing this synergy. Our extensive experiments show that DistiLLM-2 not only builds high-performing student models across a wide range of tasks, including instruction-following and code generation, but also supports diverse applications, such as preference alignment and vision-language extensions. These findings highlight the potential of a contrastive approach to enhance the efficacy of LLM distillation by effective...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","preference","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/refactorbench-evaluating-stateful-reasoning-in-language-agents-through-code","title":"RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code","url":"https://www.microsoft.com/en-us/research/publication/refactorbench-evaluating-stateful-reasoning-in-language-agents-through-code/","published":"2025-03-09","authors":["Dhruv Gautam","Spandan Garg","Jinu Jang","Neel Sundaresan","Roshanak Zilouchian Moghaddam"],"abstract":"Recent advances in language model (LM) agents and function calling have enabled autonomous, feedback-driven systems to solve problems across various digital domains. 
To better understand the unique limitations of LM agents, we introduce RefactorBench, a benchmark consisting of 100 large handcrafted multi-file refactoring tasks in popular open-source repositories. Solving tasks within RefactorBench requires thorough exploration of dependencies across multiple files and strong adherence to relevant instructions. Every task is defined by 3 natural language instructions of varying specificity and is mutually exclusive, allowing for the creation of longer combined tasks on the same repository. Baselines on RefactorBench reveal that current LM agents struggle with simple compositional tasks, solving only 22% of tasks with base instructions, in contrast to a human developer with short time cons...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Computer science","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/streammind-unlocking-full-frame-rate-streaming-video-dialogue-through-event-gated-cognition","title":"StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition","url":"https://www.microsoft.com/en-us/research/publication/streammind-unlocking-full-frame-rate-streaming-video-dialogue-through-event-gated-cognition/","published":"2025-03-08","authors":["Xin Ding","Hao Wu","Yifan Yang","Shiqi Jiang","Qianxi Zhang","Donglin Bai","Zhibo Chen","Ting 
Cao"],"abstract":"With the rise of real-world human-AI interaction applications, such as AI assistants, the need for Streaming Video Dialogue is critical. To address this need, we introduce StreamMind, a video LLM framework that achieves ultra-FPS streaming video processing (100 fps on a single A100) and enables proactive, always-on responses in real time, without explicit user intervention. To solve the key challenge of the contradiction between linear video streaming speed and quadratic transformer computation cost, we propose a novel perception-cognition interleaving paradigm named ''event-gated LLM invocation'', in contrast to the existing per-time-step LLM invocation. By introducing a Cognition Gate network between the video encoder and the LLM, LLM is only invoked when relevant events occur. To realize the event feature extraction with constant cost, we propose Event-Preserving Feature Extractor (EP...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","1970-01-01","LLM","media"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4408954391","title":"EmBARDiment: an Embodied AI Agent for Productivity in XR","url":"https://doi.org/10.1109/vr59515.2025.00093","published":"2025-03-08","authors":["Riccardo Bovo","Steven Abreu","Karan Ahuja","Eric J. 
Gonzalez","Li-Te Cheng","Mar González-Franco"],"abstract":"XR devices running chat-bots powered by Large Language Models (LLMs) have the to become always-on agents that enable much better productivity scenarios. Current screen based chat-bots do not take advantage of the the full-suite of natural inputs available in XR, including inward facing sensor data, instead they over-rely on explicit voice or text prompts, sometimes paired with multi-modal data dropped as part of the query. We propose a solution that leverages an attention framework that derives context implicitly from user actions, eye-gaze, and contextual memory within the XR environment. Our work minimizes the need for engineered explicit prompts, fostering grounded and intuitive interactions that glean user insights for the chat-bot.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/vr59515.2025.00093","openalex_id":"https://openalex.org/W4408954391","cited_by_count":13,"quality_score":58,"matched_keywords":["memory","agent"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.8146160244941711},{"id":"https://openalex.org/C204983608","display_name":"Productivity","score":0.7448235154151917},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5081989765167236},{"id":"https://openalex.org/C103683099","display_name":"Embodied agent","score":0.4251178205013275},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3043915033340454},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.226563960313797},{"id":"https://openalex.org/C139719470","display_name":"Macroeconomics","score":0.07258343696594238}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4409761839","title":"H<sub>2</sub>E: Hand, Head, Eye a Multimodal Cascade of Natural Inputs","url":"https://doi.org/10.1109/vrw66409.2025.00026","published":"2025-03-08","authors":["Khushman Patel","Vrushank Phadnis","Eric J. Gonzalez","Hans Gellersen","Ken Pfeuffer","Mar González-Franco"],"abstract":"Eye-based interaction techniques for extended reality, such as gaze and pinch, are simple to use however suffer from input precision issues. We present H<inf xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">2</inf>E, an integrated fine and coarse-grained pointing framework that cascades Hand, Head, and Eye inputs. We further introduce the MagicPinch gesture as an example of the framework, which allows for smooth transitioning between fine and coarse-grained pointing. When combined together, after users initiate a pinch gesture, a cursor appears midway during the pinch at the position of the gaze, which can be dragged by head pointing if needed before pinch confirmation. This has the advantage that it can add a precision component without changing the semantics of the technique. 
In this paper, we describe the design of the H<inf xmlns:mml=\"http://w...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/vrw66409.2025.00026","openalex_id":"https://openalex.org/W4409761839","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Aarhus University","Google (United States)"],"concepts":[{"id":"https://openalex.org/C34146451","display_name":"Cascade","score":0.7687921524047852},{"id":"https://openalex.org/C2780312720","display_name":"Head (geology)","score":0.7368294596672058},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6590068936347961},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.5371293425559998},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4181958734989166},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3568190932273865},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.14525717496871948},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.12222567200660706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-interpretability-for-visual-prompt-tuning-with-hierarchical-concepts","title":"Exploring Interpretability for Visual Prompt Tuning with Hierarchical Concepts","url":"https://www.microsoft.com/en-us/research/publication/exploring-interpretability-for-visual-prompt-tuning-with-hierarchical-concepts/","published":"2025-03-07","authors":["Yubin Wang","Xinyang Jiang","De Cheng","Xiangqian Zhao","Zilong Wang","Dongsheng Li","Cairong 
Zhao"],"abstract":"Visual prompt tuning offers significant advantages for adapting pre-trained visual foundation models to specific tasks. However, current research provides limited insight into the interpretability of this approach, which is essential for enhancing AI reliability and enabling AI-driven knowledge discovery. In this paper, rather than learning abstract prompt embeddings, we propose the first framework, named Interpretable Visual Prompt Tuning (IVPT), to explore interpretability for visual prompts, by introducing hierarchical concept prototypes. Specifically, visual prompts are linked to human-understandable semantic concepts, represented as a set of category-agnostic prototypes, each corresponding to a specific region of the image. Then, IVPT aggregates features from these regions to generate interpretable prompts, which are structured hierarchically to explain visual prompts at different g...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","visual prompt tuning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4408231555","title":"A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations","url":"https://doi.org/10.1038/s41562-025-02105-9","published":"2025-03-07","authors":["Ariel Goldstein","Haocheng Wang","Leonard Niekerken","Mariano Schain","Zaid Zada","Bobbi Aubrey","Tom Sheffer","Samuel A. 
Nastase","Harshvardhan Gazula","Aditi Singh","Aditi Rao","Gina Choe"],"abstract":"This study introduces a unified computational framework connecting acoustic, speech and word-level linguistic structures to study the neural basis of everyday conversations in the human brain. We used electrocorticography to record neural signals across 100 h of speech production and comprehension as participants engaged in open-ended real-life conversations. We extracted low-level acoustic, mid-level speech and contextual word embeddings from a multimodal speech-to-text model (Whisper). We developed encoding models that linearly map these embeddings onto brain activity during speech production and comprehension. Remarkably, this model accurately predicts neural activity at each level of the language processing hierarchy across hours of new conversations not used in training the model. The internal processing hierarchy in the model is aligned with the cortical hierarchy for speech and la...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41562-025-02105-9","openalex_id":"https://openalex.org/W4408231555","cited_by_count":26,"quality_score":63,"matched_keywords":[],"author_affiliations":["Athinoula A. 
Martinos Center for Biomedical Imaging","Google (United States)","Harvard University","Hebrew University of Jerusalem","Maastricht University","Massachusetts General Hospital","New York University","Princeton University"],"concepts":[{"id":"https://openalex.org/C12426560","display_name":"Basis (linear algebra)","score":0.6351660490036011},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.6054491996765137},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6009629964828491},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5877289175987244},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.5129256248474121},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.39955148100852966},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.35130542516708374},{"id":"https://openalex.org/C46312422","display_name":"Communication","score":0.35070517659187317}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":26}},{"id":"apple:s25raqcu68nltdl89xpvwbjh","title":"KV Prediction for Improved Time to First Token","url":"https://machinelearning.apple.com/research/kv-prediction","published":"2025-03-07","authors":["Maxwell Horton","Qingqing Cao","Chenfan Sun","Yanzi Jin","Sachin Mehta","Mohammad Rastegari","Moin Nabi"],"abstract":"Inference with transformer-based language models begins with a prompt processing step. In this step, the model generates the first output token and stores the KV cache needed for future generation steps. This prompt processing step can be computationally expensive, taking 10s of seconds or more for billion-parameter models on edge devices when prompt lengths or batch sizes rise. 
This degrades user experience by introducing significant latency...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4411846436","title":"MTIDHHGAN: A Multimodal Temporal Information-Driven Hierarchical Heterogeneous Graph Attention Network for Stock Movement Prediction","url":"https://doi.org/10.1145/3729706.3729757","published":"2025-03-07","authors":["Zhirui Wu","Peihui Chen","Jijun Yu","Xiaorong Ye"],"abstract":"The continuous change in stock market dynamics as well as stochastic patterns has transformed stock movement prediction into an enduringly complex scientific challenge. Graph neural networks became ubiquitous tools for modeling stock relations across the industry during the past few years. Despite their proven effectiveness the stock market dynamics exceed the capacity of present-day graph neural networks to process complex system evolution and numerous interconnected patterns. The proposed framework introduces MTIDHHGAN which stands for Multimodal Temporal Information-Driven Hierarchical Heterogeneous Graph Attention Network. This model represents its main strength through its ability to unite and embed two different kinds of multimodal data types (stock prices and news information) at multiple temporal scales. 
These multimodal embeddings are then used to make predictions about stock mo...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3729706.3729757","openalex_id":"https://openalex.org/W4411846436","cited_by_count":0,"quality_score":41,"matched_keywords":["news"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7321304678916931},{"id":"https://openalex.org/C2780226923","display_name":"Movement (music)","score":0.44187480211257935},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.43975627422332764},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38998931646347046},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.24506613612174988},{"id":"https://openalex.org/C107038049","display_name":"Aesthetics","score":0.0},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/validating-llm-as-a-judge-systems-under-rating-indeterminacy","title":"Validating LLM-as-a-Judge Systems under Rating Indeterminacy","url":"https://www.microsoft.com/en-us/research/publication/validating-llm-as-a-judge-systems-under-rating-indeterminacy/","published":"2025-03-06","authors":["Luke Guerdan","Solon Barocas","Kenneth Holstein","Hanna Wallach","Zhiwei Steven Wu","Alex Chouldechova"],"abstract":"The LLM-as-a-judge paradigm, in which a judge LLM system replaces human raters in rating the outputs of other generative AI (GenAI) systems, plays a critical role in 
scaling and standardizing GenAI evaluations. To validate such judge systems, evaluators assess human--judge agreement by first collecting multiple human ratings for each item in a validation corpus, then aggregating the ratings into a single, per-item gold label rating. For many items, however, rating criteria may admit multiple valid interpretations, so a human or LLM rater may deem multiple ratings\"reasonable\"or\"correct\". We call this condition rating indeterminacy. Problematically, many rating tasks that contain rating indeterminacy rely on forced-choice elicitation, whereby raters are instructed to select only one rating for each item. In this paper, we introduce a framework for validating LLM-as-a-judge systems under ra...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nature-language-model-deciphering-the-language-of-nature-for-scientific-discovery","title":"Nature Language Model: Deciphering the Language of Nature for Scientific Discovery","url":"https://www.microsoft.com/en-us/research/publication/nature-language-model-deciphering-the-language-of-nature-for-scientific-discovery/","published":"2025-03-06","authors":["Yingce Xia","Peiran Jin","Shufang Xie","Liang He","Chuan Cao","Renqian Luo","Guoqing Liu","Yue Wang","Zequn Liu","Yuan-Jyue Chen","Yuan Chen","Zekun 
Guo"],"abstract":"Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, RNA and even cells. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be represented as sequences, which together form the\"language of nature\", we introduce Nature Language Model (NatureLM), a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications includin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:de0dc4ca679356fd","title":"QwQ-32B: Embracing the Power of Reinforcement Learning","url":"https://qwenlm.github.io/blog/qwq-32b/","published":"2025-03-06","authors":["Alibaba/Qwen"],"abstract":"QWEN CHAT Hugging Face ModelScope DEMO DISCORDScaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. 
Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning.Our research explores the scalability of Reinforcement Learning (RL) and its impact on enhancing the intelligence of large language models.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/compositional-causal-reasoning-evaluation-in-language-models","title":"Compositional Causal Reasoning Evaluation in Language Models","url":"https://www.microsoft.com/en-us/research/publication/compositional-causal-reasoning-evaluation-in-language-models/","published":"2025-03-05","authors":["Jacqueline Maasch","Alihan Hüyük","Xinnuo Xu","Aditya Nori","Javier González"],"abstract":"Causal reasoning and compositional reasoning are two core aspirations in AI. Measuring the extent of these behaviors requires principled evaluation methods. We explore a unified perspective that considers both behaviors simultaneously, termed compositional causal reasoning (CCR): the ability to infer how causal measures compose and, equivalently, how causal quantities propagate through graphs. We instantiate a framework for the systematic evaluation of CCR for the average treatment effect and the probability of necessity and sufficiency. 
As proof of concept, we demonstrate CCR evaluation for language models in the LLama, Phi, and GPT families. On a math word problem, our framework revealed a range of taxonomically distinct error patterns. CCR errors increased with the complexity of causal paths for all models except o1. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Language model","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:iue4zc71qfunkl7mrknjofyw","title":"SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions","url":"https://machinelearning.apple.com/research/selma-speech-enabled-language","published":"2025-03-05","authors":["Dominik Wagner","Alexander Churchill","Siddharth Sigtia","Erik Marchi"],"abstract":"In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and text as inputs to a Large Language Model (LLM). SELMA is designed to handle three primary and two auxiliary tasks related to interactions with virtual assistants simultaneously within a single end-to-end model. 
We employ low-rank adaptation modules for parameter-efficient training of both the audio encoder and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:kj3bk5gkemqabfpwjeyr2myk","title":"M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference","url":"https://machinelearning.apple.com/research/multi-rate-residuals-transformers","published":"2025-03-05","authors":["Nikhil Bhendawade","Mahyar Najibi","Devang Naik","Irina Belousova"],"abstract":"Residual transformations enhance the representational depth and expressive power of large language models (LLMs). However, applying static residual transformations across all tokens in auto-regressive generation leads to a suboptimal trade-off between inference efficiency and generation fidelity. 
Existing methods, including Early Exiting, Skip Decoding, and Mixture-of-Depth address this by modulating the residual transformation based on...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:mm8qnqkdmrcsieba1o5z9mnf","title":"Does Spatial Cognition Emerge in Frontier Models?","url":"https://machinelearning.apple.com/research/cognition-emerge-frontier-models","published":"2025-03-05","authors":["Santhosh Kumar Ramakrishnan","Erik Wijmans","Philipp Krähenbühl","Vladlen Koltun"],"abstract":"Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. 
For many tasks, we instantiate...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["memory"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"bytedance-seed:247","title":"LLaVA-Critic: Learning to Evaluate Multimodal Models","url":"https://seed.bytedance.com/en/research/llava-critic-learning-to-evaluate-multimodal-models","published":"2025-03-04","authors":["Tianyi Xiong","Xiyao Wang","Dong Guo","Qinghao Ye","Haoqi Fan","Quanquan Gu","Heng Huang","Chunyuan Li"],"abstract":"We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks. LLaVA-Critic is trained using a high-quality critic instruction-following dataset that incorporates diverse evaluation criteria and scenarios. Our experiments demonstrate the model's effectiveness in two key areas: (1) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation scores, performing on par with or surpassing GPT models on multiple evaluation benchmarks; and (2) Preference Learning, where it generates reward signals for preference learning, enhancing model alignment capabilities. This work underscores the potential of open-source LMMs in self-critique and evaluation, setting the stage for future research into scalable, superhuman alignment feedback mechanisms for LMMs. 
External paper link: https://...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","CVPR 2025","preference"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/overreliance-risk-identification-and-mitigation-framework","title":"Overreliance risk identification and mitigation framework","url":"https://www.microsoft.com/en-us/research/publication/overreliance-risk-identification-and-mitigation-framework/","published":"2025-03-04","authors":["Samir Passi","Mihaela Vorvoreanu","Ruth Kikin-Gil"],"abstract":"The Overreliance risk identification and mitigation framework guides product teams through 1) a series of questions meant to help them identify and characterize what the risk of overreliance looks like in their particular product or feature; 2) three UX goals to accomplish in order to foster appropriate reliance on genAI, with associated strategies, examples, and UX research evaluation questions. Cite as: Passi, S., Vorvoreanu, M., & Kikin-Gil, R. (2025). Overreliance risk identification and mitigation framework . Microsoft Technical Report MSR-TR-2025-26. 
https://www.microsoft.com/en-us/research/publication/overreliance-risk-identification-and-mitigation-framework/","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Tech Report","Human-computer interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:khfl0dvxafvvtwockkslffk8","title":"MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-Tuning","url":"https://machinelearning.apple.com/research/mm15-methods-analysis-insights","published":"2025-03-04","authors":["Haotian Zhang","Mingfei Gao","Zhe Gan","Philipp Dufter","Nina Wenzel","Forrest Huang","Dhruti Shah","Xianzhi Du","Bowen Zhang","Yanghao Li","Sam Dodge","Keen You"],"abstract":"We present MM1.5, a new family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building upon the MM1 architecture, MM1.5 adopts a data-centric approach to model training, systematically exploring the impact of diverse data mixtures across the entire model training lifecycle. 
This includes high-quality OCR data and synthetic...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bridge-bootstrapping-text-to-control-time-series-generation-via-multi-agent-iterative-optimization-and-diffusion-modelling","title":"BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling","url":"https://www.microsoft.com/en-us/research/publication/bridge-bootstrapping-text-to-control-time-series-generation-via-multi-agent-iterative-optimization-and-diffusion-modelling/","published":"2025-03-03","authors":["Hao Li","Yu-Hao Huang","Chang Xu","Viktor Schlegel","Ren-He Jiang","R. Batista-Navarro","Goran Nenadic","Jiang Bian"],"abstract":"Time-series Generation (TSG) is a prominent research area with broad applications in simulations, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand for cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information and instance-specific temporal patterns, to guide and improve TSG. We introduce “Text-Controlled TSG”, a task focused on generating realistic time series by incorporating textual descriptions. 
To address data scarcity in this setting, we propose a novel LLM-based Multi-Agent framework that synthesizes diverse, realistic text-to-TS datasets. Furthermore, we introduce BRIDGE, a hybrid text-controlled TSG framework that integr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/anyprefer-an-agentic-framework-for-preference-data-synthesis","title":"Anyprefer: An Agentic Framework for Preference Data Synthesis","url":"https://www.microsoft.com/en-us/research/publication/anyprefer-an-agentic-framework-for-preference-data-synthesis/","published":"2025-03-03","authors":["Yiyang Zhou","Zhaoyang Wang","Tianle Wang","Shangyu Xing","Peng Xia","Bo Li","Kaiyuan Zheng","Zijian Zhang","Zhaorun Chen","Wenhao Zheng","Xuchao Zhang","Chetan Bansal"],"abstract":"High-quality preference data is essential for aligning foundation models with human values through preference learning. However, manual annotation of such data is often time-consuming and costly. Recent methods often adopt a self-rewarding approach, where the target model generates and annotates its own preference data, but this can lead to inaccuracies since the reward model shares weights with the target model, thereby amplifying inherent biases. 
To address these issues, we propose Anyprefer, a framework designed to synthesize high-quality preference data for aligning the target model. Anyprefer frames the data synthesis process as a cooperative two-player Markov Game, where the target model and the judge model collaborate together. Here, a series of external tools are introduced to assist the judge model in accurately rewarding the target model’s responses, mitigating biases in the re...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","agentic AI","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:217","title":"The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model","url":"https://seed.bytedance.com/en/research/the-rise-and-down-of-babel-tower-investigating-the-evolution-process-of-multilingual-code-large-language-model","published":"2025-03-03","authors":["Jiawei Chen","Wentao Chen","Jing Su","Jingjing Xu","Hongyu Lin","Mengjie Ren","Yaojie Lu","Xianpei Han","Le Sun"],"abstract":"Large language models (LLMs) have shown significant multilingual capabilities. However, the mechanisms underlying the development of these capabilities during pre-training are not well understood. In this paper, we use code LLMs as an experimental platform to explore the evolution of multilingual capabilities in LLMs during the pre-training process. 
Based on our observations, we propose the Babel Tower Hypothesis, which describes the entire process of LLMs acquiring new language capabilities. During the learning process, multiple languages initially share a single knowledge system dominated by the primary language and gradually develop language-specific knowledge systems. We then validate the above hypothesis by tracking the internal states of the LLMs through identifying working languages and language transferring neurons. Experimental results show that the internal state changes of the...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","ICLR 2025","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4414878510","title":"Survey of Uncertainty Estimation in Large Language Models -Sources, Methods, Applications, and Challenge","url":"https://hal.science/hal-04973361","published":"2025-03-03","authors":["Jianfeng He","Linlin Yu","Changbin Li","Runing Yang","Fanglan Chen","Kangshuo Li","Min Zhang","Shuo Lei","Xuchao Zhang","Mohammad Beigi","Kaize Ding","Bei Xiao"],"abstract":"<div> Large Language Models (LLMs) have demonstrated exceptional performance across a wide range of domains, including everyday life, finance, law, and healthcare. However, inaccurate LLM generation has led to significant penalties in sensitive areas such as finance and health, where inaccuracy can result in the loss of money, time, or even life. 
Consequently, recent research has increasingly focused on uncertainty estimation for LLMs, aiming to quantify the likelihood that a model's generation can be trusted given the input. Even though LLMs can be applied to multiple tasks compared to pre-trained models, it also have some unique challenges in the field of Natural Language Processing (NLP) such as inaccessible training data and difficulty in fine-tuning. Therefore, uncertainty estimation for LLMs must address unique sources of uncertainty in addition to traditional uncertainty source in...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"","openalex_id":"https://openalex.org/W4414878510","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["American University","Microsoft (United States)","Northwestern University","The University of Texas at Dallas","University of California, Davis","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.8187999725341797},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6409000158309937},{"id":"https://openalex.org/C177803969","display_name":"Uncertainty analysis","score":0.515999972820282},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.5151000022888184},{"id":"https://openalex.org/C204323151","display_name":"Range (aeronautics)","score":0.49630001187324524},{"id":"https://openalex.org/C32230216","display_name":"Uncertainty quantification","score":0.475600004196167},{"id":"https://openalex.org/C176147448","display_name":"Sensitivity analysis","score":0.4560999870300293},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.45100000500679016}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/eagle-3-scaling-up-inference-acceleration-of-large-language-models-via-training-time-test","title":"EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test","url":"https://www.microsoft.com/en-us/research/publication/eagle-3-scaling-up-inference-acceleration-of-large-language-models-via-training-time-test/","published":"2025-03-02","authors":["Yuhui Li","Fangyun Wei","Chao Zhang","Hongyang Zhang"],"abstract":"The sequential nature of modern LLMs makes them expensive and slow, and speculative sampling has proven to be an effective solution to this problem. Methods like EAGLE perform autoregression at the feature level, reusing top-layer features from the target model to achieve better results than vanilla speculative sampling. A growing trend in the LLM community is scaling up training data to improve model intelligence without increasing inference costs. However, we observe that scaling up data provides limited improvements for EAGLE. We identify that this limitation arises from EAGLE's feature prediction constraints. In this paper, we introduce EAGLE-3, which abandons feature prediction in favor of direct token prediction and replaces reliance on top-layer features with multi-layer feature fusion via a technique named training-time test. 
These improvements significantly enhance performance a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/phi-4-mini-technical-report-compact-yet-powerful-multimodal-language-models-via-mixture-of-loras","title":"Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs","url":"https://www.microsoft.com/en-us/research/publication/phi-4-mini-technical-report-compact-yet-powerful-multimodal-language-models-via-mixture-of-loras/","published":"2025-03-02","authors":["Abdelrahman Abouelenin","Atabak Ashfaq","Adam Atkinson","H. Awadalla","Nguyen Bach","Jianmin Bao","A. Benhaim","Martin Cai","Vishrav Chaudhary","Congcong Chen","Dongdong Chen","Dongdong Chen"],"abstract":"We introduce Phi-4-Mini and Phi-4-Multimodal, compact yet highly capable language and multimodal models. Phi-4-Mini is a 3.8-billion-parameter language model trained on high-quality web and synthetic data, significantly outperforming recent open-source models of similar size and matching the performance of models twice its size on math and coding tasks requiring complex reasoning. This achievement is driven by a carefully curated synthetic data recipe emphasizing high-quality math and coding datasets. 
Compared to its predecessor, Phi-3.5-Mini, Phi-4-Mini features an expanded vocabulary size of 200K tokens to better support multilingual applications, as well as group query attention for more efficient long-sequence generation. Phi-4-Multimodal is a multimodal model that integrates text, vision, and speech/audio input modalities into a single model. Its novel modality extension approach le...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2503.01052","title":"ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation","url":"https://huggingface.co/papers/2503.01052","published":"2025-03-02","authors":["Yanzhou Pan","Huawei Lin","Yide Ran","Jiamin Chen","Xiaodong Yu","Weijie Zhao","Denghui Zhang","Zhaozhuo Xu"],"abstract":"Large Language Models (LLMs) heavily rely on high-quality training data, making data valuation crucial for optimizing model performance, especially when working within a limited budget. In this work, we aim to offer a third-party data valuation approach that benefits both data providers and model developers. We introduce a linearized future influence kernel (LinFiK), which assesses the value of individual data samples in improving LLM performance during training. We further propose ALinFiK, a learning strategy to approximate LinFiK, enabling scalable data valuation. 
Our comprehensive evaluations demonstrate that this approach surpasses existing baselines in effectiveness and efficiency, demonstrating significant scalability advantages as LLM parameters increase.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vattention-dynamic-memory-management-for-serving-llms-without-pagedattention","title":"vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention","url":"https://www.microsoft.com/en-us/research/publication/vattention-dynamic-memory-management-for-serving-llms-without-pagedattention/","published":"2025-03-01","authors":["Ramya Prabhu","Ajay Nayak","Jayashree Mohan","Ramachandran Ramjee","Ashish Panwar"],"abstract":"Efficient use of GPU memory is essential for high throughput LLM inference. Prior systems reserved memory for the KV-cache ahead-of-time, resulting in wasted capacity due to internal fragmentation. Inspired by OS-based virtual memory systems, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation, enabling high-throughput LLM serving with larger batch sizes. However, to be able to allocate physical memory dynamically, PagedAttention changes the layout of KV-cache from contiguous virtual memory to non-contiguous virtual memory. This change requires attention kernels to be rewritten to support paging, and serving framework to implement a memory manager. Thus, the PagedAttention model leads to software complexity, portability issues, redundancy and inefficiency. 
In this paper, we propose vAttention for dynamic KV-cache memory ma...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/advancing-mobile-gui-agents-a-verifier-driven-approach-to-practical-deployment","title":"Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment","url":"https://www.microsoft.com/en-us/research/publication/advancing-mobile-gui-agents-a-verifier-driven-approach-to-practical-deployment/","published":"2025-03-01","authors":["Gaole Dai","Shiqi Jiang","Ting Cao","Yuanchun Li","Yuqing Yang","Rui Tan","Mo Li","Lili Qiu"],"abstract":"We propose V-Droid, a mobile GUI task automation agent. Unlike previous mobile agents that utilize Large Language Models (LLMs) as generators to directly generate actions at each step, V-Droid employs LLMs as verifiers to evaluate candidate actions before making final decisions. 
To realize this novel paradigm, we introduce a comprehensive framework for constructing verifier-driven mobile agents: the discretized action space construction coupled with the prefilling-only workflow to accelerate the verification process, the pair-wise progress preference training to significantly enhance the verifier's decision-making capabilities, and the scalable human-agent joint annotation scheme to efficiently collect the necessary data at scale. V-Droid sets a new state-of-the-art task success rate across several public mobile task automation benchmarks: 59.5% on AndroidWorld, 38.3% on AndroidLab, and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Systems and networking","AI agents","preference","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pod-attention-unlocking-full-prefill-decode-overlap-for-faster-llm-inference","title":"POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference","url":"https://www.microsoft.com/en-us/research/publication/pod-attention-unlocking-full-prefill-decode-overlap-for-faster-llm-inference/","published":"2025-03-01","authors":["Aditya K Kamath","Ramya Prabhu","Jayashree Mohan","Simon Peter","Ramachandran Ramjee","Ashish Panwar"],"abstract":"Each request in LLM inference goes through two phases: compute-bound prefill and memory-bandwidth-bound decode. 
To improve GPU utilization, recent systems use hybrid batching that combines the prefill and decode phases of different requests into the same batch. This approach optimizes linear operations but remains inefficient for attention computation because existing attention kernels specialize execution independently for the prefill and decode phases. In this paper, we present POD-Attention - the first GPU kernel that efficiently computes attention for hybrid batches. POD-Attention aims to maximize the utilization of both compute and memory bandwidth by carefully allocating the GPU's resources such that prefill and decode operations happen concurrently on the same multiprocessor. POD-Attention speeds up attention computation by up to 59% (mean 28%), enabling higher throughput and lo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/navigating-rifts-in-human-llm-grounding-study-and-benchmark","title":"Navigating Rifts in Human-LLM Grounding: Study and Benchmark","url":"https://www.microsoft.com/en-us/research/publication/navigating-rifts-in-human-llm-grounding-study-and-benchmark/","published":"2025-03-01","authors":["Omar Shaikh","Hussein Mozannar","Gagan Bansal","Adam Fourney","Eric Horvitz"],"abstract":"Language models excel at following instructions but often struggle with the collaborative aspects of 
conversation that humans naturally employ. This limitation in grounding -- the process by which conversation participants establish mutual understanding -- can lead to outcomes ranging from frustrated users to serious consequences in high-stakes scenarios. To systematically study grounding challenges in human-LLM interactions, we analyze logs from three human-assistant datasets: WildChat, MultiWOZ, and Bing Chat. We develop a taxonomy of grounding acts and build models to annotate and forecast grounding behavior. Our findings reveal significant differences in human-human and human-LLM grounding: LLMs were three times less likely to initiate clarification and sixteen times less likely to provide follow-up requests than humans. Additionally, early grounding failures predicted later interact...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Human-computer interaction","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/convolutional-neural-network-transformer-cnnt-for-fluorescence-microscopy-image-denoising-with-improved-generalization-and-fast-adaptation","title":"Convolutional neural network transformer (CNNT) for fluorescence microscopy image denoising with improved generalization and fast 
adaptation","url":"https://www.microsoft.com/en-us/research/publication/convolutional-neural-network-transformer-cnnt-for-fluorescence-microscopy-image-denoising-with-improved-generalization-and-fast-adaptation/","published":"2025-03-01","authors":["Hui Xue"],"abstract":"Deep neural networks can improve the quality of fluorescence microscopy images. Previous methods, based on Convolutional Neural Networks (CNNs), require time-consuming training of individual models for each experiment, impairing their applicability and generalization. In this study, we propose a novel imaging-transformer based model, Convolutional Neural Network Transformer (CNNT), that outperforms CNN based networks for image denoising. We train a general CNNT based backbone model from pairwise high-low Signal-to-Noise Ratio (SNR) image volumes, gathered from a single type of fluorescence microscope, an instant Structured Illumination Microscope. Fast adaptation to new microscopes is achieved by fine-tuning the backbone on only 5–10 image volume pairs per new experiment. 
Results show that the CNNT backbone and fine-tuning scheme significantly reduces training time and improves image qua...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1038/s41598-024-68918-2","openalex_id":"https://openalex.org/W4394653719","cited_by_count":15,"quality_score":75,"matched_keywords":["Article (Journal)","Artificial intelligence"],"author_affiliations":["Microsoft","Center for Biologics Evaluation and Research","Howard Hughes Medical Institute","Janelia Research Campus","Microsoft (United States)","National Heart Lung and Blood Institute","National Institute of Biomedical Imaging and Bioengineering","National Institutes of Health","United States Food and Drug Administration"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/promptpex-automatic-test-generation-for-language-model-prompts","title":"PromptPex: Automatic Test Generation for Language Model Prompts","url":"https://www.microsoft.com/en-us/research/publication/promptpex-automatic-test-generation-for-language-model-prompts/","published":"2025-03-01","authors":["Reshabh Sharma","Jonathan \"Peli\" de Halleux","Shraddha Barke","Ben Zorn"],"abstract":"Large language models (LLMs) are being used in many applications and prompts for these models are integrated into software applications as code-like artifacts. These prompts behave much like traditional software in that they take inputs, generate outputs, and perform some specific function. However, prompts differ from traditional code in many ways and require new approaches to ensure that they are robust. 
For example, unlike traditional software the output of a prompt depends on the AI model that interprets it. Also, while natural language prompts are easy to modify, the impact of updates is harder to predict. New approaches to testing, debugging, and modifying prompts with respect to the model running them are required. To address some of these issues, we developed PromptPex, an LLM-based tool to automatically generate and evaluate unit tests for a given prompt. PromptPex extracts input...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Programming languages and software engineering","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dynamollm-designing-llm-inference-clusters-for-performance-and-energy-efficiency","title":"DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency","url":"https://www.microsoft.com/en-us/research/publication/dynamollm-designing-llm-inference-clusters-for-performance-and-energy-efficiency/","published":"2025-03-01","authors":["Jovan Stojkovic","Chaojie Zhang","Íñigo Goiri","Josep Torrellas","Esha Choukse"],"abstract":"The rapid evolution and widespread adoption of generative large language models (LLMs) have made them a pivotal workload in various applications. Today, LLM inference clusters receive a large number of queries with strict Service Level Objectives (SLOs). 
To achieve the desired performance, these models execute on power-hungry GPUs causing the inference clusters to consume large amount of energy and, consequently, result in excessive carbon emissions. Fortunately, we find that there is a great opportunity to exploit the heterogeneity in inference compute properties and fluctuations in inference workloads, to significantly improve energy-efficiency. However, such a diverse and dynamic environment creates a large search-space where different system configurations (e.g., number of instances, model parallelism, and GPU frequency) translate into different energy-performance trade-offs. To addr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm4evalwsdm-2025-large-language-model-for-evaluation-in-information-retrieval","title":"LLM4Eval@WSDM 2025: Large Language Model for Evaluation in Information Retrieval","url":"https://www.microsoft.com/en-us/research/publication/llm4evalwsdm-2025-large-language-model-for-evaluation-in-information-retrieval/","published":"2025-03-01","authors":["Hossein A. 
Rahmani","Clemencia Siro","Mohammad Aliannejadi","Nick Craswell","Charles L A Clarke","Guglielmo Faggioli","Bhaskar Mitra","Paul Thomas","Emine Yilmaz"],"abstract":"Large language models (LLMs) have demonstrated increasing task-solving abilities not present in smaller models. Utilizing the capabilities and responsibilities of LLMs for automated evaluation (LLM4Eval) has recently attracted considerable attention in multiple research communities. For instance, LLM4Eval models have been studied in the context of automated judgments, natural language generation, and retrieval augmented generation systems. We believe that the information retrieval community can significantly contribute to this growing research area by designing, implementing, analyzing, and evaluating various aspects of LLMs with applications to LLM4Eval tasks. The main goal of LLM4Eval workshop is to bring together researchers from industry and academia to discuss various aspects of LLMs for evaluation in information retrieval, including automated judgments, retrieval-augmented generati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3701551.3705706","openalex_id":"https://openalex.org/W4407953544","cited_by_count":3,"quality_score":71,"matched_keywords":["Inproceedings (Conference)","Search and information retrieval","language model","retrieval"],"author_affiliations":["Microsoft","Amazon (United Kingdom)","Bellevue Hospital Center","Microsoft (Canada)","Microsoft (United States)","University College London","University of Amsterdam","University of Padua","University of Waterloo"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:218","title":"TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice","url":"https://seed.bytedance.com/en/research/tc-moe-augmenting-mixture-of-experts-with-ternary-expert-choice","published":"2025-03-01","authors":["Shen Yan","Xingyan Bin","Sijun Zhang","Yisen Wang","Zhouchen Lin"],"abstract":"The Mixture of Experts (MoE) architecture has emerged as a promising solution to reduce computational overhead by selectively activating subsets of model parameters. The effectiveness of MoE models depends primarily on their routing mechanisms, with the widely adopted Top-K routing scheme used for activating experts. However, the Top-K scheme has notable limitations, including unnecessary activations and underutilization of experts. In this work, rather than modifying the routing mechanism as done in previous studies, we propose the Ternary Choice MoE (TC-MoE), a novel approach that expands the expert space by applying the ternary set {-1, 0, 1} to each expert. This expansion allows more efficient and effective expert activations without incurring significant computational costs. 
Additionally, given the unique characteristics of the expanded expert space, we introduce a new load balance....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["LLM","Infrastructures","ICLR 2025","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/designdiffusion-high-quality-text-to-design-image-generation-with-diffusion-models","title":"DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models","url":"https://www.microsoft.com/en-us/research/publication/designdiffusion-high-quality-text-to-design-image-generation-with-diffusion-models/","published":"2025-03-01","authors":["Zhendong Wang","Jianmin Bao","Shuyang Gu","Dong Chen","Wengang Zhou","Houqiang Li"],"abstract":"In this paper, we present DesignDiffusion, a simple yet effective framework for the novel task of synthesizing design images from textual descriptions. A primary challenge lies in generating accurate and style-consistent textual and visual content. Existing works in a related task of visual text generation often focus on generating text within given specific regions, which limits the creativity of generation models, resulting in style or color inconsistencies between textual and visual elements if applied to design image generation. To address this issue, we propose an end-to-end, one-stage diffusion-based framework that avoids intricate components like position and layout modeling. 
Specifically, the proposed framework directly synthesizes textual and visual design elements from user prompts. It utilizes a distinctive character embedding derived from the visual text to enhance the input....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/model-as-a-game-on-numerical-and-spatial-consistency-for-generative-games","title":"Model as a Game: On Numerical and Spatial Consistency for Generative Games","url":"https://www.microsoft.com/en-us/research/publication/model-as-a-game-on-numerical-and-spatial-consistency-for-generative-games/","published":"2025-03-01","authors":["Jingye Chen","Yuzhong Zhao","Yupan Huang","Lei Cui","Li Dong","Tengchao Lv","Qifeng Chen","Furu Wei"],"abstract":"Recent advances in generative models have significantly impacted game generation. However, despite producing high-quality graphics and adequately receiving player input, existing models often fail to maintain fundamental game properties such as numerical and spatial consistency. Numerical consistency ensures gameplay mechanics correctly reflect score changes and other quantitative elements, while spatial consistency prevents jarring scene transitions, providing seamless player experiences. 
In this paper, we revisit the paradigm of generative games to explore what truly constitutes a Model as a Game (MaaG) with a well-developed mechanism. We begin with an empirical study on Traveler'', a 2D game created by an LLM featuring minimalist rules yet challenging generative models in maintaining consistency. Based on the DiT architecture, we design two specialized modules: (1) a numerical module....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Manual","Artificial intelligence","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/general-scales-unlock-ai-evaluation-with-explanatory-and-predictive-power","title":"General Scales Unlock AI Evaluation with Explanatory and Predictive Power","url":"https://www.microsoft.com/en-us/research/publication/general-scales-unlock-ai-evaluation-with-explanatory-and-predictive-power/","published":"2025-03-01","authors":["Lexin Zhou","Lorenzo Pacchiardi","Fernando Martínez-Plumed","Katherine M. Collins","Yael Moros-Daval","Seraphina Zhang","Qinlin Zhao","Yitian Huang","Luning Sun","Jonathan E. Prunty","Zongqian Li","Pablo Sánchez-García"],"abstract":"Ensuring safe and effective use of AI requires understanding and anticipating its performance on novel tasks, from advanced scientific challenges to transformed workplace activities. 
So far, benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems, given the low transferability across diverse tasks. In this paper, we introduce general scales for AI evaluation that can explain what common AI benchmarks really measure, extract ability profiles of AI systems, and predict their performance for new task instances, in- and out-of-distribution. Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate. Illustrated for 15 large language models and 63 tasks, high explanatory power is unleashed from inspecting the demand and ability profiles, bringin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Manual","Artificial intelligence","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4409248830","title":"eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models","url":"https://doi.org/10.1109/hpca61900.2025.00133","published":"2025-03-01","authors":["Minsik Cho","Keivan Alizadeh Vahid","Qichen Fu","Saurabh Adya","Carlo C. Del Mundo","Mohammad Rastegari","Devang Naik","Peter Zatloukal"],"abstract":"Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. 
However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight-clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression, and supported by modern smartphones. Yet, its training overhead is prohibitively significant for LLM fine-tuning. Especially, Differentiable KMeans Clustering, or DKM, has shown the state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this letter, we propose a memory-...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/hpca61900.2025.00133","openalex_id":"https://openalex.org/W4409248830","cited_by_count":4,"quality_score":61,"matched_keywords":["LLM","memory","efficient","compression","quantization"],"author_affiliations":["Apple (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7901099324226379},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.6862651109695435},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4383648633956909},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3621693551540375}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409204162","title":"DRMSpell: dynamically reweighting multimodality for Chinese spelling correction","url":"https://doi.org/10.1631/fitee.2300816","published":"2025-03-01","authors":["Yinghao Li","Heyan Huang","Baojun Wang","Yang Gao"],"abstract":"Chinese spelling correction (CSC) is a 
task that aims to detect and correct the spelling errors that may occur in Chinese texts. However, the Chinese language exhibits a high degree of complexity, characterized by the presence of multiple phonetic representations known as pinyin, which possess distinct tonal variations that can correspond to various characters. Given the complexity inherent in the Chinese language, the CSC task becomes imperative for ensuring the accuracy and clarity of written communication. Recent research has included external knowledge into the model using phonological and visual modalities. However, these methods do not effectively target the utilization of modality information to address the different types of errors. In this paper, we propose a multimodal pretrained language model called DRMSpell for CSC, which takes into consideration the interaction between the....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1631/fitee.2300816","openalex_id":"https://openalex.org/W4409204162","cited_by_count":5,"quality_score":46,"matched_keywords":["language model"],"author_affiliations":["Beijing Institute of Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2777801307","display_name":"Spelling","score":0.935441255569458},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.9299083948135376},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5187373757362366},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.43908512592315674},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3658261001110077},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.35764551162719727},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.11626619100570679},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.08440595865249634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4410954908","title":"AICB: A benchmark for evaluating the communication subsystem of LLM training clusters","url":"https://doi.org/10.1016/j.tbench.2025.100212","published":"2025-03-01","authors":["Xinyue Li","Heyang Zhou","Qingxu Li","Sen Zhang","Gang Lü"],"abstract":"AICB (Artificial Intelligence Communication Benchmark) is a benchmark for evaluating the communication subsystem of GPU clusters, which includes representative workloads in the fields of Large Language Model (LLM) training. Guided by the theories and methodologies of Evaluatology, we simplified the real-workload LLM training systems through AICB that maintain good representativeness and usability. AICB bridges the gap between application benchmarks and microbenchmarks in the scope of LLM training. In addition, we constructed a new GPU-free evaluation system that helps researchers evaluate the communication system of the LLM training systems. To help the urgent demand on this evaluation subject, we open-source AICB and make it available at https://github.com/aliyun/aicb. • Guided by the principles of Evaluatology, we propose AICB, a benchmark to evaluate AI communication systems. 
By “hija...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.tbench.2025.100212","openalex_id":"https://openalex.org/W4410954908","cited_by_count":1,"quality_score":46,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.855073094367981},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.7521336078643799},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6164549589157104},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38653549551963806},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.32290202379226685},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.08441174030303955},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.05830979347229004},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4413825257","title":"The Power of Constraints in Natural Language to SQL Translation","url":"https://doi.org/10.14778/3734839.3734847","published":"2025-03-01","authors":["Tonghui Ren","Ke Chen","Yuankai Fan","Yinan Jing","Zhenying He","Kai Zhang","X. Sean Wang"],"abstract":"Current large language model (LLM)-based Natural Language to SQL (NL2SQL) approaches typically rely on the database schema and partial data values for the translation. 
These approaches are unable to use sufficient data for accurate database understanding due to limitations in data selection methods, and they cannot input the entire database due to the limited context window sizes of LLMs. This insufficient data integration may result in an incomplete understanding of the database, leading to semantically incorrect SQL generation. In this paper, we introduce REDSQL, a novel plug-and-play framework that refines the predicted SQL by utilizing the entire database in the refinement process. The core idea of REDSQL is to enhance SQL refinement by identifying potential errors based on the database content, which is achieved by applying constraints on the input relations of query operations. LLM...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.14778/3734839.3734847","openalex_id":"https://openalex.org/W4413825257","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["China Telecom (China)","Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5806678533554077},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.49206942319869995},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.4670723080635071},{"id":"https://openalex.org/C510870499","display_name":"SQL","score":0.46539199352264404},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4478675425052643},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.38588327169418335},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.378577321767807},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.14193326234817505}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408952473","title":"Cross‐Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis","url":"https://doi.org/10.1029/2025jh000601","published":"2025-03-01","authors":["Zhixiang Guo","Xinming Wu","Luming Liang","Hanlin Sheng","Nuo Chen","Zhengfa Bi"],"abstract":"Abstract We explore adapting foundation models (FMs) from the computer vision domain to geoscience. FMs, which are large neural networks trained on massive data sets, excel in diverse tasks with remarkable adaptability and generality. However, geoscience faces challenges like lacking curated training data sets and high computational cost for developing specialized FMs. This study considers adapting FMs from computer vision to geoscience, analyzing their scale, adaptability, and generality for geoscientific data analysis. We introduce a workflow that leverages existing computer vision FMs, fine‐tuning them for geoscientific tasks, reducing development costs while enhancing accuracy. Through experiments, we demonstrate this workflow's effectiveness in broad applications to process and interpret geoscientific data of lunar images, seismic data, DAS arrays and so on. 
Our findings introduce a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1029/2025jh000601","openalex_id":"https://openalex.org/W4408952473","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Lawrence Berkeley National Laboratory","Microsoft (United States)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7135169506072998},{"id":"https://openalex.org/C2776434776","display_name":"Domain adaptation","score":0.5395947694778442},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.5303645133972168},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4990804195404053},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4595015048980713},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45601633191108704},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.3554112911224365},{"id":"https://openalex.org/C8058405","display_name":"Geophysics","score":0.34229278564453125}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4408145144","title":"A Diffusion Model for Traffic Data Imputation","url":"https://doi.org/10.1109/jas.2024.124611","published":"2025-03-01","authors":["Bo Lü","Qinghai Miao","Yahui Liu","Tariku Sinshaw Tamir","Hongxia Zhao","Xiqiao Zhang","Yanhong Lv","Fei‐Yue Wang"],"abstract":"Imputation of missing data has long been an important topic and an essential application for intelligent transportation systems (ITS) in the real world. 
As a state-of-the-art generative model, the diffusion model has proven highly successful in image generation, speech generation, time series modelling etc. and now opens a new avenue for traffic data imputation. In this paper, we propose a conditional diffusion model, called the implicit-explicit diffusion model, for traffic data imputation. This model exploits both the implicit and explicit feature of the data simultaneously. More specifically, we design two types of feature extraction modules, one to capture the implicit dependencies hidden in the raw data at multiple time scales and the other to obtain the long-term temporal dependencies of the time series. This approach not only inherits the advantages of the diffusion model for esti...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jas.2024.124611","openalex_id":"https://openalex.org/W4408145144","cited_by_count":4,"quality_score":45,"matched_keywords":["long-term"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Guangdong University of Technology","Harbin Institute of Technology","Meizu (China)","Shandong Institute of Automation","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C58041806","display_name":"Imputation (statistics)","score":0.6459223031997681},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4845232367515564},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.3579128384590149},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.340212881565094},{"id":"https://openalex.org/C9357733","display_name":"Missing 
data","score":0.23020142316818237},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.21825546026229858},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.12076687812805176}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408070679","title":"DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification","url":"https://doi.org/10.1145/3721133","published":"2025-03-01","authors":["Yuntao Lu","Chen Bai","Yuxuan Zhao","Ziyue Zheng","Yangdi Lyu","Mingyu Liu","Bei Yu"],"abstract":"Verification is critical in ensuring the reliable operation of modern, complex computing systems. However, as processor designs become increasingly sophisticated, conventional static verification techniques struggle to generate high-quality test sequences that achieve comprehensive coverage. Dynamic simulation-based approaches, which leverage coverage-driven objectives, can increase confidence in correct processor functionality but often suffer from low verification efficiency due to the generation of redundant test sequences and significant computational overhead. To address these challenges, this paper presents DeepVerifier, a novel coverage-guided test generation framework that leverages data-driven learning of existing test sequences and their associated coverage feedback. 
DeepVerifier uses a language model to learn the semantic representations of test sequences, ensure adherence to....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3721133","openalex_id":"https://openalex.org/W4408070679","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Chinese University of Hong Kong","Hong Kong University of Science and Technology","Huawei Technologies (China)","Huawei Technologies (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9045295715332031},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5532109141349792},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3686751127243042},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3607082664966583},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411232379","title":"Generative AI for Computer Graphics","url":"https://doi.org/10.1109/mcg.2025.3574915","published":"2025-03-01","authors":["Rajesh Sharma","Vinicius Azevedo","Tomasz Bednarz","Doug Roble"],"abstract":"Generative AI has emerged as a transformative force in the realm of computer graphics, offering innovative methods that push the boundaries of creativity, efficiency, and realism. In this special issue, we delve into the myriad ways in which generative AI is reshaping the field, with six articles that explore its applications across a wide range of topics. 
These contributions cover advancements in neural networks for image generation, AI-assisted design tools, deep learning techniques for realistic simulations, and the future of AI-driven animation. By examining both the theoretical and practical implications of these developments, this issue provides a comprehensive overview of how generative AI is enhancing the art and science of computer graphics. As the field continues to evolve, the articles in this issue offer a glimpse into the exciting possibilities that lie ahead, illuminating t...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mcg.2025.3574915","openalex_id":"https://openalex.org/W4411232379","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["ETH Zurich","Menlo School","Nvidia (United States)","Walt Disney (Switzerland)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8059179782867432},{"id":"https://openalex.org/C77660652","display_name":"Computer graphics","score":0.773148775100708},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.6856555938720703},{"id":"https://openalex.org/C21442007","display_name":"Graphics","score":0.5544219017028809},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5052023530006409},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42104846239089966},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.40298280119895935},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3294757902622223}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":3}},{"id":"official:9c59b738014443e3","title":"NVIDIA Isaac GR00T N1: An Open Foundation Model for Humanoid Robots","url":"https://research.nvidia.com/publication/2025-03_nvidia-isaac-gr00t-n1-open-foundation-model-humanoid-robots","published":"2025-03","authors":["Yuke Zhu","Linxi \"Jim\" Fan","NVIDIA GEAR Team"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=3"}},{"id":"official:5d5389865c03cd9a","title":"LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models","url":"https://research.nvidia.com/publication/2025-03_llama-mesh-unifying-3d-mesh-generation-language-models","published":"2025-03","authors":["Zhengyi Wang","Jonathan Lorraine","Yikai Wang","Hang Su","Jun Zhu","Sanja Fidler","Xiaohui Zeng"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page 
https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=3"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multi-modal-language-models-in-bioacoustics-with-zero-shot-transfer-a-case-study","title":"Multi-modal Language models in bioacoustics with zero-shot transfer: a case study","url":"https://www.microsoft.com/en-us/research/publication/multi-modal-language-models-in-bioacoustics-with-zero-shot-transfer-a-case-study/","published":"2025-02-28","authors":["Zhongqi Miao","Benjamin Elizalde","Soham Deshmukh","Justin Kitzes","Huaming Wang","Rahul Dodhia","Juan M. Lavista Ferres"],"abstract":"Automatically detecting sound events with Artificial Intelligence (AI) has become increasingly popular in the field of bioacoustics, ecoacoustics, and soundscape ecology, particularly for wildlife monitoring and conservation. Conventional methods predominantly employ supervised learning techniques that depend on substantial amounts of manually annotated bioacoustic data. However, manual annotation in bioacoustics is tremendously resource-intensive in terms of both human labor and financial resources, and it requires considerable domain expertise. Moreover, the supervised learning framework limits the application scope to predefined categories within a closed setting. 
The recent advent of Multi-Modal Language Models has markedly enhanced the versatility and possibilities within the realm of AI applications, as this technique addresses many of the challenges that inhibit the deployment of....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.21203/rs.3.rs-4438479/v1","openalex_id":"https://openalex.org/W4399999153","cited_by_count":7,"quality_score":79,"matched_keywords":["Article (Journal)","Artificial intelligence","Ecology and environment","1970-01-01","language model"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Pittsburgh","Microsoft (Norway)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:232","title":"FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference","url":"https://seed.bytedance.com/en/research/flexprefill-a-context-aware-sparse-attention-mechanism-for-efficient-long-sequence-inference","published":"2025-02-28","authors":["Xunhao Lai","Jianqiao Lu","Yao Luo","Yiyuan Ma","Xun Zhou"],"abstract":"Large language models (LLMs) encounter computational challenges during long-sequence inference, especially in the attention pre-filling phase, where the complexity grows quadratically with the prompt length. Previous efforts to mitigate these challenges have relied on fixed sparse attention patterns or identifying sparse attention patterns based on limited cases. However, these methods lacked the flexibility to efficiently adapt to varying input demands. 
In this paper, we introduce FlexPrefill, a Flexible sparse Pre-filling mechanism that dynamically adjusts sparse attention patterns and computational budget in real-time to meet the specific requirements of each input and attention head. The flexibility of our method is demonstrated through two key innovations: 1) Query-Aware Sparse Pattern Determination: By measuring Jensen-Shannon divergence, this component adaptively switches between....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Core Machine Learning","LLM","ICLR 2025 Oral","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:ga21p72bf7yhitxtcias2aev","title":"dMel: Speech Tokenization Made Simple","url":"https://machinelearning.apple.com/research/speech-tokenization-made-simple","published":"2025-02-28","authors":["He Bai","Tatiana Likhomanenko","Ruixiang Zhang","Zijin Gu","Zakaria Aldeneh","Navdeep Jaitly"],"abstract":"Large language models have revolutionized natural language processing by leveraging self-supervised pretraining on vast textual data. Inspired by this success, researchers have investigated complicated speech tokenization methods to discretize continuous speech signals so that language modeling techniques can be applied to speech data. 
However, existing approaches either model semantic (content) tokens, potentially losing acoustic information, or...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4409917695","title":"Position: Prospective of Autonomous Driving - Multimodal LLMs, World Models, Embodied Intelligence, AI Alignment, and Mamba","url":"https://doi.org/10.1109/wacvw65960.2025.00114","published":"2025-02-28","authors":["Yunsheng Ma","Wenqian Ye","Can Cui","Haiming Zhang","Shuo Xing","Fucai Ke","Jinhong Wang","Chenglin Miao","Jintai Chen","Hamid Rezatofighi","Zhen Li","Guangtao Zheng"],"abstract":"With the emergence of Generative AI, multimodal AI systems that leverage foundation models are beginning to demonstrate enormous potential for perceiving the real world, collecting new data, making decisions, and using tools like humans. In recent years, the use of Large Language Models and World Models in autonomous driving has received widespread attention. However, despite their enormous potential, there is still a lack of comprehensive understanding regarding the key challenges, opportunities, and future applications of these new foundation models in driving systems. In this paper, we provide an outlook on this field, summarizing existing methods and exploring their limitations. In addition, we further discuss the applicability of emerging approaches, such as Reinforcement Learning from Human Feedback and Mamba for applications in autonomous driving. 
Finally, we highlight open questi...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacvw65960.2025.00114","openalex_id":"https://openalex.org/W4409917695","cited_by_count":13,"quality_score":50,"matched_keywords":[],"author_affiliations":["Australian Regenerative Medicine Institute","Iowa State University","Monash University","Purdue University West Lafayette","Robert Bosch (India)","Tencent (China)","Texas A&M University","Tsinghua University","Universidad Católica Santo Domingo","University of Hong Kong","University of Illinois Urbana-Champaign","University of Toronto","University of Virginia","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.7779167890548706},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.6523804664611816},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5372260808944702},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4222482442855835},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34312009811401367},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.13981160521507263},{"id":"https://openalex.org/C10138342","display_name":"Finance","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4408028067","title":"Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models","url":"https://doi.org/10.1145/3721128","published":"2025-02-28","authors":["Quanjun Zhang","Chunrong Fang","Yi Zheng","Yaxin Zhang","Yuan Zhao","Rubing Huang","Jianyi Zhou","Yun 
Yang","Tao Zheng","Zhenyu Chen"],"abstract":"Unit testing validates the correctness of the units of the software system under test and serves as the cornerstone in improving software quality and reliability. To reduce manual efforts in writing unit tests, some techniques have been proposed to generate test assertions automatically, including deep learning (DL)-based, retrieval-based, and integration-based ones. Among them, recent integration-based approaches inherit from both DL-based and retrieval-based approaches and are considered state-of-the-art. Despite being promising, such integration-based approaches suffer from inherent limitations, such as retrieving assertions with lexical matching while ignoring meaningful code semantics, and generating assertions with a limited training corpus. In this paper, we propose a novel Retri eval-Augmented Deep Assertion Gen eration approach, namely RetriGen, based on a hybrid assertion retri...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3721128","openalex_id":"https://openalex.org/W4408028067","cited_by_count":4,"quality_score":49,"matched_keywords":["language model","retrieval"],"author_affiliations":["Huawei Technologies (China)","Macau University of Science and Technology","Nanjing University","Swinburne University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8716424107551575},{"id":"https://openalex.org/C40422974","display_name":"Assertion","score":0.8493573665618896},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48237672448158264},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3647800087928772},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.2607106566429138}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408030015","title":"Delving Into Instance Modeling for Weakly Supervised Video Anomaly Detection","url":"https://doi.org/10.1109/tcsvt.2025.3546766","published":"2025-02-28","authors":["Shengyang Sun","Jiashen Hua","Junyi Feng","Dongxu Wei","Baisheng Lai","Xiaojin Gong"],"abstract":"Weakly-supervised video anomaly detection (WS-VAD) aims to identify fine-grained anomalies from sparse video-level labels, which has gained increasing attention in recent years due to its various applications such as disaster warning and public security. Recent studies typically formulate WS-VAD as a multi-instance learning (MIL) problem. However, they neglect the instance creation process and simply apply a uniform temporal pooling (UTP) operation to obtain the training instances, leading to severe anomaly contamination and dilution. In this paper, we emphasize the importance of the instance modeling procedure and propose two simple yet effective modules, i.e., the dynamic segment merging (DSM) module and the retrieval-augmented anomaly restoration (RA2R) module, to tackle the problem from segment-level and feature-level, respectively. 
We equip various state-of-the-art WS-VAD models wit...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3546766","openalex_id":"https://openalex.org/W4408030015","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Westlake University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5875643491744995},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5741940140724182},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5116620063781738},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4149303436279297},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.33973777294158936}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409917055","title":"Enhancing Remote Sensing Representations Through Mixed-Modality Masked Autoencoding","url":"https://doi.org/10.1109/wacvw65960.2025.00058","published":"2025-02-28","authors":["Ori Linial","George Leifman","Yochai Blau","Nadav Sherman","Yotam Gigi","Wojciech Sirko","Genady Beryozkin"],"abstract":"This paper presents an innovative approach to pretraining models for remote sensing by integrating optical and SAR (Synthetic Aperture Radar) data from Sentinel-2 and Sentinel-1 satellites. Using a novel variation on the masked autoencoder (MAE) framework, our model incorporates a dual-task setup: reconstructing masked Sentinel-2 images and predicting corresponding Sentinel-1 images. 
This multitask design enables the encoder to capture both spectral and structural features across diverse environmental conditions. Additionally, we introduce a “mixing” strategy in the pretraining phase, combining patches from both image sources, which mitigates spatial misalignment errors and enhances model robustness. Evaluation on segmentation and classification tasks, including Sen1Floods11, BigEarthNet, and UrbanSRSeg8, demonstrates significant improvements in model performance and generalizability acr...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacvw65960.2025.00058","openalex_id":"https://openalex.org/W4409917055","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.8218215703964233},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6082789301872253},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.5163379907608032},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3236702084541321},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.1370827555656433}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4408050188","title":"GenCeption: Evaluate vision LLMs with unlabeled unimodal data","url":"https://doi.org/10.1016/j.csl.2025.101785","published":"2025-02-28","authors":["Lele Cao","Valentin Buchner","Zineb Senane","Fangkai 
Yang"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.csl.2025.101785","openalex_id":"https://openalex.org/W4408050188","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Energiforsk (Sweden)","KTH Royal Institute of Technology","Microsoft (United States)","Mother Hospital","Stockholm University","Télécom Paris"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8345503807067871},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.515629231929779},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.45373621582984924}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2502.21291","title":"MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing","url":"https://huggingface.co/papers/2502.21291","published":"2025-02-28","authors":["Xueyun Tian","Wei Li","Bingbing Xu","Yige Yuan","Yuanzhuo Wang","Huawei Shen"],"abstract":"Despite significant progress in diffusion-based image generation, subject-driven generation and instruction-based editing remain challenging. Existing methods typically treat them separately, struggling with limited high-quality data and poor generalization. However, both tasks require capturing complex visual variations while maintaining consistency between inputs and outputs. Therefore, we propose MIGE, a unified framework that standardizes task representations using multimodal instructions. It treats subject-driven generation as creation on a blank canvas and instruction-based editing as modification of an existing image, establishing a shared input-output formulation. 
MIGE introduces a novel multimodal encoder that maps free-form multimodal instructions into a unified vision-language space, integrating visual and semantic features through a feature fusion mechanism.This unification e...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rapid-and-accurate-prediction-of-protein-homo-oligomer-symmetry-using-seq2symm","title":"Rapid and accurate prediction of protein homo-oligomer symmetry using Seq2Symm","url":"https://www.microsoft.com/en-us/research/publication/rapid-and-accurate-prediction-of-protein-homo-oligomer-symmetry-using-seq2symm/","published":"2025-02-27","authors":["Meghana Kshirsagar","Artur Meller","Ian R. Humphreys","Samuel Sledzieski","Yixi Xu","Rahul Dodhia","Eric Horvitz","Bonnie Berger","Gregory R. Bowman","Juan M. Lavista Ferres","David Baker","Minkyung Baek"],"abstract":"The majority of proteins must form higher-order assemblies to perform their biological functions, yet few machine learning models can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by finetuning several classes of protein foundation models, to predict homo-oligomer symmetry. Our best model named Seq2Symm, which utilizes ESM2, outperforms existing template-based and deep learning methods achieving an average AUC-PR of 0.47, 0.44 and 0.49 across homo-oligomer symmetries on three held-out test sets compared to 0.24, 0.24 and 0.25 with template-based search. 
Seq2Symm uses a single sequence as input and can predict at the rate of ~80,000 proteins/hour. We apply this method to 5 proteomes and ~3.5 million unlabeled protein sequences, showing its promise to be used in conjunction with downstream computatio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1038/s41467-025-57148-3","openalex_id":"https://openalex.org/W4408010982","cited_by_count":12,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft","Howard Hughes Medical Institute","Massachusetts Institute of Technology","Microsoft (United States)","National University College","Seoul National University","University of Pennsylvania","University of Washington","Washington University in St. Louis"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:120","title":"Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts","url":"https://seed.bytedance.com/en/research/comet-fine-grained-computation-communication-overlapping-for-mixture-of-experts","published":"2025-02-27","authors":["Shulai Zhang","Ningxin Zheng","Haibin Lin","Ziheng Jiang","Wenlei Bao","Chengquan Jiang","Qi Hou","Weihao Cui","Size Zheng","Li-Wen Chang","Quan Chen","Xin Liu"],"abstract":"Mixture-of-experts (MoE) has been extensively employed to scale large language models to trillion-plus parameters while maintaining a fixed computational cost. The development of large MoE models in the distributed scenario encounters the problem of large communication overhead. 
The inter-device communication of a MoE layer can occupy 47% time of the entire model execution with popular models and frameworks. Therefore, existing methods suggest the communication in a MoE layer to be pipelined with the computation for overlapping. However, these coarse grained overlapping schemes introduce a notable impairment of computational efficiency and the latency concealing is sub-optimal.To this end, we present COMET, an optimized MoE system with fine-grained communication-computation overlapping. Leveraging data dependency analysis and task rescheduling, COMET achieves precise fine-grained overlap...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["System Research","Infrastructures","MLSys 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:194","title":"SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines","url":"https://seed.bytedance.com/en/research/supergpqa-scaling-llm-evaluation-across-285-graduate-disciplines","published":"2025-02-27","authors":["M-A-P Team","Xinrun Du","Yifan Yao","Kaijing Ma","Bingli Wang","Tianyu Zheng","Kang Zhu","Minghao Liu","Yiming Liang","Xiaolong Jin","Zhenlin Wei","Chujie Zheng"],"abstract":"Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. 
The capabilities of LLMs in many of these specialized fields-particularly in light industry, agriculture, and service-oriented disciplines-remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current sta...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:24f8cedcf45d7166","title":"Logic.py: Bridging the Gap between LLMs and Constraint Solvers","url":"https://ai.meta.com/research/publications/logic-py-bridging-the-gap-between-llms-and-constraint-solvers/","published":"2025-02-27","authors":["Pascal Kesseli","Peter O'Hearn","Ricardo Silveira Cabral"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Integrity","Theory"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI 
publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=6"}},{"id":"official:244917b9ef595803","title":"OpenAI GPT-4.5 System Card","url":"https://openai.com/index/gpt-4-5-system-card","published":"2025-02-27","authors":["OpenAI"],"abstract":"We’re releasing a research preview of OpenAI GPT‑4.5, our largest and most knowledgeable model yet.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Publication"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W4408157628","title":"Enhancing 3D scene understanding via text annotations","url":"https://doi.org/10.15587/1729-4061.2025.323757","published":"2025-02-27","authors":["Ruslan Partsey","Vasyl Teslyuk","Oleksandr Maksymets","Vladyslav Humennyy","Volodymyr Kuzma"],"abstract":"The object of this study is the use of text annotations as a form of 3D scene representation. The paper investigates the task of integrating large-scale language models (LLMs) into complex 3D environments. Using the Embodied Question Answering task as an example, we analyze different types of scene annotations and evaluate the performance of LLMs on a subset of test episodes from the OpenEQA dataset. The aim of the study was to evaluate the effectiveness of textual scene descriptions compared to visual data for solving EQA tasks. The methodology implied estimating the optimal context length for scene annotations, measuring the differences between free-form and structured annotations, as well as analyzing the impact of model size on performance, and comparing model results with the level of human comprehension of scene annotations. 
The results showed that detailed descriptions that includ...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.15587/1729-4061.2025.323757","openalex_id":"https://openalex.org/W4408157628","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["BC Platforms (Finland)","Lviv Polytechnic National University","Meta (United States)","Ukrainian Catholic University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5505214929580688},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.414266973733902},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36506733298301697},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.33697211742401123}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tip-of-the-tongue-query-elicitation-for-simulated-evaluation","title":"Tip of the Tongue Query Elicitation for Simulated Evaluation","url":"https://www.microsoft.com/en-us/research/publication/tip-of-the-tongue-query-elicitation-for-simulated-evaluation/","published":"2025-02-26","authors":["Yifan He","To Eun Kim","Fernando Diaz","Jaime Arguello","Bhaskar Mitra"],"abstract":"Tip-of-the-tongue (TOT) search occurs when a user struggles to recall a specific identifier, such as a document title. While common, existing search systems often fail to effectively support TOT scenarios. 
Research on TOT retrieval is further constrained by the challenge of collecting queries, as current approaches rely heavily on community question-answering (CQA) websites, leading to labor-intensive evaluation and domain bias. To overcome these limitations, we introduce two methods for eliciting TOT queries - leveraging large language models (LLMs) and human participants - to facilitate simulated evaluations of TOT retrieval systems. Our LLM-based TOT user simulator generates synthetic TOT queries at scale, achieving high correlations with how CQA-based TOT queries rank TOT retrieval systems when tested in the Movie domain. Additionally, these synthetic queries exhibit high linguistic....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","automatic evaluation","Information retrieval","large language models","Synthetic data","Tip of the Tongue","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:227","title":"Towards Semantic Equivalence of Tokenization in Multimodal LLM","url":"https://seed.bytedance.com/en/research/towards-semantic-equivalence-of-tokenization-in-multimodal-llm","published":"2025-02-26","authors":["Shengqiong Wu","Hao Fei","Xiangtai Li","Jiayi Ji","Hanwang Zhang","Tat-Seng Chua","Shuicheng Yan"],"abstract":"Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in processing vision-language tasks. 
One of the crux of MLLMs lies in vision tokenization, which involves efficiently transforming input visual signals into feature representations that are most beneficial for LLMs. However, existing vision tokenizers, essential for semantic alignment between vision and language, remain problematic. Existing methods aggressively fragment visual input, corrupting the visual semantic integrity. To address this, this paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok), which groups visual features into semantic units via a dynamic clustering algorithm, flexibly determining the number of tokens based on image complexity. The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","ICLR 2025","LLM"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4407953173","title":"Unifying Bias and Unfairness in Information Retrieval: New Challenges in the LLM Era","url":"https://doi.org/10.1145/3701551.3703478","published":"2025-02-26","authors":["Sunhao Dai","Xu Chen","Shicheng Xu","Liang Pang","Zhenhua Dong","Jun Xu"],"abstract":"With the rapid advancements of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a paradigm shift due to their integration. 
However, integrating LLMs into the IR pipelines has also introduced new challenges, particularly in the form of biases and unfairness that may disrupt the information ecosystem. This tutorial will offer a comprehensive overview of emerging and pressing bias and unfairness issues associated with integrating LLMs into IR systems. Specifically, this tutorial first unifies bias and unfairness issues as problems of distribution mismatch and further categorizes the mitigation strategies under the umbrella of distribution alignment. Then, we summarize several types of bias and unfairness issues emerging from three critical stages of LLM integration into IR systems: data collection, model develop...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701551.3703478","openalex_id":"https://openalex.org/W4407953173","cited_by_count":7,"quality_score":52,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Computing Technology","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6424041986465454},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4520048499107361},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4500119090080261}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4407953538","title":"Tutorial on Recommendation with Generative Models (Gen-RecSys)","url":"https://doi.org/10.1145/3701551.3703485","published":"2025-02-26","authors":["Yashar Deldjoo","Zhankui He","Julian McAuley","Anton Korikov","Scott Sanner","Arnau Ramisa","Renè Vidal","Maheswaran 
Sathiamoorthy","Atoosa Kasirzadeh","Silvia Milano"],"abstract":"This intermediate-level tutorial, titled \"Gen-RecSys\", merges both industrial and academic perspectives on recent advances in Generative AI for recommender systems (beyond LLMs). It aims to highlight the transformative role of generative models in modern recommender systems, which have significantly impacted the AI field-particularly with the rise of large language models (LLMs) like ChatGPT-and have contributed to a rapid convergence of the fields of search, data mining, and recommendation. By providing attendees with a modern perspective on GenAI applications in recommendation, the tutorial will emphasize how generative models can drive recommendation by unlocking and interacting with rich data representations, including behavioral, textual, and multi-modal data-knowledge highly transferable across many applications of interest to the WSDM community. Participants will learn about the c...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701551.3703485","openalex_id":"https://openalex.org/W4407953538","cited_by_count":14,"quality_score":51,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Institut für Urheber- und Medienrecht","Ludwig-Maximilians-Universität München","Polytechnic University of Bari","University of California San Diego","University of Edinburgh","University of Pennsylvania","University of Toronto"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6041964292526245},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5304842591285706},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43075618147850037},{"id":"https://openalex.org/C167966045","display_name":"Generative 
model","score":0.4245207905769348},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4182726740837097},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.39789432287216187},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.29024770855903625},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.15869072079658508}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4407953168","title":"Explainable CTR Prediction via LLM Reasoning","url":"https://doi.org/10.1145/3701551.3703551","published":"2025-02-26","authors":["Xiaohan Yu","Li Zhang","Chong Chen"],"abstract":"Recommendation Systems have become integral to modern user experiences, but lack transparency in their decision-making processes. Existing explainable recommendation methods are hindered by reliance on a post-hoc paradigm, wherein explanation generators are trained independently of the underlying recommender models. This paradigm necessitates substantial human effort in data construction and raises concerns about explanation reliability. In this paper, we present ExpCTR, a novel framework that integrates large language model based explanation generation directly into the CTR prediction process. Inspired by recent advances in reinforcement learning, we employ two carefully designed reward mechanisms, LC alignment, which ensures explanations reflect user intentions, and IC alignment, which maintains consistency with traditional ID-based CTR models. 
Our approach incorporates an efficient tr...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701551.3703551","openalex_id":"https://openalex.org/W4407953168","cited_by_count":2,"quality_score":51,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Huawei Technologies (China)","University College London"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6308397054672241},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35610392689704895}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4409264149","title":"SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior","url":"https://doi.org/10.1109/wacv61041.2025.00375","published":"2025-02-26","authors":["Zhongrui Yu","Haoran Wang","Jinze Yang","Hanzhang Wang","Jiale Cao","Zhong Ji","Mingming Sun"],"abstract":"Novel View Synthesis (NVS) for street scenes plays a critical role in the autonomous driving simulation. Current mainstream methods, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), struggle to maintain rendering quality at the viewpoint that deviates significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as condition, meanwhile exploiting depth data from LiDAR point clouds to supply additional spatial information. 
Then we apply the fine-tuned Diffusion Model to regularize the 3DGS at unseen views during training. Experiment...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv61041.2025.00375","openalex_id":"https://openalex.org/W4409264149","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Baidu (China)","Harbin Institute of Technology","Tianjin University","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6711378693580627},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.5195763111114502},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4865471422672272},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3685821294784546},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3477839231491089},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.13517731428146362},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4407948936","title":"Improving Retrieval-Augmented Deep Assertion Generation via Joint Training","url":"https://doi.org/10.1109/tse.2025.3545970","published":"2025-02-26","authors":["Quanjun Zhang","Chunrong Fang","Yi Zheng","Ruixiang Qian","Shengcheng Yu","Yuan Zhao","Jianyi Zhou","Yun Yang","Tao Zheng","Zhenyu Chen"],"abstract":"Unit testing attempts to validate the correctness of basic units of the software system under test and has 
a crucial role in software development and testing. However, testing experts have to spend a huge amount of effort to write unit test cases manually. Very recent work proposes a retrieve-and-edit approach to automatically generate unit test oracles, i.e., assertions. Despite being promising, it is still far from perfect due to some limitations, such as splitting assertion retrieval and generation into two separate components without benefiting each other. In this paper, we propose AG-RAG, a retrieval-augmented automated assertion generation (AG) approach that leverages external codebases and joint training to address various technical limitations of prior work. Inspired by the plast...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tse.2025.3545970","openalex_id":"https://openalex.org/W4407948936","cited_by_count":3,"quality_score":48,"matched_keywords":["language model","retrieval"],"author_affiliations":["Huawei Technologies (China)","Nanjing University","Swinburne University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8730641603469849},{"id":"https://openalex.org/C40422974","display_name":"Assertion","score":0.7592013478279114},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.7118768692016602},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6325262188911438},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5205187201499939},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.342241495847702},{"id":"https://openalex.org/C23123220","display_name":"Information
retrieval","score":0.32838788628578186},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.1950187087059021}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409262235","title":"Learning Visual Grounding from Generative Vision and Language Model","url":"https://doi.org/10.1109/wacv61041.2025.00782","published":"2025-02-26","authors":["Shijie Wang","Dahun Kim","Ali Taalimi","Chen Sun","Weicheng Kuo"],"abstract":"Visual grounding tasks aim to localize image regions based on natural language references. In this work, we explore whether generative VLMs predominantly trained on image-text data could be leveraged to scale up the text annotation of visual grounding data. We find that grounding knowledge already exists in generative VLM and can be elicited by proper prompting. We thus prompt a VLM to generate object-level descriptions by feeding it object regions from existing object detection datasets. We further propose attribute modeling to explicitly capture the important object attributes, and spatial relation modeling to capture inter-object relationship, both of which are common linguistic pattern in referring expression.
Our constructed dataset (500K images, 1M objects, 16M referring expressions) is one of the largest grounding datasets to date, and the first grounding dataset with purely m...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv61041.2025.00782","openalex_id":"https://openalex.org/W4409262235","cited_by_count":3,"quality_score":44,"matched_keywords":["language model"],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)","John Brown University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7151575088500977},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6414015293121338},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5500534772872925},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5421707034111023},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.45463886857032776},{"id":"https://openalex.org/C200220432","display_name":"Vision science","score":0.4232426881790161},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3943330645561218},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3289804458618164}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409262154","title":"Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation","url":"https://doi.org/10.1109/wacv61041.2025.00429","published":"2025-02-26","authors":["Sanyam Lakhanpal","Shivang Chopra","Vinija Jain","Aman Chadha","Man Luo"],"abstract":"Over the past few years, Text-to-Image (T2I) 
generation approaches based on diffusion models have gained significant attention. However, vanilla diffusion models often suffer from spelling inaccuracies in the text displayed within the generated images. The capability to generate visual text is crucial, offering both academic interest and a wide range of practical applications. To produce accurate visual text images, state-of-the-art techniques adopt a glyph-controlled image generation approach, consisting of a text layout generator followed by an image generator that is conditioned on the generated text layout. Nevertheless, our study reveals that these models still face three primary challenges, prompting us to develop a testbed to facilitate future research. We introduce a benchmark, LenCom-Eval, specifically designed for testing models' capability in generating images with Lengthy an...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv61041.2025.00429","openalex_id":"https://openalex.org/W4409262154","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Arizona State University","Georgia Institute of Technology","Intel (United States)"],"concepts":[{"id":"https://openalex.org/C60044698","display_name":"Refining (metallurgy)","score":0.7498177886009216},{"id":"https://openalex.org/C142816647","display_name":"Glyph (data visualization)","score":0.7410925626754761},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7358531951904297},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6167354583740234},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5573858618736267},{"id":"https://openalex.org/C31972630","display_name":"Computer
vision","score":0.5039708018302917},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.4396514892578125},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.43418505787849426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4407953564","title":"Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings","url":"https://doi.org/10.1145/3701551.3706127","published":"2025-02-26","authors":["Enming Luo","Wei Qiao","Katie Warren","Jingxiang Li","Eric Xiao","Krishna Viswanathan","Yuan Wang","Yintao Liu","Jimin Li","Ariel Fuxman"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701551.3706127","openalex_id":"https://openalex.org/W4407953564","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7339025735855103},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.7021214962005615},{"id":"https://openalex.org/C93225998","display_name":"Moderation","score":0.6385262608528137},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6026949882507324},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.547374963760376},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.5251964330673218},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3592703938484192},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.3515230417251587}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407953196","title":"Advancing Voice AI for E-commerce: Tracking ASR Model Performance at Scale","url":"https://doi.org/10.1145/3701551.3706130","published":"2025-02-26","authors":["Dhruv Agarwal","Nupur K. Neti","Federica Cerina"],"abstract":"Traditionally, automatic speech recognition (ASR) systems rely on human transcriptions to calculate word error rate (WER) by comparing ASR outputs to manual transcriptions. Recently, Amazon's mobile voice shopping platform stopped storing audio from incoming requests to enhance customer privacy, making offline, human-based evaluation unfeasible. This presentation introduces a multitask Speech LLM-based system that processes real-time audio, extracting key features to track ASR performance and detect traffic shifts-all without storing audio or requiring human annotations. 
Additionally, we demonstrate how combining these features with a synthetic audio generation model (TTS) enables accurate detection of ASR performance degradation, ensuring continuous optimization of the customer voice experience.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3701551.3706130","openalex_id":"https://openalex.org/W4407953196","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7447683811187744},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5358520746231079},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.49840545654296875},{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.4257619082927704},{"id":"https://openalex.org/C19417346","display_name":"Pedagogy","score":0.0},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.0},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409262758","title":"Fine-grained Controllable Video Generation via Object Appearance and Context","url":"https://doi.org/10.1109/wacv61041.2025.00364","published":"2025-02-26","authors":["Hsin–Ping Huang","Yu-Chuan Su","Deqing Sun","Lu Jiang","Xuhui Jia","Yukun Zhu","Shuicheng Yan"],"abstract":"While text-to-video generation shows state-of-the-art results, fine-grained output control remains challenging for users relying solely on natural language 
prompts. In this work, we present FACTOR for fine-grained controllable video generation. FACTOR provides an intuitive interface where users can manipulate the trajectory and appearance of individual objects in conjunction with a text prompt. We propose a unified framework to integrate these control signals into an existing text-to-video model. Our approach involves a multimodal condition module with a joint encoder, control-attention layers, and an appearance augmentation mechanism. This design enables FACTOR to generate videos that closely align with detailed user specifications. Extensive experiments on standard benchmarks and user-provided inputs demonstrate a notable improvement in controllability by FACTOR over competitive baseli...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv61041.2025.00364","openalex_id":"https://openalex.org/W4409262758","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7193331718444824},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5277206301689148},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.37936335802078247},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3417811393737793},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3242947459220886},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.07690742611885071},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409263347","title":"Generating Long-Take Videos via Effective Keyframes and Guidance","url":"https://doi.org/10.1109/wacv61041.2025.00365","published":"2025-02-26","authors":["Hsin–Ping Huang","Yu-Chuan Su","Ming–Hsuan Yang"],"abstract":"We tackle the challenge of generating long-take videos encompassing multiple non-repetitive yet coherent events. Existing approaches generate long videos conditioned on single input guidance, often leading to repetitive content. To address this problem, we develop a framework that uses multiple guidance sources to enhance long video generation. The main idea of our approach is to decouple video generation into keyframe generation and frame interpolation. In this process, keyframe generation focuses on creating multiple coherent events, while the frame interpolation stage generates smooth intermediate frames between keyframes using existing video generation models. A novel mask attention module is further introduced to improve coherence and efficiency.
Experiments on challenging real-world videos demonstrate that the proposed method outperforms prior methods by up to 9.5% in objective....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv61041.2025.00365","openalex_id":"https://openalex.org/W4409263347","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["DeepMind (United Kingdom)","Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7442679405212402},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.35487988591194153},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3471473455429077},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.33696144819259644}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409263235","title":"Contrastive Sequential-Diffusion Learning: Non-Linear and Multi-Scene Instructional Video Synthesis","url":"https://doi.org/10.1109/wacv61041.2025.00456","published":"2025-02-26","authors":["Vasco Ramos","Yonatan Bitton","Michal Yarom","Idan Szpektor","João Pedro de Magalhães"],"abstract":"Generated video scenes for action-centric sequence descriptions, such as recipe instructions and do-it-yourself projects, often include non-linear patterns, where the next video may need to be visually consistent not with the immediately preceding video but with earlier ones. Current multi-scene video synthesis approaches fail to meet these consistency requirements.
To address this, we propose a contrastive sequential video diffusion method that selects the most suitable previously generated scene to guide and condition the denoising process of the next scene. The result is a multi-scene video that is grounded in the scene descriptions and coherent w.r.t. the scenes that require visual consistency. Experiments with action-centered data from the real world demonstrate the practicality and improved consistency of our model compared to previous work. Code and examples available at https://g...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv61041.2025.00456","openalex_id":"https://openalex.org/W4409263235","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Google (United States)","Universidade Nova de Lisboa"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7736615538597107},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4557180106639862},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.45529747009277344},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42182299494743347},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.33823317289352417},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/satclip-global-general-purpose-location-embeddings-with-satellite-imagery-2","title":"SatCLIP: Global, General-Purpose 
Location Embeddings with Satellite Imagery","url":"https://www.microsoft.com/en-us/research/publication/satclip-global-general-purpose-location-embeddings-with-satellite-imagery-2/","published":"2025-02-25","authors":["Konstantin Klemmer","Esther Rolf","Caleb Robinson","Lester Mackey","M. Russwurm"],"abstract":"Geographic information is essential for modeling tasks in fields ranging from ecology to epidemiology. However, extracting relevant location characteristics for a given task can be challenging, often requiring expensive data fusion or distillation from massive global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). This global, general-purpose geographic location encoder learns an implicit representation of locations by matching CNN and ViT inferred visual patterns of openly available satellite imagery with their geographic coordinates. The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks. 
In our experiments, we use SatCLIP embeddings to improve prediction performance on nine diverse location-dependent tasks including temperature p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Computer vision","Ecology and environment","Computer science","1970-01-01","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/conformal-linguistic-calibration-trading-off-between-factuality-and-specificity","title":"Conformal Linguistic Calibration: Trading-off between Factuality and Specificity","url":"https://www.microsoft.com/en-us/research/publication/conformal-linguistic-calibration-trading-off-between-factuality-and-specificity/","published":"2025-02-25","authors":["Zhengping Jiang","Anqi Liu","Ben Van Durme"],"abstract":"Language model outputs are not always reliable, thus prompting research into how to adapt model responses based on uncertainty. Common approaches include: \\emph{abstention}, where models refrain from generating responses when uncertain; and \\emph{linguistic calibration}, where models hedge their statements using uncertainty quantifiers. However, abstention can withhold valuable information, while linguistically calibrated responses are often challenging to leverage in downstream tasks. We propose a unified view, Conformal Linguistic Calibration (CLC), which reinterprets linguistic calibration as \\emph{answer set prediction}. 
First we present a framework connecting abstention and linguistic calibration through the lens of linguistic pragmatics. We then describe an implementation of CLC that allows for controlling the level of imprecision in model responses. Results demonstrate our method....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Computer science","Natural language processing","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/automated-feature-engineering-for-single-trial-eeg-and-eye-tracking-classification-in-predictive-text-interfaces","title":"Automated Feature Engineering for Single-Trial EEG and Eye-Tracking Classification in Predictive Text Interfaces","url":"https://www.microsoft.com/en-us/research/publication/automated-feature-engineering-for-single-trial-eeg-and-eye-tracking-classification-in-predictive-text-interfaces/","published":"2025-02-25","authors":["Ard Kastrati","R. Michael Winters","Nemanja Djuric","Ivan Tashev","Yu-Te Wang"],"abstract":"Brain-Computer Interfaces (BCIs) offer a direct connection between the human brain and digital systems, enabling innovative applications. However, realizing the full potential of BCIs remains challenging due to issues like noise, artifacts, and limited data availability. 
In this study, we develop a multimodal classifier that integrates electroencephalogram (EEG) and eye-tracking (ET) data to decode user responses to predictive text suggestions. Utilizing an automated feature engineering approach, our pipeline efficiently generates and selects relevant features without extensive manual intervention or deep theoretical insights. Applied to a recent BCI case study involving predictive text input, our method achieved higher classification accuracies compared to traditional approaches. Additionally, it revealed novel insights, such as behavioral patterns where participants did not fully read....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Audio and Acoustics","Human-computer interaction","Brain–computer interface","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407922851","title":"Igniting Language Intelligence: The Hitchhiker’s Guide from Chain-of-Thought Reasoning to Language Agents","url":"https://doi.org/10.1145/3719341","published":"2025-02-25","authors":["Zhuosheng Zhang","Yao Yao","Aston Zhang","Xiangru Tang","Xinbei Ma","Zhiwei He","Yiming Wang","Mark Gerstein","Rui Wang","Gongshen Liu","Hai Zhao"],"abstract":"Large language models (LLMs) have dramatically enhanced the field of language intelligence, as demonstrably evidenced by their formidable empirical performance across a spectrum of complex reasoning tasks. 
Additionally, theoretical proofs have illuminated their emergent reasoning capabilities, providing a compelling showcase of their advanced cognitive abilities in linguistic contexts. Critical to their remarkable efficacy in handling complex reasoning tasks, LLMs leverage the intriguing chain-of-thought (CoT) reasoning techniques, obliging them to formulate intermediate steps en route to deriving an answer. The CoT reasoning approach has not only exhibited proficiency in amplifying reasoning performance but also in enhancing interpretability, controllability, and flexibility. In light of these merits, recent research endeavors have extended CoT reasoning methodologies to nurture the dev...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1145/3719341","openalex_id":"https://openalex.org/W4407922851","cited_by_count":28,"quality_score":69,"matched_keywords":["agent"],"author_affiliations":["Amazon (United States)","Shanghai Jiao Tong University","Yale University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8419373631477356},{"id":"https://openalex.org/C199185054","display_name":"Chain (unit)","score":0.5354995727539062},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.4455050528049469},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4328559637069702},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40342384576797485},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.3357214629650116},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.08719271421432495},{"id":"https://openalex.org/C1276947","display_name":"Astronomy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":28}},{"id":"bytedance-seed:223","title":"You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs","url":"https://seed.bytedance.com/en/research/you-only-sample-once-taming-one-step-text-to-image-synthesis-by-self-cooperative-diffusion-gans","published":"2025-02-25","authors":["Yihong Luo","Xiaolong Chen","Xinghua Qu","Tianyang Hu","Jing Tang"],"abstract":"Recently, some works have tried to combine diffusion and Generative Adversarial Networks (GANs) to alleviate the computational cost of the iterative denoising inference in Diffusion Models (DMs). However, existing works in this line suffer from either training instability and mode collapse or subpar one-step generation learning efficiency. To address these issues, we introduce YOSO, a novel generative model designed for rapid, scalable, and high-fidelity one-step image synthesis with high training stability and mode coverage. Specifically, we smooth the adversarial divergence by the denoising generator itself, performing self-cooperative learning. We show that our method can serve as a one-step generation model training from scratch with competitive performance. 
Moreover, we extend our YOSO to one-step text-to-image generation based on pre-trained models by several effective training tec...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Speech","ICLR 2025","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:219","title":"X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention","url":"https://seed.bytedance.com/en/research/x-nemo-expressive-neural-motion-reenactment-via-disentangled-latent-attention","published":"2025-02-25","authors":["XiaoChen Zhao","Hongyi Xu","Guoxian Song","You Xie","Chenxu Zhang","Xiu Li","Linjie Luo","Jinli Suo","Yebin Liu"],"abstract":"We propose X-NeMo, a novel zero-shot diffusion-based portrait animation pipeline that animates a static portrait using facial movements from a driving video of a different individual. Our work first identifies the root causes of the limitations in prior approaches, such as identity leakage and difficulty in capturing subtle and extreme expressions. To address these challenges, we introduce a fully end-to-end training framework that distills a 1D identity-agnostic latent motion descriptor from driving image, effectively controlling motion through cross-attention during image generation. Our implicit motion descriptor captures expressive facial motion in fine detail, learned end-to-end from a diverse video dataset without reliance on any pre-trained motion detectors. 
We further disentangle motion latents from identity cues with enhanced expressiveness by supervising their learning with a d...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","ICLR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:53033d8b8498e7f5","title":"ShieldGemma 1 Model Card","url":"https://ai.google.dev/gemma/docs/shieldgemma/model_card","published":"2025-02-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","ShieldGemma 1"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:09bfc752d9fcfd91","title":"RecurrentGemma Model Card","url":"https://ai.google.dev/gemma/docs/recurrentgemma/model_card","published":"2025-02-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","RecurrentGemma"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:78d5515cacf9a5c9","title":"PaliGemma 2 Model Card","url":"https://ai.google.dev/gemma/docs/paligemma/model-card-2","published":"2025-02-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","PaliGemma 2"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:e010d6295dc6bcbf","title":"PaliGemma 1 Model Card","url":"https://ai.google.dev/gemma/docs/paligemma/model-card","published":"2025-02-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","PaliGemma 
1"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:b88f593c3996b4eb","title":"Gemma 2 Model Card","url":"https://ai.google.dev/gemma/docs/model_card_2","published":"2025-02-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemma 2"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:88158d44710c7b94","title":"Gemma 1 Model Card","url":"https://ai.google.dev/gemma/docs/model_card","published":"2025-02-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","Gemma 1"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:acda28140a68473c","title":"CodeGemma Model Card","url":"https://ai.google.dev/gemma/docs/codegemma/model_card","published":"2025-02-25","authors":["Google/DeepMind"],"abstract":"Official Google DeepMind model 
card.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["model card","CodeGemma"],"author_affiliations":["Google/DeepMind"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Google DeepMind model cards page https://deepmind.google/models/model-cards/"}},{"id":"official:ba9dc2e550205ae7","title":"Deep research System Card","url":"https://openai.com/index/deep-research-system-card","published":"2025-02-25","authors":["OpenAI"],"abstract":"This report outlines the safety work carried out prior to releasing deep research including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Safety"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:smyalg9mwcrcry8nyx87y1xv","title":"MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs","url":"https://machinelearning.apple.com/research/towards-better-instruction-following","published":"2025-02-25","authors":["Yusu Qian","Hanrong Ye","Jean-Philippe Fauconnier","Peter Grasch","Yinfei Yang","Zhe Gan"],"abstract":"We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. 
Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results from a wide array of state-of-the-art MLLMs reveal...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:bdf8a5aa0228fea7","title":"... QwQ-Max-Preview","url":"https://qwenlm.github.io/blog/qwq-max-preview/","published":"2025-02-25","authors":["Alibaba/Qwen"],"abstract":"This is a blog created by QwQ-Max-Preview. We hope you enjoy it! Introduction <think>Okay, the user wants me to create a title and introduction for their blog announcing the release of QwQ-Max-Preview. Let me start by understanding the key points they mentioned. First, the model is part of the Qwen series, built on Qwen2.5-Max. 
It’s a preview version, so they probably want to highlight that it’s a sneak peek before the full release.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/advancing-multi-modal-sensing-through-expandable-modality-alignment","title":"Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment","url":"https://www.microsoft.com/en-us/research/publication/advancing-multi-modal-sensing-through-expandable-modality-alignment/","published":"2025-02-24","authors":["Shenghong Dai","Shiqi Jiang","Yifan Yang","Ting Cao","Mo Li","Suman Banerjee","Lili Qiu"],"abstract":"This paper presents Babel, the expandable modality alignment model, specially designed for multi-modal sensing. While there has been considerable work on multi-modality alignment, they all struggle to effectively incorporate multiple sensing modalities due to the data scarcity constraints. How to utilize multi-modal data with partial pairings in sensing remains an unresolved challenge. Babel tackles this challenge by introducing the concept of expandable modality alignment. The key idea involves transforming the N-modality alignment into a series of binary-modality alignments. Novel techniques are also proposed to further mitigate data scarcity issue and balance the contribution of the newly incorporated modality with the previously established modality alignment during the expandable alignment process. We provide the comprehensive implementation. 
In the pre-training phase, Babel current...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Systems and networking","Computer science","Computer Vision and Pattern Recognition","Engineering","Machine learning","Signal processing","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mpo-an-efficient-post-processing-framework-for-mixing-diverse-preference-alignment","title":"MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment","url":"https://www.microsoft.com/en-us/research/publication/mpo-an-efficient-post-processing-framework-for-mixing-diverse-preference-alignment/","published":"2025-02-24","authors":["Tianze Wang","Dongnan Gui","Yifan Hu","Shuhang Lin","Linjun Zhang"],"abstract":"Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a singular reward model often overlooks the diversity of human preferences. Recent approaches address this limitation by leveraging multi-dimensional feedback to fine-tune corresponding reward models and train LLMs using reinforcement learning. However, the process is costly and unstable, especially given the competing and heterogeneous nature of human preferences. 
In this paper, we propose Mixing Preference Optimization (MPO), a post-processing framework for aggregating single-objective policies as an alternative to both multi-objective RLHF (MORLHF) and MaxMin-RLHF. MPO avoids alignment from scratch. Instead, it log-linearly combines existing policies into a unified one with the weight of each policy computed via a batch stochastic mirror descent. Empirical....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","mathematics","Reinforcement learning","1970-01-01","preference","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ampo-active-multi-preference-optimization","title":"AMPO: Active Multi-Preference Optimization","url":"https://www.microsoft.com/en-us/research/publication/ampo-active-multi-preference-optimization/","published":"2025-02-24","authors":["Taneesh Gupta","Rahul Madhavan","Xuchao Zhang","Chetan Bansal","Saravan Rajmohan"],"abstract":"Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, thereby enabling richer training signals for large language models. During self-play alignment, these models often produce numerous candidate answers per query, rendering it computationally infeasible to include all responses in the training objective. 
In this work, we propose $\\textit{Active Multi-Preference Optimization}$ (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. Specifically, we score and embed large candidate pools of responses and then select a small, yet informative, subset that covers reward extremes and distinct semantic clusters for preference optimization. Our contrastive training scheme is capable of identifying not only the best and w...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/consequences-of-training-data-composition-for-deep-learning-models-in-single-cell-biology","title":"Consequences of training data composition for deep learning models in single-cell biology","url":"https://www.microsoft.com/en-us/research/publication/consequences-of-training-data-composition-for-deep-learning-models-in-single-cell-biology/","published":"2025-02-24","authors":["Ajay Nadig","Akshaya Thoutam","Madeline Hughes","Anay Gupta","Andrew W. Navia","Nicolo Fusi","Srivatsan Raghavan","Peter S. Winter","Ava P. Amini","Lorin Crawford"],"abstract":"Foundation models for single-cell transcriptomics have the potential to augment (or replace) purpose-built tools for a variety of common analyses, especially when data are sparse. 
Recent work with large language models has shown that training data composition greatly shapes performance; however, to date, single-cell foundation models have ignored this aspect, opting instead to train on the largest possible corpus. We systematically investigate the consequences of training dataset composition on the behavior of deep learning models of single-cell transcriptomics, focusing on human hematopoiesis as a tractable model system and including cells from adult and developing tissues, disease states, and perturbation atlases. We find that (1) these models generalize poorly to unseen cell types, (2) adding malignant cells to a healthy cell training corpus does not necessarily improve modeling of un...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1101/2025.02.19.639127","openalex_id":"https://openalex.org/W4407893446","cited_by_count":2,"quality_score":70,"matched_keywords":["Article (Journal)","Medical, health and genomics","Artificial intelligence","Biology"],"author_affiliations":["Microsoft","Brigham and Women's Hospital","Broad Institute","Dana-Farber Cancer Institute","Georgia Institute of Technology","Harvard University","Massachusetts General Hospital","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:moonshotai:2502.16982","title":"Muon is Scalable for LLM 
Training","url":"https://huggingface.co/papers/2502.16982","published":"2025-02-24","authors":["Moonshot/Kimi"],"abstract":"","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","moonshotai","LLM"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"openalex:W4407873490","title":"Uncover the balanced geometry in long-tailed contrastive language-image pretraining","url":"https://doi.org/10.1007/s10994-025-06745-w","published":"2025-02-24","authors":["Zhihan Zhou","Yuhuan Ye","Feng Hong","Peisen Zhao","Jiangchao Yao","Ya Zhang","Qi Tian","Yanfeng Wang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10994-025-06745-w","openalex_id":"https://openalex.org/W4407873490","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Shandong Jiaotong University","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5080742835998535},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.49637371301651},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.47665244340896606},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4489518702030182},{"id":"https://openalex.org/C2524010","display_name":"Geometry","score":0.3630911707878113},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3599942922592163},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.3284534811973572},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.09124088287353516}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cosmos-a-hybrid-adaptive-optimizer-for-memory-efficient-training-of-llms","title":"COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs","url":"https://www.microsoft.com/en-us/research/publication/cosmos-a-hybrid-adaptive-optimizer-for-memory-efficient-training-of-llms/","published":"2025-02-23","authors":["Liming Liu","Zhenghao Xu","Zixuan Zhang","Hao Kang","Zichong Li","Chen Liang","Weizhu Chen","Tuo Zhao"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable success across various domains, yet their optimization remains a significant challenge due to the complex and high-dimensional loss landscapes they inhabit. While adaptive optimizers such as AdamW are widely used, they suffer from critical limitations, including an inability to capture interdependencies between coordinates and high memory consumption. Subsequent research, exemplified by SOAP, attempts to better capture coordinate interdependence but incurs greater memory overhead, limiting scalability for massive LLMs. 
An alternative approach aims to reduce memory consumption through low-dimensional projection, but this leads to substantial approximation errors, resulting in less effective optimization (e.g., in terms of per-token efficiency). In this paper, we propose COSMOS, a novel hybrid optimizer that leverages the varying im...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pretrain-value-not-reward-decoupled-value-policy-optimization","title":"Pretrain Value, Not Reward: Decoupled Value Policy Optimization","url":"https://www.microsoft.com/en-us/research/publication/pretrain-value-not-reward-decoupled-value-policy-optimization/","published":"2025-02-23","authors":["Chenghua Huang","Lu Wang","Fangkai Yang","Pu Zhao","Zhixu Li","Qingwei Lin 林庆维","Dongmei Zhang","S. Rajmohan","Qi Zhang"],"abstract":"In this paper, we explore how directly pretraining a value model simplifies and stabilizes reinforcement learning from human feedback (RLHF). In reinforcement learning, value estimation is the key to policy optimization, distinct from reward supervision. The value function predicts the \\emph{return-to-go} of a partial answer, that is, how promising the partial answer is if it were continued to completion. 
In RLHF, however, the standard pipeline first pretrains a reward model and then learns a value function online, even though no new reward signals are available once preference data is collected. This makes critic learning redundant, as the process of training a reward model and then deriving a value model is informationally equivalent to directly pretraining a value model. Importantly, this requires no additional supervision, and our value model is trained on exactly the same data used....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-llm-general-preference-alignment-via-optimistic-online-mirror-descent","title":"Improving LLM General Preference Alignment via Optimistic Online Mirror Descent","url":"https://www.microsoft.com/en-us/research/publication/improving-llm-general-preference-alignment-via-optimistic-online-mirror-descent/","published":"2025-02-23","authors":["Yuheng Zhang","Dian Yu","Tao Ge","Linfeng Song","Zhichen Zeng","Haitao Mi","Nan Jiang","Dong Yu"],"abstract":"Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences. 
Many existing alignment approaches rely on the Bradley-Terry (BT) model assumption, which assumes the existence of a ground-truth reward for each prompt-response pair. However, this assumption can be overly restrictive when modeling complex human preferences. In this paper, we drop the BT model assumption and study LLM alignment under general preferences, formulated as a two-player game. Drawing on theoretical insights from learning in games, we integrate optimistic online mirror descent into our alignment framework to approximate the Nash policy. Theoretically, we demonstrate that our approach achieves an $O(T^{-1})$ bound on the duality gap, improving upon the previous $O(T^{-1/2})$ result. More importantly, we implement our method...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/swan-sgd-with-normalization-and-whitening-enables-stateless-llm-training","title":"SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training","url":"https://www.microsoft.com/en-us/research/publication/swan-sgd-with-normalization-and-whitening-enables-stateless-llm-training/","published":"2025-02-21","authors":["Chao Ma","Wenbo Gong","Meyer Scetbon","Edward Meeds"],"abstract":"Adaptive optimizers such as Adam (Kingma&Ba, 2015) have been central to the success of large language models. 
However, they often require to maintain optimizer states throughout training, which can result in memory requirements several times greater than the model footprint. This overhead imposes constraints on scalability and computational efficiency. Stochastic Gradient Descent (SGD), in contrast, is a stateless optimizer, as it does not track state variables during training.Consequently, it achieves optimal memory efficiency. However, its capability in LLM training is limited (Zhao et al., 2024b). In this work, we show that pre-processing SGD in a stateless manner can achieve the same performance as the Adam optimizer for LLM training, while drastically reducing the memory cost. Specifically, we propose to pre-process the instantaneous stochastic gradients using normalization and whit...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-foundation-models-for-mixed-integer-linear-programming","title":"Towards Foundation Models for Mixed Integer Linear Programming","url":"https://www.microsoft.com/en-us/research/publication/towards-foundation-models-for-mixed-integer-linear-programming/","published":"2025-02-21","authors":["Sirui Li","Janardhan (Jana) Kulkarni","Ishai Menache","Cathy Wu","Beibin Li"],"abstract":"Mixed Integer Linear Programming (MILP) is essential for modeling 
complex decision-making problems but faces challenges in computational tractability and requires expert formulation. Current deep learning approaches for MILP focus on specific problem classes and do not generalize to unseen classes. To address this shortcoming, we take a foundation model training approach, where we train a single deep learning model on a diverse set of MILP problems to generalize across problem classes. As existing datasets for MILP lack diversity and volume, we introduce MILP-Evolve, a novel LLM-based evolutionary framework that is capable of generating a large set of diverse MILP classes with an unlimited amount of instances. We study our methodology on three key learning tasks that capture diverse aspects of MILP: (1) integrality gap prediction, (2) learning to branch, and (3) a new task of aligning MI...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/steering-llms-for-formal-theorem-proving","title":"Steering LLMs for Formal Theorem Proving","url":"https://www.microsoft.com/en-us/research/publication/steering-llms-for-formal-theorem-proving/","published":"2025-02-21","authors":["Shashank Kirtania","Arun Iyer"],"abstract":"Recent advances in automated theorem proving use Large Language Models (LLMs) to translate informal mathematical statements into formal proofs. 
However, informal cues are often ambiguous or lack strict logical structure, making it hard for models to interpret them precisely. While existing methods achieve strong performance, little is known about how LLMs internally represent informal cues, or how these influence proof generation. To address this, we explore \\textit{activation steering}, an inference-time intervention that identifies linear directions in residual activations associated with informal reasoning traces and adjusts them to improve proof construction without fine-tuning. This mechanism also yields interpretable information about how reasoning is internally encoded in the activation space of LLMs. We test our method for generating formal proofs from already-formalized theorems...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411015872","title":"Research on Distributed Training Architecture for Large Scale Models for Natural Language Processing","url":"https://doi.org/10.1145/3728725.3728812","published":"2025-02-21","authors":["Jiangchuan Gong","Yang Wang"],"abstract":"Large-scale language models are critical for natural language processing, and efficient distributed learning is needed to train such models. In this paper, a distributed training scheme is developed to improve the efficiency and scalability of training large-scale language models. 
This training scheme is divided into a three-layer structure, and also incorporates an autotuned gradient compression method and a load balancing mechanism to improve training efficiency at any time. Experiments show that this scheme is highly efficient in terms of computation and resource utilization for large-scale training tasks, is much faster than current open systems, and offers new ideas for faster training of large-scale language models.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3728725.3728812","openalex_id":"https://openalex.org/W4411015872","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","compression"],"author_affiliations":["Baidu (China)","Innovation Cluster (Canada)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8035697340965271},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.6710041761398315},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5047093629837036},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4991598129272461},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4825965166091919},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4762939512729645},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4522540867328644},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.4334523677825928}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407833264","title":"Die-to-prompt: visual language model-based defect inspection and 
anomaly detection","url":"https://doi.org/10.1117/12.3052174","published":"2025-02-21","authors":["Yu Ding"],"abstract":"Currently Die-to-Die(D2D) and Die-to-Database(D2DB) methods for patterned wafer and reticle inspection have several limitations, including expensive and lengthy data collection, highly skewed dataset, long time-to-market and model not robust for different layer or when process drifts overtime. In this work, we’ll demonstrate zero shot defect inspection without needing reference image or model training using the NVIDIA Cosmos Nemotron vision language model (VLM). A prompt and target image could be sufficient to find defects reliability. The same out-of-box VLM can be deployed with NVIDIA video search and summarization (VSS) agent blueprint for anomaly detection in chip manufacturing production.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/12.3052174","openalex_id":"https://openalex.org/W4407833264","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","agent"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.666702151298523},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5949364900588989},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48175930976867676},{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.46367689967155457},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3375433683395386},{"id":"https://openalex.org/C26873012","display_name":"Condensed matter 
physics","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407810413","title":"Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert‐Like Systems Engineering Artifacts and a Characterization of Failure Modes","url":"https://doi.org/10.1002/sys.21810","published":"2025-02-21","authors":["Taylan G. Topcu","Mohammad Husain","Max Ofsa","Paul Wach"],"abstract":"ABSTRACT Multi‐purpose large language models (LLMs), a subset of generative artificial intelligence (AI), have recently made significant progress. While expectations for LLMs to assist systems engineering (SE) tasks are paramount; the interdisciplinary and complex nature of systems, along with the need to synthesize deep‐domain knowledge and operational context, raise questions regarding the efficacy of LLMs to generate SE artifacts, particularly given that they are trained using data that is broadly available on the internet. To that end, we present results from an empirical exploration, where a human expert‐generated SE artifact was taken as a benchmark, parsed, and fed into various LLMs through prompt engineering to generate segments of typical SE artifacts. This procedure was applied without any fine‐tuning or calibration to document baseline LLM performance. 
We then adopted a two‐fo...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/sys.21810","openalex_id":"https://openalex.org/W4407810413","cited_by_count":3,"quality_score":44,"matched_keywords":["LLM"],"author_affiliations":["OpenAI (United States)","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C2780841128","display_name":"Characterization (materials science)","score":0.5507676005363464},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4901769757270813},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.41810792684555054},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.40867310762405396},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.3977136015892029},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.3845236301422119},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35740870237350464},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.35302862524986267}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/judging-the-judges-a-collection-of-llm-generated-relevance-judgements","title":"Judging the Judges: A Collection of LLM-Generated Relevance Judgements","url":"https://www.microsoft.com/en-us/research/publication/judging-the-judges-a-collection-of-llm-generated-relevance-judgements/","published":"2025-02-20","authors":["Hossein A. Rahmani","Clemencia Siro","Mohammad Aliannejadi","Nick Craswell","Charles L. A. 
Clarke","Guglielmo Faggioli","Bhaskar Mitra","Paul Thomas","Emine Yilmaz"],"abstract":"Using Large Language Models (LLMs) for relevance assessments offers promising opportunities to improve Information Retrieval (IR), Natural Language Processing (NLP), and related fields. Indeed, LLMs hold the promise of allowing IR experimenters to build evaluation collections with a fraction of the manual human labor currently required. This could help with fresh topics on which there is still limited knowledge and could mitigate the challenges of evaluating ranking systems in low-resource scenarios, where it is challenging to find human annotators. Given the fast-paced recent developments in the domain, many questions concerning LLMs as assessors are yet to be answered. Among the aspects that require further investigation, we can list the impact of various components in a relevance judgment generation pipeline, such as the prompt used or the LLM chosen.This paper benchmarks and reports....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Unpublished","Artificial intelligence","Search and information retrieval","automatic evaluation","Information retrieval","large language models","Synthetic data","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-efficient-optimizer-design-for-llm-via-structured-fisher-approximation-with-a-low-rank-extension","title":"Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a 
Low-Rank Extension","url":"https://www.microsoft.com/en-us/research/publication/towards-efficient-optimizer-design-for-llm-via-structured-fisher-approximation-with-a-low-rank-extension/","published":"2025-02-20","authors":["Wenbo Gong","Meyer Scetbon","Chao Ma","Edward Meeds"],"abstract":"Designing efficient optimizers for large language models (LLMs) with low-memory requirements and fast convergence is an important and challenging problem. This paper makes a step towards the systematic design of such optimizers through the lens of structured Fisher information matrix (FIM) approximation. We show that many state-of-the-art efficient optimizers can be viewed as solutions to FIM approximation (under the Frobenius norm) with specific structural assumptions. Building on these insights, we propose two design recommendations of practical efficient optimizers for LLMs, involving the careful selection of structural assumptions to balance generality and efficiency, and enhancing memory efficiency of optimizers with general structures through a novel low-rank extension framework. 
We demonstrate how to use each design approach by deriving new memory-efficient optimizers: Row and Col...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","mathematics","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/parammute-suppressing-knowledge-critical-ffns-for-faithful-retrieval-augmented-generation","title":"ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation","url":"https://www.microsoft.com/en-us/research/publication/parammute-suppressing-knowledge-critical-ffns-for-faithful-retrieval-augmented-generation/","published":"2025-02-20","authors":["Pengcheng Huang","Zhenghao Liu","Yukun Yan","Xiaoyuan Yi","Hao Chen","Zhiyuan Liu","Maosong Sun","Tong Xiao","Ge Yu","Chenyan Xiong"],"abstract":"Large language models (LLMs) integrated with retrieval-augmented generation (RAG) have improved factuality by grounding outputs in external evidence. However, they remain susceptible to unfaithful generation, where outputs contradict retrieved context despite its relevance and accuracy. Existing approaches aiming to improve faithfulness primarily focus on enhancing the utilization of external context, but often overlook the persistent influence of internal parametric knowledge during generation. 
In this work, we investigate the internal mechanisms behind unfaithful generation and identify a subset of mid-to-deep feed-forward networks (FFNs) that are disproportionately activated in such cases. Building on this insight, we propose Parametric Knowledge Muting through FFN Suppression (ParamMute), a framework that improves contextual faithfulness by suppressing the activation of unfaithfulnes...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:orqgx0c2ej3i159pjaocyiwe","title":"Wearable Accelerometer Foundation Models for Health via Knowledge Distillation","url":"https://machinelearning.apple.com/research/wearable-accelerometer-foundation-models","published":"2025-02-20","authors":["Salar Abbaspourazad","Anshuman Mishra","Joseph Futoma","Andrew C. Miller","Ian Shapiro"],"abstract":"Modern wearable devices can conveniently record various biosignals in the many different environments of daily living, enabling a rich view of individual health. However, not all biosignals are the same: high-fidelity biosignals, such as photoplethysmogram (PPG), contain more physiological information, but require optical sensors with a high power footprint. 
Alternatively, a lower-fidelity biosignal such as accelerometry has a significantly...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4407772393","title":"Evaluating and Advancing Large Language Models for Water Knowledge Tasks in Engineering and Research","url":"https://doi.org/10.1021/acs.estlett.5c00038","published":"2025-02-20","authors":["Boyan Xu","Zihao Li","Yuxin Yang","Guanlan Wu","Chengzhi Wang","Xiongpeng Tang","Yu Li","Zihao Wu","Qingxian Su","Xueqing Shi","Yue Yang","Rui Tong"],"abstract":"Although large language models (LLMs) have demonstrated significant value in numerous fields, there remains limited research on evaluating their performance or enhancing their capabilities within water science and technology. This study initially assessed the performance of eight foundational models (i.e., GPT-4, GPT-3.5, Gemini, GLM-4, ERNIE, QWEN, Llama3-8B, and Llama3-70B) on a wide range of water knowledge tasks in engineering and research by developing an evaluation suite called WaterER (i.e., 1043 tasks). GPT-4 was demonstrated to excel in diverse water knowledge tasks in engineering and research. Llama3-70B was best for Chinese engineering queries, while Chinese-oriented models outperformed GPT-3.5 in English engineering tasks. 
Gemini demonstrated specialized academic capabilities in wastewater treatment, environmental restoration, drinking water treatment, sanitation, anaerobic d...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1021/acs.estlett.5c00038","openalex_id":"https://openalex.org/W4407772393","cited_by_count":18,"quality_score":55,"matched_keywords":[],"author_affiliations":["Beijing Normal University","Google (United States)","Institute of Art","Institute of Natural Science","National University of Singapore","Qingdao University of Science and Technology","Qingdao University of Technology","TÜV SÜD (Germany)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5465937256813049},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3871935307979584}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":18}},{"id":"apple:hbf13t03yne4ylc40szh8jif","title":"Keyframer: Empowering Animation Design using Large Language Models","url":"https://machinelearning.apple.com/research/keyframer","published":"2025-02-20","authors":["Tiffany Tseng","Ruijia Cheng","Jeffrey Nichols"],"abstract":"Large language models (LLMs) have the potential to impact a wide range of creative domains, as exemplified in popular text-to-image generators like DALL·E and Midjourney. However, the application of LLMs to motion-based visual design has not yet been explored and presents novel challenges such as how users might effectively describe motion in natural language. 
Further, many existing generative design tools lack support for iterative refinement of...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ku5trday5aqmz7q3wz0arjsl","title":"Grounding Multimodal Large Language Models in Actions","url":"https://machinelearning.apple.com/research/grounding-multimodal-large","published":"2025-02-20","authors":["Andrew Szot","Bogdan Mazoure","Harsh Agrawal","Devon Hjelm","Zsolt Kira","Alexander Toshev"],"abstract":"Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens of action space adaptors. 
For continuous actions,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4407780143","title":"Analyzing patient perspectives with large language models: a cross-sectional study of sentiment and thematic classification on exception from informed consent","url":"https://doi.org/10.1038/s41598-025-89996-w","published":"2025-02-20","authors":["Aaron E. Kornblith","Chandan Singh","Johanna C. Innes","Todd P. Chang","Kathleen Adelgais","Maija Holsti","Joy Kim","Bradford McClain","Daniel K. Nishijima","Steffanie Rodgers","Manish I. Shah","Harold K. Simon"],"abstract":"Large language models (LLMs) can improve text analysis efficiency in healthcare. This study explores the application of LLMs to analyze patient perspectives within the exception from informed consent (EFIC) process, which waives consent in emergency research. Our objective is to assess whether LLMs can analyze patient perspectives in EFIC interviews with performance comparable to human reviewers. We analyzed 102 EFIC community interviews from 9 sites, each with 46 questions, as part of the Pediatric Dose Optimization for Seizures in Emergency Medical Services study. We evaluated 5 LLMs, including GPT-4, to assess sentiment polarity on a 5-point scale and classify responses into predefined thematic classes. Three human reviewers conducted parallel analyses, with agreement measured by Cohen's Kappa and classification accuracy. 
Polarity scores between LLM and human reviewers showed substant...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-025-89996-w","openalex_id":"https://openalex.org/W4407780143","cited_by_count":7,"quality_score":48,"matched_keywords":["LLM"],"author_affiliations":["Children's Healthcare of Atlanta","Children's Hospital of Los Angeles","Cincinnati Children's Hospital Medical Center","Emory University","George Washington University Hospital","Jacobs (United States)","Microsoft (United States)","Nationwide Children's Hospital","Oregon Health & Science University","Palo Alto University","Primary Children's Hospital","San Francisco Public Library","Stanford University","University of California, Davis","University of California, San Francisco","University of Colorado Denver","University of Southern California","University of Utah","University of Washington"],"concepts":[{"id":"https://openalex.org/C142052008","display_name":"Cross-sectional study","score":0.6597697734832764},{"id":"https://openalex.org/C68122502","display_name":"Informed consent","score":0.6338528394699097},{"id":"https://openalex.org/C93692415","display_name":"Thematic map","score":0.6246415376663208},{"id":"https://openalex.org/C74196892","display_name":"Thematic analysis","score":0.556902289390564},{"id":"https://openalex.org/C66402592","display_name":"Sentiment analysis","score":0.4984104633331299},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.496977835893631},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.42252397537231445},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38904842734336853}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":7}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-to-audit-privacy-of-synthetic-data-generated-by-llms","title":"The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text","url":"https://www.microsoft.com/en-us/research/publication/how-to-audit-privacy-of-synthetic-data-generated-by-llms/","published":"2025-02-19","authors":["Matthieu Meeus","Lukas Wutschitz","Santiago Zanella-Béguelin","Reza Shokri","Shruti Tople"],"abstract":"How much information about training samples can be gleaned from synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we design membership inference attacks (MIAs) that target data used to fine-tune pre-trained LLMs that are then used to synthesize data, particularly when the adversary does not have access to the fine-tuned model but only to the synthetic data. We show that such data-based MIAs do significantly better than a random guess, meaning that synthetic data leaks information about the training data. Further, we find that canaries crafted to maximize vulnerability to model-based MIAs are sub-optimal for privacy auditing when only synthetic data is released. 
Such out-of-distribution canaries have limited influence on the model's output when...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Unpublished","Artificial intelligence","Security, privacy, and cryptography","Machine learning","Natural language processing","Security and Privacy","Inproceedings (Conference)","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/explorer-scaling-exploration-driven-web-trajectory-synthesis-for-multimodal-web-agents","title":"Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents","url":"https://www.microsoft.com/en-us/research/publication/explorer-scaling-exploration-driven-web-trajectory-synthesis-for-multimodal-web-agents/","published":"2025-02-19","authors":["Vardaan Pahuja","Yadong Lu","Corby Rosset","Boyu Gou","Arindam Mitra","Spencer Whitehead","Yu Su","Ahmed Awadallah"],"abstract":"Recent success in large multimodal models (LMMs) has sparked promising applications of agents capable of autonomously completing complex web tasks. While open-source LMM agents have made significant advances in offline evaluation benchmarks, their performance still falls substantially short of human-level capabilities in more realistic online settings. A key bottleneck is the lack of diverse and large-scale trajectory-level datasets across various domains, which are expensive to collect. 
In this paper, we address this challenge by developing a scalable recipe to synthesize the largest and most diverse trajectory-level dataset to date, containing over 94K successful multimodal web trajectories, spanning 49K unique URLs, 720K screenshots, and 33M web elements. In particular, we leverage extensive web exploration and refinement to obtain diverse task intents. The average cost is 28 cents pe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Unpublished","Artificial intelligence","Computer vision","Human-computer interaction","Autonomous agent","large language models","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rlthf-targeted-human-feedback-for-llm-alignment","title":"RLTHF: Targeted Human Feedback for LLM Alignment","url":"https://www.microsoft.com/en-us/research/publication/rlthf-targeted-human-feedback-for-llm-alignment/","published":"2025-02-19","authors":["Yifei Xu","Tusher Chakraborty","Emre Kiciman","Bibek Aryal","Eduardo Rodrigues","Srinagesh Sharma","Roberto Estevao","Maria Angels de Luis Balaguer","Jessica Wolk","Rafael Padilha","Leonardo Nunes","Shobana Balakrishna"],"abstract":"Fine-tuning large language models (LLMs) to align with user preferences is challenging due to the high cost of quality human annotations in Reinforcement Learning from Human Feedback (RLHF) and the generalizability limitations of AI Feedback. 
To address these challenges, we propose RLTHF, a human-AI hybrid framework that combines LLM-based initial alignment with selective human annotations to achieve full-human annotation alignment with minimal effort. RLTHF identifies hard-to-annotate samples mislabeled by LLMs using a reward model's reward distribution and iteratively enhances alignment by integrating strategic human corrections while leveraging LLM's correctly labeled samples. Evaluations on HH-RLHF and TL;DR datasets show that RLTHF reaches full-human annotation-level alignment with only 6-7% of the human annotation effort. Furthermore, models trained on RLTHF's curated datasets for....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/world-and-human-action-models-towards-gameplay-ideation","title":"World and Human Action Models towards gameplay ideation","url":"https://www.microsoft.com/en-us/research/publication/world-and-human-action-models-towards-gameplay-ideation/","published":"2025-02-19","authors":["Anssi Kanervisto","Dave Bignell","Linda Yilin Wen","Martin Grayson","Raluca Georgescu","Sergio Valcarcel Macua","Shan Zheng Tan","Tabish Rashid","Tim Pearce","Yuhan Cao","Abdelhak Lemkhenter","Chentian Jiang"],"abstract":"Generative artificial intelligence (AI) has the potential to transform creative industries through supporting human 
creative ideation—the generation of new ideas1–5. However, limitations in model capabilities raise key challenges in integrating these technologies more fully into creative practices. Iterative tweaking and divergent thinking remain key to enabling creativity support using technology6,7, yet these practices are insufficiently supported by state-of-the-art generative AI models. Using game development as a lens, we demonstrate that we can make use of an understanding of user needs to drive the development and evaluation of generative AI models in a way that aligns with these creative practices. Concretely, we introduce a state-of-the-art generative model, the World and Human Action Model (WHAM), and show that it can generate consistent and diverse gameplay sequences and persi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","Medicine"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:votyzhk3u130nonyl6xa1qz9","title":"From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons","url":"https://machinelearning.apple.com/research/generalist-embodied-agents","published":"2025-02-19","authors":["Andrew Szot","Bogdan Mazoure","Omar Attia","Aleksei Timofeev","Harsh Agrawal","Devon Hjelm","Zhe Gan","Zsolt Kira","Alexander Toshev"],"abstract":"We examine the capability of Multimodal Large Language Models (MLLMs) to tackle diverse domains that extend beyond the traditional language and vision tasks these models are typically trained on. 
Specifically, our focus lies in areas such as Embodied AI, Games, UI Control, and Planning. To this end, we introduce a process of adapting an MLLM to a Generalist Embodied Agent (GEA). GEA is a single unified model capable of grounding itself across...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hybridna-a-hybrid-transformer-mamba2-long-range-dna-language-model","title":"HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model","url":"https://www.microsoft.com/en-us/research/publication/hybridna-a-hybrid-transformer-mamba2-long-range-dna-language-model/","published":"2025-02-18","authors":["Mingqian Ma","Guoqing Liu","Chuan Cao","Pan Deng","Tri Dao","Albert Gu","Peiran Jin","Zhao Yang","Yingce Xia","Renqian Luo","Pipi Hu","Zun Wang"],"abstract":"Advances in natural language processing and large language models have sparked growing interest in modeling DNA, often referred to as the\"language of life\". However, DNA modeling poses unique challenges. First, it requires the ability to process ultra-long DNA sequences while preserving single-nucleotide resolution, as individual nucleotides play a critical role in DNA function. Second, success in this domain requires excelling at both generative and understanding tasks: generative tasks hold potential for therapeutic and industrial applications, while understanding tasks provide crucial insights into biological mechanisms and diseases. 
To address these challenges, we propose HybriDNA, a decoder-only DNA language model that incorporates a hybrid Transformer-Mamba2 architecture, seamlessly integrating the strengths of attention mechanisms with selective state-space models. This hybrid des...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Biology","Computer science","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/adaptivestep-automatically-dividing-reasoning-step-through-model-confidence","title":"AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence","url":"https://www.microsoft.com/en-us/research/publication/adaptivestep-automatically-dividing-reasoning-step-through-model-confidence/","published":"2025-02-18","authors":["Yuliang Liu","Junjie Lu","Zhaoling Chen","Chaofeng Qu","Jason Klein Liu","Chonghan Liu","Zefan Cai","Yunhui Xia","Li Zhao","Jiang Bian","Chuheng Zhang","Wei Shen"],"abstract":"Current approaches for training Process Reward Models (PRMs) often involve breaking down responses into multiple reasoning steps using rule-based techniques, such as using predefined placeholder tokens or setting the reasoning step's length into a fixed size. These approaches overlook the fact that specific words do not typically mark true decision points in a text. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word. 
This division method provides more decision-making information at each step, enhancing downstream tasks, such as reward model learning. Moreover, our method does not require manual annotation. We demonstrate its effectiveness through experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation tasks. Experimental results indicate that the outcome PRM achieves....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:moonshotai:2502.13189","title":"MoBA: Mixture of Block Attention for Long-Context LLMs","url":"https://huggingface.co/papers/2502.13189","published":"2025-02-18","authors":["Moonshot/Kimi"],"abstract":"","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","moonshotai"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"openalex:W4408610086","title":"LSEBMCL: A Latent Space Energy-Based Model for Continual 
Learning","url":"https://doi.org/10.1109/icaiic64266.2025.10920838","published":"2025-02-18","authors":["Xiaodi Li","Dingcheng Li","Rujun Gao","Mahmoud Zamani","Latifur Khan"],"abstract":"Continual learning has become essential in many practical applications such as online news summaries and product classification. The primary challenge is known as catastrophic forgetting, a phenomenon where a model inadvertently discards previously learned knowledge when it is trained on new tasks. Existing solutions involve storing exemplars from previous classes, regularizing parameters during the fine-tuning process, or assigning different model parameters to each task. The proposed solution LSEBMCL (Latent Space Energy-Based Model for Continual Learning) in this work is to use energy-based models (EBMs) to prevent catastrophic forgetting by sampling data points from previous tasks when training on new ones. The EBM is a machine learning model that associates an energy value with each input data point. 
The proposed method uses an EBM layer as an outer-generator in the continual learni...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icaiic64266.2025.10920838","openalex_id":"https://openalex.org/W4408610086","cited_by_count":2,"quality_score":43,"matched_keywords":["news"],"author_affiliations":["Google (United States)","Texas A&M University","The University of Texas at Dallas"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6633582711219788},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.6103976368904114},{"id":"https://openalex.org/C186370098","display_name":"Energy (signal processing)","score":0.49267107248306274},{"id":"https://openalex.org/C58024561","display_name":"Latent heat","score":0.4408648610115051},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3791062831878662},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.10041838884353638},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08946490287780762},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.07673650979995728}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4407691816","title":"LPM: Efficient 3D Content Creation From Single Image by Large-Scale Partial 3D Modeling","url":"https://doi.org/10.1109/tcsvt.2025.3543384","published":"2025-02-18","authors":["Yisu Zhang","Chaohui Yu","Fan Wang","Jianke Zhu"],"abstract":"Synthesizing 3D content from single image has great potential in many real-world applications. 
To deal with the inherent ambiguity of single image, existing methods usually leverage pre-trained 2D diffusion models for computational intensive per-instance optimization. Although having been able to create 3D assets in a feed-forward manner, the efficacy of recent advances in 3D foundation models is still limited due to neglecting geometric cues from images. To address this issue, we propose an efficient 3D foundation model named LPM to synthesize 3D content from an image. Like the masked modeling in the 2D image domain, the key of our approach is to learn 3D representations from incomplete visible shapes. By taking advantage of a synthesis-by-analysis paradigm, we establish an efficient pipeline to first estimate the visible portions and then generate the complete 3D representations. Based...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3543384","openalex_id":"https://openalex.org/W4407691816","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.641295313835144},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5686134696006775},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.46954259276390076},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.45871785283088684},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.41585463285446167},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.38271626830101013},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3648235499858856},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2502.13141","title":"UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models","url":"https://huggingface.co/papers/2502.13141","published":"2025-02-18","authors":["Huawei Lin","Yingjie Lao","Tong Geng","Tan Yu","Weijie Zhao"],"abstract":"Large Language Models (LLMs) are vulnerable to attacks like prompt injection, backdoor attacks, and adversarial attacks, which manipulate prompts or models to generate harmful outputs. In this paper, departing from traditional deep learning attack paradigms, we explore their intrinsic relationship and collectively term them Prompt Trigger Attacks (PTA). This raises a key question: Can we determine if a prompt is benign or poisoned? To address this, we propose UniGuardian, the first unified defense mechanism designed to detect prompt injection, backdoor attacks, and adversarial attacks in LLMs. Additionally, we introduce a single-forward strategy to optimize the detection pipeline, enabling simultaneous attack detection and text generation within a single forward pass. 
Our experiments confirm that UniGuardian accurately and efficiently identifies malicious prompts in LLMs.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/magma-a-foundation-model-for-multimodal-ai-agents","title":"Magma: A Foundation Model for Multimodal AI Agents","url":"https://www.microsoft.com/en-us/research/publication/magma-a-foundation-model-for-multimodal-ai-agents/","published":"2025-02-17","authors":["Jianwei Yang","Reuben Tan","Qianhui Wu","Ruijie Zheng","Baolin Peng","Yongyuan Liang","Yu Gu","Mu Cai","Seonghyeon Ye","Joel Jang","Yuquan Deng","Lars Liden"],"abstract":"We present Magma, a foundation model that serves multimodal AI agentic tasks in both the digital and physical worlds. Magma is a significant extension of vision-language (VL) models in that it not only retains the VL understanding ability (verbal intelligence) of the latter, but is also equipped with the ability to plan and act in the visual-spatial world (spatial-temporal intelligence) and complete agentic tasks ranging from UI navigation to robot manipulation. To endow the agentic capabilities, Magma is pretrained on large amounts of heterogeneous datasets spanning from images, videos to robotics data, where the actionable visual objects (e.g., clickable buttons in GUI) in images are labeled by Set-of-Mark (SoM) for action grounding, and the object movements (e.g., the trace of human hands or robotic arms) in videos are labeled by Trace-of-Mark (ToM) for action planning. 
Extensive expe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","Vision-language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2502.11401","title":"Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment","url":"https://huggingface.co/papers/2502.11401","published":"2025-02-17","authors":["Jingcheng Deng","Zhongtao Jiang","Liang Pang","Liwei Chen","Kun Xu","Zihao Wei","Huawei Shen","Xueqi Cheng"],"abstract":"A new trend uses LLMs as dense text encoders via contrastive learning. However, since LLM embeddings predict the probability distribution of the next token, they are inherently generative and distributive, conflicting with contrastive learning, which requires embeddings to capture full-text semantics and align via cosine similarity. This discrepancy hinders the full utilization of LLMs' pre-training capabilities, resulting in inefficient learning. In response to this issue, we propose AutoRegEmbed, a new contrastive learning method built on embedding conditional probability distributions, which integrates two core tasks: information compression and conditional distribution alignment. The information compression task encodes text into the embedding space, ensuring that the embedding vectors capture global semantics. 
The conditional distribution alignment task focuses on aligning text embe...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","compression"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-the-query-complexity-of-verifier-assisted-language-generation","title":"On the Query Complexity of Verifier-Assisted Language Generation","url":"https://www.microsoft.com/en-us/research/publication/on-the-query-complexity-of-verifier-assisted-language-generation/","published":"2025-02-16","authors":["Edoardo Botta","Yuchen Li","Aashay Mehta","Jordan Ash","Cyril Zhang","Andrej Risteski"],"abstract":"Recently, a plethora of works have proposed inference-time algorithms (e.g. best-of-n), which incorporate verifiers to assist the generation process. Their quality-efficiency trade-offs have been empirically benchmarked on a variety of constrained generation tasks, but the algorithmic design landscape is still largely poorly understood. In this paper, we develop a mathematical framework for reasoning about constrained generation using a pre-trained language model generator oracle and a process verifier--which can decide whether a prefix can be extended to a string which satisfies the constraints of choice. We show that even in very simple settings, access to a verifier can render an intractable problem (information-theoretically or computationally) to a tractable one. 
In fact, we show even simple algorithms, like tokenwise rejection sampling, can enjoy significant benefits from access to...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Computer science","Language model","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2502.11089","title":"Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention","url":"https://huggingface.co/papers/2502.11089","published":"2025-02-16","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-effective-extraction-and-evaluation-of-factual-claims","title":"Towards Effective Extraction and Evaluation of Factual Claims","url":"https://www.microsoft.com/en-us/research/publication/towards-effective-extraction-and-evaluation-of-factual-claims/","published":"2025-02-15","authors":["Dasha 
Metropolitansky","Jonathan Larson"],"abstract":"A common strategy for fact-checking long-form content generated by Large Language Models (LLMs) is extracting simple claims that can be verified independently. Since inaccurate or incomplete claims compromise fact-checking results, ensuring claim quality is critical. However, the lack of a standardized evaluation framework impedes assessment and comparison of claim extraction methods. To address this gap, we propose a framework for evaluating claim extraction in the context of fact-checking along with automated, scalable, and replicable methods for applying this framework, including novel approaches for measuring coverage and decontextualization. We also introduce Claimify, an LLM-based claim extraction method, and demonstrate that it outperforms existing methods under our evaluation framework. A key feature of Claimify is its ability to handle ambiguity and extract claims only when ther...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407595211","title":"Fusion4DAL: Offline Multi-modal 3D Object Detection for 4D Auto-labeling","url":"https://doi.org/10.1007/s11263-025-02370-1","published":"2025-02-15","authors":["Zhiyuan Yang","Xuekuan Wang","Wei Zhang","Xiao Tan","Jincheng Lu","Jingdong Wang","Errui Ding","Cairong 
Zhao"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-025-02370-1","openalex_id":"https://openalex.org/W4407595211","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Baidu (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7005710601806641},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.692885160446167},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.6671991348266602},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6368135213851929},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.47862520813941956},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.45886537432670593},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4460179805755615},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.056774914264678955}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4407580437","title":"An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities","url":"https://doi.org/10.1145/3717061","published":"2025-02-14","authors":["Zezhou Yang","Sirong Chen","Cuiyun Gao","Zhenhao Li","Xing Hu","Kui Liu","Xin Xia"],"abstract":"Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. 
The continuous advancements in deep learning, particularly pre-trained models, have empowered the code generation task to achieve remarkable performance. One main challenge of pre-trained models for code generation is the semantic gap between developers’ natural language requirements and source code. To address the issue, prior studies typically adopt a retrieval-augmented framework for the task, where the similar code snippets collected by a retrieval process can be leveraged to help understand the requirements and provide guidance for the generation process. In a retrieval-augmented framework, similar data can be retrieved from the database using a retrieval algorithm, and original input data can be fused with retrieved data by different fusion strat...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3717061","openalex_id":"https://openalex.org/W4407580437","cited_by_count":18,"quality_score":59,"matched_keywords":["retrieval"],"author_affiliations":["Concordia University","Harbin Institute of Technology","Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8055282831192017},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.458458811044693},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.39404407143592834},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.37092941999435425},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.19812777638435364},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":18}},{"id":"openalex:W4407591820","title":"ENHANCING INFORMATION RETRIEVAL WITH RETRIEVAL-AUGMENTED GENERATION (RAG) FOR IMPROVED CONVERSATIONAL AI","url":"https://doi.org/10.34218/ijcet_16_01_233","published":"2025-02-14","authors":["Prudhvi Chandra"],"abstract":"This article presents a comprehensive analysis of Retrieval-Augmented Generation (RAG) systems and their application in enhancing conversational AI capabilities. It proposes an integrated framework that combines traditional information retrieval techniques with state-of-the-art language models to improve response accuracy and...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.34218/ijcet_16_01_233","openalex_id":"https://openalex.org/W4407591820","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5944457054138184},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5395063757896423},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.370144248008728}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407450948","title":"Diffusion Model-Based Image Editing: A Survey","url":"https://doi.org/10.1109/tpami.2025.3541625","published":"2025-02-13","authors":["Yi Huang","Jiancheng Huang","Yifan Liu","Mingfu Yan","Jiaxi Lv","Jianzhuang Liu","Wei Xiong","He Zhang","Liangliang Cao","Shifeng Chen"],"abstract":"Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual
content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and curr...","companies":["Apple","NVIDIA"],"matched_orgs":["Apple","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3541625","openalex_id":"https://openalex.org/W4407450948","cited_by_count":66,"quality_score":79,"matched_keywords":[],"author_affiliations":["Adobe Systems (United States)","Apple (United States)","Nvidia (United States)","Shenzhen Institutes of Advanced Technology","Southern University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6697824001312256},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5562602281570435},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5084890127182007},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.452846884727478},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.44882988929748535},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4261326789855957},{"id":"https://openalex.org/C9417928","display_name":"Image 
processing","score":0.41100814938545227},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.40214836597442627}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":66}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/simplifying-dino-via-coding-rate-regularization","title":"Simplifying DINO via Coding Rate Regularization","url":"https://www.microsoft.com/en-us/research/publication/simplifying-dino-via-coding-rate-regularization/","published":"2025-02-13","authors":["Ziyang Wu","Jingyuan Zhang","Druv Pai","XuDong Wang","Chandan Singh","Jianwei Yang","Jianfeng Gao","Yi Ma"],"abstract":"DINO and DINOv2 are two model families being widely used to learn representations from unlabeled imagery data at large scales. Their learned representations often enable state-of-the-art performance for downstream tasks, such as image classification and segmentation. However, they employ many empirically motivated design choices and their training pipelines are highly complex and unstable -- many hyperparameters need to be carefully tuned to ensure that the representations do not collapse -- which poses considerable difficulty to improving them or adapting them to new domains. In this work, we posit that we can remove most such-motivated idiosyncrasies in the pre-training pipelines, and only need to add an explicit coding rate term in the loss function to avoid collapse of the representations. 
As a result, we obtain highly simplified variants of the DINO and DINOv2 which we call SimDINO....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Representation learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/region-adaptive-sampling-for-diffusion-transformers","title":"Region-Adaptive Sampling for Diffusion Transformers","url":"https://www.microsoft.com/en-us/research/publication/region-adaptive-sampling-for-diffusion-transformers/","published":"2025-02-13","authors":["Ziming Liu","Yifan Yang","Chengruidong Zhang","Yiqi Zhang","Lili Qiu","Yang You","Yuqing Yang"],"abstract":"Diffusion models (DMs) have become the leading choice for generative tasks across diverse domains. However, their reliance on multiple sequential forward passes significantly limits real-time performance. Previous acceleration methods have primarily focused on reducing the number of sampling steps or reusing intermediate results, failing to leverage variations across spatial regions within the image due to the constraints of convolutional U-Net structures. By harnessing the flexibility of Diffusion Transformers (DiTs) in handling variable number of tokens, we introduce RAS, a novel, training-free sampling strategy that dynamically assigns different sampling ratios to regions within an image based on the focus of the DiT model. 
Our key observation is that during each sampling step, the model concentrates on semantically meaningful regions, and these areas of focus exhibit strong continuit...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Computer science","Computer Vision and Pattern Recognition","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407451021","title":"Uni-MoE: Scaling Unified Multimodal LLMs With Mixture of Experts","url":"https://doi.org/10.1109/tpami.2025.3532688","published":"2025-02-13","authors":["Yunxin Li","Shenyuan Jiang","Baotian Hu","Longyue Wang","Wanqi Zhong","Wenhan Luo","Lin Ma","Min Zhang"],"abstract":"Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to scale large language or visual-language models efficiently, these efforts typically involve fewer experts and limited modalities. To address this, our work presents the pioneering attempt to develop a unified MLLM with the MoE architecture, named Uni-MoE that can handle a wide array of modalities. Specifically, it features modality-specific encoders with connectors for a unified multimodal representation. We also implement a sparse MoE architecture within the LLMs to enable efficient training and inference through modality-level data parallelism and expert-level model parallelism. 
To enhance the multi-expert collaboration a...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3532688","openalex_id":"https://openalex.org/W4407451021","cited_by_count":32,"quality_score":71,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Harbin Institute of Technology","Hong Kong University of Science and Technology","Meizu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5900783538818359},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.5275074243545532},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.482940137386322},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1384250521659851},{"id":"https://openalex.org/C2524010","display_name":"Geometry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":32}},{"id":"bytedance-seed:273","title":"MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency","url":"https://seed.bytedance.com/en/research/mme-cot-benchmarking-chain-of-thought-in-large-multimodal-models-for-reasoning-quality-robustness-and-efficiency","published":"2025-02-13","authors":["Dongzhi Jiang","Renrui Zhang","Ziyu Guo","Yanwei Li","Yu Qi","Xinyan Chen","Liuhui Wang","Jianhan Jin","Claire Guo","Shen Yan","Bo Zhang","Chaoyou Fu"],"abstract":"Answering questions with Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs), yet its impact on Large Multimodal Models (LMMs) still lacks a systematic assessment and in-depth investigation. 
In this paper, we introduce MMECoT, a specialized benchmark evaluating the CoT reasoning performance of LMMs, spanning six domains: math, science, OCR, logic, space-time, and general scenes. As the first comprehensive study in this area, we propose a thorough evaluation suite incorporating three novel metrics that assess the reasoning quality, robustness, and efficiency at a fine-grained level. Leveraging curated high-quality data and a unique evaluation strategy, we conduct an in-depth analysis of state-of-the-art LMMs, uncovering several key insights: 1) Models with reflection mechanism demonstrate a superior CoT quality, with Kimi k1.5 outpe...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Multimodal","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"huawei-noah:177","title":"ET-Plan-Bench: Embodied Task-level Planning Benchmark Towards Spatial-Temporal Cognition with Foundation Models","url":"https://www.noahlab.com.hk/en/scientific_research/et-plan-bench-embodied-task-level-planning-benchmark-towards-spatial-temporal-cognition-with-foundation-models","published":"2025-02-13","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: IROS 2025. 
External paper link: https://arxiv.org/pdf/2410.14682","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Physical AI","IROS 2025","2025"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"openalex:W4407513179","title":"A multimodal embedding transfer approach for consistent and selective learning processes in cross-modal retrieval","url":"https://doi.org/10.1016/j.ins.2025.121974","published":"2025-02-13","authors":["Zhixiong Zeng","Shuyi He","Yuhao Zhang","Wenji Mao"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.ins.2025.121974","openalex_id":"https://openalex.org/W4407513179","cited_by_count":5,"quality_score":46,"matched_keywords":["retrieval"],"author_affiliations":["Chinese Academy of Sciences","Institute of Automation","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.728723406791687},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7174952030181885},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6741030216217041},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.5523093342781067},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.46819955110549927},{"id":"https://openalex.org/C2776175482","display_name":"Transfer (computing)","score":0.4453747570514679},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.09379446506500244},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4407507540","title":"SDD-LawLLM: Advancing Intelligent Legal Systems Through Synthetic Data-Driven Fine-Tuning of Large Language Models","url":"https://doi.org/10.3390/electronics14040742","published":"2025-02-13","authors":["Hanjie Ma","Yuhang Lu","Zhengdong Xiao","Jie Feng","Haixiang Zhang","Jian Yu"],"abstract":"The extensive use of large language models (LLMs) across various natural language processing tasks has markedly elevated the intelligence of legal systems. Despite their exceptional performance in terms of accuracy, these systems still struggle with explainability. To tackle this challenge, we propose an approach to boost the question-answering abilities of LLMs through data synthesis, focusing on Qwen-7B. By incorporating Retrieval-Augmented Generation (RAG) techniques, we enhance the system’s transparency and reliability by introducing detailed reasoning processes (CoT Prompts). Our experimental results indicate that our trained LLMs exhibit significant improvements in both answer accuracy and explainability, especially in objective evaluation tasks. 
Additionally, subjective assessments reveal that the model’s responses are not only precise but also highly understandable, thus boosting...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/electronics14040742","openalex_id":"https://openalex.org/W4407507540","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Zhejiang Sci-Tech University"],"concepts":[{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.7393022179603577},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6366800665855408},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.6126713752746582},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.49138203263282776},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49033570289611816},{"id":"https://openalex.org/C43214815","display_name":"Reliability (semiconductor)","score":0.47570687532424927},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3688858151435852},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3209994435310364}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4407457013","title":"An End-to-End Flight Control Method for UAVs Based on MD-SAC","url":"https://doi.org/10.1109/tce.2025.3541747","published":"2025-02-13","authors":["Chao Song","Yi Zhang","Shuangxia Bai","Bo Li","Zhigang Gan","Evgeny Neretin"],"abstract":"Deep reinforcement learning (DRL) allows uncrewed
aerial vehicles (UAVs) to learn control policies for tasks in complicated and unfamiliar environments, hence it is widely employed in the field of UAV flight control. However, the model and operational environment of UAVs are typically simplified, rendering them unrepresentative of the real world. Furthermore, using only a single sensory data to control UAV flight is difficult to realize autonomous decision-making of UAVs. In this paper, an end-to-end flight control method for UAVs based on multimodal data fusion and Soft Actor-Critic (SAC) algorithm is proposed, named MD-SAC. First, this paper constructs the UAV model that is basically consistent with the real physical model, and forms a UAV multidata fusion state space including UAV information, UAV and target information and UAV sensor sensing information. Then, the strategy of directl...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tce.2025.3541747","openalex_id":"https://openalex.org/W4407457013","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)","Moscow Aviation Institute","Northwestern Polytechnical University","SAIC Motor (China)"],"concepts":[{"id":"https://openalex.org/C74296488","display_name":"End-to-end principle","score":0.7500576972961426},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4864508807659149},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.4051643908023834},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.39226430654525757},{"id":"https://openalex.org/C178802073","display_name":"Aeronautics","score":0.3681543469429016},{"id":"https://openalex.org/C47446073","display_name":"Control theory 
(sociology)","score":0.3483225107192993},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.15246057510375977},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.12843245267868042}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"bytedance-seed:286","title":"One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs","url":"https://seed.bytedance.com/en/research/one-example-shown-many-concepts-known-counterexample-driven-conceptual-reasoning-in-mathematical-llms","published":"2025-02-12","authors":["Yinghui Li","Jiayi Kuang","Haojing Huang","Zhikun Xu","Xinnian Liang","Yi Yu","Wenlian Lu","Yangning Li","Xiaoyu Tan","Chao Qu","Ying Shen","Hai-Tao Zheng"],"abstract":"Leveraging mathematical Large Language Models (LLMs) for proof generation is a fundamental topic in LLMs research. We argue that the ability of current LLMs to prove statements largely depends on whether they have encountered the relevant proof process during training. This reliance limits their deeper understanding of mathematical theorems and related concepts. Inspired by the pedagogical method of “proof by counterexamples” commonly used in human mathematics education, our work aims to enhance LLMs’ ability to conduct mathematical reasoning and proof through counterexamples. Specifically, we manually create a high-quality, university-level mathematical benchmark, COUNTERMATH, which requires LLMs to prove mathematical statements by providing counterexamples, thereby assessing their grasp of mathematical concepts. 
Additionally, we develop a data engineering framework to automatically obt...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine Learning","Application","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4407761308","title":"Unlocking Potential with Generative AI Instruction: Investigating Mid-level Software Development Student Perceptions, Behavior, and Adoption","url":"https://doi.org/10.1145/3641554.3701859","published":"2025-02-12","authors":["Jamie Gorson Benario","Jenn Marroquin","Monica Chan","Ernest Holmes","Daniel Mejia"],"abstract":"Generative AI tools are rapidly evolving and impacting many domains, including programming. Computer Science (CS) instructors must address student access to these tools. While some advocate to ban the tools entirely, others suggest embracing them so that students develop the skills for utilizing the tools safely and responsibly. Studies indicate positive impacts, as well as cautions, on student outcomes when these tools are integrated into courses. We studied the impact of incorporating instruction on industry-standard generative AI tools into a mid-level software development course with students from 16 Minority Serving Institutions. 89% of student participants used generative AI tools prior to the course without any formal instruction. After formal instruction, students most frequently used generative AI tools for explaining concepts and learning new things. 
Students generally reported...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3641554.3701859","openalex_id":"https://openalex.org/W4407761308","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Google (United States)","Gorgias Press (United States)","The University of Texas at El Paso"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6651865243911743},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.593513548374176},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5531682372093201},{"id":"https://openalex.org/C529173508","display_name":"Software development","score":0.4786030948162079},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.45810627937316895},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.3658261001110077},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3208003342151642},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.20314860343933105}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-pretraining-with-continuous-concepts","title":"LLM Pretraining with Continuous Concepts","url":"https://www.microsoft.com/en-us/research/publication/llm-pretraining-with-continuous-concepts/","published":"2025-02-11","authors":["Jihoon Tack","Jack Lanchantin","Jane Yu","Andrew Cohen","Ilia Kulikov","Janice Lan","Shibo Hao","Yuandong Tian","Jason Weston","Xian Li"],"abstract":"Next token prediction has been the standard training 
objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse autoencoder and mixes them into the model's hidden state by interleaving with token hidden representations. Through experiments on multiple benchmarks, including language modeling and downstream reasoning tasks, we show that CoCoMix is more sample efficient and consistently outperforms standard next token prediction, knowledge distillation and inserting pause tokens. We find that combining both concept learning and interleaving in an end-to-end framework is critical to perfor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407356649","title":"Integrating protein language models and automatic biofoundry for enhanced protein evolution","url":"https://doi.org/10.1038/s41467-025-56751-8","published":"2025-02-11","authors":["Qiang Zhang","Wanyi Chen","Ming Qin","Yuhao Wang","Zhongji Pu","Keyan Ding","Yuyue Liu","Qunfeng Zhang","Dongfang Li","Xinjia Li","Yu Zhao","Jianhua Yao"],"abstract":"Traditional protein engineering methods, such as 
directed evolution, while effective, are often slow and labor-intensive. Advances in machine learning and automated biofoundry present new opportunities for optimizing these processes. This study devises a protein language model-enabled automatic evolution platform, a closed-loop system for automated protein engineering within the Design-Build-Test-Learn cycle. The protein language model ESM-2 makes zero-shot prediction of 96 variants to initiate the cycle. The biofoundry constructs and evaluates these variants, and feeds the results back to a multi-layer perceptron to train a fitness predictor, which then makes prediction of second round of 96 variants with improved fitness. With the tRNA synthetase as a model enzyme, four-rounds of evolution carried out within 10 days lead to mutants with enzyme activity improved by up to 2.4-fold. Our s...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-025-56751-8","openalex_id":"https://openalex.org/W4407356649","cited_by_count":48,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Hangzhou Wanxiang Polytechnic","Tencent (China)","Zhejiang Lab","Zhejiang University","Zhejiang University of Science and Technology","Zhejiang University of Technology","Zhejiang University-University of Edinburgh Institute"],"concepts":[{"id":"https://openalex.org/C9418097","display_name":"Directed evolution","score":0.7933582663536072},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6438164710998535},{"id":"https://openalex.org/C147816474","display_name":"Protein engineering","score":0.6103652119636536},{"id":"https://openalex.org/C2908542902","display_name":"Directed Molecular Evolution","score":0.440290629863739},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4193716049194336},{"id":"https://openalex.org/C143065580","display_name":"Mutant","score":0.20377221703529358},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.17002281546592712},{"id":"https://openalex.org/C181199279","display_name":"Enzyme","score":0.168736070394516}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":48}},{"id":"openalex:W4407375814","title":"A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems","url":"https://doi.org/10.1145/3716393","published":"2025-02-11","authors":["Keqin Bao","Jizhi Zhang","Wenjie Wang","Yang Zhang","Zhengyi Yang","Yanchen Luo","Chong Chen","Fuli Feng","Qi Tian"],"abstract":"As the focus on Large Language Models (LLMs) in the field of recommendation intensifies, the optimization of LLMs for recommendation purposes (referred to as LLM4Rec) assumes a crucial role in enhancing their recommendation performance. However, existing approaches for LLM4Rec often assess performance using restricted sets of candidates, which may not accurately reflect the models’ overall ranking capabilities. In this article, our objective is to pursue LLM4Rec models with comprehensive ranking capacity and propose a two-step grounding framework known as BIGRec (Bi-step Grounding Paradigm for Recommendation). BIGRecm initially grounds LLMs to the recommendation space by fine-tuning them to generate meaningful tokens for items and subsequently identifies appropriate actual items that correspond to the generated tokens. 
By conducting extensive experiments on two datasets, we substantiate....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3716393","openalex_id":"https://openalex.org/W4407375814","cited_by_count":28,"quality_score":65,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","National University of Singapore","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5894807577133179},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.5142523646354675},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39179834723472595},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3583337664604187},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.34191030263900757},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32035040855407715},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1772192120552063},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.16882964968681335}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":28}},{"id":"openalex:W4407354637","title":"HumanRef-GS: Image-to-3D Human Generation With Reference-Guided Diffusion and 3D Gaussian Splatting","url":"https://doi.org/10.1109/tcsvt.2025.3540969","published":"2025-02-11","authors":["Jingbo Zhang","Xiaoyu Li","Hongliang Zhong","Qi Zhang","Yan‐Pei Cao","Ying Shan","Jing Liao"],"abstract":"Generating a 3D human model from a single reference image is a challenging task as it involves inferring textures and geometries in 
unseen views while maintaining consistency with the reference image. Existing methods that rely on 3D generative models are limited by the availability of 3D training data. Optimization-based approaches that distill text-to-image diffusion models into 3D models often struggle to preserve the intricate texture details of the reference image, resulting in inconsistent appearances across different views. In this paper, we propose HumanRef-GS, a novel method for single image-to-3D clothed human generation based on 3D Gaussian Splatting (3DGS). To ensure the generated 3D model is both photorealistic and consistent with the input image, HumanRef-GS employs a unique technique called reference-guided score distillation sampling (Ref-SDS). This method effectively inc...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3540969","openalex_id":"https://openalex.org/W4407354637","cited_by_count":5,"quality_score":50,"matched_keywords":["efficient","distillation"],"author_affiliations":["City University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.689998984336853},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5318576693534851},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5157108902931213},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.44054871797561646},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3850557804107666},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.10487663745880127},{"id":"https://openalex.org/C62520636","display_name":"Quantum 
mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/abstract-operations-research-modeling-using-natural-language-inputs","title":"Abstract Operations Research Modeling Using Natural Language Inputs","url":"https://www.microsoft.com/en-us/research/publication/abstract-operations-research-modeling-using-natural-language-inputs/","published":"2025-02-10","authors":["Junxuan Li","Ryan Wickman","Sahil Bhatnagar","Raj Kumar Maity","Arko Mukherjee"],"abstract":"Operations research (OR) uses mathematical models to enhance decision making, but developing these models requires expert knowledge and can be time-consuming. Automated mathematical programming (AMP) has emerged to simplify this process, but existing systems have limitations. This paper introduces a novel methodology that uses recent advances in a large language model (LLM) to create and edit abstract OR models from non-expert user queries expressed using natural language. This reduces the need for domain expertise and the time to formulate a problem, and an abstract OR model generated can be deployed to a multi-tenant platform to support a class of users with different input data. This paper presents an end-to-end pipeline, named NL2OR, that generates solutions to OR problems from natural language input, and shares experimental results on several important OR problems. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.3390/info16020128","openalex_id":"https://openalex.org/W4407340340","cited_by_count":4,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:283","title":"MARS: Unleashing the Power of Variance Reduction for Training Large Models","url":"https://seed.bytedance.com/en/research/mars-unleashing-the-power-of-variance-reduction-for-training-large-models","published":"2025-02-10","authors":["Huizhuo Yuan","Yifeng Liu","Shuang Wu","Xun Zhou","Quanquan Gu"],"abstract":"Training deep neural networks—and more recently, large models demands efficient and scalable optimizers. Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this task. Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models. Consequently, it has remained a less favored approach in modern AI. In this paper, to unleash the power of variance reduction for efficient training of large models, we propose a unified optimization framework, MARS (Make vAriance Reduction Shine), which reconciles preconditioned gradient methods with variance reduction via a scaled stochastic recursive momentum technique. 
Within our framework, we introduce three...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine Learning","AI for Science","ICML 2025","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/examining-false-positives-under-inference-scaling-for-mathematical-reasoning","title":"Examining False Positives under Inference Scaling for Mathematical Reasoning","url":"https://www.microsoft.com/en-us/research/publication/examining-false-positives-under-inference-scaling-for-mathematical-reasoning/","published":"2025-02-10","authors":["Yu Wang","Nan Yang","Liang Wang","Furu Wei"],"abstract":"Recent advancements in language models have led to significant improvements in mathematical reasoning across various benchmarks. However, most of these benchmarks rely on automatic evaluation methods that only compare final answers using heuristics, without verifying the underlying reasoning steps. This limitation results in false positive solutions, where models may produce correct final answers but with flawed deduction paths. In this paper, we systematically examine the prevalence of false positive solutions in mathematical problem solving for language models. We analyze the characteristics and extent of this issue across different open-source models, datasets of varying difficulty levels, and decoding strategies. Specifically, we explore how false positives influence the inference time scaling behavior of language models. 
Our experimental results reveal that: (1) false positive solut...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Tech Report","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407357344","title":"Automatic Database Configuration Debugging using Retrieval-Augmented Language Models","url":"https://doi.org/10.1145/3709663","published":"2025-02-10","authors":["Sibei Chen","Ju Fan","Bin Wu","Nan Tang","Chao Deng","P. Wang","Ye Li","Jian Tan","Feifei Li","Jingren Zhou","Xiaoyong Du"],"abstract":"Database management system (DBMS) configuration debugging, e.g., diagnosing poorly configured DBMS knobs and generating troubleshooting recommendations, is crucial in optimizing DBMS performance. However, the configuration debugging process is tedious and, sometimes challenging, even for seasoned database administrators (DBAs) with sufficient experience in DBMS configurations and good understandings of the DBMS internals (e.g., MySQL or Oracle). To address this difficulty, we propose Andromeda, a framework that utilizes large language models (LLMs) to enable automatic DBMS configuration debugging. Andromeda serves as a natural surrogate of DBAs to answer a wide range of natural language (NL) questions on DBMS configuration issues, and to generate diagnostic suggestions to fix these issues. 
Nevertheless, directly prompting LLMs with these professional questions may result in overly generi...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3709663","openalex_id":"https://openalex.org/W4407357344","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C168065819","display_name":"Debugging","score":0.800251841545105},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7525700330734253},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5424041152000427},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.497130423784256},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4413597881793976},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37884077429771423},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.3292931318283081}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gradient-multi-normalization-for-stateless-and-scalable-llm-training","title":"Gradient Multi-Normalization for Stateless and Scalable LLM Training","url":"https://www.microsoft.com/en-us/research/publication/gradient-multi-normalization-for-stateless-and-scalable-llm-training/","published":"2025-02-09","authors":["Meyer Scetbon","Chao Ma","Wenbo Gong","Edward Meeds"],"abstract":"Training large language models (LLMs) typically relies on adaptive optimizers like 
Adam (Kingma&Ba, 2015) which store additional state information to accelerate convergence but incur significant memory overhead. Recent efforts, such as SWAN (Ma et al., 2024) address this by eliminating the need for optimizer states while achieving performance comparable to Adam via a multi-step preprocessing procedure applied to instantaneous gradients. Motivated by the success of SWAN, we introduce a novel framework for designing stateless optimizers that normalizes stochastic gradients according to multiple norms. To achieve this, we propose a simple alternating scheme to enforce the normalization of gradients w.r.t these norms. We show that our procedure can produce, up to an arbitrary precision, a fixed-point of the problem, and that SWAN is a particular instance of our approach with carefully chosen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-in-the-era-of-gpt-transforming-the-future-of-work-and-discovery","title":"AI in the Era of GPT: Transforming the Future of Work and Discovery","url":"https://www.microsoft.com/en-us/research/publication/ai-in-the-era-of-gpt-transforming-the-future-of-work-and-discovery/","published":"2025-02-07","authors":["Juan M. 
Lavista Ferres","Elliot K Fishman","Linda C Chu","Felipe Lopez-Ramirez","Charles K Crawford","Steven P Rowe"],"abstract":"In this new era of accelerated discovery and rapid technological advancements, artificial intelligence (AI) stands out as one of the most prominent areas of innovation, transforming how we approach complex problems across different fields. The recognition by the Nobel Prize committee of the recent physics and chemistry laureates, John Hopfield and Geoffrey Hinton, for their role in enabling artificial neural networks is one example that emphasizes the transformative power that AI holds in our time [ 1 ]. The capabilities of AI seem to expand as quickly as hardware can provide increased capabilities [ 2 , 3 ]. With such power, AI has inspired a new level of optimism in the scientific community—although many are still skeptical of its potential impact on society [ 4 , 5 ]. That only seems natural, as throughout history, humans have evolved to focus on negative news, adopting a pessimistic....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","1970-01-01","news"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:cxgiff9i4sbt95592hw482nv","title":"Cut Your Losses in Large-Vocabulary Language Models","url":"https://machinelearning.apple.com/research/cut-your-losses","published":"2025-02-07","authors":["Erik Wijmans","Brody Huval","Alexander Hertzberg","Vladlen Koltun","Philipp Krähenbühl"],"abstract":"As language models 
grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","memory"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4407250386","title":"Genetic-to-Chemical Perturbation Transfer Learning Through Unified Multimodal Molecular Representations","url":"https://doi.org/10.1101/2025.02.02.635055","published":"2025-02-07","authors":["Yiming Li","Min Zeng","Jun Zhu","Linjing Liu","Fang Wang","Long-Kai Huang","Fan Yang","Min Li","Jianhua Yao"],"abstract":"Abstract Artificial Intelligence virtual cell (AIVC) holds transformative potential for biomedical research. Central to this vision is the systematic modeling of genetic and chemical perturbation phenotypes to accurately predict cellular dynamic states from diverse interventions. However, disparities in screening agents, library scales, experimental technologies, and data production efficiency hinder the integration, modeling, and analysis of the cross-data. 
Here we present UniPert-G2CP , a two-phase deep learning approach comprising i) UniPert, a multimodal molecular representation model that bridges genetic and chemical domains, and ii) G2CP ( Genetic-to-Chemical Perturbation transfer learning), which systematically transforms CRISPR screen-based genetic insights into chemical perturbation modeling for cost-effective in silico drug screening. UniPert not only encodes multimodal perturb...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.02.02.635055","openalex_id":"https://openalex.org/W4407250386","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Center for Life Sciences","Central South University","City University of Hong Kong","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.583022952079773},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.46742767095565796},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4546613097190857},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4485097825527191},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.43489062786102295},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3661007285118103},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.34094706177711487},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.276113361120224}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pike-rag-specialized-knowledge-and-rationale-augmented-generation","title":"PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation","url":"https://www.microsoft.com/en-us/research/publication/pike-rag-specialized-knowledge-and-rationale-augmented-generation/","published":"2025-02-06","authors":["Jinyu Wang","Jingjing Fu","Rui Wang","Lei Song","Jiang Bian"],"abstract":"Despite notable advancements in Retrieval-Augmented Generation (RAG) systems that expand large language model (LLM) capabilities through external retrieval, these systems often struggle to meet the complex and diverse needs of real-world industrial applications. The reliance on retrieval alone proves insufficient for extracting deep, domain-specific knowledge performing in logical reasoning from specialized corpora. To address this, we introduce sPecIalized KnowledgE and Rationale Augmentation Generation (PIKE-RAG), focusing on extracting, understanding, and applying specialized knowledge, while constructing coherent rationale to incrementally steer LLMs toward accurate responses. 
Recognizing the diverse challenges of industrial tasks, we introduce a new paradigm that classifies tasks based on their complexity in knowledge extraction and application, allowing for a systematic evaluation....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Search and information retrieval","Computer science","LLM","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/prompt-tuning-decision-transformers-with-structured-and-scalable-bandits","title":"Prompt Tuning Decision Transformers with Structured and Scalable Bandits","url":"https://www.microsoft.com/en-us/research/publication/prompt-tuning-decision-transformers-with-structured-and-scalable-bandits/","published":"2025-02-06","authors":["Finn Rietz","Oleg Smirnov","Sara Karimi","Lele Cao"],"abstract":"Prompt tuning has emerged as a key technique for adapting large pre-trained Decision Transformers (DTs) in offline Reinforcement Learning (RL), particularly in multi-task and few-shot settings. The Prompting Decision Transformer (PDT) enables task generalization via trajectory prompts sampled uniformly from expert demonstrations -- without accounting for prompt informativeness. In this work, we propose a bandit-based prompt-tuning method that learns to construct optimal trajectory prompts from demonstration data at inference time. 
We devise a structured bandit architecture operating in the trajectory prompt space, achieving linear rather than combinatorial scaling with prompt size. Additionally, we show that the pre-trained PDT itself can serve as a powerful feature extractor for the bandit, enabling efficient reward modeling across various environments. We theoretically establish regret...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"https://openalex.org/W7115097410","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Reinforcement learning","1970-01-01","efficient"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407230380","title":"A multimodal multidomain multilingual medical foundation model for zero shot clinical diagnosis","url":"https://doi.org/10.1038/s41746-024-01339-7","published":"2025-02-06","authors":["Fenglin Liu","Li Zheng","Qingyu Yin","Jinfa Huang","Jiebo Luo","Anshul Thakur","Kim Branson","Patrick Schwab","Bing Yin","Xian Wu","Yefeng Zheng","David A. Clifton"],"abstract":"Radiology images are one of the most commonly used in daily clinical diagnosis. Typically, clinical diagnosis using radiology images involves disease reporting and classification, where the former is a multimodal task whereby textual reports are generated to describe clinical findings in images, as are common in various domains, e.g., chest X-ray or computed tomography. 
Existing approaches are mainly supervised, the quality of which heavily depends on the volume and quality of available labeled data. However, for rarer or more novel diseases, enrolling patients to collect data is both time-consuming and expensive. For non-English languages, sufficient quantities of labeled data are typically not available. We propose the Multimodal Multidomain Multilingual Foundation Model. It is useful for rare diseases and non-English languages, where the labeled data are frequently much more scarce, a...","companies":["Amazon","Tencent/Hunyuan"],"matched_orgs":["Amazon","Tencent/Hunyuan"],"company_groups":["company_us","company_china"],"company_regions":["US","China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41746-024-01339-7","openalex_id":"https://openalex.org/W4407230380","cited_by_count":25,"quality_score":74,"matched_keywords":[],"author_affiliations":["Amazon (United States)","GlaxoSmithKline (United Kingdom)","Science Oxford","Suzhou Research Institute","Tencent (China)","University of Oxford","University of Rochester","Westlake University"],"concepts":[{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.7312852144241333},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7126696109771729},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4567985534667969},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.3529120981693268},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3409515619277954},{"id":"https://openalex.org/C17744445","display_name":"Political 
science","score":0.15324610471725464},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.1485111117362976},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.1484186351299286}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":25}},{"id":"bytedance-seed:182","title":"Ultra-Sparse Memory Network","url":"https://seed.bytedance.com/en/research/ultra-sparse-memory-network","published":"2025-02-06","authors":["Zihao Huang","Qiyang Min","Hongzhi Huang","Defa Zhu","Yutao Zeng","Ran Guo","Xun Zhou"],"abstract":"It is widely acknowledged that the performance of Transformer models is logarithmically related to their number of parameters and computational complexity. While approaches like Mixture of Experts (MoE) decouple parameter count from computational complexity, they still face challenges in inference due to high memory access costs. This work introduces UltraMem, incorporating large-scale, ultra-sparse memory layer to address these limitations. Our approach significantly reduces inference latency while maintaining model performance. We also investigate the scaling laws of this new architecture, demonstrating that it not only exhibits favorable scaling properties but outperforms MoE. In experiments, the largest UltraMem we train has 20 million memory slots. 
The results show that our method achieves state-of-the-art inference speed and model performance within a given computational budget, pa...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","ICLR 2025","memory"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4407206663","title":"Discovering Symbolic Cognitive Models from Human and Animal Behavior","url":"https://doi.org/10.1101/2025.02.05.636732","published":"2025-02-06","authors":["Pablo Samuel Castro","Nenad Tomašev","Ankit Anand","Navodita Sharma","Rishika Mohanta","Aparna Dev","Kuba Perlin","Siddhant Jain","Kyle Levin","Noémi Éltető","Will Dabney","Alexander Novikov"],"abstract":"Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuity on the part of a human scientist. Here, we adapt FunSearch Romera-Paredes et al. (2024), a recently developed tool that uses Large Language Models (LLMs) in an evolutionary algorithm, to automatically discover symbolic cognitive models that accurately capture human and animal behavior. We consider datasets from three species performing a classic reward-learning task that has been the focus of substantial modeling effort, and find that the discovered programs outperform state-of-the-art cognitive models for each. 
The discovered programs can readily be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learn...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2025.02.05.636732","openalex_id":"https://openalex.org/W4407206663","cited_by_count":12,"quality_score":53,"matched_keywords":["LLM"],"author_affiliations":["Columbia University","Google (United States)","Google DeepMind (United Kingdom)","Howard Hughes Medical Institute","Janelia Research Campus","Max Planck Institute for Biological Cybernetics","Princeton University","Rockefeller University","Sainsbury Laboratory","University College London"],"concepts":[{"id":"https://openalex.org/C2778154381","display_name":"Ingenuity","score":0.7692099213600159},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.7376919984817505},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6678622364997864},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5906085968017578},{"id":"https://openalex.org/C41690226","display_name":"Animal cognition","score":0.5860324501991272},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.5763144493103027},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5290300846099854},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5265384316444397}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"arxiv:2502.04235","title":"MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus 
Expansion","url":"https://huggingface.co/papers/2502.04235","published":"2025-02-06","authors":["Xintong Hao","Ke Shen","Chenggang Li"],"abstract":"Despite the remarkable capabilities of large language models across various tasks, their continued scaling faces a critical challenge: the scarcity of high-quality pretraining data. While model architectures continue to evolve, the natural language data struggles to scale up. To tackle this bottleneck, we propose MAssive Genre-Audience~(MAGA) reformulation method, which systematic synthesizes diverse, contextually-rich pretraining data from existing corpus. This work makes three main contributions: (1) We propose MAGA reformulation method, a lightweight and scalable approach for pretraining corpus expansion, and build a 770B tokens MAGACorpus. (2) We evaluate MAGACorpus with different data budget scaling strategies, demonstrating consistent improvements across various model sizes (134M-13B), establishing the necessity for next-generation large-scale synthetic pretraining language models....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/boosting-gpt-models-for-genomics-analysis-generating-trusted-genetic-variant-annotations-and-interpretations-through-rag-and-fine-tuning","title":"Boosting GPT Models for Genomics Analysis: Generating Trusted Genetic Variant Annotations and Interpretations through RAG and 
fine-tuning","url":"https://www.microsoft.com/en-us/research/publication/boosting-gpt-models-for-genomics-analysis-generating-trusted-genetic-variant-annotations-and-interpretations-through-rag-and-fine-tuning/","published":"2025-02-05","authors":["Shuangjia Lu","Erdal Cosgun"],"abstract":"Large language models (LLMs) have acquired a remarkable level of knowledge through their initial training. However, they lack expertise in particular domains such as genomics. Variant annotation data, an important component of genomics, is crucial for interpreting and prioritizing disease-related variants among millions of variants identified by genetic sequencing. In our project, we aimed to improve LLM performance in genomics by adding variant annotation data to LLMs by retrieval-augmented generation (RAG) and fine-tuning techniques. Using RAG, we successfully integrated 190 million highly accurate variant annotations, curated from 5 major annotation datasets and tools, into GPT-4o. This integration empowers users to query specific variants and receive accurate variant annotations and interpretations supported by advanced reasoning and language understanding capabilities of LLMs. 
Addit...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1093/bioadv/vbaf019","openalex_id":"https://openalex.org/W4407174849","cited_by_count":10,"quality_score":82,"matched_keywords":["Article (Journal)","Medical, health and genomics","1970-01-01","LLM","retrieval"],"author_affiliations":["Microsoft","Microsoft (United States)","Yale University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/verifiable-format-control-for-large-language-model-generations","title":"Verifiable Format Control for Large Language Model Generations","url":"https://www.microsoft.com/en-us/research/publication/verifiable-format-control-for-large-language-model-generations/","published":"2025-02-05","authors":["Zhaoyang Wang","Jinqi Jiang","Huichi Zhou","Wenhao Zheng","Xuchao Zhang","Chetan Bansal","Huaxiu Yao"],"abstract":"Recent Large Language Models (LLMs) have demonstrated satisfying general instruction following ability. However, small LLMs with about 7B parameters still struggle fine-grained format following (e.g., JSON format), which seriously hinder the advancements of their applications. Most existing methods focus on benchmarking general instruction following while overlook how to improve the specific format following ability for small LLMs. Besides, these methods often rely on evaluations based on advanced LLMs (e.g., GPT-4), which can introduce the intrinsic bias of LLMs and be costly due to the API calls. In this paper, we first curate a fully verifiable format following dataset VFF. 
In contrast to existing works often adopting external LLMs for instruction-following validations, every sample of VFF can be easily validated with a Python function. Further, we propose to leverage this verifiable....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:278","title":"Teaching Language Models to Critique via Reinforcement Learning","url":"https://seed.bytedance.com/en/research/teaching-language-models-to-critique-via-reinforcement-learning","published":"2025-02-05","authors":["Zhihui Xie","Jie Chen","Liyu Chen","Weichao Mao","Jingjing Xu","Lingpeng Kong"],"abstract":"Teaching large language models (LLMs) to critique and refine their outputs is crucial for building systems that can iteratively improve, yet it is fundamentally limited by the ability to provide accurate judgments and actionable suggestions. In this work, we study LLM critics for code generation and propose CTRL, a framework for Critic Training via Reinforcement Learning, which trains a critic model to generate feedback that maximizes correction performance for a fixed generator model without human supervision. Our results demonstrate that critics trained with CTRL significantly enhance pass rates and mitigate compounding errors across both base and stronger generator models. 
Furthermore, we show that these critic models act as accurate generative reward models and enable test-time scaling through iterative critique-revision, achieving up to 106.1% relative improvements across challengin...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine Learning","LLM","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:156","title":"BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving","url":"https://seed.bytedance.com/en/research/bfs-prover-scalable-best-first-tree-search-for-llm-based-automatic-theorem-proving","published":"2025-02-05","authors":["Ran Xin","Chenguang Xi","Jie Yang","Feng Chen","Hang Wu","Xia Xiao","Yifan Sun","Shen Zheng","Kai Shen"],"abstract":"Recent advancements in large language models (LLMs) have spurred growing interest in automatic theorem proving using Lean4, where effective tree search methods are crucial for navigating proof search spaces. While the existing approaches primarily rely on value functions and Monte Carlo Tree Search (MCTS), the potential of simpler methods like Best-First Search (BFS) remains underexplored. This paper investigates whether BFS can achieve competitive performance in large-scale theorem proving tasks. We present \\texttt{BFS-Prover}, a scalable expert iteration framework, featuring three key innovations. First, we implement strategic data filtering at each expert iteration round, excluding problems solvable via beam search node expansion to focus on harder cases. 
Second, we improve the sample efficiency of BFS through Direct Preference Optimization (DPO) applied to state-tactic pairs automati...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","arXiv","preference"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:yzfs7mgsr0b8psbac2szx3gj","title":"Reinforcement Learning for Long-Horizon Interactive LLM Agents","url":"https://machinelearning.apple.com/research/reinforcement-learning-long-horizon","published":"2025-02-05","authors":["Kevin Chen","Marco Cusumano-Towner","Brody Huval","Aleksei Petrenko","Jackson Hamburger","Vladlen Koltun","Philipp Krähenbühl"],"abstract":"Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests. While IDAs powered by instruction-tuned large language models (LLMs) can react to feedback from interface invocations in multi-step exchanges, they have not been trained in their respective digital environments. Prior methods accomplish less than half of tasks in sophisticated benchmarks such as AppWorld. 
We present a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/%ce%b5-vae-denoising-as-visual-decoding","title":"ε-VAE: Denoising as Visual Decoding","url":"https://www.microsoft.com/en-us/research/publication/%ce%b5-vae-denoising-as-visual-decoding/","published":"2025-02-04","authors":["Long Zhao","Sanghyun Woo","Ziyu Wan","Yandong Li","Han Zhang","Boqing Gong","Hartwig Adam","Xuhui Jia","Ting Liu"],"abstract":"In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space. For high-dimensional visual data, it reduces redundancy and emphasizes key features for high-quality generation. Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input. In this work, we offer a new perspective by proposing denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder. 
We evaluate our approach by assessing both reconstruction (rFID) and generation quality (FID), comparing it to state-of-the-art autoe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Graphics and multimedia","Computer science","Engineering","1970-01-01","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/omni-dna-a-unified-genomic-foundation-model-for-cross-modal-and-multi-task-learning","title":"Omni-DNA: A Unified Genomic Foundation Model for Cross-Modal and Multi-Task Learning","url":"https://www.microsoft.com/en-us/research/publication/omni-dna-a-unified-genomic-foundation-model-for-cross-modal-and-multi-task-learning/","published":"2025-02-04","authors":["Zehui Li","Vallijah Subasri","Yifei Shen","Dongsheng Li","Yiren Zhao","Guy-Bart Stan","Caihua Shan"],"abstract":"Large Language Models (LLMs) demonstrate remarkable generalizability across diverse tasks, yet genomic foundation models (GFMs) still require separate finetuning for each downstream application, creating significant overhead as model sizes grow. Moreover, existing GFMs are constrained by rigid output formats, limiting their applicability to various genomic tasks. In this work, we revisit the transformer-based auto-regressive models and introduce Omni-DNA, a family of cross-modal multi-task models ranging from 20 million to 1 billion parameters. 
Our approach consists of two stages: (i) pretraining on DNA sequences with next token prediction objective, and (ii) expanding the multi-modal task-specific tokens and finetuning for multiple downstream tasks simultaneously. When evaluated on the Nucleotide Transformer and GB benchmarks, Omni-DNA achieves state-of-the-art performance on 18 out of....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Biology","Computer science","Genomics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/harmony-in-divergence-towards-fast-accurate-and-memory-efficient-zeroth-order-llm-fine-tuning","title":"Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning","url":"https://www.microsoft.com/en-us/research/publication/harmony-in-divergence-towards-fast-accurate-and-memory-efficient-zeroth-order-llm-fine-tuning/","published":"2025-02-04","authors":["Qitao Tan","Jun Liu","Zheng Zhan","Caiwei Ding","Yanzhi Wang","Jin Lu","Geng Yuan"],"abstract":"Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory, significantly limiting real-world deployment. 
Recently, zeroth-order (ZO) optimization stood out as a promising memory-efficient training paradigm, avoiding backward passes and relying solely on forward passes for gradient estimation, making it attractive for resource-constrained scenarios. However, ZO method lags far behind FO method in both convergence speed and accuracy. To bridge the gap, we introduce a novel layer-wise divergence analysis that uncovers the distinct update pattern of FO and ZO optimization. Aiming to resemble the learning capacity of FO method from the findings, we propose Divergence-driven Zeroth-Order (DiZO) optimization. DiZO conducts divergence-driven layer adaptation by incorporating projections to ZO updates, generating diverse-magnitu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:ofq6bcafmb3hen4rph39drdh","title":"Adaptive Training Distributions with Scalable Online Bilevel Optimization","url":"https://machinelearning.apple.com/research/adaptive-training","published":"2025-02-04","authors":["David Grangier","Pierre Ablin","Awni Hannun"],"abstract":"Large neural networks pretrained on web-scale corpora are central to modern machine learning. In this paradigm, the distribution of the large, heterogeneous pretraining data rarely matches that of the application domain. 
This work considers modifying the pretraining distribution in the case where one has a small sample of data reflecting the targeted test conditions. We propose an algorithm motivated by a recent formulation of this setting as an...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-uncertain-to-safe-conformal-fine-tuning-of-diffusion-models-for-safe-pde-control","title":"From Uncertain to Safe: Conformal Fine-Tuning of Diffusion Models for Safe PDE Control","url":"https://www.microsoft.com/en-us/research/publication/from-uncertain-to-safe-conformal-fine-tuning-of-diffusion-models-for-safe-pde-control/","published":"2025-02-03","authors":["Peiyan Hu","Xiaowei Qian","Wenhao Deng","Rui Wang","Haodong Feng","Ruiqi Feng","Tao Zhang","Long Wei","Yue Wang","Zhi-Ming Ma","Tailin Wu"],"abstract":"The application of deep learning for partial differential equation (PDE)-constrained control is gaining increasing attention. However, existing methods rarely consider safety requirements crucial in real-world applications. To address this limitation, we propose Safe Diffusion Models for PDE Control (SafeDiffCon), which introduce the uncertainty quantile as model uncertainty quantification to achieve optimal control under safety constraints through both post-training and inference phases. 
Firstly, our approach post-trains a pre-trained diffusion model to generate control sequences that better satisfy safety constraints while achieving improved control objectives via a reweighted diffusion loss, which incorporates the uncertainty quantile estimated using conformal prediction. Secondly, during inference, the diffusion model dynamically adjusts both its generation process and parameters thr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Diffusion models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/are-language-models-up-to-sequential-optimization-problems-from-evaluation-to-a-hegelian-inspired-enhancement","title":"Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement","url":"https://www.microsoft.com/en-us/research/publication/are-language-models-up-to-sequential-optimization-problems-from-evaluation-to-a-hegelian-inspired-enhancement/","published":"2025-02-03","authors":["Soheil Abbasloo"],"abstract":"Large Language Models (LLMs) have demonstrated impressive capabilities across numerous fields, presenting an opportunity to revolutionize optimization problem-solving, a crucial, ubiquitous, and complex domain. This paper explores the proficiency of LLMs in handling Sequential Optimization Problems (SOPs). 
We introduce WorldGen, a dynamic framework for generating unseen SOPs with controllable complexities, to evaluate LLM performance. Our initial observations reveal that while LLMs perform well on simple SOPs, their performance significantly degrades with increased complexity. Motivated by this, we revisit philosophical hypotheses on reasoning to enhance LLM performance. Inspired by the influential framework of Hegelian Dialectics, we propose ACE, demonstrating how the performance of LLMs in SOP contexts can be significantly improved without any retraining or further fine-tuning.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Systems and networking","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407108493","title":"CTRL: Connect Collaborative and Language Model for CTR Prediction","url":"https://doi.org/10.1145/3713080","published":"2025-02-03","authors":["Xiangyang Li","Bo Chen","Lu Hou","Ruiming Tang"],"abstract":"Traditional click-through rate (CTR) prediction models convert the tabular data into one-hot vectors and leverage the collaborative relations among features for inferring the user’s preference over items. This modeling paradigm discards essential semantic information. Though some works like P5 and KAR have explored the potential of using Pre-trained Language Models (PLMs) to extract semantic signals for CTR prediction, they are computationally expensive and suffer from low efficiency. 
Besides, the beneficial collaborative relations are not considered, hindering the recommendation performance. To solve these problems, in this article, we propose a novel framework CTRL , which is industrial-friendly and model-agnostic with superior inference efficiency. Specifically, the original tabular data is first converted into textual data. Both tabular data and converted textual data are regarded as...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3713080","openalex_id":"https://openalex.org/W4407108493","cited_by_count":3,"quality_score":52,"matched_keywords":["language model","preference","efficient"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (Sweden)","Huawei Technologies (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5390758514404297},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4329809844493866}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4410492388","title":"Grounding Image Understanding to Oil and Gas Product Manuals: Refining LLaVA through Contextual Instruction Tuning","url":"https://doi.org/10.1109/aixmm62960.2025.00018","published":"2025-02-03","authors":["Hui Wang","Salma Benslimane"],"abstract":"The significance of multimodal models lies in their enhanced capabilities to parse, understand, and reason on complex documents that contain a mixture of text, images, tables, and other components. These models have shown remarkable proficiency in providing coherent and contextually accurate descriptions and analyses, thus significantly advancing document processing tasks. 
However, to maximize their utility in specific industries, it is crucial to adapt these multimodal models to domain-specific tasks, where specialized knowledge and data are required for precise interpretation and application. In this paper, our approach to multimodal document processing bridges the gap between proprietary large-scale models and smaller open-source models. By combining the robust embedding capabilities of proprietary models with the modular and resource-efficient architecture of open-source models, the....","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/aixmm62960.2025.00018","openalex_id":"https://openalex.org/W4410492388","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Menlo School","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C60044698","display_name":"Refining (metallurgy)","score":0.7796623110771179},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.6156133413314819},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.6096800565719604},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.583324134349823},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.41054463386535645},{"id":"https://openalex.org/C199639397","display_name":"Engineering drawing","score":0.40530163049697876},{"id":"https://openalex.org/C21880701","display_name":"Process engineering","score":0.34787583351135254},{"id":"https://openalex.org/C78762247","display_name":"Petroleum engineering","score":0.34018802642822266}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407078746","title":"EARBench: 
Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents","url":"https://doi.org/10.21203/rs.3.rs-5540665/v1","published":"2025-02-03","authors":["Baoyuan Wu","Zihao Zhu","Bingzhe Wu","Zhengyou Zhang","Lei Han","Qingshan Liu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-5540665/v1","openalex_id":"https://openalex.org/W4407078746","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Nanjing University of Posts and Telecommunications","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6511338949203491},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.588169515132904},{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.5636320114135742},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5632073879241943},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5520777702331543},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.5224915146827698},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.4457135498523712},{"id":"https://openalex.org/C32896092","display_name":"Risk management","score":0.4128752648830414}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2502.06807","title":"Competitive Programming with Large Reasoning 
Models","url":"https://huggingface.co/papers/2502.06807","published":"2025-02-03","authors":["OpenAI","Ahmed El-Kishky","Alexander Wei","Andre Saraiva","Borys Minaev","Daniel Selsam","David Dohan","Francis Song","Hunter Lightman","Ignasi Clavera","Jakub Pachocki","Jerry Tworek"],"abstract":"We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks. Additionally, we compare two general-purpose reasoning models - OpenAI o1 and an early checkpoint of o3 - with a domain-specific system, o1-ioi, which uses hand-engineered inference strategies designed for competing in the 2024 International Olympiad in Informatics (IOI). We competed live at IOI 2024 with o1-ioi and, using hand-crafted test-time strategies, placed in the 49th percentile. Under relaxed competition constraints, o1-ioi achieved a gold medal. However, when evaluating later models such as o3, we find that o3 achieves gold without hand-crafted domain-specific strategies or relaxed constraints. 
Our findings show that although specialized pipelines such as o1-ioi yield solid improvements, the scaled-up, general-purpose o3 model surpa...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/modserve-scalable-and-resource-efficient-large-multimodal-model-serving","title":"ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving","url":"https://www.microsoft.com/en-us/research/publication/modserve-scalable-and-resource-efficient-large-multimodal-model-serving/","published":"2025-02-02","authors":["Haoran Qiu","Anish Biswas","Zihan Zhao","Jayashree Mohan","Alind Khare","Esha Choukse","Íñigo Goiri","Zeyu Zhang","Haiying Shen","Chetan Bansal","Ramachandran Ramjee","Rodrigo Fonseca"],"abstract":"Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. However, efficiently serving LMMs in production environments poses significant challenges due to their complex architectures and heterogeneous characteristics across their multi-stage inference pipelines. We present the first comprehensive systems analysis of two prominent LMM architectures, decoder-only and cross-attention, across six representative open-source models, revealing key systems design implications. We also present an in-depth analysis of production LMM inference traces, uncovering unique workload characteristics, including variable, heavy-tailed request distributions and bursty traffic patterns. 
Based on these insights, we propose ModServe, a modular LMM serving system that decouples stages for independent optimization and adaptive scaling. ModServe dyn...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vidit-q-efficient-and-accurate-quantization-of-diffusion-transformers-for-image-and-video-generation","title":"ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation","url":"https://www.microsoft.com/en-us/research/publication/vidit-q-efficient-and-accurate-quantization-of-diffusion-transformers-for-image-and-video-generation/","published":"2025-02-01","authors":["Tianchen Zhao","Tongcheng Fang","Enshu Liu","Wan Rui","Widyadewi Soedarmadji","Shiyao Li","Zinan Lin","Guohao Dai","Shengen Yan","Huazhong Yang","Xuefei Ning"],"abstract":"Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video generation lead to increased computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an effective method for reducing memory costs and computational complexity. 
When quantizing diffusion transformers, we find that applying existing diffusion quantization methods designed for U-Net faces challenges in preserving quality. After analyzing the major challenges for quantizing diffusion transformers, we design an improved quantization scheme, \"ViDiT-Q\" (Video and Image Diffusion Transformer Quantization), to address these issues. Furthermore, we identify highly sensitive layers and timestep...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Diffusion models","quantization","Transformers","1970-01-01","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/distilled-decoding-1-one-step-sampling-of-image-auto-regressive-models-with-flow-matching","title":"Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching","url":"https://www.microsoft.com/en-us/research/publication/distilled-decoding-1-one-step-sampling-of-image-auto-regressive-models-with-flow-matching/","published":"2025-02-01","authors":["Enshu Liu","Xuefei Ning","Yu Wang","Zinan Lin"],"abstract":"Autoregressive (AR) models have achieved state-of-the-art performance in text and image generation but suffer from slow generation due to the token-by-token process. We ask an ambitious question: can a pre-trained AR model be adapted to generate outputs in just one or two steps?
If successful, this would significantly advance the development and deployment of AR models. We notice that existing works that try to speed up AR generation by generating multiple tokens at once fundamentally cannot capture the output distribution due to the conditional dependencies between tokens, limiting their effectiveness for few-step generation. To address this, we propose Distilled Decoding (DD), which uses flow matching to create a deterministic mapping from Gaussian distribution to the output distribution of the pre-trained AR model. We then train a network to distill this mapping, enabling few-step gen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Autoregressive model","Image synthesis","Rectified Flows","Synthetic data","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmdt-decoding-the-trustworthiness-and-safety-of-multimodal-foundation-models","title":"MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/mmdt-decoding-the-trustworthiness-and-safety-of-multimodal-foundation-models/","published":"2025-02-01","authors":["Chejian Xu","Jiawei Zhang","Zhaorun Chen","Chulin Xie","Mintong Kang","Zhuowen Yuan","Zidi Xiong","Chenhui Zhang","Lingzhi Yuan","Yi Zeng","Peiyang Xu","Chengquan Guo"],"abstract":"Multimodal foundation models (MMFMs) play a crucial 
role in various applications, including autonomous driving, healthcare, and virtual assistants. However, several studies have revealed vulnerabilities in these models, such as generating unsafe content by text-to-image models. Existing benchmarks on multimodal models either predominantly assess the helpfulness of these models, or only focus on limited perspectives such as fairness and privacy. In this paper, we present the first unified platform, MMDT (Multimodal DecodingTrust), designed to provide a comprehensive safety and trustworthiness evaluation for MMFMs. Our platform assesses models from multiple perspectives, including safety, hallucination, fairness/bias, privacy, adversarial robustness, and out-of-distribution (OOD) generalization. We have designed various evaluation scenarios and red teaming algorithms under different tasks....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Security, privacy, and cryptography","Benchmarking","Diffusion models","Security and Privacy","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ecoserve-designing-carbon-aware-ai-inference-systems","title":"EcoServe: Designing Carbon-Aware AI Inference Systems","url":"https://www.microsoft.com/en-us/research/publication/ecoserve-designing-carbon-aware-ai-inference-systems/","published":"2025-02-01","authors":["Yueying Li","Zhanqiu Hu","Esha Choukse","Rodrigo Fonseca","G. 
Edward Suh","Udit Gupta"],"abstract":"The rapid increase in LLM ubiquity and scale levies unprecedented demands on computing infrastructure. These demands not only incur large compute and memory resources, but also significant energy, yielding large operational and embodied carbon emissions. In this work, we present two main observations. First, while GPUs dominate operational carbon, host processing systems (e.g., CPUs, memory, storage) dominate embodied carbon. Second, based on traces from production deployment of two Generative AI services in the cloud, offline, batch-inference accounts for a significant portion (up to 55%) of serving capacity. We propose four pillars of carbon-conscious infrastructure design for LLM serving systems: Reduce, Reuse, Rightsize, and Recycle. We demonstrate that EcoServe can lower carbon emissions by up to 47%, compared to performance, energy, and cost-optimized design poi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Systems and networking","1970-01-01","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/qure","title":"QURE: AI-assisted and Automatically Verified UDF Inlining","url":"https://www.microsoft.com/en-us/research/publication/qure/","published":"2025-02-01","authors":["Tarique Siddiqui","Arnd Christian König","Jiashen Cao","Cong Yan","Shuvendu Lahiri (shuvendu)"],"abstract":"User-defined functions (UDFs) extend the
capabilities of SQL by improving code reusability and encapsulating complex logic, but can hinder the performance due to optimization and execution inefficiencies. Prior approaches attempt to address this by rewriting UDFs into native SQL, which is then inlined into the SQL queries that invoke them. However, these approaches are either limited to simple pattern matching or require the synthesis of complex verification conditions from procedural code, a process that is brittle and difficult to automate. This limits coverage and makes the translation approaches less extensible to unseen procedural constructs. In this work, we present QURE, a framework that (1) leverages large language models (LLMs) to translate UDFs to native SQL, and (2) introduces a novel formal verification method to establish equivalence between the UDF and its translation. QURE...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3709716","openalex_id":"https://openalex.org/W4407356002","cited_by_count":2,"quality_score":74,"matched_keywords":["Article (Journal)","Data platforms and analytics","Programming languages and software engineering","1970-01-01","memory"],"author_affiliations":["Microsoft","Atlanta Technical College","Bellevue Hospital Center","Georgia Institute of Technology","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/when-do-llms-help-with-node-classification-a-comprehensive-analysis","title":"When Do LLMs Help With Node Classification? 
A Comprehensive Analysis","url":"https://www.microsoft.com/en-us/research/publication/when-do-llms-help-with-node-classification-a-comprehensive-analysis/","published":"2025-02-01","authors":["Xixi Wu","Yifei Shen","Fangzhou Ge","Caihua Shan","Yizhu Jiao","Xiangguo Sun","Hong Cheng"],"abstract":"Node classification is a fundamental task in graph analysis, with broad applications across various fields. Recent breakthroughs in Large Language Models (LLMs) have enabled LLM-based approaches for this task. Although many studies demonstrate the impressive performance of LLM-based methods, the lack of clear design guidelines may hinder their practical application. In this work, we aim to establish such guidelines through a fair and systematic comparison of these algorithms. As a first step, we developed LLMNodeBed, a comprehensive codebase and testbed for node classification using LLMs. It includes 10 homophilic datasets, 4 heterophilic datasets, 8 LLM-based algorithms, 8 classic baselines, and 3 learning paradigms. 
Subsequently, we conducted extensive experiments, training and evaluating over 2,700 models, to determine the key settings (e.g., learning paradigms and homophily) and comp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:202","title":"Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts","url":"https://www.noahlab.com.hk/en/scientific_research/eve-efficient-multimodal-vision-language-models-with-elastic-visual-experts","published":"2025-02-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: AAAI 2025. 
External paper link: https://ojs.aaai.org/index.php/AAAI/article/view/32718","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Model architecture and optimization","AAAI 2025","2025","efficient"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/art-anonymous-region-transformer-for-variable-multi-layer-transparent-image-generation","title":"ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation","url":"https://www.microsoft.com/en-us/research/publication/art-anonymous-region-transformer-for-variable-multi-layer-transparent-image-generation/","published":"2025-02-01","authors":["Yifan Pu","Yiming Zhao","Zhicong Tang","Ruihong Yin","Haoxing Ye","Yuhui Yuan","Dong Chen","Jianmin Bao","Sirui Zhang","Yanbin Wang","Lin Liang","Lijuan Wang"],"abstract":"Multi-layer image generation is a fundamental task that enables users to isolate, select, and edit specific image layers, thereby revolutionizing interactions with generative models. In this paper, we introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images based on a global text prompt and an anonymous region layout. 
Inspired by Schema theory, which holds that knowledge is organized in frameworks (schemas) that enable people to interpret and learn from new information by linking it to prior knowledge, this anonymous region layout allows the generative model to autonomously determine which set of visual tokens should align with which text tokens, which is in contrast to the previously dominant semantic layout for the image generation task. In addition, the layer-wise region crop mechanism, which only selects the vi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-the-emergence-of-thinking-in-llms-i-searching-for-the-right-intuition","title":"On the Emergence of Thinking in LLMs I: Searching for the Right Intuition","url":"https://www.microsoft.com/en-us/research/publication/on-the-emergence-of-thinking-in-llms-i-searching-for-the-right-intuition/","published":"2025-02-01","authors":["Guanghao Ye","Khiem Duc Pham","Xinzhi Zhang","Sivakanth Gopi","Baolin Peng","Beibin Li","Janardhan (Jana) Kulkarni","Huseyin Inan"],"abstract":"Recent AI advancements, such as OpenAI's new models, are transforming LLMs into LRMs (Large Reasoning Models) that perform reasoning during inference, taking extra time and compute for higher-quality outputs. We aim to uncover the algorithmic framework for training LRMs.
Methods like self-consistency, PRM, and AlphaZero suggest reasoning as guided search. We ask: what is the simplest, most scalable way to enable search in LLMs? We propose a post-training framework called Reinforcement Learning via Self-Play (RLSP). RLSP involves three steps: (1) supervised fine-tuning with human or synthetic demonstrations of the reasoning process, (2) using an exploration reward signal to encourage diverse and efficient reasoning behaviors, and (3) RL training with an outcome verifier to ensure correctness while preventing reward hacking. Our key innovation is to decouple exploration and correctness sign...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/logic-rl-unleashing-llm-reasoning-with-rule-based-reinforcement-learning","title":"Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/logic-rl-unleashing-llm-reasoning-with-rule-based-reinforcement-learning/","published":"2025-02-01","authors":["Zitian Gao","Qingnan Ren","Haoming Luo","Yuqian Hong","Bryan Dai","Joey Zhou","Kai Qiu","Zhirong Wu","Chong Luo"],"abstract":"Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models.
To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make some key technical contributions that lead to effective and stable RL training: a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence. Our 7B model develops advanced reasoning skills-such as reflection, verification, and summarization-that are absent from the logic corpus. Remarkably, after training on just 5K logic problems, it demonstrates generalization abilities to the challenging math benchmarks AIME and AMC.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Manual","Artificial intelligence","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:181","title":"EoH-S: Evolution of Heuristic Set using LLMs for Automated Heuristic Design","url":"https://www.noahlab.com.hk/en/scientific_research/eoh-s-evolution-of-heuristic-set-using-llms-for-automated-heuristic-design","published":"2025-02-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: AAAI 2025. 
External paper link: https://arxiv.org/pdf/2508.03082","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Industry Intelligence","AAAI 2025","2025"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/differentially-private-synthetic-data-via-apis-3-using-simulators-instead-of-foundation-models","title":"Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/differentially-private-synthetic-data-via-apis-3-using-simulators-instead-of-foundation-models/","published":"2025-02-01","authors":["Zinan Lin","Tadas Baltrusaitis","Sergey Yekhanin"],"abstract":"Differentially private (DP) synthetic data, which closely resembles the original private data while maintaining strong privacy guarantees, has become a key tool for unlocking the value of private data without compromising privacy. Recently, Private Evolution (PE) has emerged as a promising method for generating DP synthetic data. Unlike other training-based approaches, PE only requires access to inference APIs from foundation models, enabling it to harness the power of state-of-the-art models. However, a suitable foundation model for a specific private data domain is not always available. In this paper, we discover that the PE framework is sufficiently general to allow inference APIs beyond foundation models. 
Specifically, we show that simulators -- such as computer graphics-based image synthesis tools -- can also serve as effective APIs within the PE framework. This insight greatly expa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","Security, privacy, and cryptography"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:p1k0rcd7a1wyx6yhlwzz7n0n","title":"Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo","url":"https://machinelearning.apple.com/research/step-by-step-reasoning","published":"2025-02-01","authors":["Shengyu Feng","Xiang Kong","Shuang Ma","Aonan Zhang","Dong Yin","Chong Wang","Ruoming Pang","Yiming Yang"],"abstract":"Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. 
Additionally, training an effective verifier often depends on extensive process...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4407066027","title":"Intelligent seismic workflows: The power of generative AI and language models","url":"https://doi.org/10.1190/tle44020142.1","published":"2025-02-01","authors":["Rayan Kanfar","Abdulmohsen Alali","Thierry-Laurent Tonellot","Hussain Salim","Oleg Ovcharenko"],"abstract":"Abstract Advanced seismic data processing involves specialized methods often implemented through various software, requiring extensive expertise and time from geoscientists to execute geophysical workflows. Recently, large language models (LLMs) have demonstrated the ability to understand natural language, reason about domain-specific topics, and assist users through complex tasks. In this paper, we introduce an LLM-based autonomous agent for seismic data processing, focusing on full-waveform sonic data workflows. The proposed agent is shown to reliably understand user queries, select appropriate tools, and execute geophysical tasks such as bandpass filtering, data clipping, and frequency spectrum analysis. Safeguards and guardrails are incorporated into the agent to ensure operation within defined parameters, maintaining data security and integrity. 
By automating routine processes, the....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1190/tle44020142.1","openalex_id":"https://openalex.org/W4407066027","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","agent"],"author_affiliations":["Nvidia (United States)","Saudi Aramco (Saudi Arabia)","Saudi Aramco (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.699474573135376},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6618822813034058},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6181352138519287},{"id":"https://openalex.org/C163258240","display_name":"Power (physics)","score":0.41040849685668945},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4060376286506653},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3854062855243683},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.10456249117851257},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4407445216","title":"Earnings Call Scripts Generation With Large Language Models Using Few‐Shot Learning Prompt Engineering and Fine‐Tuning Methods","url":"https://doi.org/10.1002/ail2.110","published":"2025-02-01","authors":["Sovik Kumar Nath","Yanyan Zhang","Jia Li"],"abstract":"ABSTRACT Company earnings calls are pivotal events that offer crucial insights into a company's financial well‐being and future outlook. 
Large language models (LLMs) present a promising avenue for automatically generating the initial draft of earnings call scripts, leveraging financial data and past examples. We evaluate two distinct methods: (1) few‐shot learning prompt engineering with a large language model (LLM) and (2) fine‐tuning a large language model on earnings call transcript data. Our findings indicate that both methods can produce coherent scripts encompassing key metrics, updates, and guidance. However, there are inherent trade‐offs in comprehensiveness, potential hallucinations, writing style, ease of use, and cost. We discuss the pros and cons of each method to guide practitioners on effectively harnessing LLMs for earnings call script generation. Notably, we employ a huma...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/ail2.110","openalex_id":"https://openalex.org/W4407445216","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C61423126","display_name":"Scripting language","score":0.8352137207984924},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5997987985610962},{"id":"https://openalex.org/C2781426361","display_name":"Earnings","score":0.5587711930274963},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.398994505405426},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3334110379219055},{"id":"https://openalex.org/C121955636","display_name":"Accounting","score":0.27903085947036743},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.2049013078212738},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.18102839589118958}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4407264142","title":"KG-prompt: Interpretable knowledge graph prompt for pre-trained language models","url":"https://doi.org/10.1016/j.knosys.2025.113118","published":"2025-02-01","authors":["Liyi Chen","Jie Liu","Yutai Duan","Runze Wang"],"abstract":"Knowledge graphs (KGs) can provide rich factual knowledge for language models, enhancing reasoning ability and interpretability. However, existing knowledge injection methods usually ignore the structured information in KGs. Using structured knowledge to enhance pre-trained language models (PLMs) still has a set of challenging issues, including resource consumption of knowledge retraining, heterogeneous information, and knowledge noise. To address these issues, we explore how to flexibly inject structured knowledge into frozen PLMs. Inspired by prompt learning, we propose a novel method, Knowledge Graph Prompt (KG-Prompt), which for the first time encodes the KG as structured prompts to enhance the knowledge expression ability of PLMs. KG-Prompt consists of a compressed subgraph construction module and a KG prompt generation module.
In the compressed subgraph construction module, we c...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.knosys.2025.113118","openalex_id":"https://openalex.org/W4407264142","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Ministry of Education"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5347396731376648},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5175232887268066},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5012433528900146},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4898092448711395},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.4675433039665222},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.1415126621723175}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4407055363","title":"Introduction to this special section: Generative and physics-informed AI","url":"https://doi.org/10.1190/tle44020078.1","published":"2025-02-01","authors":["Oleg Ovcharenko","Haibin Di","Umair bin Waheed","Vladimir Kazei"],"abstract":"Integrating advanced artificial intelligence (AI) into geoscience represents a pivotal moment, redefining how we approach exploration and interpretation of the earth's subsurface. Generative AI methods, such as large language models (LLMs), diffusion models, and physics-informed learning, offer new ways to simulate, invert, and interpret seismic data. 
LLMs are increasingly used in various seismic tasks ranging from interpolation and denoising to direct inversion for subsurface properties. Promising attempts have been made to develop foundational models that treat poststack seismic data like natural images. Prestack causality-aware and spatially aware foundational models have not yet been explored extensively. Diffusion models that draw samples from a learned distribution enhance data sets by generating synthetic subsurface models, filling data gaps, and creating plausible scenarios that....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1190/tle44020078.1","openalex_id":"https://openalex.org/W4407055363","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["King Fahd University of Petroleum and Minerals","Nvidia (United Kingdom)","Nvidia (United States)","Saudi Aramco (United States)","UCB Pharma (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C2780129039","display_name":"Section (typography)","score":0.823013424873352},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6814289093017578},{"id":"https://openalex.org/C2993458768","display_name":"Special section","score":0.5316650867462158},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.32234662771224976},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.24716153740882874},{"id":"https://openalex.org/C61696701","display_name":"Engineering physics","score":0.22785168886184692},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.19899329543113708},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":1}},{"id":"official:2041cee108da1a7d","title":"Claude Sonnet 3.7 System Card","url":"https://www-cdn.anthropic.com/9ff93dfa8f445c932415d335c88852ef47f1201e.pdf","published":"2025-02","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Sonnet 3.7.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Sonnet 3.7"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"official:03cfaef3e414c815","title":"Spatio-Temporal Context Prompting for Zero-Shot Action Detection","url":"https://research.nvidia.com/publication/2025-02_spatio-temporal-context-prompting-zero-shot-action-detection","published":"2025-02","authors":["Wei-Jhe Huang","Min-Hung Chen","Shang-Hong Lai"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/wacv61041.2025.00880","openalex_id":"https://openalex.org/W4409263151","cited_by_count":2,"quality_score":54,"matched_keywords":[],"author_affiliations":["NVIDIA","National Tsing Hua University","Nvidia (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page 
https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=3"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scilama-a-single-cell-representation-learning-framework-to-leverage-prior-knowledge-from-large-language-models","title":"sciLaMA: A Single-Cell Representation Learning Framework to Leverage Prior Knowledge from Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/scilama-a-single-cell-representation-learning-framework-to-leverage-prior-knowledge-from-large-language-models/","published":"2025-01-31","authors":["Hongru Hu","Shuwen Zhang","Yongin Choi","Venkat S. Malladi","G. Quon"],"abstract":"Single-cell RNA sequencing (scRNA-seq) enables high-resolution exploration of cellular diversity and gene regulation, yet analyzing such data remains challenging due to technical and methodological limitations. Existing task-specific deep generative models like Variational Auto-Encoder (VAE) and its variants struggle to incorporate external biological knowledge, while transformer-based foundational large Language Models (LLMs or large LaMs) face limitations in computational cost and applicability to tabular gene expression data. Here, we introduce sciLaMA (single-cell interpretable Language Model Adapter), a novel representation learning framework that bridges these gaps by integrating static gene embeddings from multimodal LaMs with scRNA-seq tabular data through a paired-VAE architecture. 
Our approach generates context-aware representations for both cells and genes and outperforms stat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1101/2025.01.28.635153","openalex_id":"https://openalex.org/W4407100080","cited_by_count":2,"quality_score":82,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Biology","Representation learning","1970-01-01","language model","efficient"],"author_affiliations":["Microsoft","Mayo Clinic in Florida","Microsoft (United States)","Microsoft Research (United Kingdom)","University of California, Davis","WinnMed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/position-evaluating-generative-ai-systems-is-a-social-science-measurement-challenge","title":"Position: Evaluating Generative AI Systems is a Social Science Measurement Challenge","url":"https://www.microsoft.com/en-us/research/publication/position-evaluating-generative-ai-systems-is-a-social-science-measurement-challenge/","published":"2025-01-31","authors":["Hanna Wallach","Meera Desai","A. Feder Cooper","Angelina Wang","Chad Atalla","Solon Barocas","Su Lin Blodgett","Alex Chouldechova","Emily Corvi","Alex Dow","Jean Garcia-Gathright","Alexandra Olteanu"],"abstract":"The measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult, leading to what has been described as\"a tangle of sloppy tests [and] apples-to-oranges comparisons\" (Roose, 2024). 
In this position paper, we argue that the ML community would benefit from learning from and drawing on the social sciences when developing and using measurement instruments for evaluating GenAI systems. Specifically, our position is that evaluating GenAI systems is a social science measurement challenge. We present a four-level framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, behaviors, and impacts of GenAI. This framework has two important implications for designing and evaluating evaluations: First, it can broaden the expertise involved in evaluating GenAI systems by enabling stakeholders with different....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Social sciences","Computer science","Generative AI","Social Science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407005368","title":"Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging","url":"https://doi.org/10.2352/ei.2025.37.14.coimg-132","published":"2025-01-31","authors":["Ruining Deng","Can Cui","Quan Liu","Tianyuan Yao","Lucas W. Remedios","Shunxing Bao","Bennett A. Landman","Lee Wheless","Lori A. Coburn","Keith T. Wilson","Yaohong Wang","Shilin Zhao"],"abstract":"The segment anything model (SAM) was released as a foundation model for image segmentation. 
The promptable segmentation model was trained by over 1 billion masks on 11M licensed and privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). It makes the SAM attractive for medical image analysis, especially for digital pathology where the training data are rare. In this study, we evaluate the zero-shot segmentation performance of SAM model on representative segmentation tasks on whole slide imaging (WSI), including (1) tumor segmentation, (2) non-tumor tissue segmentation, (3) cell nuclei segmentation. Core Results: The results suggest that the zero-shot SAM model achieves remarkable segmentation performance for large connected objects. However, it does not consistently achieve satisfying performance for dense...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2352/ei.2025.37.14.coimg-132","openalex_id":"https://openalex.org/W4407005368","cited_by_count":42,"quality_score":67,"matched_keywords":[],"author_affiliations":["Nvidia (United Kingdom)","Nvidia (United States)","The University of Texas MD Anderson Cancer Center","Vanderbilt University","Vanderbilt University Medical Center"],"concepts":[{"id":"https://openalex.org/C2777522853","display_name":"Digital pathology","score":0.6779443621635437},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6632170677185059},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6562268137931824},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5296770334243774},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.5035654902458191},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.45727163553237915},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4363846778869629},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.37639230489730835}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":42}},{"id":"official:80d6e8a0b1c9fd7c","title":"OpenAI o3-mini System Card","url":"https://openai.com/index/o3-mini-system-card","published":"2025-01-31","authors":["OpenAI"],"abstract":"This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness Framework evaluations.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Research"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"openalex:W4407097559","title":"Transforming Healthcare: The Convergence of Generative AI and Cloud Technologies","url":"https://doi.org/10.32628/cseit251112127","published":"2025-01-31","authors":["Ripunjaya Pattnaik"],"abstract":"The convergence of generative AI and cloud computing is reshaping the healthcare landscape, presenting unprecedented opportunities for innovation and advancement across multiple domains. This comprehensive article explores the transformative potential of these technologies, examining their applications in personalized medicine, clinical decision support, medical imaging, and healthcare operations. 
The article delves into critical research areas, including privacy-preserving AI frameworks, edge computing integration, and interoperability solutions that enable seamless data exchange across healthcare systems. Special attention is given to emerging capabilities in emergency response, population health management, and patient engagement, alongside cost optimization and quality improvement considerations. Through analysis of current implementations and future directions, this article identifi...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.32628/cseit251112127","openalex_id":"https://openalex.org/W4407097559","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.8033254146575928},{"id":"https://openalex.org/C2777303404","display_name":"Convergence (economics)","score":0.681696355342865},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6516523361206055},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.47381722927093506},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45846566557884216},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.30103203654289246},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.22397205233573914},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.1314135193824768}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/execoder-empowering-large-language-models-with-executability-representation-for-code-translation","title":"ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation","url":"https://www.microsoft.com/en-us/research/publication/execoder-empowering-large-language-models-with-executability-representation-for-code-translation/","published":"2025-01-30","authors":["Minghua He","Fangkai Yang","Pu Zhao","Wenjie Yin","Yu Kang","Qingwei Lin 林庆维","Saravan Rajmohan","Dongmei Zhang","Qi Zhang"],"abstract":"Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only learn the contextual semantics of code during pre-training, neglecting executability information closely related to the execution state of the code, which results in unguaranteed code executability and unreliable automated code translation. To address this issue, we propose ExeCoder, an LLM specifically designed for code translation, aimed at utilizing executability representations such as functional semantics, syntax structures, and variable dependencies to enhance the capabilities of LLMs in code translation. 
To evaluate the effectiveness of ExeCoder, we manually enhanced the widely used benchmark TransCoder-test, resulting in a benchmark called TransCo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Computer science","Programming language","software engineering","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:lg93uuiof0ywayqef2pcx5bv","title":"Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization","url":"https://machinelearning.apple.com/research/mitigating-hallucinated-translations","published":"2025-01-30","authors":["Zilu Tang","Rajen Chatterjee","Sarthak Garg"],"abstract":"Machine Translation (MT) is undergoing a paradigm shift, with systems based on fine-tuned large language models (LLM) becoming increasingly competitive with traditional encoder-decoder models trained specifically for translation tasks. However, LLM-based systems are at a higher risk of generating hallucinations, which can severely undermine user's trust and safety. 
Most prior research on hallucination mitigation focuses on traditional MT models,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"hf-org-paper:deepseek-ai:2501.17811","title":"Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling","url":"https://huggingface.co/papers/2501.17811","published":"2025-01-29","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W4406949573","title":"Architecture 2.0: Foundations of Artificial Intelligence Agents for Modern Computer System Design","url":"https://doi.org/10.1109/mc.2024.3521641","published":"2025-01-29","authors":["Vijay Janapa Reddi","A. Yazdanbakhsh"],"abstract":"AI agents could herald a new golden age of modern computer system design, enabling the creation of complex systems with minimal human input. 
Achieving this vision, however, demands curating datasets, defining benchmarks, and fostering interpretability alongside balancing AI autonomy with human expertise.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mc.2024.3521641","openalex_id":"https://openalex.org/W4406949573","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Google (United States)","Harvard University Press"],"concepts":[{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.5616146922111511},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5573540925979614},{"id":"https://openalex.org/C207453521","display_name":"Artificial intelligence, situated approach","score":0.5001537799835205},{"id":"https://openalex.org/C30112582","display_name":"Artificial Intelligence System","score":0.4846973419189453},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46880418062210083},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.3682783246040344},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.35298919677734375},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.33033430576324463}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4406903235","title":"Contrastive Modality-Disentangled Learning for Multimodal Recommendation","url":"https://doi.org/10.1145/3715876","published":"2025-01-28","authors":["Xixun Lin","Rui Liu","Yanan Cao","Lixin Zou","Qian Li","Yongxuan Wu","Yang Liu","Dawei Yin","Guandong Xu"],"abstract":"Multimodal recommendation, which utilizes rich multimodal 
information to learn user preferences, has attracted significant attention. Most works focus on designing powerful encoders for extracting multimodal features, and simply aggregate the learned features together to make prediction. Consequently, they have a limited capacity to learn the inter-modality knowledge including the modality-shared and modality-unique knowledge. In fact, learning the modality-shared knowledge enables us to align cross-modality data for fusing heterogeneous modality features. Learning the modality-unique knowledge is equally important when recommendation tasks only involve a small amount of shared features and the necessary information is contained within specific modality. In this article, we propose Contrastive Modality-Disentangled Learning (CMDL) to overcome this critical limitation. CMDL exactly captur...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3715876","openalex_id":"https://openalex.org/W4406903235","cited_by_count":38,"quality_score":67,"matched_keywords":[],"author_affiliations":["Academy of Mathematics and Systems Science","Baidu (China)","Chinese Academy of Sciences","Curtin University","Education University of Hong Kong","Institute of Information Engineering","University of Chinese Academy of Sciences","University of Technology Sydney","Wuhan University"],"concepts":[{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.8229475617408752},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49872565269470215},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3977542817592621},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.376995325088501}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":38}},{"id":"official:050c1d6726fd9209","title":"Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model","url":"https://qwenlm.github.io/blog/qwen2.5-max/","published":"2025-01-28","authors":["Alibaba/Qwen"],"abstract":"It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence. However, the research and industry community has limited experience in effectively scaling extremely large models, whether they are dense or Mixture-of-Expert (MoE) models. Many critical details regarding this scaling process were only disclosed with the recent release of DeepSeek V3. Concurrently, we are developing Qwen2.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4406902947","title":"Beyond masking: Demystifying token-based pre-training for vision transformers","url":"https://doi.org/10.1016/j.patcog.2025.111386","published":"2025-01-28","authors":["Yunjie Tian","Lingxi Xie","Jiemin Fang","Jianbin Jiao","Qi
Tian"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2025.111386","openalex_id":"https://openalex.org/W4406902947","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Huazhong University of Science and Technology","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.7566639184951782},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6115627288818359},{"id":"https://openalex.org/C2777402240","display_name":"Masking (illustration)","score":0.4628116190433502},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45278921723365784},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.44269320368766785},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4388169050216675},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.43079593777656555},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3904908299446106}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/optimizing-large-language-model-training-using-fp4-quantization","title":"Optimizing Large Language Model Training Using FP4 Quantization","url":"https://www.microsoft.com/en-us/research/publication/optimizing-large-language-model-training-using-fp4-quantization/","published":"2025-01-27","authors":["Ruizhe Wang","Yeyun Gong","Xiao Liu","Guoshuai Zhao","Ziyue Yang","Baining Guo","Zhengjun Zha","Peng 
Cheng"],"abstract":"The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge due to significant quantization errors and limited representational capacity. This work introduces the first FP4 training framework for LLMs, addressing these challenges with two key innovations: a differentiable quantization estimator for precise weight updates and an outlier clamping and compensation strategy to prevent activation collapse. To ensure stability, the framework integrates a mixed-precision training scheme and vector-wise quantization. Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8, with minimal degradation, scaling...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","language model","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/skeleton-guided-translation-a-benchmarking-framework-for-code-repository-translation-with-fine-grained-quality-evaluation","title":"Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality 
Evaluation","url":"https://www.microsoft.com/en-us/research/publication/skeleton-guided-translation-a-benchmarking-framework-for-code-repository-translation-with-fine-grained-quality-evaluation/","published":"2025-01-27","authors":["Xing Zhang","Jiaheng Wen","Fangkai Yang","Pu Zhao","Yu Kang","Junhao Wang","Maoquan Wang","Yufan Huang","Elsie Nallipogu","Qingwei Lin 林庆维","Yingnong Dang","Saravan Rajmohan"],"abstract":"The advancement of large language models has intensified the need to modernize enterprise applications and migrate legacy systems to secure, versatile languages. However, existing code translation benchmarks primarily focus on individual functions, overlooking the complexities involved in translating entire repositories, such as maintaining inter-module coherence and managing dependencies. While some recent repository-level translation benchmarks attempt to address these challenges, they still face limitations, including poor maintainability and overly coarse evaluation granularity, which make them less developer-friendly. We introduce Skeleton-Guided-Translation, a framework for repository-level Java to C# code translation with fine-grained quality evaluation. 
It uses a two-step process: first translating the repository's structural\"skeletons\", then translating the full repository guide...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Programming language","software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406867827","title":"Distinguishing LLM-Generated from Human-Written Code by Contrastive Learning","url":"https://doi.org/10.1145/3705300","published":"2025-01-27","authors":["Xiaodan Xu","Chao Ni","Xinrong Guo","Shaoxuan Liu","Xiaoya Wang","Kui Liu","Xiaohu Yang"],"abstract":"Large language models (LLMs), such as ChatGPT released by OpenAI, have attracted significant attention from both industry and academia due to their demonstrated ability to generate high-quality content for various tasks. Despite the impressive capabilities of LLMs, there are growing concerns regarding their potential risks in various fields, such as news, education, and software engineering. Recently, several commercial and open source LLM-generated content detectors have been proposed, which, however, are primarily designed for detecting natural language content without considering the specific characteristics of program code. This article aims to fill this gap by proposing a novel ChatGPT-generated code detector, CodeGPTSensor, based on a contrastive learning framework and a semantic encoder built with UniXcoder. 
To assess the effectiveness of CodeGPTSensor on differentiating ChatGPT-g...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3705300","openalex_id":"https://openalex.org/W4406867827","cited_by_count":11,"quality_score":56,"matched_keywords":["LLM","news"],"author_affiliations":["Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8310010433197021},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4517310857772827},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4227994978427887},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3360820710659027},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3331649899482727},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"official:08e8376e586f5b41","title":"Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens","url":"https://qwenlm.github.io/blog/qwen2.5-1m/","published":"2025-01-27","authors":["Alibaba/Qwen"],"abstract":"Two months after upgrading Qwen2.5-Turbo to support context length up to one million tokens, we are back with the open-source Qwen2.5-1M models and the corresponding inference framework support. 
Here’s what you can expect from this release:Opensource Models: We’re releasing two new checkpoints, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, marking the first time we’ve upgraded our opensource Qwen models to handle 1M-token contexts.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4406858192","title":"Aligning, Autoencoding and Prompting Large Language Models for Novel Disease Reporting","url":"https://doi.org/10.1109/tpami.2025.3534586","published":"2025-01-27","authors":["Fenglin Liu","Xian Wu","Jinfa Huang","Bang Yang","Kim Branson","Patrick Schwab","Lei Clifton","Ping Zhang","Jiebo Luo","Yefeng Zheng","David A. Clifton"],"abstract":"Given radiology images, automatic radiology report generation aims to produce informative text that reports diseases. It can benefit current clinical practice in diagnostic radiology. Existing methods typically rely on large-scale medical datasets annotated by clinicians to train desirable models. However, for novel diseases, sufficient training data are typically not available. We propose a prompt-based deep learning framework, i.e., PromptLLM, to align, autoencode, and prompt the (large) language model to generate reports for novel diseases accurately and efficiently. 
Our method includes three major steps: 1) aligning visual images and textual reports to learn general knowledge across modalities from diseases where labeled data are sufficient, 2) autoencoding the LLM using unlabeled data of novel diseases to learn the specific knowledge and writing styles of the novel disease, and 3) p...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3534586","openalex_id":"https://openalex.org/W4406858192","cited_by_count":7,"quality_score":52,"matched_keywords":["LLM","language model"],"author_affiliations":["GlaxoSmithKline (Netherlands)","GlaxoSmithKline (United Kingdom)","Peking University","Tencent (China)","The Ohio State University","University of Oxford","University of Rochester"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.719325840473175},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5468021631240845},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49982500076293945},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3529706597328186},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.33670520782470703},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.33223381638526917}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"official:a5e68ef06beae216","title":"Qwen2.5 VL! Qwen2.5 VL! 
Qwen2.5 VL!","url":"https://qwenlm.github.io/blog/qwen2.5-vl/","published":"2025-01-26","authors":["Alibaba/Qwen"],"abstract":"QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORDWe release Qwen2.5-VL, the new flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL. To try the latest model, feel free to visit Qwen Chat and choose Qwen2.5-VL-72B-Instruct. Also, we open both base and instruct models in 3 sizes, including 3B, 7B, and 72B, in both Hugging Face and ModelScope.The key features include:Understand things visually: Qwen2.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llava-rad-mimic-cxr-annotations","title":"LLaVA-Rad MIMIC-CXR Annotations","url":"https://www.microsoft.com/en-us/research/publication/llava-rad-mimic-cxr-annotations/","published":"2025-01-24","authors":["Juan Manuel Zambrano Chaves (juanza)","Shih-Cheng Huang","Yanbo Xu","Hanwen Xu","Naoto Usuyama (naotous)","Sheng Zhang","Fei Wang","Yujia Xie","Mahmoud Khademi","Ziyi Yang","Hany Awadalla","Julia Gong"],"abstract":"LLaVA-Rad MIMIC-CXR features more accurate section extractions from MIMIC-CXR free-text radiology reports. Traditionally, rule-based methods were used to extract sections such as the reason for exam, findings, and impression. However, these approaches often fail due to inconsistencies in report structure and clinical language. 
In this work, we leverage GPT-4 to extract these sections more reliably, adding 237,073 image-text pairs to the training split and 1,952 pairs to the validation split. This enhancement afforded the development and fine-tuning of LLaVA-Rad, a multimodal large language model (LLM) tailored for radiology applications, achieving improved performance on report generation tasks. This resource is provided to support reproducibility and for the benefit of the research community, enabling further exploration in vision–language modeling. For more details, please refer to the...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Medical, health and genomics","Multimodal Large Language Models","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/di-bench-benchmarking-large-language-models-on-dependency-inference-with-testable-repositories-at-scale","title":"DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale","url":"https://www.microsoft.com/en-us/research/publication/di-bench-benchmarking-large-language-models-on-dependency-inference-with-testable-repositories-at-scale/","published":"2025-01-23","authors":["Linghao Zhang","Junhao Wang","Shilin He","Chaoyun Zhang","Yu Kang","Bowen Li","Jiaheng Wen","Chengxing Xie","Maoquan Wang","Yufan Huang","Elsie Nallipogu","Qingwei Lin 林庆维"],"abstract":"Large Language Models have advanced 
automated software development, however, it remains a challenge to correctly infer dependencies, namely, identifying the internal components and external packages required for a repository to successfully run. Existing studies highlight that dependency-related issues cause over 40\\% of observed runtime errors on the generated repository. To address this, we introduce DI-BENCH, a large-scale benchmark and evaluation framework specifically designed to assess LLMs' capability on dependency inference. The benchmark features 581 repositories with testing environments across Python, C#, Rust, and JavaScript. Extensive experiments with textual and execution-based metrics reveal that the current best-performing model achieves only a 42.9% execution pass rate, indicating significant room for improvement. DI-BENCH establishes a new viewpoint for evaluating LLM p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Programming language","software engineering","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:e24d5b88ddeee695","title":"Computer-Using 
Agent","url":"https://openai.com/index/computer-using-agent","published":"2025-01-23","authors":["OpenAI"],"abstract":"","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Research","agent"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/integrative-decoding-improve-factuality-via-implicit-self-consistency","title":"Integrative Decoding: Improve Factuality via Implicit Self-consistency","url":"https://www.microsoft.com/en-us/research/publication/integrative-decoding-improve-factuality-via-implicit-self-consistency/","published":"2025-01-22","authors":["Yi Cheng","Xiao Liang","Yeyun Gong","Wen Xiao","Song Wang","Yuji Zhang","Wenjun Hou","Kaishuai Xu","Wenge Liu","Wenjie Li","Jian Jiao","Qi Chen"],"abstract":"Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processes them concurrently, with the next token being selected by aggregating of all their corresponding predictions at each decoding step. In essence, this simple approach implicitly incorporates self-consistency in the decoding objective. 
Extensive evaluation shows that ID consistently enhances factual...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406803762","title":"Beyond the Hype: A Comprehensive Review of Current Trends in Generative AI Research, Teaching Practices, and Tools","url":"https://doi.org/10.1145/3689187.3709614","published":"2025-01-22","authors":["James Prather","Juho Leinonen","Natalie Kiesler","Jamie Gorson Benario","Sam Lau","Stephen MacNeil","Narges Norouzi","Simone Opel","Virginia Pettit","Leo Porter","Brent N. 
Reeves","Jaromír Šavelka"],"abstract":"Author(s): Prather, James; Leinonen, Juho; Kiesler, Natalie; Benario, Jamie Gorson; Lau, Sam; MacNeil, Stephen; Norouzi, Narges; Opel, Simone; Pettit, Vee; Porter, Leo; Reeves, Brent N; Savelka, Jaromir; Smith, David H; Strickroth, Sven; Zingaro, Daniel","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1145/3689187.3709614","openalex_id":"https://openalex.org/W4406803762","cited_by_count":58,"quality_score":67,"matched_keywords":[],"author_affiliations":["Aalto University","Abilene Christian University","Carnegie Mellon University","FernUniversität in Hagen","Georg Simon Ohm University of Applied Sciences Nuremberg","Google (United States)","LMU Klinikum","Ludwig-Maximilians-Universität München","Temple University","University of California San Diego","University of California, Berkeley","University of Illinois System","University of Toronto","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5849448442459106},{"id":"https://openalex.org/C148043351","display_name":"Current (fluid)","score":0.5548681020736694},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5483999848365784},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.49038028717041016},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.25771069526672363},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.20619043707847595},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.082143634557724}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":58}},{"id":"hf-org-paper:deepseek-ai:2501.12948","title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","url":"https://huggingface.co/papers/2501.12948","published":"2025-01-22","authors":["DeepSeek"],"abstract":"We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"apple:ilh2iszq5512udfw7hboqbx9","title":"Mapping Cells Through Time and Space With Moscot","url":"https://machinelearning.apple.com/research/mapping-cells-through-time","published":"2025-01-22","authors":["Dominik Klein","Giovanni Palla","Marius Lange§","Michal Klein","Zoe 
Piran¶","Manuel Gander","Laetitia Meng-Papaxanthos","Michael Sterr","Aimée Bastidas-Ponce","Marta Tarquis-Medina","Heiko Lickert","Mostafa Bakhti"],"abstract":"Single-cell genomics technologies enable multimodal profiling of millions of cells across temporal and spatial dimensions. Experimental limitations prevent the measurement of all-encompassing cellular states in their native temporal dynamics or spatial tissue niche. Optimal transport theory has emerged as a powerful tool to overcome such constraints, enabling the recovery of the original cellular context. However, most algorithmic implementations...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-the-llm-ification-of-chi-unpacking-the-impact-of-llms-at-chi-through-a-systematic-literature-review","title":"Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review","url":"https://www.microsoft.com/en-us/research/publication/understanding-the-llm-ification-of-chi-unpacking-the-impact-of-llms-at-chi-through-a-systematic-literature-review/","published":"2025-01-21","authors":["Rock Yuren Pang","Hope Schroeder","Kynnedy Simone Smith","Solon Barocas","Ziang Xiao","Emily Tseng","Danielle Bragg"],"abstract":"Large language models (LLMs) have been positioned to revolutionize HCI, by reshaping not only the interfaces, design patterns, and sociotechnical systems that we study, but also the research practices we use. 
To-date, however, there has been little understanding of LLMs' uptake in HCI. We address this gap via a systematic literature review of 153 CHI papers from 2020-24 that engage with LLMs. We taxonomize: (1) domains where LLMs are applied; (2) roles of LLMs in HCI projects; (3) contribution types; and (4) acknowledged limitations and risks. We find LLM work in 10 diverse domains, primarily via empirical and artifact contributions. Authors use LLMs in five distinct roles, including as research tools or simulated users. Still, authors often raise validity and reproducibility concerns, and overwhelmingly study closed models. We outline opportunities to improve HCI research with and on LL...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/test-time-preference-optimization-on-the-fly-alignment-via-iterative-textual-feedback","title":"Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback","url":"https://www.microsoft.com/en-us/research/publication/test-time-preference-optimization-on-the-fly-alignment-via-iterative-textual-feedback/","published":"2025-01-21","authors":["Yafu Li","Xuyang Hu","Xiaoye Qu","Linjie Li","Yu Cheng"],"abstract":"Large language models (LLMs) demonstrate impressive performance but lack the flexibility to 
adapt to human preferences quickly without retraining. In this work, we introduce Test-time Preference Optimization (TPO), a framework that aligns LLM outputs with human preferences during inference, removing the need to update model parameters. Rather than relying on purely numerical rewards, TPO translates reward signals into textual critiques and uses them as textual rewards to iteratively refine its response. Evaluations on benchmarks covering instruction following, preference alignment, safety, and mathematics reveal that TPO progressively improves alignment with human preferences. Notably, after only a few TPO steps, the initially unaligned Llama-3.1-70B-SFT model can surpass the aligned counterpart, Llama-3.1-70B-Instruct. Furthermore, TPO scales efficiently with both the search width and d...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:242","title":"Video Depth Anything: Consistent Depth Estimation for Super-Long Videos","url":"https://seed.bytedance.com/en/research/video-depth-anything-consistent-depth-estimation-for-super-long-videos","published":"2025-01-21","authors":["Sili Chen","Hengkai Guo","Shengnan Zhu","Feihu Zhang","Zilong Huang","Jiashi Feng","Bingyi Kang"],"abstract":"Depth Anything has achieved remarkable success in monocular depth estimation with strong generalization ability. 
However, it suffers from temporal inconsistency in videos, hindering its practical applications. Various methods have been proposed to alleviate this issue by leveraging video generation models or introducing priors from optical flow and camera poses. Nonetheless, these methods are only applicable to short videos (< 10 seconds) and require a trade-off between quality and computational efficiency. We propose Video Depth Anything for high-quality, consistent depth estimation in super-long videos (over several minutes) without sacrificing efficiency. We base our model on Depth Anything V2 and replace its head with an efficient spatial-temporal head. We design a straightforward yet effective temporal consistency loss by constraining the temporal depth gradient, eliminating the nee...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","CVPR 2025","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:180","title":"UI-TARS: Pioneering Automated GUI Interaction with Native Agents","url":"https://seed.bytedance.com/en/research/ui-tars-pioneering-automated-gui-interaction-with-native-agents","published":"2025-01-21","authors":["Yujia Qin","Yining Ye","Junjie Fang","Haoming Wang","Shihao Liang","Shizuo Tian","Junda Zhang","Jiahao Li","Yunxin Li","Shijue Huang","Wanjun Zhong","Kuanye Li"],"abstract":"This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). 
Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA performance in 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution. Notably, in the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9 respectively). In AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o (34.5). UI-TARS incorporates several key innovations: (1) Enhanced Perception: leveraging a large-scale dataset of....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Multimodal","arXiv","agent"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:moonshotai:2501.12599","title":"Kimi k1.5: Scaling Reinforcement Learning with LLMs","url":"https://huggingface.co/papers/2501.12599","published":"2025-01-21","authors":["Moonshot/Kimi"],"abstract":"Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. 
In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simplistic, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value funct...","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","moonshotai","LLM","language model"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"official:aabe5815dc9145cb","title":"Global-batch load balance almost free lunch to improve your MoE LLM training","url":"https://qwenlm.github.io/blog/global-load-balance/","published":"2025-01-21","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DISCORDBackground The Mixture-of-Experts (MoEs) architecture has become a popular model-parameter-scale-up technique. Typically, one MoE layer consists of a router (often parameterized as one single Linear layer) and a group of experts (for transformer-based models, each expert is one feedforward layer). 
Given an input, only a subset of experts will be activated, and then their outputs will be aggregated based on the scores the router assigned.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"official:210f0878d74225e6","title":"Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation","url":"https://huggingface.co/papers/2501.12202","published":"2025-01-21","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4406657361","title":"UniAdapter: All-in-One Control for Flexible Video Generation","url":"https://doi.org/10.1109/tcsvt.2025.3532495","published":"2025-01-21","authors":["Cong Wang","Panwen Hu","Haoyu Zhao","Yuanfan Guo","Jiaxi Gu","Xiao Dong","Jianhua Han","Hang Xu","Xiaodan Liang"],"abstract":"Condition-based video generation aims to create video content based on given information that describes specific subjects. However, most existing works can only utilize a single condition to guide the denoising process, thereby limiting their applicability to specific scenarios. 
Although some works attempt to accommodate multiple conditions within one framework, they often require multiple encoders, leading to inefficiencies in integrating multi-condition features. In this work, we present a framework that, with the support of the proposed Unified Adapter (UniAdapter), enables simultaneous multi-condition control of video generation within a single model. To effectively merge these conditions, we propose a novel Probabilistic Multi-condition Concatenator (PMC) module, which employs a unified encoder to accommodate multiple conditions and concatenate condition features at the pixel level....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2025.3532495","openalex_id":"https://openalex.org/W4406657361","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Fudan University","Huawei Technologies (China)","Shenzhen University","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6493829488754272},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.42699089646339417},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3347899317741394},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.31114429235458374}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/syndl-a-large-scale-synthetic-test-collection-for-passage-retrieval","title":"SynDL: A Large-Scale Synthetic Test Collection for Passage 
Retrieval","url":"https://www.microsoft.com/en-us/research/publication/syndl-a-large-scale-synthetic-test-collection-for-passage-retrieval/","published":"2025-01-20","authors":["Hossein A. Rahmani","Xi Wang","Emine Yilmaz","Nick Craswell","Bhaskar Mitra","Paul Thomas"],"abstract":"Large-scale test collections play a crucial role in Information Retrieval (IR) research. However, according to the Cranfield paradigm and the research into publicly available datasets, the existing information retrieval research studies are commonly developed on small-scale datasets that rely on human assessors for relevance judgments - a time-intensive and expensive process. Recent studies have shown the strong capability of Large Language Models (LLMs) in producing reliable relevance judgments with human accuracy but at a greatly reduced cost. In this paper, to address the missing large-scale ad-hoc document retrieval dataset, we extend the TREC Deep Learning Track (DL) test collection via additional language model synthetic labels to enable researchers to test and evaluate their search systems at a large scale. 
Specifically, such a test collection includes more than 1,900 test queries...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3701716.3715311","openalex_id":"https://openalex.org/W4411549495","cited_by_count":2,"quality_score":90,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","automatic evaluation","Information retrieval","large language models","Synthetic data","language model","retrieval"],"author_affiliations":["Microsoft","Amazon (United Kingdom)","Microsoft (Canada)","Microsoft (United States)","Seattle University","The Alan Turing Institute","University College London","University of Sheffield"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/judgeblender-ensembling-judgments-for-automatic-relevance-assessment","title":"JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment","url":"https://www.microsoft.com/en-us/research/publication/judgeblender-ensembling-judgments-for-automatic-relevance-assessment/","published":"2025-01-20","authors":["Hossein A. Rahmani","Emine Yilmaz","Nick Craswell","Bhaskar Mitra"],"abstract":"The effective training and evaluation of retrieval systems require a substantial amount of relevance judgments, which are traditionally collected from human assessors -- a process that is both costly and time-consuming. Large Language Models (LLMs) have shown promise in generating relevance labels for search tasks, offering a potential alternative to manual assessments. 
Current approaches often rely on a single LLM, such as GPT-4, which, despite being effective, are expensive and prone to intra-model biases that can favour systems leveraging similar models. In this work, we introduce JudgeBlender, a framework that employs smaller, open-source models to provide relevance judgments by combining evaluations across multiple LLMs (LLMBlender) or multiple prompts (PromptBlender). By leveraging the LLMJudge benchmark [18], we compare JudgeBlender with state-of-the-art methods and the top perfor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","automatic evaluation","Information retrieval","large language models","Synthetic data","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406594717","title":"Large Language Models Enable Textual Interpretation of Image-Based Astronomical Transient Classifications","url":"https://doi.org/10.21203/rs.3.rs-5723428/v1","published":"2025-01-19","authors":["Fiorenzo Stoppa","Turan Bulmus","S. Bloemen","S. J. Smartt","P. Groot","P. M. Vreeswijk","K. 
Smith"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-5723428/v1","openalex_id":"https://openalex.org/W4406594717","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Google (United States)","Radboud University Nijmegen","University of Oxford"],"concepts":[{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.7999916076660156},{"id":"https://openalex.org/C2780799671","display_name":"Transient (computer programming)","score":0.7021646499633789},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5721758604049683},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.501978874206543},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4916391968727112},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4364733099937439},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.15630826354026794}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"apple:rzu4b0k212kh5g42tf4kyuye","title":"Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition","url":"https://machinelearning.apple.com/research/delayed-fusion-integrating-large","published":"2025-01-18","authors":["Takaaki Hori","Martin Kocour","Adnan Haider","Erik McDermott","Xiaodan Zhuang"],"abstract":"This paper presents an efficient decoding approach for end-to-end automatic speech recognition (E2E-ASR) with large language models (LLMs). 
Although shallow fusion is the most common approach to incorporate language models into E2E-ASR decoding, we face two practical problems with LLMs. (1) LLM inference is computationally costly. (2) There may be a vocabulary mismatch between the ASR model and the LLM. To resolve this mismatch, we need to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:nans9dnxfdvyaorubpwabm4q","title":"On the Modeling Capabilities of Large Language Models for Sequential Decision Making","url":"https://machinelearning.apple.com/research/modeling-capabilities-of-language","published":"2025-01-18","authors":["Martin Klissarov","Devon Hjelm","Alexander Toshev","Bogdan Mazoure"],"abstract":"Large pretrained models are showing increasingly better performance in reasoning and planning tasks across different modalities, opening the possibility to leverage them for complex sequential decision making problems. In this paper, we investigate the capabilities of Large Language Models (LLMs) for reinforcement learning (RL) across a diversity of interactive domains. 
We evaluate their ability to produce decision-making policies, either...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4406518786","title":"Hypnos: A domain-specific large language model for anesthesiology","url":"https://doi.org/10.1016/j.neucom.2025.129389","published":"2025-01-18","authors":["Zhonghai Wang","Jie Jiang","Yibing Zhan","Bohao Zhou","Yanhong Li","Chong Zhang","Baosheng Yu","Liang Ding","Hua Jin","Jun Peng","Lin Xu","Ang Li"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neucom.2025.129389","openalex_id":"https://openalex.org/W4406518786","cited_by_count":11,"quality_score":52,"matched_keywords":["language model"],"author_affiliations":["China University of Petroleum, East China","First People's Hospital of Yunnan Province","Jingdong (China)","Tencent (China)","The University of Sydney","Vision Technology (United States)","Yunnan University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2779526319","display_name":"Anesthesiology","score":0.7823523283004761},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6051411032676697},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5338159203529358},{"id":"https://openalex.org/C135257023","display_name":"Domain-specific 
language","score":0.4691813290119171},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.42654740810394287},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3801940679550171},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3672182261943817},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.25266918540000916}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"apple:w3flk76qup5cr19l4jopx9h3","title":"DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models","url":"https://machinelearning.apple.com/research/dsplats-3d-generation","published":"2025-01-18","authors":["Kevin Miao","Harsh Agrawal","Qihang Zhang","Federico Semeraro","Marco Cavallo","Jiatao Gu","Alexander Toshev"],"abstract":"Generating high-quality 3D content requires models capable of learning robust distributions of complex scenes and the real-world objects within them. Recent Gaussian-based 3D reconstruction techniques have achieved impressive results in recovering high-fidelity 3D assets from sparse input images by predicting 3D Gaussians in a feed-forward manner. 
However, these techniques often lack the extensive priors and expressiveness offered by Diffusion...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-generative-model-for-inorganic-materials-design","title":"A generative model for inorganic materials design","url":"https://www.microsoft.com/en-us/research/publication/a-generative-model-for-inorganic-materials-design/","published":"2025-01-16","authors":["Claudio Zeni","Robert Pinsler","Daniel Zügner","Andrew Fowler","Matthew Horton","Xiang Fu","Zilong Wang","Aliaksandra Shysheya","Jonathan Crabbé","Shoko Ueda","Roberto Sordillo","Lixin Sun"],"abstract":"The design of functional materials with desired properties is essential in driving technological advances in areas like energy storage, catalysis, and carbon capture1–3. Generative models provide a new paradigm for materials design by directly generating novel materials given desired property constraints, but current methods have low success rate in proposing stable crystals or can only satisfy a limited set of property constraints 4−11. Here, we present MatterGen, a model that generates stable, diverse inorganic materials across the periodic table and can further be fine-tuned to steer the generation towards a broad range of property constraints. Compared to prior generative models 4,12, structures produced by MatterGen are more than twice as likely to be novel and stable, and more than 10 times closer to the local energy minimum. 
After fine-tuning, MatterGen successfully generates stab...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1038/s41586-025-08628-5","openalex_id":"https://openalex.org/W4406472463","cited_by_count":317,"quality_score":94,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics"],"author_affiliations":["Microsoft","Chinese Academy of Sciences","Microsoft (Germany)","Microsoft (Netherlands)","Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research Asia (China)","Shenzhen Institutes of Advanced Technology"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gaussmark-a-practical-approach-for-structural-watermarking-of-language-models","title":"GaussMark: A Practical Approach for Structural Watermarking of Language Models","url":"https://www.microsoft.com/en-us/research/publication/gaussmark-a-practical-approach-for-structural-watermarking-of-language-models/","published":"2025-01-16","authors":["Adam Block","Ayush Sekhari","Alexander Rakhlin"],"abstract":"Recent advances in Large Language Models (LLMs) have led to significant improvements in natural language processing tasks, but their ability to generate human-quality text raises significant ethical and operational concerns in settings where it is important to recognize whether or not a given text was generated by a human. 
Thus, recent work has focused on developing techniques for watermarking LLM-generated text, i.e., introducing an almost imperceptible signal that allows a provider equipped with a secret key to determine if given text was generated by their model. Current watermarking techniques are often not practical due to concerns with generation latency, detection time, degradation in text quality, or robustness. Many of these drawbacks come from the focus on token-level watermarking, which ignores the inherent structure of text. In this work, we introduce a new scheme, GaussMark,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mitigating-hallucinations-in-large-vision-language-models-via-dpo-on-policy-data-hold-the-key","title":"Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key","url":"https://www.microsoft.com/en-us/research/publication/mitigating-hallucinations-in-large-vision-language-models-via-dpo-on-policy-data-hold-the-key/","published":"2025-01-16","authors":["Zhihe Yang","Xufang Luo","Dongqi Han","Yunjian Xu","Dongsheng Li"],"abstract":"Hallucination remains a major challenge for Large Vision-Language Models (LVLMs). 
Direct Preference Optimization (DPO) has gained increasing attention as a simple solution to hallucination issues. It directly learns from constructed preference pairs that reflect the severity of hallucinations in responses to the same prompt and image. Nonetheless, different data construction methods in existing works bring notable performance variations. We identify a crucial factor here: outcomes are largely contingent on whether the constructed data aligns on-policy w.r.t the initial (reference) policy of DPO. Theoretical analysis suggests that learning from off-policy data is impeded by the presence of KL-divergence between the updated policy and the reference policy. From the perspective of dataset distribution, we systematically summarize the inherent flaws in existing algorithms that employ DPO to....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:183","title":"VideoWorld: Exploring Knowledge Learning from Unlabeled Videos","url":"https://seed.bytedance.com/en/research/videoworld-exploring-knowledge-learning-from-unlabeled-videos","published":"2025-01-16","authors":["Zhongwei Ren","Yunchao Wei","Xun Guo","Yao Zhao","Bingyi Kang","Jiashi Feng","Xiaojie Jin"],"abstract":"This work explores whether a deep generative model can learn complex knowledge solely from visual input, in contrast to the prevalent focus on text-based models like large language models (LLMs). 
We develop VideoWorld, an auto-regressive video generation model trained on unlabeled video data, and test its knowledge acquisition abilities in video-based Go and robotic control tasks. Our experiments reveal two key findings: (1) video-only training provides sufficient information for learning knowledge, including rules, reasoning and planning capabilities, and (2) the representation of visual change is crucial for knowledge acquisition. To improve both the efficiency and efficacy of this process, we introduce the Latent Dynamics Model (LDM) as a key component of VideoWorld. Remarkably, VideoWorld reaches a 5-dan professional level in the Video-GoBench with just a 300-million-parameter model,...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","CVPR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4406435421","title":"Summon a demon and bind it: A grounded theory of LLM red teaming","url":"https://doi.org/10.1371/journal.pone.0314658","published":"2025-01-15","authors":["Nanna Inie","Jonathan Stray","Leon Derczynski"],"abstract":"Engaging in the deliberate generation of abnormal outputs from Large Language Models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks, defining LLM red-teaming based on extensive and diverse evidence. 
Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail. We focused on the research questions of defining LLM red teaming, uncovering the motivations and goals for performing the activity, and characterizing the strategies people use when attacking LLMs. Based on the data, LLM red teaming is defined as a limit-seeking, non-malicious, manual activity, which depends highly on a team-effort and an alchemist mindset. It is highly intrinsically motivated by curiosity, fun, and to s...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1371/journal.pone.0314658","openalex_id":"https://openalex.org/W4406435421","cited_by_count":7,"quality_score":48,"matched_keywords":["LLM"],"author_affiliations":["Berkeley College","IT University of Copenhagen","Nvidia (United States)","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C2778491294","display_name":"Mindset","score":0.6731339693069458},{"id":"https://openalex.org/C33435437","display_name":"Curiosity","score":0.6376836895942688},{"id":"https://openalex.org/C156325361","display_name":"Grounded theory","score":0.6118189692497253},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.41371262073516846},{"id":"https://openalex.org/C190248442","display_name":"Qualitative research","score":0.3723699450492859},{"id":"https://openalex.org/C77805123","display_name":"Social psychology","score":0.32656437158584595},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.30685433745384216},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.23093938827514648}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flexiclip-locality-preserving-free-form-character-animation","title":"FlexiClip: Locality-Preserving Free-Form Character Animation","url":"https://www.microsoft.com/en-us/research/publication/flexiclip-locality-preserving-free-form-character-animation/","published":"2025-01-14","authors":["Anant Khandelwal"],"abstract":"Animating clipart images with seamless motion while maintaining visual fidelity and temporal coherence presents significant challenges. Existing methods, such as AniClipart, effectively model spatial deformations but often fail to ensure smooth temporal transitions, resulting in artifacts like abrupt motions and geometric distortions. Similarly, text-to-video (T2V) and image-to-video (I2V) models struggle to handle clipart due to the mismatch in statistical properties between natural video and clipart styles. This paper introduces FlexiClip, a novel approach designed to overcome these limitations by addressing the intertwined challenges of temporal consistency and geometric integrity. 
FlexiClip extends traditional B\\'ezier curve-based trajectory modeling with key innovations: temporal Jacobians to correct motion dynamics incrementally, continuous-time modeling via probability flow ODEs (...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Graphics and multimedia","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:178","title":"Diffusion Adversarial Post-Training for One-Step Video Generation","url":"https://seed.bytedance.com/en/research/diffusion-adversarial-post-training-for-one-step-video-generation","published":"2025-01-14","authors":["Shanchuan Lin","Xin Xia","Yuxi Ren","Ceyuan Yang","Xuefeng Xiao","Lu Jiang"],"abstract":"The diffusion models are widely used for image and video generation, but their iterative generation process is slow and expansive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve the training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. 
Empirically, our experiments show that our adversarial post-trained model, Seaweed-APT, can generate 2-second, 1280x720, 24fps videos in real time using a single forward evaluation step. Additionally, our model is capable of generating 1024px images in a single step, ac...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision and Pattern Recognition","Vision","ICML 2025","distillation"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flavars-a-multimodal-foundational-language-and-vision-alignment-model-for-remote-sensing","title":"FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing","url":"https://www.microsoft.com/en-us/research/publication/flavars-a-multimodal-foundational-language-and-vision-alignment-model-for-remote-sensing/","published":"2025-01-14","authors":["Isaac Corley","Simone Fobi Nsutezo","Anthony Ortiz","Caleb Robinson","Rahul Dodhia","Juan M. Lavista Ferres","Peyman Najafirad"],"abstract":"Remote sensing imagery is dense with objects and contextual visual information. There is a recent trend to combine paired satellite images and text captions for pretraining performant encoders for downstream tasks. However, while contrastive image-text methods like CLIP enable vision-language alignment and zero-shot classification ability, vision-only downstream performance tends to degrade compared to image-only pretraining, such as MAE. 
In this paper, we propose FLAVARS, a pretraining method that combines the best of both contrastive learning and masked modeling, along with geospatial alignment via contrastive location encoding. We find that FLAVARS significantly outperforms a baseline of SkyCLIP for vision-only tasks such as KNN classification and semantic segmentation, +6\\% mIOU on SpaceNet1, while retaining the ability to perform zero-shot classification, unlike MAE pretrained metho...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","Computer vision"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:2f985d959f6c0bc8","title":"Towards Effective Process Supervision in Mathematical Reasoning","url":"https://qwenlm.github.io/blog/qwen2.5-math-prm/","published":"2025-01-14","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DISCORDIntroduction In recent years, Large Language Models (LLMs) have made remarkable advances in mathematical reasoning, yet they can make mistakes, such as miscalculations or logical errors, leading to wrong conclusions. 
Moreover, even when achieving correct final answers, these powerful models can still regularly make up plausible reasoning steps, where the final answers build upon flawed calculations or derivations, which undermine the reliability and trustworthiness of LLMs’ reasoning processes.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"arxiv:2501.08313","title":"MiniMax-01: Scaling Foundation Models with Lightning Attention","url":"https://huggingface.co/papers/2501.08313","published":"2025-01-14","authors":["MiniMax","Aonian Li","Bangwei Gong","Bo Yang","Boji Shan","Chang Liu","Cheng Zhu","Chunhao Zhang","Congchao Guo","Da Chen","Dong Li","Enwei Jiao"],"abstract":"We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token. We develop an optimized parallel strategy and highly efficient computation-communication overlap techniques for MoE and lightning attention. This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens. 
The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at a...","companies":["MiniMax"],"matched_orgs":["MiniMax"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["language model","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2501.09038","title":"Do generative video models learn physical principles from watching videos?","url":"https://huggingface.co/papers/2501.09038","published":"2025-01-14","authors":["Saman Motamed","Laura Culp","Kevin Swersky","Priyank Jaini","Robert Geirhos"],"abstract":"AI video generation is undergoing a revolution, with quality and realism advancing rapidly. These advances have led to a passionate scientific debate: Do video models learn ``world models'' that discover laws of physics -- or, alternatively, are they merely sophisticated pixel predictors that achieve visual realism without understanding the physical principles of reality? We address this question by developing Physics-IQ, a comprehensive benchmark dataset that can only be solved by acquiring a deep understanding of various physical principles, like fluid dynamics, optics, solid mechanics, magnetism and thermodynamics. We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism. At the same time, some test cases can already be successfully solved. 
This indica...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/peace-empowering-geologic-map-holistic-understanding-with-mllms","title":"PEACE: Empowering Geologic Map Holistic Understanding with MLLMs","url":"https://www.microsoft.com/en-us/research/publication/peace-empowering-geologic-map-holistic-understanding-with-mllms/","published":"2025-01-13","authors":["Yangyu Huang","Tianyi Gao","Haoran Xu","Qihao Zhao","Yang Song","Zhipeng Gui","Tengchao Lv","Hao Cheng","Lei Cui","Scarlett Li","Furu Wei"],"abstract":"Geologic map, as a fundamental diagram in geology science, provides critical insights into the structure and composition of Earth's subsurface and surface. These maps are indispensable in various fields, including disaster detection, resource exploration, and civil engineering. Despite their significance, current Multimodal Large Language Models (MLLMs) often fall short in geologic map understanding. This gap is primarily due to the challenging nature of cartographic generalization, which involves handling high-resolution map, managing multiple associated components, and requiring domain-specific knowledge. To quantify this gap, we construct GeoMap-Bench, the first-ever benchmark for evaluating MLLMs in geologic map understanding, which assesses the full-scale abilities in extracting, referring, grounding, reasoning, and analyzing. 
To bridge this gap, we introduce GeoMap-Agent, the inaug...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Computer vision","Social sciences","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rad-dino-exploring-scalable-medical-image-encoders-beyond-text-supervision","title":"Exploring Scalable Medical Image Encoders Beyond Text Supervision","url":"https://www.microsoft.com/en-us/research/publication/rad-dino-exploring-scalable-medical-image-encoders-beyond-text-supervision/","published":"2025-01-13","authors":["Fernando Pérez-García","Harshita Sharma","Sam Bond-Taylor","Kenza Bouzid","Valentina Salvatelli","Maximilian Ilse","Shruthi Bannur","Daniel Coelho de Castro","Anton Schwaighofer","Matthew P Lungren","Noel Codella","Stephanie Hyland"],"abstract":"Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images, serving as a foundational element in multimodal systems within the computer vision and medical imaging domains. However, resulting features are limited by the information contained within the text. This is particularly problematic in medical imaging, where radiologists' written findings focus on specific observations; a challenge compounded by the scarcity of paired imaging-text data due to concerns over leakage of personal health information. 
In this work, we fundamentally challenge the prevailing reliance on language supervision for learning general purpose biomedical imaging encoders. We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data that obtains similar or greater performance than state-of-the-art biome...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer vision","Medical, health and genomics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406297511","title":"Tool learning with large language models: a survey","url":"https://doi.org/10.1007/s11704-024-40678-2","published":"2025-01-13","authors":["Changle Qu","Sunhao Dai","Xiaochi Wei","Hengyi Cai","Shuaiqiang Wang","Dawei Yin","Jun Xu","Ji-Rong Wen"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11704-024-40678-2","openalex_id":"https://openalex.org/W4406297511","cited_by_count":52,"quality_score":67,"matched_keywords":[],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Computing Technology","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9250589609146118},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4204167127609253},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39622125029563904}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":52}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lessons-from-red-teaming-100-generative-ai-products","title":"Lessons From Red Teaming 100 Generative AI Products","url":"https://www.microsoft.com/en-us/research/publication/lessons-from-red-teaming-100-generative-ai-products/","published":"2025-01-13","authors":["Blake Bullwinkel","Amanda Minnich","Shiven Chawla","Gary Lopez","Martin Pouliot","Whitney Maxwell","Joris de Gruyter","Katherine Pratt","Saphir Qi","Nina Chikanov","Roman Lutz","Raja Sekhar Rao Dheekonda"],"abstract":"In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. Due to the nascency of the field, there are many open questions about how red teaming operations should be conducted. 
Based on our experience red teaming over 100 generative AI products at Microsoft, we present our internal threat model ontology and eight main lessons we have learned: (1) Understand what the system can do and where it is applied; (2) You don't have to compute gradients to break an AI system; (3) AI red teaming is not safety benchmarking; (4) Automation can help cover more of the risk landscape; (5) The human element of AI red teaming is crucial; (6) Responsible AI harms are pervasive but difficult to measure; (7) LLMs amplify existing security risks and introduce new ones; (8) The work of securing AI systems will never be complete. By sharing these insights alongside case studies from our ope...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/imagine-while-reasoning-in-space-multimodal-visualization-of-thought","title":"Imagine while Reasoning in Space: Multimodal Visualization-of-Thought","url":"https://www.microsoft.com/en-us/research/publication/imagine-while-reasoning-in-space-multimodal-visualization-of-thought/","published":"2025-01-12","authors":["Chengzu Li","Wenshan Wu","Huanyu Zhang","Yan Xia","Shaoguang Mao","Li Dong","Ivan Vulić","Furu Wei"],"abstract":"Chain-of-Thought (CoT) prompting has proven highly effective for enhancing complex reasoning in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs).
Yet, it struggles in complex spatial reasoning tasks. Nonetheless, human cognition extends beyond language alone, enabling the remarkable capability to think in both words and images. Inspired by this mechanism, we propose a new reasoning paradigm, Multimodal Visualization-of-Thought (MVoT). It enables visual thinking in MLLMs by generating image visualizations of their reasoning traces. To ensure high-quality visualization, we introduce token discrepancy loss into autoregressive MLLMs. This innovation significantly improves both visual coherence and fidelity. We validate this approach through several dynamic spatial reasoning tasks. Experimental results reveal that MVoT demonstrates competitive performance across tasks...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Language model","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406258718","title":"SketchTriplet: Self-Supervised Scenarized Sketch–Text–Image Triplet Generation","url":"https://doi.org/10.1109/jiot.2024.3523382","published":"2025-01-10","authors":["Zhenbei Wu","Qiang Wang","Jie Yang"],"abstract":"Touchscreen IoT devices, such as smartphones and tablets, have been seamlessly integrated into our daily lives. Drawing sketches on the touch screen is an extremely convenient mode of interaction, and when combined with generative AI, it makes the customized data generation and digital twin based on IoT devices more straightforward. 
However, the scarcity of free-hand sketch data makes the construction of data-driven generative AI models a thorny issue. Despite the emergence of some large-scale sketch datasets, these datasets primarily consist of sketches at the single-object level. There continues to be a lack of large-scale paired datasets for scene sketches. In this paper, we propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch, enabling the transformation of single-object sketches into scene sketches. To accomplish this, we intro...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jiot.2024.3523382","openalex_id":"https://openalex.org/W4406258718","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beijing University of Posts and Telecommunications"],"concepts":[{"id":"https://openalex.org/C2779231336","display_name":"Sketch","score":0.9100937247276306},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8792804479598999},{"id":"https://openalex.org/C132900626","display_name":"Sketch recognition","score":0.6427796483039856},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5447810292243958},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5055208206176758},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.494552344083786},{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.4151308536529541},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.41276177763938904}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bioagents-democratizing-bioinformatics-analysis-with-multi-agent-systems","title":"BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems","url":"https://www.microsoft.com/en-us/research/publication/bioagents-democratizing-bioinformatics-analysis-with-multi-agent-systems/","published":"2025-01-09","authors":["Nikita Mehandru","Amanda K. Hall","Olesya Melnichenko","Yulia Dubinina","Daniel Tsirulnikov","David Bamman","Ahmed Alaa","Scott Saponas","V. Malladi"],"abstract":"Creating end-to-end bioinformatics workflows requires diverse domain expertise, which poses challenges for both junior and senior researchers as it demands a deep understanding of both genomics concepts and computational techniques. While large language models (LLMs) provide some assistance, they often fall short in providing the nuanced guidance needed to execute complex bioinformatics tasks, and require expensive computing resources to achieve high performance. We thus propose a multi-agent system built on small language models, fine-tuned on bioinformatics data, and enhanced with retrieval augmented generation (RAG). Our system, BioAgents, enables local operation and personalization using proprietary data. 
We observe performance comparable to human experts on conceptual genomics tasks, and suggest next steps to enhance code generation capabilities.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Medical, health and genomics","Computer science","personalization","retrieval","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406208390","title":"Foundation Models Defining a New Era in Vision: A Survey and Outlook","url":"https://doi.org/10.1109/tpami.2024.3506283","published":"2025-01-09","authors":["Muhammad Awais","Muzammal Naseer","Salman Khan","Rao Muhammad Anwer","Hisham Cholakkal","Mubarak Shah","Ming–Hsuan Yang","Fahad Shahbaz Khan"],"abstract":"Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world. The complex relations between objects and their locations, ambiguities, and variations in the real-world environment can be better described in human language, naturally governed by grammatical rules and other modalities such as audio and depth. The models learned to bridge the gap between such modalities and large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time. These models are referred to as foundation models. 
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instru...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2024.3506283","openalex_id":"https://openalex.org/W4406208390","cited_by_count":172,"quality_score":67,"matched_keywords":[],"author_affiliations":["Australian National University","Georgia Institute of Technology","Google (United States)","Khalifa University of Science and Technology","Linköping University","Mohamed bin Zayed University of Artificial Intelligence","University of California, Merced","University of Central Florida"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7769436836242676},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6225407123565674},{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.5370985269546509},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.5039932131767273},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.48112693428993225},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.4633754789829254},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.4588414430618286},{"id":"https://openalex.org/C200220432","display_name":"Vision science","score":0.4212353527545929}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":172}},{"id":"openalex:W4406214337","title":"Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing","url":"https://doi.org/10.3390/electronics14020256","published":"2025-01-09","authors":["Vishnu S. Pendyala","Karnavee Kamdar","Kapil Mulchandani"],"abstract":"Research expands the boundaries of a subject, economy, and civilization. Peer review is at the heart of research and is understandably an expensive process. This work, with human-in-the-loop, aims to support the research community in multiple ways. It predicts quality, and acceptance, and recommends reviewers. It helps the authors and editors to evaluate research work using machine learning models developed based on a dataset comprising 18,000+ research papers, some of which are from highly acclaimed, top conferences in Artificial Intelligence such as NeurIPS and ICLR, their reviews, aspect scores, and accept/reject decisions. Using machine learning algorithms such as Support Vector Machines, Deep Learning Recurrent Neural Network architectures such as LSTM, a wide variety of pre-trained word vectors using Word2Vec, GloVe, FastText, transformer architecture-based BERT, DistilBERT, Google...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/electronics14020256","openalex_id":"https://openalex.org/W4406214337","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (United States)","Oracle (United States)","San Jose State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7078486680984497},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6013569235801697},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5246665477752686},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.42111051082611084}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408703712","title":"How Culturally Aware Are Vision-Language Models?","url":"https://doi.org/10.1109/ipas63548.2025.10924504","published":"2025-01-09","authors":["Olena Burda-Lassen","Aman Chadha","Shashank Goswami","Vinija Jain"],"abstract":"An image is often considered worth a thousand words, and certain images can tell rich and insightful stories. Can these stories be told via image captioning? Images from folklore genres, such as mythology, folk dance, cultural signs, and symbols, are vital to every culture. Our research compares the performance of four popular vision-language models (GPT-4V, Gemini Pro Vision, LLaVA, and OpenFlamingo) in identifying culturally specific information in such images and creating accurate and culturally sensitive image captions. We also propose a new evaluation metric, the Cultural Awareness Score (CAS), which measures the degree of cultural awareness in image captions. We provide a dataset MOSAIC-1.5k labeled with ground truth for images containing cultural background and context and a labeled dataset with assigned Cultural Awareness Scores that can be used with unseen data. 
Creating cultura...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ipas63548.2025.10924504","openalex_id":"https://openalex.org/W4408703712","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Stanford University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6457201242446899},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3788423538208008},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34285768866539},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3390176594257355},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.18138664960861206}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rstar-math-small-llms-can-master-math-reasoning-with-self-evolved-deep-thinking","title":"rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking","url":"https://www.microsoft.com/en-us/research/publication/rstar-math-small-llms-can-master-math-reasoning-with-self-evolved-deep-thinking/","published":"2025-01-08","authors":["Xinyu Guan","Li Lyna Zhang","Yifei Liu","Ning Shang","Youran Sun","Yi Zhu","Fan Yang","Mao Yang"],"abstract":"We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. 
rStar-Math achieves this by exercising \"deep thinking\" through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to tackle the challenges in training the two SLMs: (1) a novel code-augmented CoT data synthesis method, which performs extensive MCTS rollouts to generate step-by-step verified reasoning trajectories used to train the policy SLM; (2) a novel process reward model training method that avoids naïve step-level score annotation, yielding a more effective process preference model (PPM); (3) a self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Mathematics","Computer science","1970-01-01","preference","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406152279","title":"Toward expert-level medical question answering with large language models","url":"https://doi.org/10.1038/s41591-024-03423-7","published":"2025-01-08","authors":["K. K.
Singhal","Tao Tu","Juraj Gottweis","Rory Sayres","Ellery Wulczyn","Mohamed Amin","Le Hou","Kevin Clark","Stephen Pfohl","Heather Cole-Lewis","Darlene Neal","Qazi Mamunur Rashid"],"abstract":"Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score in United States Medical Licensing Examination style questions. However, challenges remain in long-form medical question answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval. Med-PaLM 2 scores up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19%, and demonstrates dramatic performance increases across MedMCQA, PubMedQA and MMLU clinical topics datasets. Our detailed human evaluations framework shows that physicians prefer Med-PaLM 2 answers to those from other physicians on eight of nine clinical axes. 
Med-PaLM 2 al...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41591-024-03423-7","openalex_id":"https://openalex.org/W4406152279","cited_by_count":632,"quality_score":75,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Google (United States)","Stanford Health Care","Stanford Medicine","Stanford University"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.7204053401947021},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4593212604522705},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4565448760986328},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.3800949454307556},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3362233638763428},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.32847535610198975}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":632}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/refocus-visual-editing-as-a-chain-of-thought-for-structured-image-understanding","title":"ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding","url":"https://www.microsoft.com/en-us/research/publication/refocus-visual-editing-as-a-chain-of-thought-for-structured-image-understanding/","published":"2025-01-08","authors":["Xingyu Fu","Minqian Liu","Zhengyuan Yang","John Corring","Yijuan Lu","Jianwei Yang","Dan Roth","Dinei Florencio","Cha Zhang"],"abstract":"Structured image understanding, such as interpreting tables and charts, requires strategically refocusing across various structures and texts 
within an image, forming a reasoning sequence to arrive at the final answer. However, current multimodal large language models (LLMs) lack this multihop selective attention capability. In this work, we introduce ReFocus, a simple yet effective framework that equips multimodal LLMs with the ability to generate \"visual thoughts\" by performing visual editing on the input image through code, shifting and refining their visual focuses. Specifically, ReFocus enables multimodal LLMs to generate Python codes to call tools and modify the input image, sequentially drawing boxes, highlighting sections, and masking out areas, thereby enhancing the visual reasoning process. We experiment upon a wide range of structured image understanding tasks involving tables a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-mllms-reason-in-multimodality-emma-an-enhanced-multimodal-reasoning-benchmark","title":"Can MLLMs Reason in Multimodality?
EMMA: An Enhanced MultiModal ReAsoning Benchmark","url":"https://www.microsoft.com/en-us/research/publication/can-mllms-reason-in-multimodality-emma-an-enhanced-multimodal-reasoning-benchmark/","published":"2025-01-08","authors":["Yunzhuo Hao","Jiawei Gu","Huichen Will Wang","Linjie Li","Zhengyuan Yang","Lijuan Wang","Yu Cheng"],"abstract":"The ability to organically reason over and with both text and images is a pillar of human intelligence, yet the ability of Multimodal Large Language Models (MLLMs) to perform such multimodal reasoning remains under-explored. Existing benchmarks often emphasize text-dominant reasoning or rely on shallow visual cues, failing to adequately assess integrated visual and textual reasoning. We introduce EMMA (Enhanced MultiModal reAsoning), a benchmark targeting organic multimodal reasoning across mathematics, physics, chemistry, and coding. EMMA tasks demand advanced cross-modal reasoning that cannot be addressed by reasoning independently in each modality, offering an enhanced test suite for MLLMs' reasoning capabilities. 
Our evaluation of state-of-the-art MLLMs on EMMA reveals significant limitations in handling complex multimodal and multi-step reasoning tasks, even with advanced techniques...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Multimodal Large Language Models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406171741","title":"Chasing Common Knowledge: Joint Large Model Selection and Pulling in MEC With Parameter Sharing","url":"https://doi.org/10.1109/tpds.2025.3527649","published":"2025-01-08","authors":["Lizhen Zhou","Zichuan Xu","Qiufen Xia","Zhou Xu","Wenhao Ren","Qi Wu","Jingsheng Ma","Yan Song","Yuan Yang"],"abstract":"Pretrained Foundation Models (PFMs) are regarded as a promising accelerator for the development of various Artificial Intelligence (AI) applications, and have recently been widely fine-tuned to satisfy users' personalized inference demands. As many users are attracted to PFM-based AI applications, remote data centers are increasingly unable to solely bear the enormous computational demands and meet the delay requirements of inference requests. Mobile edge computing (MEC) offers a viable solution for delivering low-latency inference services by pulling fine-tuned PFMs from the remote data center to cloudlets in the proximity of users. 
However, a fine-tuned PFM typically comprises billions of model parameters, which are highly resource-intensive, time-consuming, and cost-prohibitive to execute at the edge. To address this, we investigate a novel joint large model selection and pulling prob...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpds.2025.3527649","openalex_id":"https://openalex.org/W4406171741","cited_by_count":3,"quality_score":44,"matched_keywords":["personalized"],"author_affiliations":["Alibaba Group (China)","Dalian University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7421455383300781},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.6524302363395691},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6109565496444702},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.23106035590171814},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1562250554561615},{"id":"https://openalex.org/C66938386","display_name":"Structural engineering","score":0.1557769775390625}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4406164486","title":"MotIF: Motion Instruction Fine-Tuning","url":"https://doi.org/10.1109/lra.2025.3527290","published":"2025-01-08","authors":["Mona Hwang","Joey Hejna","Dorsa Sadigh","Yonatan Bisk"],"abstract":"While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state-e.g., if an apple is picked up-many tasks require observing the full motion of the robot to correctly determine success. 
For example, brushing hair requires repeated strokes that correspond to the contours and type of hair. Prior works often use off-the-shelf vision-language models (VLMs) as success detectors; however, when success depends on the full trajectory, VLMs struggle to make correct judgments for two reasons. First, modern VLMs often use single frames, and thus cannot capture changes over a full trajectory. Second, even if we provide state-of-the-art VLMs with an input of multiple frames, they still fail to correctly detect success due to a lack of robot data. Our key idea is to fine-tune VLMs using abstract representations that are able to capture....","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2025.3527290","openalex_id":"https://openalex.org/W4406164486","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Google (United States)","Massachusetts Institute of Technology","Stanford University"],"concepts":[{"id":"https://openalex.org/C32276052","display_name":"Motif (music)","score":0.7595993280410767},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.46397194266319275},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36651912331581116},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.204708993434906},{"id":"https://openalex.org/C107038049","display_name":"Aesthetics","score":0.11221060156822205}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/epicoder-encompassing-diversity-and-complexity-in-code-generation","title":"EpiCoder: Encompassing Diversity and
Complexity in Code Generation","url":"https://www.microsoft.com/en-us/research/publication/epicoder-encompassing-diversity-and-complexity-in-code-generation/","published":"2025-01-07","authors":["Yaoxiang Wang","Haoling Li","Xin Zhang","Jie Wu","Xiao Liu","Wenxiang Hu","Zhongxin Guo","Yangyu Huang","Ying Xin","Yujiu Yang","Jinsong Su","Qi Chen"],"abstract":"Existing methods for code generation use code snippets as seed data, restricting the complexity and diversity of the synthesized data. In this paper, we introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features derived from high-level abstractions of code. The feature tree is constructed from raw data and refined iteratively to increase the quantity and diversity of the extracted features, which captures and recognizes more complex patterns and relationships within the code. By adjusting the depth and breadth of the sampled subtrees, our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios. 
We fine-tuned widely-used base models to obtain EpiCoder series, achieving state-of-the-art performance on multiple benchmarks at both th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Code generation","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407786753","title":"Transforming Healthcare Diagnostics: A Comprehensive Review of Convolutional Neural Networks in Medical Imaging and Disease Prediction","url":"https://doi.org/10.1109/icmcsi64620.2025.10883093","published":"2025-01-07","authors":["Sairam Durgaraju","Deepan Vishal Thulasi Vel","Harikrishna Madathala"],"abstract":"Integrating Convolutional Neural Networks (CNNs) into healthcare diagnostics has revolutionized the field of medical imaging by enhancing accuracy, efficiency, and early disease detection capabilities. This review comprehensively analyzes CNN applications across multiple medical domains, including oncology, cardiology, neurology, and ophthalmology, highlighting recent advancements and emerging trends in CNN-based imaging. Techniques such as transfer learning, self-supervised learning, and multi-modal data fusion have been explored for their potential to address critical limitations, including data scarcity, model interpretability, and generalization. Additionally, advancements in explainable AI (XAI) and federated learning have improved the transparency and adaptability of CNN models, enabling broader acceptance in clinical settings. 
Despite these advancements, challenges remain regardin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1109/icmcsi64620.2025.10883093","openalex_id":"https://openalex.org/W4407786753","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Cigna (United States)","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.7468592524528503},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.6153623461723328},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6133614182472229},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.5425089597702026},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5036661028862},{"id":"https://openalex.org/C2779134260","display_name":"Disease","score":0.4395264983177185},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3292238414287567},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.20750707387924194}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vlm-driven-behavior-tree-for-context-aware-task-planning","title":"VLM-driven Behavior Tree for Context-aware Task Planning","url":"https://www.microsoft.com/en-us/research/publication/vlm-driven-behavior-tree-for-context-aware-task-planning/","published":"2025-01-06","authors":["Naoki Wake","Atsushi Kanehira","Jun Takamatsu","Kazuhiro Sasabuchi","Katsushi Ikeuchi"],"abstract":"The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has 
recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages Vision-Language Models (VLMs) to interactively generate and edit BTs that address visual conditions, enabling context-aware robot operations in visually complex environments. A key feature of our approach lies in the conditional control through self-prompted visual conditions. Specifically, the VLM generates BTs with visual condition nodes, where conditions are expressed as free-form text. Another VLM process integrates the text into its prompt and evaluates the conditions against real-world images during robot execution. We validated our framework in a real-world cafe scenario, demonstrating both its feasibility and limitations.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Human-computer interaction","Computer science","Embodied AI","Robotics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406047156","title":"Evidence-Based Real-Time Road Segmentation With RGB-D Data Augmentation","url":"https://doi.org/10.1109/tits.2024.3509140","published":"2025-01-03","authors":["Feng Xue","Yicong Chang","Wenzhuang Xu","Wenteng Liang","Fei Sheng","Anlong Ming"],"abstract":"Despite significant progress in RGB-D based road segmentation in recent years, the latest methods cannot achieve both state-of-the-art accuracy and real time due to the high-performance reliance on heavy structures. 
We argue that this reliance is due to unsuitable multimodal fusion. To be specific, RGB and depth data in road scenes are each sensitive to different regions, but current RGB-D based road segmentation methods generally combine features within sensitive regions which preserves false road representation from one of the data. Based on such findings, we design an Evidence-based Road Segmentation Method (Evi-RoadSeg), which incorporates prior knowledge of the modal-specific characteristics. Firstly, we abandon the cross-modal fusion operation commonly used in existing multimodal based methods. Instead, we collect the road evidence from RGB and depth inputs separately via two low-l...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tits.2024.3509140","openalex_id":"https://openalex.org/W4406047156","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Beijing University of Posts and Telecommunications","Tencent (China)","University of Trento"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5905389189720154},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5898326635360718},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5827097296714783},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5791208148002625},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3294757902622223}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4406002885","title":"Multi-modal conditional diffusion model using signed distance functions for metal-organic frameworks 
generation","url":"https://doi.org/10.1038/s41467-024-55390-9","published":"2025-01-02","authors":["Junkil Park","Youhan Lee","Jihan Kim"],"abstract":"The design of porous materials with user-desired properties has been a great interest for the last few decades. However, the flexibility of target properties has been highly limited, and targeting multiple properties of diverse modalities simultaneously has been scarcely explored. Furthermore, although deep generative models have opened a new paradigm in materials generation, their incorporation into porous materials such as metal-organic frameworks (MOFs) has not been satisfactory due to their structural complexity. In this work, we introduce MOFFUSION, a latent diffusion model that addresses the aforementioned challenges. Signed distance functions (SDFs) are employed for the input representation of MOFs, marking their first usage in representing porous materials for generative models. Using the suitability of SDFs in describing complicated pore structures, MOFFUSION exhibits exceptiona...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41467-024-55390-9","openalex_id":"https://openalex.org/W4406002885","cited_by_count":14,"quality_score":51,"matched_keywords":[],"author_affiliations":["Korea Advanced Institute of Science and Technology","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6844933032989502},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5785528421401978},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5532054901123047},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5194848775863647},{"id":"https://openalex.org/C5274069","display_name":"Categorical 
variable","score":0.49415138363838196},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4915231168270111},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.48107385635375977},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.47133708000183105}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4405992798","title":"Transformation of ChatGPT into Threat: The Effects of Generative AI on Data Protection and Security","url":"https://doi.org/10.47672/ajce.2586","published":"2025-01-02","authors":["Nishchai Jayanna Manjula","Kiran Randhi","Srinivas Reddy Bandarapu"],"abstract":"Purpose: For 2022, GenAI models were the main digital transformation advancement. Cybersecurity is crucial when GenAI models like ChatGPT and Google Bard get more complex. Cybersecurity incidents have highlighted GenAI's offensive and defensive use, creating social, ethical, and privacy issues. GenAI's privacy and cybersecurity risks, possibilities, and constraints are covered in this paper. This study demonstrates ChatGPT's security flaws, which bad actors might utilize to steal sensitive data by violating the model's ethics. In this research, we show ChatGPT attacks using jailbreaks, reverse psychology, and quick injection. Learn how hackers utilize GenAI to launch cyberattacks. Materials and Methods: ChatGPT is great for customer service, but Bard AI is where it's at when it comes to conversational apps. Diverse technologies have diverse developer communities and ecosystems. 
With over...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.47672/ajce.2586","openalex_id":"https://openalex.org/W4405992798","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","TechLab (United States)"],"concepts":[{"id":"https://openalex.org/C204241405","display_name":"Transformation (genetics)","score":0.6910071969032288},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5601319074630737},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.49193811416625977},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43326476216316223},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.21707388758659363},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.10309791564941406},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sociotechnical-implications-of-generative-artificial-intelligence-for-information-access","title":"Sociotechnical Implications of Generative Artificial Intelligence for Information Access","url":"https://www.microsoft.com/en-us/research/publication/sociotechnical-implications-of-generative-artificial-intelligence-for-information-access/","published":"2025-01-01","authors":["Bhaskar Mitra","Henriette Cramer","Olya Gurevich"],"abstract":"Robust access to trustworthy information is a critical need for society with implications for knowledge production, 
public health education, and promoting informed citizenry in democratic societies. Generative AI technologies may enable new ways to access information and improve effectiveness of existing information retrieval systems but we are only starting to understand and grapple with their long-term social implications. In this chapter, we present an overview of some of the systemic consequences and risks of employing generative AI in the context of information access. We also provide recommendations for evaluation and mitigation, and discuss challenges for future research.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["In Book","Artificial intelligence","Search and information retrieval","algorithmic fairness","Generative AI","Information retrieval","Sociotechnical system","long-term","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-new-calculator-practices-norms-and-implications-of-generative-ai-in-higher-education","title":"The New Calculator? Practices, Norms, and Implications of Generative AI in Higher Education","url":"https://www.microsoft.com/en-us/research/publication/the-new-calculator-practices-norms-and-implications-of-generative-ai-in-higher-education/","published":"2025-01-01","authors":["Auste Simkute","Viktor Kewenig","Abigail Sellen","Sean Rintel","Lev Tankelevitch"],"abstract":"Generative AI (GenAI) has introduced myriad opportunities and challenges for higher education. 
Anticipating this potential transformation requires understanding students' contextualised practices and norms around GenAI. We conducted semi-structured interviews with 26 students and 11 educators from diverse departments across two universities. Grounded in Strong Structuration Theory, we find diversity in students' uses and motivations for GenAI. Occurring in the context of unclear university guidelines, institutional fixation on plagiarism, and inconsistent educator communication, students' practices are informed by unspoken rules around appropriate use, GenAI limitations and reliance strategies, and consideration of agency and skills. Perceived impacts include changes in confidence, and concerns about skill development, relationships with educators, and plagiarism. Both groups envision ch...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Social sciences","Human Computer Interaction","Social Science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406417959","title":"Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers","url":"https://doi.org/10.1109/taslpro.2025.3530270","published":"2025-01-01","authors":["Sanyuan Chen","Chengyi Wang","Yu Wu","Ziqiang Zhang","Long Zhou","Shujie Liu","Zhuo Chen","Tie‐Yan Liu","Huaming Wang","Jinyu Li","Lei He","Sheng Zhao"],"abstract":"We introduce a language modeling approach for text to speech synthesis (TTS). 
Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 50k hours of English speech, which is hundreds of times larger than existing systems. VALL-E emerges...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3530270","openalex_id":"https://openalex.org/W4406417959","cited_by_count":81,"quality_score":75,"matched_keywords":["language model","personalized"],"author_affiliations":["Harbin Institute of Technology","Microsoft (United States)","Nankai University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7968358397483826},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.7073042392730713},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5133149027824402},{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.5092850923538208},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.5035335421562195},{"id":"https://openalex.org/C61328038","display_name":"Speech
processing","score":0.4813295602798462},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4767923355102539},{"id":"https://openalex.org/C75217168","display_name":"Codec2","score":0.46894216537475586}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":81}},{"id":"openalex:W4412158322","title":"🧜Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models","url":"https://doi.org/10.1162/coli.a.16","published":"2025-01-01","authors":["Yue Zhang","Yafu Li","Leyang Cui","Cai Deng","Lemao Liu","Tingchen Fu","Xinting Huang","Enbo Zhao","Yanwen Zhang","Yulong Chen","Longyue Wang","Ahn Tuan Luu"],"abstract":"Abstract While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this article, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. 
We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/coli.a.16","openalex_id":"https://openalex.org/W4412158322","cited_by_count":128,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Bridge University","Nanyang Technological University","Renmin University of China","Shanghai Artificial Intelligence Laboratory","Shanghai Municipal People's Government","Soochow University","Tencent (China)","University of Cambridge","University of Waterloo","Vector Institute"],"concepts":[{"id":"https://openalex.org/C160844653","display_name":"Siren (mythology)","score":0.8607272505760193},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5211320519447327},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33613336086273193},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32652390003204346},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.18650361895561218},{"id":"https://openalex.org/C124952713","display_name":"Literature","score":0.17328152060508728},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.1381077766418457}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":128}},{"id":"openalex:W4410706693","title":"GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest","url":"https://doi.org/10.1007/978-3-031-91813-1_4","published":"2025-01-01","authors":["Shilong Zhang","Peize Sun","Shoufa Chen","Min Xiao","Wenqi 
Shao","Wenwei Zhang","Yu Liu","Kai Chen","Ping Luo"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-91813-1_4","openalex_id":"https://openalex.org/W4410706693","cited_by_count":38,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Shanghai Artificial Intelligence Laboratory","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.871324896812439},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4037069082260132},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3606339693069458},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3539102077484131}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":38}},{"id":"openalex:W4406771321","title":"Evaluation of Retrieval-Augmented Generation: A Survey","url":"https://doi.org/10.1007/978-981-96-1024-2_8","published":"2025-01-01","authors":["Hao Yu","Aoran Gan","Kai Zhang","Shiwei Tong","Qi Liu","Zhaofeng Liu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-1024-2_8","openalex_id":"https://openalex.org/W4406771321","cited_by_count":91,"quality_score":71,"matched_keywords":["retrieval"],"author_affiliations":["McGill University","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.6474823951721191},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5228060483932495},{"id":"https://openalex.org/C2778698081","display_name":"Thesaurus","score":0.5134664177894592},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.22081053256988525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":91}},{"id":"openalex:W4414037010","title":"An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-Tuning","url":"https://doi.org/10.1109/taslpro.2025.3606231","published":"2025-01-01","authors":["Yun Luo","Zhen Yang","Fandong Meng","Yafu Li","Jie Zhou","Yue Zhang"],"abstract":"Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information while acquiring new knowledge for achieving satisfactory performance in downstream tasks. As large language models (LLMs) have demonstrated remarkable performance, it is intriguing to investigate whether CF exists during the continual instruction tuning of LLMs. This study empirically evaluates the forgetting phenomenon in LLMs' knowledge during continual instruction tuning from the perspectives of domain knowledge, reasoning, and reading comprehension. The experiments reveal that catastrophic forgetting is generally observed in LLMs ranging from 1b to 7b parameters. 
Surprisingly, as the model scale increases, the severity of forgetting intensifies in such a model scale range, which may result from the much more significant initial performance in the larger LLM...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3606231","openalex_id":"https://openalex.org/W4414037010","cited_by_count":31,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Tencent (China)","Westlake University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6815458536148071},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.6729207038879395},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3971593379974365},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.39378130435943604},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33580079674720764},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.1273859441280365},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.12071883678436279}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":31}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-generative-ai-improves-supply-chain-management","title":"How Generative AI Improves Supply Chain Management","url":"https://www.microsoft.com/en-us/research/publication/how-generative-ai-improves-supply-chain-management/","published":"2025-01-01","authors":["Ishai Menache","Jeevan Pathuri","David Simchi-Levi","Tom Linton"],"abstract":"Companies face a variety of complex challenges in 
designing and optimizing their supply chains. Increasing their resilience, reducing costs, and improving the quality of their planning are just a few of them. Over the past few decades, advances in information technologies have allowed firms to move from decision-making on the basis of intuition and experience to more automated and data-driven methods. As a result, businesses have seen efficiency gains, substantial cost reductions, and improved customer service.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Machine learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406138127","title":"AllSpark: A Multimodal Spatiotemporal General Intelligence Model With Ten Modalities via Language as a Reference Framework","url":"https://doi.org/10.1109/tgrs.2025.3526725","published":"2025-01-01","authors":["Run Shao","Cheng Yang","Qiujun Li","Lei Xu","Xiang Yang","Xian Li","M. H. Li","Qing Zhu","Yongjun Zhang","Yansheng Li","Yu Liu","Yong Tang"],"abstract":"RGB, multispectral, point and other spatio-temporal modal data fundamentally represent different observational approaches for the same geographic object. Therefore, leveraging multimodal data is an inherent requirement for comprehending geographic objects. However, due to the high heterogeneity in structure and semantics among various spatio-temporal modalities, the joint interpretation of multimodal spatio-temporal data has long been an extremely challenging problem.
The primary challenge resides in striking a trade-off between the cohesion and autonomy of diverse modalities. This trade-off becomes progressively nonlinear as the number of modalities expands. Inspired by the human cognitive system and linguistic philosophy, where perceptual signals from the five senses converge into language, we introduce the Language as Reference Framework (LaRF), a fundamental principle for constructin...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tgrs.2025.3526725","openalex_id":"https://openalex.org/W4406138127","cited_by_count":22,"quality_score":67,"matched_keywords":["LLM","language model"],"author_affiliations":["Central South University","Huawei Technologies (China)","MPC Computers (United States)","Peking University","Shanghai Zhangjiang Laboratory","Southwest Jiaotong University","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7587390542030334},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.7486206889152527},{"id":"https://openalex.org/C150189527","display_name":"Reference model","score":0.4263328015804291},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4161829352378845},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.41452568769454956},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.41135379672050476},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4004271626472473},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.08596190810203552}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":22}},{"id":"openalex:W4414432428","title":"ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis","url":"https://doi.org/10.1109/tpami.2025.3613256","published":"2025-01-01","authors":["Wangbo Yu","Jinbo Xing","Li Yuan","Wenbo Hu","Xiaoyu Li","Zhipeng Huang","Xiangjun Gao","Tien‐Tsin Wong","Ying Shan","Yonghong Tian"],"abstract":"Despite recent advancements in neural 3D reconstruction, the dependence on dense multi-view captures restricts their broader applicability. In this work, we propose ViewCrafter, a novel method for synthesizing high-fidelity novel views from single or sparse images with the prior of video diffusion model. Our method takes advantage of the powerful generation capabilities of video diffusion model and the coarse 3D clues offered by point-based representation to generate high-quality video frames with significantly improved camera pose control accuracy. To further enlarge the generation range of novel views, we tailored a progressive view synthesis strategy to expand the point cloud and the areas covered by the novel views, which can be further integrated with a camera trajectory planning algorithm to automatically reveal and address occlusions in different scenes. 
With ViewCrafter, we can f...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3613256","openalex_id":"https://openalex.org/W4414432428","cited_by_count":29,"quality_score":66,"matched_keywords":[],"author_affiliations":["Australian Regenerative Medicine Institute","Chinese University of Hong Kong","Hong Kong University of Science and Technology","Monash University","Peking University","Peng Cheng Laboratory","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8515999913215637},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6937999725341797},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.6561999917030334},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.626800000667572},{"id":"https://openalex.org/C2776449333","display_name":"View synthesis","score":0.5805000066757202},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.5097000002861023},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4781000018119812},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.3944000005722046}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":29}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/enabling-autonomic-microservice-management-through-self-learning-agents","title":"Enabling Autonomic Microservice Management through Self-Learning 
Agents","url":"https://www.microsoft.com/en-us/research/publication/enabling-autonomic-microservice-management-through-self-learning-agents/","published":"2025-01-01","authors":["Fenglin Yu","Fangkai Yang","Xiaoting Qin","Zhiyang Zhang","Jue Zhang","Qingwei Lin 林庆维","Hongyu Zhang","Yingnong Dang","Saravan Rajmohan","Dongmei Zhang","Qi Zhang"],"abstract":"The increasing complexity of modern software systems necessitates robust autonomic self-management capabilities. While Large Language Models (LLMs) demonstrate potential in this domain, they often face challenges in adapting their general knowledge to specific service contexts. To address this limitation, we propose ServiceOdyssey, a self-learning agent system that autonomously manages microservices without requiring prior knowledge of service-specific configurations. By leveraging curriculum learning principles and iterative exploration, ServiceOdyssey progressively develops a deep understanding of operational environments, reducing dependence on human input or static documentation. 
A prototype built with the Sock Shop microservice demonstrates the potential of this approach for autonomic microservice management.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4408061340","title":"Large Language Model for Multiobjective Evolutionary Optimization","url":"https://doi.org/10.1007/978-981-96-3538-2_13","published":"2025-01-01","authors":["Fei Liu","Xi Lin","Shunyu Yao","Zhenkun Wang","Xialiang Tong","Mingxuan Yuan","Qingfu Zhang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-3538-2_13","openalex_id":"https://openalex.org/W4408061340","cited_by_count":22,"quality_score":63,"matched_keywords":["language model"],"author_affiliations":["City University of Hong Kong","City University of Hong Kong, Shenzhen Research Institute","Huawei Technologies (China)","Southern University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8558650612831116},{"id":"https://openalex.org/C68781425","display_name":"Multi-objective optimization","score":0.5425248146057129},{"id":"https://openalex.org/C159149176","display_name":"Evolutionary 
algorithm","score":0.5202118754386902},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42355605959892273},{"id":"https://openalex.org/C126255220","display_name":"Mathematical optimization","score":0.37525495886802673},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.2586122155189514},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.06102451682090759}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":22}},{"id":"openalex:W4413472064","title":"MuQ: Self-Supervised Music Representation Learning With Mel Residual Vector Quantization","url":"https://doi.org/10.1109/taslpro.2025.3602320","published":"2025-01-01","authors":["Haina Zhu","Yizhi Zhou","Hangting Chen","Jianwei Yu","Ziyang Ma","Rongzhi Gu","Yi Luo","Wei Chong Tan","Xie Chen"],"abstract":"Recent years have witnessed the success of foundation models pre-trained with self-supervised learning (SSL) in various music informatics understanding tasks, including music tagging, instrument classification, key detection, and more. In this paper, we propose a self-supervised music representation learning model for music understanding. Distinguished from previous studies adopting random projection or existing neural codec, the proposed model, named MuQ, is trained to predict tokens generated by Mel Residual Vector Quantization (Mel-RVQ). Our Mel-RVQ utilizes residual linear projection structure for Mel spectrum quantization to enhance the stability and efficiency of target extraction and lead to better performance. 
Experiments in a large variety of downstream tasks demonstrate that MuQ outperforms previous self-supervised music representation models with only 0.9 K hours of open-sourc...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3602320","openalex_id":"https://openalex.org/W4413472064","cited_by_count":21,"quality_score":62,"matched_keywords":["quantization"],"author_affiliations":["Nanjing University","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C40567965","display_name":"Learning vector quantization","score":0.7931324243545532},{"id":"https://openalex.org/C199833920","display_name":"Vector quantization","score":0.7572458982467651},{"id":"https://openalex.org/C155512373","display_name":"Residual","score":0.6054275035858154},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5952182412147522},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.55677330493927},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5530197024345398},{"id":"https://openalex.org/C93372532","display_name":"Linde–Buzo–Gray algorithm","score":0.5049083828926086},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.42022082209587097}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":21}},{"id":"openalex:W4408810164","title":"Diffusion Model is Secretly a Training-Free Open Vocabulary Semantic Segmenter","url":"https://doi.org/10.1109/tip.2025.3551648","published":"2025-01-01","authors":["Jinglong Wang","Xiawei Li","Jing Zhang","Qingyuan Xu","Qin Zhou","Qian Yu","Lu Sheng","Dong Xu"],"abstract":"The pre-trained 
text-image discriminative models, such as CLIP, has been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes. Recently, there has been a growing interest in expanding the application of generative models from generation tasks to semantic segmentation. These approaches utilize generative models either for generating annotated data or extracting features to facilitate semantic segmentation. This typically involves generating a considerable amount of synthetic data or requiring additional mask annotations. To this end, we uncover the potential of generative text-to-image diffusion models (e.g., Stable Diffusion) as highly efficient open-vocabulary semantic segmenters, and introduce a novel training-free approach named DiffSegmenter. The insight is that to generate re...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2025.3551648","openalex_id":"https://openalex.org/W4408810164","cited_by_count":20,"quality_score":61,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Beihang University","City University of Hong Kong","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.62568199634552},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.5566156506538391},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5530751347541809},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4382178783416748},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.10901269316673279},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":20}},{"id":"openalex:W4406237009","title":"Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models","url":"https://doi.org/10.1162/tacl_a_00730","published":"2025-01-01","authors":["Jianhui Pang","Fanghua Ye","Derek F. Wong","Dian Yu","Shuming Shi","Zhaopeng Tu","Longyue Wang"],"abstract":"Abstract The evolution of Neural Machine Translation (NMT) has been significantly influenced by six core challenges (Koehn and Knowles, 2017) that have acted as benchmarks for progress in this field. This study revisits these challenges, offering insights into their ongoing relevance in the context of advanced Large Language Models (LLMs): domain mismatch, amount of parallel data, rare word prediction, translation of long sentences, attention model as word alignment, and sub-optimal beam search. Our empirical findings show that LLMs effectively reduce reliance on parallel data for major languages during pretraining and significantly improve translation of long sentences containing approximately 80 words, even translating documents up to 512 words. Despite these improvements, challenges in domain mismatch and rare word prediction persist. 
While NMT-specific challenges like word alignment....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00730","openalex_id":"https://openalex.org/W4406237009","cited_by_count":16,"quality_score":57,"matched_keywords":["LLM"],"author_affiliations":["Tencent (China)","University College London","University of Macau"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7933290004730225},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.6663926839828491},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5959684252738953},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4748395085334778},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4726126492023468},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.33720824122428894},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.08140736818313599},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4416035509","title":"SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation","url":"https://doi.org/10.18653/v1/2025.emnlp-main.1306","published":"2025-01-01","authors":["Aurick Qiao","Zhewei Yao","Samyam Rajbhandari","Yuxiong He"],"abstract":"LLM inference for enterprise applications, such as summarization, RAG, and code-generation, typically observe much longer prompt than generations, leading to high prefill cost and response latency.We present SwiftKV, a 
novel model transformation and distillation procedure targeted at reducing the prefill compute (in FLOPs) of prompt tokens while preserving high generation quality.First, SwiftKV prefills later layers' KV cache using an earlier layer's output, allowing prompt tokens to skip those later layers.Second, SwiftKV employs a lightweight knowledge-preserving distillation procedure that can adapt existing LLMs with minimal accuracy impact.Third, SwiftKV can naturally incorporate KV cache compression to improve inference performance in low-memory scenarios.Our comprehensive experiments show that SwiftKV can effectively reduce prefill computation by 25-50% across several LLM families...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.1306","openalex_id":"https://openalex.org/W4416035509","cited_by_count":1,"quality_score":54,"matched_keywords":["LLM","memory","compression","distillation"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5576000213623047},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4375999867916107},{"id":"https://openalex.org/C204241405","display_name":"Transformation (genetics)","score":0.41839998960494995},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4083999991416931},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.351500004529953},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.3327000141143799},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.26440000534057617},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning 
theory)","score":0.24300000071525574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416035296","title":"OG-RAG: Ontology-grounded retrieval-augmented generation for large language models","url":"https://doi.org/10.18653/v1/2025.emnlp-main.1674","published":"2025-01-01","authors":["Kartik Sharma","Peeyush Kumar","Yunqing Li"],"abstract":"While LLMs are widely used for generic tasks like question answering and search, they struggle to adapt to specialized knowledge, such as industrial workflows in healthcare, legal, and agricultural sectors, as well as knowledgedriven tasks such as news journalism, investigative research, and consulting without expensive fine-tuning or sub-optimal retrieval methods.Existing retrieval-augmented models, such as RAG, offer improvements but fail to account for structured domain knowledge, leading to suboptimal context generation.Ontologies, which conceptually organize domain knowledge by defining entities and their interrelationships, offer a structured representation to address this gap.This paper presents OG-RAG, an Ontology-Grounded Retrieval Augmented Generation method designed to enhance LLMgenerated responses by anchoring retrieval processes in domain-specific ontologies.OG-RAG construc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.1674","openalex_id":"https://openalex.org/W4416035296","cited_by_count":5,"quality_score":54,"matched_keywords":["retrieval","journalism","news"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.6194999814033508},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4154999852180481},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39899998903274536},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.33169999718666077},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3154999911785126},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.30889999866485596},{"id":"https://openalex.org/C179603123","display_name":"Modeling language","score":0.25360000133514404},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.2529999911785126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4406458681","title":"ModelShield: Adaptive and Robust Watermark Against Model Extraction Attack","url":"https://doi.org/10.1109/tifs.2025.3530691","published":"2025-01-01","authors":["Kaiyi Pang","Tao Qi","Chuhan Wu","Minhao Bai","Minghu Jiang","Yongfeng Huang"],"abstract":"Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks, thereby enhancing the commercial value of their intellectual property (IP). To protect this IP, model owners typically allow user access only in a black-box manner, however, adversaries can still utilize model extraction attacks to steal the model intelligence encoded in model generation. Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content. 
However, existing watermarking methods often compromise the quality of generated content due to heuristic alterations and lack robust mechanisms to counteract adversarial strategies, thus limiting their practicality in real-world scenarios. In this paper, we introduce an adaptive and robust watermarking method (named ModelShield) to protect the IP o...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tifs.2025.3530691","openalex_id":"https://openalex.org/W4406458681","cited_by_count":13,"quality_score":54,"matched_keywords":["LLM"],"author_affiliations":["Beijing University of Posts and Telecommunications","Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8506501317024231},{"id":"https://openalex.org/C164112704","display_name":"Watermark","score":0.6293769478797913},{"id":"https://openalex.org/C150817343","display_name":"Digital watermarking","score":0.5979174375534058},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4744737446308136},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4244720935821533},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.4170331358909607},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3502957224845886},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3433494567871094}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4407317265","title":"Knowledge Graph Reasoning With Self-Supervised Reinforcement 
Learning","url":"https://doi.org/10.1109/taslpro.2025.3540648","published":"2025-01-01","authors":["Ying Ma","Owen Burns","Mingqiu Wang","Gang Li","Nan Du","Laurent El Shafey","Liqiang Wang","Izhak Shafran","Hagen Soltau"],"abstract":"Reinforcement learning (RL) is an effective method of finding reasoning pathways in incomplete knowledge graphs (KGs). To overcome the challenges of a large action space, a self-supervised pre-training method is proposed to warm up the policy network before the RL training stage. To alleviate the distributional mismatch issue in general self-supervised RL (SSRL), in our supervised learning (SL) stage, the agent selects actions based on the policy network and learns from generated labels; this self-generation of labels is the intuition behind the name self-supervised. With this training framework, the information density of our SL objective is increased and the agent is prevented from getting stuck with the early rewarded paths. Our self-supervised RL (SSRL) method improves the performance of RL by pairing it with the wide coverage achieved by SL during pretraining, since the breadth of t...","companies":["Google/DeepMind","Apple"],"matched_orgs":["Google/DeepMind","Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3540648","openalex_id":"https://openalex.org/W4407317265","cited_by_count":1,"quality_score":54,"matched_keywords":["agent"],"author_affiliations":["Apple (Israel)","Apple (United States)","Beijing Institute of Technology","Google (United States)","Owl Research Institute","University of Central Florida"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7229562401771545},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5791136026382446},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.570601224899292},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.470132440328598},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43313318490982056},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36420387029647827},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.1918598711490631}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412886914","title":"DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models","url":"https://doi.org/10.18653/v1/2025.acl-industry.4","published":"2025-01-01","authors":["Chengyu Wang","Junbing Yan","Yuanhao Yue","Jun Huang"],"abstract":"Enhancing computational efficiency and reducing deployment costs for large language models (LLMs) have become critical challenges in various resource-constrained scenarios.In this work, we present DistilQwen2.5, a family of distilled, lightweight LLMs derived from the public Qwen2.5 models.These distilled models exhibit enhanced instruction-following capabilities compared to the original models based on a series of distillation techniques that incorporate knowledge from much larger LLMs.In our industrial practice, we first leverage powerful proprietary LLMs with varying capacities as multi-agent teachers to select, rewrite, and refine instruction-response pairs that are more suitable for student LLMs to learn.After standard fine-tuning, we further leverage a computationally efficient model fusion approach that enables student models to progressively integrate fine-grained hidden 
knowledg...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.acl-industry.4","openalex_id":"https://openalex.org/W4412886914","cited_by_count":1,"quality_score":54,"matched_keywords":["efficient","distillation","agent","multi-agent"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.654420793056488},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5131145119667053},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3826883137226105},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34682944416999817},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.32978978753089905},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.05228373408317566},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4406857575","title":"ShapeGPT: 3D Shape Generation With a Unified Multi-Modal Language Model","url":"https://doi.org/10.1109/tmm.2025.3535389","published":"2025-01-01","authors":["Fukun Yin","Xin Chen","Chi Zhang","Biao Jiang","Zibo Zhao","Wen Liu","Gang Yu","Tao Chen"],"abstract":"The advent of large language models, which enable flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored. 
By achieving instruction-based shape generation, versatile multi-modal generative shape models can significantly benefit various fields, such as 3D virtual construction and network-aided design. In this article, we present ShapeGPT, a shape-included multi-modal framework to leverage strong pre-trained language models to address multiple shape-relevant tasks. Specifically, ShapeGPT employs a “word-sentence-paragraph” framework to discretize continuous shapes into shape words, further assembles these words into shape sentences, and integrates shape with instructional text for multi-modal paragraphs. To learn t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2025.3535389","openalex_id":"https://openalex.org/W4406857575","cited_by_count":10,"quality_score":51,"matched_keywords":["language model"],"author_affiliations":["Fudan University","ShanghaiTech University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8647238612174988},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6032269597053528},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37173357605934143},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3299352526664734},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4413553594","title":"Planner3D: LLM-enhanced Graph Prior Meets 3D Indoor Scene Explicit 
Regularization","url":"https://doi.org/10.1109/tpami.2025.3602216","published":"2025-01-01","authors":["Wei Yao","Martin Renqiang Min","George Vosselman","Li Erran Li","Michael Ying Yang"],"abstract":"Compositional 3D scene synthesis has diverse applications across a spectrum of industries such as robotics, films, and video games, as it closely mirrors the complexity of real-world multi-object environments. Conventional works typically employ shape retrieval based frameworks which naturally suffer from limited shape diversity. Recent progresses have been made in object shape generation with generative models such as diffusion models, which increases the shape fidelity. However, these approaches separately treat 3D shape generation and layout generation. The synthesized scenes are usually hampered by layout collision, which suggests that the scene-level fidelity is still under-explored. In this paper, we aim at generating realistic and reasonable 3D indoor scenes from scene graph. To enrich the priors of the given scene graph inputs, large language model is utilized to aggregate the gl...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3602216","openalex_id":"https://openalex.org/W4413553594","cited_by_count":2,"quality_score":51,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Amazon (United States)","NEC (United States)","University of Twente"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.67146235704422},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6202824711799622},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.5926557779312134},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.5421718955039978},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.43573129177093506},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.32572704553604126},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.13617241382598877}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412888592","title":"DRT: Deep Reasoning Translation via Long Chain-of-Thought","url":"https://doi.org/10.18653/v1/2025.findings-acl.351","published":"2025-01-01","authors":["Jiaan Wang","Fandong Meng","Yunlong Liang","Jie Zhou"],"abstract":"Recently, O1-like models have emerged as representative examples, illustrating the effectiveness of long chain-of-thought (CoT) in reasoning tasks such as math and coding tasks.In this paper, we introduce DRT, an attempt to bring the success of long CoT to neural machine translation (MT).Specifically, in view of the literature books that might involve similes and metaphors, translating these texts to a target language is very difficult in practice due to cultural differences.In such cases, literal translation often fails to convey the intended meaning effectively.Even for professional human translators, considerable thought must be given to preserving semantics throughout the translation process.To simulate LLMs' long thought ability in MT, we first mine sentences containing similes or metaphors from existing literature books, and then develop a multi-agent framework to translate these 
s...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.findings-acl.351","openalex_id":"https://openalex.org/W4412888592","cited_by_count":6,"quality_score":51,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6664761900901794},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6323883533477783},{"id":"https://openalex.org/C199185054","display_name":"Chain (unit)","score":0.47870567440986633},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42315375804901123},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4103561043739319},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3873363435268402},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.1742967665195465},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.16763967275619507}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4416035608","title":"SMEC:Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression","url":"https://doi.org/10.18653/v1/2025.emnlp-main.1332","published":"2025-01-01","authors":["Biao Zhang","Lixin Chen","Tong Liu","Bo Zheng"],"abstract":"Large language models (LLMs) generate highdimensional embeddings that capture rich semantic and syntactic information.However, high-dimensional embeddings exacerbate computational complexity and storage requirements, thereby hindering practical deployment.To address these challenges, we 
propose a novel training framework named Sequential Matryoshka Embedding Compression (SMEC).This framework introduces the Sequential Matryoshka Representation Learning(SMRL) method to mitigate gradient variance during training, the Adaptive Dimension Selection (ADS) module to reduce information degradation during dimension pruning, and the Selectable Cross-batch Memory (S-XBM) module to enhance unsupervised learning between high-and low-dimensional embeddings.Experiments on image, text, and multimodal datasets demonstrate that SMEC achieves significant dimensionality reduction while maintaining performanc...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.1332","openalex_id":"https://openalex.org/W4416035608","cited_by_count":0,"quality_score":49,"matched_keywords":["memory","retrieval","compression"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6636999845504761},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5411999821662903},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5386999845504761},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5073999762535095},{"id":"https://openalex.org/C180016635","display_name":"Compression (physics)","score":0.4884999990463257},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.4113999903202057},{"id":"https://openalex.org/C78548338","display_name":"Data compression","score":0.38089999556541443},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.32589998841285706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415958678","title":"ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts","url":"https://doi.org/10.1109/tip.2025.3626887","published":"2025-01-01","authors":["Xumeng Han","Longhui Wei","Zhiyang Dou","Yingfei Sun","Zhenjun Han","Qi Tian"],"abstract":"Mixture-of-Experts (MoE) models embody the divide-and-conquer concept and are a promising approach for increasing model capacity, demonstrating excellent scalability across multiple domains. In this paper, we integrate the MoE structure into the classic Vision Transformer (ViT), naming it ViMoE, and explore the potential of applying MoE to vision through a comprehensive study on image classification and semantic segmentation. However, we observe that the performance is sensitive to the configuration of MoE layers, making it challenging to obtain optimal results without careful design. The underlying cause is that inappropriate MoE layers lead to unreliable routing and hinder experts from effectively acquiring helpful information. To address this, we introduce a shared expert to learn and capture common knowledge, serving as an effective way to construct a stable ViMoE. 
Furthermore, we de...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2025.3626887","openalex_id":"https://openalex.org/W4415958678","cited_by_count":6,"quality_score":47,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7642999887466431},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.6244000196456909},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5638999938964844},{"id":"https://openalex.org/C120936955","display_name":"Empirical research","score":0.5637000203132629},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4876999855041504},{"id":"https://openalex.org/C74172769","display_name":"Routing (electronic design automation)","score":0.43630000948905945},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.4246000051498413},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.3781000077724457}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4416035743","title":"Are LLMs Better than Reported? 
Detecting Label Errors and Mitigating Their Effect on Model Performance","url":"https://doi.org/10.18653/v1/2025.emnlp-main.1360","published":"2025-01-01","authors":["Omer Nahum","Nitay Calderon","Orgad Keller","Idan Szpektor","Roi Reichart"],"abstract":"NLP benchmarks rely on standardized datasets for training and evaluating models and are crucial for advancing the field.Traditionally, expert annotations ensure high-quality labels; however, the cost of expert annotation does not scale well with the growing demand for larger datasets required by modern models.While crowd-sourcing provides a more scalable solution, it often comes at the expense of annotation precision and consistency.Recent advancements in large language models (LLMs) offer new opportunities to enhance the annotation process, particularly for detecting label errors in existing datasets.In this work, we consider the recent approach of LLM-as-a-judge, leveraging an ensemble of LLMs to flag potentially mislabeled examples.We conduct a case study on four factual consistency datasets from the TRUE benchmark, spanning diverse NLP tasks, and on SummEval, which uses Likertscale r...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.1360","openalex_id":"https://openalex.org/W4416035743","cited_by_count":6,"quality_score":47,"matched_keywords":["LLM"],"author_affiliations":["Google (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47369998693466187},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.3508000075817108},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2897000014781952},{"id":"https://openalex.org/C99498987","display_name":"Noise 
(video)","score":0.2752000093460083},{"id":"https://openalex.org/C149782125","display_name":"Econometrics","score":0.2680000066757202},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.25279998779296875},{"id":"https://openalex.org/C12174686","display_name":"Risk assessment","score":0.251800000667572},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.2498999983072281}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4414166040","title":"A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection","url":"https://doi.org/10.1109/tmm.2025.3607729","published":"2025-01-01","authors":["Shenghao Fu","Junkai Yan","Qize Yang","Xihan Wei","Xiaohua Xie","Wei‐Shi Zheng"],"abstract":"Open-vocabulary object detection (OVD) aims to detect objects beyond the training annotations, where detectors are usually aligned to a pre-trained vision-language model, e.g., CLIP, to inherit its generalizable recognition ability so that detectors can recognize new or novel objects. However, previous works directly align the feature space with CLIP and fail to learn the semantic knowledge effectively. In this work, we propose a hierarchical semantic distillation framework named HD-OVD to construct a comprehensive distillation process, which exploits generalizable knowledge from the CLIP model in three aspects. 
In the first hierarchy of HD-OVD, the detector learns fine-grained instance-wise semantics from the CLIP image encoder by modeling relations among single objects in the visual sp...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2025.3607729","openalex_id":"https://openalex.org/W4414166040","cited_by_count":2,"quality_score":47,"matched_keywords":["language model","distillation"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8646000027656555},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6891000270843506},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.579200029373169},{"id":"https://openalex.org/C31170391","display_name":"Hierarchy","score":0.5613999962806702},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5062999725341797},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5027999877929688},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.5023999810218811},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.4821000099182129}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4409872571","title":"Diverse AI Feedback For Large Language Model Alignment","url":"https://doi.org/10.1162/tacl_a_00746","published":"2025-01-01","authors":["Tianshu Yu","Ting-En
Lin","Yuchuan Wu","Min Yang","Fei Huang","Yongbin Li"],"abstract":"Abstract Recent advances in large language models (LLMs) focus on aligning models with human values to minimize harmful content. However, existing methods often rely on a single type of feedback, such as preferences, annotated labels, or critiques, which can lead to overfitting and suboptimal performance. In this paper, we propose Diverse AIFeedback (DAIF), a novel approach that integrates three types of feedback—critique, refinement, and preference—tailored to tasks of varying uncertainty levels. Through an analysis of information gain, we show that critique feedback is most effective for low-uncertainty tasks, refinement feedback for medium-uncertainty tasks, and preference feedback for high-uncertainty tasks. Training with this diversified feedback reduces overfitting and improves alignment. Experimental results across three tasks—question answering, dialog generation, and text summar...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00746","openalex_id":"https://openalex.org/W4409872571","cited_by_count":1,"quality_score":46,"matched_keywords":["language model","preference"],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","Shenzhen Institutes of Advanced Technology","Shenzhen Technology University","Shenzhen University","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8772175312042236},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5533381700515747},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5406972169876099},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5049698948860168},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.33180615305900574}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416034087","title":"AirRAG: Autonomous Strategic Planning and Reasoning Steer Retrieval Augmented Generation","url":"https://doi.org/10.18653/v1/2025.findings-emnlp.1030","published":"2025-01-01","authors":["Wenfeng Feng","Chuzhan Hao","Yuewei Zhang","Guochao Jiang","Jingyi Song"],"abstract":"Leveraging the autonomous decision-making capabilities of large language models (LLMs) has demonstrated superior performance in reasoning tasks.However, despite the success of iterative or agentic retrieval-augmented generation (RAG) techniques, these methods are often constrained to a single solution space when confronted with complex problems.In this paper, we propose a novel thinking pattern in RAG that integrates autonomous strategic planning with efficient reasoning actions, significantly activating intrinsic reasoning capabilities and expanding the solution space of specific tasks via Monte Carlo Tree Search (MCTS), which we refer to as AirRAG.Specifically, our approach designs five fundamental reasoning actions, which are expanded to a broad tree-based reasoning space using MCTS.The approach also incorporates self-consistency verification to explore potential reasoning paths and i...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.findings-emnlp.1030","openalex_id":"https://openalex.org/W4416034087","cited_by_count":1,"quality_score":46,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Alibaba Group 
(China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6082000136375427},{"id":"https://openalex.org/C48243021","display_name":"Strategic planning","score":0.46630001068115234},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4359000027179718},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3521000146865845},{"id":"https://openalex.org/C195094911","display_name":"Process management","score":0.34360000491142273},{"id":"https://openalex.org/C153715457","display_name":"Augmented reality","score":0.3131999969482422},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.2971000075340271},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.2874000072479248}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411472261","title":"WI3D: Weakly Incremental 3D Detection via Vision Foundation Models","url":"https://doi.org/10.1109/tmm.2025.3581776","published":"2025-01-01","authors":["Mingsheng Li","Sijin Chen","Shengji Tang","Hongyuan Zhu","Yanyan Fang","Xin Chen","Zhuoyuan Li","Fukun Yin","Tao Chen"],"abstract":"Class-incremental 3D object detection demands a 3D detector to locate and recognize novel categories in a stream fashion while preserving its base detection ability. However, existing methods require delicate 3D annotations for learning novel categories, resulting in significant labeling costs.
To this end, we explore a label-efficient approach called Weakly Incremental 3D ...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2025.3581776","openalex_id":"https://openalex.org/W4411472261","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","distillation"],"author_affiliations":["Agency for Science, Technology and Research","Fudan University","Institute for Infocomm Research","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8157126903533936},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.658252477645874},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43311864137649536},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3564460277557373},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.0},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412986305","title":"Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts","url":"https://doi.org/10.1109/tpami.2025.3596160","published":"2025-01-01","authors":["Xiang Deng","Youxin Pang","Xiaochen Zhao","Chao Xu","Lizhen Wang","Hongjiang Xiao","Shi
Yan","Hongwen Zhang","Yebin Liu"],"abstract":"This paper introduces Stereo-Talker, a novel one-shot audio-driven human video synthesis system that generates 3D talking videos with precise lip synchronization, expressive body gestures, temporally consistent photo-realistic quality, and continuous viewpoint control. The process follows a two-stage approach. In the first stage, the system maps audio input to high-fidelity motion sequences, encompassing upper-body gestures and facial expressions. To enrich motion diversity and authenticity, large language model (LLM) priors are integrated with text-aligned semantic audio features, leveraging LLMs' cross-modal generalization power to enhance motion quality. In the second stage, we improve diffusion-based video generation models by incorporating a prior-guided Mixture-of-Experts (MoE) mechanism: a view-guided MoE focuses on view-specific attributes, while a mask-guided MoE enhances region...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3596160","openalex_id":"https://openalex.org/W4412986305","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple (United States)","Beijing Normal University","Communication University of China","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6871227622032166},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6008595824241638},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5379716753959656},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4891660511493683},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.32372575998306274}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416033648","title":"Shy-hunyuan-MT at WMT25 General Machine Translation Shared Task","url":"https://doi.org/10.18653/v1/2025.wmt-1.36","published":"2025-01-01","authors":["Mao Zheng","Zheng Li","Yang Du","Bingxin Qu","Mingyang Song"],"abstract":"In this paper, we present our submission to the WMT25 shared task on machine translation, for which we propose Synergy-enhanced policy optimization framework, named Shy.This novel two-phase training framework synergistically combines knowledge distillation and fusion via reinforcement learning.In the first phase, we introduce a multi-stage training framework that harnesses the complementary strengths of multiple state-of-the-art large language models to generate diverse, high-quality translation candidates.These candidates serve as pseudoreferences to guide the supervised fine-tuning of our model, Hunyuan-7B, effectively distilling the collective knowledge of multiple expert systems into a single efficient model.In the second phase, we further refine the distilled model through Group Relative Policy Optimization, a reinforcement learning technique that employs a composite reward function...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.wmt-1.36","openalex_id":"https://openalex.org/W4416033648","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","distillation"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7087000012397766},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.6182000041007996},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.5467000007629395},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5062000155448914},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4797999858856201},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.41190001368522644},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.3005000054836273},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.2702000141143799}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412571773","title":"REAL Sampling: Boosting Factuality and Diversity of Open-ended Generation by Extrapolating the Entropy of an Infinitely Large LM","url":"https://doi.org/10.1162/tacl_a_00757","published":"2025-01-01","authors":["Haw-Shiuan Chang","Nanyun Peng","Mohit Bansal","Anil Ramakrishna","Tagyoung Chung"],"abstract":"Abstract Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. In this paper, we propose REAL (Residual Entropy from Asymptotic Line) sampling,1 which predicts the step-wise hallucination likelihood of an LLM. When an LLM is likely to hallucinate, REAL lowers the p threshold in nucleus sampling. Otherwise, REAL sampling increases the p threshold to boost the diversity. 
To predict the step-wise hallucination likelihood without supervision, we construct a THF (Token-level Hallucination Forecasting) model, which predicts the asymptotic entropy (i.e., inherent uncertainty) of the next token by extrapolating the next-token entropies of an infinitely large language model from a series of LLMs with different sizes. If an LLM’s entropy is higher than the asymptotic entropy (i.e., the LLM is more uncertain than....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00757","openalex_id":"https://openalex.org/W4412571773","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Amazon (Germany)","Amazon (United States)","UMass Memorial Health Care"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.8418110609054565},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7145786285400391},{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of time)","score":0.5907045006752014},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49455973505973816},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.48469236493110657},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42535415291786194},{"id":"https://openalex.org/C2781316041","display_name":"Diversity (politics)","score":0.4140772223472595},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.1405288279056549}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413469181","title":"RA-BLIP: Multimodal Adaptive 
Retrieval-Augmented Bootstrapping Language-Image Pre-Training","url":"https://doi.org/10.1109/tmm.2025.3599070","published":"2025-01-01","authors":["Mengmeng Ding","Yang Ma","Pengda Qin","Jianlong Wu","Yuhong Li","Liqiang Nie"],"abstract":"Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. MLLMs involve significant external knowledge within their parameters; however, it is challenging to continually update these models with the latest knowledge, which involves huge computational costs and poor interpretability. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
We first leverage the question to instruct the extraction of visual information through interactions with one set of le...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2025.3599070","openalex_id":"https://openalex.org/W4413469181","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Harbin Institute of Technology","Shenzhen Institute of Information Technology","The University of Sydney"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8507159948348999},{"id":"https://openalex.org/C207609745","display_name":"Bootstrapping (finance)","score":0.6849023699760437},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5387680530548096},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.4904763400554657},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.45528802275657654},{"id":"https://openalex.org/C199579030","display_name":"Automatic image annotation","score":0.4190182089805603},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.35673987865448},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.34602057933807373}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4408971910","title":"Pilot Study of Retrieval-Augmented Generation Model in Recommending Traditional Chinese Medicine Formulations","url":"https://doi.org/10.1007/978-3-031-86323-3_38","published":"2025-01-01","authors":["Yan-Wo Chan","Po‐Yu Huang","Zhi-Liang Chen","C. 
Wang","Wen‐Chen Lin","Jung-Peng Chiu","Yi-Chun Chiu","Yang-Hsien Lin","E X Huang","Simon See","Kang‐Ping Lin"],"abstract":"Retrieval-Augmented Generation (RAG) is a method used to optimize the output of large language models (LLMs). This study investigates the feasibility of using an LLM within a RAG framework to generate recommendations for Traditional Chinese Medicine (TCM) formulations. The study employs the mixtral-8x7b model as the LLM within the RAG architecture, utilizing clinical records from outpatient TCM visits as external data sources for generating TCM formulation recommendations. The recommendations from the RAG-based LLM are compared with those generated by the ChatGPT 3.5 model, evaluating their consistency with actual clinical prescriptions. Results indicate that the RAG-based LLM achieved an average score of 74, demonstrating a high level of alignment with clinical prescriptions across the cases studied. In contrast, the ChatGPT 3.5 model only achieved an average score of 25, primarily due....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-86323-3_38","openalex_id":"https://openalex.org/W4408971910","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Chung Yuan Christian University","Nvidia (United States)","Taipei City Hospital","Taipei Municipal YangMing Hospital","The Medical Device (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C188947578","display_name":"Traditional Chinese medicine","score":0.5178523659706116},{"id":"https://openalex.org/C556039675","display_name":"Traditional medicine","score":0.4714580178260803},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.46591830253601074},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.4592234492301941},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.3667503297328949},{"id":"https://openalex.org/C204787440","display_name":"Alternative medicine","score":0.2878066897392273},{"id":"https://openalex.org/C142724271","display_name":"Pathology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411100024","title":"Online Learning via Memory: Retrieval-Augmented Detector Adaptation","url":"https://doi.org/10.1007/978-3-031-91578-9_19","published":"2025-01-01","authors":["Yanan Jian","Fuxun Yu","Qi Zhang","W. I. Levine","Brandon Dubbs","Nikolaos Karianakis"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-91578-9_19","openalex_id":"https://openalex.org/W4411100024","cited_by_count":0,"quality_score":45,"matched_keywords":["memory","retrieval"],"author_affiliations":["Microsoft (Finland)","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8740971684455872},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.5603588223457336},{"id":"https://openalex.org/C94915269","display_name":"Detector","score":0.5539987683296204},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4323929250240326},{"id":"https://openalex.org/C169760540","display_name":"Neuroscience","score":0.07479402422904968},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.05350607633590698},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.051652878522872925}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2505.01681","title":"Large language model driven development of turbulence models","url":"http://arxiv.org/abs/2505.01681","published":"2025-01-01","authors":["Zhongxin Yang","Yuanwei Bin","Yipeng Shi","Xiang I. A. Yang"],"abstract":"Abstract Artificial intelligence (AI) has achieved human-level performance in specialised tasks such as Go, image recognition and protein folding, raising the prospect of an AI singularity – where machines not only match, but surpass human reasoning. Here, we demonstrate a step towards this vision in the context of turbulence modelling. By treating a large language model (LLM), DeepSeek-R1, as an equal partner, we establish a closed-loop, iterative workflow in which the LLM proposes, refines and reasons about near-wall turbulence models under adverse pressure gradients (APGs), system rotation and surface roughness. Through multiple rounds of interaction involving long-chain reasoning and a priori and a posteriori evaluations, the LLM generates models that not only rediscover established strategies, but also synthesise new ones that outperform baseline wall models. 
Specifically, it recomm...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1017/flo.2025.10032","openalex_id":"https://openalex.org/W4415027759","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Eastern Institute of Technology","Peking University","Pennsylvania State University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6348000168800354},{"id":"https://openalex.org/C74050887","display_name":"Rotation (mathematics)","score":0.5810999870300293},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5579000115394592},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.5354999899864197},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.508899986743927},{"id":"https://openalex.org/C2776542497","display_name":"Development (topology)","score":0.4607999920845032},{"id":"https://openalex.org/C196558001","display_name":"Turbulence","score":0.4544999897480011},{"id":"https://openalex.org/C2776799497","display_name":"Surface (topology)","score":0.4316999912261963}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2510.10401","title":"Knowledge-Decoupled Functionally Invariant Path With Synthetic Personal Data for Personalized ASR","url":"http://arxiv.org/abs/2510.10401","published":"2025-01-01","authors":["Yue Gu","Zhihao Du","Ying Shi","Jiqing Han","Yongjun He"],"abstract":"Fine-tuning generic ASR models with large-scale synthetic personal data can enhance the personalization of ASR models, but it introduces challenges in adapting to synthetic personal data 
without forgetting real knowledge, and in adapting to personal data without forgetting generic knowledge. Considering that the functionally invariant path (FIP) framework enables model adaptation while preserving prior knowledge, in this letter, we introduce FIP into synthetic-data-augmented personalized ASR models. However, the model still struggles to balance the learning of synthetic, personalized, and generic knowledge when applying FIP to train the model on all three types of data simultaneously. To decouple this learning process and further address the above two challenges, we integrate a gated parameter-isolation strategy into FIP and propose a knowledge-decoupled functionally invariant path (KDFI...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lsp.2025.3621332","openalex_id":"https://openalex.org/W4415178749","cited_by_count":0,"quality_score":45,"matched_keywords":["personalized","personalization"],"author_affiliations":["Alibaba Group (China)","Harbin Institute of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7975999712944031},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.6559000015258789},{"id":"https://openalex.org/C7149132","display_name":"Forgetting","score":0.6432999968528748},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5171999931335449},{"id":"https://openalex.org/C160920958","display_name":"Synthetic data","score":0.49070000648498535},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.4595000147819519},{"id":"https://openalex.org/C190470478","display_name":"Invariant (physics)","score":0.4101000130176544},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.39899998903274536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409177605","title":"Correction to: A Data-Efficient Nearest-Neighbor Language Model via Lightweight Nets","url":"https://doi.org/10.1007/978-981-96-2292-4_14","published":"2025-01-01","authors":["Qinhao Zhou","Xiang Xiang","Ke Wang","Yuqi Zhang","Yuchuan Wu","Yongbin Li"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-2292-4_14","openalex_id":"https://openalex.org/W4409177605","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","efficient"],"author_affiliations":["Alibaba Group (China)","Huazhong University of Science and Technology","Zhejiang Sci-Tech University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7488670349121094},{"id":"https://openalex.org/C113238511","display_name":"k-nearest neighbors algorithm","score":0.5284162759780884},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.39693647623062134},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34259581565856934},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3191866874694824}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407719230","title":"A Data-Efficient Nearest-Neighbor Language Model via Lightweight Nets","url":"https://doi.org/10.1007/978-981-96-2292-4_1","published":"2025-01-01","authors":["Qinhao Zhou","Xiang Xiang","Ke Wang","Yuqi Zhang","Yuchuan 
Wu","Yongbin Li"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-2292-4_1","openalex_id":"https://openalex.org/W4407719230","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","efficient"],"author_affiliations":["Alibaba Group (China)","Huazhong University of Science and Technology","Zhejiang Sci-Tech University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.622941255569458},{"id":"https://openalex.org/C113238511","display_name":"k-nearest neighbors algorithm","score":0.580917477607727},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33100128173828125}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416036377","title":"TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning","url":"https://doi.org/10.18653/v1/2025.emnlp-main.710","published":"2025-01-01","authors":["Xiaohan Yu","Jian Pu","Chong Chen"],"abstract":"Retrieval-Augmented Generation (RAG) has demonstrated considerable effectiveness in open-domain question answering. However, when applied to heterogeneous documents, comprising both textual and tabular components, existing RAG approaches exhibit critical limitations. The prevailing practice of flattening tables and chunking strategies disrupts the intrinsic tabular structure, leads to information loss, and undermines the reasoning capabilities of LLMs in multi-hop, global queries. To address these challenges, we propose TableRAG, an SQL-based framework that unifies textual understanding and complex manipulations over tabular data. TableRAG iteratively operates in four steps: 
context-sensitive query decomposition, text retrieval, SQL programming and execution, and compositional intermediate answer generation.We also develop HeteQA, a novel benchmark designed to evaluate the multi-hop heteroge...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.710","openalex_id":"https://openalex.org/W4416036377","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7020000219345093},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42800000309944153},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3752000033855438},{"id":"https://openalex.org/C161156560","display_name":"Document retrieval","score":0.35420000553131104},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.2791000008583069},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.2581000030040741},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.2500999867916107},{"id":"https://openalex.org/C20162079","display_name":"Case-based reasoning","score":0.24560000002384186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4416035023","title":"Retrieval-Augmented Machine Translation with Unstructured Knowledge","url":"https://doi.org/10.18653/v1/2025.findings-emnlp.313","published":"2025-01-01","authors":["Jiaan Wang","Fandong Meng","Yingxue Zhang","Jie Zhou"],"abstract":"Retrieval-augmented generation (RAG) introduces additional information 
to enhance large language models (LLMs). In machine translation (MT), previous work typically retrieves in-context examples from paired MT corpora, or domain-specific knowledge from knowledge graphs, to enhance MT models. However, a large amount of world knowledge is organized in unstructured documents, and might not be fully paired across different languages. In this paper, we study retrieval-augmented MT using unstructured documents. Specifically, we build RAGtrans, the first benchmark to train and evaluate LLMs' retrieval-augmented MT ability. RAGtrans contains 169K MT samples collected via GPT-4o and human translators. Besides, documents from various languages are also provided to supply the knowledge to these samples. Based on RAGtrans, we further propose a multi-task training method to teach LLMs how to use information...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.findings-emnlp.313","openalex_id":"https://openalex.org/W4416035023","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7175999879837036},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.5906999707221985},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5537999868392944},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5475999712944031},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.3783999979496002},{"id":"https://openalex.org/C115925183","display_name":"Knowledge-based systems","score":0.3531000018119812},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and 
reasoning","score":0.2928999960422516},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.28049999475479126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4411119405","title":"Measuring memorization in language models via probabilistic extraction","url":"https://doi.org/10.18653/v1/2025.naacl-long.469","published":"2025-01-01","authors":["Jamie Hayes","Marika Swanberg","Harsh Chaudhari","Itay Yona","Ilia Shumailov","Milad Nasr","Christopher A. Choquette-Choo","Katherine Lee","A. Feder Cooper"],"abstract":"Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov, Milad Nasr, Christopher A. Choquette-Choo, Katherine Lee, A. Feder Cooper. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.naacl-long.469","openalex_id":"https://openalex.org/W4411119405","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7225451469421387},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.712540328502655},{"id":"https://openalex.org/C30038468","display_name":"Memorization","score":0.5531625747680664},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49786996841430664},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.4942479431629181},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.45598629117012024},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43722736835479736},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.15177449584007263}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4412889674","title":"Analyzing and Mitigating Inconsistency in Discrete Speech Tokens for Neural Codec Language Models","url":"https://doi.org/10.18653/v1/2025.acl-long.1498","published":"2025-01-01","authors":["Wenrui Liu","Zhifang Guo","Jin Xu","Yuanjun Lv","Yunfei Chu","Zemin Liu","Junyang Lin"],"abstract":"Building upon advancements in Large Language Models (LLMs), the field of audio processing has seen increased interest in training speech generation tasks with discrete speech token sequences. However, directly discretizing speech by neural audio codecs often results in sequences that fundamentally differ from text sequences. Unlike text, where text token sequences are deterministic, discrete speech tokens can exhibit significant variability based on contextual factors, while still producing perceptually identical audio segments. We refer to this phenomenon as Discrete Representation Inconsistency (DRI). This inconsistency can lead to a single speech segment being represented by multiple divergent sequences, which creates confusion in neural codec language models and results in poor generated speech. In this paper, we quantitatively analyze the DRI phenomenon within popular audio tokenizers 
su...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.acl-long.1498","openalex_id":"https://openalex.org/W4412889674","cited_by_count":3,"quality_score":44,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.791224479675293},{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.5994244813919067},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5772507786750793},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4740849733352661},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4726390242576599},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3842388391494751},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.080717533826828}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4409152602","title":"ALPS: An Auto-Labeling and Pre-Training Scheme for Remote Sensing Segmentation With Segment Anything Model","url":"https://doi.org/10.1109/tip.2025.3556344","published":"2025-01-01","authors":["Song Zhang","Qingzhong Wang","Junyi Liu","Haoyi Xiong"],"abstract":"In the fast-growing field of Remote Sensing (RS) image analysis, the gap between massive unlabeled datasets and the ability to fully utilize these datasets for advanced RS analytics presents a significant challenge. 
To fill the gap, our work introduces an innovative auto-labeling framework named ALPS (Automatic Labeling for Pre-training in Segmentation), which leverages the Segment Anything Model (SAM) to predict precise pseudo-labels for RS images without necessitating prior annotations or additional prompts. The proposed pipeline significantly reduces the labor and resource demands traditionally associated with annotating RS datasets. By constructing two comprehensive pseudo-labeled RS datasets via ALPS for pre-training purposes, our approach enhances the performance of downstream tasks across various benchmarks, including iSAID and ISPRS Potsdam. Experiments demonstrate the effectiven...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2025.3556344","openalex_id":"https://openalex.org/W4409152602","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Aerospace Information Research Institute","Baidu (China)","Chinese Academy of Sciences","Institute of Electronics"],"concepts":[{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.7221499681472778},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.6816487312316895},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.6702038645744324},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6553364396095276},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6268693208694458},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5907884240150452},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.49674636125564575},{"id":"https://openalex.org/C51632099","display_name":"Training 
set","score":0.41073551774024963}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4413785914","title":"A Novel Approach to Mimic Linguistic Features of Transcripts of Atypical Speech","url":"https://doi.org/10.1109/taslpro.2025.3603921","published":"2025-01-01","authors":["Mahya Mirgbagheri","Hamidreza Saghir","Tom Chau"],"abstract":"It is often infeasible and costly to obtain speech data from individuals who exhibit atypical linguistic patterns. Although there have been considerable advances in the field of text generation, concerns remain about the potential leakage of identifiable information when generating data from patient speech transcripts. In response to this challenge, we present a novel approach to mimic the linguistic patterns found in transcripts of atypical speech. Our method estimates the probability distributions of linguistic features, devoid of sensitive information and unique to atypical typical speech. With these probability distributions, we create a mimicked atypical transcript by modifying selected content of a transcript of typical speech. 
To evaluate this mimicry strategy, three testing scenarios were considered: baseline test, where a classifier was trained and tested on real data; cross-dat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3603921","openalex_id":"https://openalex.org/W4413785914","cited_by_count":3,"quality_score":44,"matched_keywords":["LLM"],"author_affiliations":["Holland Bloorview Kids Rehabilitation Hospital","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6829647421836853},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5801845788955688},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5595728158950806},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4836674630641937},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4683040380477905},{"id":"https://openalex.org/C61328038","display_name":"Speech processing","score":0.4474961459636688},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4408287340","title":"The Blessing of Reasoning: LLM-Based Contrastive Explanations in Black-Box Recommender Systems","url":"https://doi.org/10.2139/ssrn.5099067","published":"2025-01-01","authors":["Yuyan Wang","Pan Li","Minmin 
Chen"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.5099067","openalex_id":"https://openalex.org/W4408287340","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["Georgia Institute of Technology","Google (United States)"],"concepts":[{"id":"https://openalex.org/C2776195157","display_name":"Blessing","score":0.9536190032958984},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5872044563293457},{"id":"https://openalex.org/C94966114","display_name":"Black box","score":0.5399025082588196},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5321649312973022},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37388139963150024},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3605429530143738},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.23724031448364258},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.12902840971946716}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4407950003","title":"QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture","url":"https://doi.org/10.1109/lca.2025.3541961","published":"2025-01-01","authors":["Shvetank Prakash","Andrew Cheng","Jason Yik","Arya Tschand","Radhika Ghosal","Ikechukwu Uchendu","Jessica Quaye","Jeffrey Ma","Shreyas Grampurohit","Sofia Giannuzzi","Arnav Balyan","Fin Amin"],"abstract":"We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models’ understanding of 
computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. We observe notable struggles on QAs regarding memory systems and interconnection networks. Fine-tuning with QuArch improves small model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and the leaderboard are accessible at https://quarch.ai/.","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lca.2025.3541961","openalex_id":"https://openalex.org/W4407950003","cited_by_count":2,"quality_score":43,"matched_keywords":["memory"],"author_affiliations":["Google (United Kingdom)","Google (United States)","Google DeepMind (United Kingdom)","Harvard University Press","Indian Institute of Technology Bombay","North Carolina State University","Qualcomm (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8205617070198059},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.6997362971305847},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.6173240542411804},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49584850668907166},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49255236983299255},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3536699414253235},{"id":"https://openalex.org/C118524514","display_name":"Computer 
architecture","score":0.33318617939949036},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4406949348","title":"EHealth: A Chinese Biomedical Language Model Built via Multi-Level Text Discrimination","url":"https://doi.org/10.1109/taslpro.2025.3536177","published":"2025-01-01","authors":["Quan Wang","Songtai Dai","Benfeng Xu","Yajuan Lyu","Hua Wu","Haifeng Wang"],"abstract":"Pre-trained language models (PLMs) have recently revolutionized the field of natural language processing, impacting not only the general domain but also the biomedical domain. Most previous studies on constructing biomedical PLMs relied simply on domain adaptation and focused mainly on English. This work introduces eHealth, a compact, encoder-based Chinese biomedical PLM that can be fine-tuned in a customized manner to effectively handle various Chinese biomedical language understanding tasks. Rather than relying on domain adaptation, eHealth is built from scratch using a novel pre-training framework. This framework trains eHealth as a discriminator through token- and sequence-level discrimination. 
Token-level discrimination detects corrupted input tokens and recovers their original forms from plausible candidates, while sequence-level discrimination further distinguishes corruptions of....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3536177","openalex_id":"https://openalex.org/W4406949348","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Beijing University of Posts and Telecommunications","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C202645933","display_name":"eHealth","score":0.839866042137146},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7042616605758667},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5277358293533325},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4248684346675873},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4121930003166199},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.33403465151786804},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.156112402677536},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412719241","title":"AnyEnhance: A Unified Generative Model With Prompt-Guidance and Self-Critic for Voice Enhancement","url":"https://doi.org/10.1109/taslpro.2025.3587393","published":"2025-01-01","authors":["J. S. 
Zhang","Jing Yang","Zihao Fang","Yuancheng Wang","Zehua Zhang","Zhuo Wang","Fan Fan","Zhizheng Wu"],"abstract":"We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Based on a masked generative model, AnyEnhance is capable of handling both speech and singing voices, supporting a wide range of enhancement tasks including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker’s timbre. In this way, it could boost enhancem...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3587393","openalex_id":"https://openalex.org/W4412719241","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Huawei Technologies (China)","Shenzhen Research Institute of Big Data"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7110299468040466},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5857429504394531},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5620591044425964},{"id":"https://openalex.org/C2776182073","display_name":"Speech 
enhancement","score":0.5280357599258423},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4810197949409485},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38262614607810974},{"id":"https://openalex.org/C163294075","display_name":"Noise reduction","score":0.09209829568862915}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4409641247","title":"Privacy-Diffusion: Privacy-Preserving Stable Diffusion Without FHE and Differential Privacy","url":"https://doi.org/10.1109/access.2025.3562563","published":"2025-01-01","authors":["Po-Chu Hsu","Zhong Yuan Yu","Shuhei Mise","Hideaki Miyaji"],"abstract":"Text-to-image generation is trending in the generative artificial intelligence (GenAI) field. Among open-sourced image generation projects, Stable Diffusion is the state-of-the-art. Many artists and service providers customize the diffusion model to generate featured high-quality images. However, there is no protection to the privacy of the input text prompt, output image, and customized model. Privacy is very important since it can increase users’ willingness to use the service and protect the service provider’s intellectual property. Existing privacy-preserving diffusion model require fully homomorphic encryption (FHE) to ensure its privacy and security. Nonetheless, FHE is very time-consuming and may reduce accuracy due to approximations and deteriorate image quality. In this research, we propose Privacy-Diffusion, a privacy-preserving diffusion framework without FHE. 
By utilizing the...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2025.3562563","openalex_id":"https://openalex.org/W4409641247","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","Ritsumeikan University"],"concepts":[{"id":"https://openalex.org/C23130292","display_name":"Differential privacy","score":0.7410914897918701},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5962035655975342},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5881112813949585},{"id":"https://openalex.org/C108827166","display_name":"Internet privacy","score":0.5327190160751343},{"id":"https://openalex.org/C123201435","display_name":"Information privacy","score":0.5321288108825684},{"id":"https://openalex.org/C509729295","display_name":"Privacy software","score":0.4796586334705353},{"id":"https://openalex.org/C3017597292","display_name":"Privacy protection","score":0.4302777945995331},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4281771183013916}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408105642","title":"OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure","url":"https://doi.org/10.1162/tacl_a_00735","published":"2025-01-01","authors":["Jikai Wang","Yi Su","Juntao Li","Qingrong Xia","Zi Ye","Xinyu Duan","Zhefeng Wang","Min Zhang"],"abstract":"Autoregressive language models demonstrate excellent performance in various scenarios. However, the inference efficiency is limited by its one-step-one-word generation mode, which has become a pressing problem recently as the models become increasingly larger. 
Speculative decoding employs a “draft and then verify” mechanism to allow multiple tokens to be generated in one step, realizing lossless acceleration. Existing methods mainly adopt fixed heuristic draft structures, which do not adapt to different situations to maximize the acceptance length during verification. To alleviate this dilemma, we propose OPT-Tree, an algorithm to construct adaptive and scalable draft trees, which can be applied to any autoregressive draft model. It searches the optimal tree structure that maximizes the mathematical expectation of the acceptance length in each decoding step. Experimental results...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00735","openalex_id":"https://openalex.org/W4408105642","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Soochow University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7862807512283325},{"id":"https://openalex.org/C113174947","display_name":"Tree (set theory)","score":0.7584524154663086},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.6503328084945679},{"id":"https://openalex.org/C163797641","display_name":"Tree structure","score":0.5972824096679688},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3228369951248169},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3087623119354248},{"id":"https://openalex.org/C197855036","display_name":"Binary tree","score":0.09974974393844604},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.07672718167304993}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":5}},{"id":"openalex:W4408442274","title":"Instant Gaussian Splatting Generation for High-Quality and Real-Time Facial Asset Rendering","url":"https://doi.org/10.1109/tpami.2025.3550195","published":"2025-01-01","authors":["Dafei Qin","Hongyang Lin","Qixuan Zhang","Kaichun Qiao","Longwen Zhang","Jun Saito","Zijun Zhao","Jingyi Yu","Lan Xu","Taku Komura"],"abstract":"Traditional and AI-driven modeling techniques enable high-fidelity 3D asset generation from scans, videos, or text prompts. However, editing and rendering these assets often involves a trade-off between quality and speed. In this paper, we propose GauFace, a novel Gaussian Splatting representation, tailored for efficient rendering of facial mesh with textures. Then, we introduce TransGS, a diffusion transformer that instantly generates the GauFace assets from mesh, textures and lightning conditions. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussian Points, a novel texel-aligned sampling scheme with UV positional encoding to enhance the throughput of generating GauFace assets. Once trained, TransGS can generate GauFace assets in 5 seconds, delivering high fidelity and real-time facial interaction of 30fps@1440p to a Snapdragon 8 Gen 2 mobile platform. 
The...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2025.3550195","openalex_id":"https://openalex.org/W4408442274","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Nvidia (United States)","ShanghaiTech University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7945034503936768},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.6322637796401978},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6317515969276428},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.566909670829773},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5261856317520142},{"id":"https://openalex.org/C2779432360","display_name":"Instant","score":0.4738845229148865},{"id":"https://openalex.org/C83248878","display_name":"Active appearance model","score":0.4344955086708069},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.429293155670166}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4409019114","title":"Enhancing Language Models via HTML DOM Tree for Text Structure Understanding","url":"https://doi.org/10.1109/taslpro.2025.3555098","published":"2025-01-01","authors":["Hangdi Xing","Zirui Shao","Feiyu Gao","Jiajun Bu","Zhi Yu","Qi Zheng","Jingjun Gu","Xiaozhong Liu"],"abstract":"Understanding text structure, which enables the automated system to parse long text structure, is crucial for various natural language processing applications such as information extraction, summarization, and question 
answering. Although previous methods have advanced text structure parsing effectively, they face challenges such as not leveraging the abundance of unlabelled data and focusing mainly on content-inferred information. To address this deficiency, this paper introduces a novel Text Structure Language Model (TSLM), an LM pre-training framework that employs ubiquitous HTML documents and considers the text structure among text units. HTML documents are composed by experts and their hierarchies can reflect the structure of documents. Our learning framework is designed to equip the LM with awareness of two complementary kinds of structures from HTML documents. It encourages the mo...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslpro.2025.3555098","openalex_id":"https://openalex.org/W4409019114","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Worcester Polytechnic Institute","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7491384744644165},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.7211087942123413},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5314461588859558},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4334052801132202},{"id":"https://openalex.org/C113174947","display_name":"Tree (set theory)","score":0.42872104048728943},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.33314889669418335},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08609005808830261},{"id":"https://openalex.org/C134306372","display_name":"Mathematical 
analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414008295","title":"Efficient Alignment of Unconditioned Action Prior for Language-Conditioned Pick and Place in Clutter","url":"https://doi.org/10.1109/tase.2025.3606549","published":"2025-01-01","authors":["Kechun Xu","Xunlong Xia","Kaixuan Wang","Yifei Yang","Yunxuan Mao","Bing Deng","Jieping Ye","Rong Xiong","Yue Wang"],"abstract":"We study the task of language-conditioned pick and place in clutter, where a robot should grasp a target object in open clutter and move it to a specified place. Some approaches learn end-to-end policies with features from vision foundation models, requiring large datasets. Others combine foundation models in a zero-shot setting, suffering from cascading errors. In addition, they primarily leverage vision and language foundation models, focusing less on action priors. In this paper, we aim to develop an effective policy by integrating foundation priors from vision, language, and action. We propose A², an action prior alignment method that aligns unconditioned action priors with 3D vision-language priors by learning one attention layer. 
The alignment formulation enables our policy to train...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tase.2025.3606549","openalex_id":"https://openalex.org/W4414008295","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C132094186","display_name":"Clutter","score":0.8019574880599976},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.6750104427337646},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5401261448860168},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4398939609527588},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.38746124505996704},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.32195448875427246},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.205856591463089},{"id":"https://openalex.org/C554190296","display_name":"Radar","score":0.19468653202056885}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416036455","title":"EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models","url":"https://doi.org/10.18653/v1/2025.emnlp-main.1059","published":"2025-01-01","authors":["Tao Zou","Xinghua Zhang","Haiyang Yu","Minzheng Wang","Fei Huang","Yongbin Li"],"abstract":"With the development and widespread application of large language models (LLMs), the new paradigm of \"Model as Product\" is rapidly evolving, and demands higher capabilities to address complex user needs, often 
requiring precise workflow execution which involves the accurate understanding of multiple tasks.However, existing benchmarks focusing on single-task environments with limited constraints lack the complexity required to fully reflect real-world scenarios.To bridge this gap, we present the Extremely Complex Instruction Following Benchmark (EIFBENCH), meticulously crafted to facilitate a more realistic and robust evaluation of LLMs.EIFBENCH not only includes multi-task scenarios that enable comprehensive assessment across diverse task types concurrently, but also integrates a variety of constraints, replicating complex operational environments.Furthermore, we propose the Segment Poli...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.1059","openalex_id":"https://openalex.org/W4416036455","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.705299973487854},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49939998984336853},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4781000018119812},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4681999981403351},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.3799999952316284},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3479999899864197},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.28780001401901245},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.2831000089645386}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411148253","title":"A Task Complexity Evaluation Framework for Mobile AI Agent Applications","url":"https://doi.org/10.1007/978-3-031-94171-9_43","published":"2025-01-01","authors":["Jiazhi Wen","C. H. Li","Yuwei Yang","Jianye Li"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-94171-9_43","openalex_id":"https://openalex.org/W4411148253","cited_by_count":1,"quality_score":42,"matched_keywords":["agent"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7104499936103821},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6261107921600342},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.38600170612335205},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3380807638168335},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.0842161476612091},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.08113518357276917}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4416036309","title":"Teaching Your Models to Understand Code via Focal Preference Alignment","url":"https://doi.org/10.18653/v1/2025.emnlp-main.707","published":"2025-01-01","authors":["Jie Wu","Haoling Li","Xin Zhang","Xiao Liu","Yangyu Huang","Jianwen Luo","Yizhen Zhang","Zuchao Li","Ruihang 
Chu","Yujiu Yang","Shan Li"],"abstract":"Jie Wu, Haoling Li, Xin Zhang, Xiao Liu, Yangyu Huang, Jianwen Luo, Yizhen Zhang, Zuchao Li, Ruihang Chu, Yujiu Yang, Scarlett Li. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.707","openalex_id":"https://openalex.org/W4416036309","cited_by_count":0,"quality_score":41,"matched_keywords":["preference"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6450999975204468},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5374000072479248},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.49790000915527344},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48089998960494995},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.44179999828338623},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4041000008583069},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.40389999747276306},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36579999327659607}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410772741","title":"Storytelling Video Generation with Retrieval Augmentation and Character Consistency","url":"https://doi.org/10.1007/978-3-031-92808-6_14","published":"2025-01-01","authors":["Yingqing He","Menghan Xia","Haoxin Chen","Xiaodong 
Cun","Yuan Gong","Jinbo Xing","Yong Zhang","Xintao Wang","Chao Weng","Ying Shan","Qifeng Chen"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-92808-6_14","openalex_id":"https://openalex.org/W4410772741","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Hong Kong University of Science and Technology","Tencent (China)","Tsinghua University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8319573402404785},{"id":"https://openalex.org/C2776538412","display_name":"Storytelling","score":0.7764316201210022},{"id":"https://openalex.org/C2780861071","display_name":"Character (mathematics)","score":0.7213298678398132},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6983792781829834},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.4236499071121216},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3990475535392761},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3956140875816345},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3882344961166382}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407262511","title":"RadImageGAN – A Multi-modal Dataset-Scale Generative AI for Medical Imaging","url":"https://doi.org/10.1007/978-3-031-82007-6_17","published":"2025-01-01","authors":["Zelong Liu","A. 
Peyton Smith","Alexander Lautin","Jieshen Zhou","Maxwell Yoo","Mikey Sullivan","Haorun Li","Louisa Deyer","Alexander Zhou","Arnold Yang","Alara Yimaz","Catherine Zhang"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-82007-6_17","openalex_id":"https://openalex.org/W4407262511","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Cornell University","Icahn School of Medicine at Mount Sinai","Nvidia (United States)","Riverview Medical Center"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8127295970916748},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7178558111190796},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6102874279022217},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6098318696022034},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5185454487800598},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.4311589002609253},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.13004085421562195},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.04732847213745117}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4414306651","title":"Pricing Optimization across Domains: A Comparative Review","url":"https://doi.org/10.63282/3050-922x.ijeret-v6i3p103","published":"2025-01-01","authors":["Pavan Nithin Mullapudi"],"abstract":"Pricing optimization is a critical capability across industries, integrating methods from rule-based heuristics to advanced artificial 
intelligence. This condensed literature review compares pricing methodologies in four major sectors – Financial Trading, Retail E-commerce, B2B SaaS/Cloud, and Travel and Hospitality – highlighting both common themes and domain-specific nuances. We outline a methodological taxonomy encompassing simple rule-based strategies, econometric demand modeling, operations research techniques from revenue management, machine learning and reinforcement learning (RL) algorithms, and emerging generative AI approaches. Industry sections detail each domain’s pricing objectives (e.g. profit vs. market share), the unique data available (from high-frequency market data to customer usage patterns), prevalent algorithms (from Black–Scholes models to multi-armed bandits), and...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.63282/3050-922x.ijeret-v6i3p103","openalex_id":"https://openalex.org/W4414306651","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C2779391423","display_name":"Dynamic pricing","score":0.7045999765396118},{"id":"https://openalex.org/C2781386248","display_name":"Revenue management","score":0.5795999765396118},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5371999740600586},{"id":"https://openalex.org/C142038384","display_name":"Yield management","score":0.5260000228881836},{"id":"https://openalex.org/C127705205","display_name":"Heuristics","score":0.5041000247001648},{"id":"https://openalex.org/C2780193402","display_name":"Pricing strategies","score":0.46389999985694885},{"id":"https://openalex.org/C181622380","display_name":"Profit (economics)","score":0.42399999499320984},{"id":"https://openalex.org/C185632549","display_name":"Investment 
theory","score":0.4124000072479248}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4406014203","title":"PRISM-Med: Parameter-Efficient Robust Interdomain Specialty Model for Medical Language Tasks","url":"https://doi.org/10.1109/access.2024.3525041","published":"2025-01-01","authors":["Jieui Kang","Hyungon Ryu","Jaehyeong Sim"],"abstract":"Language Models (LMs) have shown remarkable potential in healthcare applications, yet their widespread adoption faces challenges in achieving consistent performance across diverse medical specialties while maintaining parameter efficiency. Current approaches to fine-tuning language models for medical tasks often require extensive computational resources and struggle with managing specialized medical knowledge across different domains. To address these challenges, we present PRISM-Med (Parameter-efficient Robust Interdomain Specialty Model), a novel framework that enhances domain-specific performance through unsupervised domain separation and specialized adaptation. 
Our framework introduces three key innovations: (1) an unsupervised domain separator that automatically discovers optimal knowledge boundaries within medical corpora, (2) a domain-specific Low-Rank Adaptation (LoRA) strategy t...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2024.3525041","openalex_id":"https://openalex.org/W4406014203","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Ewha Womans University","Nvidia (United Kingdom)","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7206741571426392},{"id":"https://openalex.org/C20387591","display_name":"Specialty","score":0.5565824508666992},{"id":"https://openalex.org/C67666897","display_name":"Prism","score":0.5281391739845276},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4196586608886719},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.25009506940841675},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.13600635528564453},{"id":"https://openalex.org/C120665830","display_name":"Optics","score":0.07455334067344666},{"id":"https://openalex.org/C512399662","display_name":"Family medicine","score":0.06867203116416931}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411120503","title":"NeMo-Inspector: A Visualization Tool for LLM Generation Analysis","url":"https://doi.org/10.18653/v1/2025.naacl-demo.28","published":"2025-01-01","authors":["Daria Gitman","Igor Gitman","Evelina Bakhturina"],"abstract":"Daria Gitman, Igor Gitman, Evelina Bakhturina. 
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations). 2025.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.naacl-demo.28","openalex_id":"https://openalex.org/W4411120503","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.7511357665061951},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6456097364425659},{"id":"https://openalex.org/C172367668","display_name":"Data visualization","score":0.4318403899669647},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3789704740047455},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.18166634440422058}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7117531248","title":"Multiphase and Multitask Prompt Tuning for LLM-Based Context-Aware Machine Translation","url":"https://doi.org/10.1109/tnnls.2025.3646848","published":"2025-01-01","authors":["Xinglin Lyu","Junhui Li","Daimeng Wei","Min Zhang","Shimin Tao","Hao Yang","Min Zhang"],"abstract":"Large language models (LLMs) are typically adapted for context-aware machine translation (MT) by combining both the source sentence and its surrounding sentences into a single input. This unified input is then processed in one go, with the model producing the target translation step by step. However, this method treats the intrasentence and intersentence contexts similarly, even though they play distinct roles. 
In this study, we present a novel strategy called multiphase prompt tuning (MPT) to address this issue by enabling LLMs to treat these two context types differently. MPT divides the context-aware translation task into three phases: encoding the intersentence context, encoding the source sentence, and the final decoding phase. Each phase incorporates distinct continuous prompts that help the model focus on the appropriate task for each type of context. We also introduce a multitask f...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tnnls.2025.3646848","openalex_id":"https://openalex.org/W7117531248","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Soochow University","Zhengzhou University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7925000190734863},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.7827000021934509},{"id":"https://openalex.org/C2777530160","display_name":"Sentence","score":0.6690999865531921},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.628000020980835},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6115000247955322},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.5885000228881836},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5680999755859375},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.5468999743461609}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412081767","title":"LegalMind: 
Agentic AI-Driven Process Optimization and Cost Reduction in Legal Services Using DeepSeek","url":"https://doi.org/10.1109/access.2025.3586781","published":"2025-01-01","authors":["Nidadavolu Venkat Durga Sai Siva Vara Prasad Raju","Nuruzzaman Faruqui","Nikhil Patel","Olivia-Roxana Alecsoiu","Priyabrata Thatoi","Salem A. Alyami","AKM Azad"],"abstract":"The legal industry struggles with inefficiencies, high costs, and manual-intensive workflows. Traditional AI lacks adaptability in optimizing legal operations. To address this, we propose LegalMind, an agentic AI-driven framework leveraging DeepSeek R1 for intelligent legal process automation and cost reduction. LegalMind integrates a structured legal dataset and fine-tunes DeepSeek R1 to enhance decision-making and workflow efficiency. Experimental results show a 42.6% cost reduction and a 60.8% improvement in document processing speed over baseline AI models. Scalability tests confirm the system’s ability to handle 100,000 queries efficiently. 
Real-world case studies validate LegalMind’s effectiveness in law firms, corporate legal departments, and government agencies, demonstrating sig...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2025.3586781","openalex_id":"https://openalex.org/W4412081767","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Birla Institute of Technology and Science, Pilani","Constantin Brâncuși University of Targu Jiu","Daffodil International University","Imam Mohammad ibn Saud Islamic University","Risun (China)","University of Dubuque"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6429558992385864},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.616433322429657},{"id":"https://openalex.org/C111335779","display_name":"Reduction (mathematics)","score":0.5927810072898865},{"id":"https://openalex.org/C2778820799","display_name":"Cost reduction","score":0.5233946442604065},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.351942241191864},{"id":"https://openalex.org/C195094911","display_name":"Process management","score":0.32828861474990845},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.15414705872535706},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.09384670853614807}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4411379461","title":"Language-Guided Adaptive Vision Token Pruning for Efficient Multimodal Large Language Models","url":"https://doi.org/10.1007/978-981-96-8186-0_9","published":"2025-01-01","authors":["Omer Faruk Deniz","Tarik 
Arici","Fatemeh Sheikholeslami","Burak Gozluklu","Ameni Trabelsi","Suleiman A. Khan","Yapeng Tian","Latifur Khan"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-8186-0_9","openalex_id":"https://openalex.org/W4411379461","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","The University of Texas at Dallas"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9045875668525696},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.7366847991943359},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.5984817743301392},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5263689160346985},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.49781274795532227},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43301236629486084},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3794596791267395},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.34800654649734497}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407985183","title":"Identify Underperforming Wells using Geospatial Data and Generative AI","url":"https://doi.org/10.3997/2214-4609.202539023","published":"2025-01-01","authors":["Y. Gubanov","D. Tishechkin","Thomas Grant","I. Jeena Jacob","S. Thomas"],"abstract":"Summary Identifying underperforming wells is a crucial aspect of efficient oil and gas operations. 
Underperforming wells can significantly impact profitability, resource allocation, and operational decision-making. By promptly recognizing and addressing underperforming wells, companies can take proactive measures to mitigate losses and optimize production. However, pinpointing underperforming wells is a formidable task due to the complexity and diversity of data involved. Geographical information systems (GIS), geospatial analysis, and generative artificial intelligence (Gen AI) technologies offer a powerful combination for tackling these challenges. By integrating production data, well logs, and GIS data into a centralized platform, users can leverage natural language queries to interrogate the data and uncover the business-critical insights. The Gen AI tools, equipped with spatial reas...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3997/2214-4609.202539023","openalex_id":"https://openalex.org/W4407985183","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C9770341","display_name":"Geospatial analysis","score":0.8934657573699951},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6158958673477173},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5656389594078064},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3960866630077362},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3594074249267578},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.23096346855163574},{"id":"https://openalex.org/C62649853","display_name":"Remote 
sensing","score":0.13038897514343262}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415355296","title":"Hybrid VLM-LLM Collaborative Reasoning via Multimodal Enhanced Chain-of-Thought for Sleep Apnea Diagnosis","url":"https://doi.org/10.2139/ssrn.5624214","published":"2025-01-01","authors":["Zhan Chen","Yingchen Wei","Xiaoyu Tan","Jingjing Huang","Xihe Qiu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.5624214","openalex_id":"https://openalex.org/W4415355296","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Shanghai University of Engineering Science","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6255999803543091},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5938000082969666},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5547000169754028},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5180000066757202},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4724999964237213},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.43630000948905945},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.38519999384880066},{"id":"https://openalex.org/C2780910867","display_name":"Multimodality","score":0.3610999882221222}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4411256794","title":"Enhancing Cross-Domain Generalizability in Social Determinants of Health Extraction with Prompt-Tuning Large Language Models.","url":"https://pubmed.ncbi.nlm.nih.gov/40502248","published":"2025-01-01","authors":["Peng Cheng","Zehao Yu","Kaleb E Smith","Wei‐Hsuan Lo‐Ciganic","Jiang Bian","Yonghui Wu"],"abstract":"The progress in natural language processing (NLP) using large language models (LLMs) has greatly improved patient information extraction from clinical narratives. However, most methods based on the fine-tuning strategy have limited transfer learning ability for cross-domain applications. This study proposed a novel approach that employs a soft prompt-based learning architecture, which introduces trainable prompts to guide LLMs toward desired outputs. We examined two types of LLM architectures, including encoder-only GatorTron and decoder-only GatorTronGPT, and evaluated their performance for the extraction of social determinants of health (SDoH) using a cross-institution dataset from the 2022 n2c2 challenge and a cross-disease dataset from the University of Florida (UF) Health. 
The results show that decoder-only LLMs with prompt tuning achieved better performance in cross-domain applicat...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W4411256794","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Nvidia (United States)","UF Health Cancer Center","University of Florida","University of Florida Health"],"concepts":[{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.9335561990737915},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5141447186470032},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.4877510666847229},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4088279902935028},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.40330588817596436},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3442423343658447},{"id":"https://openalex.org/C138496976","display_name":"Developmental psychology","score":0.15874406695365906},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.12844577431678772}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416036831","title":"DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling","url":"https://doi.org/10.18653/v1/2025.emnlp-main.222","published":"2025-01-01","authors":["Hao Sun","Zile Qiao","Bo Wang","Guoxin Chen","Yingyan Hou","Yong Jiang","Pengjun Xie","Fei Huang","Yan Zhang"],"abstract":"Retrieval-Augmented Generation (RAG) systems have emerged as a pivotal methodology for enhancing Large 
Language Models (LLMs) through the dynamic integration of external knowledge. To further improve RAG's flexibility, Agentic RAG introduces autonomous agents into the workflow. However, Agentic RAG faces several challenges: (1) the success of each step depends on both high-quality planning and accurate search, (2) the lack of supervision for intermediate reasoning steps, and (3) the exponentially large candidate space for planning and searching. To address these challenges, we propose DecoupleSearch, a novel framework that decouples planning and search processes using dual value models, enabling independent optimization of plan reasoning and search grounding. Our approach constructs a reasoning tree, where each node represents planning and search steps. We leverage Monte Carlo Tree Search to....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.222","openalex_id":"https://openalex.org/W4416036831","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6003000140190125},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36250001192092896},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.3278000056743622},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.310699999332428},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.29910001158714294},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.27549999952316284},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.271699994802475},{"id":"https://openalex.org/C61797465","display_name":"Term (time)","score":0.25360000133514404}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411119559","title":"DSRAG: A Double-Stream Retrieval-Augmented Generation Framework for Countless Intent Detection","url":"https://doi.org/10.18653/v1/2025.naacl-industry.26","published":"2025-01-01","authors":["Pei Guo","Enjie Liu","Ruichao Zhong","Mochi Gao","Yunzhi Tan","Bo Hu","Li Zang"],"abstract":"Pei Guo, Enjie Liu, Ruichao Zhong, Mochi Gao, Yunzhi Tan, Bo Hu, Zang Li. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track). 2025.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.naacl-industry.26","openalex_id":"https://openalex.org/W4411119559","cited_by_count":0,"quality_score":41,"matched_keywords":["retrieval"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7086780071258545},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5020303726196289}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411475417","title":"Automated Quality Assessment of Dietary Weight Loss Videos: A Likelihood-Based Large Language Model Framework","url":"https://doi.org/10.2139/ssrn.5312906","published":"2025-01-01","authors":["Shiqi Zhou","Hua Wu","Yongjian Zhang","Qian‐Qian Zhong","Yueqin Diao","Huihui 
Fang","Yanwu Xu","Hanyi Yu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.5312906","openalex_id":"https://openalex.org/W4411475417","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Intelligent Health (United Kingdom)","Peking University","Shenzhen Bao'an District People's Hospital","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6221230626106262},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5360956788063049},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3751735985279083},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3689917325973511},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.0},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412939428","title":"Adaptive Mixture of Low-Rank Experts for Robust Audio Spoofing Detection","url":"https://doi.org/10.1109/lsp.2025.3595511","published":"2025-01-01","authors":["Qixian Chen","Yuxiong Xu","Sara Mandelli","Sheng Li","Bin Li"],"abstract":"In audio spoofing detection, most studies rely on clean datasets, making models susceptible to real-world post-processing attacks, such as channel compression and noise. To overcome this challenge, we propose the Adaptive MixtUre Low-rank ExperTs (AMULET) framework, which enhances resilience by leveraging attack-specific knowledge and dynamically adapting to varied attack conditions. 
Specifically, AMULET employs Attack-Specific Experts (ASEs) fine-tuned with Low-Rank Adaptation (LoRA), allowing each expert to focus on distinct post-processing patterns using just 1.13% of the parameters required for full fine-tuning. Furthermore, we introduce Adaptive Expert Fusion (AEF), which adaptively selects and integrates expert knowledge to enhance the robustness of spoofing detection. Experimental results demonstrate that AMULET significantly enhances robustness by improving noise resilience and e...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lsp.2025.3595511","openalex_id":"https://openalex.org/W4412939428","cited_by_count":0,"quality_score":41,"matched_keywords":["compression"],"author_affiliations":["Microsoft (United States)","Politecnico di Milano","Shenzhen University"],"concepts":[{"id":"https://openalex.org/C167900197","display_name":"Spoofing attack","score":0.7127916812896729},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7095842361450195},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.5244748592376709},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4535582363605499},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.4327770471572876},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3940490782260895},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.34156087040901184},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.3281427025794983}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W7138887209","title":"AI-Driven Business Intelligence Automation: Integrating Data Engineering, Auto-BI, and Large Language Models","url":"https://doi.org/10.63282/3050-9416.ijaibdcms-v6i3p112","published":"2025-01-01","authors":["Ajith Suresh"],"abstract":"Business intelligence (BI) systems are important as they aid in assisting organizations to convert huge amounts of unprocessed data into insights that can be used to make strategic and operational decisions. Nevertheless, the conventional BI systems tend to employ manual data preparation, inert dashboards, and a set of analytical models, which restrict their capability to be adapted easily to the extensive and ever-evolving business contexts. With the continuous growth of enterprise data ecosystems in terms of volume, velocity, and variety, the need to process smart and automated analytics solutions has grown dramatically. The traditional BI systems also demand specialized technical skills in fields like data engineering, SQL queries and dashboards development which pose a barrier to the business users who require at the right time and in an accessible format the information to make the....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63282/3050-9416.ijaibdcms-v6i3p112","openalex_id":"https://openalex.org/W7138887209","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8133000135421753},{"id":"https://openalex.org/C2767350","display_name":"Business intelligence","score":0.6342999935150146},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4587000012397766},{"id":"https://openalex.org/C85345410","display_name":"Business 
process","score":0.4555000066757202},{"id":"https://openalex.org/C11066294","display_name":"Business rule","score":0.44179999828338623},{"id":"https://openalex.org/C37952496","display_name":"Business analytics","score":0.4408000111579895},{"id":"https://openalex.org/C115901376","display_name":"Automation","score":0.43720000982284546},{"id":"https://openalex.org/C135572916","display_name":"Data warehouse","score":0.4154999852180481}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412888419","title":"KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding","url":"https://doi.org/10.18653/v1/2025.findings-acl.365","published":"2025-01-01","authors":["Zhangchen Xu","Yang Liu","Yueqin Yin","Mingyuan Zhou","Radha Poovendran"],"abstract":"We introduce KODCODE, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing code-focused resources typically fail to ensure either the breadth of coverage (e.g., spanning simple coding tasks to advanced algorithmic problems) or verifiable correctness (e.g., unit tests). In contrast, KODCODE comprises question-solution-test triplets that are systematically validated via a self-verification procedure. Our pipeline begins by synthesizing a broad range of coding questions, then generates solutions and test cases with additional attempts allocated to challenging problems. Finally, posttraining data synthesis is done by rewriting questions into diverse formats and generating responses under a test-based reject sampling procedure from a 
reasoning....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.findings-acl.365","openalex_id":"https://openalex.org/W4412888419","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C85847156","display_name":"Verifiable secret sharing","score":0.7469454407691956},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7289849519729614},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.6230469942092896},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3420564830303192},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.11482247710227966},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08727464079856873},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.06495609879493713},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.05117926001548767}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4408468861","title":"A survey on knowledge graph evolution: proliferation, dynamic embedding, and versioning","url":"https://doi.org/10.1504/ijwgs.2025.144974","published":"2025-01-01","authors":["Xiongnan Jin","Zhilin Wang","Manni Duan","Yan Shao","Xingyun Hong","Yongheng Wang","Byungkook Oh"],"abstract":"In the era of large language models (LLMs), knowledge graphs (KGs) can play a pivotal role in enhancing LLMs by providing a structured representation of knowledge, relationships, and entities. 
This knowledge is essential for LLMs to understand and interpret information in a coherent and contextually relevant manner. KGs must undergo continuous evolution with minimal human intervention to remain effective. Organisations often employ automated techniques, such as web scraping, natural language processing, and machine learning algorithms, to accomplish this continuous evolution. However, there is a lack of reviews covering recent advances in KG evolution. In this survey, we first give an overview and then describe the methods of KG evolution. Afterward, we review and analyse the evaluation metrics, datasets, and experimental performances. Finally, we provide findings and future directions f...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1504/ijwgs.2025.144974","openalex_id":"https://openalex.org/W4408468861","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","China Mobile (China)","Konkuk University","Shenzhen University","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6196501851081848},{"id":"https://openalex.org/C198140048","display_name":"Software versioning","score":0.5954989790916443},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.5766686201095581},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.45705127716064453},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.39202016592025757},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.21841222047805786},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.1572103500366211},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.10736167430877686}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4406857562","title":"Improving Visual Object Tracking Through Visual Prompting","url":"https://doi.org/10.1109/tmm.2025.3535323","published":"2025-01-01","authors":["Shih-Fang Chen","Jun-Cheng Chen","I‐Hong Jhuo","Yen‐Yu Lin"],"abstract":"Learning a discriminative model to distinguish a target from its surrounding distractors is essential to generic visual object tracking. Dynamic target representation adaptation against distractors is challenging due to the limited discriminative capabilities of prevailing trackers. We present a new visual Prompting mechanism for generic Visual Object Tracking (PiVOT) to address this issue. PiVOT proposes a prompt generation network with the pre-trained foundation model CLIP to automatically generate and refine visual prompts, enabling the transfer of foundation model knowledge for tracking. While CLIP offers broad category-level knowledge, the tracker, trained on instance-specific data, excels at recognizing unique object instances. Thus, PiVOT first compiles a visual prompt highlighting potential target locations. 
To transfer the knowledge of CLIP to the tracker, PiVOT leverages CLIP t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2025.3535323","openalex_id":"https://openalex.org/W4406857562","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","National Yang Ming Chiao Tung University","Research Center for Information Technology Innovation, Academia Sinica"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8385280966758728},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6980692148208618},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6072722673416138},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5542711615562439},{"id":"https://openalex.org/C56461940","display_name":"Eye tracking","score":0.544568657875061},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.49967288970947266},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.4913457930088043},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.4285638630390167}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4412082085","title":"Enhancing the Robustness of Vision-Language Foundation Models by Alignment Perturbation","url":"https://doi.org/10.1109/tifs.2025.3586430","published":"2025-01-01","authors":["Cong Zhang","Shuhui Wang","Xiaodan Li","Yao Zhu","Honggang Qi","Qingming Huang"],"abstract":"While Vision-Language Models (VLMs) based on large-scale models have shown revolutionary advancements across various 
vision-language tasks, research on improving VLM robustness remains underexplored. Existing studies primarily focus on attacking VLM after the pretrained visual or textual encoders, typically requiring obvious noise or long inference time. In this study, we look into VLM structure and highlight alignment module’s role as a protective filter that enhances VLM robustness against various perturbations. Motivated by these insights, we investigate VLM from both user and model developer perspectives and introduce the alignment perturbation strategy, which consists of multimodal, visual, and textual perturbations. Multimodal perturbation aims to achieve targeted textual output generation and is further utilized to enhance VLM robustness. Minimal perturbations to visual or textual...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tifs.2025.3586430","openalex_id":"https://openalex.org/W4412082085","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","Institute of Computing Technology","Tsinghua University","University of Chinese Academy of Sciences","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7677594423294067},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7422974705696106},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45092955231666565},{"id":"https://openalex.org/C177918212","display_name":"Perturbation (astronomy)","score":0.4126850366592407},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.38192105293273926},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4416645709","title":"An Immersive Virtual Reality Bimanual Telerobotic System With Haptic Feedback","url":"https://doi.org/10.1049/csy2.70033","published":"2025-01-01","authors":["Han Xu","Mingqi Chen","Gaofeng Li","Lei Wei","Shichi Peng","Haoliang Xu","ZunRan Wang","Huibin Cao","Qiang Li"],"abstract":"ABSTRACT In robotic bimanual teleoperation, multimodal sensory feedback plays a crucial role, providing operators with a more immersive operating experience, reducing cognitive burden and improving operating efficiency. In this study, we develop an immersive bilateral isomorphic bimanual telerobotic system, which comprises dual arms and dual dexterous hands, with visual and haptic force feedback. To assess the performance of this system, we carried out a series of experiments and investigated the user's teleoperation experience. The results demonstrate that haptic force feedback enhances physical perception capabilities and complex task operating abilities. In addition, it compensates for visual perception deficiencies and reduces the operator's work burden. 
Consequently, our proposed system achieves more intuitive, realistic and immersive teleoperation, improves operating efficiency and...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1049/csy2.70033","openalex_id":"https://openalex.org/W4416645709","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Deakin University","Hefei Institutes of Physical Science","Huawei Technologies (China)","Shenzhen Academy of Robotics","Shenzhen Technology University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C161759796","display_name":"Teleoperation","score":0.8783000111579895},{"id":"https://openalex.org/C152086174","display_name":"Haptic technology","score":0.8330000042915344},{"id":"https://openalex.org/C194969405","display_name":"Virtual reality","score":0.6746000051498413},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.625},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5507000088691711},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.5078999996185303},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5040000081062317},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.48980000615119934}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4407039653","title":"Play to Your Strengths: Collaborative Intelligence of Conventional Recommender Models and Large Language Models","url":"https://doi.org/10.1007/978-981-96-1710-4_1","published":"2025-01-01","authors":["Yunjia Xi","Weiwen Liu","Jianghao Lin","Chuhan Wu","Bo Chen","Ruiming 
Tang","Weinan Zhang","Yong Yu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-1710-4_1","openalex_id":"https://openalex.org/W4407039653","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.891148567199707},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.738201379776001},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4537307918071747},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39794960618019104},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3213194012641907}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4410357454","title":"Physics of Language Models: Part 1, Learning Hierarchical Language Structures","url":"https://doi.org/10.2139/ssrn.5250639","published":"2025-01-01","authors":["Zeyuan Allen-Zhu","Yuanzhi Li"],"abstract":"","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.5250639","openalex_id":"https://openalex.org/W4410357454","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Allen University","BC Platforms (Finland)","Meta (United States)","Mohamed bin Zayed University of Artificial Intelligence"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.46474072337150574},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.39122331142425537},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3325440585613251},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3288615047931671},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.09597167372703552}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408859120","title":"Open-Vocabulary Action Localization With Iterative Visual Prompting","url":"https://doi.org/10.1109/access.2025.3555167","published":"2025-01-01","authors":["Naoki Wake","Atsushi Kanehira","Kazuhiro Sasabuchi","Jun Takamatsu","Katsushi Ikeuchi"],"abstract":"Video action localization aims to find the timings of specific actions from a long video. Although existing learning-based approaches have been successful, they require annotating videos, which comes with a considerable labor cost. This paper proposes a training-free, open-vocabulary approach based on emerging off-the-shelf vision-language models (VLMs). The challenge stems from the fact that VLMs are neither designed to process long videos nor tailored for finding actions. We overcome these problems by extending an iterative visual prompting technique. Specifically, we sample video frames and create a concatenated image with frame index labels, allowing a VLM to identify the frames that most likely correspond to the start and end of the action. 
By iteratively narrowing the sampling window around the selected frames, the estimation gradually converges to more precise temporal boundaries....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2025.3555167","openalex_id":"https://openalex.org/W4408859120","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Robotics Research (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7630289793014526},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.6665158271789551},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.5327496528625488},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4840778410434723},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.473732590675354},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3693746328353882},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3320053219795227},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.1256508231163025}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412375072","title":"Geometric Optimal Transport for Cross-Modal Medical Manifold Alignment: A Differential Approach to Multimodal Diagnosis","url":"https://doi.org/10.1109/access.2025.3587298","published":"2025-01-01","authors":["Yuan Shen"],"abstract":"We present a novel theoretical framework for multimodal medical diagnosis that formulates heterogeneous data integration as a problem of aligning Riemannian 
manifolds through differential geometric principles. Medical data modalities—ranging from imaging to clinical parameters—naturally reside on distinct manifolds with inherent topological structures that conventional fusion approaches fail to preserve. Our method, Twin-Topology Heteromodal Contrastive Alignment (TTHCA), establishes rigorous mathematical foundations for this alignment through three key innovations: 1) formulation of <inline-formula xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"> <tex-math notation=\"LaTeX\">$\\varepsilon $ </tex-math></inline-formula>-isometric embeddings that provably preserve geodesic distances during cross-modal mapping with theoretical error bounds; 2) develop...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2025.3587298","openalex_id":"https://openalex.org/W4412375072","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6060492992401123},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5869351625442505},{"id":"https://openalex.org/C529865628","display_name":"Manifold (fluid mechanics)","score":0.5854654908180237},{"id":"https://openalex.org/C2777974031","display_name":"Multimodal transport","score":0.41824111342430115},{"id":"https://openalex.org/C93226319","display_name":"Differential (mechanical device)","score":0.41590166091918945},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.40222105383872986},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.15913888812065125},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09226438403129578}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411120249","title":"From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning","url":"https://doi.org/10.18653/v1/2025.naacl-long.170","published":"2025-01-01","authors":["Nan Xu","Fei Wang","Sheng Zhang","Hoifung Poon","Muhao Chen"],"abstract":"Nan Xu, Fei Wang, Sheng Zhang, Hoifung Poon, Muhao Chen. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.naacl-long.170","openalex_id":"https://openalex.org/W4411120249","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C129671850","display_name":"Introspection","score":0.918753981590271},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.659314751625061},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6331424713134766},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4142951965332031},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.34955310821533203},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.3490753769874573},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.33695971965789795},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.20813003182411194}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411120010","title":"EKRAG: Benchmark RAG for Enterprise Knowledge Question Answering","url":"https://doi.org/10.18653/v1/2025.knowledgenlp-1.13","published":"2025-01-01","authors":["Yu Tan","Wenfei Zhou","Leiyang Leiyang","Aaditya Shukla","Mmadugula Mmadugula","Pritam Gundecha","Nicholas Burnett","Anbang Xu","Viseth Viseth","Tbar Tbar","Rama Akkiraju","Vivienne Zhang"],"abstract":"Tan Yu, Wenfei Zhou, Leiyang Leiyang, Aaditya Shukla, Mmadugula Mmadugula, Pritam Gundecha, Nicholas Burnett, Anbang Xu, Viseth Viseth, Tbar Tbar, Rama Akkiraju, Vivienne Zhang. Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing. 
2025.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.knowledgenlp-1.13","openalex_id":"https://openalex.org/W4411120010","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.7552456855773926},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6827218532562256},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6712161302566528},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.5568709969520569},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.31489741802215576},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.0},{"id":"https://openalex.org/C13280743","display_name":"Geodesy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4411851928","title":"BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT Model Pretraining","url":"https://doi.org/10.1007/978-981-96-6599-0_27","published":"2025-01-01","authors":["Wen Liang","Youzhi Liang"],"abstract":"","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-6599-0_27","openalex_id":"https://openalex.org/W4411851928","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (United States)","Stanford 
University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8856709599494934},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5692662596702576},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.421917200088501},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4185660183429718},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4062165915966034},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3971668481826782}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414035745","title":"Automating the Search for Artificial Life With Foundation Models","url":"https://doi.org/10.1162/artl.a.8","published":"2025-01-01","authors":["Akarsh Kumar","Chris Xiaoxuan Lu","Louis Kirsch","Yujin Tang","Kenneth O. Stanley","Phillip Isola","David Ha"],"abstract":"With the recent Nobel Prize awarded for radical advances in protein discovery, foundation models (FMs) for exploring large combinatorial spaces promise to revolutionize many scientific fields. Artificial Life (ALife) has not yet integrated FMs, thus presenting a major opportunity for the field to alleviate the historical burden of relying chiefly on manual design and trial and error to discover the configurations of lifelike simulations. This article presents, for the first time, a successful realization of this opportunity using vision-language FMs. The proposed approach, called automated search for Artificial Life (ASAL), (a) finds simulations that produce target phenomena, (b) discovers simulations that generate temporally open-ended novelty, and (c) illuminates an entire space of interestingly diverse simulations. 
Because of the generality of FMs, ASAL works effectively across a dive...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/artl.a.8","openalex_id":"https://openalex.org/W4414035745","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Akureyri Hospital","Dalle Molle Institute for Artificial Intelligence Research","Film Independent","Moscow Institute of Thermal Technology","OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C19273510","display_name":"Artificial life","score":0.7173265814781189},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6220459342002869},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5477268099784851},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5116980075836182},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.0},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4415178536","title":"A Multimodal Deep Learning Framework Using ResNet-101 and Firefly-Based Feature Selection for Early Diagnosis of Dementia and Alzheimer’s Disease","url":"https://doi.org/10.1109/access.2025.3621157","published":"2025-01-01","authors":["Harsh Vardhan Bansal","Pooja Gupta","Vikas Juneja"],"abstract":"Dementia and Alzheimer’s Diseases (AD) has global health challenges especially due to the progressive nature and the impact these diseases put on elderly population. Early detection is vital to improve prognosis and initiate timely intervention so that lives can be saved. 
This paper proposes a novel multimodal diagnostic framework that integrates clinical data and MRI imaging for the early classification of dementia and AD. The clinical pathway employs the OASIS dataset, wherein numerical features undergo statistical filtering followed by metaheuristic feature selection using the Firefly Algorithm. For imaging-based diagnosis, a deep Convolutional Neural Network (CNN) architecture is developed using MRI data, further optimized via a hybrid ResNet101–Support Vector Machine (SVM) model to enhance classification accuracy. The decision-level fusion module integrates outputs from both modalit...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2025.3621157","openalex_id":"https://openalex.org/W4415178536","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","JK Lakshmipat University","Ta Solutions (China)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7770000100135803},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7502999901771545},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.6491000056266785},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.607699990272522},{"id":"https://openalex.org/C148483581","display_name":"Feature selection","score":0.6065999865531921},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5956000089645386},{"id":"https://openalex.org/C2779483572","display_name":"Dementia","score":0.5658000111579895},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.47200000286102295}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4414109430","title":"<scp>DARE</scp>: Diverse Visual Question Answering with Robustness Evaluation","url":"https://doi.org/10.1162/tacl.a.29","published":"2025-01-01","authors":["Hannah Sterz","Jonas Pfeiffer","Ivan Vulić"],"abstract":"Abstract Vision Language Models (VLMs) extend remarkable capabilities of text-only large language models and vision-only models, being able to learn from and process multi-modal vision-text input. While modern VLMs perform well on a number of standard image classification and image-text matching tasks, they still struggle with a number of crucial vision-language (VL) reasoning abilities such as counting and spatial reasoning. Moreover, while they might be very brittle to small variations in instructions and/or evaluation protocols, existing benchmarks fail to evaluate their robustness (or rather the lack of it). In order to couple challenging VL scenarios with comprehensive robustness evaluation, we introduce DARE, Diverse Visual Question Answering with Robustness Evaluation, a carefully created and curated multiple-choice VQA benchmark. 
DARE evaluates VLM performance on five diverse cat...","companies":["Google/DeepMind"],"matched_orgs":["Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl.a.29","openalex_id":"https://openalex.org/W4414109430","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Google (Switzerland)","Google (United States)","University of Cambridge"],"concepts":[{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.9180999994277954},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.823199987411499},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.614300012588501},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48539999127388},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.47850000858306885},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.3312000036239624},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3046000003814697},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.2994999885559082}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4412889538","title":"WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models","url":"https://doi.org/10.18653/v1/2025.acl-short.85","published":"2025-01-01","authors":["Hui Zheng","Yinheng Li","Dan Zhao","Colby Banbury","Tianyi Chen","Kazuhito Koishida"],"abstract":"Graphical User Interface (GUI) automation relies on accurate GUI grounding.However, obtaining large-scale, high-quality labeled data remains a key challenge, particularly in desktop environments like 
Windows Operating System (OS).Existing datasets primarily focus on structured web-based elements, leaving a gap in real-world GUI interaction data for non-web applications.To address this, we introduce a new framework that leverages LLMs to generate large-scale GUI grounding data, enabling automated and scalable labeling across diverse interfaces.To ensure high accuracy and reliability, we manually validated and refined 5,000 GUI coordinate-instruction pairs, creating WinSpot-the first benchmark specifically designed for GUI grounding tasks in Windows environments.WinSpot provides a high-quality dataset for training and evaluating visual GUI agents, establishing a foundation for future resea...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2025.acl-short.85","openalex_id":"https://openalex.org/W4412889538","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7741400003433228},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.751615047454834},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.6412143707275391},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.5258650779724121},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4277323782444},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.35604143142700195},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09357190132141113},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.07407161593437195}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412591620","title":"Multimodal Interactions and Explainable AI for Reflective Physical and Online Learning: MiXai^learn Workshop","url":"https://doi.org/10.1007/978-3-031-99267-4_41","published":"2025-01-01","authors":["Rwitajit Majumdar","Huiyong Li","Brendan Flanagan","Shinichiro Kubota","S. Bhattacharya","Aditi Kothiyal","Prajakt Pande","Olga C. Santos","Irene‐Angelica Chounta"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-99267-4_41","openalex_id":"https://openalex.org/W4412591620","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Aarhus University","Indian Institute of Technology Gandhinagar","Kumamoto University","Kyoto University","Kyushu University","Microsoft (United States)","Southern Methodist University","Universidad Nacional de Educación a Distancia","University of Duisburg-Essen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6077343225479126},{"id":"https://openalex.org/C2986087404","display_name":"Online learning","score":0.44738155603408813},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4099572002887726},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.39960524439811707},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.36181628704071045},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.329181969165802}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411571913","title":"Mixture of Rationale: Multi-modal Reasoning Mixture for Visual Question Answering","url":"https://doi.org/10.1007/978-981-96-6975-2_8","published":"2025-01-01","authors":["Tao Li","Linjun Shou","Xuejun Liu"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-6975-2_8","openalex_id":"https://openalex.org/W4411571913","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research Asia (China)","Nanjing University of Aeronautics and Astronautics"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7243458032608032},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.7210794687271118},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5903650522232056},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5335075855255127},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48391491174697876},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4495221972465515},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.1550276279449463},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4411270140","title":"Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research","url":"https://doi.org/10.2139/ssrn.5288768","published":"2025-01-01","authors":["A. Feder Cooper","Christopher A. Choquette-Choo","Miranda Bogen","Kevin Klyman","Matthew Jagielski","Katja Filippova","Ken Liu","Alexandra Chouldechova","Jamie Hayes","Yangsibo Huang","Niloofar Mireshghallah","Ilia Shumailov"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.5288768","openalex_id":"https://openalex.org/W4411270140","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Cornell University","Data & Society Research Institute","Eastern University","Georgetown University","Google (United States)","Google DeepMind (United Kingdom)","Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research New York City (United States)","Stanford Medicine","Stanford University","University of California, Berkeley","University of Michigan","University of Pennsylvania","University of Toronto","University of Washington","Virginia College","West Virginia University","Yale University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7212502956390381},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.38466763496398926},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.31595438718795776}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410698604","title":"GECO: GPT-Driven Estimation of 3D Human-Scene Contact in the 
Wild","url":"https://doi.org/10.1007/978-3-031-92591-7_29","published":"2025-01-01","authors":["C.-S. Lee","Simranjit Singh","Michael Fore","Georgios Pavlakos","Dimitrios Stamoulis"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-92591-7_29","openalex_id":"https://openalex.org/W4410698604","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","The University of Texas at Austin"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8168506622314453},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.5872197151184082},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5545639991760254},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5449212789535522},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.04342079162597656},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.036939799785614014}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4415002978","title":"Foundation Models for Speech Enhancement Leveraging Consistency Constraints and Contrast Stretching","url":"https://doi.org/10.1109/access.2025.3619782","published":"2025-01-01","authors":["Muhammad Salman Khan","Valerio Mario Salerno","Moreno La Quatra","Kuo-Hsuan Hung","Szu‐Wei Fu","Yu Tsao","Sabato Marco Siniscalchi"],"abstract":"Foundation models (FM) have proven effective in many speech applications except for speech enhancement (SE), where FM-based SE solutions still fall short with respect to specialized deep architectures. 
This work seeks to close this gap by systematically assessing and contrasting leading pre-trained FM architectures on a commonly used SE task, namely VoiceBank-Demand, and on the complex Deep Noise Suppression (DNS) challenge. Furthermore, three main ideas will be leveraged to boost FM-based SE models, namely: (i) Attention-based mask generation, (ii) consistency-preserving loss, and (iii) perceptual contrast stretching (PCS). Specifically, frame-level representations are effectively modeled using conformer layers, which leverage an attention mechanism. Inconsistency effects of signal reconstruction from the spectrogram are mitigated by incorporating consistency in the loss function. Final...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2025.3619782","openalex_id":"https://openalex.org/W4415002978","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Academia Sinica","Nvidia (United Kingdom)","Nvidia (United States)","University of Palermo","Università degli Studi di Enna Kore"],"concepts":[{"id":"https://openalex.org/C2776182073","display_name":"Speech enhancement","score":0.8167999982833862},{"id":"https://openalex.org/C45273575","display_name":"Spectrogram","score":0.7484999895095825},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7300999760627747},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6317999958992004},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5665000081062317},{"id":"https://openalex.org/C34736171","display_name":"Preprocessor","score":0.505299985408783},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.4943000078201294},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.47380000352859497}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4413055791","title":"Exploring the Coordination of Frequency and Attention in Masked Image Modeling","url":"https://doi.org/10.1109/tip.2025.3592555","published":"2025-01-01","authors":["Jie Gui","Tuo Chen","Minjing Dong","Zhengqi Liu","Hao Luo","James T. Kwok","Yuan Yan Tang"],"abstract":"Recently, masked image modeling (MIM), which learns visual representations by reconstructing the masked patches of an image, has become a popular self-supervised paradigm. However, the pre-training of MIM always takes massive time due to the large-scale data and large-size backbones. We mainly attribute it to the random patch masking in previous MIM works, which fails to leverage the crucial semantic information for effective visual representation learning. To tackle this issue, we propose the Frequency & Attention-driven Masking and Throwing Strategy (FAMT), which can detect semantic patches and reduce the number of training patches to boost model performance and training efficiency simultaneously. Specifically, FAMT utilizes the self-attention mechanism to extract semantic information from the image for masking during training in an unsupervised manner. 
However, attention alone could s...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2025.3592555","openalex_id":"https://openalex.org/W4413055791","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","City University of Hong Kong","Hong Kong University of Science and Technology","Southeast University","Zhuhai Institute of Advanced Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7781513929367065},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6524662971496582},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6105476021766663},{"id":"https://openalex.org/C2777402240","display_name":"Masking (illustration)","score":0.5555927753448486},{"id":"https://openalex.org/C2780490138","display_name":"Offline learning","score":0.41523391008377075},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4105975031852722},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4014746844768524},{"id":"https://openalex.org/C2986087404","display_name":"Online learning","score":0.11226528882980347}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410873112","title":"Design Guidelines for Human-Generative AI Interaction","url":"https://doi.org/10.1007/978-3-031-93718-7_15","published":"2025-01-01","authors":["Lin Li","Yu Wang","Yayu Ping","Jian Gao","Zongbo Wang","Shouyu 
Wang"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-93718-7_15","openalex_id":"https://openalex.org/W4410873112","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8340132236480713},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7496055364608765},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44214341044425964},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3862673044204712}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7116783826","title":"Degradation-Aware Prompted Transformer for Unified Medical Image Restoration","url":"https://doi.org/10.1109/tip.2025.3644795","published":"2025-01-01","authors":["J. Y. Wei","Gang Yang","Zhijie Wang","Shimin Tao","Aiping Liu Aiping Liu","Xun Chen"],"abstract":"Medical image restoration (MedIR) aims to recover high-quality images from degraded inputs, yet faces unique challenges from physics-driven degradations and multi-modal task interference. While existing all-in-one methods handle natural image degradations well, they struggle with medical scenarios due to limited degradation perception and suboptimal multi-task optimization. In response, we introduce DaPT, a Degradation-aware Prompted Transformer, which integrates dynamic prompt learning and modular expert mining for unified MedIR. 
First, DaPT introduces spatially compact prompts with optimal transport regularization, amplifying inter-prompt differences to capture diverse degradation patterns. Second, a mixture of experts dynamically routes inputs to specialized modules via prompt guidance, resolving task conflicts while reducing computational overhead. The synergy of prompt learning and....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2025.3644795","openalex_id":"https://openalex.org/W7116783826","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7210999727249146},{"id":"https://openalex.org/C101468663","display_name":"Modular design","score":0.6460000276565552},{"id":"https://openalex.org/C106430172","display_name":"Image restoration","score":0.5846999883651733},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5376999974250793},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.517300009727478},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.506600022315979},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.44339999556541443},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.4422999918460846}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4406264036","title":"DeepMIN: Deep Multi-modal Interest Network with Cognitive Learning 
Modules","url":"https://doi.org/10.1007/978-981-97-5555-4_14","published":"2025-01-01","authors":["Zhaoxiang Zhang","Zhiheng Li","Jipeng Jin","Xiaofeng Gao","Xiongwen Yang","Bo Zhang","Lei Xiao"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-5555-4_14","openalex_id":"https://openalex.org/W4406264036","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8565819263458252},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6512392163276672},{"id":"https://openalex.org/C169900460","display_name":"Cognition","score":0.5143924355506897},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49816346168518066},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.4791109561920166},{"id":"https://openalex.org/C32542511","display_name":"Cognitive network","score":0.42451542615890503},{"id":"https://openalex.org/C149946192","display_name":"Cognitive radio","score":0.13224750757217407},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.09898000955581665}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416798363","title":"Copyright Protection of General Information via Simulation Task Supervision","url":"https://doi.org/10.1109/tifs.2025.3638667","published":"2025-01-01","authors":["Chenxi Hu","Yangyi Hu","Huangxiang Li","Yifan Hu","Ning Zhu"],"abstract":"Large language models (LLMs) exhibit exceptional linguistic capabilities, yet their ability to 
verbatim reproduce copyrighted data from high-quality training datasets raises concerns about improper exploitation via Artificial Intelligence (AI) data traceability. Such copyrighted data may include general information (e.g., historical knowledge, or common sense), necessitating that LLMs generate content preserving key information rather than exact replicas. Current unlearning methods, however, primarily aim to fully forget targeted knowledge or key information, potentially leading to hallucinations of general information. To address these challenges, we propose an unlearning method based on attention flattening in auto-regressive models, combined with simulation tasks for targeted information forgetting. During optimization, the model is trained only on real tasks while acquiring knowledge...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tifs.2025.3638667","openalex_id":"https://openalex.org/W4416798363","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","National Clinical Research","PLA Army Engineering University","Pride Foundation","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8798999786376953},{"id":"https://openalex.org/C100279451","display_name":"Perplexity","score":0.796500027179718},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6920999884605408},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.6796000003814697},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.6098999977111816},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5885999798774719},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.5613999962806702},{"id":"https://openalex.org/C30038468","display_name":"Memorization","score":0.49720001220703125}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410776156","title":"Conpet: Efficiently Adapting Large Language Models for Continual Structured Knowledge Acquisition","url":"https://doi.org/10.2139/ssrn.5263765","published":"2025-01-01","authors":["Chenyang Song","Han Xu","Zheni Zeng","Kuai Li","Chen Chen","Zhiyuan Liu","Maosong Sun","Tao Yang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.2139/ssrn.5263765","openalex_id":"https://openalex.org/W4410776156","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6346560716629028},{"id":"https://openalex.org/C2777220311","display_name":"Knowledge acquisition","score":0.4271323084831238},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.424589067697525},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32482999563217163},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.3238890767097473}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4409785002","title":"Concept-Edge Fusion: Background Generation for Product Presentation Based on Text-to-Image Model","url":"https://doi.org/10.1007/978-981-96-5812-1_13","published":"2025-01-01","authors":["Pengfei 
Deng","Tianjiao Zhang","Weize Quan","Hanyu Wang","Qinglin Lu","Zhifeng Li","Dong‐Ming Yan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-5812-1_13","openalex_id":"https://openalex.org/W4409785002","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Shandong Institute of Automation","Tencent (China)","University of Chinese Academy of Sciences","University of Maryland, College Park"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7942581176757812},{"id":"https://openalex.org/C2777601897","display_name":"Presentation (obstetrics)","score":0.6790292263031006},{"id":"https://openalex.org/C162307627","display_name":"Enhanced Data Rates for GSM Evolution","score":0.47880038619041443},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4642290472984314},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.46188706159591675},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4496748745441437},{"id":"https://openalex.org/C69744172","display_name":"Image fusion","score":0.4312552809715271},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.42767611145973206}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4410915339","title":"Compositional Text-to-Image Generation with Feedforward Layout Generation","url":"https://doi.org/10.1007/978-3-031-91979-4_3","published":"2025-01-01","authors":["Sifei Liu","Weili Nie","An‐Chieh Cheng","Morteza Mardani","Chao Liu","Benjamin Eckart","Arash 
Vahdat"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-91979-4_3","openalex_id":"https://openalex.org/W4410915339","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","University of California San Diego"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7996506690979004},{"id":"https://openalex.org/C38858127","display_name":"Feed forward","score":0.5674977898597717},{"id":"https://openalex.org/C2985684807","display_name":"Text generation","score":0.5455337166786194},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.46533432602882385},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.37973323464393616},{"id":"https://openalex.org/C199639397","display_name":"Engineering drawing","score":0.3794754147529602},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37369391322135925},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3527749180793762}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4408111315","title":"Automatic Data Labeling Using Large Language Models","url":"https://doi.org/10.1007/978-3-031-82475-3_17","published":"2025-01-01","authors":["A. 
Ramanathan","Jason Liang","Dan Sommerfield","Hossein Azari"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-82475-3_17","openalex_id":"https://openalex.org/W4408111315","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8805713653564453},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5043545961380005},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4373822510242462},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.32062309980392456}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412509581","title":"Aspect-Based Sentiment Analysis with Dual Contrastive Learning and LLMs Data Augmentation","url":"https://doi.org/10.1007/978-981-96-9994-0_13","published":"2025-01-01","authors":["Jing Li","Hanjie Mai","Xuejie Zhang","Jin Wang","Xiaobing Zhou"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-9994-0_13","openalex_id":"https://openalex.org/W4412509581","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Yunnan University"],"concepts":[{"id":"https://openalex.org/C66402592","display_name":"Sentiment analysis","score":0.6861796975135803},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical 
number)","score":0.6751443147659302},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5039727091789246},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4591277539730072},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3682401776313782},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.32650357484817505},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.2978445887565613},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.07143411040306091}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4407984843","title":"Accelerating Unstructured and Semi-Structured Data Insights with Generative AI","url":"https://doi.org/10.3997/2214-4609.202539022","published":"2025-01-01","authors":["Srivinasa Murthy Gunturu","Atharva Gadad","Y. Gubanov","D. Tishechkin"],"abstract":"Summary Unstructured and semi-structured data like reports, logs, and other related records and documents are abundant in the oil and gas industry but challenging to analyze due to lack of standardization and complexity. Traditionally, custom data parsers had to be written to aggregate and extract insights from these data sources. However, generative AI (Gen AI) can now accelerate this process. Leveraging large language models (LLMs) and Gen AI techniques, it now possible to use Gen AI to help create data parsing templates by specifying sections of interest on the sample data files. Gen AI can then scale and apply these templates to a broader set of data and automatically adjust the template selection based on the file or document type. 
This approach streamlines and simplifies pipelines to aggregate diverse unstructured and semi-structured data sources into centralized enterprise data re...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3997/2214-4609.202539022","openalex_id":"https://openalex.org/W4407984843","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7318500280380249},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7014878392219543},{"id":"https://openalex.org/C2781252014","display_name":"Unstructured data","score":0.5926083326339722},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4441647231578827},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.41350260376930237},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3391640782356262},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.17475011944770813},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.1434251368045807}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4416707493","title":"A Novel Multi-Perspective Framework for Molecule Pretraining: From Atom to Motif Views","url":"https://doi.org/10.1109/jbhi.2025.3637248","published":"2025-01-01","authors":["Wei Wang","Dengzhen Lu","Suyu Dong","Gongning Luo","Kuanquan Wang","Shanzhuo Zhang"],"abstract":"Predicting molecular properties is vital for drug discovery, but experimental measurement is costly and limited by scarce labeled data. 
Self-supervised molecular pretraining can leverage large unlabeled datasets, reducing dependence on extensive annotations. However, most methods struggle to preserve domain-specific chemical knowledge, especially clinically relevant substructures such as motifs. Random masking and generic graph augmentations often degrade critical chemical information and harm interpretability. Many approaches also work at a single scale-either atom or motif-missing opportunities for cross-scale integration. We propose A2M-Mol, a multi-perspective molecular pretraining framework that combines atom-level and motif-level views through four parallel graph constructions. This design enables cross-view alignment and multiscale fusion, explicitly encoding chemical knowledge. A...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jbhi.2025.3637248","openalex_id":"https://openalex.org/W4416707493","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Harbin Institute of Technology","Northeast Forestry University"],"concepts":[{"id":"https://openalex.org/C2780022179","display_name":"Molecular graph","score":0.6744999885559082},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6635000109672546},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6320000290870667},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.45570001006126404},{"id":"https://openalex.org/C79581498","display_name":"Suite","score":0.438400000333786},{"id":"https://openalex.org/C184720557","display_name":"Topology (electrical 
circuits)","score":0.41530001163482666},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.39399999380111694},{"id":"https://openalex.org/C2777402240","display_name":"Masking (illustration)","score":0.37599998712539673}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:3453f7c572a04855","title":"Energy-Based Diffusion Language Models for Text Generation","url":"https://research.nvidia.com/publication/2025-01_energy-based-diffusion-language-models-text-generation","published":"2025-01","authors":["Minkai Xu","Tomas Geffner","Karsten Kreis","Weili Nie","Yilun Xu","Jure Leskovec","Stefano Ermon","Arash Vahdat"],"abstract":"Official NVIDIA Research publication. ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=3"}},{"id":"official:b9f8a74d5ef7cd8b","title":"Cosmos World Foundation Model Platform for Physical AI","url":"https://research.nvidia.com/publication/2025-01_cosmos-world-foundation-model-platform-physical-ai","published":"2025-01","authors":["Ming-Yu Liu","Many other contributors at https://d1qx31qr3h6wln.cloudfront.net/publications/NVIDIA%20Cosmos4.pdf","Jing Zhang"],"abstract":"Official NVIDIA Research 
publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2025&page=3"}},{"id":"official:263d51b69aa5a568","title":"Where did it all go wrong? A hierarchical look into multi-agent error attribution","url":"https://www.amazon.science/publications/where-did-it-all-go-wrong-a-hierarchical-look-into-multi-agent-error-attribution","published":"2025","authors":["Adi Banerjee","Anirudh Nair","Tarik Borogovac"],"abstract":"Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge in debugging and improving collaborative AI systems. 
Current approaches to pinpointing agent and step level failures in multi-agent interaction traces—whether using all-at-once evaluation, step-by-step analysis, or binary search—fall short when analyzing complex patterns, struggling with both accuracy and Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Conversational AI","LLM","language model","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:369b35f73da4f4fb","title":"RxLens: Multi-agent LLM-powered scan and order for pharmacy","url":"https://www.amazon.science/publications/rxlens-multi-agent-llm-powered-scan-and-order-for-pharmacy","published":"2025","authors":["Akshay Jagatap","Srujana Merugu","Prakash Mandayam Comar"],"abstract":"Automated construction of shopping cart from medical prescriptions is a vital prerequisite for scaling up online pharmaceutical services in emerging markets due to the high prevalence of paper prescriptions that are challenging for customers to interpret. 
We present RxLens, a multi-step end-end Large Language Model (LLM)-based deployed solution for automated pharmacy cart construction comprising multiple Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computer vision","LLM","language model","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:79351567298a25ca","title":"Insight agents: An LLM-based multi-agent system for data","url":"https://www.amazon.science/publications/insight-agents-an-llm-based-multi-agent-system-for-data","published":"2025","authors":["Jincheng Bai","Zhenyu Zhang","Jennifer Zhang","Jason Zhu"],"abstract":"Today, E-commerce sellers face several key challenges, including difficulties in discovering and effectively utilizing available programs and tools, and struggling to understand and utilize rich data from various tools. 
We therefore aim to develop Insight Agents (IA), a conversational multi-agent Data Insight system, to provide E-commerce sellers with personalized data and business insights through automated Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Conversational AI","LLM","personalized","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:93d8ada96e934eda","title":"UXAgent: An LLM-agent-based usability testing framework for web design","url":"https://www.amazon.science/publications/uxagent-an-llm-agent-based-usability-testing-framework-for-web-design","published":"2025","authors":["Yuxuan Lu","Bingsheng Yao","Hansu Gu","Jing Huang","Jessie Wang","Laurence (Yang) Li","Jiri Gesi","Qi He","Dakuo Wang","Toby Jia-Jun Li"],"abstract":"Usability testing is a fundamental yet challenging research method for user experience (UX) researchers to evaluate a web design. Recent advances in Large Language Model-simulated Agent (LLM Agent) research inspired us to design UXAgent to support UX researchers in evaluating and reiterating their usability testing study design before they conduct the real human-subject study. 
Our system features an LLM Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","language model","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=30"}},{"id":"official:b6c9e621d5b5d5ab","title":"Structuring the unstructured: A multi-agent LLM framework for transforming ambiguous SOPs into code","url":"https://www.amazon.science/publications/structuring-the-unstructured-a-multi-agent-llm-framework-for-transforming-ambiguous-sops-into-code","published":"2025","authors":["Sachin Kumar Giroh","Pushpendu Ghosh","Aryan Jain","Harshal Paunikar","Anish Nediyanchath","Aditi Rastogi","Promod Yenigalla"],"abstract":"This paper introduces, a three-stage multi agent LLM framework designed to transform unstructured and ambiguous Standard Operating Procedure (SOP) into a structured plan and an executable code template. Unstructured SOPs—common across industries such as finance, retail, and logistics—frequently suffer from ambiguity, missing information, and inconsistency, all of which hinder automation. 
We address this Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:8ac920bc55a63a59","title":"Stochastic rounding for LLM training: Theory and practice","url":"https://www.amazon.science/publications/stochastic-rounding-for-llm-training-theory-and-practice","published":"2025","authors":["Kaan Ozkara","Tao Yu","Youngsuk Park"],"abstract":"As the parameters of Large Language Mod-els (LLMs) have scaled to hundreds of billions, the demand for efficient training methods—balancing faster computation and reduced memory usage without sacrificing accuracy—has become more critical than ever. 
In recent years, various mixed precision strategies, which involve different precision levels for optimization components, have been proposed to increase training Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","memory","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:a28ce0a5943b193c","title":"Scaling context, not parameters: Training a compact 7B language model for efficient long-context processing","url":"https://www.amazon.science/publications/scaling-context-not-parameters-training-a-compact-7b-language-model-for-efficient-long-context-processing","published":"2025","authors":["Chen Wu","Yin Song"],"abstract":"We present MegaBeam-Mistral-7B1, a language model that supports 512K-token context length. Our work addresses practical limitations in long-context training, supporting real-world tasks such as compliance monitoring and verification. 
Evaluated on three long-context benchmarks, our 7B-parameter model demonstrates superior in-context learning performance on HELMET and robust retrieval and tracing capability Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.acl-industry.6","openalex_id":"https://openalex.org/W4412887017","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","language model","retrieval","efficient"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=25"}},{"id":"official:c428279e30e42afe","title":"Rethinking LLM uncertainty: A multi-agent approach to estimating black-box model uncertainty","url":"https://www.amazon.science/publications/rethinking-llm-uncertainty-a-multi-agent-approach-to-estimating-black-box-model-uncertainty","published":"2025","authors":["Yu Feng","Phu Mon Htut","Zheng Qi","Wei Xiao","Manuel Mager (Turatemai)","Nikolaos Pappas","Kishaloy Halder","Yang Li","Yassine Benajiba","Dan Roth"],"abstract":"Quantifying uncertainty in black-box LLMs is vital for reliable responses and scalable oversight. 
Existing methods, which gauge a model's uncertainty through evaluating self-consistency in responses to the target query, can be misleading: an LLM may confidently provide an incorrect answer to a target query, yet give a confident and accurate answer to that same target query when answering a knowledge-preserving Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/45vx-t019","openalex_id":"https://openalex.org/W7106810134","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","agent","multi-agent"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)","California University of Pennsylvania"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=12"}},{"id":"official:3855d0c8c17cbcf7","title":"RASL: Retrieval augmented schema linking for massive database text-to-SQL","url":"https://www.amazon.science/publications/rasl-retrieval-augmented-schema-linking-for-massive-database-text-to-sql","published":"2025","authors":["Jeff Eben","Aitzaz Ahmad","Stephen Lau"],"abstract":"Despite advances in large language model (LLM)-based natural language interfaces for databases, scaling to enterprise-level data catalogs remains an under-explored challenge. Prior works addressing this challenge rely on domain-specific fine-tuning—complicating deployment—and fail to leverage important semantic context contained within database metadata. 
To address these limitations, we introduce a component-based Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","language model","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:0e4ab1e3f81ea147","title":"PersonaAgent: When large language model agents meet personalization at test time","url":"https://www.amazon.science/publications/personaagent-when-large-language-model-agents-meet-personalization-at-test-time","published":"2025","authors":["Weizhi Zhang","Xinyang Zhang","Chenwei Zhang","Liangwei Yang","Jingbo Shang","Zhepei Wei","Henry Peng Zou","Zijie Huang","Zhengyang Wang","Yifan Gao","Xiaoman Pan","Lian Xiong"],"abstract":"Large Language Model (LLM)-powered agents have emerged as a new paradigm for complex, multi-turn human-AI interactions, yet most existing systems adopt a one-size-fits-all approach, neglecting the evolving preferences and goals of individual users. This limitation hinders their ability to maintain alignment and coherence over extended multi-turn interactions and dynamic tasks. 
To address this gap, we propose Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","language model","personalization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:c3c1e9ad35f0a95d","title":"MaRGen: Multi-agent LLM approach for self-directed market research and analysis","url":"https://www.amazon.science/publications/margen-multi-agent-llm-approach-for-self-directed-market-research-and-analysis","published":"2025","authors":["Roman Koshkin","Pengyu Dai","Nozomi Fujikawa","Masahito Togami","Marco Visentini Scarzanella"],"abstract":"We present an autonomous framework that leverages Large Language Models (LLMs) to automate end-to-end business analysis and market report generation. At its core, the system employs specialized agents - Researcher, Reviewer, Writer, and Retriever - that collaborate to analyze data and produce comprehensive reports. 
These agents learn from real professional consultants' presentation materials at Amazon through Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=15"}},{"id":"official:a68e507c5fb13561","title":"MLZero: A multi-agent system for automated end-to-end machine learning solutions","url":"https://www.amazon.science/publications/mlzero-a-multi-agent-system-for-automated-end-to-end-machine-learning-solutions","published":"2025","authors":["Haoyang Fang","Boran Han","Nick Erickson","Xiyuan Zhang","Zhou Su","Anirudh Dagar","Jiani Zhang","Caner Turkmen","Tony Hu","Huzefa Rangwala","Ying Nian Wu","Yuyang (Bernie) Wang"],"abstract":"Previous AutoML systems have made progress in automating machine learning workflows, but still require significant manual setup and expert knowledge. This paper presents a novel multi-agent system that integrates Large Language Models (LLMs) with external knowledge bases of existing machine learning tools to automate the complete end-to-end solution. 
To address the limitations of pure LLM solutions, including Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer vision","LLM","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=10"}},{"id":"official:0ebfb19af3827161","title":"M-LLM based video frame selection for efficient video understanding","url":"https://www.amazon.science/publications/m-llm-based-video-frame-selection-for-efficient-video-understanding","published":"2025","authors":["Kai Hu","Feng Gao","Xiaohan Nie","Peng Zhou","Son Tran","Tal Neiman","Lingyun Wang","Mubarak Shah","Raffay Hamid","Bing Yin","Trishul Chilimbi"],"abstract":"Recent advances in Multi-Modal Large Language Models (M-LLMs) show promising results in video reasoning. Popular Multi-Modal Large Language Model (M-LLM) frameworks usually apply naive uniform sampling to reduce the number of video frames that are fed into an M-LLM, particularly for long context videos. 
However, it could lose crucial context in certain periods of a video, so that the downstream M-LLM may Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","language model","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:904b5e9a74c42dbb","title":"From unstructured communication to intelligent RAG: Multi-agent automation for supply chain knowledge bases","url":"https://www.amazon.science/publications/from-unstructured-communication-to-intelligent-rag-multi-agent-automation-for-supply-chain-knowledge-bases","published":"2025","authors":["Yao Zhang","Zaixi Shang","Silpan Patel","Mikel Zuniga"],"abstract":"Supply chain operations generate vast amounts of operational data; however, critical knowledge—such as system usage practices, troubleshooting workflows, and resolution techniques—often remains buried within unstructured communications like support tickets, emails, and chat logs. 
While Retrieval-Augmented Generation (RAG) systems aim to leverage such communications as a knowledge base, their effectiveness Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Information and knowledge management","retrieval","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:e37d1a7fd1f83bfd","title":"Faithful, unfaithful or ambiguous? Multi-agent debate with initial stance for summary evaluation","url":"https://www.amazon.science/publications/faithful-unfaithful-or-ambiguous-multi-agent-debate-with-initial-stance-for-summary-evaluation","published":"2025","authors":["Mahnaz Koupaee","Jake Vincent","Saab Mansour","Igor Shalyminov","Han He","Hwanjun Song","Raphael Shu","Jianfeng He","Kevin Nian","Amy Wong","Kyu Han","Shawn Su"],"abstract":"Faithfulness evaluators based on large language models (LLMs) are often fooled by the fluency of the text and struggle with identifying errors in the summaries. 
We propose an approach to summary faithfulness evaluation in which multiple LLM-based agents are assigned initial stances (regardless of what their belief might be) and forced to come up with a reason to justify the imposed belief, thus engaging Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=30"}},{"id":"official:a9f0f09cff0934b7","title":"Enhancing LLM-as-a-judge via multi-agent collaboration","url":"https://www.amazon.science/publications/enhancing-llm-as-a-judge-via-multi-agent-collaboration","published":"2025","authors":["Yiyue Qian","Shinan Zhang","Yun Zhou","Haibo Ding","Diego Socolinsky","Yi Zhang"],"abstract":"Large Language Models (LLMs) have revolutionized AI-generated content evaluation, with the LLM-as-a-Judge paradigm becoming increasingly popular. However, current single-LLM evaluation approaches face significant challenges, including inconsistent judgments and inherent biases from pre-training data. 
To address these limitations, we propose CollabEval, a novel multi-agent evaluation framework that implements Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=26"}},{"id":"official:29bf4e52599c331d","title":"DEPART: A hierarchical multi-agent system for multi-turn interaction","url":"https://www.amazon.science/publications/depart-a-hierarchical-multi-agent-system-for-multi-turn-interaction","published":"2025","authors":["Hao-Lun Hsu","Jing Xu","Nikhil Vichare","Francesco Carbone","Miroslav Pajic","Giuseppe Carenini"],"abstract":"Large Language Models (LLMs) perform well on short-horizon tasks but struggle with long-horizon, multimodal scenarios that require multi-step reasoning, perception, and adaptive planning. We identify two key challenges in these settings: the difficulty of long-term coordination between planning and execution within single-agent architectures and the inefficiency of indiscriminate visual grounding. 
To address Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Automated reasoning","long-term","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:34a0df676a88ce35","title":"Customer-R1: Personalized simulation of human behaviors via RL-based LLM agent in online shopping","url":"https://www.amazon.science/publications/customer-r1-personalized-simulation-of-human-behaviors-via-rl-based-llm-agent-in-online-shopping","published":"2025","authors":["Ziyi Wang","Yuxuan Lu","Yimeng Zhang","Jing Huang","Dakuo Wang"],"abstract":"Simulating step-wise human behavior with Large Language Models (LLMs) has become an emerging research direction, enabling applications in various practical domains. 
While prior methods, including prompting, supervised fine-tuning (SFT), and reinforcement learning (RL), have shown promise in modeling step-wise behavior, they primarily learn a population-level policy without conditioning on a user's persona Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","personalized","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:3817020a1b776e6d","title":"CEDA: Cross-modal Evaluation through Debate Agents for Robust Hallucination Detection","url":"https://www.amazon.science/publications/ceda-cross-modal-evaluation-through-debate-agents-for-robust-hallucination-detection","published":"2025","authors":["Susmit Neogi","Yun Wang"],"abstract":"We present CEDA, a novel multimodal framework for detecting hallucinations in large language model outputs through a multi-agent debate approach. 
While existing methods for black-box LLMs often rely on response sampling and self-consistency checking, our framework leverages a three-fold approach: a multi-agent debate setting to critically examine and debate the authenticity of generated content, a lightweight Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","language model","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:8513667eede94277","title":"Building analyst-like agents: A self-improving multi-agent framework for financial reasoning in the enterprise","url":"https://www.amazon.science/publications/building-analyst-like-agents-a-self-improving-multi-agent-framework-for-financial-reasoning-in-the-enterprise","published":"2025","authors":["Sabrina Zhang","Daksha Yadav","Tom Jin","Miriam Teng"],"abstract":"Enterprise accounting data is complex, ambiguous, and shaped by evolving systems and regulations. The institutional knowledge needed to reason over the data is sparse, scattered and rarely structurally documented—posing major challenges for LLM agents. We introduce a multi-agent financial research framework that mimics a junior analyst’s onboarding and growth. 
The Analyst Agent learns proactively from repeated Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:64a2e1323a1f8000","title":"BYOKG-RAG: Multi-strategy graph retrieval for knowledge graph question answering","url":"https://www.amazon.science/publications/byokg-rag-multi-strategy-graph-retrieval-for-knowledge-graph-question-answering","published":"2025","authors":["Costas Mavromatis","Soji Adeshina","Vassilis N. Ioannidis","Zhen Han","Qi Zhu","Ian Robinson","Bryan Thompson","Huzefa Rangwala","George Karypis"],"abstract":"Knowledge graph question answering (KGQA) presents significant challenges due to the structural and semantic variations across input graphs. Existing works rely on Large Language Model (LLM) agents for graph traversal and retrieval; an approach that is sensitive to traversal initialization, as it is prone to entity linking errors and may not generalize well to custom (“bring-your-own”) KGs. 
We introduce Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/stnx-3t21","openalex_id":"https://openalex.org/W7106812264","cited_by_count":0,"quality_score":68,"matched_keywords":["Search and information retrieval","LLM","language model","retrieval"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)","George Mason University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:7c67f7c59e4cabd1","title":"Auto-GDA: Automatic domain adaptation for grounding verification in retrieval-augmented generation","url":"https://www.amazon.science/publications/auto-gda-automatic-domain-adaptation-for-grounding-verification-in-retrieval-augmented-generation","published":"2025","authors":["Tobias Leemann","Periklis Petridis","Giuseppe Vietri","Dionysis Manousakas","Aaron Roth","Sergul Aydore"],"abstract":"While retrieval-augmented generation (RAG) has been shown to enhance factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. A common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. 
Alternatively, lightweight natural language Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","language model","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:488c7db3a99c5a61","title":"AgentOccam: A simple yet strong baseline for LLM-based web agents","url":"https://www.amazon.science/publications/agentoccam-a-simple-yet-strong-baseline-for-llm-based-web-agents","published":"2025","authors":["Ke Yang","Yao Liu","Sapana Chaudhary","Rasool Fakoor","Pratik Chaudhari","George Karypis","Huzefa Rangwala"],"abstract":"Autonomy via agents based on large language models (LLMs) that can carry out personalized yet standardized tasks presents a significant opportunity to drive human efficiency. There is an emerging need and interest in automating web tasks (e.g., booking a hotel for a given date within a budget). 
Being a practical use case itself, the web agent also serves as an important proof-of-concept example for various Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","LLM","personalized","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=27"}},{"id":"official:0ed9a1894cab68d4","title":"MAPoRL: Multi-agent post-co-training for collaborative large language models with reinforcement learning","url":"https://www.amazon.science/publications/maporl-multi-agent-post-co-training-for-collaborative-large-language-models-with-reinforcement-learning","published":"2025","authors":["Chanwoo Park","Seungju Han","Xingzhi (Jacky) Guo","Asuman Ozdaglar","Kaiqing Zhang","Joo-Kyung Kim"],"abstract":"Leveraging multiple large language models (LLMs) to build collaborative multi-agentic workflows has demonstrated significant potential. However, most previous studies focus on prompting the out-of-the-box LLMs, relying on their innate capability for collaboration, which may not improve LLMs’ performance as shown recently. 
In this paper, we introduce a new post-training paradigm MAPoRL (MultiAgent Post-co-training Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.acl-long.1459","openalex_id":"https://openalex.org/W4412889721","cited_by_count":3,"quality_score":67,"matched_keywords":["Conversational AI","agent","multi-agent"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:dbb330ff2295d9bc","title":"Zero-resource speech translation and recognition with LLMs","url":"https://www.amazon.science/publications/zero-resource-speech-translation-and-recognition-with-llms","published":"2025","authors":["Karel Mundnich","Xing Niu","Prashant Mathur","Srikanth Ronanki","Brady Houston","Veera Raghavendra Elluru","Nilaksh Das","Zejiang Hou","Goeric Huybrechts","Anshu Bhatia","Daniel Garcia-Romero","Kyu Han"],"abstract":"Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. 
We achieve this by using a pre-trained multilingual speech encoder, a multilingual Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:74e22b8348bdb84a","title":"VL-Cache: Sparsity and modality-aware KV cache compression for vision-language model inference acceleration","url":"https://www.amazon.science/publications/vl-cache-sparsity-and-modality-aware-kv-cache-compression-for-vision-language-model-inference-acceleration","published":"2025","authors":["Dezhan Tu","Danylo Vashchilenko","Bryan Lu","Panpan Xu"],"abstract":"Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache that encodes long visual contexts, such as images or videos. 
While existing KV cache compression methods are effective for Large Language Models (LLMs), directly migrating them to VLMs yields suboptimal accuracy Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","language model","compression"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=30"}},{"id":"official:df89c44f598e2dd6","title":"VIT-Pro: Visual instruction tuning for product images","url":"https://www.amazon.science/publications/vit-pro-visual-instruction-tuning-for-product-images","published":"2025","authors":["Vishnu Prabhakaran","Purav Aggarwal","Vishruit Kulshreshtha","Arunita Das","Venkata Sitaram Sruti Sahini","Anoop S V K K Saladi"],"abstract":"General vision-language models (VLMs) trained on web data struggle to understand and converse about real-world e-commerce product images. We propose a cost-efficient approach for collecting training data to train a generative VLM for e-commerce product images. 
The key idea is to leverage large-scale, loosely-coupled image-text pairs from e-commerce stores, use a pre-trained LLM to generate multi-modal instruction-following Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:880b9ecfdb04339d","title":"Unfixing the mental set: Granting early-stage reasoning freedom in multi-agent debate","url":"https://www.amazon.science/publications/unfixing-the-mental-set-granting-early-stage-reasoning-freedom-in-multi-agent-debate","published":"2025","authors":["Jing Wu","Suiyao Chen","Inseok Heo","Sasha Gutfraind","Shengjie Liu","Chen Li","Bharathi Srinivasan","Xian Zhang","Michael Sharps"],"abstract":"Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks in recent years. While prior work has explored leveraging LLMs to generate synthetic data for self-improvement, repeated iterations often suffer from diminishing returns due to the reliance on homogeneous reasoning patterns and limited exploration of alternative perspectives. 
In this paper, we introduce a Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=15"}},{"id":"official:7310daf23c925364","title":"UnDIAL: Self-distillation with adjusted logits for robust unlearning in large language models","url":"https://www.amazon.science/publications/undial-self-distillation-with-adjusted-logits-for-robust-unlearning-in-large-language-models","published":"2025","authors":["Yijiang River Dong","Hongzhou Lin","Mikhail Belkin","Ramon Huerta","Ivan Vulić"],"abstract":"Mitigating the retention of sensitive or private information in large language models is essential for enhancing privacy and safety. Existing unlearning methods, like Gradient Ascent and Negative Preference Optimization, directly tune models to remove unwanted information. 
However, these methods often become unstable because they fine-tune by maximizing cross-entropy loss, which is the opposite of traditional Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","preference","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:2604ab398801614c","title":"Towards knowledge checking in retrieval-augmented generation: A representation perspective","url":"https://www.amazon.science/publications/towards-knowledge-checking-in-retrieval-augmented-generation-a-representation-perspective","published":"2025","authors":["Shenglai Zeng","Jiankun Zhang","Bingheng Li","Yuping Lin","Tianqi Zheng","Dante Everaert","Hanqing Lu","Hui Liu","Yue Xing","Monica Cheng","Jiliang Tang"],"abstract":"Retrieval-Augmented Generation (RAG) systems have shown promise in enhancing the performance of Large Language Models (LLMs). However, these systems face challenges in effectively integrating external knowledge with the LLM’s internal knowledge, often leading to issues with misleading or unhelpful information. This work aims to provide a systematic study on knowledge checking in RAG systems. 
We conduct Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=35"}},{"id":"official:1ee4b5783d5a6297","title":"TailorSQL: An NL2SQL System Tailored to Your Query Workload","url":"https://www.amazon.science/publications/tailorsql-an-nl2sql-system-tailored-to-your-query-workload","published":"2025","authors":["Kapil Eknath Vaidya","Jialin Ding","Sebastian Kosak","David Kernert","Chuan Lei","Xiao Qin","Abhinav Tripathy","Ramesh Balan","Balakrishnan (Murali) Narayanaswamy","Tim Kraska"],"abstract":"NL2SQL (natural language to SQL) translates natural language questions into SQL queries, thereby making structured data accessible to non-technical users, serving as the foundation for intelligent data applications. 
State-of-the-art NL2SQL techniques typically perform translation by retrieving database-specific information, such as the database schema, and invoking a pre-trained large language model (LLM Category: Cloud and systems","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Cloud and systems","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:471cc7eeac0c105d","title":"Speed without sacrifice: Fine-tuning language models with Medusa and knowledge distillation in travel applications","url":"https://www.amazon.science/publications/speed-without-sacrifice-fine-tuning-language-models-with-medusa-and-knowledge-distillation-in-travel-applications","published":"2025","authors":["Daniel Zagyva","Emmanouil Stergiadis","Laurens van der Maas","Aleksandra Dokic","Eran Fainman","Ilya Gusev","Moran Beladev"],"abstract":"In high-stakes industrial NLP applications, balancing generation quality with speed and efficiency presents significant challenges. We address them by investigating two complementary optimization approaches: Medusa for speculative decoding and knowledge distillation (KD) for model compression. 
We demonstrate the practical application of these techniques in real-world travel domain tasks, including trip Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","compression","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:7a76be1f01ce9e57","title":"Sparse augmented tensor networks for post-training compression of large language models","url":"https://www.amazon.science/publications/sparse-augmented-tensor-networks-for-post-training-compression-of-large-language-models","published":"2025","authors":["Ryan Solgi","Kai Zhen","Rupak Vignesh Swaminathan","Nathan Susanj","Thanasis Mouchtaris","Jimmy Kunzmann","Zheng Zhang"],"abstract":"The efficient implementation of large language models (LLMs) is crucial for deployment on resource-constrained devices. Low-rank tensor compression techniques, such as tensor-train (TT) networks, have been widely studied for over-parameterized neural networks. 
However, their applications to compress pre-trained large language models (LLMs) for downstream tasks (post-training) remains challenging due to Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","efficient","compression"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:9e88ab0cbf1dbae6","title":"Sharpness aware vision language model prompt tuning via forward-only passes","url":"https://www.amazon.science/publications/sharpness-aware-vision-language-model-prompt-tuning-via-forward-only-passes","published":"2025","authors":["Yifan Yang","Zhen Zhang","Rupak Vignesh Swaminathan","Jing Liu","Nathan Susanj","Zheng Zhang"],"abstract":"Fine-tuning vision language models (VLMs) has achieved remarkable performance across various downstream tasks, yet, it requires access to model gradients through backpropagation (BP), making them unsuitable for memory-constrained, inference-only edge devices. To address this limitation, previous work has explored various BP-free fine-tuning methods. 
However, these approaches often rely on high-variance Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer vision","language model","memory"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:c9e12b5c84bf7cde","title":"Sequence-level large language model training with contrastive preference optimization","url":"https://www.amazon.science/publications/sequence-level-large-language-model-training-with-contrastive-preference-optimization","published":"2025","authors":["Zhili Feng","Dhananjay Ram","Cole Hawkins","Aditya Rawal","Jinman Zhao","Sheng Zha"],"abstract":"The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find that it lacks an understanding of sequence-level signals, leading to a mismatch between training and inference processes. 
To bridge this gap, we introduce a contrastive Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:30bb43e8c9cc8b14","title":"SelfElicit: Your language model secretly knows where is the relevant evidence","url":"https://www.amazon.science/publications/selfelicit-your-language-model-secretly-knows-where-is-the-relevant-evidence","published":"2025","authors":["Zhining Liu","Rana Ali Amjad","Ravinarayana Adkathimar","Tianxin Wei","Hanghang Tong"],"abstract":"Providing Language Models (LMs) with relevant evidence in the context (either via retrieval or user-provided) can significantly improve their ability to provide better-grounded responses. 
However, recent studies have found that LMs often struggle to fully comprehend and utilize key evidence from the context, especially when it contains noise and irrelevant information—an issue common in real-world scenarios Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=21"}},{"id":"official:80dba4f9fc9fc704","title":"SQLGenie: A practical LLM based system for reliable and efficient SQL generation","url":"https://www.amazon.science/publications/sqlgenie-a-practical-llm-based-system-for-reliable-and-efficient-sql-generation","published":"2025","authors":["Pushpendu Ghosh","Aryan Jain","Promod Yenigalla"],"abstract":"Large Language Models (LLMs) enable natural language to SQL conversion, allowing users to query databases without SQL expertise. However, generating accurate, efficient queries is challenging due to ambiguous intent, domain knowledge requirements, and database constraints. Extensive reasoning improves SQL quality but increases computational costs and latency. 
We propose SQLGenie, a practical system for Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=25"}},{"id":"official:954e7010921edf4a","title":"SATA-BENCH: Select all that apply benchmark for multiple choice questions","url":"https://www.amazon.science/publications/sata-bench-select-all-that-apply-benchmark-for-multiple-choice-questions","published":"2025","authors":["Weijie Xu","Shixian Cui","Xi Fang","Stephanie Eckman","Chandan Reddy"],"abstract":"Current large language model (LLM) evaluations primarily focus on single-answer tasks, whereas many real-world applications require identifying multiple correct answers. This capability remains under-explored due to the lack of dedicated evaluation frameworks. 
We introduce SATA-BENCH, a benchmark for evaluating LLMs on Select All That Apply (SATA) questions spanning six domains, including reading comprehension Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=7"}},{"id":"official:9b02ee2055d25303","title":"Rationale-guided distillation for e-commerce relevance classification: Bridging large language models and lightweight cross-encoders","url":"https://www.amazon.science/publications/rationale-guided-distillation-for-e-commerce-relevance-classification-bridging-large-language-models-and-lightweight-cross-encoders","published":"2025","authors":["Sanjay Agrawal","Faizan Ahemad","Vivek Sembium"],"abstract":"Accurately classifying the relevance of Query-Product pairs is critical in online retail stores such as Amazon, as displaying irrelevant products can harm user experience and reduce engagement. While Large Language Models (LLMs) excel at this task due to their broad knowledge and strong reasoning abilities. 
However, their high computational demands constrain their practical deployment in real-world applications Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Search and information retrieval","retrieval","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=37"}},{"id":"official:822a98326783e1e3","title":"RAGferee: Building contextual reward models for retrieval-augmented generation","url":"https://www.amazon.science/publications/ragferee-building-contextual-reward-models-for-retrieval-augmented-generation","published":"2025","authors":["Andrei C. Coman","Ionut Teodor Sorodoc","Leonardo Ribeiro","James Henderson","Bill Byrne","Adrià de Gispert"],"abstract":"Existing Reward Models (RMs), typically trained on general preference data, struggle in Retrieval Augmented Generation (RAG) settings, which require judging responses for faithfulness to retrieved context, relevance to the user query, appropriate refusals when context is insufficient, completeness and conciseness of information. 
To address the lack of publicly available RAG-centric preference datasets and Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/669j-2314","openalex_id":"https://openalex.org/W7106825042","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","preference","retrieval"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)","Idiap Research Institute","Universitat Pompeu Fabra"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:6125b0d14ab92ea5","title":"R-VLM: Region-aware vision language model for precise GUI grounding","url":"https://www.amazon.science/publications/r-vlm-region-aware-vision-language-model-for-precise-gui-grounding","published":"2025","authors":["Joonhyung Park","Peng Tang","Sagnik Das","Srikar Appalaraju","Kunwar Yashraj Singh","R. Manmatha","Shabnam Ghadar"],"abstract":"Visual agent models for automating human activities on Graphical User Interfaces (GUIs) have emerged as a promising research direction, driven by advances in large Vision Language Models (VLMs). A critical challenge in GUI automation is the precise grounding of interface elements across diverse platforms. 
Existing vision-only GUI agents directly ground elements from large and cluttered screenshots, requiring Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer vision","language model","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:f61262cfccd277ce","title":"QID: Efficient query-informed ViTs in data-scarce regimes for OCR-free visual document understanding","url":"https://www.amazon.science/publications/qid-efficient-query-informed-vits-in-data-scarce-regimes-for-ocr-free-visual-document-understanding","published":"2025","authors":["Binh Le","Shaoyuan Xu","Jinmiao Fu","Zhishen (Leo) Huang","Moyan Li","Yanhui Guo","Hongdong Li","Sameera Ramasinghe","Bryan Wang"],"abstract":"In Visual Document Understanding (VDU) tasks, fine-tuning a pre-trained Vision-Language Model (VLM) with new datasets often falls short in optimizing the vision encoder to identify query-specific regions in text-rich document images. Existing methods that directly inject queries into model layers by modifying the network architecture often struggle to adapt to new datasets with limited annotations. 
To Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvprw67362.2025.00014","openalex_id":"https://openalex.org/W4414198353","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","efficient"],"author_affiliations":["Amazon","Amazon (United States)","Olympus (Australia)","Sungkyunkwan University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:e4529c336d9f3f6d","title":"PersonaLens: A benchmark for personalization evaluation in conversational AI assistants","url":"https://www.amazon.science/publications/personalens-a-benchmark-for-personalization-evaluation-in-conversational-ai-assistants","published":"2025","authors":["Zheng Zhao","Clara Vania","Deep Kayal","Naila Khan","Shay B. Cohen","Emine Yilmaz"],"abstract":"Large language models (LLMs) have advanced conversational AI assistants. However, systematically evaluating how well these assistants apply personalization—adapting to individual user preferences while completing tasks—remains challenging. 
Existing personalization benchmarks focus on chit-chat, nonconversational tasks, or narrow domains, failing to capture the complexities of personalized task-oriented Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","personalized","personalization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:73a5fb8777790dc0","title":"On synthetic data strategies for domain-specific generative retrieval","url":"https://www.amazon.science/publications/on-synthetic-data-strategies-for-domain-specific-generative-retrieval","published":"2025","authors":["Haoyang Wen","Jiang Guo","Yi Zhang","Jiarong Jiang","Zhiguo Wang"],"abstract":"This paper investigates synthetic data generation strategies in developing generative retrieval models for domain-specific corpora, thereby addressing the scalability challenges inherent in manually annotating in-domain queries. 
We study the data strategies for a two-stage training framework: in the first stage, which focuses on learning to decode document identifiers from queries, we investigate LLM-generated Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:1944284bb9fbff52","title":"Multimodal music tokenization with residual quantization for generative retrieval","url":"https://www.amazon.science/publications/multimodal-music-tokenization-with-residual-quantization-for-generative-retrieval","published":"2025","authors":["Wo Jae Lee","Rifat Joyee","Emanuele Coviello","Sudev Mukherjee"],"abstract":"Recent advances in generative retrieval allow large language models (LLMs) to recommend items by generating their identifiers token by token, rather than using nearest-neighbor search over embeddings. This approach requires each item, such as a music track, to be represented by a compact and semantically meaningful token sequence that LLMs can generate. 
We propose a multimodal music tokenizer (3MToken) Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Search and information retrieval","retrieval","quantization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:a995cdebcdb686ac","title":"Mamba drafters for speculative decoding","url":"https://www.amazon.science/publications/mamba-drafters-for-speculative-decoding","published":"2025","authors":["Daewon Choi","Seunghyuk Oh","Saket Dingliwal","Jihoon Tack","Kyuyoung Kim","Woomin Song","Seojin Kim","Insu Han","Jinwoo Shin","Aram Galstyan","Shubham Katiyar","Sravan Babu Bodapati"],"abstract":"Speculative decoding has emerged as a promising approach to accelerating large language model (LLM) generation using a fast drafter while maintaining alignment with the target model’s distribution. 
However, existing approaches face a tradeoff: external drafters offer flexibility but can suffer from slower drafting, while self-speculation methods use drafters tailored to the target model but require re-training Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:dc5e55119e014086","title":"Leveraging product catalog patterns for multilingual e-commerce product attribute prediction","url":"https://www.amazon.science/publications/leveraging-product-catalog-patterns-for-multilingual-e-commerce-product-attribute-prediction","published":"2025","authors":["Bryan Zhang","Suleiman Khan","Stephan Walter"],"abstract":"E-commerce stores increasingly use Large Language Models (LLMs) to enhance catalog data quality through automated regeneration. A critical challenge is accurately predicting missing structured attribute values across multilingual product catalogs, where LLM performance varies significantly by language. 
While existing approaches leverage general knowledge through prompt engineering and external retrieval Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=12"}},{"id":"official:23d56e03c5e68a63","title":"Learning LLM preference over intra-dialogue pairs: A framework for utterance-level understandings","url":"https://www.amazon.science/publications/learning-llm-preference-over-intra-dialogue-pairs-a-framework-for-utterance-level-understandings","published":"2025","authors":["Xuanqing Liu","Chris (Luyang) Kong","Wei Niu","Afshin Khashei","Belinda Zeng","Steve Johnson","Jon Jay","Davor Golac","Matt Pope"],"abstract":"Large language models (LLMs) have demonstrated remarkable capabilities in handling complex dialogue tasks without requiring use case-specific fine-tuning. However, analyzing live dialogues in real-time necessitates low-latency processing systems, making it impractical to deploy models with billions of parameters due to latency constraints. 
As a result, practitioners often prefer smaller models with millions Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.naacl-industry.8","openalex_id":"https://openalex.org/W4411120397","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","preference"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:90a70abb5239d026","title":"Large language model-enhanced reinforcement learning for diverse and novel recommendations","url":"https://www.amazon.science/publications/large-language-model-enhanced-reinforcement-learning-for-diverse-and-novel-recommendations","published":"2025","authors":["Jiin Woo","Ali Bagheri Garakani","Tianchen Zhou","Zhishen (Leo) Huang","Yan Gao"],"abstract":"In recommendation systems, diversity and novelty are essential for capturing varied user preferences and encouraging exploration, yet many systems prioritize click relevance. While reinforcement learning (RL) has been explored to improve diversity, it often depends on random exploration that may not align with user interests. 
We propose LAAC (LLM-guided Adversarial Actor Critic), a novel method that leverages Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:bcd5dbe2236c41a3","title":"LLM-based dialogue labeling for multiturn adaptive RAG","url":"https://www.amazon.science/publications/llm-based-dialogue-labeling-for-multiturn-adaptive-rag","published":"2025","authors":["Zhiyu Chen","Biancen Xie","Sidarth Srinivasan","Qun Liu","Manikandarajan Ramanathan","Raj Maragoud"],"abstract":"Customer service often relies on human agents, which, while effective, can be costly and slower to scale. Recent advancements in intelligent chatbots, particularly Retrieval-Augmented Generation (RAG) models, have significantly enhanced efficiency by integrating large language models with external knowledge retrieval. 
However, developing a multi-turn RAG-based chatbot for real-world customer service presents Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.emnlp-industry.72","openalex_id":"https://openalex.org/W4416037472","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","retrieval"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:55d7690617b7226a","title":"InfoPO: On mutual information maximization for large language model alignment","url":"https://www.amazon.science/publications/infopo-on-mutual-information-maximization-for-large-language-model-alignment","published":"2025","authors":["Teng Xiao","Zhen Ge","Sujay Sanghavi","Tian Wang","Julian Katz-Samuels","Skylar Versage","Qingjun Cui","Trishul Chilimbi"],"abstract":"We study the post-training of large language models (LLMs) with human preference data. Recently, direct preference optimization and its variants have shown considerable promise in aligning language models, eliminating the need for reward models and online sampling. 
Despite these benefits, these methods rely on explicit assumptions about the Bradley-Terry (BT) model, which makes them prone to over-fitting Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:1e2cfb594958e65a","title":"In-context reinforcement learning based retrieval-augmented generation for text-to-SQL","url":"https://www.amazon.science/publications/in-context-reinforcement-learning-based-retrieval-augmented-generation-for-text-to-sql","published":"2025","authors":["Rishit Toteja","Arindam Sarkar","Prakash Mandayam Comar"],"abstract":"Text-to-SQL simplifies database interactions by enabling non-experts to convert their natural language (NL) questions to Structured Query Language (SQL) queries. With advancements in Large Language Models (LLM), in-context learning (ICL) has emerged as a popular choice for building Text-to-SQL systems. 
Real world, industry-scale databases, often comprise thousands of tables and hundreds of columns, and Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:a8ec0784c415cde8","title":"Hyperband-based Bayesian optimization for black-box prompt selection","url":"https://www.amazon.science/publications/hyperband-based-bayesian-optimization-for-black-box-prompt-selection","published":"2025","authors":["Lennart Schneider","Martin Wistuba","Aaron Klein","Jacek Golebiowski","Giovanni Zappella","Felice Antonio Merra"],"abstract":"Optimal prompt selection is crucial for maximizing large language model (LLM) performance on downstream tasks, especially in black-box settings where models are only accessible via APIs. Black-box prompt selection is challenging due to potentially large, combinatorial search spaces, absence of gradient information, and high evaluation cost of prompts on a validation set. 
We propose HbBoPs, a novel method Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:c2898fa3c247eb64","title":"How and where to translate? The impact of translation strategies in cross-lingual LLM prompting","url":"https://www.amazon.science/publications/how-and-where-to-translate-the-impact-of-translation-strategies-in-cross-lingual-llm-prompting","published":"2025","authors":["Aman Gupta","Yingying Zhuang","Zhou Yu","Ziji Zhang","Anurag Beniwal"],"abstract":"Despite advances in the multilingual capabilities of Large Language Models (LLMs), their performance varies substantially across different languages and tasks. 
In multilingual retrieval-augmented generation (RAG)-based systems, knowledge bases (KB) are often shared from high-resource languages (such as English) to lowresource ones, resulting in retrieved information from the KB being in a different language Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:6a9c31db8925b2fb","title":"Hephaestus: Improving fundamental agent capabilities of large language models through continual pre-training","url":"https://www.amazon.science/publications/hephaestus-improving-fundamental-agent-capabilities-of-large-language-models-through-continual-pre-training","published":"2025","authors":["Yuchen Zhuang","Jingfeng Yang","Haoming Jiang","Xin Liu","Kewei Cheng","Sanket Lokegaonkar","Yifan Gao","Qing Ping","Tianyi Liu","Binxuan Huang","Zheng Li","Zhengyang Wang"],"abstract":"Due to the scarcity of agent-oriented pre-training data, LLM-based autonomous agents typically rely on complex prompting or extensive fine-tuning, which often fails to introduce new capabilities while preserving strong generalizability. 
We introduce Hephaestus-Forge, the first large-scale pre-training corpus designed to enhance the fundamental capabilities of LLM agents in API function calling, intrinsic Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:8b625f9489bd87f4","title":"GENIUS: A generative framework for universal multimodal search","url":"https://www.amazon.science/publications/genius-a-generative-framework-for-universal-multimodal-search","published":"2025","authors":["Sungyeon Kim","Xinliang Zhu","Xiaofan Lin","Muhammet Bastan","Doug Gray","Suha Kwak"],"abstract":"Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. 
This paper proposes GENIUS, a universal generative retrieval framework supporting Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer vision","retrieval","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=28"}},{"id":"official:b982749ffea36afd","title":"FalseReject: A resource for improving contextual safety and mitigating over-refusals in LLMs via structured reasoning","url":"https://www.amazon.science/publications/falsereject-a-resource-for-improving-contextual-safety-and-mitigating-over-refusals-in-llms-via-structured-reasoning","published":"2025","authors":["Zhehao Zhang","Weijie Xu","Fanyou Wu","Chandan Reddy"],"abstract":"Safety alignment approaches in large language models (LLMs) often lead to the over-refusal of benign queries, significantly diminishing their utility in sensitive scenarios. To address this challenge, we introduce FalseReject, a comprehensive resource containing 16k seemingly toxic queries accompanied by structured responses across 44 safety-related categories. 
We propose a graph-informed adversarial multi-agent Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:f0a09d21d360e11c","title":"Exposing privacy gaps: Membership inference attack on preference data for LLM alignment","url":"https://www.amazon.science/publications/exposing-privacy-gaps-membership-inference-attack-on-preference-data-for-llm-alignment","published":"2025","authors":["Qizhang Feng","Siva Rajesh Kasa","Santhosh Kasa","Hyokun Yun","Choon Hui Teo","Sravan Bodapati"],"abstract":"Large Language Models (LLMs) have seen widespread adoption due to their remarkable natural language capabilities. However, when deploying them in real-world settings, it is important to align LLMs to generate texts according to acceptable human standards. 
Methods such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) have enabled significant progress in refining LLMs using human Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=21"}},{"id":"official:f490f84567b4611e","title":"EcomScriptBench: A multi-task benchmark for e-commerce script planning via step-wise intention-driven product association","url":"https://www.amazon.science/publications/ecomscriptbench-a-multi-task-benchmark-for-e-commerce-script-planning-via-step-wise-intention-driven-product-association","published":"2025","authors":["Weiqi Wang","Limeng Cui","Xin Liu","Sreyashi Nag","Wenju Xu","Chen Luo","Sheikh Muhammad Sarwar","Laurence (Yang) Li","Hansu Gu","Hui Liu","Changlong Yu","Jiaxin Bai"],"abstract":"Goal-oriented script planning, or the ability to plan coherent sequences of actions toward specific goals, is commonly used by humans to plan for daily activities. In e-commerce, customers increasingly seek LLM-based assistants to plan for them with a script and recommend products at each step, thereby facilitating convenient and efficient shopping experiences. 
However, this capability remains under-explored Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"baidu-ernie:official:4035175a72c03cfa","title":"ERNIE 4.5 Technical Report","url":"https://ernie.baidu.com/blog/publication/ERNIE_Technical_Report.pdf","published":"2025","authors":["ERNIE Team","Baidu"],"abstract":"Official ERNIE/Baidu publication page entry.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["ERNIE","Baidu","technical report"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ERNIE publication page https://ernie.baidu.com/blog/publication/"}},{"id":"official:2e19206aff1bc4d6","title":"Do LLMs recognize your preferences? Evaluating personalized preference following in LLMs","url":"https://www.amazon.science/publications/do-llms-recognize-your-preferences-evaluating-personalized-preference-following-in-llms","published":"2025","authors":["Siyan Zhao","Mingyi Hong","Yang Liu","Devamanyu Hazarika","Kaixiang Lin"],"abstract":"Large Language Models (LLMs) are increasingly used as chatbots, yet their ability to personalize responses to user preferences remains limited. 
We introduce PREFEVAL, a benchmark for evaluating LLMs’ ability to infer, memorize and adhere to user preferences in a long-context conversational setting. PREFEVAL comprises 3,000 manually curated user preference and query pairs spanning 20 topics. PREFEVAL contains Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","personalized","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=30"}},{"id":"official:950dd9a6c80fce1f","title":"DIVERSED: Relaxed speculative decoding via dynamic ensemble verification","url":"https://www.amazon.science/publications/diversed-relaxed-speculative-decoding-via-dynamic-ensemble-verification","published":"2025","authors":["Ziyi Wang","Siva Rajesh Kasa","Ankith M S","Santhosh Kasa","Jiaru Zou","Nan Jiang","Sumit Negi","Ruqi Zhang","Qifan Song"],"abstract":"Speculative decoding is an effective technique for accelerating large language model (LLM) inference by drafting multiple tokens in parallel. However, its practical speedup is often limited by a rigid verification step, which strictly enforces that the accepted token distribution exactly matches that of the target model. 
This constraint leads to the rejection of many plausible tokens, reducing the acceptance Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:487b0a8c6d102bd2","title":"Contextual ASR with retrieval augmented large language model","url":"https://www.amazon.science/publications/contextual-asr-with-retrieval-augmented-large-language-model","published":"2025","authors":["Cihan Xiao","Zejiang Hou","Daniel Garcia-Romero","Kyu Han"],"abstract":"Automatic speech recognition (ASR) systems can benefit from incorporating contextual information to improve recognition accuracy, especially for uncommon words or phrases. Current approaches like custom vocabularies or prompting with previous transcript segments provide limited contextual control. 
Compared to existing context biasing methods, RAG promises more flexible and scalable contextual control by Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:385b9653da90bf1c","title":"Context length alone hurts LLM performance despite perfect retrieval","url":"https://www.amazon.science/publications/context-length-alone-hurts-llm-performance-despite-perfect-retrieval","published":"2025","authors":["Yufeng Du","Minyang Tian","Srikanth Ronanki","Subendhu Rongali","Sravan Bodapati","Aram Galstyan","Azton Wells","Roy Schwartz","Eliu A Huerta","Hao Peng"],"abstract":"Large language models (LLMs) often fail to scale their performance on long-context tasks performance in line with the context lengths they support. This gap is commonly attributed to retrieval failures—the models' inability to identify relevant information in the long inputs. 
Accordingly, recent efforts often focus on evaluating and improving LLMs' retrieval performance: if retrieval is perfect, a model Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=12"}},{"id":"official:cb5cc163190993b3","title":"Constrained decoding with speculative lookaheads","url":"https://www.amazon.science/publications/constrained-decoding-with-speculative-lookaheads","published":"2025","authors":["Nishanth Sridhar Nakshatri","Shamik Roy","Rajarshi (Raj) Das","Un Ch","Leo Boytsov","Rashmi Gangadharaiah"],"abstract":"Constrained decoding with lookahead heuristics (CDLH) is a highly effective method for aligning LLM generations to human preferences. However, the extensive lookahead rollout operations for each generated token makes CDLH prohibitively expensive, resulting in low adoption in practice. 
In contrast, common decoding strategies such as greedy decoding are extremely efficient, but achieve very low constraint Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:2eb7d48110442b3d","title":"Compress, gather, and recompute: REFORMing long-context processing in transformers","url":"https://www.amazon.science/publications/compress-gather-and-recompute-reforming-long-context-processing-in-transformers","published":"2025","authors":["Woomin Song","Sai Muralidhar Jayanthi","Srikanth Ronanki","Kanthashree Mysore Sathyendra","Jinwoo Shin","Aram Galstyan","Shubham Katiyar","Sravan Babu Bodapati"],"abstract":"As large language models increasingly gain popularity in real-world applications, processing extremely long contexts, often exceeding the model’s pre-trained context limits, has emerged as a critical challenge. 
While existing approaches to efficient long-context processing show promise, recurrent compression-based methods struggle with information preservation, whereas random access approaches require substantial Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","efficient","compression"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:0cc2c9f8f91d4a0e","title":"CoLLM: A large language model for composed image retrieval","url":"https://www.amazon.science/publications/collm-a-large-language-model-for-composed-image-retrieval","published":"2025","authors":["Chuong Huynh","Jinyu Yang","Ashish Tawari","Mubarak Shah","Son Tran","Raffay Hamid","Trishul Chilimbi","Abhinav Shrivastava"],"abstract":"Composed Image Retrieval (CIR) is a complex task that aims to retrieve images based on a multimodal query. Typical training data consists of triplets containing a reference image, a textual description of desired modifications, and the target image, which are expensive and time-consuming to acquire. 
The scarcity of CIR datasets has led to zero-shot approaches utilizing synthetic triplets or leveraging vision-language Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer vision","language model","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:89a8fc9ac6122d1e","title":"CACHE-ED: Redefining document entity extraction with graph-based templates, actor-critic agents & HIL","url":"https://www.amazon.science/publications/cache-ed-redefining-document-entity-extraction-with-graph-based-templates-actor-critic-agents-hil","published":"2025","authors":["Sudhanshu Bhoi","Harish Y V S"],"abstract":"In this paper, we present CACHE-ED, a novel framework for document entity extraction that combines the power of large language models (LLMs) with graph-based document representations, caching mechanisms, and an actor-critic multi-agent architecture. 
Our approach addresses the inefficiencies and inaccuracies that are common in extracting structured information from documents, particularly in templated formats Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:53014a29fa4a000b","title":"Building multi-turn RAG for customer support with LLM labeling","url":"https://www.amazon.science/publications/building-multi-turn-rag-for-customer-support-with-llm-labeling","published":"2025","authors":["Zhiyu Chen","Biancen Xie","Sidarth Srinivasan","Qun Liu","Manikandarajan Ramanathan","Raj Maragoud"],"abstract":"Customer service in e-commerce often relies on human agents to handle inquiries related to orders, returns, and product information. While this approach is effective, it can be expensive and difficult to scale during periods of high demand. 
Recent advances in intelligent chatbots, particularly those based on Retrieval Augmented Generation (RAG) models, have significantly improved customer service efficiency Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:ea931764423b7c82","title":"AttributeForge: An agentic LLM framework for automated product schema modeling","url":"https://www.amazon.science/publications/attributeforge-an-agentic-llm-framework-for-automated-product-schema-modeling","published":"2025","authors":["Yunhan Huang","Klevis Ramo","Andrea Iovine","Melvin Monteiro","Sedat Gokalp","Arjun Bakshi","Hasan Turalic","Arsh Kumar","Jona Neumeier","Ripley Yates","Rejaul Monir","Simon Hartmann"],"abstract":"Effective product schema modeling is fundamental to e-commerce success, enabling accurate product discovery and superior customer experience. However, traditional manual schema modeling processes are severely bottlenecked, producing only tens of attributes per month, which is insufficient for modern e-commerce platforms managing thousands of product types. 
This paper introduces AttributeForge, the first Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Search and information retrieval","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:fe680594f89a9ab0","title":"Aligning to constraints for data-efficient language model customization","url":"https://www.amazon.science/publications/aligning-to-constraints-for-data-efficient-language-model-customization","published":"2025","authors":["Fei Wang","Chao Shang","Shuai Wang","Sarthak Jain","Qiang Ning","Bonan Min","Yassine Benajiba","Vittorio Castelli","Dan Roth"],"abstract":"General-purpose language models (LMs) are aligned to diverse user intents, but fall short when it comes to specific applications. While finetuning is the default method for customized alignment, human annotations are often unavailable in various customization scenarios. 
Based on the observation that one of the main issues of LM customization is constraint adherence, we investigate the feasibility of using Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","language model","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:fb7a5327d31b35f0","title":"Active evaluation acquisition for efficient LLM benchmarking","url":"https://www.amazon.science/publications/active-evaluation-acquisition-for-efficient-llm-benchmarking","published":"2025","authors":["Yang Li","JIE MA","Miguel Ballesteros","Yassine Benajiba","Graham Horwood"],"abstract":"As large language models (LLMs) become increasingly versatile, numerous large scale benchmarks have been developed to thoroughly assess their capabilities. These benchmarks typically consist of diverse datasets and prompts to evaluate different aspects of LLM performance. 
However, comprehensive evaluations on hundreds or thousands of prompts incur tremendous costs in terms of computation, money, and time Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=21"}},{"id":"official:1210ee9899014878","title":"A tri-agent framework for evaluating and aligning question clarification capabilities of large language models","url":"https://www.amazon.science/publications/a-tri-agent-framework-for-evaluating-and-aligning-question-clarification-capabilities-of-large-language-models","published":"2025","authors":["Yikai Zhao"],"abstract":"Large Language Models (LLMs) are increasingly deployed in interactive systems where understanding user intent precisely is paramount. A key capability for such systems is effective question clarification, especially when user queries are ambiguous or underspecified. This paper introduces a novel tri-agent framework for the robust evaluation of an LLM’s ability to engage in clarifying dialogue. 
Our framework Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:fa41b8155718a7a8","title":"Scalable, validated code translation of entire projects using large language models","url":"https://www.amazon.science/publications/scalable-validated-code-translation-of-entire-projects-using-large-language-models","published":"2025","authors":["Hanliang Zhang","Cristina David","Meng Wang","Brandon Paulsen","Daniel Kroening"],"abstract":"Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. 
We overcome this limitation by developing a modular approach to translation, where we partition the code into Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3729315","openalex_id":"https://openalex.org/W4411267085","cited_by_count":7,"quality_score":63,"matched_keywords":["Automated reasoning"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University","University of Bristol"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=27"}},{"id":"official:bc7abf137132b354","title":"Enhancing e-commerce representation learning via hypergraph contrastive learning and interpretable LLM-driven analysis","url":"https://www.amazon.science/publications/enhancing-e-commerce-representation-learning-via-hypergraph-contrastive-learning-and-interpretable-llm-driven-analysis","published":"2025","authors":["Yiyue Qian","Shinan Zhang","Philip Chen","Diego Socolinsky","Negin Sokhandan","Song Cui","De Chen","Suchitra Sathyanarayana"],"abstract":"E-commerce has experienced significant growth recently, generating vast amounts of data on user preferences, interactions, and purchase patterns. Effectively modeling and representing users and products in these online ecosystems is crucial for various applications. 
However, existing approaches for e-commerce representation learning face several limitations: (i) they primarily consider user behavior patterns Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3701716.3717579","openalex_id":"https://openalex.org/W4410636571","cited_by_count":3,"quality_score":63,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:50d426d8f7f42f8e","title":"Zero-shot 3D question answering via voxel-based dynamic token compression","url":"https://www.amazon.science/publications/zero-shot-3d-question-answering-via-voxel-based-dynamic-token-compression","published":"2025","authors":["Hsiang-Wei Huang","Fu-Chen Chen","Wenhao Chai","Che-Chun Su","Lu Xia","Sanghun Jung","Cheng-Yen Yang","Jenq-Neng Hwang","Min Sun","Cheng-Hao Kuo"],"abstract":"Recent advancements in 3D Large Multi-modal Models (3D-LMMs) have driven significant progress in 3D question answering. However, recent multi-frame VisionLanguage Models (VLMs) demonstrate superior performance compared to 3D-LMMs on 3D question answering tasks, largely due to the greater scale and diversity of available 2D image data in contrast to the more limited 3D data. 
Multi-frame VLMs, although achieving Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","compression"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=27"}},{"id":"official:f7a529458d0aa410","title":"Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency","url":"https://www.amazon.science/publications/zero-knowledge-llm-hallucination-detection-and-mitigation-through-fine-grained-cross-model-consistency","published":"2025","authors":["Aman Goel","Daniel Schwartz","Yanjun (Jane) Qi"],"abstract":"Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, but they remain susceptible to hallucinations— generating content that appears plausible but contains factual inaccuracies. 
We present FINCH-ZK, a black-box framework that leverages FINe-grained Cross-model consistency to detect and mitigate Hallucinations in LLM outputs without requiring external knowledge sources Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.emnlp-industry.139","openalex_id":"https://openalex.org/W4416037193","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:ae5973f7a68df280","title":"What do LLMs understand about international trade? Introducing TradeGov dataset for international trade Q&A evaluation","url":"https://www.amazon.science/publications/what-do-llms-understand-about-international-trade-introducing-tradegov-dataset-for-international-trade-q-a-evaluation","published":"2025","authors":["Kriti Mahajan"],"abstract":"Given the constant flux in the world of geopolitics, staying up to date and compliant with international trade issues is challenging. But exploring if LLMs can aid this task is a frontier hither to unexplored in the LLM evaluation literature - primarily due to the lack of a dataset set for benchmarking the capabilities of LLMs on questions regarding international trade subjects. 
To address this gap, we Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:cae45bb366ccb8d3","title":"Unlocking efficient, scalable, and continual knowledge editing with basis-level representation fine-tuning","url":"https://www.amazon.science/publications/unlocking-efficient-scalable-and-continual-knowledge-editing-with-basis-level-representation-fine-tuning","published":"2025","authors":["Tianci Liu","Ruirui Li","Yunzhe Qi","Hui Liu","Xianfeng Tang","Tianqi Zheng","Qingyu Yin","Monica Cheng","Luke Huan","Haoyu Wang","Jing Gao"],"abstract":"Large language models (LLMs) have achieved remarkable performance on various natural language tasks. However, they are trained on static corpora and their knowledge can become outdated quickly in the fast-changing world. This motivates the development of knowledge editing methods designed to update certain knowledge in LLMs without changing unrelated others. 
To make selective edits, previous efforts often Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:7bdde99bd60afa05","title":"Universal semantic disentangled privacy-preserving speech representation learning","url":"https://www.amazon.science/publications/universal-semantic-disentangled-privacy-preserving-speech-representation-learning","published":"2025","authors":["Biel Tura Vecino","Subhadeep Maji","Aravind Varier","Antonio Bonafonte","Ivan Valles","Michael Owen","Costas Papayiannis","Leif Rādel","Grant Strimel","Oluwaseyi Feyisetan","Roberto Barra-Chicote","Ariya Rastrow"],"abstract":"The use of human speech to train LLMs poses privacy concerns due to these models’ ability to generate samples that closely resemble artifacts in the training data. 
We propose a speaker privacy-preserving representation learning method through the Universal Speech Codec (USC), a computationally efficient codec that disentangles speech into: (i) privacy-preserving semantically rich representations, capturing Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=25"}},{"id":"official:3754578364470fed","title":"Understanding the limitations of medical reasoning in large language models","url":"https://www.amazon.science/publications/understanding-the-limitations-of-medical-reasoning-in-large-language-models","published":"2025","authors":["Bill Cai","Xiaogang Wang","Ujjwal Ratan","Yash Shah"],"abstract":"Large language models demonstrate impressive performance on standardized healthcare benchmarks, yet their deployment readiness for real-world environments remains poorly understood. Current medical benchmarks present idealized scenarios that misrepresent the complexity of actual clinical data. 
We systematically evaluate LLM robustness by introducing clinician-validated perturbations to MedQA that mirror Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:419eef7f3af422fb","title":"Understanding and improving information preservation in prompt compression for LLMs","url":"https://www.amazon.science/publications/understanding-and-improving-information-preservation-in-prompt-compression-for-llms","published":"2025","authors":["Weronika Łajewska","Momchil Hardalov","Laura Aina","Neha Anna John","Hang Su","Lluís Marquez"],"abstract":"Recent advancements in large language models (LLMs) have enabled their successful application to a broad range of tasks. However, in information-intensive tasks, the prompt length can grow fast, leading to increased computational requirements, performance degradation, and induced biases from irrelevant or redundant information. 
Recently, various prompt compression techniques have been introduced to optimize Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","compression"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:96eabfe6e40e71dd","title":"UTFix: Change aware unit test repairing using LLM","url":"https://www.amazon.science/publications/utfix-change-aware-unit-test-repairing-using-llm","published":"2025","authors":["Shanto Rahman","Sachit Kuhar","Berk Cirisci","Pranav Garg","Shiqi Wang","Xiaofei Ma","Anoop Deoras","Baishakhi Ray"],"abstract":"Software updates, including bug repair and feature additions, are frequent in modern applications but they often leave test suites outdated, resulting in undetected bugs and increased chances of system failures. A recent study by Meta revealed that 14%-22% of software failures stem from outdated tests that fail to reflect changes in the codebase. 
This highlights the need to keep tests in sync with code Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:73e02f04c4dab485","title":"Tuning-free personalized alignment via trial-error-explain in-context learning","url":"https://www.amazon.science/publications/tuning-free-personalized-alignment-via-trial-error-explain-in-context-learning","published":"2025","authors":["Hyundong Cho","Karishma Sharma","Nicolaas Jedema","Leonardo Ribeiro","Alessandro Moschitti","Ravi Krishnan","Jonathan May"],"abstract":"Language models are aligned to the collective voice of many, resulting in generic out-puts that do not align with specific users’ styles. In this work, we present Trial-Error-Explain In-Context Learning (TICL), a tuning-free method that personalizes language models for text generation tasks with fewer than 10 examples per user. 
TICL iteratively expands an in-context learning prompt via a trial-error-explain Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","personalized"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:b6a973f9e5f19dcd","title":"Trustworthiness-as-reward: Improving LLM performance on text classification through reinforcement learning","url":"https://www.amazon.science/publications/trustworthiness-as-reward-improving-llm-performance-on-text-classification-through-reinforcement-learning","published":"2025","authors":["Yiqing Zhao","Xiaohui Shen","Lanfeng Pan"],"abstract":"Text classification has become increasingly important with the exponential growth of digital text data, finding applications in sentiment analysis, spam detection, topic categorization, and content moderation across various domains. Our research introduced a novel approach that integrates reinforcement learning with a specialized reasoning path. 
This methodology enabled smaller 7B parameter language models Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:f00f79111f570eda","title":"Transforming expert insight into scalable ai assessment: A framework for LLM-generated metrics and user-calibrated evaluation","url":"https://www.amazon.science/publications/transforming-expert-insight-into-scalable-ai-assessment-a-framework-for-llm-generated-metrics-and-user-calibrated-evaluation","published":"2025","authors":["Nicholas Choma","Sreecharan Sankaranarayanan","Rajesh Cherukuri"],"abstract":"Effectively assessing AI systems, particularly those operating in specialized domains or producing dynamic outputs, requires translating nuanced human expertise into scalable, quantitative measures. Traditional metrics often fall short in capturing qualitative requirements that domain experts intuitively grasp. 
This paper presents a novel framework that systematically transforms qualitative expert feedback Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=17"}},{"id":"official:8d443aeee904f2f9","title":"Train a unified multimodal data quality classifier with synthetic data","url":"https://www.amazon.science/publications/train-a-unified-multimodal-data-quality-classifier-with-synthetic-data","published":"2025","authors":["Weizhi Wang","Rongmei Lin","Shiyang Li","Colin Lockard","Ritesh Sarkhel","Sanket Lokegaonkar","Jingbo Shang","Xifeng Yan","Nasser Zalmout","Xian Li"],"abstract":"The Multimodal Large Language Models (MLLMs) are continually pre-trained on a mixture of image-text caption data and interleaved document data, while the high-quality data filtering towards image-text interleaved document data is under-explored. 
We propose to train an efficient MLLM as a Unified Mulitmodal Data Quality Classifier to Filter both high-quality image-text caption and interleaved data (UniFilter Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:ae0cbd19ff39c6ad","title":"Towards internet-scale training for agents","url":"https://www.amazon.science/publications/towards-internet-scale-training-for-agents","published":"2025","authors":["Brandon Trabucco","Gunnar Sigurdsson","Robinson Piramuthu","Ruslan Salakhutdinov"],"abstract":"The predominant approach for training web navigation agents gathers human demonstrations for a set of popular websites and hand-written tasks, but it is becoming clear that human data is an inefficient resource. We develop a pipeline to facilitate internet-scale training for agents without laborious human annotations. In the first stage, an LLM generates tasks for 150k diverse websites. 
In the next stage Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:31cec280278a07ef","title":"Tournament of prompts: Evolving LLM instructions through structured debates and Elo ratings","url":"https://www.amazon.science/publications/tournament-of-prompts-evolving-llm-instructions-through-structured-debates-and-elo-ratings","published":"2025","authors":["Anirudh Nair","Adi Banerjee","Laurent Mombaerts","Matthew Hagen","Tarik Borogovac"],"abstract":"Prompt engineering represents a critical bottleneck to harness the full potential of Large Language Models (LLMs) for solving complex tasks, as it requires specialized expertise, significant trial-and-error, and manual intervention. 
This challenge is particularly pronounced for tasks involving subjective quality assessment, where defining explicit optimization objectives becomes fundamentally problematic Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:7caf7c6ce235e9e9","title":"To answer or not to answer (TAONA): A robust textual graph understanding and question answering approach","url":"https://www.amazon.science/publications/to-answer-or-not-to-answer-taona-a-robust-textual-graph-understanding-and-question-answering-approach","published":"2025","authors":["Yuchen Yan","Aakash Kolekar","Sahika Genc","Wenju Xu","Edward W Huang","Anirudh Srinivasan","Mukesh Jain","Qi He","Hanghang Tong"],"abstract":"Recently, textual graph-based retrieval-augmented generation (GraphRAG) has gained popularity for addressing hallucinations in large language models when answering domain-specific questions. Most existing studies assume that generated answers should comprehensively integrate all relevant information from the textual graph. 
However, this assumption may not always hold when certain information needs to be Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:92815e4ea18f3fdb","title":"Text2Outfit: Controllable outfit generation with multimodal language models","url":"https://www.amazon.science/publications/text2outfit-controllable-outfit-generation-with-multimodal-language-models","published":"2025","authors":["Yuanhao Zhai","Yen-Liang Lin","Minxu Peng","Larry Davis","Ashwin Chandramouli","Junsong Yuan","David Doermann"],"abstract":"Existing outfit recommendation frameworks focus on outfit compatibility prediction and complementary item retrieval. We present a text-driven outfit generation framework, Text2Outfit, which generates outfits controlled by text prompts. 
Our framework supports two forms of outfit recommendation: 1) Text-to-outfit generation, where the prompt includes the specification for each outfit item (e.g., product features Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=12"}},{"id":"official:8b2b254ebc37c8ca","title":"TelcoAI: Advancing 3GPP technical specification search through agentic multi-modal retrieval-augmented generation","url":"https://www.amazon.science/publications/telcoai-advancing-3gpp-technical-specification-search-through-agentic-multi-modal-retrieval-augmented-generation","published":"2025","authors":["Rahul Ghosh","Chun-Hao Liu","Gaurav Rele","Vidya Sagar Ravipati"],"abstract":"The 3rd Generation Partnership Project (3GPP) produces complex technical specifications essential to global telecommunications, yet their hierarchical structure, dense formatting, and multi-modal content make them difficult to process. While Large Language Models (LLMs) show promise, existing approaches fall short in handling complex queries, visual information, and document interdependencies. 
We present Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:36164420568e52d5","title":"Speech retrieval-augmented generation without automatic speech recognition","url":"https://www.amazon.science/publications/speech-retrieval-augmented-generation-without-automatic-speech-recognition","published":"2025","authors":["Do June Min","Karel Mundnich","Andy Lapastora","Erfan Soltanmohammadi","Srikanth Ronanki","Kyu Han"],"abstract":"One common approach for question answering over speech data is to first transcribe speech using automatic speech recognition (ASR) and then employ text-based retrieval-augmented generation (RAG) on the transcriptions. While this cascaded pipeline has proven effective in many practical settings, ASR errors can propagate to the retrieval and generation steps. 
To overcome this limitation, we introduce SpeechRAG Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:a0f7593e7e5e6655","title":"Speech recognition rescoring with large speech-text foundation models","url":"https://www.amazon.science/publications/speech-recognition-rescoring-with-large-speech-text-foundation-models","published":"2025","authors":["Prashanth Gurunath Shivakumar","Jari Kolehmainen","Aditya Gourav","Yi Gu","Ankur Gandhe","Ariya Rastrow","Ivan Bulyko"],"abstract":"Large language models (LLM) have demonstrated the ability to understand human language by leveraging large amount of text data. Automatic speech recognition (ASR) systems are often limited by available transcribed speech data and benefit from a second pass rescoring using LLM. 
Recently multi-modal large language models, particularly speech and text foundational models have demonstrated strong spoken language Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp49660.2025.10890616","openalex_id":"https://openalex.org/W4408354385","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=35"}},{"id":"official:1efa56aa82502d28","title":"SimRAG: Self-improving retrieval-augmented generation for adapting large language models to specialized domains","url":"https://www.amazon.science/publications/simrag-self-improving-retrieval-augmented-generation-for-adapting-large-language-models-to-specialized-domains","published":"2025","authors":["Ran Xu","Hui Liu","Sreyashi Nag","Zhenwei Dai","Yaochen Xie","Xianfeng Tang","Chen Luo","Laurence (Yang) Li","Joyce C. Ho","Carl Yang","Qi He"],"abstract":"Retrieval-augmented generation (RAG) enhances the question answering (QA) abilities of large language models (LLMs) by integrating external knowledge. However, adapting general-purpose RAG systems to specialized fields such as science and medicine poses unique challenges due to distribution shifts and limited access to domain-specific data. 
To tackle this, we propose SimRAG, a self-training approach that Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:d6515b1ea7d52848","title":"SeRA: Self-reviewing and alignment of LLMs using implicit reward margins","url":"https://www.amazon.science/publications/sera-self-reviewing-and-alignment-of-llms-using-implicit-reward-margins","published":"2025","authors":["Jongwoo Ko","Saket Dingliwal","Bhavana Ganesh","Sailik Sengupta","Sravan Bodapati","Aram Galstyan"],"abstract":"Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives for Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the preferences used in DAAs are usually collected before the alignment training begins and remain unchanged (off-policy). 
This design leads to two problems where the policy model Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=30"}},{"id":"official:84cc0f73d04bde97","title":"Scaling laws for predicting downstream performance in LLMs","url":"https://www.amazon.science/publications/scaling-laws-for-predicting-downstream-performance-in-llms","published":"2025","authors":["Yangyi Chen","Binxuan Huang","Yifan Gao","Zhengyang Wang","Jingfeng Yang","Heng Ji"],"abstract":"Precise estimation of downstream performance in large language models (LLMs) prior to training is essential for guiding their development process. Scaling laws analysis utilizes the statistics of a series of significantly smaller sampling language models (LMs) to predict the performance of the target LLM. 
For downstream performance prediction, the critical challenge lies in the emergent abilities in LLMs Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=26"}},{"id":"official:8a71ead914daf310","title":"STED and consistency scoring: A framework for evaluating LLM structured output reliability","url":"https://www.amazon.science/publications/sted-and-consistency-scoring-a-framework-for-evaluating-llm-structured-output-reliability","published":"2025","authors":["Gordon Wang","Jinze Yu","Xing Zhang","Dayuan Jiang","Yin Song","Tomal Deb","Xuefeng Liu","Peiyang He"],"abstract":"Large Language Models (LLMs) are increasingly deployed for structured data generation, yet output consistency remains critical for production applications. We introduce a comprehensive framework for evaluating and improving consistency in LLM-generated structured outputs. 
Our approach combines: (1) STED (Semantic Tree Edit Distance), a novel similarity metric balancing semantic flexibility with structural Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:ea09c6a6c642134b","title":"SEEval: Advancing LLM text evaluation efficiency and accuracy through self-explanation prompting","url":"https://www.amazon.science/publications/seeval-advancing-llm-text-evaluation-efficiency-and-accuracy-through-self-explanation-prompting","published":"2025","authors":["Gregory Wu","Md Mosharaf Hossain","Tess Wood","Si-Chi Chin","Shayan Ali Akbar","Erwin Cornejo"],"abstract":"Large language models (LLMs) have achieved remarkable success in various natural language generation (NLG) tasks, but their performance in automatic text evaluation is not yet ready as human replacements. In this paper, we propose SEEval (Self-Explanation in Evaluation), a novel prompt-based text evaluator. 
Inspired by educational psychology, SEEval incorporates self-explanation, a metacognitive strategy Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:e8aa9e76fecf17f0","title":"SABER: Small actions, big errors — Safe-guarding mutating steps in LLM agents","url":"https://www.amazon.science/publications/saber-small-actions-big-errors-safe-guarding-mutating-steps-in-llm-agents","published":"2025","authors":["Alex Cuadron Lafuente","Pengfei Yu","Yang Liu","Arpit Gupta"],"abstract":"Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: do all actions contribute equally to failure? Analyzing execution traces on τ-Bench (Airline/Retail) and SWE-Bench Verified, we decompose trajectories into mutating (environment-changing) vs. 
non-mutating steps and formalize decisive deviations—earliest Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:8237f438d3a935be","title":"REIC: RAG-enhanced intent classification at scale","url":"https://www.amazon.science/publications/reic-rag-enhanced-intent-classification-at-scale","published":"2025","authors":["Ziji Zhang","Michael Yang","Zhiyu Chen","Yingying Zhuang","Shu-Ting Pi","Qun Liu","Raj Maragoud","Vy Nguyen","Anurag Beniwal"],"abstract":"Accurate intent classification is critical for efficient routing in customer service, ensuring customers are connected with the most suitable agents while reducing handling times and operational costs. However, as companies expand their product lines, intent classification faces scalability challenges due to the increasing number of intents and variations in taxonomy across different verticals. 
In this Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:71f9af4d78137513","title":"Quantifying fairness in LLMs beyond tokens: A semantic and statistical perspective","url":"https://www.amazon.science/publications/quantifying-fairness-in-llms-beyond-tokens-a-semantic-and-statistical-perspective","published":"2025","authors":["Weijie Xu","Yiwen Wang","Chi Xue","Xiangkun Hu","Xi Fang","Guimin Dong","Chandan Reddy"],"abstract":"Large Language Models (LLMs) often generate responses with inherent biases, undermining their reliability in real-world applications. Existing evaluation methods often overlook biases in long-form responses and the intrinsic variability of LLM outputs. 
To address these challenges, we propose FiSCo (Fine-grained Semantic Comparison), a novel statistical framework to evaluate group-level fairness in LLMs Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:a113c050175b1228","title":"QA-Calibration of language model confidence scores","url":"https://www.amazon.science/publications/qa-calibration-of-language-model-confidence-scores","published":"2025","authors":["Atalanti Mastakouri","Elke Kirschbaum","Shiva Kasiviswanathan","Aaditya Ramdas"],"abstract":"To use generative question-and-answering (QA) systems for decision-making and in any critical application, these systems need to provide well-calibrated confidence scores that reflect the correctness of their answers. Existing calibration methods aim to ensure that the confidence score is on average indicative of the likelihood that the answer is correct. 
We argue, however, that this standard (average-case Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:8093d6ed93fb1420","title":"Proposer-agent-evaluator (PAE): Autonomous skill discovery for foundation model internet agents","url":"https://www.amazon.science/publications/proposer-agent-evaluator-pae-autonomous-skill-discovery-for-foundation-model-internet-agents","published":"2025","authors":["Yifei Zhou","Qianlan Yang","Kaixiang Lin","Min Bai","Xiong Zhou","Yu-Xiong Wang","Sergey Levine","Erran Li"],"abstract":"A generalist foundation model agent needs to have a large and diverse skill repertoire, such as finding directions between two travel locations and buying specific items from the Internet. If each skill needs to be specified manually through a fixed set of human-annotated instructions, the agent’s skill repertoire will necessarily be limited due to the scalability of human-annotated instructions. 
In this Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:1a4d64fafbc635eb","title":"PipeRAG: Fast retrieval-augmented generation via adaptive pipeline parallelism","url":"https://www.amazon.science/publications/piperag-fast-retrieval-augmented-generation-via-adaptive-pipeline-parallelism","published":"2025","authors":["Wenqi Jiang","Shuai Zhang","Boran Han","Jie Wang","Yuyang (Bernie) Wang","Tim Kraska"],"abstract":"Retrieval-augmented generation (RAG) can enhance the generation quality of large language models (LLMs) by incorporating external token databases. However, retrievals from large databases can constitute a substantial portion of the overall generation time, particularly when retrievals are periodically performed to align the retrieved content with the latest states of generation. 
In this paper, we introduce Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:53291d33bc873804","title":"PipeFill: Using GPUs during bubbles in pipeline-parallel LLM training","url":"https://www.amazon.science/publications/pipefill-using-gpus-during-bubbles-in-pipeline-parallel-llm-training","published":"2025","authors":["Daiyaan Arfeen","Zhen Zhang","Xinwei Fu","Gregory R. Ganger","Yida Wang"],"abstract":"Training Deep Neural Networks (DNNs) with billions of parameters generally involves pipeline-parallel (PP) execution. Unfortunately, PP model training can use GPUs inefficiently, especially at large scale, due to idle GPU time caused by pipeline bubbles, which are often 15–30% and can exceed 60% of the training job’s GPU allocation. 
To improve the GPU utilization of PP model training, this paper describes Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=27"}},{"id":"official:30283246e385dd88","title":"PARSE: LLM driven schema optimization for reliable entity extraction","url":"https://www.amazon.science/publications/parse-llm-driven-schema-optimization-for-reliable-entity-extraction","published":"2025","authors":["Anubhav Shrimal","Aryan Jain","Soumyajit Chowdhury","Promod Yenigalla"],"abstract":"Structured information extraction from unstructured text is critical for emerging Software 3.0 systems where LLM agents autonomously interact with APIs and tools. 
Recent approaches apply large language models directly to extraction tasks using existing JSON schemas, often with constraint decoding or reinforcement learning approaches to ensure syntactic validity, but treat JSON schemas as static contracts Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.emnlp-industry.184","openalex_id":"https://openalex.org/W4416037168","cited_by_count":0,"quality_score":60,"matched_keywords":["Information and knowledge management","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:487c7fd3f785f81e","title":"On the analysis and distillation of emergent outlier properties in pre-trained language models","url":"https://www.amazon.science/publications/on-the-analysis-and-distillation-of-emergent-outlier-properties-in-pre-trained-language-models","published":"2025","authors":["Tianyang Zhao","Yash Singh","Srikar Appalaraju","Peng Tang","Ying Nian Wu","Erran Li"],"abstract":"A small subset of dimensions within language Transformers’ representation spaces emerge as \"outliers\" during pretraining, encoding critical knowledge sparsely. We extend previous findings on emergent outliers to Encoder-Decoder Transformers and instruction-finetuned models, and tackle the problem of distilling a student Transformer from a larger teacher Transformer. 
Knowledge distillation reduces model Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:2f33e90c71bfd5f7","title":"On mitigating code LLM hallucinations with API documentation","url":"https://www.amazon.science/publications/on-mitigating-code-llm-hallucinations-with-api-documentation","published":"2025","authors":["Nihal Jain","Rob Kwiatkowski","Baishakhi Ray","Murali Krishna Ramanathan","Varun Kumar"],"abstract":"In this study, we address the issue of API hallucinations in various software engineering contexts. We introduce CloudAPIBench, a new benchmark designed to measure API hallucination occurrences. CloudAPIBench also provides annotations for frequencies of API occurrences in the public domain, allowing us to study API hallucinations at various frequency levels. 
Our findings reveal that Code LLMs struggle with Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:3601fee049f9eaec","title":"Multimodal LLM augmented reasoning for interpretable visual perception analysis","url":"https://www.amazon.science/publications/multimodal-llm-augmented-reasoning-for-interpretable-visual-perception-analysis","published":"2025","authors":["Shravan Chaudhari","Trilokya Akula","Yoon Kim","Tom Blake"],"abstract":"In this paper, we advance the study of AI-augmented reasoning in the context of Human-Computer Interaction (HCI), psychology and cognitive science, focusing on the critical task of visual perception. Specifically, we investigate the applicability of Multimodal Large Language Models (MLLMs) in this domain. 
To this end, we leverage established principles and explanations from psychology and cognitive science Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:8eba4daf6522714d","title":"Multi-lingual multi-turn automated red teaming for LLMs","url":"https://www.amazon.science/publications/multi-lingual-multi-turn-automated-red-teaming-for-llms","published":"2025","authors":["Abhishek Singhania","Christophe Dupuy","Shivam Mangale","Amani Namboori"],"abstract":"Warning: This paper includes content that may be considered inappropriate or offensive to some readers. Viewer discretion is advised. Language Model Models (LLMs) have improved dramatically in the past few years, increasing their adoption and the scope of their capabilities over time. 
A significant amount of work is dedicated to “model alignment”, i.e., preventing LLMs to generate unsafe responses when Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=30"}},{"id":"official:b28d46d76de820f0","title":"Monte Carlo Temperature: A robust sampling strategy for LLM’s uncertainty quantification methods","url":"https://www.amazon.science/publications/monte-carlo-temperature-a-robust-sampling-strategy-for-llms-uncertainty-quantification-methods","published":"2025","authors":["Nicola Cecere","Andrea Bacciu","Ignacio Fernandez Tobias","Amin Mantrach"],"abstract":"Uncertainty quantification (UQ) in Large Language Models (LLMs) is essential for their safe and reliable deployment, particularly in critical applications where incorrect outputs can have serious consequences. Current UQ methods typically rely on querying the model multiple times using non-zero temperature sampling to generate diverse outputs for uncertainty estimation. 
However, the impact of selecting Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=28"}},{"id":"official:f536830d2a8fa4df","title":"Measuring the fairness gap between retrieval and generation in RAG systems using a cognitive complexity framework","url":"https://www.amazon.science/publications/measuring-the-fairness-gap-between-retrieval-and-generation-in-rag-systems-using-a-cognitive-complexity-framework","published":"2025","authors":["Sandeep Avula","Chia-Jung Lee","Rongting Zhang","Vanessa Murdock"],"abstract":"In this paper, we investigate the problem of quantifying fairness in Retrieval-Augmented Generation (RAG) systems, particularly for complex cognitive tasks that go beyond factual question-answering. 
While RAG systems have demonstrated effectiveness in information extraction tasks, their fairness implications for cognitively complex tasks - including ideation, content creation, and analytical reasoning — Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:c7cf41d954e7e0b3","title":"Marconi: Prefix caching for the era of hybrid LLMs","url":"https://www.amazon.science/publications/marconi-prefix-caching-for-the-era-of-hybrid-llms","published":"2025","authors":["Rui Pan","Zhuang Wang","Zhen Jia","Can Karakus","Luca Zancato","Tri Dao","Yida Wang","Ravi Netravali"],"abstract":"Hybrid models that combine the language modeling capabilities of Attention layers with the efficiency of Recurrent layers (e.g., State Space Models) have gained traction in practically supporting long contexts in Large Language Model serving. 
Yet, the unique properties of these models complicate the usage of complementary efficiency optimizations such as prefix caching that skip redundant computations across Category: Cloud and systems","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Cloud and systems","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=27"}},{"id":"official:d7488134ae8c900f","title":"MEMERAG: A multilingual end-to-end meta-evaluation benchmark for retrieval augmented generation","url":"https://www.amazon.science/publications/memerag-a-multilingual-end-to-end-meta-evaluation-benchmark-for-retrieval-augmented-generation","published":"2025","authors":["Andrea Cruz","Jayasimha Talur","Bruno Charron","Dong Liu","Saab Mansour","Marcello Federico"],"abstract":"Automatic evaluation of retrieval augmented generation (RAG) systems relies on fine grained dimensions like faithfulness and relevance, as judged by expert human annotators. Meta-evaluation benchmarks support the development of automatic evaluators that correlate well with human judgement. 
However, existing benchmarks predominantly focus on English or use translated data, which fails to capture cultural Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:f0589fd167ce2085","title":"LongLeader: A comprehensive leaderboard for large language models in long-context scenarios","url":"https://www.amazon.science/publications/longleader-a-comprehensive-leaderboard-for-large-language-models-in-long-context-scenarios","published":"2025","authors":["Pei (Patrick) Chen","Hongye Jin","Cheng-Che Lee","Rulin Shao","Jingfeng Yang","Mingyu Zhao","Zhaoyu Zhang","Qin Lu","Ning Xie","Huasheng Li","Bing Yin","Han Li"],"abstract":"Large Language Models (LLMs), exemplified by Claude and LLama, have exhibited impressive proficiency in tackling a myriad of Natural Language Processing (NLP) tasks. Yet, in pursuit of the ambitious goal of attaining Artificial General Intelligence (AGI), there remains ample room for enhancing LLM capabilities. Chief among these is the pressing need to bolster long-context comprehension. 
Numerous real-world Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:92bc8a6904f3c944","title":"LentEx: Generalizable latent entity extraction via synthetic data and instruction-tuned LLMs","url":"https://www.amazon.science/publications/lentex-generalizable-latent-entity-extraction-via-synthetic-data-and-instruction-tuned-llms","published":"2025","authors":["Umesh Bodhwani","Yuan Ling","Cibi Chakravarthy Senthilkumar","Shujing Dong","Yarong Feng","Hongfei Li","Ayush Goyal"],"abstract":"Latent entity extraction (LEE) tackles the challenge of identifying implicit, contextually inferred entities within free text—an area where traditional entity extraction methods fall short. In this paper, we introduce LentEx, a novel framework for latent entity extraction that leverages synthetic data generation and instruction fine-tuning to optimize smaller, efficient large language models (LLMs). 
Latent Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/ijcnn64981.2025.11228382","openalex_id":"https://openalex.org/W4416251810","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:a0f6c5a8fbe2604f","title":"Learning with less: Knowledge distillation from large language models via unlabeled data","url":"https://www.amazon.science/publications/learning-with-less-knowledge-distillation-from-large-language-models-via-unlabeled-data","published":"2025","authors":["Juanhui Li","Sreyashi Nag","Hui Liu","Xianfeng Tang","Sheikh Sarwar","Limeng Cui","Hansu Gu","Suhang Wang","Qi He","Jiliang Tang"],"abstract":"In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets. However, the large size and high computation demands of LLMs limit their practicality in many applications, especially when further fine-tuning is required. To address these limitations, smaller models are typically preferred for deployment. 
However, their training is Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:b16f12024fcc1d90","title":"LUME: LLM unlearning with multitask evaluations","url":"https://www.amazon.science/publications/lume-llm-unlearning-with-multitask-evaluations","published":"2025","authors":["Anil Ramakrishna","Yixin Wan","Xiaomeng Jin","Kai-Wei Chang","Zhiqi Bu","Bhanu Vinzamuri","Volkan Cevher","Mingyi Hong","Rahul Gupta"],"abstract":"Unlearning aims to remove copyrighted, sensitive, or private content from large language models (LLMs) without a full retraining. In this work, we develop a multi-task unlearning benchmark (LUME) which features three tasks: (1) unlearn synthetically generated creative short novels, (2) unlearn synthetic biographies with sensitive information, and (3) unlearn a collection of public biographies. 
We further Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=25"}},{"id":"official:769d4b6a1070cce9","title":"LLaVA-RE: Binary image-text relevancy evaluation with multimodal large language model","url":"https://www.amazon.science/publications/llava-re-binary-image-text-relevancy-evaluation-with-multimodal-large-language-model","published":"2025","authors":["Tao Sun","Oliver Liu","JinJin Li","Lan Ma"],"abstract":"Multimodal generative AI usually involves generating image or text responses given inputs in another modality. The evaluation of image-text relevancy is essential for measuring response quality or ranking candidate responses. In particular, binary relevancy evaluation, i.e., “Relevant” vs. “Not Relevant”, is a fundamental problem. 
However, this is a challenging task considering that texts have diverse formats Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:ea91e2e0b34f6753","title":"LLMs for customized marketing content generation and evaluation at scale","url":"https://www.amazon.science/publications/llms-for-customized-marketing-content-generation-and-evaluation-at-scale","published":"2025","authors":["Haoran Liu","Amir Tahmasbi","Ehtesham Sam haque","Purak Jain"],"abstract":"Offsite marketing is essential in e-commerce, enabling businesses to reach customers through external platforms and drive traffic to retail websites. However, most current offsite marketing content is overly generic, template-based, and poorly aligned with landing pages, limiting its effectiveness. 
To address these limitations, we propose MarketingFM, a retrieval-augmented marketing content generation system Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=17"}},{"id":"official:537ab0422f263395","title":"LLM-STARS: LLM-enhanced standardization of time-series analysis and relationships in subledgers","url":"https://www.amazon.science/publications/llm-stars-llm-enhanced-standardization-of-time-series-analysis-and-relationships-in-subledgers","published":"2025","authors":["Wei Tang","Daksha Yadav","Sabrina Zhang","Tom Jin"],"abstract":"Financial accounting systems rely heavily on subledgers to track detailed transaction records. However, modern systems often evolve into complex architectures where different components use inconsistent labeling conventions, making it difficult to understand and utilize important relationships within subledger data. 
This paper presents a novel framework LLM-STARS (LLM-Enhanced Standardization of Time-series Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:9d13db593261bff0","title":"HybGrag: Hybrid retrieval-augmented generation on textual and relational knowledge bases","url":"https://www.amazon.science/publications/hybgrag-hybrid-retrieval-augmented-generation-on-textual-and-relational-knowledge-bases","published":"2025","authors":["Jeremy Lee","Qi Zhu","Costas Mavromatis","Zhen Han","Soji Adeshina","Vassilis N. Ioannidis","Huzefa Rangwala","Christos Faloutsos"],"abstract":"Given a semi-structured knowledge base (SKB), where text documents are interconnected by relations, how can we effectively retrieve relevant information to answer user questions? Retrieval-Augmented Generation (RAG) retrieves documents to assist large language models (LLMs) in question answering; while Graph RAG (GRAG) uses structured knowledge bases as its knowledge source. 
However, many questions require Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:9c3c82e27e193a8a","title":"HALLUCANA: Fixing LLM hallucination with a canary lookahead","url":"https://www.amazon.science/publications/hallucana-fixing-llm-hallucination-with-a-canary-lookahead","published":"2025","authors":["Tianyi Li","Erenay Dayanik","Shubhi Tyagi","Andrea Pierleoni"],"abstract":"In this paper, we present HALLUCANA, a canary lookahead to detect and correct factuality hallucinations of Large Language Models (LLMs) in long-form generation. HALLUCANA detects and intervenes as soon as traces of hallucination emerge, during and even before generation. 
To support timely detection, we exploit the internal factuality representation in the LLM hidden space, where we investigate various proxies Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:fbb77925b3b413ef","title":"Generative audio language modeling with continuous-valued tokens and masked next-token prediction","url":"https://www.amazon.science/publications/generative-audio-language-modeling-with-continuous-valued-tokens-and-masked-next-token-prediction","published":"2025","authors":["Shu-wen Yang","Byeonggeun Kim","Kuan Po Huang","Huy Phan","Bo-Ru (Roy) Lu","Harsha Sundar","Shalini Ghosh","Hung-yi Lee","Chieh-Chi Kao","Chao Wang"],"abstract":"Autoregressive next-token prediction with the Transformer decoder has become a de facto standard in large language models (LLMs), achieving remarkable success in Natural Language Processing (NLP) at scale. Extending this paradigm to audio poses unique challenges due to its inherently continuous nature. We research audio generation with a causal language model (LM) without discrete tokens. 
We leverage token-wise Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:04fa36718f575522","title":"GT2Vec: Large language models for knowledge graph augmented text embedding","url":"https://www.amazon.science/publications/gt2vec-large-language-models-for-knowledge-graph-augmented-text-embedding","published":"2025","authors":["Jiacheng Lin","Kun Qian","Haoyu Han","Nurendra Choudhary","Tianxin Wei","Zhongruo Wang","Sahika Genc","Edward W Huang","sheng wang","Karthik Subbian","Danai Koutra"],"abstract":"Graph-structured information offers rich contextual information that can enhance language models by providing structured relationships and hierarchies, leading to more expressive embeddings for various applications such as retrieval, question answering, and classification. 
However, existing methods for integrating graph and text embeddings, often based on Multi-layer Perceptrons (MLPs) or shallow transformers Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:fe0b51c93119b2d9","title":"GRIL: Knowledge graph retrieval-integrated learning with large language models","url":"https://www.amazon.science/publications/gril-knowledge-graph-retrieval-integrated-learning-with-large-language-models","published":"2025","authors":["Jialin Chen","Houyu Zhang","Seongjun Yun","Alejandro Mottini","Rex Ying","Xiang Song","Vassilis N. Ioannidis","Zheng Li","Qingjun Cui"],"abstract":"Retrieval-Augmented Generation (RAG) has significantly mitigated the hallucinations of Large Language Models (LLMs) by grounding the generation with external knowledge. Recent extensions of RAG to graph-based retrieval offer a promising direction, leveraging the structural knowledge for multi-hop reasoning. 
However, existing graph RAG typically decouples retrieval and reasoning processes, which prevents Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:1dd760e72b427c2e","title":"GEXIA: Granularity expansion and iterative approximation for scalable multi-grained video-language learning","url":"https://www.amazon.science/publications/gexia-granularity-expansion-and-iterative-approximation-for-scalable-multi-grained-video-language-learning","published":"2025","authors":["Yicheng Wang","Zhikang Zhang","Jue Wang","David Fan","Zhenlin Xu","Linda Liu","Xiang Hao","Vimal Bhat","Xinyu (Arthur) Li"],"abstract":"In various video-language learning tasks, the challenge of achieving cross-modality alignment with multi-grained data persists. We propose a method to tackle this challenge from two crucial perspectives: data and modeling. 
Given the absence of a multi-grained video-text pretraining dataset, we introduce a Granularity EXpansion (GEX) method with Integration and Compression operations to expand the granularity Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","compression"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=37"}},{"id":"official:901951589e2db7db","title":"Finding the sweet spot: Trading quality, cost, and speed during inference-time LLM reflection","url":"https://www.amazon.science/publications/finding-the-sweet-spot-trading-quality-cost-and-speed-during-inference-time-llm-reflection","published":"2025","authors":["Jack Butler","Nikita Kozodoi","Zainab Afolabi","Brian Tyacke","Gaiar Baimuratov"],"abstract":"As Large Language Models (LLMs) continue to evolve, practitioners face increasing options for enhancing inference-time performance without model retraining, including budget tuning and multi-step techniques like self-reflection. While these methods improve output quality, they create complex trade-offs among accuracy, cost, and latency that remain poorly understood across different domains. 
This paper systematically Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:659815bd29623141","title":"FaVe: Factored and verified search rationale for long-form answer","url":"https://www.amazon.science/publications/fave-factored-and-verified-search-rationale-for-long-form-answer","published":"2025","authors":["Jihyeok Kim","Sungjin Lee","Seung-won Hwang","Yang Liu"],"abstract":"Targeting long-form question-answering, chain-of-query (CoQ) has been studied, integrating chain-of-thought (CoT) with retrieval-augmented generation. CoQ breaks down complex questions into simpler subquestions (SQs), allowing relevant information to be retrieved step by step. By doing so, CoQ aims to improve the answer comprehensiveness and verifiability, at the expense of latency. 
Our first contribution Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:b63440978f6cb33c","title":"FR-LoRA: Fisher regularized LoRA for multilingual continual learning","url":"https://www.amazon.science/publications/fr-lora-fisher-regularized-lora-for-multilingual-continual-learning","published":"2025","authors":["Sayanta Adhikari","Sanjay Agrawal","Vivek Sembium"],"abstract":"Relevance in e-commerce product search is critical to ensuring that results accurately reflect customer intent. While large language models (LLMs) have recently advanced natural language processing capabilities, their high inference latency and significant infrastructure demands make them less suitable for real-time e-commerce applications. 
Consequently, transformer-based encoder models are widely adopted Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:6168c8e3dad114dd","title":"Evaluating the critical risks of Amazon’s Nova Premier under the Frontier Model Safety Framework","url":"https://www.amazon.science/publications/evaluating-the-critical-risks-of-amazons-nova-premier-under-the-frontier-model-safety-framework","published":"2025","authors":["Satyapriya Krishna","Ninareh Mehrabi","Abhinav Mohanty","Matteo Memelli","Vincent Ponzo","Payal Motwani","Rahul Gupta"],"abstract":"Nova Premier is Amazon’s most capable multimodal foundation model and teacher for model distillation. It processes text, images, and video with a one-million-token context window, enabling analysis of large codebases, 400-page documents, and 90-minute videos in a single prompt [2]. 
We present the first comprehensive evaluation of Nova Premier’s critical risk profile under the Frontier Model Safety Framework Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:227b380e8f813ba7","title":"Enhancing language model agents using Diversity of Thoughts","url":"https://www.amazon.science/publications/enhancing-language-model-agents-using-diversity-of-thoughts","published":"2025","authors":["Vijay Lingam","Behrooz Omidvar-Tehrani","Sujay Sanghavi","Gaurav Gupta","Sayan Ghosh","Linbo Liu","Luke Huan","Anoop Deoras"],"abstract":"A popular approach to building agents using Language Models (LMs) involves iteratively prompting the LM, reflecting on its outputs, and updating the input prompts until the desired task is achieved. 
However, our analysis reveals two key shortcomings in the existing methods: (i) limited exploration of the decision space due to repetitive reflections, which result in redundant inputs, and (ii) an inability Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=28"}},{"id":"official:39b66f422d642790","title":"Efficient post-training for industry-specialized reasoning in small language models","url":"https://www.amazon.science/publications/efficient-post-training-for-industry-specialized-reasoning-in-small-language-models","published":"2025","authors":["Bill Cai","Sheldon Liu","Tatsuo Azeyanagi","Tomal Deb"],"abstract":"Large reasoning models (LRMs) excel at reasoning tasks but face deployment barriers due to computational constraints, regulatory requirements, and domain-specific knowledge gaps. This work addresses these limitations by developing cost-efficient post-training methods to enhance reasoning capabilities. 
Using Qwen3-4B as our base model, we investigate variations of efficient Supervised Fine-Tuning (SFT) and Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=10"}},{"id":"official:a3b869888ed622c1","title":"Effective post-training embedding compression via temperature control in contrastive training","url":"https://www.amazon.science/publications/effective-post-training-embedding-compression-via-temperature-control-in-contrastive-training","published":"2025","authors":["Georgiana Dinu","Corey Barrett","Yi Xiang","Miguel Romero Calvo","Anna Currey","Xing Niu"],"abstract":"Fixed-size learned representations (dense representations, or embeddings) are widely used in many machine learning applications across language, vision or speech modalities. This paper investigates the role of the temperature parameter in contrastive training for text embeddings. 
We shed light on the impact this parameter has on the intrinsic dimensionality of the embedding spaces obtained, and show that Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","compression"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=30"}},{"id":"official:432e3fe1bed2c1bf","title":"DuRep: Dual-mode speech representation learning via ASR-aware distillation","url":"https://www.amazon.science/publications/durep-dual-mode-speech-representation-learning-via-asr-aware-distillation","published":"2025","authors":["Prabash Reddy Male","Swayambhu Nath Ray","Harish Arsikere","Akshat Jaiswal","Prakhar Swarup","PRANTIK SEN","Debmalya Chakrabarty","K V Vijay Girish","Nikhil Bhave","Frederick Weber","Sambuddha Bhattacharya","Sri Garimella"],"abstract":"Recent advancements in speech encoders have drawn attention due to their integration with Large Language Models for various speech tasks. While most research has focused on either causal or full-context speech encoders, there’s limited exploration to effectively handle both streaming and non-streaming applications, while achieving state-of-the-art performance. 
We introduce DuRep, a Dual-mode Speech Representation Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=25"}},{"id":"official:3a430a0fae9a29f3","title":"DreamBlend: Advancing personalized fine-tuning of text-to-image diffusion models","url":"https://www.amazon.science/publications/dreamblend-advancing-personalized-fine-tuning-of-text-to-image-diffusion-models","published":"2025","authors":["Shwetha Ram","Tal Neiman","Qianli Feng","Andrew Stuart","Son Tran","Trishul Chilimbi"],"abstract":"Given a small number of images of a subject, personalized image generation techniques can fine-tune large pre-trained text-to-image diffusion models to generate images of the subject in novel contexts, conditioned on text prompts. In doing so, a trade-off is made between prompt fidelity, subject fidelity and diversity. 
As the pre-trained model is fine-tuned, earlier checkpoints synthesize images with low Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","personalized"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=37"}},{"id":"official:811e7aa59ef86151","title":"Document haystack: A long context multimodal image/document understanding vision LLM benchmark","url":"https://www.amazon.science/publications/document-haystack-a-long-context-multimodal-image-document-understanding-vision-llm-benchmark","published":"2025","authors":["Goeric Huybrechts","Srikanth Ronanki","Sai Muralidhar Jayanthi","Jack G. M. FitzGerald","Srinivasan Veeravanallur"],"abstract":"The proliferation of multimodal Large Language Models has significantly advanced the ability to analyze and understand complex data inputs from different modalities. However, the processing of long documents remains under-explored, largely due to a lack of suitable benchmarks. 
To address this, we introduce Document Haystack, a comprehensive benchmark designed to evaluate the performance of Vision Language Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/iccvw69036.2025.00428","openalex_id":"https://openalex.org/W7131071365","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:d1d82600fc850542","title":"DocVLM: Make your VLM an efficient reader","url":"https://www.amazon.science/publications/docvlm-make-your-vlm-an-efficient-reader","published":"2025","authors":["Mor Shpigel Nacson","Aviad Aberdam","Roy Ganz","Elad Ben Avraham","Alona Golts","Yair Kittenplon","Shai Mazor","Ron Litman"],"abstract":"Vision-Language Models (VLMs) excel in diverse visual tasks but face challenges in document understanding, which requires fine-grained text processing. While typical visual tasks perform well with low-resolution inputs, reading-intensive applications demand high-resolution, resulting in significant computational overhead. 
Using OCR-extracted text in VLM prompts partially addresses this issue but underperforms Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:00e0d305d90a2c5e","title":"DocTalk: Scalable graph-based dialogue synthesis for enhancing LLM conversational capabilities","url":"https://www.amazon.science/publications/doctalk-scalable-graph-based-dialogue-synthesis-for-enhancing-llm-conversational-capabilities","published":"2025","authors":["Jing Yang Lee","Hamed Bonab","Nasser Zalmout","Ming Zeng","Sanket Lokegaonkar","Colin Lockard","Binxuan Huang","Ritesh Sarkhel","Haodong Wang"],"abstract":"Large Language Models (LLMs) are increasingly employed in multi-turn conversational tasks, yet their pre-training data predominantly consists of continuous prose, creating a potential mismatch between required capabilities and training paradigms. We introduce a novel approach to address this discrepancy by synthesizing conversational data from existing text corpora. 
We present a pipeline that transforms Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=12"}},{"id":"official:369b62d21f5d3822","title":"Direct and explicit 3D generation from a single image","url":"https://www.amazon.science/publications/direct-and-explicit-3d-generation-from-a-single-image","published":"2025","authors":["Haoyu Wu","Gitika Karumuri","Chuhang Zou","Seungbae Bang","Yuelong Li","Dimitris Samaras","Sunil Hadap"],"abstract":"Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images along with 3D Gaussian features using a repurposed Stable Diffusion model. 
We introduce a depth branch into U-Net for efficient and high quality Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=37"}},{"id":"official:7ecd456773e67c45","title":"Detecting and mitigating challenges in zero-shot video summarization with video LLMs","url":"https://www.amazon.science/publications/detecting-and-mitigating-challenges-in-zero-shot-video-summarization-with-video-llms","published":"2025","authors":["Luca Cagliero","Lorenzo Vaiani","Eliana Pastor","Alkis Koudounas","Elena Baralis","Vittorio Mazzia","Sandro Pollastrini","Thomas Gueudre","Manuel Giollo","Daniele Amberti","Yue (Rex) Wu"],"abstract":"Video summarization aims to generate a condensed textual version of an original video. Summaries may consist of either plain text or a shortlist of salient events, possibly including temporal or spatial references. Video Large Language Models (VLLMs) exhibit impressive zero-shot capabilities in video analysis. 
However, their performance varies significantly according to the LLM prompt, the characteristics Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:11366ebadc8522d6","title":"DFLOW: Diverse dialogue flow simulation with large language models","url":"https://www.amazon.science/publications/dflow-diverse-dialogue-flow-simulation-with-large-language-models","published":"2025","authors":["Wanyu Du","Song Feng","James Gung","Justin Sun","Yi Zhang","Saab Mansour","Yanjun (Jane) Qi"],"abstract":"Developing language model-based dialogue agents requires effective data to train models that can follow specific task logic. However, most existing data simulation methods focus on increasing diversity in language, topics, or dialogue acts at the utterance level, largely neglecting a critical aspect of task logic diversity at the dialogue level. 
This paper proposes a novel data simulation method designed Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.realm-1.2","openalex_id":"https://openalex.org/W4412944275","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=17"}},{"id":"official:a0e3787f0746de32","title":"DEFT-VTON: Efficient virtual try-on with consistent generalised h-transform","url":"https://www.amazon.science/publications/deft-vton-efficient-virtual-try-on-with-consistent-generalised-h-transform","published":"2025","authors":["Xingzi Xu","Qi Li","Shuwen Qiu","Julien Han","Karim Bouyarmane"],"abstract":"Diffusion models enables high-quality virtual try-on (VTO) with their established image synthesis abilities. Despite the extensive end-to-end training of large pre-trained models involved in current VTO methods, real-world applications often prioritize limited training and inferencing/serving/deployment budgets for VTO. 
To solve this obstacle, we apply Doob’s h-transform efficient fine-tuning (DEFT) for Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=26"}},{"id":"official:7061e72613d19130","title":"Confidence scoring for LLM-generated SQL in supply chain data extraction","url":"https://www.amazon.science/publications/confidence-scoring-for-llm-generated-sql-in-supply-chain-data-extraction","published":"2025","authors":["Meredith Ma","Yikai Zhao"],"abstract":"Large Language Models (LLMs) have recently enabled natural language interfaces that translate user queries into executable SQL, offering a powerful solution for non-technical stakeholders to access structured data. However, one of the limitation that LLMs do not natively express uncertainty makes it difficult to assess the reliability of their generated queries. 
This paper presents a case study that evaluates Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=21"}},{"id":"official:6488f0fb09569fce","title":"CiteFix: Enhancing RAG accuracy through post-processing citation correction","url":"https://www.amazon.science/publications/citefix-enhancing-rag-accuracy-through-post-processing-citation-correction","published":"2025","authors":["Harsh Maheshwari","Srikanth Tenneti","Alwarappan Nakkiran"],"abstract":"Retrieval Augmented Generation (RAG) has emerged as a powerful application of Large Language Models (LLMs), revolutionizing information search and consumption. RAG systems combine traditional search capabilities with LLMs to generate comprehensive answers to user queries, ideally with accurate citations. 
However, in our experience of developing a RAG product, LLMs often struggle with source attribution, Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:9e218c6aac7cf33f","title":"ChronosX: Adapting pretrained time series models with exogenous variables","url":"https://www.amazon.science/publications/chronosx-adapting-pretrained-time-series-models-with-exogenous-variables","published":"2025","authors":["Sebastian Pineda Arango","Pedro Mercado","Shubham Kapoor","Abdul Fatir Ansari","Lorenzo Stella","Huibin Shen","Hugo Senetaire","Caner Turkmen","Oleksandr Shchur","Danielle Maddix Robinson","Michael Bohlke-Schneider","Yuyang (Bernie) Wang"],"abstract":"Covariates provide valuable information on external factors that influence time series and are critical in many real-world time series forecasting tasks. For example, in retail, covariates may indicate promotions or peak dates such as holiday seasons that heavily influence demand forecasts. 
Recent advances in pre-training large language model architectures for time series forecasting have led to highly Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:011dbc5f172c288c","title":"ChaI-TeA: A benchmark for evaluating autocompletion of interactions with LLM-based chatbots","url":"https://www.amazon.science/publications/chai-tea-a-benchmark-for-evaluating-autocompletion-of-interactions-with-llm-based-chatbots","published":"2025","authors":["Shani Goren","Oren Kalinsky","Tomer Stav","Yuri Rapoport","Yaron Fairstein","Ram Yazdi","Nachshon Cohen","Alex Libov","Guy Kushilevitz"],"abstract":"The rise of LLMs has deflected a growing portion of human-computer interactions towards LLM-based chatbots. The remarkable abilities of these models allow users to interact using long, diverse natural language text covering a wide range of topics and styles. Phrasing these messages is a time and effort consuming task, calling for an autocomplete solution to assist users. 
We present ChaI-TeA: Chat Interaction Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:2c9be96438617480","title":"CatalogRAG: Retrieval-guided LLM prediction for multilingual e-commerce product attributes","url":"https://www.amazon.science/publications/catalograg-retrieval-guided-llm-prediction-for-multilingual-e-commerce-product-attributes","published":"2025","authors":["Bryan Zhang","Suleiman Khan","Stephan Walter"],"abstract":"E-commerce stores increasingly use Large Language Models (LLMs) to enhance catalog data quality through automated regeneration. A critical challenge is accurately predicting missing structured attribute values across multilingual product catalogs, where LLM performance varies significantly by language. 
While existing approaches leverage general knowledge through prompt engineering and external retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:299371ac472f3547","title":"CSPLADE: Learned sparse retrieval with causal language models","url":"https://www.amazon.science/publications/csplade-learned-sparse-retrieval-with-causal-language-models","published":"2025","authors":["Zhichao Xu","Aosong Feng","Yijun Tian","Haibo Ding","Lin Lee Cheong"],"abstract":"In recent years, dense retrieval has been the focus of information retrieval (IR) research. While effective, dense retrieval produces uninterpretable dense vectors, and suffers from the drawback of large index size. 
Learned sparse retrieval (LSR) has emerged as promising alternative, achieving competitive retrieval performance while also being able to leverage the classical inverted index data structure Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.ijcnlp-long.7","openalex_id":"https://openalex.org/W7138229905","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=10"}},{"id":"official:96a054431aaae437","title":"Block-diagonal LoRA for eliminating communication overhead in tensor parallel LoRA serving","url":"https://www.amazon.science/publications/block-diagonal-lora-for-eliminating-communication-overhead-in-tensor-parallel-lora-serving","published":"2025","authors":["Xinyu Wang","Jonas M. Kübler","Kailash Budhathoki","Yida Wang","Matthaus Kleindessner"],"abstract":"When serving a single base LLM with several different LoRA adapters simultaneously, the adapters cannot simply be merged with the base model’s weights as the adapter swapping would create overhead and requests using different adapters could not be batched. 
Rather, the LoRA computations have to be separated from the base LLM computations, and in a multi-device setup the LoRA adapters can be sharded in a Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=10"}},{"id":"official:d3287886f4d5815e","title":"BeyondCorrelation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge","url":"https://www.amazon.science/publications/beyond-correlation-the-impact-of-human-uncertainty-in-measuring-the-effectiveness-of-automatic-evaluation-and-llm-as-a-judge","published":"2025","authors":["Aparna Elangovan","Lei Xu","Jongwoo Ko","Mahsa Elyasi","Ling Liu","Sravan Bodapati","Dan Roth"],"abstract":"The effectiveness of automatic evaluation of generative models is typically measured by comparing the labels generated via automation with human labels using correlation metrics. 
However, metrics like Krippendorff’s α and Randolph’s κ were originally designed to measure the reliability of human labeling, thus make assumptions about typical human labeling behavior, and these assumptions may not be applicable Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:6d9d1c3abfa17eba","title":"Beyond instruction-conditioning, MoTE: Mixture of task experts for multi-task embedding models","url":"https://www.amazon.science/publications/beyond-instruction-conditioning-mote-mixture-of-task-experts-for-multi-task-embedding-models","published":"2025","authors":["Miguel Romero Calvo","Shuoyang Ding","Corey Barrett","Georgiana Dinu","George Karypis"],"abstract":"Dense embeddings are fundamental to modern machine learning systems, powering Retrieval Augmented Generation (RAG), information retrieval, and representation learning. 
While instruction-conditioning has become the dominant approach for embedding specialization, its direct application to low-capacity models imposes fundamental representational constraints that limit the performance gains derived from specialization Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:4c21eeee3cf29445","title":"Automated composition of agents: A knapsack approach for agentic component selection","url":"https://www.amazon.science/publications/automated-composition-of-agents-a-knapsack-approach-for-agentic-component-selection","published":"2025","authors":["Michelle Yuan","Khushbu Pahwa","Shuaichen Chang","Mustafa Kaba","Jiarong Jiang","Xiaofei Ma","Yi Zhang","Monica Sunkara"],"abstract":"Designing effective agentic systems requires the seamless composition and integration of agents, tools, and models within dynamic and uncertain environments. Most existing methods rely on static, semantic retrieval approaches for tool or agent discovery. 
However, effective reuse and composition of existing components remain challenging due to incomplete capability descriptions and the limitations of retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["retrieval","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=10"}},{"id":"official:66433f7638ef3c3b","title":"AutoMixAlign: Adaptive data mixing for multi-task preference optimization in LLMs","url":"https://www.amazon.science/publications/automixalign-adaptive-data-mixing-for-multi-task-preference-optimization-in-llms","published":"2025","authors":["Nicholas Corrado","Julian Katz-Samuels","Adithya M Devraj","Hyokun Yun","Chao Zhang","Yi Xu","Yi Pan","Bing Yin","Trishul Chilimbi"],"abstract":"When aligning large language models (LLMs), their performance on various tasks (such as being helpful, harmless, and honest) depends heavily on the composition of their training data. However, selecting a data mixture that achieves strong performance across all tasks is challenging. 
Existing approaches rely on large ablation studies, heuristics, or human intuition, but these can be prohibitively expensive Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:10238f3677461614","title":"AutoChunker: Structured text chunking and its evaluation","url":"https://www.amazon.science/publications/autochunker-structured-text-chunking-and-its-evaluation","published":"2025","authors":["Arihant Jain","Purav Aggarwal","Anoop S V K K Saladi"],"abstract":"Text chunking is fundamental to modern retrieval-augmented systems, yet existing methods often struggle with maintaining semantic coherence, both within and across chunks, while dealing with document structure and noise. We present AutoChunker, a bottom-up approach for text chunking that combines document structure awareness with noise elimination. 
AutoChunker leverages language models to identify and segregate Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:9dd8032571f984ce","title":"Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs","url":"https://www.amazon.science/publications/auto-prompting-without-training-labels-an-llm-cascade-for-product-quality-assessment-in-e-commerce-catalogs","published":"2025","authors":["Soham Satyadharma","Fateme Sheikholeslami","Swati Kaul","Umit Batur","Suleiman Khan"],"abstract":"We introduce a novel, training free cascade for auto-prompting Large Language Models (LLMs) to assess product quality in e-commerce. Our system requires no training labels or model fine-tuning, instead automatically generating and refining prompts for evaluating attribute quality across tens of thousands of product category–attribute pairs. 
Starting from a seed of human-crafted prompts, the cascade progressively Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/7tse-1404","openalex_id":"https://openalex.org/W7106813542","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:dc3e2be40e8c697f","title":"AnoLLM: Large language models for tabular anomaly detection","url":"https://www.amazon.science/publications/anollm-large-language-models-for-tabular-anomaly-detection","published":"2025","authors":["Che-Ping Tsai","Ganyu Teng","Phil Wallis","Wei Ding"],"abstract":"We introduce AnoLLM, a novel framework that leverages large language models (LLMs) for unsupervised tabular anomaly detection. By converting tabular data into a standardized text format, we further adapt a pre-trained LLM with this serialized data, and assign anomaly scores based on the negative log likelihood generated by the LLM. 
Unlike traditional methods that can require extensive feature engineering Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:3c2e0167520c6ec7","title":"An address intelligence framework for e-commerce deliveries","url":"https://www.amazon.science/publications/an-address-intelligence-framework-for-e-commerce-deliveries","published":"2025","authors":["Gokul Swamy","Aman Gulati","Srinivas Virinchi","Anoop S V K K Saladi"],"abstract":"For an e-commerce domain, the address is the single most important piece of data for ensuring accurate and reliable deliveries. 
In this two-part study, we first outline the construction of a language model to assist customers with address standardization and in the latter part, we detail a novel Pareto-ensemble multi-task prediction algorithm that derives critical insights from addresses to minimize operational Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:96dc3a8db2b94eeb","title":"Amazon Nova Premier: Technical report and model card","url":"https://www.amazon.science/publications/amazon-nova-premier-technical-report-and-model-card","published":"2025","authors":["Amazon Artificial General Intelligence"],"abstract":"We present Amazon Nova Premier, our most capable multimodal foundation model and teacher for model distillation. Nova Premier processes text, images, and videos with a one-million token context window enabling analysis of large codebases, long documents, and long videos in a single prompt. 
It also enables customers to use Amazon Bedrock to create customized variants of Amazon Nova Pro, Nova Lite, and Nova Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=26"}},{"id":"official:aa9439d04c2f3c2f","title":"Amazon Nova Multimodal Embeddings: Technical report and model card","url":"https://www.amazon.science/publications/amazon-nova-multimodal-embeddings-technical-report-and-model-card","published":"2025","authors":["Amazon Artificial General Intelligence"],"abstract":"We present Amazon Nova Multimodal Embeddings (MME), a state-of-the-art multimodal embedding model for agentic RAG and semantic search applications. Nova MME is the first embeddings model that supports five modalities as input: text, documents, images, video and audio, and transforms them into a single, unified embedding space. 
This powerful capability enables cross-modal retrieval —allowing users to search Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:438ab5f79e0da104","title":"Aligning large language models with implicit preferences from user-generated content","url":"https://www.amazon.science/publications/aligning-large-language-models-with-implicit-preferences-from-user-generated-content","published":"2025","authors":["Zhaoxuan Tan","Zheng Li","Tianyi Liu","Haodong Wang","Hyokun Yun","Ming Zeng","Pei (Patrick) Chen","Zhihan Zhang","Yifan Gao","Ruijie Wang","Priyanka Nigam","Bing Yin"],"abstract":"Learning from preference feedback is essential for aligning large language models (LLMs) with human values and improving the quality of generated responses. However, existing preference learning methods rely heavily on curated data from humans or advanced LLMs, which is costly and difficult to scale. 
In this work, we present PUGC, a novel framework that leverages implicit human Preferences in unlabeled Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=21"}},{"id":"official:bd1acc7fa82dcdf2","title":"Aligning black-box language models with human judgments","url":"https://www.amazon.science/publications/aligning-black-box-language-models-with-human-judgments","published":"2025","authors":["Gerrit van den Burg","Gen Suzuki","Wei Liu","Murat Sensoy"],"abstract":"Large language models (LLMs) are increasingly used as automated judges to evaluate recommendation systems, search engines, and other subjective tasks, where relying on human evaluators can be costly, time-consuming, and unscalable. LLMs offer an efficient solution for continuous, automated evaluation. 
However, since the systems that are built and improved with these judgments are ultimately designed for Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:8f42e11e54bce401","title":"Align-SLM: Textless spoken language models with reinforcement learning from AI feedback","url":"https://www.amazon.science/publications/align-slm-textless-spoken-language-models-with-reinforcement-learning-from-ai-feedback","published":"2025","authors":["GUAN-TING LIN","Prashanth Gurunath Shivakumar","Aditya Gourav","Yile Gu","Ankur Gandhe","Hung-yi Lee","Ivan Bulyko"],"abstract":"While textless Spoken Language Models (SLMs) have shown potential in end-to-end speech-to-speech modeling, they still lag behind text-based Large Language Models (LLMs) in terms of semantic coherence and relevance. 
This work introduces the Align-SLM framework, which leverages preference optimization inspired by Reinforcement Learning with AI Feedback (RLAIF) to enhance the semantic understanding of SLMs Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:674bd46d92304f06","title":"Agentic generative AI for media content discovery at the national football league","url":"https://www.amazon.science/publications/agentic-generative-ai-for-media-content-discovery-at-the-national-football-league","published":"2025","authors":["Henry Wang","Sirajus Salekin","Jake Lee","Ross Claytor","Shinan Zhang","Michael Chi"],"abstract":"Generative AI has unlocked new possibilities in content discovery and management. Through collaboration with the National Football League (NFL), we demonstrate how a generative-AI based workflow allows media researchers and analysts to query relevant historical plays using natural language, rather than using traditional filter and click-based interfaces. 
The agentic workflow takes a user query in natural Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-032-06167-6_20","openalex_id":"https://openalex.org/W4414498702","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","media"],"author_affiliations":["Amazon","Amazon (United States)","NFL Foundation"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=15"}},{"id":"official:22e65c01db625e4d","title":"Adjoint sharding for very long context training of state space models","url":"https://www.amazon.science/publications/adjoint-sharding-for-very-long-context-training-of-state-space-models","published":"2025","authors":["Xingzi Xu","Amir Tavanaei","Kavosh Asadi","Karim Bouyarmane"],"abstract":"Despite fast progress, efficiently training large language models (LLMs) in extremely long contexts remains challenging. Existing methods fall back to training LLMs with short contexts (up to a few thousand tokens) and use inference time techniques when evaluating on very long contexts (above 1M tokens). 
Training on very long contexts is limited by GPU memory availability and the prohibitively long training Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","memory"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:474759ac75816627","title":"ATLAS: Actor-critic task-completion with look-ahead action simulation","url":"https://www.amazon.science/publications/atlas-actor-critic-task-completion-with-look-ahead-action-simulation","published":"2025","authors":["Jiali Cheng","Anjishnu Kumar","Roshan Lal","Rishi Rajasekaran","Hani Ramezani","Omar Zia Khan","Oleg Rokhlenko","Sunny Chiu-Webster","Gang Hua","Hadi Amiri"],"abstract":"We observe that current state-of-the-art web-agents are unable to effectively adapt to new environments without neural network fine-tuning, without which they produce inefficient execution plans due to a lack of awareness of the structure and dynamics of the new environment. 
To address this limitation, we introduce ATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","memory"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:e6317e97302cdb17","title":"ASK: Aspects and retrieval based hybrid clarification in task oriented dialogue systems","url":"https://www.amazon.science/publications/ask-aspects-and-retrieval-based-hybrid-clarification-in-task-oriented-dialogue-systems","published":"2025","authors":["Rishav Sahay","Lavanya Tekumalla","Purav Aggarwal","Arihant Jain","Anoop S V K K Saladi"],"abstract":"Ambiguous user queries pose a significant challenge in task-oriented dialogue systems relying on information retrieval. While Large Language Models (LLMs) have shown promise in generating clarification questions to tackle query ambiguity, they rely solely on the topk retrieved documents for clarification which fails when ambiguity is too high to retrieve relevant documents in the first place. 
Traditional Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:5140dade8a024b45","title":"A use-case-specific dataset for measuring dimensions of responsible performance in LLM-generated text","url":"https://www.amazon.science/publications/a-use-case-specific-dataset-for-measuring-dimensions-of-responsible-performance-in-llm-generated-text","published":"2025","authors":["Alicia Sagae","CJ Lee","Sandeep Avula","Brandon Dang","Vanessa Murdock"],"abstract":"Current methods for evaluating large language models (LLMs) typically focus on high-level tasks such as text generation, without targeting a particular AI application. This approach is not sufficient for evaluating LLMs for Responsible AI dimensions like fairness, since protected attributes that are highly relevant in one application may be less relevant in another. 
In this work, we construct a dataset Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:9b1304ac4841cad4","title":"What matters when building vision language models for product image analysis?","url":"https://www.amazon.science/publications/what-matters-when-building-vision-language-models-for-product-image-analysis","published":"2025","authors":["Ameni Trabelsi","Maria Zontak","Yiming Qian","Brian Jackson","Suleiman Khan","Umit Batur"],"abstract":"This paper investigates multi-modal large language models (MLLMs) for predicting product features from images, comparing fine-tuned versus proprietary models. We introduce two domain-specific benchmarks: (1) Inductive Bias vs. 
Image Evidence (IBIE) Benchmark, which evaluates MLLMs’ ability to distinguish between image-derived features and latent knowledge, and (2) Catalog-bench, which assesses feature prediction Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/wacvw65960.2025.00151","openalex_id":"https://openalex.org/W4409917617","cited_by_count":2,"quality_score":58,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:5b39be2ef81ac536","title":"Trust dynamics in AI-assisted development: Definitions, factors, and implications","url":"https://www.amazon.science/publications/trust-dynamics-in-ai-assisted-development-definitions-factors-and-implications","published":"2025","authors":["Sadra Sabouri","Philipp Eibl","Xinyi Zhou","Morteza Ziyadi","Nenad Medvidovic","Lars Lindemann","Souti Chattopadhyay"],"abstract":"Software developers increasingly rely on AI code generation utilities. To ensure that “good” code is accepted into the code base and “bad” code is rejected, developers must know when to trust an AI suggestion. Understanding how developers build this intuition is crucial to enhancing developer-AI collaborative programming. 
In this paper, we seek to understand how developers (1) define and (2) evaluate the Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icse55347.2025.00199","openalex_id":"https://openalex.org/W4411552879","cited_by_count":2,"quality_score":58,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United States)","University of Southern California"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:8c4d6b3a0327a29b","title":"Structured human assessment of text-to-image generative models","url":"https://www.amazon.science/publications/structured-human-assessment-of-text-to-image-generative-models","published":"2025","authors":["Ciprian Corneanu","Qianli Feng","Aleix Martinez"],"abstract":"Following the great progress in text-conditioned image generation there is a dire need for establishing clear comparison benchmarks. Unfortunately, assessing performance of such models is highly subjective and notoriously difficult. 
Current automatic assessment of generated images quality and their alignment to text are approximate at best while human assessment is subjective, poorly calibrated and not Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/wacv61041.2025.00440","openalex_id":"https://openalex.org/W4409261955","cited_by_count":1,"quality_score":57,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:eebfb85e69f9bb3f","title":"Effective techniques for scaling audio encoder pretraining","url":"https://www.amazon.science/publications/effective-techniques-for-scaling-audio-encoder-pretraining","published":"2025","authors":["Byeonggeun Kim","Andrew Bydlon","Qingming Tang","Huy Phan","Chieh-Chi Kao","Tao Zhang","Chao Wang"],"abstract":"This work presents advancements in audio pretraining objectives designed to generate semantically rich embeddings, capable of addressing a wide range of audio-related tasks. Despite significant progress in the field, current methods often emphasize full fine-tuning in downstream applications, which can obscure the true potential of pretrained audio encoders. 
In this study, we present an audio encoder that Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp49660.2025.10890012","openalex_id":"https://openalex.org/W4408354391","cited_by_count":1,"quality_score":57,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United Kingdom)","Amazon (United States)","Bellevue Hospital Center"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:1e2fd70796b22d6a","title":"Domain adaptation of VLM for soccer video understanding","url":"https://www.amazon.science/publications/domain-adaptation-of-vlm-for-soccer-video-understanding","published":"2025","authors":["Tiancheng Jiang","Henry Wang","Sirajus Salekin","Parmida Atighehchian","Shinan Zhang"],"abstract":"Vision Language Models (VLMs) have demonstrated strong performance in multi-modal tasks by effectively aligning visual and textual representations. However, most video understanding VLM research has been domain-agnostic, leaving the understanding of their transfer learning capability to specialized domains under-explored. 
In this work, we address this by exploring the adaptability of open-source VLMs to Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvprw67362.2025.00608","openalex_id":"https://openalex.org/W4414197608","cited_by_count":1,"quality_score":57,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Massachusetts Institute of Technology"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=26"}},{"id":"official:0306e1ca53524ce8","title":"Zero-shot customized video editing with diffusion feature transfer","url":"https://www.amazon.science/publications/zero-shot-customized-video-editing-with-diffusion-feature-transfer","published":"2025","authors":["Wei Chen","Huidong Liu","Yang Liu","Chien-Chih Wang","Moyan Li","Hongdong Li","Bryan Wang"],"abstract":"Customized video editing aims at substituting the object in a given source video with a target object from reference images (Fig. 1). Existing approaches often rely on fine-tuning pre-trained models by learning the appearance of the objects in the reference images, as well as the temporal information from the source video. 
These methods are however not scalable as fine-tuning is required for each source Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:d7c05cb9476965e0","title":"When thinking fails: The pitfalls of reasoning for instruction-following in LLMs","url":"https://www.amazon.science/publications/when-thinking-fails-the-pitfalls-of-reasoning-for-instruction-following-in-llms","published":"2025","authors":["Xiaomin Li","Zhou Yu","Anurag Beniwal","Ziji Zhang","Yingying Zhuang","Narayanan Sadagopan"],"abstract":"Reasoning-enhanced large language models (RLLMs), whether explicitly trained for reasoning or prompted via chain-of-thought (CoT), have achieved state-of-the-art performance on many complex reasoning tasks. However, we uncover a surprising and previously overlooked phenomenon: explicit CoT reasoning can significantly degrade instruction-following accuracy. 
Evaluating 20+ models on two benchmarks: IFEval Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:472e07841c79e51a","title":"Wanda++: Pruning large language models via regional gradients","url":"https://www.amazon.science/publications/wanda++-pruning-large-language-models-via-regional-gradients","published":"2025","authors":["Yifan Yang","Kai Zhen","Bhavana Ganesh","Aram Galstyan","Goeric Huybrechts","Markus Müller","Jonas M. Kübler","Rupak Vignesh Swaminathan","Thanasis Mouchtaris","Sravan Babu Bodapati","Nathan Susanj","Zheng Zhang"],"abstract":"Large Language Models (LLMs) pruning seeks to remove unimportant weights for inference speedup with minimal accuracy impact. However, existing methods often suffer from accuracy degradation without full-model sparsity-aware fine-tuning. This paper presents Wanda++, a novel pruning framework that outperforms the state-of-the-art methods by utilizing decoder-block-level regional gradients. 
Specifically, Wanda Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:6cdb547a90e3cb31","title":"VADE: Visual attention guided hallucination detection and elimination","url":"https://www.amazon.science/publications/vade-visual-attention-guided-hallucination-detection-and-elimination","published":"2025","authors":["Vishnu Prabhakaran","Purav Aggarwal","Vinay Kumar Verma","Gokul Swamy","Anoop S V K K Saladi"],"abstract":"Vision Language Models (VLMs) have achieved significant advancements in complex visual understanding tasks. However, VLMs are prone to hallucinations—generating outputs that lack alignment with visual content. This paper addresses hallucination detection in VLMs by leveraging the visual grounding information encoded in transformer attention maps. 
We identify three primary challenges in this approach: the Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:a20ae9292dc9442b","title":"Using large language models to improve product information in e-commerce catalogs","url":"https://www.amazon.science/publications/using-large-language-models-to-improve-product-information-in-e-commerce-catalogs","published":"2025","authors":["Gang Luo","Julien Han","Hayreddin Ceker","Karim Bouyarmane"],"abstract":"To give customers good experience, an e-commerce retailer needs high-quality product information in its catalog. Yet, the raw product information often lacks sufficient quality. For a large catalog that can contain billions of products, manually fixing this information is highly labor-intensive. 
To address this issue, we propose using the tool use functionality of large language models to automatically Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3746252.3761437","openalex_id":"https://openalex.org/W4416017916","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:fb88380bf3d83f4c","title":"Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate","url":"https://www.amazon.science/publications/unlearning-as-multi-task-optimization-a-normalized-gradient-difference-approach-with-an-adaptive-learning-rate","published":"2025","authors":["Xiaomeng Jin","Zhiqi Bu","Bhanu Vinzamuri","Anil Ramakrishna","Kai-Wei Chang","Volkan Cevher","Mingyi Hong"],"abstract":"Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. 
In particular, we introduce a normalized gradient difference (NGDiff) Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.naacl-long.563","openalex_id":"https://openalex.org/W4411119327","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:8a009e2063154016","title":"Uncertainty-aware fusion: An ensemble framework for mitigating hallucinations in large language models","url":"https://www.amazon.science/publications/uncertainty-aware-fusion-an-ensemble-framework-for-mitigating-hallucinations-in-large-language-models","published":"2025","authors":["Prasenjit Dey","Srujana Merugu","Sivaramakrishnan (Siva) Kaveri"],"abstract":"Large Language Models (LLMs) are known to hallucinate and generate non-factual outputs which can undermine user trust. Traditional methods to directly mitigate hallucinations, such as representation editing and contrastive decoding, often require additional training data and involve high implementation complexity. 
While ensemble-based approaches harness multiple LLMs to tap into the \"wisdom of crowds\", Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:f3713fa5fa5519a0","title":"Turbocharging web automation: The impact of compressed history states","url":"https://www.amazon.science/publications/turbocharging-web-automation-the-impact-of-compressed-history-states","published":"2025","authors":["Xiyue Zhu","Peng Tang","Haofu Liao","Srikar Appalaraju"],"abstract":"Language models have led to a leap forward in web automation. The current web automation approaches take the current web state, history actions, and language instruction as inputs to predict the next action, overlooking the importance of history states. 
However, the highly verbose nature of web page states can result in long input sequences and sparse information, hampering the effective utilization of Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:dab06dde8df4653e","title":"TurboFuzzLLM: Turbocharging mutation-based fuzzing for effectively jailbreaking large language models in practice","url":"https://www.amazon.science/publications/turbofuzzllm-turbocharging-mutation-based-fuzzing-for-effectively-jailbreaking-large-language-models-in-practice","published":"2025","authors":["Aman Goel","Carrie Wu","Zhe Wang","Dmitriy Bespalov","Yanjun (Jane) Qi"],"abstract":"Jailbreaking large-language models (LLMs) involves testing their robustness against adversarial prompts and evaluating their ability to withstand prompt attacks that could elicit unauthorized or malicious responses. 
In this paper, we present TurboFuzzLLM, a mutation-based fuzzing technique for efficiently finding a collection of effective jailbreaking templates that, when combined with harmful questions Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=26"}},{"id":"official:4b59b3499b0b6dfa","title":"Transforming expert knowledge into scalable ontology via large language models","url":"https://www.amazon.science/publications/transforming-expert-knowledge-into-scalable-ontology-via-large-language-models","published":"2025","authors":["Ikkei Itoku","David Theil","Evelyn Eichelsdoerfer Uehara","Sreyoshi Bhaduri","Jun Kuroda","Toshi Yumoto","Alex Gil","Natalie Perez","Rajesh Cherukuri","Naumaan Nayyar"],"abstract":"Having a unified, coherent taxonomy is essential for effective knowledge representation in domain-specific applications as diverse terminologies need to be mapped to underlying concepts. 
Traditional manual approaches to taxonomy alignment rely on expert review of concept pairs, but this becomes prohibitively expensive and time-consuming at scale, while subjective interpretations often lead to expert disagreements Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:a5cc31e361efadce","title":"Training LLMs with MXFP4","url":"https://www.amazon.science/publications/training-llms-with-mxfp4","published":"2025","authors":["Albert Tseng","Tao Yu","Youngsuk Park"],"abstract":"Low precision (LP) datatypes such as MXFP4 can accelerate matrix multiplications (GEMMs) and reduce training costs. However, directly using MXFP4 instead of BF16 during training significantly degrades model quality. In this work, we present the first near-lossless training recipe that uses MXFP4 GEMMs, which are 2× faster than FP8 on supported hardware. 
Our key insight is to compute unbiased gradient estimates Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:f4df214fd6f2d025","title":"Towards safety reasoning in LLMs: AI-agentic deliberation for policy-embedded CoT data creation","url":"https://www.amazon.science/publications/towards-safety-reasoning-in-llms-ai-agentic-deliberation-for-policy-embedded-cot-data-creation","published":"2025","authors":["Tharindu Kumarage","Ninareh Mehrabi","Anil Ramakrishna","Xinyan Zhao","Richard Zemel","Kai-Wei Chang","Aram Galstyan","Rahul Gupta","Charith Peris"],"abstract":"Safety reasoning is a recent paradigm where LLMs reason over safety policies before generating responses, thereby mitigating limitations in existing safety measures such as over-refusal and jailbreak vulnerabilities. 
However, implementing this paradigm is challenging due to the resource-intensive process of creating high-quality policy-embedded chain-of-thought (CoT) datasets while ensuring reasoning remains Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:7601a7127235db60","title":"Towards robust knowledge representations in multilingual LLMs for equivalence and inheritance based consistent reasoning","url":"https://www.amazon.science/publications/towards-robust-knowledge-representations-in-multilingual-llms-for-equivalence-and-inheritance-based-consistent-reasoning","published":"2025","authors":["Gaurav Arora","Srujana Merugu","Shreya Jain","Vaibhav Saxena"],"abstract":"Reasoning and linguistic skills form the cornerstone of human intelligence, facilitating problem-solving and decision-making. Recent advances in Large Language Models (LLMs) have led to impressive linguistic capabilities and emergent reasoning behaviors, fueling widespread adoption across application domains. 
However, LLMs still struggle with complex reasoning tasks, highlighting their systemic limitations Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:7517ef5c395ebdad","title":"Towards long context hallucination detection","url":"https://www.amazon.science/publications/towards-long-context-hallucination-detection","published":"2025","authors":["Siyi Liu","Kishaloy Halder","Zheng Qi","Wei Xiao","Nikolaos Pappas","Phu Mon Htut","Neha Anna John","Yassine Benajiba","Dan Roth"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, they are prone to contextual hallucination, generating information that is either unsubstantiated or contradictory to the given context. Although many studies have investigated contextual hallucinations in LLMs, addressing them in long-context inputs remains an open problem. 
In this work, we take an initial Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:49cd1ed1a55bf4cd","title":"Think clearly: Improving reasoning via redundant token pruning","url":"https://www.amazon.science/publications/think-clearly-improving-reasoning-via-redundant-token-pruning","published":"2025","authors":["Daewon Choi","Jimin Lee","Jihoon Tack","Woomin Song","Saket Dingliwal","Sai Muralidhar Jayanthi","Bhavana Ganesh","Jinwoo Shin","Aram Galstyan","Sravan Babu Bodapati"],"abstract":"Recent large language models have shown promising capabilities in long-form reasoning, following structured chains of thought before arriving at a final answer. However, we observe that these reasoning paths tend to include substantial redundancy; analyzing attention patterns reveals that attention scores are widely scattered, particularly incorrect answers exhibit greater attention sparsity. 
In this paper Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:474329e494495281","title":"Sustainability-focused generative AI risk mitigation strategies","url":"https://www.amazon.science/publications/sustainability-focused-generative-ai-risk-mitigation-strategies","published":"2025","authors":["Lin Shi","Sasha Gutfraind"],"abstract":"The rapid rise of generative AI (GenAI) has sparked the sustainability community to explore its potential applications, such as climate impact modeling and renewable energy optimization. However, deploying these GenAI-powered solutions in enterprise environments raises risk concerns. 
In particular, chatbots and similar GenAI applications face risks of misinformation and disinformation stemming from knowledge Category: Sustainability","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Sustainability"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=19"}},{"id":"official:d680e739926faf98","title":"Searching for optimal solutions with LLMs via bayesian optimization","url":"https://www.amazon.science/publications/searching-for-optimal-solutions-with-llms-via-bayesian-optimization","published":"2025","authors":["Dhruv Agarwal","Manoj Ghuhan","Rajarshi (Raj) Das","Sandesh Swamy","Sopan Khosla","Rashmi Gangadharaiah"],"abstract":"Scaling test-time compute to search for optimal solutions is an important step towards building generally-capable language models that can reason. Recent work, however, shows that tasks of varying complexity require distinct search strategies to solve optimally, thus making it challenging to design a one-size-fits-all approach. 
Prior solutions either attempt to predict task difficulty to select the optimal Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=28"}},{"id":"official:fd735a2d57b699a3","title":"SWE-InfraBench: Evaluating language models on cloud infrastructure code","url":"https://www.amazon.science/publications/swe-infrabench-evaluating-language-models-on-cloud-infrastructure-code","published":"2025","authors":["Natalia Tarasova","Enrique Balp","Aleksei Iancheruk","Yevhenii Sielskyi","Nikita Kozodoi","Liam Byrne","Jack Butler","Dayuan Jiang","Marcin Czelej","Andrew Ang","Yash Shah","Roi Blanco"],"abstract":"Building infrastructure-as-code (IaC) in cloud computing is a critical task, underpinning the reliability, scalability, and security of modern software systems. Despite the remarkable progress of large language models (LLMs) in software engineering – demonstrated across many dedicated benchmarks – their capabilities in developing IaC remain underexplored. 
Unlike existing IaC benchmarks that predominantly Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:c1fbfd06eed9f183","title":"SQLENS: An end-to-end framework for error detection and correction in text-to-SQL","url":"https://www.amazon.science/publications/sqlens-an-end-to-end-framework-for-error-detection-and-correction-in-text-to-sql","published":"2025","authors":["Yue Gong","Chuan Lei","Xiao Qin","Kapil Eknath Vaidya","Balakrishnan (Murali) Narayanaswamy","Tim Kraska"],"abstract":"Text-to-SQL systems translate natural language (NL) questions into SQL queries, enabling non-technical users to interact with structured data. While large language models (LLMs) have shown promising results on the text-to-SQL task, they often produce semantically incorrect yet syntactically valid queries, with limited insight into their reliability. 
We propose SQLENS, an end-to-end framework for fine-grained Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:4f7d0823a883fa77","title":"SPIE Proceedings: Transformer-based image captioning as a framework for defense applications","url":"https://www.amazon.science/publications/spie-proceedings-transformer-based-image-captioning-as-a-framework-for-defense-applications","published":"2025","authors":["Devin J Ullerick","Dzmitry Kasinets","Jayeeta Ghosh","Dilshad Raihan Akkam Veettil","Amir K. Saeed","Benjamin A. Johnson","Benjamin M. Rodriguez"],"abstract":"Transformer models have revolutionized the field of image captioning, offering advanced capabilities through self attention mechanisms that capture intricate visual and textual relationships. This paper presents an innovative approach to applying transformer models for image captioning. Current State-of-the-Art (SOTA) performance has only been achieved by large vision-language models (LVLMs). 
Our approach Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:b60ff0e9d58420b3","title":"SIFT-50M: A large-scale multilingual dataset for speech instruction fine-tuning","url":"https://www.amazon.science/publications/sift-50m-a-large-scale-multilingual-dataset-for-speech-instruction-fine-tuning","published":"2025","authors":["Prabhat Pandey","Rupak Vignesh Swaminathan","K V Vijay Girish","Arunasish Sen","Jian Xie","Grant Strimel","Andreas Schwarz"],"abstract":"We introduce SIFT (Speech Instruction FineTuning), a 50M-example dataset designed for instruction fine-tuning and pre-training of speech-text large language models (LLMs). SIFT-50M is built from publicly available speech corpora, which collectively contain 14K hours of speech, and leverages LLMs along with off-the-shelf expert models. 
The dataset spans five languages, encompassing a diverse range of speech Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:99f5327f37670df7","title":"SGBD: Sharpness-aware mirror gradient with BLIP-based denoising for robust multimodal product recommendation","url":"https://www.amazon.science/publications/sgbd-sharpness-aware-mirror-gradient-with-blip-based-denoising-for-robust-multimodal-product-recommendation","published":"2025","authors":["Sarthak Srivastava","Kathy Wu"],"abstract":"Multimodal recommender systems leverage diverse information, to model user preferences and item features, helping users discover relevant products. Integrating multimodal data can mitigate challenges like data sparsity and cold-start, but also introduces risks such as information adjustment and inherent noise, posing robustness challenges. 
In this paper, we analyze multimodal recommenders from the perspective Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/iccvw69036.2025.00248","openalex_id":"https://openalex.org/W7131146433","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=28"}},{"id":"official:92a531c10b6b3fb8","title":"SEAL: Speaker error correction using acoustic-conditioned large language models","url":"https://www.amazon.science/publications/seal-speaker-error-correction-using-acoustic-conditioned-large-language-models","published":"2025","authors":["Anurag Kumar","Rohit Paturi","Amber Afshan","Sundararajan Srinivasan"],"abstract":"Speaker Diarization (SD) is a crucial component of modern end-to-end ASR pipelines. Traditional SD systems, which are typically audio-based and operate independently of ASR, often introduce speaker errors, particularly during speaker transitions and overlapping speech. 
Recently, language models including fine-tuned large language models (LLMs) have shown to be effective as a second-pass speaker error corrector Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=35"}},{"id":"official:c0501014cc34efc3","title":"Robust online inference using adaptive model switching","url":"https://www.amazon.science/publications/robust-online-inference-using-adaptive-model-switching","published":"2025","authors":["Kalpan Mukherjee","Vikramank Singh","Abishek Sankararaman","Murali Narayanaswamy","Tim Kraska"],"abstract":"It is well known that Large language models (LLMs) have good zero-shot and few-shot performance which makes them a promising candidate for inference when no or few training samples are available. However, when there is abundant task data, small custom trained models perform as well or are superior in performance to pre-trained LLMs, even after accounting for in-context examples. 
Further, smaller models Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:78c27904cc678d05","title":"Rethinking MCQ benchmarks: Mandatory reasoning evaluation reveals significant performance drops in large language models","url":"https://www.amazon.science/publications/rethinking-mcq-benchmarks-mandatory-reasoning-evaluation-reveals-significant-performance-drops-in-large-language-models","published":"2025","authors":["Yue Zhang","Nhan Nguyen"],"abstract":"Rigorous evaluation of Large Language Models (LLMs) is critical for their adoption in high-stakes applications, particularly in highly technical domains that require deep expertise and specialized training. The proliferation of LLMs from various providers further underscores the need for comprehensive model performance benchmarking. 
Like many standardized tests and certification exams, several prominent Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:598e665870eb8d4d","title":"RedTWIZ: Diverse LLM red teaming via adaptive attack planning","url":"https://www.amazon.science/nova-ai-challenge/proceedings/redtwiz-diverse-llm-red-teaming-via-adaptive-attack-planning","published":"2025","authors":["NOVA University Lisbon"],"abstract":"This paper presents the vision, scientific contributions, and technical details of RedTWIZ: an adaptive and diverse multi-turn red teaming framework, to audit the robustness of Large Language Models (LLMs) in AI-assisted software development. 
Our work is driven by three major research streams: (1) robust and systematic assessment of LLM conversational jailbreaks; (2) a diverse generative multi-turn attack","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:7cccf9dcd37c7b12","title":"ProxSparse: Regularized learning of semi-structured sparsity masks for pretrained LLMs","url":"https://www.amazon.science/publications/proxsparse-regularized-learning-of-semi-structured-sparsity-masks-for-pretrained-llms","published":"2025","authors":["Hongyi Liu","Rajarshi Saha","Zhen Jia","Youngsuk Park","Jiaji Huang","Shoham Sabach","Yu-Xiang Wang","George Karypis"],"abstract":"Large Language Models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. 
Semistructured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:37851d10dbd616e7","title":"Protein structure tokenization: Benchmarking and new recipe","url":"https://www.amazon.science/publications/protein-structure-tokenization-benchmarking-and-new-recipe","published":"2025","authors":["Xinyu Yuan","Zichen Wang","Marcus Collins","Huzefa Rangwala"],"abstract":"Recent years have witnessed a surge in the development of protein structural tokenization methods, which chunk protein 3D structures into discrete or continuous representations. Structure tokenization enables the direct application of powerful techniques like language modeling for protein structures, and large multimodal models to integrate structures with protein sequences and functional texts. 
Despite Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=21"}},{"id":"official:cb62400cdefc73e4","title":"Plan-and-write: Structure-guided length control for LLMs without model retraining","url":"https://www.amazon.science/publications/plan-and-write-structure-guided-length-control-for-llms-without-model-retraining","published":"2025","authors":["Wale Akinfaderin","Shreyas Subramanian","Akarsha Sehwag"],"abstract":"Length control in Large Language Models (LLMs) is a crucial but under-addressed challenge, with applications ranging from voice interfaces requiring concise responses to research summaries needing comprehensive outputs. 
Current approaches to length control, including Regularized DPO, Length-Instruction Fine-Tuning, and tool-augmented methods, typically require expensive model retraining or complex inference-time Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:0c92ebef38dd19fc","title":"Parakeet: Emission factor recommendation for life cycle assessments with generative AI","url":"https://www.amazon.science/publications/parakeet-emission-factor-recommendation-for-carbon-footprinting-with-generative-ai","published":"2025","authors":["Bharathan Balaji","Fahimeh Ebrahimi","Nina Domingo","Gargeya Vunnava","Abu-Zaher Faridee","Soma Ramalingam","Shikha Gupta","Anran Wang","Harsh Gupta","Domenic Belcastro","Kellen Axten","Jeremie Hakian"],"abstract":"Accurately quantifying greenhouse gas (GHG) emissions is crucial for organizations to measure and mitigate their environmental impact. Life cycle assessment (LCA) estimates the environmental impacts throughout a product’s entire lifecycle, from raw material extraction to end-of-life. 
Measuring the emissions outside of a product owner’s control is challenging, and practitioners rely on emission factors ( Category: Sustainability","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Sustainability"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:c2ee208c0ad79584","title":"PIXELS: Progressive image xemplar-based editing with latent surgery","url":"https://www.amazon.science/publications/pixels-progressive-image-xemplar-based-editing-with-latent-surgery","published":"2025","authors":["Shristi Das Biswas","Matthew Shreve","Xuelu Li","Prateek Singhal","Kaushik Roychoudhury (Roy)"],"abstract":"Recent advancements in language-guided diffusion models for image editing are often bottle-necked by cumbersome prompt engineering to precisely articulate desired changes. An intuitive alternative calls on guidance from in-the-wild image exemplars to help users bring their imagined edits to life. 
Contemporary exemplar-based editing methods shy away from leveraging the rich latent space learnt by pre-existing Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1609/aaai.v39i3.32270","openalex_id":"https://openalex.org/W4409370109","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Purdue University West Lafayette"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:8299083eeaad2108","title":"OrchDAG: Complex tool orchestration in multi-turn interactions with plan DAGs","url":"https://www.amazon.science/publications/orchdag-complex-tool-orchestration-in-multi-turn-interactions-with-plan-dags","published":"2025","authors":["Yifu Lu","Shengjie Liu","Li Dong"],"abstract":"Agentic tool use has gained traction with the rise of agentic tool calling, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic data generation pipeline that models tool execution as directed acyclic graphs (DAGs) with controllable complexity. 
Using this dataset, we benchmark model performance and propose a graph-based reward to enhance RLVR Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:90c871af500d1e7e","title":"On localizing and deleting toxic memories in large language models","url":"https://www.amazon.science/publications/on-localizing-and-deleting-toxic-memories-in-large-language-models","published":"2025","authors":["Anubrata Das","Manoj Kumar","Ninareh Mehrabi","Anil Ramakrishna","Anna Rumshisky","Kai-Wei Chang","Aram Galstyan","Morteza Ziyadi","Rahul Gupta"],"abstract":"Ensuring that large language models (LLMs) do not generate harmful text is critical for their safe deployment. A common failure mode involves producing toxic responses to otherwise innocuous prompts. While various detoxification methods have been proposed, the underlying mechanisms that drive toxic generation in LLMs are not yet fully understood. 
Our work aims to provide a mechanistic understanding of toxic Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=33"}},{"id":"official:9c88f923bc217c51","title":"MetaSynth: Meta–prompting–driven agentic scaffolds for diverse synthetic data generation","url":"https://www.amazon.science/publications/metasynth-meta-prompting-driven-agentic-scaffolds-for-diverse-synthetic-data-generation","published":"2025","authors":["Haris Riaz","Sourav Bhabesh","Vinayak Arannil","Miguel Ballesteros","Graham Horwood"],"abstract":"Recent smaller language models such Phi-3.5 and Phi-4 rely on synthetic data generated using larger Language models. Questions remain about leveraging synthetic data for other use cases, such as adapting LLMs to specific domains. A key limitation of synthetic data is low diversity, which negatively impacts its downstream applicability for improving other models. 
To address this, we propose MetaSynth, a Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:9bd3c52789058383","title":"MergeME: Model merging techniques for homogeneous and heterogeneous MoEs","url":"https://www.amazon.science/publications/mergeme-model-merging-techniques-for-homogeneous-and-heterogeneous-moes","published":"2025","authors":["Yuhang Zhou","Giannis Karamanolakis","Victor Soto","Anna Rumshisky","Mayank Kulkarni","Furong Huang","Wei Ai","Jianhua Lu"],"abstract":"The recent success of specialized Large Language Models (LLMs) in domains such as mathematical reasoning and coding has led to growing interest in methods for merging these expert LLMs into a unified Mixture-of-Experts (MoE) model, with the goal of enhancing performance in each domain while retaining effectiveness on general tasks. 
However, the effective merging of expert models remains an open challenge Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=28"}},{"id":"official:5ba725ad2f38b17a","title":"MURPHY: Reflective multi-turn reinforcement learning for self-correcting code generation in large language models","url":"https://www.amazon.science/publications/murphy-reflective-multi-turn-reinforcement-learning-for-self-correcting-code-generation-in-large-language-models","published":"2025","authors":["Chanakya Ekbote","Vijay Lingam","Behrooz Omidvar-Tehrani","Luke Huan","Sujay Sanghavi","Anoop Deoras","Stefano Soatto"],"abstract":"Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful framework for enhancing the reasoning capabilities of large language models (LLMs). However, existing approaches such as Group Relative Policy Optimization (GRPO) and its variants, while effective on reasoning benchmarks, struggle with agentic tasks that require iterative decision-making and refinement. 
We introduce MURPHY, Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=12"}},{"id":"official:649bf83c4d02564a","title":"MITRA: Mixed synthetic priors for enhancing tabular foundation models","url":"https://www.amazon.science/publications/mitra-mixed-synthetic-priors-for-enhancing-tabular-foundation-models","published":"2025","authors":["Xiyuan Zhang","Danielle Maddix Robinson","Junming Yin","Nick Erickson","Abdul Fatir Ansari","Boran Han","Shuai Zhang","Leman Akoglu","Michael Mahoney","Cuixiong Hu","Huzefa Rangwala","George Karypis"],"abstract":"Since the seminal work of TabPFN, research on tabular foundation models (TFMs) based on in-context learning (ICL) has challenged long-standing paradigms in machine learning. Without seeing any real-world data, models pretrained on purely synthetic datasets generalize remarkably well across diverse datasets, often using only a moderate number of in-context examples. 
This shifts the focus in tabular machine Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:ab2a158fc0869778","title":"MDSEval: A meta-evaluation benchmark for multimodal dialogue summarization","url":"https://www.amazon.science/publications/mdseval-a-meta-evaluation-benchmark-for-multimodal-dialogue-summarization","published":"2025","authors":["Yinhong Liu","Jianfeng He","Hang Su","Ruixue Lian","Kevin Nian","Jake Vincent","Srikanth Vishnubhotla","Robinson Piramuthu","Saab Mansour"],"abstract":"Multimodal Dialogue Summarization (MDS) is a critical task with wide-ranging applications. To support the development of effective MDS models, robust automatic evaluation methods are essential for reducing both cost and human effort. However, such methods require a strong meta-evaluation benchmark grounded in human annotations. 
In this work, we introduce MDSEval, the first meta-evaluation benchmark for Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:8dedd83014854632","title":"LibEvolutionEval: A benchmark and study for version-specific code generation","url":"https://www.amazon.science/publications/libevolutioneval-a-benchmark-and-study-for-version-specific-code-generation","published":"2025","authors":["Sachit Kuhar","Wasi Ahmad","Zijian Wang","Nihal Jain","Haifeng Qian","Baishakhi Ray","Murali Krishna Ramanathan","Xiaofei Ma","Anoop Deoras"],"abstract":"Recent advancements in code completion models have primarily focused on local file contexts (Ding et al., 2023b; Jimenez et al., 2024). However, these studies do not fully capture the complexity of real-world software development, which often requires the use of rapidly evolving public libraries. 
To fill the gap, we introduce LIBEVOLUTIONEVAL, a detailed study requiring an understanding of library evolution Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:2fc7d3fa8a429a71","title":"Learning to reason over time: Timeline self-reflection for improved temporal reasoning in language models","url":"https://www.amazon.science/publications/learning-to-reason-over-time-timeline-self-reflection-for-improved-temporal-reasoning-in-language-models","published":"2025","authors":["Adrian Bazaga","Rexhina Blloshmi","Bill Byrne","Adrià de Gispert"],"abstract":"Large Language Models (LLMs) have emerged as powerful tools for generating coherent text, understanding context, and performing reasoning tasks. However, they struggle with temporal reasoning, which requires processing time-related information such as event sequencing, durations, and inter-temporal relationships. 
These capabilities are critical for applications including question answering, scheduling, Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=10"}},{"id":"official:f4e0706bdc04b9a0","title":"Learning rich speech representations with acoustic-semantic factorization","url":"https://www.amazon.science/publications/learning-rich-speech-representations-with-acoustic-semantic-factorization","published":"2025","authors":["Sandy Niu","Najmeh Sadoughi","Abhishek Yanamandra","Pichao Wang","Zhu Liu","Vimal Bhat","Liz Norred"],"abstract":"Self-supervised pretraining has transformed speech representation learning, enabling models to generalize across various downstream tasks. However, empirical studies have highlighted two notable gaps. First, different speech tasks require varying levels of acoustic and semantic information, which are encoded at different layers within the model. 
This adds the extra complexity of layer selection on downstream Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp49660.2025.10889923","openalex_id":"https://openalex.org/W4408352263","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=35"}},{"id":"official:5834fdaa7e348712","title":"LatteCLIP: Unsupervised CLIP fine-tuning via LMM-synthetic texts","url":"https://www.amazon.science/publications/latteclip-unsupervised-clip-fine-tuning-via-lmm-synthetic-texts","published":"2025","authors":["Anh Quan Cao","Maximilian Jaritz","Matthieu Guillaumin","Raoul de Charette","Loris Bazzani"],"abstract":"Large-scale vision-language pre-trained (VLP) models (e.g., CLIP [46]) are renowned for their versatility, as they can be applied to diverse applications in a zero-shot setup. However, when these models are used in specific domains, their performance often falls short due to domain gaps or the under-representation of these domains in the training data. 
While fine-tuning VLP models on custom datasets with Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=37"}},{"id":"official:dfb7ebbe4d7a4ff1","title":"Latent diffusion shield - Mitigating malicious use of diffusion models through latent space adversarial perturbations","url":"https://www.amazon.science/publications/latent-diffusion-shield-mitigating-malicious-use-of-diffusion-models-through-latent-space-adversarial-perturbations","published":"2025","authors":["Huy Phan","Boshi Huang","Ekraam Sabir","Prateek Singhal","Bo Yuan"],"abstract":"Diffusion models have revolutionized the landscape of generative AI, particularly in the application of text-to-image generation. 
However, their powerful capability of generating high-fidelity images raises significant security concerns on the malicious use of the state-of-the-art (SOTA) text-to-image diffusion models, notably the risks of misusing personal photos and copyright infringement through the Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:65189d737d2f6ac6","title":"LOFTI: Localization and factuality transfer to Indian locales","url":"https://www.amazon.science/publications/lofti-localization-and-factuality-transfer-to-indian-locales","published":"2025","authors":["Sona Elza Simon","Soumen Kumar Mondal","Abhishek Singhania","Sayambhu Sen","Preethi Jyothi"],"abstract":"Large language models (LLMs) encode vast amounts of world knowledge acquired via training on large web-scale datasets crawled from the internet. However, the datasets used to train the LLMs typically exhibit a geographical bias towards English-speaking Western countries. 
This results in LLMs producing biased or hallucinated responses to queries that require answers localized to other geographical regions Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48550/arxiv.2407.11833","openalex_id":"https://openalex.org/W4403780077","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United States)","Indian Institute of Technology Bombay"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:4239dd3381be2d29","title":"Incorporating diverse perspectives in cultural alignment: Survey of evaluation benchmarks through a three-dimensional framework","url":"https://www.amazon.science/publications/incorporating-diverse-perspectives-in-cultural-alignment-survey-of-evaluation-benchmarks-through-a-three-dimensional-framework","published":"2025","authors":["Gregory Wu","Si-Chi Chin","Tess Wood","Ayush Goyal","Narayanan Sadagopan"],"abstract":"Large Language Models (LLMs) increasingly serve diverse global audiences, making it critical for responsible AI deployment across cultures. While recent works have proposed various approaches to enhance cultural alignment in LLMs, a systematic analysis of their evaluation benchmarks remains needed. 
We propose a novel framework that conceptualizes alignment along three dimensions: Cultural Group (who to Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:3bd8bf4daab3cbdb","title":"In-context learning for addressing user cold-start in sequential movie recommenders","url":"https://www.amazon.science/publications/in-context-learning-for-addressing-user-cold-start-in-sequential-movie-recommenders","published":"2025","authors":["Jason Liang","Vu Nguyen","Vuong Le","Paul Albert","Julien Monteil"],"abstract":"The user cold-start problem remains a fundamental challenge for sequential recommender systems, particularly in large-scale video streaming services where a substantial portion of users have limited or no historical interaction data. 
In this work, we formulate an attempt at solving this issue by proposing a framework that leverages Large Language Models (LLMs) to enrich interaction histories using user Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:e949ba86147f05f9","title":"Improving faithfulness of text-to-image diffusion models through inference intervention","url":"https://www.amazon.science/publications/improving-faithfulness-of-text-to-image-diffusion-models-through-inference-intervention","published":"2025","authors":["Danfeng Guo","Sanchit Agarwal","Yu-Hsiang Lin","Jiun-Yu Kao","Tagyoung Chung","Nanyun Peng","Mohit Bansal"],"abstract":"Text-to-Image diffusion models have shown remarkable capabilities in generating high-quality images. However, current models often struggle to adhere to the complete set of conditions specified in the input text and return unfaithful generations. 
Existing works address this problem by either fine-tuning the base model or modifying the latent representations during the inference stage with gradient-based Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/wacv61041.2025.00401","openalex_id":"https://openalex.org/W4409262280","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","University of California, Los Angeles","University of North Carolina at Chapel Hill"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=37"}},{"id":"official:396e8be68a8677b9","title":"IMPACT: Iterative mask-based parallel decoding for text-to-audio generation with diffusion modeling","url":"https://www.amazon.science/publications/impact-iterative-mask-based-parallel-decoding-for-text-to-audio-generation-with-diffusion-modeling","published":"2025","authors":["Kuan Po Huang","Shu-wen Yang","Huy Phan","Bo-Ru (Roy) Lu","Byeonggeun Kim","Sashank Macha","Qingming Tang","Shalini Ghosh","Hung-yi Lee","Chieh-Chi Kao","Chao Wang"],"abstract":"Text-to-audio generation synthesizes realistic sounds or music given a natural language prompt. Diffusion-based frameworks, including the Tango and the AudioLDM series, represent the state-of-the-art in text-to-audio generation. Despite achieving high audio fidelity, they incur significant inference latency due to the slow diffusion sampling process. 
MAGNET, a mask-based model operating on discrete tokens Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=22"}},{"id":"official:97fa5dd1b1ef6c96","title":"IHEval: Evaluating language models on following the instruction hierarchy","url":"https://www.amazon.science/publications/iheval-evaluating-language-models-on-following-the-instruction-hierarchy","published":"2025","authors":["Zhihan Zhang","Shiyang Li","Zixuan Zhang","Xin Liu","Haoming Jiang","Xianfeng Tang","Yifan Gao","Zheng Li","Haodong Wang","Zhaoxuan Tan","Yichuan Li","Qingyu Yin"],"abstract":"The instruction hierarchy, which establishes a priority order from system messages to user messages, conversation history, and tool outputs, is essential for ensuring consistent and safe behavior in language models (LMs). Despite its importance, this topic receives limited attention, and there is a lack of comprehensive benchmarks for evaluating models’ ability to follow the instruction hierarchy. 
We bridge Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:c9a15515cab08afd","title":"Human-in-the-loop runbook improvement with agentic support automation.","url":"https://www.amazon.science/publications/human-in-the-loop-runbook-improvement-with-agentic-support-automation","published":"2025","authors":["Rocker D'Antonio","Harry Xie"],"abstract":"Operational support is an important component of production software services. Support requests are often emergent and can come in many forms such as customer escalations or unplanned service interruption. Engineers across organizations have been successful in implementing automation to help streamline support processes but many solutions remain in the hands of human operators. 
A successful strategy to Category: Operations research and optimization","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Operations research and optimization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=24"}},{"id":"official:fc4d7617bbbeef65","title":"How to talk to language models: Serialization strategies for structured entity matching","url":"https://www.amazon.science/publications/how-to-talk-to-language-models-serialization-strategies-for-structured-entity-matching","published":"2025","authors":["Haoteng Yin","Jinha Kim","Prashant Mathur","Krishanu Sarker","Vidit Bansal"],"abstract":"Entity matching (EM), which identifies whether two data records refer to the same real-world entity, is crucial for knowledge base construction and enhancing data-driven AI systems. Recent advances in language models (LMs) have shown great potential in resolving entities with rich textual attributes. 
However, their performance heavily depends on how structured entities are \"talked\" through serialized text Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:f3ac63d716db25ac","title":"HopNet: Harmonizing object placement network for realistic image generation via object composition","url":"https://www.amazon.science/publications/hopnet-harmonizing-object-placement-network-for-realistic-image-generation-via-object-composition","published":"2025","authors":["Matthew Poska","Sharon X. Huang","Bin Hwang"],"abstract":"Realistic image generation is an increasingly desired, but deceptively complicated computer vision task, especially when a specific object is required. Whether generating product advertisements or building novel datasets, object composition for realistic image generation depends on realistic object placements as well as believable object harmonization. 
To address this task, we introduce HopNet, the first Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvprw67362.2025.00630","openalex_id":"https://openalex.org/W4414197775","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Pennsylvania State University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=27"}},{"id":"official:2d28b73d5ddf6324","title":"Group-aware reinforcement learning for output diversity in large language models","url":"https://www.amazon.science/publications/group-aware-reinforcement-learning-for-output-diversity-in-large-language-models","published":"2025","authors":["Oron Anschel","Alon Shoshan","Adam Botach","Shunit Haviv Hakimi","Asaf Gendler","Emanuel Ben Baruch","Nadav Bhonker","Igor Kviatkovsky","Manoj Aggarwal","Gérard Medioni"],"abstract":"Large Language Models (LLMs) often suffer from mode collapse, repeatedly generating the same few completions even when many valid answers exist, limiting their diversity across a wide range of tasks. We introduce Group-Aware Policy Optimization (GAPO), a simple extension of the recent and popular Group Relative Policy Optimization (GRPO) that computes rewards over the group as a whole. 
GAPO enables learning Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:7e6fb63a39b531b3","title":"Group relative policy optimization for speech recognition","url":"https://www.amazon.science/publications/group-relative-policy-optimization-for-speech-recognition","published":"2025","authors":["Prashanth Gurunath Shivakumar","Yile Gu","Ankur Gandhe","Ivan Bulyko"],"abstract":"Speech Recognition has seen a dramatic shift towards adopting Large Language Models (LLMs). This shift is partly driven by good scalability properties demonstrated by LLMs, ability to leverage large amounts of labelled, unlabelled speech and text data, streaming capabilities with autoregressive framework and multi-tasking with instruction following characteristics of LLMs. 
However, simple next-token prediction Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/asru65441.2025.11434657","openalex_id":"https://openalex.org/W7148344176","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=14"}},{"id":"official:e8a3a22171539a8d","title":"Griffin: Towards a graph-centric relational database foundation model","url":"https://www.amazon.science/publications/griffin-towards-a-graph-centric-relational-database-foundation-model","published":"2025","authors":["Yanbo Wang","Xiyuan Wang","Quan Gan","Minjie Wang","Qibin Yang","David Paul Wipf","Muhan Zhang"],"abstract":"We introduce Griffin, the first foundation model attemptation designed specifically for Relational Databases (RDBs). Unlike previous smaller models focused on single RDB tasks, Griffin unifies the data encoder and task decoder to handle diverse tasks. Additionally, we enhance the architecture by incorporating a cross-attention module and a novel aggregator. 
Griffin utilizes pretraining on both single-table Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:16b424a725732a49","title":"GaRAGe: A benchmark with grounding annotations for RAG evaluation","url":"https://www.amazon.science/publications/garage-a-benchmark-with-grounding-annotations-for-rag-evaluation","published":"2025","authors":["Ionut Teodor Sorodoc","Leonardo Ribeiro","Rexhina Blloshmi","Christopher Davis","Adrià de Gispert"],"abstract":"We present GaRAGe, a large RAG benchmark with human-curated long-form answers and annotations of each grounding passage, allowing a fine-grained evaluation of whether LLMs can identify relevant grounding when generating RAG answers. 
Our benchmark contains 2366 questions of diverse complexity, dynamism, and topics, and includes over 35K annotated passages retrieved from both private document sets and the Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=17"}},{"id":"official:e146a89b34e54623","title":"Ensembles of low-rank expert adapters","url":"https://www.amazon.science/publications/ensembles-of-low-rank-expert-adapters","published":"2025","authors":["Yinghao Li","Vianne Gao","Chao Zhang","Ali Torkamani"],"abstract":"The training and fine-tuning of large language models (LLMs) often involve diverse textual data from multiple sources, which poses challenges due to conflicting gradient directions, hindering optimization and specialization. These challenges can undermine model generalization across tasks, resulting in reduced downstream performance. 
Recent research suggests that fine-tuning LLMs on carefully selected, Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:16c13e2cbe83113d","title":"Enhancing foundation models for time series forecasting via Wavelet-based tokenization","url":"https://www.amazon.science/publications/enhancing-foundation-models-for-time-series-forecasting-via-wavelet-based-tokenization","published":"2025","authors":["Luca Masserano","Abdul Fatir Ansari","Boran Han","Xiyuan Zhang","Christos Faloutsos","Michael Mahoney","Andrew Wilson","Youngsuk Park","Syama Rangapuram","Danielle Maddix Robinson","Yuyang (Bernie) Wang"],"abstract":"How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? 
To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time localized frequencies Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=21"}},{"id":"official:8c0966d2c8162268","title":"Efficiently generating correlated sample paths from multi-step time series foundation models","url":"https://www.amazon.science/publications/efficiently-generating-correlated-sample-paths-from-multi-step-time-series-foundation-models","published":"2025","authors":["Ethan Baron","Boris Oreshkin","Ruijun Ma","Hanyu Zhang","Kari Torkkola","Michael Mahoney","Andrew Gordon Wilson","Tatiana Konstantinova"],"abstract":"Many time series applications require access to multi-step forecast trajectories in the form of sample paths. Recently, time series foundation models have leveraged multi-step lookahead predictions to improve the quality and efficiency of multi-step forecasts. However, these models only predict independent marginal distributions for each time step, rather than a full joint predictive distribution. 
To generate Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=9"}},{"id":"official:8a84607fcaf94929","title":"Effective product schema matching and duplicate detection with large language models","url":"https://www.amazon.science/publications/effective-product-schema-matching-and-duplicate-detection-with-large-language-models","published":"2025","authors":["Andrea Iovine","Yunhan Huang","Melvin Monteiro","Mohamed Yakout","Sedat Gokalp"],"abstract":"Building and maintaining a rich and high-quality product schema helps customers of an e-commerce service find products based on the characteristics they desire. As the quantity of products sold on the service increases, so does the complexity of maintaining the schema. Expanding it requires finding gaps, designing new product attributes, and ensuring that they do not already exist in the schema. 
In this Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=17"}},{"id":"official:f2479cd445d5747e","title":"ELF-Gym: Evaluating large language models generated features for tabular prediction","url":"https://www.amazon.science/publications/elf-gym-evaluating-large-language-models-generated-features-for-tabular-prediction","published":"2025","authors":["Yanlin Zhang","Ning Li","Quan Gan","Weinan Zhang","David Paul Wipf","Minjie Wang"],"abstract":"Crafting effective features is a crucial yet labor-intensive and domain-specific task within machine learning pipelines. Fortunately, recent advancements in Large Language Models (LLMs) have shown promise in automating various data science tasks, including feature engineering. 
But despite this potential, evaluations thus far are primarily based on the end performance of a complete ML pipeline, providing Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=30"}},{"id":"official:38f23fa983bb4709","title":"Disentangling biased knowledge from reasoning in large language models via machine unlearning","url":"https://www.amazon.science/publications/disentangling-biased-knowledge-from-reasoning-in-large-language-models-via-machine-unlearning","published":"2025","authors":["Zheyuan Liu","Suraj Maharjan","Fanyou Wu","Rahil Parikh","Belhassen Bayar","Srinivasan Sengamedu","\"SHS\"","Meng Jiang"],"abstract":"The rapid development of Large Language Models (LLMs) has led to their widespread adoption across various domains, leveraging vast pre-training knowledge and impressive generalization capabilities. However, these models often inherit biased knowledge, resulting in unfair decisions in sensitive applications. 
It is challenging to remove this biased knowledge without compromising reasoning abilities due to Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=23"}},{"id":"official:7a560457532b3ee1","title":"Detect, disambiguate, and translate: on-demand visual reasoning for multimodal machine translation with large vision-language models","url":"https://www.amazon.science/publications/detect-disambiguate-and-translate-on-demand-visual-reasoning-for-multimodal-machine-translation-with-large-vision-language-models","published":"2025","authors":["Danyang Liu","Fanjie Kong","Xiaohang Sun","Dhruva Patil","Avijit Vajpayee","Zhu Liu","Vimal Bhat","Najmeh Sadoughi"],"abstract":"Multimodal machine translation (MMT) aims to leverage additional modalities to assist in language translation. With limited parallel data, current MMT systems rely heavily on monolingual English captioning data. 
These systems face three key issues: they often overlook that visual signals are unnecessary in many cases, they lack transparency in how visual information is used for disambiguation when needed Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:fda17dae5a15f75b","title":"Details matter for indoor open-vocabulary 3D instance segmentation","url":"https://www.amazon.science/publications/details-matter-for-indoor-open-vocabulary-3d-instance-segmentation","published":"2025","authors":["Sanghun Jung","Jingjing Zheng","Ke Zhang","Nan Qiao","Albert Chen","Lu Xia","Chi Liu","Yuyin Sun","Xiao Zeng","Hsiang-Wei Huang","Byron Boots","Min Sun"],"abstract":"Unlike closed-vocabulary 3D instance segmentation that is often trained end-to-end, open-vocabulary 3D instance segmentation (OV-3DIS) often leverages vision-language models (VLMs) to generate 3D instance proposals and classify them. While various concepts have been proposed from existing research, we observe that these individual concepts are not mutually exclusive but complementary. 
In this paper, we Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:8d515817cc8c0096","title":"Demonstrating multi-suction item picking at scale via multi-modal learning of pick success","url":"https://www.amazon.science/publications/demonstrating-multi-suction-item-picking-at-scale-via-multi-modal-learning-of-pick-success","published":"2025","authors":["Che Wang","Jeroen Vanbaar","Chaitanya Mitash","Shuai Li","Dylan Randle","Weiyao Wang","Sumedh Sontakke","Kostas Bekris","Kapil Katyal"],"abstract":"This work demonstrates how autonomously learning aspects of robotic operation from sparsely-labeled, real-world data of deployed, engineered solutions at industrial scale can provide with solutions that achieve improved performance. 
Specifically, it focuses on multi-suction robot picking and performs a comprehensive study on the application of multi-modal visual encoders for predicting the success of candidate Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:da1c655003210bb5","title":"Cybernaut: Towards reliable web automation","url":"https://www.amazon.science/publications/cybernaut-towards-reliable-web-automation","published":"2025","authors":["Ankur Tomar","Hengyue Liang","Indranil Bhattacharya","Natalia Larios Delgado","Francesco Carbone"],"abstract":"The emergence of AI-driven web automation through Large Language Models (LLMs) offers unprecedented opportunities for optimizing digital workflows. 
However, deploying such systems within industry's real-world environments presents four core challenges: (1) ensuring consistent execution, (2) accurately identifying critical HTML elements, (3) meeting human-like accuracy in order to automate operations at Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:5727186d08c0012f","title":"CriSPO: Multi-aspect critique-suggestion-guided automatic prompt optimization for text generation","url":"https://www.amazon.science/publications/crispo-multi-aspect-critique-suggestion-guided-automatic-prompt-optimization-for-text-generation","published":"2025","authors":["Han He","Flora Liu","Lei Xu","Chaitanya Shivade","Yi Zhang","Sundararajan Srinivasan","Katrin Kirchhoff"],"abstract":"Existing automatic prompt engineering methods are typically designed for discriminative tasks, where new task prompts are iteratively refined with limited feedback from a single metric reflecting a single aspect. 
However, these approaches are suboptimal for generative tasks, which require more nuanced guidance beyond a single numeric metric to improve the prompt and optimize multiple aspects of the generated Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:382bb3fbb5554220","title":"Controllable conversational theme detection track at DSTC 12","url":"https://www.amazon.science/publications/controllable-conversational-theme-detection-track-at-dstc-12","published":"2025","authors":["Igor Shalyminov","Hang Su","Jake Vincent","Siffi Singh","Jason Cai","James Gung","Raphael Shu","Saab Mansour"],"abstract":"Conversational analytics has been on the forefront of transformation driven by the advances in Speech and Natural Language Processing techniques. Rapid adoption of Large Language Models (LLMs) in the analytics field has taken the problems that can be automated to a new level of complexity and scale. 
In this paper, we introduce Theme Detection as a critical task in conversational analytics, aimed at automatically Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=17"}},{"id":"official:d8b9bc6f58301b74","title":"Context-driven dynamic pruning for large speech foundation models","url":"https://www.amazon.science/publications/context-driven-dynamic-pruning-for-large-speech-foundation-models","published":"2025","authors":["Masao Someki","Shikhar Bharadwaj","Atharva Anand Joshi","Chyi-Jiunn Lin","Jinchuan Tian","Jee-weon Jung","Markus Müller","Nathan Susanj","Jing Liu","Shinji Watanabe"],"abstract":"Speech foundation models achieve strong generalization across languages and acoustic conditions, but require significant computational resources for inference. In the context of speech foundation models, pruning techniques have been studied that dynamically optimize model structures based on the target audio leveraging external context. 
In this work, we extend this line of research and propose context-driven Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=21"}},{"id":"official:c39a04a1b2c825c1","title":"Context-aware dynamic pruning for speech foundation models","url":"https://www.amazon.science/publications/context-aware-dynamic-pruning-for-speech-foundation-models","published":"2025","authors":["Masao Someki","Yifan Peng","Siddhant Arora","Shinji Watanabe","Markus Müller","Thanasis Mouchtaris","Grant Strimel","Jing Liu"],"abstract":"Foundation models, such as large language models, have achieved remarkable success in natural language processing and are evolving into models capable of handling multiple modalities. Listening ability, in particular, is crucial for many applications, leading to research on building speech foundation models. 
However, the high computational cost of these large models presents a significant challenge for Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:8b2f757a2c25a54a","title":"CodeAssistBench (CAB): Dataset & benchmarking for multi-turn chat-based code assistance","url":"https://www.amazon.science/publications/codeassistbench-cab-dataset-benchmarking-for-multi-turn-chat-based-code-assistance","published":"2025","authors":["Myeongsoo Kim","Shweta Garg","Baishakhi Ray","Varun Kumar","Anoop Deoras"],"abstract":"Programming assistants powered by large language models have transformed software development, yet most benchmarks focus narrowly on code generation tasks. 
Recent efforts like InfiBench and StackEval attempt to address this gap using Stack Overflow data but remain limited to single-turn interactions in isolated contexts, require significant manual curation, and fail to represent complete project environments Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:0e5242ec19562d24","title":"Chain-of-instructions: Compositional instruction tuning on large language models","url":"https://www.amazon.science/publications/chain-of-instructions-compositional-instruction-tuning-on-large-language-models","published":"2025","authors":["Shirley Hayati","Taehee Jung","Tristan Botersong","Sudipta Kar","Abhinav Sethy","Joo-Kyung Kim","Dongyeop Kang"],"abstract":"Fine-tuning large language models (LLMs) with a collection of large and diverse instructions has improved the model’s generalization to different tasks, even for unseen tasks. However, most existing instruction datasets include only single instructions, and they struggle to follow complex instructions composed of multiple subtasks. 
In this work, we propose a novel concept of compositional instructions called Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=36"}},{"id":"official:10f5550149f2bbba","title":"CSR-Bench: Benchmarking LLM agents in deployment of computer science research repositories","url":"https://www.amazon.science/publications/csr-bench-benchmarking-llm-agents-in-deployment-of-computer-science-research-repositories","published":"2025","authors":["YIJIA XIAO","Runhui Wang","Chris (Luyang) Kong","Davor Golac","Wei Wang"],"abstract":"The increasing complexity of computer science research projects demands more effective tools for deploying code repositories. Large Language Models (LLMs), such as Anthropic Claude and Meta Llama, have demonstrated significant advancements across various fields of computer science research, including the automation of diverse software engineering tasks. 
To evaluate the effectiveness of LLMs in handling","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=31"}},{"id":"official:1589fd692172e07d","title":"CONFETTI: Conversational function-calling evaluation through turn-level interactions","url":"https://www.amazon.science/publications/confetti-conversational-function-calling-evaluation-through-turn-level-interactions","published":"2025","authors":["Tamer Alkhouli","Katerina Margatina","James Gung","Raphael Shu","Claudia Zaghi","Monica Sunkara","Yi Zhang"],"abstract":"We introduce Conversational Function-Calling Evaluation Through Turn-Level Interactions (CONFETTI), a conversational benchmark designed to evaluate the function-calling capabilities and response quality of large language models (LLMs). Current benchmarks lack comprehensive assessment of LLMs in complex conversational scenarios. 
CONFETTI addresses this gap through 109 human-simulated conversations, comprising Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:731a29aa3262e17f","title":"Building more accountable multi-modal LLMs through spatially-informed visual reasoning","url":"https://www.amazon.science/publications/building-more-accountable-multi-modal-llms-through-spatially-informed-visual-reasoning","published":"2025","authors":["Jing Wu","Suiyao Chen","Sasha Gutfraind","Inseok Heo","Shengjie Liu","Chen Li","Jeremy Curuksu","Michael Sharps"],"abstract":"Recent research has demonstrated that debate mechanisms among Large Language Models (LLMs) show remarkable potential for enhancing reasoning capabilities and promoting responsible text generation. However, it remains an open question whether debate strategies can effectively generalize to Multi-Modal Large Language Models (MLLMs). 
In this paper, we address this challenge by proposing a location-aware debate Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=10"}},{"id":"official:66f1fd637db24b14","title":"Break-ideate-generate (BrIdGe): Moving beyond translations for localization using LLMs","url":"https://www.amazon.science/publications/break-ideate-generate-bridge-moving-beyond-translations-for-localization-using-llms","published":"2025","authors":["Swapnil Gupta","Lucas Pereira Carlini","Prateek Sircar","Deepak Gupta"],"abstract":"Language localization is the adaptation of written content to different linguistic and cultural contexts. Ability to localize written content is crucial for global businesses to provide consistent and reliable customer experience across diverse markets. 
Traditional methods have approached localization as an application of machine translation (MT), but localization requires more than linguistic conversion Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=25"}},{"id":"official:a60461b9914a11ad","title":"Beyond mere automation: A techno-functional framework for reimagining gen-AI in supply chain operations","url":"https://www.amazon.science/publications/beyond-mere-automation-a-techno-functional-framework-for-reimagining-gen-ai-in-supply-chain-operations","published":"2025","authors":["Sreyoshi Bhaduri","Pavan Mullapudi","Shannon Dietrich","Scott DeWaters","Raj Ratan","Brajesh Kashyap","Rajanikanth Mandava","Lu Guo","Hungjen Wang","Vykunth Ashok","Abhilasha Katariya","Rohit Malshe"],"abstract":"As Generative AI (Gen-AI) continues to evolve rapidly, its potential to transform supply chain operations remains largely unexplored. Narrowing in on retail supply chain, this paper presents a taxonomy diagram that categorizes trends in Gen-AI adoption across various functions thereby mapping current Gen-AI capabilities and identifying immediate opportunities and potential challenges. 
We identify several Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:6506170d73ea598b","title":"Benchmarking query-conditioned natural language inference","url":"https://www.amazon.science/publications/benchmarking-query-conditioned-natural-language-inference","published":"2025","authors":["Marc Canby","Xinchi Chen","Xing Niu","Jifan Chen","Bonan Min","Sergul Aydore","Vittorio Castelli"],"abstract":"The growing excitement around the ability of large language models (LLMs) to tackle various tasks has been tempered by their propensity for generating unsubstantiated information (hallucination) and by their inability to effectively handle inconsistent inputs. 
To detect such issues, we propose the novel task of Query-Conditioned Natural Language Inference (QC-NLI), where the goal is to determine the semantic Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=18"}},{"id":"official:1072b6202d75db25","title":"Automated knowledge bank construction for business intelligence LLMs","url":"https://www.amazon.science/publications/automated-knowledge-bank-construction-for-business-intelligence-llms","published":"2025","authors":["Joe Standerfer","Elisabeth Munger","Shayaan Naik"],"abstract":"This paper presents a novel approach to building automated knowledge banks for Generative Business Intelligence (GenBI) systems, enabling natural language interfaces to organizational data without specialized engineering expertise. 
We demonstrate how dashboard definitions can be transformed into knowledge repositories that bridge the semantic gap between Large Language Models (LLMs) and organization-specific Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Information and knowledge management"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:21913ed5b98e418b","title":"AutoKB: Automated creation of structured knowledge bases for domain-specific support","url":"https://www.amazon.science/publications/autokb-automated-creation-of-structured-knowledge-bases-for-domain-specific-support","published":"2025","authors":["Rishav Sahay","Arihant Jain","Purav Aggarwal","Anoop S V K K Saladi"],"abstract":"Effective customer support requires domain-specific solutions tailored to users’ issues. However, LLMs like ChatGPT, while excelling in open-domain tasks, often face challenges such as hallucinations, lack of domain compliance, and generic solutions when applied to specialized contexts. 
RAG-based systems, designed to combine domain context from unstructured knowledge bases (KBs) with LLMs, often struggle Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=32"}},{"id":"official:db186bf35d233460","title":"AutoClimDS: Climate data science agentic AI — A knowledge graph is all you need","url":"https://www.amazon.science/publications/autoclimds-climate-data-science-agentic-ai-a-knowledge-graph-is-all-you-need","published":"2025","authors":["Ahmed Jaber","Wangshu Zhu","Karthick Jayavelu","Justin Downes","Sameer Mohamed","Candace Agonafir","Linnia Hawkins","Tian Zheng"],"abstract":"Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. These challenges limit participation, slow discovery, and reduce the reproducibility of scientific workflows. 
In this paper, we present a proof of concept for addressing these barriers through the Category: Cloud and systems","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Cloud and systems"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=11"}},{"id":"official:c2ebcd71889bcdbb","title":"Analyzing and improving coherence of large language models in question answering","url":"https://www.amazon.science/publications/analyzing-and-improving-coherence-of-large-language-models-in-question-answering","published":"2025","authors":["Ivano Lauriola","Stefano Campese","Alessandro Moschitti"],"abstract":"Large language models (LLMs) have recently revolutionized natural language processing. These models, however, often suffer from instability or lack of coherence, that is the ability of the models to generate semantically equivalent outputs when receiving diverse yet semantically equivalent input variations. 
In this work, we analyze the behavior of multiple LLMs, including Mixtral-8x7B, Llama2-70b, Smaug Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.naacl-long.588","openalex_id":"https://openalex.org/W4411119302","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United States)","University of Trento"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:95adff71a4e026e2","title":"An explainable natural language framework for identifying and notifying target audiences in enterprise communication","url":"https://www.amazon.science/publications/an-explainable-natural-language-framework-for-identifying-and-notifying-target-audiences-in-enterprise-communication","published":"2025","authors":["Vítor Lourenço","Mohnish Dubey","Yunfei Bai","Audrey Depeige","Vivek Jain"],"abstract":"In large-scale maintenance organizations, identifying subject matter experts and managing communications across complex entities relationships poses significant challenges – including information overload and longer response times – that traditional communication approaches fail to address effectively. 
We propose a novel framework that combines RDF graph databases with LLMs to process natural language queries Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Information and knowledge management"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:1e6899e0aab46a75","title":"Ambiguity detection and uncertainty calibration for question answering with large language models","url":"https://www.amazon.science/publications/ambiguity-detection-and-uncertainty-calibration-for-question-answering-with-large-language-models","published":"2025","authors":["Zhengyan Shi","Giuseppe Castellucci","Simone Filice","Saar Kuzi","Eugene Agichtein","Oleg Rokhlenko","Shervin Malmasi"],"abstract":"Large Language Models (LLMs) have demonstrated excellent capabilities in Question Answering (QA) tasks, yet their ability to identify and address ambiguous questions remains underdeveloped. Ambiguities in user queries often lead to inaccurate or misleading answers, undermining user trust in these systems. 
Despite prior attempts using prompt-based methods, performance has largely been equivalent to random Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=25"}},{"id":"official:a0aecbca2584b00e","title":"Amazon’s frontier model safety framework","url":"https://www.amazon.science/publications/amazons-frontier-model-safety-framework","published":"2025","authors":["Amazon"],"abstract":"At Amazon, we look to our leadership principles every day to guide our decision-making. Our approach to AI development naturally follows from our leadership principle “Success and Scale Bring Broad Responsibility.” As we continue to scale the capabilities of Amazon’s frontier models and democratize access to the benefits of AI, we also take responsibility for mitigating the risks of our technology. 
Consistent Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=34"}},{"id":"official:0fac2dfe1c052dfd","title":"Amazon Nova Sonic: Technical report and model card","url":"https://www.amazon.science/publications/amazon-nova-sonic-technical-report-and-model-card","published":"2025","authors":["Amazon Artificial General Intelligence"],"abstract":"We present Amazon Nova Sonic, a new multimodal foundation model that unifies speech and text processing in a single architecture, delivering frontier voice intelligence and industry-leading price performance. 
Amazon Nova Sonic (\"Nova Sonic\") builds on the advances in large pre-trained text and speech models, while fusing the two modalities in a unified architecture to power downstream tasks requiring both Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=29"}},{"id":"official:84bd0d75c02f8b79","title":"Amazon Nova 2: Multimodal reasoning and generation models","url":"https://www.amazon.science/publications/amazon-nova-2-multimodal-reasoning-and-generation-models","published":"2025","authors":["Amazon Artificial General Intelligence"],"abstract":"We present Amazon Nova 2, a family of four foundation models designed to meet diverse enterprise needs across reasoning, multimodal processing, and real-time conversational AI. 
The family includes Nova 2 Lite and Nova 2 Pro — multimodal models with dynamic reasoning capabilities that allow customers to balance accuracy, speed, and efficiency through configurable “extended thinking” controls; Nova 2 Omni Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=8"}},{"id":"official:7c986bf1c3372253","title":"Accelerated test-time scaling with model-free speculative sampling","url":"https://www.amazon.science/publications/accelerated-test-time-scaling-with-model-free-speculative-sampling","published":"2025","authors":["Woomin Song","Saket Dingliwal","Sai Muralidhar Jayanthi","Bhavana Ganesh","Jinwoo Shin","Aram Galstyan","Sravan Babu Bodapati"],"abstract":"Language models have demonstrated remarkable capabilities in reasoning tasks through test-time scaling techniques like best-of-N sampling and tree search. However, these approaches often demand substantial computational resources, creating a critical trade-off between performance and efficiency. 
We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding approach that Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=12"}},{"id":"official:29189f7e143311c5","title":"AIDE: Attribute-guided multi-hop data expansion for data scarcity in task-specific fine-tuning","url":"https://www.amazon.science/publications/aide-attribute-guided-multi-hop-data-expansion-for-data-scarcity-in-task-specific-fine-tuning","published":"2025","authors":["Jiayu Li","Xuan Zhu","Fang Liu","Yanjun (Jane) Qi"],"abstract":"Fine-tuning large language models (LLMs) for specific tasks requires diverse, high-quality training data. However, obtaining sufficient relevant data remains a significant challenge. Existing data synthesis methods either depend on extensive seed datasets or struggle to balance task relevance and data diversity. 
To address these challenges, we propose Attribute-guided multI-hop Data Expansion (AIDE), a novel Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=20"}},{"id":"official:c61e2ab3015d5dd1","title":"A systematic survey of automatic prompt optimization techniques","url":"https://www.amazon.science/publications/a-systematic-survey-of-automatic-prompt-optimization-techniques","published":"2025","authors":["Kiran Ramnath","Kang Zhou","Patrick Guan","Soumya Smruti Mishra","Xuan Qi","Zhengyuan Shen","Shuai Wang","Sangmin Woo","Sullam Jeoung","Yawei Wang","Haozhu Wang","Han Ding"],"abstract":"Since the advent of large language models (LLMs), prompt engineering has been a crucial step for eliciting desired responses for various Natural Language Processing (NLP) tasks. However, prompt engineering remains an impediment for end users due to rapid advances in models, tasks, and associated best practices. 
To mitigate this, Automatic Prompt Optimization (APO) techniques have recently emerged that use Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48448/1ykx-2y72","openalex_id":"https://openalex.org/W7106860931","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)","University of Illinois Urbana-Champaign"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:9931a142e885e379","title":"A calibrated reflection approach for enhancing confidence estimation in LLMs","url":"https://www.amazon.science/publications/a-calibrated-reflection-approach-for-enhancing-confidence-estimation-in-llms","published":"2025","authors":["Umesh Bodhwani","Yuan Ling","Shujing Dong","Yarong Feng","Hongfei Li","Ayush Goyal"],"abstract":"A critical challenge in deploying Large Language Models (LLMs) is developing reliable mechanisms to estimate their confidence, enabling systems to determine when to trust model outputs versus seek human intervention. We present a Calibrated Reflection approach for enhancing confidence estimation in LLMs, a framework that combines structured reasoning with distance-aware calibration technique. 
Our approach Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=28"}},{"id":"official:ca195a418797f464","title":"Stepwise multi-turn jailbreak attacks on code LLMs via task decomposition and test-time scaling","url":"https://www.amazon.science/nova-ai-challenge/proceedings/stepwise-multi-turn-jailbreak-attacks-on-code-llms-via-task-decomposition-and-test-time-scaling","published":"2025","authors":["University of Wisconsin-Madison"],"abstract":"In this technical report, we present our automated red-teaming framework designed to induce jailbreaks in targeted code-generating Large Language Models (LLMs), prompting them to generate malicious and vulnerable code. As of May 13, according to the latest competition leaderboard, our solution has achieved top performance in the second tournament. Our solution consists of three primary modules. 
First, we","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:f1b4b9ee0bb8a5cc","title":"Secure and useful models are reasonable: Aligning code models via utility-preserving reasoning","url":"https://www.amazon.science/nova-ai-challenge/proceedings/secure-and-useful-models-are-reasonable-aligning-code-models-via-utility-preserving-reasoning","published":"2025","authors":["Carnegie Mellon University"],"abstract":"Warning: This report contains partially redacted content that may be offensive to the reader. Large language models (LLMs) may assist users with malicious cybersecurity attacks or inadvertently generate code with critical security flaws. These failures stem from their broader inability to reliably identify safe data or generate safe outputs, despite advances in alignment research. 
We identify three potential","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:647386f81c4856fd","title":"REDCODER: Automated multi-turn red teaming for code LLMs","url":"https://www.amazon.science/nova-ai-challenge/proceedings/redcoder-automated-multi-turn-red-teaming-for-code-llms","published":"2025","authors":["University of California, Davis"],"abstract":"Large Language Models (LLMs) for code generation (i.e., Code LLMs) have demonstrated impressive capabilities in AI-assisted software development and testing. However, recent studies have shown that these models are prone to generating vulnerable or even malicious code under adversarial settings. 
Existing red-teaming approaches rely on extensive human effort, limiting their scalability and practicality,","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=15"}},{"id":"official:c67d89bb8a7a3ce5","title":"PurpCode: Reasoning for safer code generation","url":"https://www.amazon.science/nova-ai-challenge/proceedings/purpcode-reasoning-for-safer-code-generation","published":"2025","authors":["University of Illinois at Urbana-Champaign"],"abstract":"We introduce PurpCode, a novel post-training method that aligns coding assistants to perform safety reasoning to defend against malicious cyber activities and provide secure and functional code. 
Our approach trains a reasoning model in two stages:(i) Rule learning, which explicitly teaches the model to reference cyber safety rules to avoid facilitating malicious cyber events and to generate vulnerability-free","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=16"}},{"id":"official:0725b4aadf92644b","title":"IclForge: Enhancing in-context learning with evolutionary algorithms under budgeted annotation","url":"https://www.amazon.science/publications/iclforge-enhancing-in-context-learning-with-evolutionary-algorithms-under-budgeted-annotation","published":"2025","authors":["Vijit Malik","Atul Pande","Anirban Majumder"],"abstract":"In-context learning (ICL) has emerged as a powerful paradigm for adapting Large Language Models (LLMs) to specific tasks without parameter updates. While various strategies exist for selecting relevant ICL exemplars from a labeled pool, the fundamental challenge of constructing this high-quality pool remains largely unexplored, especially for new tasks or domains with limited labeled data. 
We present IclForge","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=13"}},{"id":"official:d24ff7d9698a8764","title":"Data is all you need (almost): Iterative synthetic instruction tuning for secure code generation","url":"https://www.amazon.science/nova-ai-challenge/proceedings/data-is-all-you-need-almost-iterative-synthetic-instruction-tuning-for-secure-code-generation","published":"2025","authors":["Virginia Tech"],"abstract":"While large language models (LLMs) achieve strong performance in code generation, persistent security vulnerabilities hinder their safe deployment. Starting from a pretrained CodeGen model without inherent safety mechanisms, we develop a systematic synthetic instruction-tuning workflow to progressively enhance model security. 
Our pipeline begins with taxonomy-guided synthetic data, capturing diverse attack","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=15"}},{"id":"official:8f0cfcb181039f5e","title":"COMET: Closed-loop orchestration for malicious elicitation techniques in code models","url":"https://www.amazon.science/nova-ai-challenge/proceedings/comet-closed-loop-orchestration-for-malicious-elicitation-techniques-in-code-models","published":"2025","authors":["University of Texas at Dallas"],"abstract":"WARNING: Contains harmful content that can be offensive in nature Large language models (LLMs) for code generation have enhanced developer productivity while introducing new misuse vectors, as these models can generate potentially harmful code. Existing evaluation methods fail to assess such misuse scenarios, and structured red teaming pipelines for code generation remain under-developed. 
This paper presents","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=15"}},{"id":"official:7c4915a89594a405","title":"AlquistCoder: A constitution-guided approach to safe, trustworthy code generation","url":"https://www.amazon.science/nova-ai-challenge/proceedings/alquistcoder-a-constitution-guided-approach-to-safe-trustworthy-code-generation","published":"2025","authors":["Czech Technical University in Prague"],"abstract":"We introduce AlquistCoder, a code-generating system that effectively minimizes the risk of producing malicious content or vulnerable code while maintaining excellent Python coding and question answering standards across a wide range of tasks. 
The architecture of AlquistCoder employs a sophisticated input guardrail classifier that analyzes whether the user’s intention is benign, potentially harmful, or falls","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=15"}},{"id":"openalex:W4405950099","title":"ВЫЯВЛЕНИЕ АНОМАЛИЙ В ДАННЫХ МОНИТОРИНГА ПРОИЗВОДИТЕЛЬНОСТИ С ИСПОЛЬЗОВАНИЕМ АЛГОРИТМА ISOLATION FOREST: ВОЗМОЖНОСТИ И ОГРАНИЧЕНИЯ","url":"https://doi.org/10.70239/arsu.2024.t78.n4.02","published":"2024-12-31","authors":["Adilzhan Kereyev","МИХЕЛЬСОН О.Ю."],"abstract":"This paper explores the application of the Isolation Forest algorithm for detecting anomalies in performance monitoring data of a SaaS project’s servers. The main hypothesis suggests that the algorithm can identify early signs of performance degradation and potential failures by analyzing basic metrics such as CPU load, memory usage, network traffic, and disk space. Two approaches were tested: analyzing each metric separately and aggregating them into a single indicator to assess the overall system state. The results showed that Isolation Forest demonstrates high sensitivity to sudden changes in metrics, leading to a significant number of false positives. This issue is particularly relevant when dealing with short-term metric spikes that do not necessarily indicate real system problems. 
The paper discusses the limitations of this approach, including the need for fine-tuning hyperparamete...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.70239/arsu.2024.t78.n4.02","openalex_id":"https://openalex.org/W4405950099","cited_by_count":0,"quality_score":41,"matched_keywords":["memory"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7918466329574585},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.6827707886695862},{"id":"https://openalex.org/C34736171","display_name":"Preprocessor","score":0.656697154045105},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5608434677124023},{"id":"https://openalex.org/C64869954","display_name":"False positive paradox","score":0.5607814788818359},{"id":"https://openalex.org/C2775941552","display_name":"Isolation (microbiology)","score":0.5598219037055969},{"id":"https://openalex.org/C21200559","display_name":"Sensitivity (control systems)","score":0.44943487644195557},{"id":"https://openalex.org/C8642999","display_name":"Hyperparameter","score":0.43809017539024353}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-rubric-a-multidimensional-calibrated-approach-to-automated-evaluation-of-natural-language-texts","title":"LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts","url":"https://www.microsoft.com/en-us/research/publication/llm-rubric-a-multidimensional-calibrated-approach-to-automated-evaluation-of-natural-language-texts/","published":"2024-12-30","authors":["Helia 
Hashemi","Jason Eisner","Corby Rosset","Benjamin Van Durme","Chris Kedzie"],"abstract":"This paper introduces a framework for the automated evaluation of natural language texts. A manually constructed rubric describes how to assess multiple dimensions of interest. To evaluate a text, a large language model (LLM) is prompted with each rubric question and produces a distribution over potential responses. The LLM predictions often fail to agree well with human judges -- indeed, the humans do not fully agree with one another. However, the multiple LLM distributions can be $\\textit{combined}$ to $\\textit{predict}$ each human judge's annotations on all questions, including a summary question that assesses overall quality or relevance. LLM-Rubric accomplishes this by training a small feed-forward neural network that includes both judge-specific and judge-independent parameters. When evaluating dialogue systems in a human-AI information-seeking task, we find that LLM-Rubric with 9....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Computer science","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/timeraf-retrieval-augmented-foundation-model-for-zero-shot-time-series-forecasting","title":"TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series 
Forecasting","url":"https://www.microsoft.com/en-us/research/publication/timeraf-retrieval-augmented-foundation-model-for-zero-shot-time-series-forecasting/","published":"2024-12-30","authors":["Huanyu Zhang","Chang Xu","Yi-Fan Zhang","Zhang Zhang","Liang Wang","Tien-Ping Tan","Jiang Bian"],"abstract":"Time series forecasting plays a crucial role in data mining, driving rapid advancements across numerous industries. With the emergence of large models, time series foundation models (TSFMs) have exhibited remarkable generalization capabilities, such as zero-shot learning, through large-scale pre-training. Meanwhile, Retrieval-Augmented Generation (RAG) methods have been widely employed to enhance the performance of foundation models on unseen data, allowing models to access to external knowledge. In this paper, we introduce TimeRAF, a Retrieval-Augmented Forecasting model that enhance zero-shot time series forecasting through retrieval-augmented techniques. We develop customized time series knowledge bases that are tailored to the specific forecasting tasks. TimeRAF employs an end-to-end learnable retriever to extract valuable information from the knowledge base. 
Additionally, we propose...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2403.05720","title":"A dataset and benchmark for hospital course summarization with adapted large language models","url":"http://arxiv.org/abs/2403.05720","published":"2024-12-30","authors":["Asad Aali","Dave Van Veen","Yamin Arefeen","Jason Hom","Christian Blüthgen","Eduardo Pontes Reis","Sergios Gatidis","N. Clifford","Joseph Daws","Arash Saber Tehrani","Jangwon Kim","Akshay Chaudhari"],"abstract":"OBJECTIVE: Brief hospital course (BHC) summaries are clinical documents that summarize a patient's hospital stay. While large language models (LLMs) depict remarkable capabilities in automating real-world tasks, their capabilities for healthcare applications such as synthesizing BHCs from clinical notes have not been shown. We introduce a novel preprocessed dataset, the MIMIC-IV-BHC, encapsulating clinical note and BHC pairs to adapt LLMs for BHC synthesis. Furthermore, we introduce a benchmark of the summarization performance of 2 general-purpose LLMs and 3 healthcare-adapted LLMs. MATERIALS AND METHODS: Using clinical notes as input, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to 3 open-source LLMs (Clinical-T5-Large, Llama2-13B, and FLAN-UL2) and 2 proprietary LLMs (Generative Pre-trained Transformer [GPT]-3.5 and GPT-4). 
We evaluat...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/jamia/ocae312","openalex_id":"https://openalex.org/W4392736665","cited_by_count":23,"quality_score":68,"matched_keywords":["LLM","preference"],"author_affiliations":["Amazon (United States)","Artificial Intelligence in Medicine (Canada)","Association for the Advancement of Artificial Intelligence","Hospital Israelita Albert Einstein","Hospital São Paulo","Stanford University","The University of Texas at Austin","University Hospital of Zurich","University of California San Francisco Medical Center"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7180169820785522},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6958717107772827},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.683256208896637},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6403182744979858},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5698177814483643},{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.4993855953216553},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.4545208811759949},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.44710052013397217}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":23}},{"id":"openalex:W4405842131","title":"Parameter-efficient fine-tuning of large language models using semantic knowledge tuning","url":"https://doi.org/10.1038/s41598-024-75599-4","published":"2024-12-28","authors":["Nusrat Jahan Prottasha","Asif Mahmud","Md. 
Shohanur Islam Sobuj","Prakash Bhat","Md. Kowsher","Niloofar Yousefi","Özlem Özmen Garibay"],"abstract":"Large Language Models (LLMs) are gaining significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning utilize special, modifiable tokens that lack semantic meaning and require extensive training for best performance, often falling short. In this context, we propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens. This method involves using a fixed LLM to understand and process the semantic content of the prompt through zero-shot capabilities. Following this, it integrates the processed prompt with the input text to improve the model's performance on particular tasks. Our experimental results show that SK-Tuning exhibits faster training times, fewer parameters, and superior performance on tasks such as text c...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-024-75599-4","openalex_id":"https://openalex.org/W4405842131","cited_by_count":13,"quality_score":58,"matched_keywords":["LLM","efficient"],"author_affiliations":["Amazon (United States)","Hajee Mohammad Danesh Science and Technology University","Noakhali Science and Technology University","University of Central Florida"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8590635061264038},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5758118629455566},{"id":"https://openalex.org/C141603448","display_name":"Prefix","score":0.571208655834198},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.5396454930305481},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5387398600578308},{"id":"https://openalex.org/C2780586970","display_name":"Popularity","score":0.514717161655426},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49129289388656616},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47950416803359985}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4405844927","title":"RenAIssance: A Survey Into AI Text-to-Image Generation in the Era of Large Model","url":"https://doi.org/10.1109/tpami.2024.3522305","published":"2024-12-27","authors":["Fengxiang Bie","Yibo Yang","Zhongzhu Zhou","Adam Ghanem","Minjia Zhang","Zhewei Yao","Xiaoxia Wu","Connor Holmes","Pareesa Ameneh Golnari","David A. Clifton","Yuxiong He","Dacheng Tao"],"abstract":"Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions. Text-to-image generation using neural networks could be traced back to the emergence of Generative Adversial Network (GAN), followed by the autoregressive Transformer. Diffusion models are one prominent type of generative model used for the generation of images through the systematic introduction of noises with repeating steps. As an effect of the impressive results of diffusion models on image synthesis, it has been cemented as the major image decoder used by text-to-image models and brought text-to-image generation to the forefront of machine-learning (ML) research. 
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models, resulting the generati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2024.3522305","openalex_id":"https://openalex.org/W4405844927","cited_by_count":33,"quality_score":71,"matched_keywords":["retrieval"],"author_affiliations":["Bellevue Hospital Center","King Abdullah University of Science and Technology","Microsoft (United States)","The University of Sydney","University of Oxford"],"concepts":[{"id":"https://openalex.org/C52069626","display_name":"The Renaissance","score":0.7565239667892456},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6066638827323914},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5801922082901001},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4306299090385437},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.40699756145477295},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3695250153541565},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.35997456312179565},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.34902945160865784}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":33}},{"id":"hf-org-paper:deepseek-ai:2412.19437","title":"DeepSeek-V3 Technical Report","url":"https://huggingface.co/papers/2412.19437","published":"2024-12-26","authors":["DeepSeek"],"abstract":"We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model 
with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 G...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","deepseek-ai","language model","efficient"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W4405810201","title":"CRP-RAG: A Retrieval-Augmented Generation Framework for Supporting Complex Logical Reasoning and Knowledge Planning","url":"https://doi.org/10.3390/electronics14010047","published":"2024-12-26","authors":["Kehan Xu","Kun Zhang","Jingyuan Li","Wei Huang","Yuanzhuo Wang"],"abstract":"The Retrieval-Augmented Generation (RAG) framework enhances Large Language Models (LLMs) by retrieving relevant knowledge to broaden their knowledge boundaries and mitigate factual hallucinations stemming from knowledge gaps. 
However, the RAG Framework faces challenges in effective knowledge retrieval and utilization; invalid or misused knowledge will interfere with LLM generation, reducing reasoning efficiency and answer quality. Existing RAG methods address these issues by decomposing and expanding queries, introducing special knowledge structures, and using reasoning process evaluation and feedback. However, the linear reasoning structures limit complex thought transformations and reasoning based on intricate queries. Additionally, knowledge retrieval and utilization are decoupled from reasoning and answer generation, hindering effective knowledge support during answer generation. To....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/electronics14010047","openalex_id":"https://openalex.org/W4405810201","cited_by_count":11,"quality_score":56,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Beijing Technology and Business University","Chinese Academy of Sciences","Institute of Computing Technology","Tencent (China)","Yanshan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7313927412033081},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5244293808937073},{"id":"https://openalex.org/C89288958","display_name":"Reasoning system","score":0.5173487067222595},{"id":"https://openalex.org/C37335422","display_name":"Model-based reasoning","score":0.48457372188568115},{"id":"https://openalex.org/C2780613888","display_name":"Knowledge retrieval","score":0.47700977325439453},{"id":"https://openalex.org/C97364631","display_name":"Deductive reasoning","score":0.41987210512161255},{"id":"https://openalex.org/C207685749","display_name":"Domain 
knowledge","score":0.4128188490867615},{"id":"https://openalex.org/C161301231","display_name":"Knowledge representation and reasoning","score":0.35936325788497925}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bootstrap-your-own-context-length","title":"Bootstrap Your Own Context Length","url":"https://www.microsoft.com/en-us/research/publication/bootstrap-your-own-context-length/","published":"2024-12-25","authors":["Liang Wang","Nan Yang","Xingxing Zhang","Xiaolong Huang","Furu Wei"],"abstract":"We introduce a bootstrapping approach to train long-context language models by exploiting their short-context capabilities only. Our method utilizes a simple agent workflow to synthesize diverse long-context instruction tuning data, thereby eliminating the necessity for manual data collection and annotation. The proposed data synthesis workflow requires only a short-context language model, a text retriever, and a document collection, all of which are readily accessible within the open-source ecosystem. Subsequently, language models are fine-tuned using the synthesized data to extend their context lengths. In this manner, we effectively transfer the short-context capabilities of language models to long-context scenarios through a bootstrapping process. 
We conduct experiments with the open-source Llama-3 family of models and demonstrate that our method can successfully extend the context l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Human language technologies","Computer science","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:b184e6ac069c82f6","title":"QVQ: To See the World with Wisdom","url":"https://qwenlm.github.io/blog/qvq-72b-preview/","published":"2024-12-25","authors":["Alibaba/Qwen"],"abstract":"Language and vision intertwine in the human mind, shaping how we perceive and understand the world around us. Our ability to reason is deeply rooted in both linguistic thought and visual memory - but what happens when we extend these capabilities to AI? 
Today’s large language models have demonstrated remarkable reasoning abilities, but we wondered: could they harness the power of visual understanding to reach new heights of cognitive capability?","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["memory"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4405778884","title":"MENSA: Multi-Dataset Harmonized Pretraining for Semantic Segmentation","url":"https://doi.org/10.1109/tmm.2024.3521851","published":"2024-12-25","authors":["Bowen Shi","Xiaopeng Zhang","Yaoming Wang","Wenrui Dai","Junni Zou","Hongkai Xiong"],"abstract":"Existing pretraining methods for semantic segmentation are hampered by the task gap between global image -level pretraining and local pixel-level finetuning. Joint dense-level pretraining is a promising alternative to exploit off-the-shelf annotations from diverse segmentation datasets but suffers from low-quality class embeddings and inconsistent data and supervision signals across multiple datasets by directly employing CLIP. 
To overcome these challenges, we propose a novel Multi-dataset harmonized pretraining framework for ...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2024.3521851","openalex_id":"https://openalex.org/W4405778884","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8674848079681396},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6114347577095032},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5777822136878967},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.525789737701416},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.49840426445007324},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.37444573640823364},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.35556793212890625},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.10211309790611267}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":1}},{"id":"openalex:W4405740139","title":"ESGReveal: An LLM-based approach for extracting structured data from ESG reports","url":"https://doi.org/10.1016/j.jclepro.2024.144572","published":"2024-12-24","authors":["Yi Zou","Mengying Shi","Zhongjie Chen","Zhu Deng","Zongxiong Lei","Zihan Zeng","Shiming Yang","Hongxiang Tong","Lei Xiao","Wenwen Zhou"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.jclepro.2024.144572","openalex_id":"https://openalex.org/W4405740139","cited_by_count":37,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Sun Yat-sen University","Tsinghua University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4187331199645996},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.3617687225341797},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.348958820104599},{"id":"https://openalex.org/C21880701","display_name":"Process engineering","score":0.34409964084625244},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.26425600051879883}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":37}},{"id":"arxiv:2410.13110","title":"Deep learning-based software engineering: progress, challenges, and opportunities","url":"http://arxiv.org/abs/2410.13110","published":"2024-12-24","authors":["Xiangping Chen","Xing Hu","Yuan Huang","He Jiang","Weixing Ji","Yanjie Jiang","Yanyan Jiang","Bo Liu","Hui Liu","Xiaochen Li","Xiaoli Lian","Guozhu Meng"],"abstract":"Researchers have recently achieved significant advances in deep learning techniques, 
which in turn has substantially advanced other research disciplines, such as natural language processing, image processing, speech recognition, and software engineering. Various deep learning techniques have been successfully employed to facilitate software engineering tasks, including code generation, software refactoring, and fault localization. Many studies have also been presented in top conferences and journals, demonstrating the applications of deep learning techniques in resolving various software engineering tasks. However, although several surveys have provided overall pictures of the application of deep learning techniques in software engineering, they focus more on learning techniques, that is, what kind of deep learning techniques are employed and how deep models are trained or fine-...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11432-023-4127-5","openalex_id":"https://openalex.org/W4403579163","cited_by_count":62,"quality_score":67,"matched_keywords":[],"author_affiliations":["Beihang University","Beijing Institute of Technology","Beijing Jiaotong University","Chinese Academy of Sciences","Dalian University of Technology","Fudan University","Harbin Institute of Technology","Huawei Technologies (China)","Institute of Information Engineering","Institute of Software","Nanjing University","Peking University","Sun Yat-sen University","Wuhan University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.7261971235275269},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6299700140953064},{"id":"https://openalex.org/C529173508","display_name":"Software development","score":0.6159235835075378},{"id":"https://openalex.org/C182500959","display_name":"Social software 
engineering","score":0.6116766929626465},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.5940762162208557},{"id":"https://openalex.org/C186846655","display_name":"Software construction","score":0.5202237963676453},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48961883783340454},{"id":"https://openalex.org/C54534927","display_name":"Software requirements","score":0.48526594042778015}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":62}},{"id":"openalex:W4405754085","title":"Make Graph-Based Referring Expression Comprehension Great Again Through Expression-Guided Dynamic Gating and Regression","url":"https://doi.org/10.1109/tmm.2024.3521844","published":"2024-12-24","authors":["Jingcheng Ke","D. Wang","Jun-Cheng Chen","I‐Hong Jhuo","Chia‐Wen Lin","Yen‐Yu Lin"],"abstract":"One common belief is that with complex models and pre-training on large-scale datasets, transformer-based methods for referring expression comprehension (REC) perform much better than existing graph-based methods. We observe that since most graph-based methods adopt an off-the-shelf detector to locate candidate objects (i.e., regions detected by the object detector), they face two challenges that result in subpar performance: (1) the presence of significant noise caused by numerous irrelevant objects during reasoning, and (2) inaccurate localization outcomes attributed to the provided detector. To address these issues, we introduce a plug-and-adapt module guided by sub-expressions, called dynamic gate constraint (DGC), which can adaptively disable irrelevant proposals and their connections in graphs during reasoning. 
We further introduce an expression-guided regression strategy (EGR) to....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2024.3521844","openalex_id":"https://openalex.org/W4405754085","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","National Tsing Hua University","National Yang Ming Chiao Tung University","Research Center for Information Technology Innovation, Academia Sinica"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8069606423377991},{"id":"https://openalex.org/C90559484","display_name":"Expression (computer science)","score":0.5457181930541992},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.539974570274353},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47711971402168274},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.44306594133377075},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.42103147506713867},{"id":"https://openalex.org/C121329065","display_name":"Regular expression","score":0.4159669876098633},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3662160336971283}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4405718304","title":"PASS: Test-Time Prompting to Adapt Styles and Semantic Shapes in Medical Image Segmentation","url":"https://doi.org/10.1109/tmi.2024.3521463","published":"2024-12-23","authors":["Chuyan Zhang","Hao Zheng","Xin You","Yefeng Zheng","Yun Gu"],"abstract":"Test-time adaptation (TTA) has emerged as a promising paradigm to handle the domain shifts at 
test time for medical images from different institutions without using extra training data. However, existing TTA solutions for segmentation tasks suffer from 1) dependency on modifying the source training stage and access to source priors or 2) lack of emphasis on shape-related semantic knowledge that is crucial for segmentation tasks. Recent research on visual prompt learning achieves source-relaxed adaptation by extended parameter space but still neglects the full utilization of semantic features, thus motivating our work on knowledge-enriched deep prompt learning. Beyond the general concern of image style shifts, we reveal that shape variability is another crucial factor causing the performance drop. To address this issue, we propose a TTA framework called PASS (Prompting to Adapt Styles and...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmi.2024.3521463","openalex_id":"https://openalex.org/W4405718304","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.660214900970459},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.6585781574249268},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6021105051040649},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5906498432159424},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5817869901657104},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5544671416282654},{"id":"https://openalex.org/C31601959","display_name":"Medical 
imaging","score":0.5392575860023499},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5381318926811218}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/trace-is-the-new-autodiff-unlocking-efficient-optimization-of-computational-workflows","title":"Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs","url":"https://www.microsoft.com/en-us/research/publication/trace-is-the-new-autodiff-unlocking-efficient-optimization-of-computational-workflows/","published":"2024-12-22","authors":["Ching-An Cheng","Allen Nie","Adith Swaminathan"],"abstract":"We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots. AutoDiff frameworks, like PyTorch, enable efficient end-to-end optimization of differentiable systems. However, general computational workflows can be non-differentiable and involve rich feedback (e.g. console output or user's responses), heterogeneous parameters (e.g. prompts, codes), and intricate objectives (beyond maximizing a score). We investigate end-to-end generative optimization -- using generative models such as LLMs within the optimizer for automatic updating of general computational workflows. We discover that workflow execution traces are akin to back-propagated gradients in AutoDiff and can provide key information to interpret feedback for efficient optimization. 
Formally, we frame a new mathematical setup, Optimization with Tra...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Computer science","optimization framework","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-models-to-microtheories-distilling-a-models-topical-knowledge-for-grounded-question-answering","title":"From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question Answering","url":"https://www.microsoft.com/en-us/research/publication/from-models-to-microtheories-distilling-a-models-topical-knowledge-for-grounded-question-answering/","published":"2024-12-22","authors":["Nathaniel Weir","Bhavana Dalvi","Orion Weller","Oyvind Tafjord","Sam Hornstein","Alexander Sabol","P. Jansen","Ben Van Durme","Peter Clark"],"abstract":"Recent reasoning methods (e.g., chain-of-thought, entailment reasoning) help users understand how language models (LMs) answer a single question, but they do little to reveal the LM's overall understanding, or \"theory,\" about the question's topic, making it still hard to trust the model. Our goal is to materialize such theories - here called microtheories (a linguistic analog of logical microtheories) - as a set of sentences encapsulating an LM's core knowledge about a topic. 
These statements systematically work together to entail answers to a set of questions to both engender trust and improve performance. Our approach is to first populate a knowledge store with (model-generated) sentences that entail answers to training questions and then distill those down to a core microtheory that is concise, general, and non-redundant. We show that, when added to a general corpus (e.g., Wikipedia),...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/when-can-proxies-improve-the-sample-complexity-of-preference-learning","title":"When Can Proxies Improve the Sample Complexity of Preference Learning?","url":"https://www.microsoft.com/en-us/research/publication/when-can-proxies-improve-the-sample-complexity-of-preference-learning/","published":"2024-12-20","authors":["Yuchen Zhu","Daniel Augusto de Souza","Zhengyan Shi","Mengyue Yang","Pasquale Minervini","Alexander D'Amour","Matt J. Kusner"],"abstract":"We address the problem of reward hacking, where maximising a proxy reward does not necessarily increase the true reward. This is a key concern for Large Language Models (LLMs), as they are often fine-tuned on human preferences that may not accurately reflect a true objective. 
Existing work uses various tricks such as regularisation, tweaks to the reward model, and reward hacking detectors, to limit the influence that such proxy preferences have on a model. Luckily, in many contexts such as medicine, education, and law, a sparse amount of expert data is often available. In these cases, it is often unclear whether the addition of proxy data can improve policy learning. We outline a set of sufficient conditions on proxy feedback that, if satisfied, indicate that proxy data can provably improve the sample complexity of learning the ground truth policy. These conditions can inform the data co...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","mathematics","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4405643374","title":"Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach","url":"https://doi.org/10.1145/3708882","published":"2024-12-20","authors":["Junjie Zhang","Ruobing Xie","Yupeng Hou","Wayne Xin Zhao","Leyu Lin","Ji-Rong Wen"],"abstract":"In the past few decades, recommender systems have attracted much attention in both research and industry communities. Existing recommendation models mainly learn the underlying user preference from historical behavior data (typically in the forms of item IDs), and then estimate the user–item matching relationships for recommendations. 
Inspired by the recent progress on large language models (LLMs), we develop a different recommendation paradigm, considering recommendation as instruction following by LLMs. The key idea is that the needs of a user can be expressed in natural language descriptions (called instructions ), so that LLMs can understand and further execute the instruction for fulfilling the recommendation. For this purpose, we instruction tune the 3B Flan-T5-XL, to better adapt LLMs to recommender systems. We first design a general instruction format for describing the preferenc...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3708882","openalex_id":"https://openalex.org/W4405643374","cited_by_count":77,"quality_score":79,"matched_keywords":["language model","personalized","preference"],"author_affiliations":["Renmin University of China","Tencent (China)","UC San Diego Health System","University of California San Diego"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6153456568717957},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4646453559398651},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3616573214530945}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":77}},{"id":"openalex:W4405669265","title":"A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs","url":"https://doi.org/10.1056/aioa2400640","published":"2024-12-20","authors":["Sheng Zhang","Yanbo Xu","Naoto Usuyama","Hanwen Xu","Jaspreet Bagga","Robert Tinn","Sam Preston","Rajesh Rao","Mu Wei","Naveen Valluri","Cliff Wong","Andrea 
Tupini"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1056/aioa2400640","openalex_id":"https://openalex.org/W4405669265","cited_by_count":159,"quality_score":67,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Providence College","Providence Portland Medical Center","University of Washington"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.8071958422660828},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5265465378761292},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4968619644641876},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47797414660453796},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.33555132150650024},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3278099298477173},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.1860332489013672},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.11040681600570679}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":159}},{"id":"openalex:W4408018537","title":"Research and Practice on Database Interaction Based on Natural Language Processing","url":"https://doi.org/10.1109/aiac63745.2024.10899660","published":"2024-12-20","authors":["Zeshun You","Jiebin Yao","Dong Cheng","Zhiwei Wen","Zhi-Liang Lu","X. Y. Shen"],"abstract":"Data serves as the foundation for an enterprise's digital transformation, and its efficient utilization requires database support. 
SQL is highly complex and often considered unsuitable for non-technical users. Reducing the barriers to data utilization can be achieved by exploring natural language interactions with databases. The NL2SQL task aims to enable natural language interaction with databases. The emergence of large language models (LLMs) has spurred significant theoretical advances in NL2SQL, accelerating its development. However, an efficient NL2SQL system architecture has yet to be established. This paper presents an NL2SQL architecture that incorporates three methods of SQL generation. Additionally, five prevalent issues in NL2SQL systems are analyzed, and corresponding solutions are proposed.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/aiac63745.2024.10899660","openalex_id":"https://openalex.org/W4408018537","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7736905813217163},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5567935705184937},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.45779046416282654},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.4374787211418152},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.38560736179351807},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.37736183404922485},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3684278726577759},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.06956183910369873}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4411584695","title":"WiViD: Leveraging Wi-Fi and Vision for Depth Estimation via Multimodal Diffusion","url":"https://doi.org/10.1109/msn63567.2024.00021","published":"2024-12-20","authors":["Shijie Cheng","Yuchong Gao","Zheng Yang","Guoxuan Chi","Tony Xiao Han"],"abstract":"Depth estimation is crucial for numerous applications, including autonomous driving, robotic navigation and aug-mented reality. Existing solutions based on LiDAR and mm Wave technologies are constrained by high deployment costs, while those utilizing monocular vision suffer from limited accuracy. To address these challenges, this paper proposes WiViD, a diffusion-based depth estimation system that leverages commercial Wi-Fi and vision. Diffusion models, with their ability to iteratively refine predictions, offer significant advantages in producing accurate and detailed estimations. We introduce a Multimodal Conditional Diffusion (MMCD) mechanism and design two encoding modules: the Complex-Valued CSI Encoder (CCE) and the Residual Image Encoder (RIE). 
These components fully exploit the spatio-temporal information inherent in Wi-Fi CSI and enable the effective fusion of Wi-Fi CSI and RGB....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/msn63567.2024.00021","openalex_id":"https://openalex.org/W4411584695","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7300837635993958},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5601997971534729},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5358613133430481},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.5304778218269348},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5103234648704529},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.07094952464103699},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"bytedance-seed:274","title":"FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching","url":"https://seed.bytedance.com/en/research/flowar-scale-wise-autoregressive-image-generation-meets-flow-matching","published":"2024-12-19","authors":["Sucheng Ren","Qihang Yu","Ju He","Xiaohui Shen","Alan Yuille","Liang-Chieh Chen"],"abstract":"Autoregressive (AR) modeling has achieved remarkable success in natural language processing by enabling models to generate text with 
coherence and contextual understanding through next token prediction. Recently, in image generation, VAR proposes scale-wise autoregressive modeling, which extends the next token prediction to the next scale prediction, preserving the 2D structure of images. However, VAR encounters two primary challenges: (1) its complex and rigid scale design limits generalization in next scale prediction, and (2) the generator’s dependence on a discrete tokenizer with the same complex scale structure restricts modularity and flexibility in updating the tokenizer. To address these limitations, we introduce FlowAR, a general next scale prediction method featuring a streamlined scale design, where each subsequent scale is simply double the previous one. This eliminates the n...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Vision","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4405578085","title":"Thoroughly Modeling Multi-domain Pre-trained Recommendation as Language","url":"https://doi.org/10.1145/3708883","published":"2024-12-19","authors":["Zekai Qu","Ruobing Xie","Chaojun Xiao","Yuan Yao","Zhiyuan Liu","Fengzong Lian","Zhanhui Kang","Jie Zhou"],"abstract":"With the thriving of the pre-trained language model (PLM) widely verified in various NLP tasks, pioneer efforts attempt to explore the possible cooperation of the general textual information in PLM with the personalized behavioral information in user historical behavior sequences to enhance sequential recommendation (SR). 
However, despite the commonalities of input format and task goal, there are huge gaps between the behavioral and textual information, which obstruct thoroughly modeling SR as language modeling via PLM. To bridge the gap, we propose a novel unified pre-trained language model enhanced sequential recommendation (UPSR) that thoroughly transfers the next item prediction task to a text generation task, aiming to build a unified pre-trained recommendation model for multi-domain recommendation tasks. We formally design five key indicators, namely naturalness, domain consistency...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3708883","openalex_id":"https://openalex.org/W4405578085","cited_by_count":3,"quality_score":48,"matched_keywords":["language model","personalized"],"author_affiliations":["China University of Geosciences (Beijing)","Institute of Information Engineering","National University of Singapore","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7896305322647095},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.7313483953475952},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5717402100563049},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.57004714012146},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5332768559455872},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.5066452622413635},{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.5059489607810974},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical 
analysis)","score":0.4662662148475647}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4405599814","title":"PRADA: Pre-Train Ranking Models With Diverse Relevance Signals Mined From Search Logs","url":"https://doi.org/10.1109/tkde.2024.3515800","published":"2024-12-19","authors":["Shuting Wang","Zhicheng Dou","Kexiang Wang","Dehong Ma","Jun Fan","Daiting Shi","Zhicong Cheng","Simiu Gu","Dawei Yin","Ji-Rong Wen"],"abstract":"Existing studies have proven that pre-trained ranking models outperform pre-trained language models when it comes to ranking tasks. To pre-train such models, researchers have utilized large-scale search logs and clicks as weak-supervised signals of query-document relevance. However, search logs are incomplete and sparse. Different users with the same intent tend to use various forms of queries. It is hard for recorded clicks to sufficiently cover diverse relevance patterns between queries and documents. Moreover, the diverse intentions of a large user base lead to long-tail distributions of search intents. Deriving sufficient relevance signals from sparse clicks of these long-tail intents poses another challenge. Therefore, there is significant potential for exploring richer relevance signals beyond direct clicks to pre-train high-quality ranking models. 
To tackle this problem, we develo...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2024.3515800","openalex_id":"https://openalex.org/W4405599814","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Baidu (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.8078798651695251},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.7822239398956299},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7548311948776245},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.571333646774292},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.43454471230506897},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3692651093006134},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3625461459159851},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2412.15115","title":"Qwen2.5 Technical Report","url":"https://huggingface.co/papers/2412.15115","published":"2024-12-19","authors":["Qwen","An Yang","Baosong Yang","Beichen Zhang","Binyuan Hui","Bo Zheng","Bowen Yu","Chengyuan Li","Dayiheng Liu","Fei Huang","Haoran Wei","Huan Lin"],"abstract":"In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. 
In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight offerin...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","preference"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"official:dc193692a31aa187","title":"UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling","url":"https://ai.meta.com/research/publications/unibench-visual-reasoning-requires-rethinking-vision-language-beyond-scaling/","published":"2024-12-18","authors":["Haider Al-Tahan","Quentin Garrido","Randall Balestriero","Diane Bouchacourt","Caner Hazirbas","Mark Ibrahim"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Core Machine 
Learning"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=7"}},{"id":"openalex:W4405623404","title":"Provenance-Enabled Explainable AI","url":"https://doi.org/10.1145/3698826","published":"2024-12-18","authors":["Jiachi Zhang","Wenchao Zhou","Benjamin E. Ujcich"],"abstract":"Machine learning (ML) algorithms have advanced significantly in recent years, progressively evolving into artificial intelligence (AI) agents capable of solving complex, human-like intellectual challenges. Despite the advancements, the interpretability of these sophisticated models lags behind, with many ML architectures remaining \"black boxes\" that are too intricate and expansive for human interpretation. Recognizing this issue, there has been a revived interest in the field of explainable AI (XAI) aimed at explaining these opaque ML models. However, XAI tools often suffer from being tightly coupled with the underlying ML models and are inefficient due to redundant computations. We introduce provenance-enabled explainable AI (PXAI). PXAI decouples XAI computation from ML models through a provenance graph that tracks the creation and transformation of all data within the model. 
PXAI impr...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3698826","openalex_id":"https://openalex.org/W4405623404","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Georgetown University"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.859556257724762},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.746300995349884},{"id":"https://openalex.org/C45374587","display_name":"Computation","score":0.6032808423042297},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5544403195381165},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5460628271102905},{"id":"https://openalex.org/C2780502288","display_name":"Expansive","score":0.5091857314109802},{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.48365461826324463},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.451569527387619}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/theagentcompany-benchmarking-llm-agents-on-consequential-real-world-tasks","title":"TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks","url":"https://www.microsoft.com/en-us/research/publication/theagentcompany-benchmarking-llm-agents-on-consequential-real-world-tasks/","published":"2024-12-17","authors":["Frank F. Xu","Yufan Song","Boxuan Li","Yuxuan Tang","Kritanjali Jain","Mengxue Bao","Z. Z. 
Wang","Xuhui Zhou","Zhitong Guo","Murong Cao","Mingyang Yang","Hao Yang Lu"],"abstract":"We interact with computers on an everyday basis, be it in everyday life or work, and many aspects of work can be done entirely with access to a computer and the Internet. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. But how performant are AI agents at accelerating or even autonomously performing work-related tasks? The answer to this question has important implications both for industry looking to adopt AI into their workflows and for economic policy to understand the effects that adoption of AI may have on the labor market. To measure the progress of these LLM agents'performance on performing real-world professional tasks, in this paper we introduce TheAgentCompany, an extensible benchmark for evaluating AI agents that interact with th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-the-role-of-pre-training-dataset-size-and-diversity-on-single-cell-foundation-model-performance","title":"Evaluating the role of pre-training dataset size and diversity on single-cell foundation model 
performance","url":"https://www.microsoft.com/en-us/research/publication/evaluating-the-role-of-pre-training-dataset-size-and-diversity-on-single-cell-foundation-model-performance/","published":"2024-12-17","authors":["Alan DenAdel","Madeline Hughes","Akshaya Thoutam","Anay Gupta","Nicolo Fusi","Andrew W. Navia","Srivatsan Raghavan","Peter S. Winter","Ava P. Amini","Lorin Crawford"],"abstract":"The success of transformer-based foundation models on natural language and images has motivated their use in single-cell biology. Single-cell foundation models have been trained on increasingly larger transcriptomic datasets, scaling from initial studies with 1 million cells to newer atlases with over 100 million cells. This study investigates the role of pre-training dataset size and diversity on the performance of single-cell foundation models on both zero-shot and fine-tuned tasks. Using a large corpus of 22.2 million cells, we pre-train a total of 375 models which we evaluate by conducting 3,750 experiments. 
Our results show that current methods tend to plateau in performance with pre-training datasets that are only a fraction of the size.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Article (Journal)","Medical, health and genomics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:d1a15aef4c7c67c3","title":"FLAME : Factuality-Aware Alignment for Large Language Models","url":"https://ai.meta.com/research/publications/flame-factuality-aware-alignment-for-large-language-models/","published":"2024-12-17","authors":["Jack Lin","Luyu Gao","Barlas Oguz","Wenhan Xiong","Jimmy Lin","Scott Yih","Xilun Chen"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=7"}},{"id":"openalex:W4409156015","title":"A LLM-based agent for the automatic generation and generalization of IDS rules","url":"https://doi.org/10.1109/trustcom63139.2024.00259","published":"2024-12-17","authors":["Xiaowei Hu","Haoning Chen","Huaifeng Bao","Wen Wang","Feng Liu","Guoqiao Zhou","Peng Yin"],"abstract":"Cyberattacks on digital 
services and Internet of Things (IoT) are rising, employing complex tactics. Using intrusion detection systems (IDS) to detect and counter threats at key network points is vital for strong cybersecurity. Traditional rule-based network IDS rely on predefined rules, which may not effectively recognize the myriad complex variants of potential attacks. AI-driven methods for detecting malicious traffic offer enhanced capabilities but can fall short in terms of interpretability and performance under high-throughput network conditions. To address these challenges, we propose a LLM-based (Large Language Model) agent that utilizes multiple sources inputs to generate and generalize rules. The generated rules are designed to detect a variety of corresponding malicious threats, while the generalized rules are crafted to identify similar variant attacks. We have amassed an ext...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/trustcom63139.2024.00259","openalex_id":"https://openalex.org/W4409156015","cited_by_count":4,"quality_score":53,"matched_keywords":["LLM","language model","agent"],"author_affiliations":["Chinese Academy of Sciences","Institute of Information Engineering","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7329723834991455},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6975171566009521},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5225507616996765},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.09003663063049316},{"id":"https://openalex.org/C134306372","display_name":"Mathematical 
analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/inverse-design-of-vitrimeric-polymers-by-molecular-dynamics-and-generative-modeling","title":"Inverse Design of Vitrimeric Polymers by Molecular Dynamics and Generative Modeling","url":"https://www.microsoft.com/en-us/research/publication/inverse-design-of-vitrimeric-polymers-by-molecular-dynamics-and-generative-modeling/","published":"2024-12-16","authors":["Yiwen Zheng","Prakash Thakolkaran","Jake Smith","Ziheng Lu","Shuxin Zheng","Bichlien Nguyen","Siddhant Kumar","Aniruddh Vashisth"],"abstract":"Vitrimer is a new class of sustainable polymers with the ability of self-healing through rearrangement of dynamic covalent adaptive networks. However, a limited choice of constituent molecules restricts their property space, prohibiting full realization of their potential applications. Through a combination of molecular dynamics (MD) simulations and machine learning (ML), particularly a novel graph variational autoencoder (VAE) model, we establish a method for generating novel vitrimers and guide their inverse design based on desired glass transition temperature (Tg). We build the first vitrimer dataset of one million and calculate Tg on 8,424 of them by high-throughput MD simulations calibrated by a Gaussian process model. The proposed VAE employs dual graph encoders and a latent dimension overlapping scheme which allows for individual representation of multi-component vitrimers. 
By con...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Ecology and environment","Materials science","Physics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vidtok-a-versatile-and-open-source-video-tokenizer","title":"VidTok: A Versatile and Open-Source Video Tokenizer","url":"https://www.microsoft.com/en-us/research/publication/vidtok-a-versatile-and-open-source-video-tokenizer/","published":"2024-12-16","authors":["Anni Tang","Tianyu He","Junliang Guo","Xinle Cheng","Li Song","Jiang Bian"],"abstract":"Encoding video content into compact latent tokens has become a fundamental step in video generation and understanding, driven by the need to address the inherent redundancy in pixel-level representations. Consequently, there is a growing demand for high-performance, open-source video tokenizers as video-centric research gains prominence. We introduce VidTok, a versatile video tokenizer that delivers state-of-the-art performance in both continuous and discrete tokenizations. 
VidTok incorporates several key advancements over existing approaches: 1) model architecture such as convolutional layers and up/downsampling modules; 2) to address the training instability and codebook collapse commonly associated with conventional Vector Quantization (VQ), we integrate Finite Scalar Quantization (FSQ) into discrete video tokenization; 3) improved training strategies, including a two-stage training p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Computer science","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4405414691","title":"Dynamic text prompt joint multimodal features for accurate plant disease image captioning","url":"https://doi.org/10.1007/s00371-024-03729-0","published":"2024-12-16","authors":["Fangfang Liang","Zilong Huang","Wenjian Wang","Zhenxue He","Qing En"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s00371-024-03729-0","openalex_id":"https://openalex.org/W4405414691","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Baidu (China)","Carleton University","Hebei Agricultural University"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9656143188476562},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.825594425201416},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6162418723106384},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5174051523208618},{"id":"https://openalex.org/C3019235130","display_name":"Plant disease","score":0.5094636082649231},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.42270779609680176},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42215052247047424},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3890606462955475}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/attribute-structuring-improves-llm-based-evaluation-of-clinical-text-summaries","title":"Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries","url":"https://www.microsoft.com/en-us/research/publication/attribute-structuring-improves-llm-based-evaluation-of-clinical-text-summaries/","published":"2024-12-15","authors":["Zelalem Gero","Chandan Singh","Yiqing Xie","Sheng Zhang","Tristan Naumann","Jianfeng Gao","Hoifung Poon"],"abstract":"Summarizing clinical text is crucial in health decision-support and clinical research. Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation, especially in safety-critical domains such as health. Holistically evaluating text summaries is challenging because they may contain unsubstantiated information. Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process. 
It decomposes the evaluation process into a grounded procedure that uses an LLM for relatively simple structuring and scoring tasks, rather than the full task of holistic summary evaluation. Experiments show that AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization. Additionally, AS...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computation and Language","Computer science","Healthcare","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/maira-seg-enhancing-radiology-report-generation-with-segmentation-aware-multimodal-large-language-models","title":"MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/maira-seg-enhancing-radiology-report-generation-with-segmentation-aware-multimodal-large-language-models/","published":"2024-12-15","authors":["Harshita Sharma","Valentina Salvatelli","Shaury Srivastav","Kenza Bouzid","Shruthi Bannur","Daniel Coelho de Castro","Maximilian Ilse","Sam Bond-Taylor","Mercy Ranjit","Fabian Falck","Fernando Pérez-García","Anton Schwaighofer"],"abstract":"There is growing interest in applying AI to radiology report generation, particularly for chest X-rays (CXRs). 
This paper investigates whether incorporating pixel-level information through segmentation masks can improve fine-grained image interpretation of multimodal large language models (MLLMs) for radiology report generation. We introduce MAIRA-Seg, a segmentation-aware MLLM framework designed to utilize semantic segmentation masks alongside CXRs for generating radiology reports. We train expert segmentation models to obtain mask pseudolabels for radiology-specific structures in CXRs. Subsequently, building on the architectures of MAIRA, a CXR-specialised model for report generation, we integrate a trainable segmentation tokens extractor that leverages these mask pseudolabels, and employ mask-aware prompting to generate draft radiology reports. Our experiments on the publicly availabl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406611202","title":"LLM Enhanced Machine Learning Estimators for Classification","url":"https://doi.org/10.1109/wsc63780.2024.10838779","published":"2024-12-15","authors":["Yuhang Wu","Yingfei Wang","Chu Wang","Zeyu Zheng"],"abstract":"Pre-trained large language models (LLM) have emerged as a powerful tool for simulating various scenarios and generating informative output given specific instructions and multimodal input. 
In this work, we analyze the specific use of LLM to enhance a classical supervised machine learning method for classification problems. We propose a few approaches to integrate LLM into a classical machine learning estimator to further enhance the prediction performance. We examine the performance of the proposed approaches through both standard supervised learning binary classification tasks, and a transfer learning task where the test data observe distribution changes compared to the training data. Numerical experiments using four publicly available datasets are conducted and suggest that using LLM to enhance classical machine learning estimators can provide significant improvement on prediction perf...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wsc63780.2024.10838779","openalex_id":"https://openalex.org/W4406611202","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","University of California, Berkeley","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7097960710525513},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.6122682094573975},{"id":"https://openalex.org/C185429906","display_name":"Estimator","score":0.585904061794281},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.567904531955719},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.1709437072277069},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.12003439664840698}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/modality-driven-design-for-multi-step-dexterous-manipulation-insights-from-neuroscience","title":"Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience","url":"https://www.microsoft.com/en-us/research/publication/modality-driven-design-for-multi-step-dexterous-manipulation-insights-from-neuroscience/","published":"2024-12-14","authors":["Naoki Wake","Atsushi Kanehira","Daichi Saito","Jun Takamatsu","Kazuhiro Sasabuchi","Hideki Koike","Katsushi Ikeuchi"],"abstract":"Multi-step dexterous manipulation is a fundamental skill in household scenarios, yet remains an underexplored area in robotics. This paper proposes a modular approach, where each step of the manipulation process is addressed with dedicated policies based on effective modality input, rather than relying on a single end-to-end model. To demonstrate this, a dexterous robotic hand performs a manipulation task involving picking up and rotating a box. Guided by insights from neuroscience, the task is decomposed into three sub-skills, 1)reaching, 2)grasping and lifting, and 3)in-hand rotation, based on the dominant sensory modalities employed in the human brain. Each sub-skill is addressed using distinct methods from a practical perspective: a classical controller, a Vision-Language-Action model, and a reinforcement learning policy with force feedback, respectively. 
We tested the pipeline on a....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Human-computer interaction","Computer science","Computer Vision and Pattern Recognition","Embodied AI","Robotics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scbench-a-kv-cache-centric-analysis-of-long-context-methods","title":"SCBench: A KV Cache-Centric Analysis of Long-Context Methods","url":"https://www.microsoft.com/en-us/research/publication/scbench-a-kv-cache-centric-analysis-of-long-context-methods/","published":"2024-12-13","authors":["Yucheng Li","Huiqiang Jiang","Qianhui Wu","Xufang Luo","Surin Ahn","Chengruidong Zhang","Amir H. Abdi","Dongsheng Li","Jianfeng Gao","Yuqing Yang","Lili Qiu"],"abstract":"Long-context LLMs have enabled numerous downstream applications but also introduced significant challenges related to computational and memory efficiency. To address these challenges, optimizations for long-context inference have been developed, centered around the KV cache. However, existing benchmarks often evaluate in single-request, neglecting the full lifecycle of the KV cache in real-world use. This oversight is particularly critical, as KV cache reuse has become widely adopted in LLMs inference frameworks, such as vLLM and SGLang, as well as by LLM providers, including OpenAI, Microsoft, Google, and Anthropic. 
To address this gap, we introduce SCBench(SharedContextBench), a comprehensive benchmark for evaluating long-context methods from a KV cachecentric perspective: 1) KV cache generation, 2) KV cache compression, 3) KV cache retrieval, 4) KV cache loading. Specifically, SCBench...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":104,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computation and Language","large language models","Machine learning","1970-01-01","LLM","memory","retrieval","efficient","compression","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2412.10302","title":"DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding","url":"https://huggingface.co/papers/2412.10302","published":"2024-12-13","authors":["DeepSeek"],"abstract":"We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processing high-resolution images with different aspect ratios. For the language component, we leverage DeepSeekMoE models with the Multi-head Latent Attention mechanism, which compresses Key-Value cache into latent vectors, to enable efficient inference and high throughput. 
Trained on an improved vision-language dataset, DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, De...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","deepseek-ai","efficient"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W4412610612","title":"StAR: Learning on Text-Attributed Graphs with Structure-Aware Rationales","url":"https://doi.org/10.1109/hpcc64274.2024.00055","published":"2024-12-13","authors":["Zheyuan Zhang","Song Wang","Jingguo Ge","Yiming Xu","Yulei Wu","Jifei Wen","Chang Liu"],"abstract":"In recent years, the integration of Large Language Models (LLMs) with graph neural networks (GNNs) has opened new avenues in handling Text-Attributed Graphs (TAGs). This paper presents a novel approach leveraging LLMs for tackling TAG node classification problems, focusing on text augmentation and structural information enhancement through neighbor information integration. Our method employs a supervised fine-tuning process for LLMs with generated structure-ware rationales that involve structural information from TAGs. Through a combination of structure-aware rationale generation and alignment training, we enhance the learning and integration of graph structural information. 
We demonstrate the effectiveness of our approach across multiple datasets, showcasing improvements in node classification accuracy. Our contributions include the development of a self-guided approach to generate high...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/hpcc64274.2024.00055","openalex_id":"https://openalex.org/W4412610612","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Bristol","University of Chinese Academy of Sciences","University of Virginia","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6439306139945984},{"id":"https://openalex.org/C2780897414","display_name":"Star (game theory)","score":0.5505830645561218},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3854338526725769},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.33253517746925354},{"id":"https://openalex.org/C44870925","display_name":"Astrophysics","score":0.17916783690452576},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.16581979393959045}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/metis-fast-quality-aware-rag-systems-with-configuration-adaptation-tr","title":"METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation (TR)","url":"https://www.microsoft.com/en-us/research/publication/metis-fast-quality-aware-rag-systems-with-configuration-adaptation-tr/","published":"2024-12-12","authors":["Siddhant Ray","Rui Pan","Zhuohan Gu","Kuntai Du","Shaoting 
Feng","Ganesh Ananthanarayanan","Ravi Netravali","Junchen Jiang"],"abstract":"RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge often improves generation quality at the expense of response delay. Prior work either reduces the response delay (through better scheduling of RAG queries) or strives to maximize quality (which involves tuning the RAG workflow), but they fall short in optimizing the tradeoff between the delay and quality of RAG responses. This paper presents METIS, the first RAG system that jointly schedules queries and adapts the key RAG configurations of each query, such as the number of retrieved text chunks and synthesis methods, in order to balance quality optimization and response delay reduction. Using 4 popular RAG-QA datasets, we show that compared with the state-of-the-art RAG optimization schemes, METIS reduces the generation latency b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Tech Report","Artificial intelligence","Systems and networking","Computer science","large language models","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/phi-4-technical-report","title":"Phi-4 Technical Report","url":"https://www.microsoft.com/en-us/research/publication/phi-4-technical-report/","published":"2024-12-12","authors":["Marah I Abdin","Jyoti Aneja","Harkirat Behl","Sébastien Bubeck","Ronen Eldan","Suriya 
Gunasekar","Michael Harrison","Russell J. Hewett","Mojan Javaheripi","Piero Kauffmann","James R. Lee","Yin Tat Lee"],"abstract":"We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size– especially on reasoning-focused benchmarks– due to improved data, training curriculum, and innovations in the post-training scheme.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Tech Report","Artificial intelligence","language model","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/deeper-evaluation-of-a-single-cell-foundation-model","title":"Deeper evaluation of a single-cell foundation model","url":"https://www.microsoft.com/en-us/research/publication/deeper-evaluation-of-a-single-cell-foundation-model/","published":"2024-12-12","authors":["Rebecca Boiarsky","Nalini M. 
Singh","Alejandro Buendia","Ava P. Amini","Gad Getz","David Sontag"],"abstract":"Large-scale foundation models, which are pre-trained on massive, unlabelled datasets and subsequently fine-tuned on specific tasks, have recently achieved unparalleled success on a wide array of applications, including in healthcare and biology . The success of these models has showcased the power of leveraging generalizable features and contextual understanding to improve a model’s performance. Single-cell bidirectional encoder representations from transformers (scBERT) by Yang et al. 7 is one of several recently developed foundation models to learn representations of single-cell RNA-sequencing data. Yang et al. pre-trained their model on 1.12 million cells to impute masked gene-expression values and characterize the performance of their model on a fine-tuning task to annotate cell types. We reproduce their results, and provide additional baselines and ablation studies (that is, remove....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Article (Journal)","Medical, health and genomics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:c1797a2bcf7bf58b","title":"Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models","url":"https://ai.meta.com/research/publications/zero-shot-whole-body-humanoid-control-via-behavioral-foundation-models/","published":"2024-12-12","authors":["Andrea Tirinzoni","Ahmed Touati","Jesse Farebrother","Mateusz Guzek","Anssi Kanervisto","Yingchen Xu","Alessandro Lazaric","Matteo 
Pirotta"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Reinforcement Learning"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=7"}},{"id":"openalex:W4405334847","title":"An Experimental Evaluation of LLM on Image Classification","url":"https://doi.org/10.1007/978-981-96-1242-0_37","published":"2024-12-12","authors":["Jiaxuan Wu","Xushuo Tang","Zhengyi Yang","Kongzhang Hao","Longbin Lai","Yongfei Liu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-1242-0_37","openalex_id":"https://openalex.org/W4405334847","cited_by_count":8,"quality_score":49,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Australian Wool Innovation (Australia)","UNSW Sydney","University of California, Irvine"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8682947158813477},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5200331211090088},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.513048529624939},{"id":"https://openalex.org/C75294576","display_name":"Contextual image classification","score":0.43278875946998596},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.41557061672210693},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.33986592292785645}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4405323124","title":"Fine-Tuning Llama 3 for Sentiment Analysis: Leveraging AWS Cloud for Enhanced Performance","url":"https://doi.org/10.1007/s42979-024-03473-1","published":"2024-12-12","authors":["Shantanu Kumar","Shruti Singh"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s42979-024-03473-1","openalex_id":"https://openalex.org/W4405323124","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Seattle University","Washington State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.831924557685852},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.7618492245674133},{"id":"https://openalex.org/C66402592","display_name":"Sentiment analysis","score":0.7410200834274292},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.6091874837875366},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5331603288650513},{"id":"https://openalex.org/C8642999","display_name":"Hyperparameter","score":0.5127009153366089},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4864494502544403},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47162795066833496}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/elevating-visual-perception-in-multimodal-llms-with-visual-embedding-distillation","title":"Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation","url":"https://www.microsoft.com/en-us/research/publication/elevating-visual-perception-in-multimodal-llms-with-visual-embedding-distillation/","published":"2024-12-11","authors":["Jitesh Jain","Zhengyuan Yang","Humphrey Shi","Jianfeng Gao","Jianwei Yang"],"abstract":"In recent times, the standard practice for developing MLLMs is to feed features from vision encoder(s) into the LLM and train with natural language supervision. This approach often causes models to lean towards language comprehension and undermine the rich visual perception signals present in the data, which are critical for tasks involving spatial reasoning in the domain of embodied AI and robotics. Is it possible to optimize both at the same time? In this work, we propose VisPer-LM, the first approach that infuses visual perception knowledge from expert vision encoders into the LLM's (of an MLLM) hidden representations. We start by investigating MLLMs trained solely with natural language supervision and identify a positive correlation between the quality of visual representations within these models and their downstream performance. 
Given this insight, we formulate the objective during...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Multimodal Large Language Models","1970-01-01","LLM","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/is-a-picture-worth-a-thousand-words-delving-into-spatial-reasoning-for-vision-language-models","title":"Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models","url":"https://www.microsoft.com/en-us/research/publication/is-a-picture-worth-a-thousand-words-delving-into-spatial-reasoning-for-vision-language-models/","published":"2024-12-11","authors":["Vibhav Vineet","Xin Wang","Neel Joshi"],"abstract":"Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains. Despite this promise, spatial understanding and reasoning—a fundamental component of human cognition—remains under-explored. We develop novel benchmarks that cover diverse aspects of spatial reasoning such as relationship understanding, navigation, and counting. We conduct a comprehensive evaluation of competitive language and vision-language models. 
Our findings reveal several counter-intuitive insights that have been overlooked in the literature: (1) Spatial reasoning poses significant challenges where competitive models can fall behind random guessing; (2) Despite additional visual input, VLMs often under-perform compared to their LLM counterparts; (3) When both textual and visual information is available, multi-modal language models become....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","spatial reasoning","Vision-language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4405232231","title":"Rethinking Resource Management in Edge Learning: A Joint Pre-Training and Fine-Tuning Design Paradigm","url":"https://doi.org/10.1109/twc.2024.3510418","published":"2024-12-11","authors":["Zhonghao Lyu","Yuchen Li","Guangxu Zhu","Jie Xu","H. Vincent Poor","Shuguang Cui"],"abstract":"In some applications, edge learning is experiencing a shift in focus from conventional learning from scratch to two-stage learning combining pre-training and task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on local pre-stored general data, and then task-specific fine-tuning is performed at edge devices based on the pre-trained model via federated edge learning. 
For the two-stage learning model, we first analyze the convergence behavior (in terms of the average squared gradient norm bound), which characterizes the impacts of various system parameters, such as the number of learning rounds and batch sizes in the two stages, on the convergence rate. Based on our analytical results, we then...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/twc.2024.3510418","openalex_id":"https://openalex.org/W4405232231","cited_by_count":21,"quality_score":58,"matched_keywords":[],"author_affiliations":["Baidu (China)","Chinese University of Hong Kong, Shenzhen","Princeton University","Shenzhen Research Institute of Big Data"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7176561951637268},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5721217393875122},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.5429021120071411},{"id":"https://openalex.org/C2780609101","display_name":"Resource management (computing)","score":0.4950534403324127},{"id":"https://openalex.org/C162307627","display_name":"Enhanced Data Rates for GSM Evolution","score":0.43902334570884705},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3876357972621918},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.36938929557800293},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.20199903845787048}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":21}},{"id":"openalex:W4407317366","title":"FPGA Design for Multimodal Sensor Data Fusion in Autonomous 
Robots","url":"https://doi.org/10.1109/icscna63714.2024.10863838","published":"2024-12-11","authors":["Muthukumaran Vaithianathan","Shivakumar Udkar","Deepanjan Roy","Manjunath Reddy","S. Rajasekaran"],"abstract":"This research study introduces a novel Field Programmable Gate Array (FPGA) design for autonomous robotics that integrates data from multiple sensors to improve their operational efficiency and decision-making. The proposed technique accomplishes real-time performance by integrating information from multiple sensors, such as LiDAR, cameras, and inertial measurement units (IMUs), using the parallel processing capabilities of FPGAs. Sensor interfaces, control logic, and complex data integration algorithms are integrated into the design. Additionally, the design is adaptable and can be implemented with FPGAs. By employing Kalman filters for state prediction and decision trees for contextual classification, the design enhances accuracy and significantly reduces latency. According to the testing results, the FPGA-based system outperforms earlier existing systems in terms of processing speed a...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icscna63714.2024.10863838","openalex_id":"https://openalex.org/W4407317366","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Market Matters","Nvidia (United States)","Qualcomm (United States)","Samsung (United States)"],"concepts":[{"id":"https://openalex.org/C42935608","display_name":"Field-programmable gate array","score":0.7572817802429199},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7329621315002441},{"id":"https://openalex.org/C33954974","display_name":"Sensor 
fusion","score":0.6390734910964966},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.6008207201957703},{"id":"https://openalex.org/C149635348","display_name":"Embedded system","score":0.563696026802063},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.4438101649284363},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4093538522720337},{"id":"https://openalex.org/C79403827","display_name":"Real-time computing","score":0.3776960074901581}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/turboattention-efficient-attention-approximation-for-high-throughputs-llms","title":"TurboAttention: Efficient Attention Approximation For High Throughputs LLMs","url":"https://www.microsoft.com/en-us/research/publication/turboattention-efficient-attention-approximation-for-high-throughputs-llms/","published":"2024-12-10","authors":["Hao Kang","Srikant Bharadwaj","James Hensman","Tushar Krishna","Victor Ruehle","Saravan Rajmohan"],"abstract":"Large language model (LLM) inference demands significant amount of computation and memory, especially in the key attention mechanism. While techniques, such as quantization and acceleration algorithms, like FlashAttention, have improved efficiency of the overall inference, they address different aspects of the problem: quantization focuses on weight-activation operations, while FlashAttention improves execution but requires high-precision formats. Recent Key-value (KV) cache quantization reduces memory bandwidth but still needs floating-point dequantization for attention operation. We present TurboAttention, a comprehensive approach to enable quantized execution of attention that simultaneously addresses both memory and computational efficiency. 
Our solution introduces two key innovations: FlashQ, a headwise attention quantization technique that enables both compression of KV cache and q...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","LLM","language model","memory","efficient","compression","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llms-can-be-fooled-into-labelling-a-document-as-relevant-best-cafe-near-me-this-paper-is-perfectly-relevant","title":"LLMs can be Fooled into Labelling a Document as Relevant (Best café near me; this paper is perfectly relevant)","url":"https://www.microsoft.com/en-us/research/publication/llms-can-be-fooled-into-labelling-a-document-as-relevant-best-cafe-near-me-this-paper-is-perfectly-relevant/","published":"2024-12-10","authors":["Marwah Alaofi","Paul Thomas","Falk Scholer","Mark Sanderson"],"abstract":"Large language models (LLMs) are increasingly being used to assess the relevance of information objects. This work reports on experiments to study the labelling of short texts for relevance, using multiple open-source and proprietary LLMs. 
While the overall agreement of some LLMs with human judgements is comparable to human-to-human agreement measured in previous research, LLMs are more likely to label passages as relevant compared to human judges, indicating that LLM labels denoting non-relevance are more reliable than those indicating relevance.This observation prompts us to further examine cases where human judges and LLMs disagree, particularly when the human judge labels the passage as non-relevant and the LLM labels it as relevant. Results show a tendency for many LLMs to label passages that include the original query terms as relevant. We therefore conduct experiments to inject qu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Search and information retrieval","Information retrieval","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:ngz1girvuf1l9hn8c0glra9q","title":"Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models","url":"https://machinelearning.apple.com/research/gender-bias-transfer","published":"2024-12-10","authors":["Natalie Mackraz","Nivedha Sivakumar","Samira Khorshidi","Krishna Patel","Barry-John Theobald","Luca Zappella","Nicholas Apostoloff"],"abstract":"Equal 
Contributors","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4405207273","title":"SparseCoder: Advancing source code analysis with sparse attention and learned token pruning","url":"https://doi.org/10.1007/s10664-024-10558-1","published":"2024-12-10","authors":["Xueqi Yang","Mariusz Jakubowski","Kang Li","Haojie Yu","Tim Menzies"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10664-024-10558-1","openalex_id":"https://openalex.org/W4405207273","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","North Carolina State University","North Central State College"],"concepts":[{"id":"https://openalex.org/C195956108","display_name":"Quadratic growth","score":0.7364501953125},{"id":"https://openalex.org/C3826847","display_name":"FLOPS","score":0.695020318031311},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6761574745178223},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.549216628074646},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.5121871829032898},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.48224076628685},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.458809494972229},{"id":"https://openalex.org/C43126263","display_name":"Source 
code","score":0.4193825125694275}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4405216627","title":"Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding","url":"https://doi.org/10.32388/i3vw8j","published":"2024-12-10","authors":["Ziyin Zhang","Jiahao Xu","Tian Liang","Xingyu Chen","Zhiwei He","Rui Wang","Zhaopeng Tu"],"abstract":"Speculative Decoding (SD) has become an important technique in accelerating the inference speed of large language models. Conventional SD methods employ a fixed draft length, which ignores the token generation difficulty across tasks. Consequently, in this paper, we address such an issue and introduce SVIP - a difficulty-aware dynamic draft length policy for speculative decoding systems. Based on a theoretical lower bound of draft token acceptance rate and its inference-time approximation, SVIP adaptively determines the lengths of draft sequences based on the entropy of each draft token distribution. Experimental results on mainstream SD benchmarks and frameworks demonstrate the superior performance of SVIP, achieving up to 20% walltime speedup on SpecBench over baseline SD methods and 60% speedup on MT-Bench for long-form generation of up to 8K tokens. 
Moreover, SVIP is totally training...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.32388/i3vw8j","openalex_id":"https://openalex.org/W4405216627","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["China University of Mining and Technology","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.8827941417694092},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.860278308391571},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.7647958993911743},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7377108335494995},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.566389799118042},{"id":"https://openalex.org/C147297375","display_name":"Look-ahead","score":0.48045891523361206},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4097922444343567},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.3114379048347473}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/minference-1-0-accelerating-pre-filling-for-long-context-llms-via-dynamic-sparse-attention","title":"MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention","url":"https://www.microsoft.com/en-us/research/publication/minference-1-0-accelerating-pre-filling-for-long-context-llms-via-dynamic-sparse-attention/","published":"2024-12-09","authors":["Huiqiang Jiang","Yucheng Li","Chengruidong Zhang","Qianhui Wu","Xufang Luo","Surin 
Ahn","Zhenhua Han","Amir H. Abdi","Dongsheng Li","Chin-Yew Lin","Yuqing Yang","Lili Qiu"],"abstract":"The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a single A100 GPU. Existing methods for speeding up prefilling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs. To address this gap, we introduce MInference (Million-tokens Inference), a sparse calculation method designed to accelerate pre-filling of long-sequence processing. Specifically, we identify three unique patterns in long-context attention matrices (the A-shape, Vertical-Slash, and Block-Sparse) that can be leveraged for efficient sparse computation on GPUs. We determine the optimal pattern for each atte...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Systems and networking","Computer science","large language models","1970-01-01","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/moe-cap-benchmarking-cost-accuracy-and-performance-of-sparse-mixture-of-experts-systems","title":"MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse 
Mixture-of-Experts Systems","url":"https://www.microsoft.com/en-us/research/publication/moe-cap-benchmarking-cost-accuracy-and-performance-of-sparse-mixture-of-experts-systems/","published":"2024-12-09","authors":["Yao Fu","Yinsicheng Jiang","Yeqi Huang","Ping Nie","Zhan Lu","Leyang Xue","Congjie He","Man-Kit Sit","Jilong Xue","Li Dong","Ziming Miao","Kai Zou"],"abstract":"The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy, and Performance (CAP), making trade-offs inevitable. Existing benchmarks often fail to capture these trade-offs accurately, complicating practical deployment decisions. To address this, we introduce MoE-CAP, a benchmark specifically designed for MoE systems. Our analysis reveals that achieving an optimal balance across CAP is difficult with current hardware; MoE systems typically optimize two of the three dimensions at the expense of the third-a dynamic we term the MoE-CAP trade-off. To visualize this, we propose the CAP Radar Diagram. 
We further introduce sparsity-aware performance metrics-Sparse Memory Bandwidth Utilization (S-MBU) and Sparse Model....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/chimera-accurate-retrosynthesis-prediction-by-ensembling-models-with-diverse-inductive-biases","title":"Chimera: Accurate retrosynthesis prediction by ensembling models with diverse inductive biases","url":"https://www.microsoft.com/en-us/research/publication/chimera-accurate-retrosynthesis-prediction-by-ensembling-models-with-diverse-inductive-biases/","published":"2024-12-09","authors":["Krzysztof Maziarz","Guoqing Liu","Hubert Misztela","Aleksei Kornev","Piotr Gaiński","Holger Hoefling","Mike Fortunato","Rishi Gupta","Marwin Segler"],"abstract":"Planning and conducting chemical syntheses remains a major bottleneck in the discovery of functional small molecules, and prevents fully leveraging generative AI for molecular inverse design. While early work has shown that ML-based retrosynthesis models can predict reasonable routes, their low accuracy for less frequent, yet important reactions has been pointed out. As multi-step search algorithms are limited to reactions suggested by the underlying model, the applicability of those tools is inherently constrained by the accuracy of retrosynthesis prediction. 
Inspired by how chemists use different strategies to ideate reactions, we propose Chimera: a framework for building highly accurate reaction models that combine predictions from diverse sources with complementary inductive biases using a learning-based ensembling strategy. We instantiate the framework with two newly developed model...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Chemistry","Computer science","Drug discovery","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407832377","title":"Warm-Starting Contextual Bandits Under Latent Reward Scaling","url":"https://doi.org/10.1109/icdm59182.2024.00043","published":"2024-12-09","authors":["Bastian Oetomo","R. Malinga Perera","Renata Borovica‐Gajic","Benjamin I. P. Rubinstein"],"abstract":"Multi-armed bandits have long been known to enjoy optimal long-term performance, with sub-linear cumulative regret bounds standard. Recent developments take the performance of early rounds into consideration by ‘warm-starting’ bandits via incorporating pre-existing information into initialisation. Unfortunately, existing warm-start approaches are brittle to differences in the reward distributions between pretraining and deployment phases. This paper considers one such contextual bandit setting, where the same linear relationship relates contexts and rewards in pretraining and deployment phases, but only up to (unknown) constant scaling. 
A probabilistic model is proposed to capture this novel transfer learning problem, and a simple algorithm is derived as a maximum a posteriori point estimate. We present a regret bound for our method, with empirical evaluation across a range of datasets...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icdm59182.2024.00043","openalex_id":"https://openalex.org/W4407832377","cited_by_count":0,"quality_score":41,"matched_keywords":["long-term"],"author_affiliations":["Amazon (United States)","University of Melbourne"],"concepts":[{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.6358230710029602},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6157200932502747},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3924259543418884},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.3593031167984009},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.24937692284584045},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.18974444270133972},{"id":"https://openalex.org/C2524010","display_name":"Geometry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/machine-unlearning-doesnt-do-what-you-think-lessons-for-generative-ai-policy-research-and-practice","title":"Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and 
Practice","url":"https://www.microsoft.com/en-us/research/publication/machine-unlearning-doesnt-do-what-you-think-lessons-for-generative-ai-policy-research-and-practice/","published":"2024-12-08","authors":["A. F. Cooper","Christopher A. Choquette-Choo","Miranda Bogen","Matthew Jagielski","Katja Filippova","Ken Ziyu Liu","Alex Chouldechova","Jamie Hayes","Yangsibo Huang","Niloofar Mireshghallah","Ilia Shumailov","Eleni Triantafillou"],"abstract":"We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. These aspirations are both numerous and varied, motivated by issues that pertain to privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of targeted information from a generative-AI model's parameters, e.g., a particular individual's personal data or in-copyright expression of Spiderman that was included in the model's training data. 
Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of \"Spiderman.\" Both of these goals--the targeted removal of information from a model and...","companies":["Microsoft","Google/DeepMind"],"matched_orgs":["Microsoft","Google/DeepMind"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.2139/ssrn.5060253","openalex_id":"https://openalex.org/W4407225086","cited_by_count":5,"quality_score":89,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01"],"author_affiliations":["Microsoft","Cornell University","Data & Society Research Institute","Eastern University","Georgetown University","Google (United States)","Google DeepMind (United Kingdom)","Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research New York City (United States)","Stanford Medicine","Stanford University","University of Michigan","University of Pennsylvania","University of Toronto","University of Washington","Virginia College","West Virginia University","Yale University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:231","title":"MaskBit: Embedding-free Image Generation via Bit Tokens","url":"https://seed.bytedance.com/en/research/maskbit-embedding-free-image-generation-via-bit-tokens","published":"2024-12-08","authors":["Mark Weber","Lijun Yu","Qihang Yu","Xueqing Deng","Xiaohui Shen","Daniel Cremers","Liang-Chieh Chen"],"abstract":"Masked transformer models for class-conditional image generation have become a compelling 
alternative to diffusion models. Typically comprising two stages - an initial VQGAN model for transitioning between latent space and image space, and a subsequent Transformer model for image generation within latent space - these frameworks offer promising avenues for image synthesis. In this study, we present two primary contributions: Firstly, an empirical and systematic examination of VQGANs, leading to a modernized VQGAN. Secondly, a novel embedding-free generation network operating directly on bit tokens - a binary quantized representation of tokens with rich semantics. The first contribution furnishes a transparent, reproducible, and high-performing VQGAN model, enhancing accessibility and matching the performance of current state-of-the-art methods while revealing previously undisclosed detai...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","ICLR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/machine-unlearning-doesnt-do-what-you-think-lessons-for-generative-ai-policy-research-and-practice-tr","title":"Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice (TR)","url":"https://www.microsoft.com/en-us/research/publication/machine-unlearning-doesnt-do-what-you-think-lessons-for-generative-ai-policy-research-and-practice-tr/","published":"2024-12-08","authors":["A. Feder Cooper","Christopher A. 
Choquette-Choo","Miranda Bogen","Matthew Jagielski","Katja Filippova","Ken Ziyu Liu","Alexandra Chouldechova","Jamie Hayes","Yangsibo Huang","Niloofar Mireshghallah","Ilia Shumailov","Eleni Triantafillou"],"abstract":"We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. These aspirations are both numerous and varied, motivated by issues that pertain to privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of targeted information from a generative-AI model's parameters, e.g., a particular individual's personal data or in-copyright expression of Spiderman that was included in the model's training data. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of \"Spiderman.\" Both of these goals--the targeted removal of information from a model a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Tech Report","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:i9h0v9mzlfg162sb7ca6495n","title":"Momentum Approximation in Asynchronous Private Federated Learning","url":"https://machinelearning.apple.com/research/momentum-approximation","published":"2024-12-08","authors":["Tao Yu","Congzheng Song","Jianyu 
Wang","Mona Chitnis"],"abstract":"This paper was accepted for presentation at the International Workshop on Federated Foundation Models (FL@FM-NeurIPS'24), held in conjunction with NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4413179924","title":"Rethinking Data: Towards Better Performing Domain-Specific Small Language Models","url":"https://doi.org/10.1109/gcwkshp64532.2024.11101189","published":"2024-12-08","authors":["Boris Nazarov","Darya Frolova","Yackov Lubarsky","Alexei Gaissinski","Pavel Kisilev"],"abstract":"Fine-tuning of Large Language Models (LLMs) for downstream tasks, performed on domain-specific data has shown significant promise. However, commercial use of such LLMs is limited by the high computational cost required for their deployment at scale. On the other hand, small Language Models (LMs) are much more cost effective but have subpar performance in a similar setup. This paper presents our approach to fine-tuning a small LM, that reaches high accuracy in multiple-choice question answering task. We achieve this by improving data quality at each stage of the LM training pipeline. In particular, we start with data structuring resulting in extraction of compact, semantically meaningful text chunks used by a retriever. This allows more efficient knowledge digestion by the LM. 
Further, we improve the retrieved context by training a lightweight Chunk Re-Ranker (CRR) that generates more acc...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/gcwkshp64532.2024.11101189","openalex_id":"https://openalex.org/W4413179924","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7044066190719604},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4924672544002533},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.4225451350212097},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41448426246643066},{"id":"https://openalex.org/C135257023","display_name":"Domain-specific language","score":0.4115237593650818},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40863001346588135},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.360343337059021},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.21282640099525452}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4405141746","title":"VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model","url":"https://doi.org/10.1007/978-981-96-0917-8_4","published":"2024-12-07","authors":["Jinze Yang","Haoran Wang","Zining Zhu","Chenglong Liu","Meng Wu","Mingming 
Sun"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-0917-8_4","openalex_id":"https://openalex.org/W4405141746","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Beijing Institute of Mathematical Sciences and Applications","University of Chinese Academy of Sciences","University of Michigan"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8613526225090027},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4910848140716553},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44639408588409424},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.43932777643203735},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43106886744499207},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.41332152485847473},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.41328105330467224},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.377687931060791}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"apple:of2yi1m3ak2m2j0l0pq0em9n","title":"Fairness Dynamics During Training","url":"https://machinelearning.apple.com/research/fairness-dynamics","published":"2024-12-06","authors":["Krishna Patel","Nivedha Sivakumar","Barry-John Theobald","Luca Zappella","Nicholas Apostoloff"],"abstract":"This paper was accepted at the Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of 
Generative AI Workshop at NeurIPS 2024, and was also accepted at the Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI — Springer Special Issue 2025.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:69a46d58f8fd2bfe","title":"OpenAI o1 System Card","url":"https://openai.com/index/openai-o1-system-card","published":"2024-12-05","authors":["OpenAI"],"abstract":"This report outlines the safety work carried out prior to releasing OpenAI o1 and o1-mini, including external red teaming and frontier risk evaluations according to our Preparedness Framework.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Research"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:ojwhnsrs8wz6y2itipjwwlgj","title":"SALSA: Soup-Based Alignment Learning for Stronger Adaptation in RLHF","url":"https://machinelearning.apple.com/research/salsa-soup","published":"2024-12-05","authors":["Atoosa Chegini","Hamid Kazemi","Iman Mirzadeh","Dong Yin","Maxwell Horton","Moin Nabi","Mehrdad Farajtabar","Keivan Alizadeh"],"abstract":"This paper was accepted at the Fine-Tuning in Modern Machine Learning (FITML) Workshop at NeurIPS 
2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:zsehp87ay74plygkhkgrto8i","title":"How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts","url":"https://machinelearning.apple.com/research/how-easy","published":"2024-12-05","authors":["Yusu Qian","Haotian Zhang","Yinfei Yang","Zhe Gan"],"abstract":"The remarkable advancements in Multimodal Large Language Models (MLLMs) have not rendered them immune to challenges, particularly in the context of handling deceptive information in prompts, thus producing hallucinated responses under such conditions. 
To quantitatively assess this vulnerability, we present MAD-Bench, a carefully curated benchmark that contains 1000 test samples divided into 5 categories, such as non-existent objects, count of...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4405079013","title":"MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model","url":"https://doi.org/10.1007/978-3-031-72655-2_7","published":"2024-12-05","authors":["Muyao Niu","Xiaodong Cun","Xintao Wang","Yong Zhang","Ying Shan","Yinqiang Zheng"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72655-2_7","openalex_id":"https://openalex.org/W4405079013","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Bunkyo University","Tencent (China)","The University of Tokyo"],"concepts":[{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.8100208044052124},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8015366792678833},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.6825919151306152},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6763437390327454},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.6292065382003784},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4932641386985779},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.4742816090583801},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.44493168592453003}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4405078884","title":"Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos","url":"https://doi.org/10.1007/978-3-031-72655-2_25","published":"2024-12-05","authors":["Md. Mohaiminul Islam","Tushar Nagarajan","Huiyu Wang","Fu-Jen Chu","Kris Kitani","Gedas Bertasius","Xitong Yang"],"abstract":"","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72655-2_25","openalex_id":"https://openalex.org/W4405078884","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Meta (United States)","University of North Carolina Health Care"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8243420124053955},{"id":"https://openalex.org/C84653758","display_name":"Goal orientation","score":0.42458346486091614},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3784891664981842},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.34907668828964233},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.3472517132759094},{"id":"https://openalex.org/C187736073","display_name":"Management","score":0.08560141921043396},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4405016591","title":"3DSMILES-GPT: 3D molecular pocket-based generation with token-only large language model","url":"https://doi.org/10.1039/d4sc06864e","published":"2024-12-04","authors":["Jike Wang","Hao Luo","Rui Qin","Mingyang Wang","Xiaozhe Wan","Meijing Fang","Odin Zhang","Qiaolin Gou","Qun Su","Chao Shen","Ziyi You","Liwei Liu"],"abstract":"The generation of three-dimensional (3D) molecules based on target structures represents a cutting-edge challenge in drug discovery. Many existing approaches often produce molecules with invalid configurations, unphysical conformations, suboptimal drug-like qualities, limited synthesizability, and require extensive generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation that utilizes tokens exclusively. We treat both two-dimensional (2D) and 3D molecular representations as linguistic expressions, combining them through full-dimensional representations and pre-training the model on a vast dataset encompassing tens of millions of drug-like molecules. This token-only approach enables the model to comprehensively understand the 2D and 3D characteristics of large-scale molecules. 
Subsequently, we fine-tune the...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1039/d4sc06864e","openalex_id":"https://openalex.org/W4405016591","cited_by_count":34,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.7934520244598389},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6400371193885803},{"id":"https://openalex.org/C74187038","display_name":"Drug discovery","score":0.5669885873794556},{"id":"https://openalex.org/C32909587","display_name":"Molecule","score":0.49003350734710693},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.46647506952285767},{"id":"https://openalex.org/C2992829110","display_name":"First generation","score":0.42519962787628174},{"id":"https://openalex.org/C186060115","display_name":"Biological system","score":0.40540024638175964},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37599194049835205}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":34}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/redstone-curating-general-code-math-and-qa-data-for-large-language-models","title":"REDSTONE: Curating General, Code, Math, and QA Data for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/redstone-curating-general-code-math-and-qa-data-for-large-language-models/","published":"2024-12-04","authors":["Yaoyao Chang","Lei Cui","Li Dong","Shaohan Huang","Yangyu Huang","Yupan Huang","Scarlett Li","Tengchao Lv","Shuming 
Ma","Qinzheng Sun","Wenhui Wang","Furu Wei"],"abstract":"Pre-training Large Language Models (LLMs) on high-quality, meticulously curated datasets is widely recognized as critical for enhancing their performance and generalization capabilities. This study explores the untapped potential of Common Crawl as a comprehensive and flexible resource for pre-training LLMs, addressing both general-purpose language understanding and specialized domain knowledge. We introduce REDSTONE, an innovative and scalable pipeline engineered to extract and process data from Common Crawl, facilitating the creation of extensive and varied pre-training datasets. Unlike traditional datasets, which often require expensive curation and domain-specific expertise, REDSTONE leverages the breadth of Common Crawl to deliver datasets tailored to a wide array of domains. In this work, we exemplify its capability by constructing pre-training datasets across multiple fields, incl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Unpublished","Data platforms and analytics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4405014281","title":"Federated Black-box Prompt Tuning System for Large Language Models on the Edge","url":"https://doi.org/10.1145/3636534.3698856","published":"2024-12-04","authors":["Yiming Li","Jingwei Sun","Yudong Liu","Yuandong Zhang","Ang Li","Beidi Chen","Holger R. 
Roth","Daguang Xu","Tingjun Chen","Yiran Chen"],"abstract":"Federated learning (FL) offers a privacy-preserving way to train models across decentralized data. However, fine-tuning pre-trained language models (PLMs) in FL is challenging due to restricted model parameter access, high computational demands, and communication overheads. Our method treats large language models (LLMs) as black-box inference APIs, optimizing prompts with gradient-free methods. This approach, FedBPT, reduces exchanged variables, boosts communication efficiency, and minimizes computational and memory costs. We demonstrate the practical implementation of FedBPT on resource-limited edge devices, showcasing its ability to efficiently achieve collaborative on-device LLM fine-tuning.","companies":["Meta/FAIR","NVIDIA"],"matched_orgs":["Meta/FAIR","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3636534.3698856","openalex_id":"https://openalex.org/W4405014281","cited_by_count":2,"quality_score":59,"matched_keywords":["LLM","memory"],"author_affiliations":["Carnegie Mellon University","Duke University","Meta (United States)","Nvidia (United States)","University of Maryland, College Park"],"concepts":[{"id":"https://openalex.org/C94966114","display_name":"Black box","score":0.8616316914558411},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6713138222694397},{"id":"https://openalex.org/C162307627","display_name":"Enhanced Data Rates for GSM Evolution","score":0.6456762552261353},{"id":"https://openalex.org/C138236772","display_name":"Edge device","score":0.41219162940979004},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.24399727582931519},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.1604434847831726},{"id":"https://openalex.org/C79974875","display_name":"Cloud 
computing","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"bytedance-seed:149","title":"FullStack Bench: Evaluating LLMs as Full Stack Coders","url":"https://seed.bytedance.com/en/research/fullstack-bench-evaluating-llms-as-full-stack-coders","published":"2024-12-03","authors":["Siyao Liu","He Zhu","Jerry Liu","Shulin Xin","Aoyan Li","Rui Long","Li Chen","Jack Yang","Jinxiang Xia","Z.Y","Peng","Shukai Liu"],"abstract":"As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets only evaluate limited application domains. To address this gap, we have developed a comprehensive code evaluation dataset FullStack Bench focusing on full-stack programming, which encompasses a wide range of application domains (e.g., basic programming, dataanalysis, software engineering, mathematics, and machine learning). Besides, to assess multilingual programming capabilities, in FullStack Bench, we design real-world instructions and corresponding unit test cases from 16 widely-used programming languages to reflect real-world usage scenarios rather than simple translations. 
Moreover, we also release an effective code sandbox execution tool (i.e, SandboxFusion) supporting various programming lan...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Code Intelligence","Infrastructures","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"arxiv:2506.10443","title":"MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices","url":"http://arxiv.org/abs/2506.10443","published":"2024-12-03","authors":["Zhaode Wang","Jingbang Yang","Xinyu Qian","Shiwen Xing","Xiaotang Jiang","Chengfei Lv","Shengyu Zhang"],"abstract":"Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs. Consequently, edge device inference presents a promising solution. The primary challenges of edge inference include memory usage and inference speed. This paper introduces MNN-LLM, a framework specifically designed to accelerate the deployment of large language models on mobile devices. MNN-LLM addresses the runtime characteristics of LLMs through model quantization and DRAM-Flash hybrid storage, effectively reducing memory usage. 
It rearranges weights and inputs based on mobile CPU instruction sets and GPU characteristics while employing strategies such as multicore load balancing, mixed-precision floating-point operations, and geometric computations to enhance...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3700410.3702126","openalex_id":"https://openalex.org/W4405794828","cited_by_count":6,"quality_score":59,"matched_keywords":["LLM","language model","memory","quantization"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.7785872220993042},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7371021509170532},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7299237847328186},{"id":"https://openalex.org/C46743427","display_name":"Inference engine","score":0.5877580046653748},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.46389368176460266},{"id":"https://openalex.org/C186967261","display_name":"Mobile device","score":0.4217592775821686},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.32674163579940796},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2517944574356079}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"apple:q059qsrbkap4yeisiq3glhcb","title":"Towards Time-Series Reasoning with LLMs","url":"https://machinelearning.apple.com/research/towards-time","published":"2024-12-03","authors":["Winnie Chow","Lauren Gardiner","Haraldur T. Hallgrímsson","Maxwell A. 
Xu","Shirley You Ren"],"abstract":"Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance in time-series forecasting, very few works show how an LLM could be used for time-series reasoning in natural language. We propose a novel multi-modal time-series LLM approach that...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:wy7r9j82f09hvxmcbyci33mn","title":"Leveraging Periodicity for Robustness with Multi-modal Mood Pattern Models","url":"https://machinelearning.apple.com/research/leveraging-periodicity","published":"2024-12-03","authors":["Jaya Narain","Jenny Sun","Oussama Elachqar","Haraldur Hallgrimsson","Feng Zhu","Shirley Ren"],"abstract":"Equal Contributors","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:ae7d04363dd4bb9b","title":"HunyuanVideo: A Systematic Framework For Large Video Generative 
Models","url":"https://huggingface.co/papers/2412.03603","published":"2024-12-03","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4406271370","title":"Fusion Side Tuning: A Parameter and Memory Efficient Fine-tuning Method for High-resolution Medical Image Classification","url":"https://doi.org/10.1109/bibm62325.2024.10821946","published":"2024-12-03","authors":["Zhangchi Wang","Yijin Huang","Yidu Wu","Pujin Cheng","Li Lin","Qinghai Guo","Xiaoying Tang"],"abstract":"Parameter-efficient fine-tuning (PEFT) has been proposed as a cost-effective approach for transferring large-scale pre-trained models (LPMs) to downstream tasks, mitigating the high costs associated with updating all parameters of LPMs. However, current PEFT methods encounter the challenge that GPU memory usage during training is not reduced as effectively as parameter usage. In this paper, we propose Fusion Side Tuning (FST), a novel memory-efficient and parameter-efficient fine-tuning method. FST significantly reduces GPU memory consumption during training, particularly at high resolutions. To achieve this, we freeze the backbone LPM and construct a learnable side fusion network that takes intermediate features from the backbone as input. The side fusion network consists of a sequence of fusion modules, which enable it to leverage the knowledge embedded in the intermediate features. 
Ad...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bibm62325.2024.10821946","openalex_id":"https://openalex.org/W4406271370","cited_by_count":2,"quality_score":47,"matched_keywords":["memory","efficient"],"author_affiliations":["Huawei Technologies (China)","Southern University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6697161197662354},{"id":"https://openalex.org/C157524613","display_name":"Fine-tuning","score":0.5587528347969055},{"id":"https://openalex.org/C138268822","display_name":"Resolution (logic)","score":0.5328807234764099},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5303274989128113},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5058133602142334},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.44673630595207214},{"id":"https://openalex.org/C69744172","display_name":"Image fusion","score":0.43066126108169556},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.36266472935676575}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4404966690","title":"SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models","url":"https://doi.org/10.1145/3680528.3687677","published":"2024-12-03","authors":["Qingrong Cheng","Xu Li","Xinghui 
Fu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3680528.3687677","openalex_id":"https://openalex.org/W4404966690","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7241960763931274},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.6722385883331299},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5786467790603638},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5339881777763367},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5289222002029419},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.49436116218566895},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4270007014274597},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4090993106365204}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4404961990","title":"Model Can Be Subtle: Two Important Mechanisms for Social Media Popularity Prediction","url":"https://doi.org/10.1145/3705319","published":"2024-12-03","authors":["Ning Xu","X Wang","Jing Liu","Lanjun Wang","Xuanya Li","Mengxiao Zhu","Yongdong Zhang","An-An Liu"],"abstract":"Social media popularity prediction is an important channel to explore content sharing and communication on social networks. It aims to capture informative cues by analyzing multi-type data (such as user profile, image, and text) to decide the popularity of a specified post. 
In this article, we divide social network users into two categories (i.e., active and inactive users) and find a dilemma in existing models: If an active user publishes the low-popularity post, the model will habitually predict the high score. On the contrary, if an inactive user provides the high-popularity post, the model still gives the low score incorrectly. Therefore, how to make the model more subtle to users is important. Comparing to existing methods that directly leverage multi-modal features for regression training, this article stresses more on two novel mechanisms. The first method aims to prevent the over...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3705319","openalex_id":"https://openalex.org/W4404961990","cited_by_count":1,"quality_score":42,"matched_keywords":["media"],"author_affiliations":["Baidu (China)","Tianjin University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C2780586970","display_name":"Popularity","score":0.9477952718734741},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8644315600395203},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6628820896148682},{"id":"https://openalex.org/C518677369","display_name":"Social media","score":0.5713016390800476},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43020159006118774},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40700721740722656},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3455438017845154},{"id":"https://openalex.org/C136764020","display_name":"World Wide 
Web","score":0.2772793471813202}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4406271402","title":"Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models","url":"https://doi.org/10.1109/bibm62325.2024.10821952","published":"2024-12-03","authors":["Tianyu Zhang","Yuxiang Ren","Chengbin Hou","Hairong Lv","Xuegong Zhang"],"abstract":"Molecular property prediction is a crucial foundation for drug discovery. In recent years, pre-trained deep learning models have been widely applied to this task. Some approaches that incorporate prior biological domain knowledge into the pre-training framework have achieved impressive results. However, these methods heavily rely on biochemical experts, and retrieving and summarizing vast amounts of domain knowledge literature is both time-consuming and expensive. Large Language Models (LLMs) have demonstrated remarkable performance in understanding and efficiently providing general knowledge. Nevertheless, they occasionally exhibit hallucinations and lack precision in generating domain-specific knowledge. Conversely, Domain-specific Small Models (DSMs) possess rich domain knowledge and can accurately calculate molecular domain-related metrics. 
However, due to their limited model size an...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bibm62325.2024.10821952","openalex_id":"https://openalex.org/W4406271402","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Fuzhou University","Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7442442774772644},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5161808133125305},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5140958428382874},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.44420120120048523},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4349616467952728},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42883968353271484},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.42294496297836304},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.14062634110450745}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4404965946","title":"BlobGEN-3D: Compositional 3D-Consistent Freeview Image Generation with 3D Blobs","url":"https://doi.org/10.1145/3680528.3687645","published":"2024-12-03","authors":["Chao Liu","Weili Nie","Sifei Liu","Abhishek Badki","Hang Su","Morteza Mardani","Benjamin Eckart","Arash 
Vahdat"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3680528.3687645","openalex_id":"https://openalex.org/W4404965946","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5618476271629333},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4454887807369232},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44438251852989197},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4344261586666107},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4228590726852417}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4406860627","title":"SRC-gAudio: Sampling-Rate-Controlled Audio Generation","url":"https://doi.org/10.1109/apsipaasc63619.2025.10849319","published":"2024-12-03","authors":["Chenxing Li","Manjie Xu","Dong Yu"],"abstract":"We introduce SRC-gAudio, a novel audio generation model designed to facilitate text-to-audio generation across a wide range of sampling rates within a single model architecture. SRC-gAudio incorporates the sampling rate as part of the generation condition to guide the diffusion-based audio generation process. Our model enables the generation of audio at multiple sampling rates with a single unified model. Furthermore, we explore the potential benefits of large-scale, low-sampling-rate data in enhancing the generation quality of high-sampling-rate audio. 
Through extensive experiments, we demonstrate that SRC-gAudio effectively generates audio under controlled sampling rates. Additionally, our results indicate that pre-training on low-sampling-rate data can lead to significant improvements in audio quality across various metrics.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/apsipaasc63619.2025.10849319","openalex_id":"https://openalex.org/W4406860627","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5969409942626953},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.4794289469718933},{"id":"https://openalex.org/C2779106878","display_name":"Digital audio broadcasting","score":0.41505166888237},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3816646635532379},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.17809852957725525},{"id":"https://openalex.org/C94915269","display_name":"Detector","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4406260028","title":"Multi-rater Prompting for Ambiguous Medical Image Segmentation","url":"https://doi.org/10.1109/bibm62325.2024.10822080","published":"2024-12-03","authors":["Jinhong Wang","Yi Cheng","Jintai Chen","Hongxia Xu","Danny Chen","Jian Wu"],"abstract":"Multi-rater annotations commonly occur when medical images are independently annotated by multiple experts (raters). 
In this paper, we tackle two challenges arisen in multi-rater annotations for medical image segmentation (called ambiguous medical image segmentation): (1) How to train a deep learning model when a group of raters produces a set of diverse but plausible annotations, and (2) how to fine-tune the model efficiently when computation resources are not available for retraining the entire model on a different dataset domain. We propose a multi-rater prompt-based approach to address these two challenges altogether. Specifically, we introduce a series of rater-aware prompts that can be plugged into the U-Net model for uncertainty estimation to handle multi-annotation cases. During the prompt-based fine-tuning process, only 0.3% of learnable parameters are required to be updated com...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bibm62325.2024.10822080","openalex_id":"https://openalex.org/W4406260028","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Second Affiliated Hospital of Zhejiang University","University of Illinois Urbana-Champaign","University of Notre Dame","Zhejiang Lab","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.662555992603302},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.6219592094421387},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6204385757446289},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.6059662103652954},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.512579619884491}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4406461487","title":"Bestow: Efficient and Streamable Speech Language Model with The Best of Two Worlds in GPT and T5","url":"https://doi.org/10.1109/slt61566.2024.10832146","published":"2024-12-02","authors":["Zhehuai Chen","He Huang","Oleksii Hrinchuk","Krishna C. Puvvada","Nithin Rao Koluguri","Piotr Żelasko","Jagadeesh Balam","Boris Ginsburg"],"abstract":"Incorporating speech understanding capabilities into pretrained large-language models has become a vital research direction (SpeechLLM). The previous architectures can be categorized as: i) GPT-style, prepend speech prompts to the text prompts as a sequence of LLM inputs like a decoder-only model; ii) T5-style, introduce speech cross-attention to each layer of the pretrained LLMs. We propose BESTOW architecture to bring the BESt features from $T w O$ Worlds into a single model that is highly efficient and has strong multitask capabilities. Moreover, there is no clear streaming solution for either style, especially considering the solution should generalize to speech multitask. We reformulate streamable SpeechLLM as a read-write policy problem and unifies the offline and streaming research with BESTOW architecture. 
Hence we demonstrate the first open-source SpeechLLM solution that enables...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/slt61566.2024.10832146","openalex_id":"https://openalex.org/W4406461487","cited_by_count":6,"quality_score":55,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6539334058761597},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4152681827545166},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3604622483253479},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3600345253944397},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33320990204811096},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.07997015118598938}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4406461575","title":"Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR","url":"https://doi.org/10.1109/slt61566.2024.10832215","published":"2024-12-02","authors":["Weiqing Wang","Kunal Dhawan","Tae‐Jin Park","Krishna C. Puvvada","Ivan Medennikov","Somshubra Majumdar","He Huang","Jagadeesh Balam","Boris Ginsburg"],"abstract":"Speech foundation models have achieved state-of-the-art (SoTA) performance across various tasks, such as automatic speech recognition (ASR) in hundreds of languages. However, multi-speaker ASR remains a challenging task for these models due to data scarcity and sparsity. 
In this paper, we present approaches to enable speech foundation models to process and understand multi-speaker speech with limited training data. Specifically, we adapt a speech foundation model for the multi-speaker ASR task using only telephonic data. Remarkably, the adapted model also performs well on meeting data without any fine-tuning, demonstrating the generalization ability of our approach. We conduct several ablation studies to analyze the impact of different parameters and strategies on model performance. Our findings highlight the effectiveness of our methods. Results show that less parameters give better ove...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/slt61566.2024.10832215","openalex_id":"https://openalex.org/W4406461575","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7908196449279785},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.7012909054756165},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.6936118006706238},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6156485676765442},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.43461549282073975},{"id":"https://openalex.org/C133892786","display_name":"Speaker recognition","score":0.4159315228462219},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.0606326162815094},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.05181005597114563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"openalex:W4405181871","title":"Poster: An Exploration of Large Language Models in Malicious Source Code Detection","url":"https://doi.org/10.1145/3658644.3691374","published":"2024-12-02","authors":["Di Xue","Gang Zhao","Zhongqi Fan","Wei Li","Yahong Xu","Zhen Liu","Yin Liu","Zhongliang Yuan"],"abstract":"Embedding malicious code within the software supply chain has become a significant concern in the information technology field. Current methods for detecting malicious code, based on signatures, behavior analysis, and traditional machine learning models, lack result interpretability. This study proposes a novel malicious code detection framework, Mal-LLM, which leverages the cost advantages of traditional machine learning models and the interpretability of LLMs. Initially, traditional machine learning models filter vast amounts of malicious source code in the software supply chain. Subsequently, LLMs analyze and interpret the filtered malicious source code using a customized prompt template incorporating role-playing and chain-of-thought techniques. 
The feasibility of the Mal-LLM framework is validated through extensive experimental analyses, examining the ambiguity and redundancy of the...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3658644.3691374","openalex_id":"https://openalex.org/W4405181871","cited_by_count":2,"quality_score":43,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.8417819142341614},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7758591175079346},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.6257576942443848},{"id":"https://openalex.org/C2780522230","display_name":"Ambiguity","score":0.4954270124435425},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.4624834656715393},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.45615991950035095},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4399358034133911},{"id":"https://openalex.org/C2777462759","display_name":"Word embedding","score":0.43857550621032715}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4404982891","title":"Learning A Hierarchical Graph Autoregression Model for Semi-template Molecular Retrosynthesis","url":"https://doi.org/10.26434/chemrxiv-2024-gqp7b","published":"2024-12-02","authors":["Shen Yuan","Fanmeng Wang","Zhewei Wei","Peilin Zhao","Lanqing Li","Hongteng Xu"],"abstract":"As a significant task of pharmaceutical and chemical engineering, molecular retrosynthesis aims at predicting candidate reactants from predefined products. 
Treating this challenging task as a conditional generative modeling problem, we propose a hierarchical graph autoregression (HGAR) model and its pretraining-assisted multi-task learning paradigm, leading to an effective semi-template molecular retrosynthesis method. Given a product, we first construct a hierarchical graph by connecting the junction tree of its motifs to the atom-level molecular graph. Our HGAR model embeds the hierarchical graph in the motif and atom levels, respectively. The atom-level embeddings are applied to predict reaction centers and derive synthons from the product. The motif-level embeddings are applied to predict motifs and complete the corresponding synthons autoregressive, leading to the target reactants.....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.26434/chemrxiv-2024-gqp7b","openalex_id":"https://openalex.org/W4404982891","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Renmin University of China","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.535612940788269},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.526349663734436},{"id":"https://openalex.org/C133029050","display_name":"Vector autoregression","score":0.48676586151123047},{"id":"https://openalex.org/C42437451","display_name":"Retrosynthetic analysis","score":0.41500651836395264},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40624141693115234},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39394888281822205},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.22331830859184265},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer 
science","score":0.20307740569114685}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-shared-standard-for-valid-measurement-of-generative-ai-systems-capabilities-risks-and-impacts","title":"A Shared Standard for Valid Measurement of Generative AI Systems' Capabilities, Risks, and Impacts","url":"https://www.microsoft.com/en-us/research/publication/a-shared-standard-for-valid-measurement-of-generative-ai-systems-capabilities-risks-and-impacts/","published":"2024-12-01","authors":["Alex Chouldechova","Chad Atalla","Solon Barocas","A. Feder Cooper","Emily Corvi","Alex Dow","Jean Garcia-Gathright","Nick Pangakis","Stefanie Reed","Emily Sheng","Dan Vann","Matthew Vogel"],"abstract":"The valid measurement of generative AI (GenAI) systems' capabilities, risks, and impacts forms the bedrock of our ability to evaluate these systems. We introduce a shared standard for valid measurement that helps place many of the disparate-seeming evaluation practices in use today on a common footing. Our framework, grounded in measurement theory from the social sciences, extends the work of Adcock & Collier (2001) in which the authors formalized valid measurement of concepts in political science via three processes: systematizing background concepts, operationalizing systematized concepts via annotation procedures, and applying those procedures to instances. 
We argue that valid measurement of GenAI systems' capabilities, risks, and impacts, further requires systematizing, operationalizing, and applying not only the entailed concepts, but also the contexts of interest and the metrics us...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Social sciences","Computer science","Generative AI","Responsible AI","Social Science","political"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/personalized-progression-modelling-and-prediction-in-parkinsons-disease-with-a-novel-multi-modal-graph-approach","title":"Personalized progression modelling and prediction in Parkinson’s disease with a novel multi-modal graph approach","url":"https://www.microsoft.com/en-us/research/publication/personalized-progression-modelling-and-prediction-in-parkinsons-disease-with-a-novel-multi-modal-graph-approach/","published":"2024-12-01","authors":["Jie Lian","Xufang Luo","Caihua Shan","Dongqi Han","Chencheng Zhang","V. Vardhanabhuti","Dongsheng Li","Lili Qiu"],"abstract":"Parkinson’s disease (PD) is a complex neurological disorder characterized by dopaminergic neuron degeneration, leading to diverse motor and non-motor impairments. This variability complicates accurate progression modelling and early-stage prediction. Traditional classification methods based on clinical symptoms are often limited by disease heterogeneity. 
This study introduces an graph-based interpretable personalized progression method, utilizing data from the Parkinson’s Progression Markers Initiative (PPMI) and Stroke Parkinson’s Disease Biomarker Program (PDBP). Our approach integrates multimodal inter-individual and intra-individual data, including clinical assessments, MRI, and genetic information to make multi-dimension predictions. Validated using the PDBP dataset from 12 to 36 months, our AdaMedGraph method demonstrated strong performance, achieving AUC values of 0.748 and 0.714....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1038/s41531-024-00832-w","openalex_id":"https://openalex.org/W4404886129","cited_by_count":15,"quality_score":83,"matched_keywords":["Article (Journal)","Artificial intelligence","Medicine","personalized"],"author_affiliations":["Microsoft","Hong Kong Virtual University","Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research Asia (China)","Ruijin Hospital","Shanghai Jiao Tong University","University of Hong Kong"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/microsoft-new-future-of-work-report-2024","title":"Microsoft New Future of Work Report 2024","url":"https://www.microsoft.com/en-us/research/publication/microsoft-new-future-of-work-report-2024/","published":"2024-12-01","authors":["Jenna Butler","Mihaela Vorvoreanu","Rebecca Janssen","Abigail Sellen","Nicole Immorlica","Adam Troy","Advait Sarkar","Alex Farach","Alex Chouldechova","Alexandra Olteanu","Alexia Cambon","Arjun Radhakrishna"],"abstract":"As Microsoft 
approaches its 50th anniversary, the landscape of work continues to evolve at an unprecedented pace. The past year has marked a pivotal shift, moving from predictions and controlled lab studies to the real-world implementation and impact of new technologies. These advancements, built on decades of research and development, are beginning to yield tangible results, offering a clearer view of how generative AI is reshaping the way work gets done.We began the New Future of Work Report series in 2021 , during the height of the global shift to remote work. That inaugural report synthesized research to help reimagine work as traditional models were upended. The 2022 report focused on hybrid work, exploring how intentional co-location could complement remote practices. By 2023 , the focus turned to integrating large language models (LLMs) into workflows, examining their potential to...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Tech Report","Artificial intelligence","Data platforms and analytics","Economics","Human-computer interaction","Programming languages and software engineering","Social sciences"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/efficacy-of-a-conversational-chatbot-for-cigarette-smoking-cessation-protocol-of-the-quitbot-full-scale-randomized-controlled-trial","title":"Efficacy of a conversational chatbot for cigarette smoking cessation: Protocol of the QuitBot full-scale randomized controlled 
trial","url":"https://www.microsoft.com/en-us/research/publication/efficacy-of-a-conversational-chatbot-for-cigarette-smoking-cessation-protocol-of-the-quitbot-full-scale-randomized-controlled-trial/","published":"2024-12-01","authors":["Jonathan B Bricker","Brianna M Sullivan","Kristin E Mull","Juan M. Lavista Ferres","Margarita Santiago-Torres"],"abstract":"Globally, cigarette smoking results in over 8 million premature annual deaths. Addressing this issue requires high-impact, cost-effective population-level interventions for smoking cessation. Conversational chatbots offer a potential solution given the recent advancements in machine learning and large language models. Chatbots can deliver supportive, empathetic behaviors, personalized responses, and timely advice tailored to users' needs that is engaging through therapeutic conversations aimed at creating lasting social-emotional connections. Despite their promise, little is known about the efficacy and underlying mechanisms of chatbots for cigarette smoking cessation. 
We developed QuitBot, a quit smoking program of two to three-minute conversations covering topics ranging from motivations to quit, setting a quit date, choosing cessation medications, coping with triggers, maintaining abs...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Human language technologies","Medical, health and genomics","1970-01-01","language model","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-efficient-distillation-from-llms-to-slms","title":"On Efficient Distillation from LLMs to SLMs","url":"https://www.microsoft.com/en-us/research/publication/on-efficient-distillation-from-llms-to-slms/","published":"2024-12-01","authors":["Metod Jazbec","Menglin Xia","Ankur Mallick","Daniel Madrigal","Dongge Han","Samuel Kessler","Victor Ruehle"],"abstract":"Finetuning small language models (SLMs) on data generated by large language models (LLMs), a form of knowledge distillation, has recently been demonstrated to lead to significantly enhanced capabilities of small models across various domains (e.g., mathematical reasoning). However, current approaches typically require synthesizing a large number of new examples (100K), which increases the resources and training time needed for finetuning. To address this issue, we investigate principles for making the distillation process more efficient by reducing the amount of synthetic data required. 
Specifically, we explore (i) incorporating SLM's feedback into the LLM's data generation process and (ii) including LLM's rationales (i.e., step-by-step solutions) in the distilled data. In our experiments using the Mistral7B model as the SLM on math reasoning tasks (GSM8K, MATH), we find that both feedba...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interpreting-multi-band-galaxy-observations-with-large-language-model-based-agents","title":"Interpreting Multi-band Galaxy Observations with Large Language Model-Based Agents","url":"https://www.microsoft.com/en-us/research/publication/interpreting-multi-band-galaxy-observations-with-large-language-model-based-agents/","published":"2024-12-01","authors":["Zechang Sun","Yuan-Sen Ting","Yaobo Liang","Nan Duan","Song Huang","Zheng Cai"],"abstract":"Astronomical research traditionally relies on extensive domain knowledge to interpret observations and narrow down hypotheses. We demonstrate that this process can be emulated using large language model-based agents to accelerate research workflows. We propose mephisto, a multi-agent collaboration framework that mimics human reasoning to interpret multi-band galaxy observations. mephisto interacts with the CIGALE codebase, which includes spectral energy distribution (SED) models to explain observations. 
In this open-world setting, mephisto learns from its self-play experience, performs tree search, and accumulates knowledge in a dynamically updated base. As a proof of concept, we apply mephisto to the latest data from the James Webb Space Telescope. mephisto attains near-human proficiency in reasoning about galaxies' physical scenarios, even when dealing with a recently discovered popula...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","LLM","language model","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/investigating-neural-audio-codecs-for-speech-language-model-based-speech-generation","title":"Investigating neural audio codecs for speech language model-based speech generation","url":"https://www.microsoft.com/en-us/research/publication/investigating-neural-audio-codecs-for-speech-language-model-based-speech-generation/","published":"2024-12-01","authors":["Jiaqi Li","Dongmei Wang","Xiaofei Wang","Yao Qian","Long Zhou","Shujie Liu","Midia Yousefi","Canrun Li","Chung-Hsien Tsai","Zhen Xiao","Yanqing Liu","Junkun Chen"],"abstract":"Neural audio codec tokens serve as the fundamental building blocks for speech language model (SLM)-based speech generation. However, there is no systematic understanding on how the codec system affects the speech generation performance of the SLM. 
In this work, we examine codec tokens within SLM framework for speech generation to provide insights for effective codec design. We retrain existing high-performing neural codec models on the same data set and loss functions to compare their performance in a uniform setting. We integrate codec tokens into two SLM systems: masked-based parallel speech generation system and an auto-regressive (AR) plus non-auto-regressive (NAR) model-based system. Our findings indicate that better speech reconstruction in codec systems does not guarantee improved speech generation in SLM. A high-quality codec decoder is crucial for natural speech production in SL...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/slt61566.2024.10832266","openalex_id":"https://openalex.org/W4406495604","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Human language technologies","1970-01-01","language model","quantization"],"author_affiliations":["Microsoft","Chinese University of Hong Kong, Shenzhen","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hysynth-context-free-llm-approximation-for-guiding-program-synthesis","title":"HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis","url":"https://www.microsoft.com/en-us/research/publication/hysynth-context-free-llm-approximation-for-guiding-program-synthesis/","published":"2024-12-01","authors":["Shraddha Barke","Emmanuel Anaya Gonzalez","Saketh Ram Kasibatla","Taylor Berg-Kirkpatrick","Nadia Polikarpova"],"abstract":"Many structured 
prediction and reasoning tasks can be framed as program synthesis problems, where the goal is to generate a program in a domain-specific language (DSL) that transforms input data into the desired output. Unfortunately, purely neural approaches, such as large language models (LLMs), often fail to produce fully correct programs in unfamiliar DSLs, while purely symbolic methods based on combinatorial search scale poorly to complex problems. Motivated by these limitations, we introduce a hybrid approach, where LLM completions for a given task are used to learn a task-specific, context-free surrogate model, which is then used to guide program synthesis. We evaluate this hybrid approach on three domains, and show that it outperforms both unguided search and direct sampling from LLMs, as well as existing program synthesizers. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/human-agent-interaction-challenges","title":"Challenges in Human-Agent Communication","url":"https://www.microsoft.com/en-us/research/publication/human-agent-interaction-challenges/","published":"2024-12-01","authors":["Gagan Bansal","Jennifer Wortman Vaughan","Saleema Amershi","Eric Horvitz","Adam Fourney","Hussein Mozannar","Victor Dibia","Daniel S. 
Weld"],"abstract":"Remarkable advancements in modern generative foundation models have enabled the development of sophisticated and highly capable autonomous agents that can observe their environment, invoke tools, and communicate with other agents to solve problems. Although such agents can communicate with users through natural language, their complexity and wide-ranging failure modes present novel challenges for human-AI interaction. Building on prior research and informed by a communication grounding perspective, we contribute to the study of \\emph{human-agent communication} by identifying and analyzing twelve key communication challenges that these systems pose. These include challenges in conveying information from the agent to the user, challenges in enabling the user to convey information to the agent, and overarching challenges that need to be considered across all human-agent communication. We il...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Tech Report","Artificial intelligence","Human-computer interaction","Human–computer interaction","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-graph-learning-improve-task-planning","title":"Can Graph Learning Improve Task Planning?","url":"https://www.microsoft.com/en-us/research/publication/can-graph-learning-improve-task-planning/","published":"2024-12-01","authors":["Xixi Wu","Yifei Shen","Caihua Shan","Kaitao Song","Siwei Wang","Bohang Zhang","Jiarui Feng","Hong Cheng","Wei 
Chen","Yun Xiong","Dongsheng Li"],"abstract":"Task planning is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, task planning is a decision-making problem that involves selecting a connected path or subgraph within the corresponding graph and invoking it. In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design. Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs, which is adeptly addres...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Graph neural networks","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/alpine-unveiling-the-planning-capability-of-autoregressive-learning-in-language-models","title":"ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/alpine-unveiling-the-planning-capability-of-autoregressive-learning-in-language-models/","published":"2024-12-01","authors":["Siwei Wang","Yifei Shen","Shi Feng","Haoran Sun","Shang-Hua Teng","Wei Chen"],"abstract":"In this paper, we present the findings of our Project ALPINE which stands for \"Autoregressive Learning for Planning In NEtworks.\" Project ALPINE initiates a theoretical investigation into the development of planning capabilities in Transformer-based language models through their autoregressive learning mechanisms, aiming to identify any potential limitations in their planning abilities. We abstract planning as a network path-finding task where the objective is to generate a valid path from a specified source node to a designated target node. In terms of expressiveness, we show that the Transformer is capable of executing path-finding by embedding the adjacency and reachability matrices within its weights. Our theoretical analysis of the gradient-based learning dynamic of the Transformer reveals that the Transformer is capable of learning both the adjacency matrix and a limited form of th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","transformer language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4405596328","title":"Woodpecker: hallucination correction for multimodal large language 
models","url":"https://doi.org/10.1007/s11432-024-4251-x","published":"2024-12-01","authors":["Shukang Yin","Chaoyou Fu","Sirui Zhao","Tong Bill Xu","Hao Wang","Dianbo Sui","Yunhang Shen","Ke Li","Xing Sun","Enhong Chen"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11432-024-4251-x","openalex_id":"https://openalex.org/W4405596328","cited_by_count":59,"quality_score":67,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Nanjing University","Shandong Institute of Automation","Shanghai Chengtou (China)","Suzhou University of Science and Technology","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C2776190662","display_name":"Woodpecker","score":0.44309675693511963},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4250434637069702},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.40382856130599976},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.389457643032074},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.34536051750183105},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3323504328727722},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.32024431228637695},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.197109192609787}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":59}},{"id":"openalex:W4405399726","title":"OCRBench: on the hidden mystery of OCR in large multimodal 
models","url":"https://doi.org/10.1007/s11432-024-4235-6","published":"2024-12-01","authors":["Yuliang Liu","Zhang Li","Mingxin Huang","Biao Yang","Wenwen Yu","Chunyuan Li","Xu-Cheng Yin","Chenglin Liu","Lianwen Jin","Xiang Bai"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11432-024-4235-6","openalex_id":"https://openalex.org/W4405399726","cited_by_count":73,"quality_score":67,"matched_keywords":[],"author_affiliations":["Huazhong University of Science and Technology","Institute of Automation","Microsoft (United States)","Microsoft Research (United Kingdom)","Shandong Institute of Automation","South China University of Technology","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4847542941570282},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40573498606681824},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.35392656922340393}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":73}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/compositional-generalization-across-distributional-shifts-with-sparse-tree-operations","title":"Compositional Generalization Across Distributional Shifts with Sparse Tree Operations","url":"https://www.microsoft.com/en-us/research/publication/compositional-generalization-across-distributional-shifts-with-sparse-tree-operations/","published":"2024-12-01","authors":["Paul Smolensky","Jianfeng Gao","Roland Fernandez"],"abstract":"Neural networks continue to struggle with compositional generalization, and this issue is exacerbated by a lack of massive 
pre-training. One successful approach for developing neural systems which exhibit human-like compositional generalization is hybrid neurosymbolic techniques. However, these techniques run into the core issues that plague symbolic approaches to AI: scalability and flexibility. The reason for this failure is that at their core, hybrid neurosymbolic models perform symbolic computation and relegate the scalable and flexible neural computation to parameterizing a symbolic system. We investigate a unified neurosymbolic system where transformations in the network can be interpreted simultaneously as both symbolic and neural computation. We extend a unified neurosymbolic architecture called the Differentiable Tree Machine in two central ways. First, we significantly increase...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","neural networks"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmlu-cf-a-contamination-free-multi-task-language-understanding-benchmark","title":"MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark","url":"https://www.microsoft.com/en-us/research/publication/mmlu-cf-a-contamination-free-multi-task-language-understanding-benchmark/","published":"2024-12-01","authors":["Qihao Zhao","Yangyu Huang","Tengchao Lv","Lei Cui","Furu Wei","Qinzheng Sun","Ying Xin","Shaoguang Mao","Xin Zhang","Qiufeng Yin","Scarlett Li"],"abstract":"Multiple-choice question (MCQ) 
datasets like Massive Multitask Language Understanding (MMLU) are widely used to evaluate the commonsense, understanding, and problem-solving abilities of large language models (LLMs). However, the open-source nature of these benchmarks and the broad sources of training data for LLMs have inevitably led to benchmark contamination, resulting in unreliable evaluation results. To alleviate this issue, we propose a contamination-free and more challenging MCQ benchmark called MMLU-CF. This benchmark reassesses LLMs' understanding of world knowledge by averting both unintentional and malicious data leakage. To avoid unintentional data leakage, we source data from a broader domain and design three decontamination rules. To prevent malicious data leakage, we divide the benchmark into validation and test sets with similar difficulty and subject distributions. The te...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Miscellaneous","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4404889932","title":"External Prompt Features Enhanced Parameter-Efficient Fine-Tuning for Salient Object Detection","url":"https://doi.org/10.1007/978-3-031-78347-0_6","published":"2024-12-01","authors":["Wen Liang","Peipei Ran","Mengchao Bai","Xiao Liu","P. 
Bilha Githinji","Wei Zhao","Peiwu Qin"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-78347-0_6","openalex_id":"https://openalex.org/W4404889932","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8638096451759338},{"id":"https://openalex.org/C2780719617","display_name":"Salient","score":0.6182005405426025},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.5651522874832153},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.546116054058075},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.49292516708374023},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.46480098366737366},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.37313270568847656}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4404879608","title":"Visual Text Generation in the Wild","url":"https://doi.org/10.1007/978-3-031-73668-1_6","published":"2024-12-01","authors":["Yuanzhi Zhu","Jiawei Liu","Feiyu Gao","Wenyu Liu","Xinggang Wang","Peng Wang","Fei Huang","Cong Yao","Zhibo 
Yang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73668-1_6","openalex_id":"https://openalex.org/W4404879608","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8346834182739258},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39980730414390564},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.359591007232666},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34793752431869507}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4410538768","title":"RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems","url":"https://doi.org/10.14778/3717755.3717774","published":"2024-12-01","authors":["Biao Ouyang","Y. Zhang","Hanyin Cheng","Shu Yang","Chenjuan Guo","Bin Yang","Qingsong Wen","Lunting Fan","Christian S. Jensen"],"abstract":"With the continued migration of storage to cloud database systems, the impact of slow queries in such systems on services and user experience is increasing. Root-cause diagnosis plays an indispensable role in facilitating slow-query detection and revision. This paper proposes a method capable of both identifying possible root cause types for slow queries and ranking these according to their potential for accelerating slow queries. This enables prioritizing root causes with the highest impact, in turn improving slow-query revision effectiveness. 
To enable more accurate and detailed diagnoses, we propose the multimodal Ranking for the Root Causes of slow queries (RCRank) framework, which formulates root cause analysis as a multimodal machine learning problem and leverages multimodal information from query statements, execution plans, execution logs, and key performance indicators. To obtai...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.14778/3717755.3717774","openalex_id":"https://openalex.org/W4410538768","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Aalborg University","Alibaba Group (China)","East China Normal University"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.7037185430526733},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.6575927734375},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6426771283149719},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.49157389998435974},{"id":"https://openalex.org/C171078966","display_name":"Root (linguistics)","score":0.4895871579647064},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.48822787404060364},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.06743249297142029},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4404879609","title":"Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model","url":"https://doi.org/10.1007/978-3-031-73668-1_10","published":"2024-12-01","authors":["Danni Yang","Ruohan 
Dong","Jiayi Ji","Yiwei Ma","Haowei Wang","Xiaoshuai Sun","Rongrong Ji"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73668-1_10","openalex_id":"https://openalex.org/W4404879609","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8220123052597046},{"id":"https://openalex.org/C2776224158","display_name":"Phrase","score":0.7922630906105042},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5070412158966064},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47837892174720764},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.461681067943573},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4594403803348541},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.45361757278442383},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.11291256546974182}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4404879650","title":"Accelerating Image Generation with Sub-path Linear Approximation Model","url":"https://doi.org/10.1007/978-3-031-73668-1_19","published":"2024-12-01","authors":["Chen Xu","Tianhui Song","Weixin Feng","Xubin Li","Tiezheng Ge","Bo Zheng","Limin 
Wang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73668-1_19","openalex_id":"https://openalex.org/W4404879650","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Nanjing University","Shanghai Artificial Intelligence Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.792397141456604},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.5751753449440002},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5165573358535767},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.41223829984664917},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.36562544107437134},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3582500219345093},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.07477456331253052}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408185200","title":"Report on the 2nd Workshop on Task-Focused IR in the Era of Generative AI","url":"https://doi.org/10.1145/3722449.3722466","published":"2024-12-01","authors":["Chirag Shah","Ryen W. White"],"abstract":"Generative Artificial Intelligence (GenAI) is revolutionizing how people access information and how they tackle and complete complex information tasks. This report is a summary of a recent workshop at Microsoft on this important and pressing topic. 
The event brought together a diverse mix of attendees from different professions and at different career stages for an engaging day of presentations and discussions. The emergent themes are described in detail in this summary. Date : 27 September 2024. Website : https://ir-ai.github.io.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3722449.3722466","openalex_id":"https://openalex.org/W4408185200","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of Washington"],"concepts":[{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6111988425254822},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5837259888648987},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.31954383850097656},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2838401794433594},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.25112617015838623},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.12911561131477356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:cfbfec3cfca8d4ae","title":"Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models","url":"https://research.nvidia.com/publication/2024-12_warped-diffusion-solving-video-inverse-problems-image-diffusion-models","published":"2024-12","authors":["Giannis Daras","Weili Nie","Karsten Kreis","Alexandros G. Dimakis","Morteza Mardani","Nikola Kovachki","Arash Vahdat"],"abstract":"Official NVIDIA Research publication. 
NeurIPS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NeurIPS"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=0"}},{"id":"official:81df5d6561a2c75d","title":"Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models","url":"https://research.nvidia.com/publication/2024-12_self-taught-recognizer-toward-unsupervised-adaptation-speech-foundation-models","published":"2024-12","authors":["Yuchen Hu","Chen Chen","Huck Yang","Chengwei Qin","Pin-Yu Chen","Eng Siong Chng","Chao Zhang"],"abstract":"Official NVIDIA Research publication. NeurIPS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NeurIPS"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=0"}},{"id":"official:154681d4941647f3","title":"Pretraining codomain attention neural operators for solving multiphysics pdes","url":"https://research.nvidia.com/publication/2024-12_pretraining-codomain-attention-neural-operators-solving-multiphysics-pdes","published":"2024-12","authors":["Md Ashiqur Rahman","Robert Joseph George","Mogab Elleithy","Daniel Leibovici","Zongyi Li","Boris Bonev","Colin White","Julius Berner","Raymond A. 
Yeh","Jean Kossaifi","Kamyar Azizzadenesheli","Anima Anandkumar"],"abstract":"Official NVIDIA Research publication. NeurIPS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NeurIPS"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=0"}},{"id":"official:a4d90f83eab48f06","title":"Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition","url":"https://research.nvidia.com/publication/2024-12_large-language-model-based-generative-error-correction-challenge-and-baselines","published":"2024-12","authors":["Huck Yang","Taejin Park","Yuan Gong","Yuanchao Li","Zhehuai Chen","yen-ting Lin","Chen Chen","Yuchen Hu","Kunal Dhawan","Piotr Zelasko","Chao Zhang","Yun-Nung Chen"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=0"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/language-to-code-translation-with-a-single-labeled-example","title":"Language-to-Code Translation with a Single Labeled 
Example","url":"https://www.microsoft.com/en-us/research/publication/language-to-code-translation-with-a-single-labeled-example/","published":"2024-11-30","authors":["Kaj Bostrom","Harsh Jhamtani","Hao Fang","Sam Thomson","Richard Shin","Patrick Xia","Ben Van Durme","Jason Eisner","Jacob Andreas"],"abstract":"Tools for translating natural language into code promise natural, open-ended interaction with databases, web APIs, and other software systems. However, this promise is complicated by the diversity and continual development of these systems, each with its own interface and distinct set of features. Building a new language-to-code translator, even starting with a large language model (LM), typically requires annotating a large set of natural language commands with their associated programs. In this paper, we describe ICIP (In-Context Inverse Programming), a method for bootstrapping a language-to-code system using mostly (or entirely) unlabeled programs written using a potentially unfamiliar (but human-readable) library or API. 
ICIP uses a pre-trained LM to assign candidate natural language descriptions to these programs, then iteratively refines the descriptions to ensure global consistenc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.emnlp-main.462","openalex_id":"https://openalex.org/W4404782410","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01","language model"],"author_affiliations":["Microsoft","Microsoft (United States)","The University of Texas at Austin"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cogact-a-foundational-vision-language-action-model-for-synergizing-cognition-and-action-in-robotic-manipulation","title":"CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation","url":"https://www.microsoft.com/en-us/research/publication/cogact-a-foundational-vision-language-action-model-for-synergizing-cognition-and-action-in-robotic-manipulation/","published":"2024-11-28","authors":["Qixiu Li","Yaobo Liang","Zeyu Wang","Lin Luo","Xi Chen","Mozheng Liao","Fangyun Wei","Yu Deng","Sicheng Xu","Yizhong Zhang","Xiaofan Wang","Bei Liu"],"abstract":"The advancement of large Vision-Language-Action (VLA) models has significantly improved robotic manipulation in terms of language-guided task execution and generalization to unseen scenarios. 
While existing VLAs adapted from pretrained large Vision-Language-Models (VLM) have demonstrated promising generalizability, their task performance is still unsatisfactory as indicated by the low tasks success rates in different environments. In this paper, we present a new advanced VLA architecture derived from VLM. Unlike previous works that directly repurpose VLM for action prediction by simple action quantization, we propose a componentized VLA architecture that has a specialized action module conditioned on VLM output. We systematically study the design of the action module and demonstrates the strong performance enhancement with diffusion action transformers for action sequence modeling, as we...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Human-computer interaction","Computer science","Robotics","Vision-language models","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4404821554","title":"Nucleotide Transformer: building and evaluating robust foundation models for human genomics","url":"https://doi.org/10.1038/s41592-024-02523-z","published":"2024-11-28","authors":["Hugo Dalla-Torre","Liam Gonzalez","Javier Mendoza‐Revilla","Nicolás López Carranza","Adam Grzywaczewski","Francesco Oteri","Christian Dallago","Evan Trop","Bernardo P. de Almeida","Hassan Sirelkhatim","Guillaume Richard","Marcin J. 
Skwark"],"abstract":"The prediction of molecular phenotypes from DNA sequences remains a longstanding challenge in genomics, often driven by limited annotated data and the inability to transfer learnings between tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named Nucleotide Transformer, ranging from 50 million up to 2.5 billion parameters and integrating information from 3,202 human genomes and 850 genomes from diverse species. These transformer models yield context-specific representations of nucleotide sequences, which allow for accurate predictions even in low-data settings. We show that the developed models can be fine-tuned at low cost to solve a variety of genomics applications. Despite no supervision, the models learned to focus attention on key genomic elements and can be used to improve the prioritization of genetic variants. The training and applicati...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41592-024-02523-z","openalex_id":"https://openalex.org/W4404821554","cited_by_count":324,"quality_score":67,"matched_keywords":[],"author_affiliations":["Institute on Taxation and Economic Policy","Nvidia (United States)","Technical University of Munich"],"concepts":[{"id":"https://openalex.org/C189206191","display_name":"Genomics","score":0.7599178552627563},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.557266116142273},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5266963243484497},{"id":"https://openalex.org/C141231307","display_name":"Genome","score":0.4929542541503906},{"id":"https://openalex.org/C51679486","display_name":"DNA 
sequencing","score":0.4585028290748596},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4517202079296112},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.42738550901412964},{"id":"https://openalex.org/C2777615720","display_name":"Prioritization","score":0.41265058517456055}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":324}},{"id":"official:f286328f7db1efc0","title":"QwQ: Reflect Deeply on the Boundaries of the Unknown","url":"https://qwenlm.github.io/blog/qwq-32b-preview/","published":"2024-11-28","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDNote: This is the pronunciation of QwQ: /kwju:/ , similar to the word “quill”.What does it mean to think, to question, to understand? These are the deep waters that QwQ (Qwen with Questions) wades into. Like an eternal student of wisdom, it approaches every problem - be it mathematics, code, or knowledge of our world - with genuine wonder and doubt. 
QwQ embodies that ancient philosophical spirit: it knows that it knows nothing, and that’s precisely what drives its curiosity.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4404820443","title":"Multimodal Label Relevance Ranking via Reinforcement Learning","url":"https://doi.org/10.1007/978-3-031-72848-8_23","published":"2024-11-28","authors":["Taian Guo","Taolin Zhang","Haoqian Wu","Hanjun Li","Ruizhi Qiao","Xing Sun"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72848-8_23","openalex_id":"https://openalex.org/W4404820443","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8397766351699829},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.8022664189338684},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7507020235061646},{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.743456244468689},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5788264870643616},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4976852238178253},{"id":"https://openalex.org/C2779532271","display_name":"Relevance feedback","score":0.494048148393631},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.377339243888855}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-model-brained-gui-agents-a-survey","title":"Large Language Model-Brained GUI Agents: A Survey","url":"https://www.microsoft.com/en-us/research/publication/large-language-model-brained-gui-agents-a-survey/","published":"2024-11-27","authors":["Chaoyun Zhang","Shilin He","Jiaxu Qian","Bowen Li","Liqun Li","Si Qin","Yu Kang","Ming-Jie Ma","Qingwei Lin 林庆维","Saravan Rajmohan","Dongmei Zhang","Qi Zhang"],"abstract":"GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a new generation of LLM-brained GUI agents capable of interpreting complex GUI elements and autonomously executing actions based on natural language instructions. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands. Their applications span across web navigation, mobile app interactions, and desktop automation, offering a transformative user experience that revolutionizes how individuals interact with software. 
This em...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","software engineering","LLM","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4404769790","title":"Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation","url":"https://doi.org/10.1007/978-3-031-73254-6_26","published":"2024-11-27","authors":["Zixin Zhu","Xuelu Feng","Dongdong Chen","Junsong Yuan","Chunming Qiao","Gang Hua"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73254-6_26","openalex_id":"https://openalex.org/W4404769790","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Dolby (United States)","Microsoft (United States)","University at Buffalo, State University of New York"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8574298024177551},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6376643180847168},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.615815281867981},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6137150526046753},{"id":"https://openalex.org/C2781238097","display_name":"Object 
(grammar)","score":0.5275421142578125},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.47806742787361145},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4339821934700012},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.42874228954315186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"bytedance-seed:245","title":"DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention","url":"https://seed.bytedance.com/en/research/dig-scalable-and-efficient-diffusion-models-with-gated-linear-attention","published":"2024-11-26","authors":["Lianghui Zhu","Zilong Huang","Bencheng Liao","Jun Hao Liew","Hanshu Yan","Jiashi Feng","Xinggang Wang"],"abstract":"Diffusion models with large-scale pre-training have achieved significant success in the field of visual content generation, particularly exemplified by Diffusion Transformers (DiT). However, DiT models have faced challenges with quadratic complexity efficiency, especially when handling long sequences. In this paper, we aim to incorporate the sub-quadratic modeling capability of Gated Linear Attention (GLA) into the 2D diffusion backbone. Specifically, we introduce Diffusion Gated Linear Attention Transformers (DiG), a simple, adoptable solution with minimal parameter overhead. We offer two variants, i,e, a plain and U-shape architecture, showing superior efficiency and competitive effectiveness. 
In addition to superior performance to DiT and other sub-quadratic-time diffusion models at 256×256 resolution, DiG demonstrates greater efficiency than these methods starting from a 512 resoluti...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computer Vision","Vision","CVPR 2025","memory","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reducio-generating-1024%e2%a8%891024-video-within-16-seconds-using-extremely-compressed-motion-latents","title":"REDUCIO! Generating 1024⨉1024 Video within 16 Seconds using Extremely Compressed Motion Latents","url":"https://www.microsoft.com/en-us/research/publication/reducio-generating-1024%e2%a8%891024-video-within-16-seconds-using-extremely-compressed-motion-latents/","published":"2024-11-26","authors":["Rui Tian","Qi Dai","Jianmin Bao","Kai Qiu","Yifan Yang","Chong Luo","Zuxuan Wu","Yu-Gang Jiang"],"abstract":"Commercial video generation models have exhibited realistic, high-fidelity results but are still restricted to limited access. One crucial obstacle for large-scale applications is the expensive training and inference cost. In this paper, we argue that videos contain much more redundant information than images, thus can be encoded by very few motion latents based on a content image. Towards this goal, we design an image-conditioned VAE to encode a video to an extremely compressed motion latent space. 
This magic Reducio charm enables 64x reduction of latents compared to a common 2D VAE, without sacrificing the quality. Training diffusion models on such a compact representation easily allows for generating 1K resolution videos. We then adopt a two-stage video generation paradigm, which performs text-to-image and text-image-to-video sequentially. Extensive experiments show that our Reducio-D...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4404725701","title":"OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving","url":"https://doi.org/10.1007/978-3-031-72661-3_6","published":"2024-11-26","authors":["Guoqing Wang","Zhongdao Wang","Tang Pin","Jilai Zheng","Xiangxuan Ren","Bailan Feng","Chao Ma"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72661-3_6","openalex_id":"https://openalex.org/W4404725701","cited_by_count":21,"quality_score":58,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.855563759803772},{"id":"https://openalex.org/C160331591","display_name":"Occupancy","score":0.7878905534744263},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7362335920333862},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6591292023658752},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45629265904426575},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.44113782048225403},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.371929407119751},{"id":"https://openalex.org/C170154142","display_name":"Architectural engineering","score":0.07414546608924866}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":21}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ensuring-fair-llm-serving-amid-diverse-applications","title":"Ensuring Fair LLM Serving Amid Diverse Applications","url":"https://www.microsoft.com/en-us/research/publication/ensuring-fair-llm-serving-amid-diverse-applications/","published":"2024-11-24","authors":["Kunal Jain","Ankur Mallick","A. Parayil","Renee St. Amant","Rujia Wang","Victor Ruehle","Chetan Bansal","Saravan Rajmohan","Redwan Ibne Seraj Khan","Haiying Shen","Anoop Kulkarni","Steve Kofsky"],"abstract":"In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, making them unsuitable for such platforms. 
To address the fairness challenge, this paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft. Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications. FairServe proposes application-characteristic aware request throttling coupled with a weighted service counter based scheduling technique to curb abusive behavior and e...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Systems and networking","Computer science","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/enabling-adoption-of-regenerative-agriculture-through-soil-carbon-copilots","title":"Enabling Adoption of Regenerative Agriculture through Soil Carbon Copilots","url":"https://www.microsoft.com/en-us/research/publication/enabling-adoption-of-regenerative-agriculture-through-soil-carbon-copilots/","published":"2024-11-24","authors":["Margaret Capetz","Swati Sharma","Rafael Padilha","Peder Olsen","Jessica Wolk","Emre Kiciman","Ranveer Chandra"],"abstract":"Mitigating climate change requires transforming agriculture to minimize environ mental impact and build climate resilience. Regenerative agricultural practices enhance soil organic carbon (SOC) levels, thus improving soil health and sequestering carbon. 
A challenge to increasing regenerative agriculture practices is cheaply measuring SOC over time and understanding how SOC is affected by regenerative agricultural practices and other environmental factors and farm management practices. To address this challenge, we introduce an AI-driven Soil Organic Carbon Copilot that automates the ingestion of complex multi-resolution, multi-modal data to provide large-scale insights into soil health and regenerative practices. Our data includes extreme weather event data (e.g., drought and wildfire incidents), farm management data (e.g., cropland information and tillage predictions), and SOC predictio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4404654684","title":"Dolphins: Multimodal Language Model for Driving","url":"https://doi.org/10.1007/978-3-031-72995-9_23","published":"2024-11-23","authors":["Yingzi Ma","Yulong Cao","Jiachen Sun","Marco Pavone","Chaowei Xiao"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72995-9_23","openalex_id":"https://openalex.org/W4404654684","cited_by_count":38,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Nvidia (United States)","Stanford University","University of Michigan–Ann Arbor","University 
of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8523766994476318},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.41034847497940063},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3979395627975464},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3447182774543762},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3367592990398407}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":38}},{"id":"openalex:W4404654748","title":"PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion","url":"https://doi.org/10.1007/978-3-031-72995-9_10","published":"2024-11-23","authors":["Guansong Lu","Yuanfan Guo","Jianhua Han","Minzhe Niu","Yihan Zeng","Songcen Xu","Zeyi Huang","Zhao Zhong","Zhang Wei","Hang Xu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72995-9_10","openalex_id":"https://openalex.org/W4404654748","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (Canada)","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8520702123641968},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5508926510810852},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.52237468957901},{"id":"https://openalex.org/C206345919","display_name":"Resource 
(disambiguation)","score":0.4487279951572418},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4423845410346985},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.41223204135894775},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.38285931944847107},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.363007128238678}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gaps-between-research-and-practice-when-measuring-representational-harms-caused-by-llm-based-systems","title":"Gaps Between Research and Practice When Measuring Representational Harms Caused by LLM-Based Systems","url":"https://www.microsoft.com/en-us/research/publication/gaps-between-research-and-practice-when-measuring-representational-harms-caused-by-llm-based-systems/","published":"2024-11-22","authors":["Emma Harvey","Emily Sheng","Su Lin Blodgett","Alex Chouldechova","Jean Garcia-Gathright","Alexandra Olteanu","Hanna Wallach"],"abstract":"To facilitate the measurement of representational harms caused by large language model (LLM)-based systems, the NLP research community has produced and made publicly available numerous measurement instruments, including tools, datasets, metrics, benchmarks, annotation instructions, and other techniques. However, the research community lacks clarity about whether and to what extent these instruments meet the needs of practitioners tasked with developing and deploying LLM-based systems in the real world, and how these instruments could be improved. 
Via a series of semi-structured interviews with practitioners in a variety of roles in different organizations, we identify four types of challenges that prevent practitioners from effectively using publicly available instruments for measuring representational harms caused by LLM-based systems: (1) challenges related to using publicly available....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Social sciences","Computer science","Responsible AI","Social Science","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406030356","title":"DrugAssist: a large language model for molecule optimization","url":"https://doi.org/10.1093/bib/bbae693","published":"2024-11-22","authors":["Geyan Ye","Xibao Cai","Houtim Lai","Xing Wang","Junhong Huang","Longyue Wang","Wei Liu","Xiangxiang Zeng"],"abstract":"Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has attracted an increasing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, is currently an area that has seen little involvement from LLMs. Most of existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback. 
These non-interactive approaches overlook the fact that the drug discovery process is actually one that requires the integration of expert experience and iterative refinement. To address this gap, we propose DrugAssist, an interactive molecule optimization model which performs optimization through human-machine dialogue by leveraging LLM's strong interactivity and generalizability. DrugAssist has achieved lead...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/bib/bbae693","openalex_id":"https://openalex.org/W4406030356","cited_by_count":26,"quality_score":71,"matched_keywords":["LLM","language model"],"author_affiliations":["Hunan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8005080819129944},{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.7017388343811035},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5483591556549072},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.47427138686180115},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.43417686223983765},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42271965742111206},{"id":"https://openalex.org/C74187038","display_name":"Drug discovery","score":0.4220528304576874},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4068320691585541}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":26}},{"id":"openalex:W4404612934","title":"LLaVA-Plus: Learning to Use Tools for Creating Multimodal 
Agents","url":"https://doi.org/10.1007/978-3-031-72970-6_8","published":"2024-11-22","authors":["Shilong Liu","Hao Cheng","Haotian Liu","Hao Zhang","Feng Li","Tianhe Ren","Xueyan Zou","Jianwei Yang","Hang Su","Jun Zhu","Lei Zhang","Jianfeng Gao"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72970-6_8","openalex_id":"https://openalex.org/W4404612934","cited_by_count":35,"quality_score":67,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Institute for Development and Economic Analysis","Microsoft (United States)","Tsinghua University","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8600801229476929},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.48270201683044434},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41183996200561523}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":35}},{"id":"openalex:W4404612908","title":"Grounding DINO: Marrying DINO with Grounded Pre-training for Open-Set Object Detection","url":"https://doi.org/10.1007/978-3-031-72970-6_3","published":"2024-11-22","authors":["Shilong Liu","Zhaoyang Zeng","Tianhe Ren","Feng Li","Hao Zhang","Jie Yang","Qing Jiang","Chunyuan Li","Jianwei Yang","Hang Su","Jun Zhu","Lei 
Zhang"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72970-6_3","openalex_id":"https://openalex.org/W4404612908","cited_by_count":1288,"quality_score":67,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Hong Kong University of Science and Technology","Microsoft (United States)","South China University of Technology","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8040570020675659},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6197131276130676},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5105193853378296},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.49345913529396057},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4819735884666443},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4699842035770416},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3449634909629822},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.22424352169036865}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1288}},{"id":"openalex:W4404627538","title":"Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation","url":"https://doi.org/10.1109/tcsvt.2024.3504816","published":"2024-11-22","authors":["Zicheng Zhang","Wei Ke","Yi Zhu","Xiaodan Liang","Jianzhuang Liu","Qixiang Ye","Tong Zhang"],"abstract":"The pre-trained vision-language model, exemplified by CLIP, advances zero-shot semantic segmentation by 
aligning visual features with class embeddings through a transformer decoder to generate semantic masks. Despite its effectiveness, prevailing methods within this paradigm encounter challenges, including overfitting on seen classes and small fragmentation in segmentation masks. To mitigate these issues, we propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of linguistic and visual information. Specifically, we leverage class embeddings as anchors due to their discrete and abstract nature, steering visual features toward class embeddings. Moreover, to achieve a more compact visual space, we introduce route attention into the transformer decoder to find visual consensus, thereby enhancing semantic consistency within the same object. Equipped with a v...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2024.3504816","openalex_id":"https://openalex.org/W4404627538","cited_by_count":7,"quality_score":48,"matched_keywords":["language model"],"author_affiliations":["Centre d'Imagerie BioMedicale","Huawei Technologies (China)","Huawei Technologies (Sweden)","Sun Yat-sen University","University of Chinese Academy of Sciences","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6997900009155273},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5948978662490845},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5701572299003601},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.518731415271759},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.4911665618419647},{"id":"https://openalex.org/C2778344882","display_name":"Shot 
(pellet)","score":0.47028210759162903},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4498330056667328},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.38108304142951965}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"apple:rc5zkel41gf563etv5p9suj9","title":"Memory-Retaining Finetuning via Distillation","url":"https://machinelearning.apple.com/research/memory-retaining","published":"2024-11-21","authors":["Zitong Yang","Aonan Zhang","Sam Wiseman","Xiang Kong","Ke Ye","Dong Yin"],"abstract":"This paper was accepted at the Fine-Tuning in Modern Machine Learning: Principles and Scalability (FITML) Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["memory","distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:h5s9cr62c33rkckp5egojq17","title":"Efficient and Effective Uncertainty Quantification in LLMs","url":"https://machinelearning.apple.com/research/efficient-and-effective","published":"2024-11-21","authors":["Miao Xiong","Andrea Santilli","Michael Kirchhof","Adam Golinski","Sinead Williamson"],"abstract":"This paper was accepted at the Safe Generative AI Workshop (SGAIW) 2024 at NeurIPS 
2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ij6b8akwv24kfhda492vaw4o","title":"Multimodal Autoregressive Pre-Training of Large Vision Encoders","url":"https://machinelearning.apple.com/research/multimodal-autoregressive","published":"2024-11-21","authors":["Enrico Fini","Mustafa Shukor","Xiujun Li","Philipp Dufter","Michal Klein","David Haldimann","Sai Aitharaju","Louis Béthune","Zhe Gan","Victor Turrisi","Alexander Toshev","Marcin Eichner"],"abstract":"Equal Contributors","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4404582879","title":"IVTP: Instruction-Guided Visual Token Pruning for Large Vision-Language Models","url":"https://doi.org/10.1007/978-3-031-72643-9_13","published":"2024-11-21","authors":["Kai Huang","Hao Zou","Xi Ye","BoChen Wang","Zhen Xie","Yu 
Liang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72643-9_13","openalex_id":"https://openalex.org/W4404582879","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8819111585617065},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.7109639048576355},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.6296894550323486},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5701261162757874},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.37485843896865845},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36116692423820496},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.32285094261169434},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.05296412110328674}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/building-ai-agents-for-autonomous-clouds-challenges-and-design-principles","title":"Building AI Agents for Autonomous Clouds: Challenges and Design Principles","url":"https://www.microsoft.com/en-us/research/publication/building-ai-agents-for-autonomous-clouds-challenges-and-design-principles/","published":"2024-11-20","authors":["Manisha M Shetty","Yinfang Chen","Gagan Somashekar","Minghua Ma","Yogesh L. 
Simmhan","Xuchao Zhang","Jonathan Mace","Pedro Las-Casas","Shachee Mishra Gupta","Suman Nath","Chetan Bansal","Saravan Rajmohan"],"abstract":"The rapid growth in the use of Large Language Models (LLMs) and AI Agents as part of software development and deployment is revolutionizing the information technology landscape. While code generation receives significant attention, a higher-impact application lies in using AI agents for operational resilience of cloud services, which currently require significant human effort and domain knowledge. There is a growing interest in AI for IT Operations (AIOps) which aims to automate complex operational tasks, like fault localization and root cause analysis, thereby reducing human intervention and customer impact. However, achieving the vision of autonomous and self-healing clouds through AIOps is hampered by the lack of standardized frameworks for building, evaluating, and improving AIOps agents. This vision paper lays the groundwork for such a framework by first framing the requirements and...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Systems and networking","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:166","title":"DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code 
LMs","url":"https://seed.bytedance.com/en/research/dstc-direct-preference-learning-with-only-self-generated-tests-and-code-to-improve-code-lms","published":"2024-11-20","authors":["Zhihan Liu","Shenao Zhang","Yongfei Liu","Boyi Liu","Yingxiang Yang","Zhaoran Wang"],"abstract":"Direct preference learning offers a promising and computation-efficient beyond supervised fine-tuning (SFT) for improving code generation in coding large language models (LMs). However, the scarcity of reliable preference data is a bottleneck for the performance of direct preference learning to improve the coding accuracy of code LMs. In this paper, we introduce \\underline{\\textbf{D}}irect Preference Learning with Only \\underline{\\textbf{S}}elf-Generated \\underline{\\textbf{T}}ests and \\underline{\\textbf{C}}ode (DSTC), a framework that leverages only self-generated code snippets and tests to construct reliable preference pairs such that direct preference learning can improve LM coding accuracy without external annotations. 
DSTC combines a minimax selection process and test-code concatenation to improve preference pair quality, reducing the influence of incorrect self-generated tests and e...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Code Generation","Infrastructures","arXiv","preference","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"official:5b7789b9f1aaf4af","title":"Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations","url":"https://ai.meta.com/research/publications/llama-guard-3-1b-int4-compact-and-efficient-safeguard-for-human-ai-conversations/","published":"2024-11-20","authors":["Igor Fedorov","Kate Plawiak","Lemeng Wu","Tarek Elgamal","Naveen Suda","Eric Smith","Hongyuan Zhan","Jianfeng Chi","Yuriy Hulovatyy","Kimish Patel","Zechun Liu","Yangyang Shi"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["NLP","Core Machine Learning","efficient"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=8"}},{"id":"official:09ba797913aae7a4","title":"Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding 
Conversations","url":"https://ai.meta.com/research/publications/llama-guard-3-vision-safeguarding-human-ai-image-understanding-conversations/","published":"2024-11-20","authors":["Jianfeng Chi","Ujjwal Karn","Hongyuan Zhan","Eric Smith","Javier Rando","Yiming Zhang","Kate Plawiak","Zacharie Delpierre Coudert","Kartikeya Upasani","Mahesh Pasupuleti"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=8"}},{"id":"openalex:W4404545798","title":"Towards efficient and effective unlearning of large language models for recommendation","url":"https://doi.org/10.1007/s11704-024-40044-2","published":"2024-11-20","authors":["Hangyu Wang","Jianghao Lin","Bo Chen","Yang Yang","Ruiming Tang","Weinan Zhang","Yong Yu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11704-024-40044-2","openalex_id":"https://openalex.org/W4404545798","cited_by_count":11,"quality_score":52,"matched_keywords":["efficient"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9106982946395874},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.42927587032318115},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.41036278009414673},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.33937954902648926}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"apple:ovr22n262fyfuczn8780vaul","title":"Do LLMs Internally \"Know\" When They Follow Instructions?","url":"https://machinelearning.apple.com/research/follow-instructions","published":"2024-11-20","authors":["Juyeon Heo","Christina Heinze-Deml","Oussama Elachqar","Shirley Ren","Udhay Nallasamy","Andy Miller","Kwan Ho Ryan Chan","Jaya Narain"],"abstract":"This paper was accepted at the Foundation Model Interventions (MINT) Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4404646344","title":"Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos","url":"https://doi.org/10.1007/978-981-96-0351-0_21","published":"2024-11-20","authors":["Lianyang Ma","Yue Yu","Tao Liang","Tongliang 
Liu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-0351-0_21","openalex_id":"https://openalex.org/W4404646344","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Tencent (China)","The University of Sydney"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8602195978164673},{"id":"https://openalex.org/C66402592","display_name":"Sentiment analysis","score":0.5196184515953064},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5045682191848755},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.4571537673473358},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4487695097923279},{"id":"https://openalex.org/C135641252","display_name":"Multimodal interaction","score":0.4272664785385132},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.2673792243003845},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.11301010847091675}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4404534114","title":"The Emergence of Chunking Structures with Hierarchical RNN","url":"https://doi.org/10.1162/coli_a_00545","published":"2024-11-20","authors":["Zijun Wu","Anup Anand Deshmukh","Yongkang Wu","Jimmy Lin","Lili Mou"],"abstract":"Abstract In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and chunking, has mostly relied on manual annotations of syntactic structures. 
This article introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner. We present a Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions. Our approach involves a two-stage training process: pretraining with an unsupervised parser and finetuning on downstream NLP tasks. Experiments on multiple datasets reveal a notable improvement of unsupervised chunking performance in both pretraining and finetuning stages. Interestingly, we observe that the emergence of the chunking structure is transient during the neural model’s downstream-task training. This study contributes to the advancement of unsup...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/coli_a_00545","openalex_id":"https://openalex.org/W4404534114","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Canadian Institute for Advanced Research","Huawei Technologies (China)","Intel (United States)","University of Alberta","University of Waterloo"],"concepts":[{"id":"https://openalex.org/C203357204","display_name":"Chunking (psychology)","score":0.7621349096298218},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7139626145362854},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38666507601737976},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3416163921356201}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4404545267","title":"Skews in the Phenomenon Space Hinder Generalization in Text-to-Image 
Generation","url":"https://doi.org/10.1007/978-3-031-73021-4_25","published":"2024-11-20","authors":["Yingshan Chang","Yasi Zhang","Zhiyuan Fang","Ying Wu","Yonatan Bisk","Feng Gao"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73021-4_25","openalex_id":"https://openalex.org/W4404545267","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Carnegie Mellon University","Seattle University","University of California, Los Angeles"],"concepts":[{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.7882068753242493},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7857692241668701},{"id":"https://openalex.org/C50335755","display_name":"Phenomenon","score":0.7432857751846313},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.5916646718978882},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5207564830780029},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4080553948879242},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3581351339817047},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3248228430747986}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4408402169","title":"SDL: Spoken Dual Learning for Joint Optimization of SLU and SLG","url":"https://doi.org/10.1109/icoiact64819.2024.10913256","published":"2024-11-20","authors":["Hao Yang","Min Zhang","Jiaxin Guo"],"abstract":"This paper presents a novel spoken dual learning (SDL) frame-work aimed at 
the concurrent training of spoken language understanding (SLU) and generation (SLG) models. The central proposition is to consider SLU and SLG as conditional text generation tasks steered by prompts, thereby offering consistent goals and context-aware direction. SDL navigates the inherent dual task constraints through an innovative transformation technique utilizing noisy transcripts and intents. The blueprint of SDL incorporates high-profile, pre-trained speech models such as Wav2vec2 or Whisper, serving as the backbone. Extensive experimentation on datasets of varying sizes (small, medium, and large) provides empirical evidence demonstrating that SDL exhibits higher performance levels compared to models that are independently trained, as the cycle consistency of mapping be-tween speech, semantics, and text acts....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icoiact64819.2024.10913256","openalex_id":"https://openalex.org/W4408402169","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.7822335958480835},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7798315286636353},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.7726702690124512},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4235367178916931},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.3805779814720154},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.08868607878684998},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.07435861229896545},{"id":"https://openalex.org/C170154142","display_name":"Architectural engineering","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/radphi-3-small-language-models-for-radiology","title":"RadPhi-3: Small Language Models for Radiology","url":"https://www.microsoft.com/en-us/research/publication/radphi-3-small-language-models-for-radiology/","published":"2024-11-19","authors":["Mercy Ranjit","Shaury Srivastav","Tanuja Ganu"],"abstract":"LLM based copilot assistants are useful in everyday tasks. There is a proliferation in the exploration of AI assistant use cases to support radiology workflows in a reliable manner. In this work, we present RadPhi-3, a Small Language Model instruction tuned from Phi-3-mini-4k-instruct with 3.8B parameters to assist with various tasks in radiology workflows. While impression summary generation has been the primary task which has been explored in prior works w.r.t radiology reports of Chest X-rays, we also explore other useful tasks like change summary generation comparing the current radiology report and its prior report, section extraction from radiology reports, tagging the reports with various pathologies and tubes, lines or devices present in them etc. In-addition, instruction tuning RadPhi-3 involved learning from a credible knowledge source used by radiologists, Radiopaedia.org. 
Rad...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computer science","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/spotlight-accurate-explainable-and-efficient-anomaly-detection-for-open-ran","title":"SpotLight: Accurate, Explainable and Efficient Anomaly Detection for Open RAN","url":"https://www.microsoft.com/en-us/research/publication/spotlight-accurate-explainable-and-efficient-anomaly-detection-for-open-ran/","published":"2024-11-19","authors":["Chuanhao Sun","Ujjwal Pawar","Molham Khoja","Xenofon Foukas","Mahesh K. Marina","Bozidar Radunovic"],"abstract":"The Open RAN architecture, with disaggregated and virtualized RAN functions communicating over standardized interfaces, promises a diversified and multi-vendor RAN ecosystem. However, these same features contribute to increased operational complexity, making it highly challenging to troubleshoot RAN related performance issues and failures. Tackling this challenge requires a dependable, explainable anomaly detection method that Open RAN is currently lacking. To address this problem, we introduce SpotLight, a tailored system architecture with a distributed deep generative modeling based method running across the edge and cloud. SpotLight takes in a diverse, fine grained stream of metrics from the RAN and the platform, to continually detect and localize anomalies. 
It introduces a novel multi-stage generative model to detect potential anomalies at the edge using a light-weight algorithm, fol...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Systems and networking","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/surds-benchmarking-spatial-understanding-and-reasoning-in-driving-scenarios-with-vision-language-models","title":"SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models","url":"https://www.microsoft.com/en-us/research/publication/surds-benchmarking-spatial-understanding-and-reasoning-in-driving-scenarios-with-vision-language-models/","published":"2024-11-19","authors":["Xianda Guo","Ruijun Zhang","Yiqun Duan","Yuhang He","Dujun Nie","Wenke Huang","Chenming Zhang","Shuai Liu","Hao Zhao","Long Chen"],"abstract":"Accurate spatial reasoning in outdoor environments - covering geometry, object pose, and inter-object relationships - is fundamental to downstream tasks such as mapping, motion forecasting, and high-level planning in autonomous driving. We introduce SURDS, a large-scale benchmark designed to systematically evaluate the spatial reasoning capabilities of vision language models (VLMs). 
Built on the nuScenes dataset, SURDS comprises 41,080 vision-question-answer training instances and 9,250 evaluation samples, spanning six spatial categories: orientation, depth estimation, pixel-level localization, pairwise distance, lateral ordering, and front-behind relations. We benchmark leading general-purpose VLMs, including GPT, Gemini, and Qwen, revealing persistent limitations in fine-grained spatial understanding. To address these deficiencies, we go beyond static evaluation and explore whether ali...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","Vision-language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:ie75f4dmzt3pwgvdcic9ttgs","title":"Towards Low-Bit Communication for Tensor Parallel LLM Inference","url":"https://machinelearning.apple.com/research/low-bit","published":"2024-11-19","authors":["Harry Dong","Tyler Johnson","Minsik Cho","Emad Soroush"],"abstract":"This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official 
Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:y4872eaqo0qq2slifk3d6uq0","title":"Speculative Streaming: Fast LLM Inference Without Auxiliary Models","url":"https://machinelearning.apple.com/research/llm-inference","published":"2024-11-19","authors":["Nikhil Bhendawade","Irina Belousova","Qichen Fu","Henry Mason","Mohammad Rastegari","Mahyar Najibi"],"abstract":"This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:dyho4moogwv93sdospeulv7q","title":"Do Compressed LLMs Forget Knowledge? 
An Experimental Study with Practical Implications","url":"https://machinelearning.apple.com/research/forget-knowledge","published":"2024-11-19","authors":["Scott Hoang","Minsik Cho","Thomas Merth","Atlas Wang","Mohammad Rastegari","Devang Naik"],"abstract":"This paper was accepted at the Machine Learning and Compression Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["compression"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:i9r2a3aytalqpxaq48njagal","title":"Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum","url":"https://machinelearning.apple.com/research/dataset-decomposition","published":"2024-11-19","authors":["Hadi Pour Ansari","Chun-Liang Li","Rick Chang","Pavan Kumar Anasosalu Vasu","Cem Koc","Vaishaal Shankar","Oncel Tuzel"],"abstract":"Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length (concat-and-chunk). Recent attention implementations mask cross-document attention, reducing the effective length of a chunk of tokens. 
Additionally, training on long sequences becomes...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4404526287","title":"StyleCrafter: Taming Artistic Video Diffusion with Reference-Augmented Adapter Learning","url":"https://doi.org/10.1145/3687975","published":"2024-11-19","authors":["Gongye Liu","Menghan Xia","Yong Zhang","Haoxin Chen","Jinbo Xing","Yibo Wang","Xintao Wang","Ying Shan","Yujiu Yang"],"abstract":"Text-to-video (T2V) models have shown remarkable capabilities in generating diverse videos. However, they struggle to produce user-desired artistic videos due to (i) text's inherent clumsiness in expressing specific styles and (ii) the generally degraded style fidelity. To address these challenges, we introduce StyleCrafter, a generic method that enhances pretrained T2V models with a style control adapter, allowing video generation in any style by feeding a reference image. Considering the scarcity of artistic video data, we propose to first train a style control adapter using style-rich image datasets, then transfer the learned stylization ability to video generation through a tailor-made finetuning paradigm. To promote content-style disentanglement, we employ carefully designed data augmentation strategies to enhance decoupled learning. 
Additionally, we propose a scale-adaptive fusion....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3687975","openalex_id":"https://openalex.org/W4404526287","cited_by_count":8,"quality_score":49,"matched_keywords":["efficient"],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.7745981812477112},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6471819877624512},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5473870635032654},{"id":"https://openalex.org/C153715457","display_name":"Augmented reality","score":0.526104748249054},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.43567055463790894},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3691067695617676},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36354076862335205},{"id":"https://openalex.org/C9390403","display_name":"Computer hardware","score":0.06383487582206726}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4404527328","title":"SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing","url":"https://doi.org/10.1145/3687957","published":"2024-11-19","authors":["Zhiyuan Zhang","Dongdong Chen","Jing Liao"],"abstract":"Scene graphs offer a structured, hierarchical representation of images, with nodes and edges symbolizing objects and the relationships among them. 
It can serve as a natural interface for image editing, dramatically improving precision and flexibility. Leveraging this benefit, we introduce a new framework that integrates large language model (LLM) with Text2Image generative model for scene graph-based image editing. This integration enables precise modifications at the object level and creative recomposition of scenes without compromising overall image integrity. Our approach involves two primary stages: 1) Utilizing a LLM-driven scene parser, we construct an image's scene graph, capturing key objects and their interrelationships, as well as parsing fine-grained attributes such as object masks and descriptions. These annotations facilitate concept learning with a fine-tuned diffusion mode...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3687957","openalex_id":"https://openalex.org/W4404527328","cited_by_count":3,"quality_score":48,"matched_keywords":["LLM","language model"],"author_affiliations":["City University of Hong Kong","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.7969346046447754},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7129881978034973},{"id":"https://openalex.org/C179372163","display_name":"Scene graph","score":0.6884095072746277},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6371240615844727},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.6262420415878296},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5134618878364563},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5028026103973389},{"id":"https://openalex.org/C167966045","display_name":"Generative 
model","score":0.4683111608028412}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2409.12960","title":"LVCD: Reference-based Lineart Video Colorization with Diffusion Models","url":"http://arxiv.org/abs/2409.12960","published":"2024-11-19","authors":["Zhitong Huang","Mohan Zhang","Jing Liao"],"abstract":"We propose the first video diffusion framework for reference-based lineart video colorization. Unlike previous works that rely solely on image generative models to colorize lineart frame by frame, our approach leverages a large-scale pretrained video diffusion model to generate colorized animation videos. This approach leads to more temporally consistent results and is better equipped to handle large motions. Firstly, we introduce Sketch-guided ControlNet which provides additional control to finetune an image-to-video diffusion model for controllable video synthesis, enabling the generation of animation videos conditioned on lineart. We then propose Reference Attention to facilitate the transfer of colors from the reference frame to other frames containing fast and expansive motions. 
Finally, we present a novel scheme for sequential sampling, incorporating the Overlapped Blending Module....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3687910","openalex_id":"https://openalex.org/W4403748157","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5867032408714294},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5683549642562866},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.43668490648269653},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3372114300727844},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.06168469786643982},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dimensions-of-generative-ai-evaluation-design","title":"Dimensions of Generative AI Evaluation Design","url":"https://www.microsoft.com/en-us/research/publication/dimensions-of-generative-ai-evaluation-design/","published":"2024-11-18","authors":["Alex Dow","Jennifer Wortman Vaughan","Solon Barocas","Chad Atalla","Alex Chouldechova","Hanna Wallach"],"abstract":"There are few principles or guidelines to ensure evaluations of generative AI (GenAI) models and systems are effective. To help address this gap, we propose a set of general dimensions that capture critical choices involved in GenAI evaluation design. 
These dimensions include the evaluation setting, the task type, the input source, the interaction style, the duration, the metric type, and the scoring method. By situating GenAI evaluations within these dimensions, we aim to guide decision-making during GenAI evaluation design and provide a structure for comparing different evaluations. We illustrate the utility of the proposed set of general dimensions using two examples: a hypothetical evaluation of the fairness of a GenAI system and three real-world GenAI evaluations of biological threats.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Social sciences","Computer science","Responsible AI","Social Science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4404460707","title":"A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities","url":"https://doi.org/10.1038/s41592-024-02499-w","published":"2024-11-18","authors":["Theodore Zhao","裕二 池谷","Jianwei Yang","Naoto Usuyama","Ho Hin Lee","Sid Kiblawi","Tristan Naumann","Jianfeng Gao","Angela Crabtree","Jacob Abel","Christine Moung-Wen","Brian 
Piening"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41592-024-02499-w","openalex_id":"https://openalex.org/W4404460707","cited_by_count":81,"quality_score":67,"matched_keywords":[],"author_affiliations":["ID Genomics (United States)","Microsoft (United States)","Microsoft Research (United Kingdom)","Providence College","Providence Portland Medical Center","University of Washington"],"concepts":[{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.633078932762146},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6137118935585022},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5356838703155518},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.5275760889053345},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5189895033836365},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5102553367614746},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4237164855003357},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.34662097692489624}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":81}},{"id":"apple:e9yk5kyloui0pca53txb5o6z","title":"Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models","url":"https://machinelearning.apple.com/research/duo-llm","published":"2024-11-18","authors":["Keivan Alizadeh","Iman Mirzadeh","Hooman Shahrokhi","Dmitry Belenko","Frank Sun","Minsik Cho","Mohammad Hossein Sekhavat","Moin Nabi","Mehrdad Farajtabar"],"abstract":"This paper was accepted at 
the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:wk82f1itro7aqri17k0n6rsn","title":"Recurrent Drafter for Fast Speculative Decoding in Large Language Models","url":"https://machinelearning.apple.com/research/recurrent-drafter","published":"2024-11-18","authors":["Aonan Zhang","Ray Zhang","Yunfei Cheng","Chong Wang","Yi Wang"],"abstract":"We present Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art speedup for large language models (LLMs) inference. 
The performance gains are driven by three key aspects: (1) leveraging a recurrent neural network (RNN) as the draft model conditioning on LLM's hidden states, (2) applying a dynamic tree attention algorithm over beam search results to eliminate duplicated prefixes in candidate...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pearl-personalizing-large-language-model-writing-assistants-with-generation-calibrated-retrievers","title":"Pearl: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers","url":"https://www.microsoft.com/en-us/research/publication/pearl-personalizing-large-language-model-writing-assistants-with-generation-calibrated-retrievers/","published":"2024-11-16","authors":["Sheshera Mysore","Zhuoran Lu","Mengting Wan","Longqi Yang","Bahar Sarrafzadeh","Steve Menezes","Tina Baghaee","Emmanuel Barajas Gonzalez","Jennifer Neville","Tara Safavi"],"abstract":"Powerful large language models have facilitated the development of writing assistants that promise to significantly improve the quality and efficiency of composition and communication. However, a barrier to effective assistance is the lack of personalization in LLM outputs to the author’s communication style, specialized knowledge, and values. In this paper, we address this challenge by proposing Pearl, a LLM writing assistant personalized with a retriever that is trained to be generation-calibrated for personalization. 
Generation calibration ensures that our retriever selects historic user authored documents to augment an LLM prompt such that they are likely to help an LLM generation better adhere to a users’ preferences. We propose two key novelties for training such a retriever: (1) A training data selection method that identifies user requests likely to benefit from personalization a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":100,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Natural language processing","Personalization","1970-01-01","LLM","language model","personalized","personalization","retrieval","media"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-generative-ai-systems-is-a-social-science-measurement-challenge","title":"Evaluating Generative AI Systems is a Social Science Measurement Challenge","url":"https://www.microsoft.com/en-us/research/publication/evaluating-generative-ai-systems-is-a-social-science-measurement-challenge/","published":"2024-11-16","authors":["Hanna Wallach","Meera Desai","Nick Pangakis","A. Feder Cooper","Angelina Wang","Solon Barocas","Alex Chouldechova","Chad Atalla","Su Lin Blodgett","Emily Corvi","Alex Dow","Jean Garcia-Gathright"],"abstract":"Across academia, industry, and government, there is an increasing awareness that the measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult. 
We argue that these measurement tasks are highly reminiscent of measurement tasks found throughout the social sciences. With this in mind, we present a framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, impacts, opportunities, and risks of GenAI systems. The framework distinguishes between four levels: the background concept, the systematized concept, the measurement instrument(s), and the instance-level measurements themselves. This four-level approach differs from the way measurement is typically done in ML, where researchers and practitioners appear to jump straight from background concepts to measurement instruments, with little to no ex...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Social sciences","Computer science","Generative AI","Responsible AI","Social Science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dukawalla-voice-interfaces-for-small-businesses-in-africa","title":"Dukawalla: Voice Interfaces for Small Businesses in Africa","url":"https://www.microsoft.com/en-us/research/publication/dukawalla-voice-interfaces-for-small-businesses-in-africa/","published":"2024-11-16","authors":["Elizabeth Ankrah","Stephanie Nyairo","Mercy Muchai","Kagonya Awori","Millicent Ochieng","Mark Kariuki","Jacki O'Neill"],"abstract":"Small and medium-sized businesses (SMBs) often struggle with data-driven decision-making due to a lack of 
advanced analytics tools, especially in African countries where they make up majority of the workforce. Though many tools exist they are not designed to fit into the ways of working of SMB workers who are mobile-first, have limited time to learn new workflows, and for whom social and business are tightly coupled. To address this, the Dukawalla prototype was created. This intelligent assistant bridges the gap between raw business data and actionable insights by leveraging voice interaction and the power of generative AI. Dukawalla provides an intuitive way for business owners to interact with their data, aiding in informed decision-making. This paper examines Dukawalla’s deployment across SMBs in Nairobi, focusing on their experiences using this voice-based assistant to streamline dat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Human–computer interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lorasc-expressive-and-generalizable-low-rank-adaptation-for-large-models-via-slow-cascaded-learning","title":"LORASC: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning","url":"https://www.microsoft.com/en-us/research/publication/lorasc-expressive-and-generalizable-low-rank-adaptation-for-large-models-via-slow-cascaded-learning/","published":"2024-11-15","authors":["Yifan Yang","Yifei Shen","Yuqing Yang","Lili Qiu","Fangyun 
Wei"],"abstract":"Efficient fine-tuning plays a fundamental role in modern large models, with low-rank adaptation emerging as a particularly promising approach. However, the existing variants of LoRA are hampered by limited expressiveness, a tendency to overfit, and sensitivity to hyperparameter settings. This paper presents LoRA Slow Cascade Learning (LoRASC), an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities while preserving its training efficiency. Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns. Additionally, we introduce a slow-fast update mechanism and cascading noisy tuning to bolster generalization. The extensive experiments on various language and vision datasets, as well as robustness benchmarks, demonstrate th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:9bac3c29bf4343de","title":"Extending the Context Length to 1M Tokens!","url":"https://qwenlm.github.io/blog/qwen2.5-turbo/","published":"2024-11-15","authors":["Alibaba/Qwen"],"abstract":"API Documentation (Chinese) HuggingFace Demo ModelScope DemoIntroduction After the release of Qwen2.5, we heard the community’s demand for processing longer contexts. 
In recent months, we have made many optimizations for the model capabilities and inference performance of extremely long context. Today, we are proud to introduce the new Qwen2.5-Turbo version, which features:Longer Context Support: We have extended the model’s context length from 128k to 1M, which is approximately 1 million English words or 1.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4404387675","title":"Pretraining graph transformer for molecular representation with fusion of multimodal information","url":"https://doi.org/10.1016/j.inffus.2024.102784","published":"2024-11-14","authors":["Ruizhe Chen","Chunyan Li","Longyue Wang","Mingquan Liu","Shugao Chen","Jiahao Yang","Xiangxiang Zeng"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.inffus.2024.102784","openalex_id":"https://openalex.org/W4404387675","cited_by_count":19,"quality_score":56,"matched_keywords":[],"author_affiliations":["Hunan University","Tencent (China)","Yunnan Normal University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7270369529724121},{"id":"https://openalex.org/C2982962833","display_name":"Information 
fusion","score":0.5900250673294067},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5781567692756653},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5371760725975037},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.490866482257843},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.47257721424102783},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3916154205799103},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3839259445667267}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learning-to-retrieve-iteratively-for-in-context-learning","title":"Learning to Retrieve Iteratively for In-Context Learning","url":"https://www.microsoft.com/en-us/research/publication/learning-to-retrieve-iteratively-for-in-context-learning/","published":"2024-11-13","authors":["Yunmo Chen","Tongfei Chen","Harsh Jhamtani","Patrick Xia","Richard Shin","Jason Eisner","Ben Van Durme"],"abstract":"We introduce iterative retrieval, a novel framework that empowers retrievers to make iterative decisions through policy optimization. Finding an optimal portfolio of retrieved items is a combinatorial optimization problem, generally considered NP-hard. This approach provides a learned approximation to such a solution, meeting specific task requirements under a given family of large language models (LLMs). We propose a training procedure based on reinforcement learning, incorporating feedback from LLMs. We instantiate an iterative retriever for composing in-context learning (ICL) exemplars and apply it to various semantic parsing tasks that demand synthesized programs as outputs. 
By adding only 4M additional parameters for state encoding, we convert an off-the-shelf dense retriever into a stateful iterative retriever, outperforming previous methods in selecting ICL exemplars on semantic p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:157","title":"LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing","url":"https://seed.bytedance.com/en/research/lsh-moe-communication-efficient-moe-training-via-locality-sensitive-hashing","published":"2024-11-13","authors":["Xiaonan Nie","Qibin Liu","Fangcheng Fu","Shenhan Zhu","Xupeng Miao","Xiaoyang Li","Yang Zhang","Shouda Liu","Bin Cui"],"abstract":"Larger transformer models always perform better on various tasks but require more costs to scale up the model size. To efficiently enlarge models, the mixture-of-experts (MoE) architecture is widely adopted, which consists of a gate network and a series of experts and keep the training cost constant by routing the input data to a fixed number of experts instead of all. In existing large-scale MoE training systems, experts would be distributed among different GPUs for parallelization, and thus input data requires additional all-to-all communications to access the target experts and conduct corresponding computations. 
However, upon evaluating the training process of three mainstream MoE models on commonly used GPU clusters, we found that the all-to-all communication ratio averaged around 45%, which significantly hinders the efficiency and scalability of training MoE models.In this paper, w...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["LLM","Infrastructures","NeurIPS 2024","efficient","compression"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4404307277","title":"NeuralFeels with neural fields: Visuotactile perception for in-hand manipulation","url":"https://doi.org/10.1126/scirobotics.adl0628","published":"2024-11-13","authors":["Sudharshan Suresh","Haozhi Qi","Tingfan Wu","Taosha Fan","Luis A. Pineda","Mike Lambeta","Jitendra Malik","Mrinal Kalakrishnan","Roberto Calandra","Michael Kaess","Joseph D. Ortiz","Mustafa Mukadam"],"abstract":"scores of 81% and average pose drifts of 4.7 millimeters, which was further reduced to 2.3 millimeters with known object models. In addition, we observed that, under heavy visual occlusion, we could achieve improvements in tracking up to 94% compared with vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step toward benchmarking in this domain. 
Our neural representation driven by multimodal sensing can serve as a perception backbone toward advancing robot dexterity.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1126/scirobotics.adl0628","openalex_id":"https://openalex.org/W4404307277","cited_by_count":61,"quality_score":67,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Deutsche Telekom (Slovakia)","Meta (United States)","Technische Universität Dresden","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7920206785202026},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7825297713279724},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6508244872093201},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.5448830127716064},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4767628312110901},{"id":"https://openalex.org/C152086174","display_name":"Haptic technology","score":0.47443950176239014},{"id":"https://openalex.org/C2780704645","display_name":"Observer (physics)","score":0.46916136145591736},{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.4477241635322571}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":61}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/symbolic-prompt-program-search-a-structure-aware-approach-to-efficient-compile-time-prompt-optimization","title":"Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt 
Optimization","url":"https://www.microsoft.com/en-us/research/publication/symbolic-prompt-program-search-a-structure-aware-approach-to-efficient-compile-time-prompt-optimization/","published":"2024-11-12","authors":["Tobias Schnabel","Jennifer Neville"],"abstract":"In many modern LLM applications, such as retrieval augmented generation, prompts have become programs themselves. In these settings, prompt programs are repeatedly called with different user queries or data instances. A big practical challenge is optimizing such prompt programs. Recent work has mostly focused on either simple prompt programs or assumed that the general structure of a prompt program is fixed. We introduce SAMMO, a framework to perform symbolic prompt program search for compile-time optimizations of prompt programs. SAMMO represents prompt programs on a symbolic level which allows for a rich set of transformations that can be searched over during optimization. We show that SAMMO generalizes previous methods and improves the performance of complex prompts on (1) instruction tuning, (2) RAG pipeline tuning, and (3) prompt compression, across several different LLMs. 
We make a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","retrieval","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-measuring-and-modeling-culture-in-llms-a-survey","title":"Towards Measuring and Modeling \"Culture\" in LLMs: A Survey","url":"https://www.microsoft.com/en-us/research/publication/towards-measuring-and-modeling-culture-in-llms-a-survey/","published":"2024-11-12","authors":["Muhammad Farid Adilazuarda","Sagnik Mukherjee","Pradhyumna Lavania","Siddhant Singh","Ashutosh Dwivedi","Alham Fikri Aji","Jacki O'Neill","Ashutosh Modi","M. Choudhury"],"abstract":"We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs). We observe that none of the studies explicitly define \"culture,\" which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of \"culture\". We call these aspects the proxies of culture, and organize them across two dimensions of demographic and semantic proxies. We also categorize the probing methods employed. 
Our analysis indicates that only certain aspects of \"culture,\" such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness of probing techniques and situ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/encoding-spreadsheets-for-large-language-models","title":"SpreadsheetLLM: Encoding Spreadsheets for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/encoding-spreadsheets-for-large-language-models/","published":"2024-11-12","authors":["Haoyu Dong","Yuzhang Tian","Jianbo Zhao","Junyu Xiong","Mengyu Zhou","Yun Lin","José Cambronero","Yeye He","Shi Han","Dongmei Zhang"],"abstract":"Spreadsheets are characterized by their extensive two-dimensional grids, flexible layouts, and varied formatting options, which pose significant challenges for large language models (LLMs). In response, we introduce SheetEncoder, pioneering an efficient encoding method designed to unleash and optimize LLMs’ powerful understanding and reasoning capability on spreadsheets. 
Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs' token constraints, making it impractical for most applications. To tackle this challenge, three innovative modules are proposed to compress spreadsheets effectively: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in spreadsheet table detection task, outperforming the vanilla approa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Data platforms and analytics","1970-01-01","LLM","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:zf9thm1svh6w1hjz6ncx2f4r","title":"Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization","url":"https://machinelearning.apple.com/research/scaling-smart","published":"2024-11-12","authors":["Mohammad Samragh","Iman Mirzadeh","Keivan Alizadeh Vahid","Fartash Faghri","Minsik Cho","Moin Nabi","Devang Naik","Mehrdad Farajtabar"],"abstract":"This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["language 
model","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4405440295","title":"SPICED: Syntactical Bug and Trojan Pattern Identification in A/MS Circuits using LLM-Enhanced Detection","url":"https://doi.org/10.1109/paine62042.2024.10792717","published":"2024-11-12","authors":["Jayeeta Chaudhuri","Dhruv Thapar","Arjun Chaudhuri","Farshad Firouzi","Krishnendu Chakrabarty"],"abstract":"Analog and mixed-signal (A/MS) integrated circuits (ICs) are crucial in modern electronics, playing key roles in signal processing, amplification, sensing, and power management. Many IC companies outsource manufacturing to third-party foundries, creating security risks such as stealthy analog Trojans. Traditional detection methods, including embedding circuit watermarks or conducting hardware-based monitoring, often impose significant area and power overheads, and may not effectively identify all types of Trojans. To address these shortcomings, we propose SPICED, a Large Language Model (LLM)-based framework that operates within the software domain, eliminating the need for hardware modifications for Trojan detection and localization. 
This is the first work using LLM-aided techniques for detecting and localizing syntactical bugs and analog Trojans in circuit netlists, requiring no explici...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/paine62042.2024.10792717","openalex_id":"https://openalex.org/W4405440295","cited_by_count":12,"quality_score":57,"matched_keywords":["LLM","language model"],"author_affiliations":["Arizona State University","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C174333608","display_name":"Trojan","score":0.7865160703659058},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.7703537940979004},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6277196407318115},{"id":"https://openalex.org/C134146338","display_name":"Electronic circuit","score":0.49217358231544495},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4398390054702759},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43859967589378357},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34252387285232544},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.1922372281551361}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"official:8eafa087e6ce8800","title":"Qwen2.5-Coder Series: Powerful, Diverse, Practical.","url":"https://qwenlm.github.io/blog/qwen2.5-coder-family/","published":"2024-11-12","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE KAGGLE DEMO DISCORDIntroduction Today, we are excited to open source the “Powerful”, “Diverse”, and “Practical” 
Qwen2.5-Coder series, dedicated to continuously promoting the development of Open CodeLLMs.Powerful: Qwen2.5-Coder-32B-Instruct has become the current SOTA open-source code model, matching the coding capabilities of GPT-4o. While demonstrating strong and comprehensive coding abilities, it also possesses good general and mathematical skills; Diverse: Building on the previously open-sourced two sizes of 1.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4404438890","title":"Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models","url":"https://doi.org/10.1007/978-981-96-0119-6_34","published":"2024-11-12","authors":["Zizheng Lin","Chunkit Chan","Yangqiu Song","Xin Liu"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-0119-6_34","openalex_id":"https://openalex.org/W4404438890","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Hong Kong University of Science and Technology","Seattle University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8417917490005493},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.4978163242340088},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.41165611147880554},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4092952609062195},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.40823712944984436},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3315180242061615},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4404439627","title":"KLoB: a Benchmark for Assessing Knowledge Localization Methods in Language Models","url":"https://doi.org/10.1007/978-981-96-0119-6_36","published":"2024-11-12","authors":["Yiming Ju","Huimin Ma","Xingrun Xing","Zhixiong Zeng"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-96-0119-6_36","openalex_id":"https://openalex.org/W4404439627","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Beijing Academy of Artificial Intelligence","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8720453977584839},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7518988847732544},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5020804405212402},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.39542004466056824},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.10447525978088379},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.04262414574623108}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4404239715","title":"Large language models for generative information extraction: a survey","url":"https://doi.org/10.1007/s11704-024-40555-y","published":"2024-11-11","authors":["Derong Xu","Wei Chen","Wenjun Peng","Chao Zhang","Tong Xu","Xiangyu Zhao","Xian Wu","Yefeng Zheng","Yan Wang","Enhong Chen"],"abstract":"Abstract Information Extraction (IE) aims to extract structural knowledge from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. As a result, numerous works have been proposed to integrate LLMs for IE tasks based on a generative paradigm. To conduct a comprehensive systematic review and exploration of LLM efforts for IE tasks, in this study, we survey the most recent advancements in this field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and techniques, and then we empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs. 
Based on a thorough review conducted, we identify several insights in technique and promising research directions that deserve further exploration in future studies...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11704-024-40555-y","openalex_id":"https://openalex.org/W4404239715","cited_by_count":195,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Anhui Conch Design and Research Institute of Building Materials (China)","City University of Hong Kong","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9110172986984253},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6548856496810913},{"id":"https://openalex.org/C195807954","display_name":"Information extraction","score":0.5560874342918396},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4783848524093628},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.4626961350440979},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4378660023212433},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.41469770669937134},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3619779348373413}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":195}},{"id":"bytedance-seed:132","title":"SeedEdit: Align Image Re-Generation to Image Editing","url":"https://seed.bytedance.com/en/research/seededit-align-image-re-generation-to-image-editing","published":"2024-11-11","authors":["Yichun Shi","Peng 
Wang","Weilin Huang"],"abstract":"We introduce SeedEdit, a diffusion model that is able to revise a given image with any text prompt. In our perspective, the key to such a task is to obtain an optimal balance between maintaining the original image, i.e. image reconstruction, and generating a new image, i.e. image re-generation. To this end, we start from a weak generator (text-to-image model) that creates diverse pairs between such two directions and gradually align it into a strong image editor that well balances between the two tasks. SeedEdit can achieve more diverse and stable editing capability over prior image editing methods, enabling sequential revision over images generated by diffusion models. External paper link: https://arxiv.org/abs/2411.06686","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4404238344","title":"Soft Magnetic Skin With Motion and Contact Sensing for Anthropomorphic Robotic Finger","url":"https://doi.org/10.1109/lra.2024.3495590","published":"2024-11-11","authors":["Xingyu Ding","Xiangbo Wang","Yaqi Zhang","Shuntian Yao","Y. Zheng","Fuchun Sun","Jianhua Shan","Bin Fang"],"abstract":"Drawing inspiration from human fine tactile and proprioceptive kinaesthetic sensing pathways, we propose a soft magnetic skin (m-skin) with multimodal sensing functions integrated into the anthropomorphic robotic finger. 
This paper mainly explores the magnetic tactile sensor's structural design, performance analysis, and bimodal sensing. The realization recognizes the contact information and the spatial joint angle of the fingers only by detecting the change in the magnetic field signal. Through our research, we pave the way for robotic fingers to realize an all-round sensing ability similar to human fingers, and the dexterous hand is designed with flexible five-fingers to prove the performances of soft magnetic skin, thus opening up new ways for human-robot interaction.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2024.3495590","openalex_id":"https://openalex.org/W4404238344","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Anhui University of Technology","Anshan Hospital","Beijing University of Posts and Telecommunications","Hebei University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2988191880","display_name":"Robotic hand","score":0.5462895631790161},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5112974643707275},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4943704605102539},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46517324447631836},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.4605850577354431},{"id":"https://openalex.org/C2776058767","display_name":"Soft robotics","score":0.43878450989723206},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.3224605917930603},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.27756649255752563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"arxiv:2411.07126","title":"Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models","url":"https://huggingface.co/papers/2411.07126","published":"2024-11-11","authors":["NVIDIA","Yuval Atzmon","Maciej Bala","Yogesh Balaji","Tiffany Cai","Yin Cui","Jiaojiao Fan","Yunhao Ge","Siddharth Gururani","Jacob Huffman","Ronald Isaac","Pooya Jannaty"],"abstract":"We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. 
Edify Image supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360 HDR panorama generation, and finetuning for image customization.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-ai-going-awry-enabling-designers-to-proactively-avoid-it-in-cscw-applications","title":"Generative AI Going Awry: Enabling Designers to Proactively Avoid It in CSCW Applications","url":"https://www.microsoft.com/en-us/research/publication/generative-ai-going-awry-enabling-designers-to-proactively-avoid-it-in-cscw-applications/","published":"2024-11-10","authors":["Jed R. Brubaker","Casey Fiesler","Michael Madaio","John Tang","Richmond Y. Wong"],"abstract":"The rapid development and deployment of generative AI technologies creates a design challenge of how to proactively understand the implications of productizing and deploying these new technologies, especially with regard to negative design implications. This is especially concerning in CSCW applications, where AI agents can introduce misunderstandings or even misdirections with the people interacting with the agent. In this panel, researchers from academia and industry will reflect on their experiences with ideas, methods, and processes to enable designers to proactively shape the responsible design of genAI in collaborative applications. The panelists represent a range of different approaches, including speculative fiction, design activities, design toolkits, and process guides. 
We hope that the panel encourages a discussion in the CSCW community around techniques we can put into practi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3678884.3689133","openalex_id":"https://openalex.org/W4404331375","cited_by_count":2,"quality_score":70,"matched_keywords":["Article (Journal)","Human-computer interaction","Computer science","agent"],"author_affiliations":["Microsoft","Georgia Institute of Technology","Google (United States)","Microsoft (United States)","University of Colorado Boulder"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4411173326","title":"A Brief Survey on Temporal Reasoning Based on Large Language Models","url":"https://doi.org/10.1109/acait63902.2024.11021814","published":"2024-11-08","authors":["Panfeng Zhang","Huan Zhang","Xiaoke Wang","Fu Zhang","Fan Yu"],"abstract":"Temporal reasoning is a pivotal mechanism for understanding the world around us, enabling inference, prediction, and deduction of temporal relationships among events. The advent of Large Language Models (LLMs) has sparked considerable interest in research on temporal reasoning utilizing these models. These models, trained on massive datasets, acquire potent representational capabilities, allowing them to learn temporal patterns and perform inference and prediction on complex temporal data. We provide a brief overview of recent research on temporal reasoning based on LLMs, exploring the capabilities of LLMs in temporal reasoning and outlining future directions. 
We particularly focus on four major research areas: Time Series Forecasting, Temporal Question Answering, Temporal Knowledge Graph and Assessing Temporal Reasoning Capability in LLMs. Through this review, we aim to offer new insigh...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/acait63902.2024.11021814","openalex_id":"https://openalex.org/W4411173326","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Inner Mongolia Electric Power (China)","Northeastern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7431498169898987},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3623165488243103},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35240069031715393}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-adapter-contextualizing-language-models-in-parameters-with-a-single-forward-pass","title":"Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass","url":"https://www.microsoft.com/en-us/research/publication/generative-adapter-contextualizing-language-models-in-parameters-with-a-single-forward-pass/","published":"2024-11-07","authors":["Tong Chen","Hao Fang","Patrick Xia","Xiaodong Liu","Ben Van Durme","Luke Zettlemoyer","Jianfeng Gao","Hao Cheng"],"abstract":"Large language models (LMs) are typically adapted to improve performance on new contexts (\\eg text prompts that define new tasks or domains) through fine-tuning or prompting. 
However, there is an accuracy compute tradeoff -- fine-tuning incurs significant training cost and prompting increases inference overhead. We introduce $GenerativeAdapter$, an effective and efficient adaptation method that directly maps new contexts to low-rank LM adapters, thereby significantly reducing inference overhead with no need for finetuning. The adapter generator is trained via self-supervised learning, and can be used to adapt a single frozen LM for any new task simply by mapping the associated task or domain context to a new adapter. We apply $GenerativeAdapter$ to two pretrained LMs (Mistral-7B-Instruct and Llama2-7B-Chat) and evaluate the adapted models in three adaption scenarios: knowledge acquisitio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","mathematics","1970-01-01","personalization","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bitnet-a4-8-4-bit-activations-for-1-bit-llms","title":"BitNet a4.8: 4-bit Activations for 1-bit LLMs","url":"https://www.microsoft.com/en-us/research/publication/bitnet-a4-8-4-bit-activations-for-1-bit-llms/","published":"2024-11-07","authors":["Hongyu Wang","Shuming Ma","Furu Wei"],"abstract":"Recent research on the 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining their 
performance. In this work, we introduce BitNet a4.8, enabling 4-bit activations for 1-bit LLMs. BitNet a4.8 employs a hybrid quantization and sparsification strategy to mitigate the quantization errors introduced by the outlier channels. Specifically, we utilize 4-bit activations for inputs to the attention and feed-forward network layers, while sparsifying intermediate states followed with 8-bit quantization. Extensive experiments demonstrate that BitNet a4.8 achieves performance comparable to BitNet b1.58 with equivalent training costs, while being faster in inference with enabling 4-bit (INT4/FP4) kernels. Additionally, BitNet a4.8 activates only 55% of parameters and supports 3-bit KV cache, further enhancing t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","Machine learning","LLM","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4404130309","title":"Code-switching finetuning: Bridging multilingual pretrained language models for enhanced cross-lingual performance","url":"https://doi.org/10.1016/j.engappai.2024.109532","published":"2024-11-07","authors":["Changtong Zan","Liang Ding","Li Shen","Yu Cao","Weifeng 
Liu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.engappai.2024.109532","openalex_id":"https://openalex.org/W4404130309","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["China University of Petroleum, East China","Jingdong (China)","Sun Yat-sen University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9538426399230957},{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.9199692010879517},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4322052299976349},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.42337724566459656},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4154309630393982},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38770031929016113},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3195015490055084},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.1322917342185974}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/quietsync-integrating-multimodal-signals-for-accurate-silent-speech-interaction-with-head-worn-devices","title":"Whispering Wearables: Multimodal Approach to Silent Speech Recognition with Head-Worn 
Devices","url":"https://www.microsoft.com/en-us/research/publication/quietsync-integrating-multimodal-signals-for-accurate-silent-speech-interaction-with-head-worn-devices/","published":"2024-11-06","authors":["Tanmay Srivastava","R. Michael Winters","Yu-Te Wang","Thomas M. Gable","Teresa LaScala","Ivan Tashev"],"abstract":"Silent speech recognition has emerged as a promising approach for enabling hands-free and discreet interaction with head-worn devices. In this paper, we present QuietSync, a multimodal system that combines inertial measurement unit (IMU) and contact electrode (ExG) signals to achieve accurate silent speech recognition using off-the-shelf devices. QuietSync utilizes an IMU attached to the lower part of the headphones near the ear and strategically places ExG electrodes on the headphones, glasses (nose and behind the ear), and face (for VR applications) to capture subtle movements and muscle activity associated with silent speech production. 
We conduct...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3678957.3685720","openalex_id":"https://openalex.org/W4403913173","cited_by_count":12,"quality_score":96,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Hardware and devices","Human-computer interaction","Audio and Speech Processing","Brain–computer interface","1970-01-01"],"author_affiliations":["Microsoft","Academia Sinica","Microsoft (United States)","Stony Brook University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm2clip-powerful-language-model-unlock-richer-visual-representation","title":"LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation","url":"https://www.microsoft.com/en-us/research/publication/llm2clip-powerful-language-model-unlock-richer-visual-representation/","published":"2024-11-06","authors":["Weiquan Huang","Aoqi Wu","Yifan Yang","Xufang Luo","Yuqing Yang","Liang Hu","Qi Dai","Xiyang Dai","Dongdong Chen","Chong Luo","Lili Qiu"],"abstract":"CLIP is one of the most important multimodal foundational models today. What powers CLIP's capabilities? The rich supervision signals provided by natural language, the carrier of human knowledge, shape a powerful cross-modal representation space. However, with the rapid advancements in large language models LLMs like GPT-4 and LLaMA, the boundaries of language comprehension and generation are continually being pushed. 
This raises an intriguing question: can the capabilities of LLMs be harnessed to further improve multimodal representation learning? The potential benefits of incorporating LLMs into CLIP are clear. LLMs' strong textual understanding can fundamentally improve CLIP's ability to handle image captions, drastically enhancing its ability to process long and complex texts, a well-known limitation of vanilla CLIP. Moreover, LLMs are trained on a vast corpus of text, possessing ope...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Computer vision","Computation and Language","Computer science","Computer Vision and Pattern Recognition","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-medprompt-to-o1-exploration-of-run-time-strategies-for-medical-challenge-problems-and-beyond","title":"From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond","url":"https://www.microsoft.com/en-us/research/publication/from-medprompt-to-o1-exploration-of-run-time-strategies-for-medical-challenge-problems-and-beyond/","published":"2024-11-06","authors":["Harsha Nori","Naoto Usuyama","Nicholas King","S. McKinney","Xavier Fernandes","Sheng Zhang","Eric Horvitz"],"abstract":"Run-time steering strategies like Medprompt are valuable for guiding large language models (LLMs) to top performance on challenging tasks. 
Medprompt demonstrates that a general LLM can be focused to deliver state-of-the-art performance on specialized domains like medicine by using a prompt to elicit a run-time strategy involving chain of thought reasoning and ensembling. OpenAI's o1-preview model represents a new paradigm, where a model is designed to do run-time reasoning before generating final responses. We seek to understand the behavior of o1-preview on a diverse set of medical challenge problem benchmarks. Following on the Medprompt study with GPT-4, we systematically evaluate the o1-preview model across various medical benchmarks. Notably, even without prompting techniques, o1-preview largely outperforms the GPT-4 series with Medprompt. We further systematically study the efficacy...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Medical, health and genomics","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:187","title":"Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models","url":"https://seed.bytedance.com/en/research/polynomial-composition-activations-unleashing-the-dynamics-of-large-language-models","published":"2024-11-06","authors":["Zhijian Zhuo","Ya Wang","Yutao Zeng","Xiaoqing Li","Xun Zhou","Jinwen Ma"],"abstract":"Transformers have found extensive applications across various domains due to the powerful fitting capabilities. This success can be partially attributed to their inherent nonlinearity. 
Thus, in addition to the ReLU function employed in the original transformer architecture, researchers have explored alternative modules such as GeLU and SwishGLU to enhance nonlinearity and thereby augment representational capacity. In this paper, we propose a novel category of polynomial composition activations (PolyCom), designed to optimize the dynamics of transformers. Theoretically, we provide a comprehensive mathematical analysis of PolyCom, highlighting its enhanced expressivity and efficacy relative to other activation functions. Notably, we demonstrate that networks incorporating PolyCom achieve the optimal approximation rate, indicating that PolyCom networks require minimal parameters to approxim...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","ICLR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4405937039","title":"Modified Convolutional Neural Network with Multiple Features for Multimodal Sarcasm Detection","url":"https://doi.org/10.1109/icrais62903.2024.10811714","published":"2024-11-06","authors":["Ramesh Krishnamaneni","Muralidhar Kurni","Souptik Sen","Ashwin Murthy"],"abstract":"Due to the increasing prevalence of social media and online interactions, there's a growing need for analytical models that can interpret the complex and diverse forms of communication often found on these platforms, particularly in identifying sarcasm. Contemporary studies utilize multi-stage models and advanced techniques for extracting semantic information, often relying on single-mode encoders. 
However, these models frequently encounter difficulties in efficiently fusing and aligning multi-modal representations. This paper proposes a novel multimodal sarcasm detection using the MCNN model. This work utilizes three modalities like text, image and audio. The input text undergoes preprocessing through tokenization and stemming. The input image undergoes preprocessing through Gaussian filtering, while the input audio is pre-processed via median filtering. Subsequently, features such as T...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icrais62903.2024.10811714","openalex_id":"https://openalex.org/W4405937039","cited_by_count":0,"quality_score":41,"matched_keywords":["media"],"author_affiliations":["Amazon (United States)","IBM Research - Almaden","Jawaharlal Nehru Technological University Anantapur"],"concepts":[{"id":"https://openalex.org/C2776207355","display_name":"Sarcasm","score":0.965959906578064},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.7639096975326538},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.721625804901123},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5897232890129089},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.36370915174484253},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3553276062011719},{"id":"https://openalex.org/C2779975665","display_name":"Irony","score":0.0},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4404115950","title":"Task-Aware Few-Shot 
Image Generation via Dynamic Local Distribution Estimation and Sampling","url":"https://doi.org/10.1007/978-981-97-8490-5_33","published":"2024-11-06","authors":["Zheng Gu","Wenbin Li","Tianyu Ding","Zhengli Wang","Jing Huo","Kuihua Huang","Yang Gao"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-8490-5_33","openalex_id":"https://openalex.org/W4404115950","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["City University of Hong Kong, Shenzhen Research Institute","Microsoft (United States)","Nanjing University","Nanjing University of Science and Technology","National University of Defense Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.850501298904419},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6957528591156006},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5623370409011841},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.5324068665504456},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5092057585716248},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44871124625205994},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.4426499009132385},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44198840856552124}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/game-plot-design-with-an-llm-powered-assistant-an-empirical-study-with-game-designers","title":"Game Plot Design with an LLM-powered Assistant: An Empirical Study with Game Designers","url":"https://www.microsoft.com/en-us/research/publication/game-plot-design-with-an-llm-powered-assistant-an-empirical-study-with-game-designers/","published":"2024-11-05","authors":["Seyed Hossein Alavi","Weijia Xu","Nebojsa Jojic","Daniel Kennett","Raymond Ng","Sudha Rao","Haiyan Zhang","Bill Dolan","Vered Shwartz"],"abstract":"We introduce GamePlot, an LLM-powered assistant that supports game designers in crafting immersive narratives for turn-based games, and allows them to test these games through a collaborative game play and refine the plot throughout the process. Our user study with 14 game designers shows high levels of both satisfaction with the generated game plots and sense of ownership over the narratives, but also reconfirms that LLM are limited in their ability to generate complex and truly innovative content. 
We also show that diverse user populations have different expectations from AI assistants, and encourage researchers to study how tailoring assistants to diverse user groups could potentially lead to increased job satisfaction and greater creativity and innovation over time.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/if-at-first-you-dont-succeed-try-try-again-insights-and-llm-informed-tooling-for-detecting-retry-bugs-in-software-systems","title":"If At First You Don’t Succeed, Try, Try, Again...? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems","url":"https://www.microsoft.com/en-us/research/publication/if-at-first-you-dont-succeed-try-try-again-insights-and-llm-informed-tooling-for-detecting-retry-bugs-in-software-systems/","published":"2024-11-05","authors":["Bogdan Alexandru Stoica","Utsav Sethi","Yiming Su","Cyrus Zhou","Shan Lu","Jonathan Mace","Madan Musuvathi","Suman Nath"],"abstract":"Retry - the re-execution of a task on failure - is a common mechanism to enable resilient software systems. Yet, despite its commonality and long history, retry remains difficult to implement and test in modern systems. 
Guided by our study of real-world retry issues, we propose a novel suite of static and dynamic techniques to detect retry problems in software systems. In particular, we find that the ad-hoc nature of retry implementation in software systems poses challenges for traditional program analysis but can be well handled by Large Language Models; we also find that careful repurposing existing unit tests can, along with fault injection, expose various types of retry problems. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Systems and networking","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:127","title":"Classification Done Right for Vision-Language Pre-Training","url":"https://seed.bytedance.com/en/research/classification-done-right-for-vision-language-pre-training","published":"2024-11-05","authors":["Zilong Huang","Qinghao Ye","Bingyi Kang","Jiashi Feng","Haoqi Fan"],"abstract":"We introduce SuperClass, a super simple classification method for vision-language pre-training on image-text data. Unlike its contrastive counterpart CLIP who contrast with a text encoder, SuperClass directly utilizes tokenized raw text as supervised classification labels, without the need for additional text filtering or selection. Due to the absence of the text encoding as contrastive target, SuperClass does not require a text encoder and does not need to maintain a large batch size as CLIP does. 
SuperClass demonstrated superior performance on various downstream tasks, including classic computer vision benchmarks and vision language downstream tasks. We further explored the scaling behavior of SuperClass on model size, training length, or data size, and reported encouraging results and comparisons to CLIP. https://github.com/x-cls/superclass External paper link: https://arxiv.org/abs/2...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Multimodal","NeurIPS 2024"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:t6ld1za9ujkon54u073d8lol","title":"Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models","url":"https://machinelearning.apple.com/research/device-directed","published":"2024-11-05","authors":["Oggi Rudovic","Pranay Dighe","Yi Su","Vineet Garg","Sameer Dharur","Xiaochuan Niu","Ahmed H. 
Abdelaziz","Saurabh Adya","Ahmed Tewfik"],"abstract":"This paper was accepted at the Adaptive Foundation Models (AFM) Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"bytedance-seed:126","title":"How Far is Video Generation from World Model: A Physical Law Perspective","url":"https://seed.bytedance.com/en/research/how-far-is-video-generation-from-world-model-a-physical-law-perspective","published":"2024-11-04","authors":["Bingyi Kang","Yang Yue","Rui Lu","Zhijie Lin","Yang Zhao","Kaixin Wang","Gao Huang","Jiashi Feng"],"abstract":"OpenAI's Sora highlights the potential of video generation for developing world models that adhere to fundamental physical laws. However, the ability of video generation models to discover such laws purely from visual data without human priors can be questioned. A world model learning the true law should give predictions robust to nuances and correctly extrapolate on unseen scenarios. In this work, we evaluate across three key scenarios: in-distribution, out-of-distribution, and combinatorial generalization. We developed a 2D simulation testbed for object movement and collisions to generate videos deterministically governed by one or more classical mechanics laws. This provides an unlimited supply of data for large-scale experimentation and enables quantitative evaluation of whether the generated videos adhere to physical laws. 
We trained diffusion-based video generation models to predic...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision and Pattern Recognition","Vision","ICML 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4404025009","title":"Towards Multimodal Sentiment Analysis Debiasing via Bias Purification","url":"https://doi.org/10.1007/978-3-031-73636-0_27","published":"2024-11-04","authors":["Dingkang Yang","M. H. Li","Dongling Xiao","Yang Liu","Kun Yang","Zhaoyu Chen","Yuzheng Wang","Peng Zhai","Ke Li","Lihua Zhang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73636-0_27","openalex_id":"https://openalex.org/W4404025009","cited_by_count":19,"quality_score":56,"matched_keywords":[],"author_affiliations":["Fudan University","Ministry of Education of the People's Republic of China","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2779458634","display_name":"Debiasing","score":0.9902795553207397},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8358798027038574},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4542827904224396},{"id":"https://openalex.org/C188147891","display_name":"Cognitive 
science","score":0.10848590731620789},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.06272658705711365}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"apple:dvfmrnyseech6hyw8iapknow","title":"Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP","url":"https://machinelearning.apple.com/research/aggregate-and-adapt","published":"2024-11-04","authors":["Chen Huang","Skyler Seto","Samira Abnar","David Grangier","Navdeep Jaitly","Josh Susskind"],"abstract":"Large pretrained vision-language models like CLIP have shown promising generalization capability, but may struggle in specialized domains (e.g., satellite imagery) or fine-grained classification (e.g., car models) where the visual concepts are unseen or under-represented during pretraining. Prompt learning offers a parameter-efficient finetuning framework that can adapt CLIP to downstream tasks even when limited annotation data are available. 
In...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:f9e98773f742f19a","title":"Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation","url":"https://huggingface.co/papers/2411.02293","published":"2024-11-04","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"official:f852148471bfa91f","title":"Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent","url":"https://huggingface.co/papers/2411.02265","published":"2024-11-04","authors":["Tencent Hunyuan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4404013747","title":"Application of Generative AI to Derive 
Insight from Supply Chain & Logistics Contracts","url":"https://doi.org/10.2118/222932-ms","published":"2024-11-04","authors":["Ajay Pratap Singh","Tianxia Jia","Varun Nalagatla","Brian P. Cunningham","Talib Siwani"],"abstract":"Abstract Contract management is a critical process for energy companies operating across upstream, midstream, and downstream sectors. These companies deal with numerous complex contracts containing intricate legal language, cross-references, and long document lineages spanning amendments and supplemental materials. Manually extracting insights and managing obligations from these highly unstructured contracts is extremely time-consuming and error-prone. This paper presents a novel framework leveraging machine learning and generative AI (GenAI) to automate and streamline contract management. The proposed solution utilizes large language models (LLMs), prompt engineering, retrieval-augmented generation (RAG), and chain-of-thought reasoning to extract structured data (such as fees, escalations etc.) from contracts, perform analyses, and generate natural language responses. 
It enables use cas...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2118/222932-ms","openalex_id":"https://openalex.org/W4404013747","cited_by_count":3,"quality_score":48,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C108713360","display_name":"Supply chain","score":0.7854092121124268},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5927006602287292},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.527479350566864},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.44457513093948364},{"id":"https://openalex.org/C199185054","display_name":"Chain (unit)","score":0.4354902505874634},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.30716150999069214},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.24034026265144348},{"id":"https://openalex.org/C162853370","display_name":"Marketing","score":0.08046260476112366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4404013718","title":"Generative AI Powered Virtual Data Rooms for Energy","url":"http://dx.doi.org/10.2118/222602-ms","published":"2024-11-04","authors":["D. Tishechkin","Y. Gubanov","R. Gibson","S. Tribbey"],"abstract":"Abstract In the ever-evolving Oil & Gas (O&G) industry, National Oil Companies (NOCs) are central to the management and distribution of critical energy resources. A key aspect of their operations involves conducting bidding rounds to attract investments and forge partnerships for the exploration and development of oil and gas fields. 
Historically, this process required physical data rooms where interested parties could access confidential information related to the bids. However, with the rise of cloud technology, this practice has been transformed by the introduction of virtual data rooms (VDRs), which offer \"a faster, more cost-effective, and efficient way to manage data and information during licensing and bidding rounds\" (1). Generative AI, a subset of artificial intelligence, involves models that can generate new data based on patterns learned from existing data. This technology pre...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2118/222602-ms","openalex_id":"https://openalex.org/W4404013718","cited_by_count":1,"quality_score":46,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7116276025772095},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.48115456104278564},{"id":"https://openalex.org/C186370098","display_name":"Energy (signal processing)","score":0.4624635577201843},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4271318316459656},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3467940092086792},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.061482369899749756},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4404013827","title":"Using Generative AI to Build a Reservoir Simulation 
Assistant","url":"https://doi.org/10.2118/221987-ms","published":"2024-11-04","authors":["Klaus Wiegand","M. Bedewi","K. Mukundakrishnan","D. Tishechkin","V. Ananthan","Dan Kahn"],"abstract":"Abstract Numerical reservoir simulation is an intricate aspect of reservoir engineering, requiring a thorough understanding of reservoir engineering principles and the specific syntax of the reservoir simulation software used with the objective of designing an economical Field Development Plan (FDP) that is used to extract the hydrocarbons in real life which entails a considerable investment. This complexity is not unique to reservoir engineering but is also present in domains such as climate modeling, aerospace engineering, and electrical grid simulation, where accurate modeling and simulation are vital. Reservoir Engineers spend a considerable amount of time working with input \"decks\", which are structured text files used to build reservoir simulation models. Despite available tools to aid in their creation, manual refinement is often necessary to optimize simulation outcomes for forec...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2118/221987-ms","openalex_id":"https://openalex.org/W4404013827","cited_by_count":4,"quality_score":45,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","Stone Ridge Technology (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7544325590133667},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7004275321960449},{"id":"https://openalex.org/C184408114","display_name":"Generative Design","score":0.461612343788147},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4397292733192444},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.33240967988967896},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.32699593901634216},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.16975629329681396},{"id":"https://openalex.org/C21547014","display_name":"Operations management","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4404002404","title":"MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training","url":"https://doi.org/10.1007/978-3-031-73397-0_18","published":"2024-11-02","authors":["Brandon McKinzie","Zhe Gan","Jean-Philippe Fauconnier","Sam Dodge","Bowen Zhang","Philipp Dufter","Dhruti Shah","Xianzhi Du","Futang Peng","Anton Belyi","Haotian Zhang","K. Singh"],"abstract":"","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73397-0_18","openalex_id":"https://openalex.org/W4404002404","cited_by_count":45,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Apple (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.817894458770752},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5944246053695679},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4318506717681885},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.04475200176239014},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":45}},{"id":"openalex:W4403998454","title":"Multi-modal machine learning for the early detection of metabolic disorder in dairy cows using a cloud computing framework","url":"https://doi.org/10.1016/j.compag.2024.109563","published":"2024-11-02","authors":["Rafael Ferreira","María Angels de Luis Balaguer","Tiago Bresolin","Ranveer Chandra","Guilherme J. M. Rosa","Heather M. White","J.R.R. Dórea"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.compag.2024.109563","openalex_id":"https://openalex.org/W4403998454","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","University of Illinois Urbana-Champaign","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.7962753176689148},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5749975442886353},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4754704535007477},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46482518315315247},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41750943660736084},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.13570687174797058},{"id":"https://openalex.org/C192562407","display_name":"Materials 
science","score":0.12021514773368835},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"openalex:W4404002400","title":"PQ-SAM: Post-training Quantization for Segment Anything Model","url":"https://doi.org/10.1007/978-3-031-72684-2_24","published":"2024-11-02","authors":["Xiaoyu Liu","Xin Ding","Lei Yu","Yuanyuan Xi","Wei Li","Zhijun Tu","Jie Hu","Hanting Chen","Baoqun Yin","Zhiwei Xiong"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72684-2_24","openalex_id":"https://openalex.org/W4404002400","cited_by_count":6,"quality_score":47,"matched_keywords":["quantization"],"author_affiliations":["Huawei Technologies (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8038447499275208},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal processing)","score":0.6235924363136292},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4613070785999298},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44193077087402344},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3998189866542816},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.36683714389801025},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.35237085819244385},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.06234461069107056}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4404004808","title":"Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning","url":"https://doi.org/10.1007/978-3-031-73383-3_23","published":"2024-11-02","authors":["Fanyue Wei","Wei Zeng","Zhenyang Li","Dawei Yin","Lixin Duan","Li Wen"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73383-3_23","openalex_id":"https://openalex.org/W4404004808","cited_by_count":1,"quality_score":42,"matched_keywords":["personalized"],"author_affiliations":["Baidu (China)","University of Electronic Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8591107130050659},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.837326169013977},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5272397398948669},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4533403217792511},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4231325685977936}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4404007908","title":"Improving Video Representation of Vision-Language Model with Decoupled Explicit Temporal Modeling","url":"https://doi.org/10.1007/978-981-97-8511-7_37","published":"2024-11-02","authors":["Yuxi Liu","Wenyu 
Zhang","Sihong Chen","Xinming Zhang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-8511-7_37","openalex_id":"https://openalex.org/W4404007908","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8829251527786255},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6451367139816284},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4798571467399597},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4661860167980194},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.37605413794517517},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3372949957847595},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0},{"id":"https://openalex.org/C94625758","display_name":"Politics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/quantifying-reliance-on-external-information-over-parametric-knowledge-during-retrieval-augmented-generation-rag-using-mechanistic-analysis","title":"Quantifying reliance on external information over parametric knowledge during Retrieval Augmented Generation (RAG) using mechanistic 
analysis","url":"https://www.microsoft.com/en-us/research/publication/quantifying-reliance-on-external-information-over-parametric-knowledge-during-retrieval-augmented-generation-rag-using-mechanistic-analysis/","published":"2024-11-01","authors":["Reshmi Ghosh","Rahul Seetharaman","Hitesh Wadhwa","Somyaa Aggarwal","Samyadeep Basu","Soundararajan Srinivasan","Wenlong Zhao","Shreyas Chaudhari","Ehsan Aghazadeh"],"abstract":"Retrieval Augmented Generation (RAG) is a widely used approach for leveraging external context in several natural language applications such as question answering and information retrieval. Yet, the exact nature in which a Language Model (LM) leverages this nonparametric memory or retrieved context isn’t clearly understood. This paper mechanistically examines the RAG pipeline to highlight that LMs demonstrate a “shortcut” effect and have a strong bias towards utilizing the retrieved context to answer questions, while relying minimally on model priors. We propose (a) Causal Mediation Analysis; for proving that parametric memory is minimally utilized when answering a question and (b) Attention Contributions and Knockouts for showing the last token residual stream do not get enriched from the subject token in the question, but gets enriched from tokens of RAG-context. 
We find this pronounce...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Search and information retrieval","1970-01-01","language model","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/promptintern","title":"PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning","url":"https://www.microsoft.com/en-us/research/publication/promptintern/","published":"2024-11-01","authors":["Jiaru Zou","Mengyu Zhou","Tao Li","Shi Han","Dongmei Zhang"],"abstract":"Recent advances in fine-tuning large language models (LLMs) have greatly enhanced their usage in domain-specific tasks. Despite the success, fine-tuning continues to rely on repeated and lengthy prompts, which escalate computational expenses, require more resources, and lead to slower inference. In this paper, we present a novel approach, PromptIntern, which internalizes prompt knowledge during model fine-tuning to achieve efficient inference and save costs. Instead of compressing the prompts for a vanilla model, PromptIntern aims to embed the recurrent prompt directly into the model parameters. We design a fine-tuning pipeline that includes instruction template compression, few-shot example absorption, and a progressive internalization strategy, effectively diminishing the need for intricate prompts during inference. 
Comprehensive experiments on challenging NL2Code tasks demonstrate tha...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Programming languages and software engineering","1970-01-01","language model","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/working-with-generative-ai-we-need-more-african-voices-real-african-voices","title":"Working with Generative AI: We need more African Voices, Real African Voices","url":"https://www.microsoft.com/en-us/research/publication/working-with-generative-ai-we-need-more-african-voices-real-african-voices/","published":"2024-11-01","authors":["Najeeb G. Abdulhamid","Stephanie Nyairo","Jacki O'Neill"],"abstract":"Generative AI has taken the world by storm, appearing to be more usable than previous generations of AI. We describe the findings of a qualitative study of Small and Medium Businesses in Kenya and Nigeria who were using generative AI tools in their everyday work. We found that AI tools were used to support both mundane and creative work and provided both organisational and individual benefits. Participants adopted a number of methods to navigate the strengths and weaknesses of different tools and comparing the output of multiple tools was common. 
Additionally, our findings suggest that whilst to some extent rhetorics around the democratisation of AI might hold true, these tools did not well support or represent African languages, identities or locales and were understood by participants to embody Western biases. We propose that regional bias should be explicitly called out to encourage r...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Social sciences","cscw","Generative AI","Qualitative analysis"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hybrid-retrieval-augmented-generation-for-real-time-composition-assistance","title":"Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction","url":"https://www.microsoft.com/en-us/research/publication/hybrid-retrieval-augmented-generation-for-real-time-composition-assistance/","published":"2024-11-01","authors":["Menglin Xia","Xuchao Zhang","Camille Couturier","Guoqing Zheng","Saravan Rajmohan","Victor Ruehle"],"abstract":"Large language models (LLMs) enhanced with retrieval augmentation has shown great performance in many applications. However, the computational demands for these models pose a challenge when applying them to real-time tasks, such as composition assistance. 
To address this, we propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA), a novel system for real-time text prediction that efficiently combines a cloud-based LLM with a smaller client-side model through retrieval augmented memory. This integration enables the client model to generate better responses, benefiting from the LLM's capabilities and cloud-based data. Meanwhile, via a novel asynchronous memory update mechanism, the client model can deliver real-time completions to user inputs without the need to wait for responses from the cloud. Our experiments on five datasets demonstrate that Hybrid-RACA offers strong pe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","LLMs Inference","1970-01-01","LLM","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autorag-hp-automatic-online-hyper-parameter-tuning-for-retrieval-augmented-generation","title":"AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation","url":"https://www.microsoft.com/en-us/research/publication/autorag-hp-automatic-online-hyper-parameter-tuning-for-retrieval-augmented-generation/","published":"2024-11-01","authors":["Jia Fu","Xiaoting Qin","Fangkai Yang","Lu Wang","Jue Zhang","Qingwei Lin 林庆维","Yubo Chen","Dongmei Zhang","Saravan Rajmohan","Qi Zhang"],"abstract":"Recent advancements in Large Language Models have transformed ML/AI development, necessitating a 
reevaluation of AutoML principles for the Retrieval-Augmented Generation (RAG) systems. To address the challenges of hyper-parameter optimization and online adaptation in RAG, we propose the AutoRAG-HP framework, which formulates the hyper-parameter tuning as an online multi-armed bandit (MAB) problem and introduces a novel two-level Hierarchical MAB (Hier-MAB) method for efficient exploration of large search spaces. We conduct extensive experiments on tuning hyper-parameters, such as top-k retrieved documents, prompt compression ratio, and embedding methods, using the ALCE-ASQA and Natural Questions datasets. Our evaluation from jointly optimization all three hyper-parameters demonstrate that MAB-based online learning methods can achieve Recall@5 ≈ 0.8 for scenarios with prominent gradients....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","retrieval","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-study-on-context-length-and-efficient-transformers-for-biomedical-image-analysis","title":"A Study on Context Length and Efficient Transformers for Biomedical Image Analysis","url":"https://www.microsoft.com/en-us/research/publication/a-study-on-context-length-and-efficient-transformers-for-biomedical-image-analysis/","published":"2024-11-01","authors":["Sarah Hooper","Hui Xue"],"abstract":"Biomedical images are often high-resolution and 
multi-dimensional, presenting computational challenges for deep neural networks. These computational challenges are compounded when training transformers due to the self-attention operator, which scales quadratically with context length. Recent works have proposed alternatives to self-attention that scale more favorably with context length, alleviating these computational difficulties and potentially enabling more efficient application of transformers to large biomedical images. However, a systematic evaluation on this topic is lacking. In this study, we investigate the impact of context length on biomedical image analysis and we evaluate the performance of recently proposed substitutes to self-attention. We first curate a suite of biomedical imaging datasets, including 2D and 3D data for segmentation, denoising, and classification tasks. W...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Machine learning","Medical Imaging","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vptq-extreme-low-bit-vector-post-training-quantization-for-large-language-models","title":"VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/vptq-extreme-low-bit-vector-post-training-quantization-for-large-language-models/","published":"2024-11-01","authors":["Yifei 
Liu","Jicheng Wen","Yang Wang","Shengyu Ye","Li Lyna Zhang","Ting Cao","Cheng Li","Mao Yang"],"abstract":"Scaling model size significantly challenges the deployment and inference of Large Language Models (LLMs). Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low-bit (even down to 2 bits). It reduces memory requirements, optimizes storage costs, and decreases memory bandwidth needs during inference. However, due to numerical representation limitations, traditional scalar-based weight quantization struggles to achieve such extreme low-bit. Recent research on Vector Quantization (VQ) for LLMs has demonstrated the potential for extremely low-bit model quantization by compressing vectors into indices using lookup tables.In this paper, we introduce Vector Post-Training Quantization (VPTQ) for extremely low-bit quantization of LLMs. We use Second-Order Optimization to formulate the LLM VQ problem and guide our quantization algorith...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","LLM","memory","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-communication-preferences-of-information-workers-in-engagement-with-text-based-conversational-agents","title":"Understanding Communication Preferences of Information Workers in Engagement with Text-Based Conversational 
Agents","url":"https://www.microsoft.com/en-us/research/publication/understanding-communication-preferences-of-information-workers-in-engagement-with-text-based-conversational-agents/","published":"2024-11-01","authors":["Ananya Bhattacharjee","Jina Suh","Mahsa Ershadi","Shamsi Iqbal","Andrew D. Wilson","Javier Hernandez"],"abstract":"Communication traits in text-based human-AI conversations play pivotal roles in shaping user experiences and perceptions of systems. With the advancement of large language models (LLMs), it is now feasible to analyze these traits at a more granular level. In this study, we explore the preferences of information workers regarding chatbot communication traits across seven applications. Participants were invited to participate in an interactive survey, which featured adjustable sliders, allowing them to adjust and express their preferences for five key communication traits: formality, personification, empathy, sociability, and humor. Our findings reveal distinct communication preferences across different applications; for instance, there was a preference for relatively high empathy in wellbeing contexts and relatively low personification in coding. 
Similarities in preferences were also note...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","AI agents","Human Computer Interaction","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/provcam-a-camera-module-with-self-contained-tcb-for-producing-verifiable-videos","title":"ProvCam: A Camera Module with Self-Contained TCB for Producing Verifiable Videos","url":"https://www.microsoft.com/en-us/research/publication/provcam-a-camera-module-with-self-contained-tcb-for-producing-verifiable-videos/","published":"2024-11-01","authors":["Yuxin (Myles) Liu","Zhihao Yao","Mingyi Chen","Ardalan Amiri Sani","Sharad Agarwal","Gene Tsudik"],"abstract":"Our perception of reality is under constant threat from ever-improving video manipulation techniques, including deep-fakes and generative AI. Therefore, proving authenticity of videos is increasingly important, especially in legal and news contexts. However, it is very challenging to prove it based on post-factum video content analysis.In this work, we take a preventative stance and construct ProvCam, a novel camera module that generates a cryptographic proof of video authenticity. Our solution greatly reduces the size of Trusted Computing Base (TCB) to include the module itself. Moreover, it mitigates tampering during the numerous processing steps between video capture by the camera sensor and generation of the digital video output. 
To confirm its practicality, we present a complete prototype of ProvCam on a Xilinx FPGA evaluation board. As experiments show, ProvCam incurs a negligible....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3636534.3649383","openalex_id":"https://openalex.org/W4399121379","cited_by_count":4,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Security, privacy, and cryptography","Systems and networking","1970-01-01","news"],"author_affiliations":["Microsoft","Microsoft (United States)","New Jersey Institute of Technology","University of California, Irvine"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tap4llm-table-provider-on-sampling-augmenting-and-packing-semi-structured-data-for-large-language-model-reasoning","title":"TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning","url":"https://www.microsoft.com/en-us/research/publication/tap4llm-table-provider-on-sampling-augmenting-and-packing-semi-structured-data-for-large-language-model-reasoning/","published":"2024-11-01","authors":["Yuan Sui","Jiaru Zou","Mengyu Zhou","Xinyi He","Lun Du","Shi Han","Dongmei Zhang"],"abstract":"Table reasoning tasks have shown remarkable progress with the development of large language models (LLMs), which involve interpreting and drawing conclusions from tabular data based on natural language (NL) questions. 
Existing solutions mainly tested on smaller tables face scalability issues and struggle with complex queries due to incomplete or dispersed data across different table sections. To alleviate these challenges, we propose TAP4LLM as a versatile pre-processor suite for leveraging LLMs in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding. In each module, w...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Data platforms and analytics","Human language technologies","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-laws-for-pre-training-agents-and-world-models","title":"Scaling Laws for Pre-training Agents and World Models","url":"https://www.microsoft.com/en-us/research/publication/scaling-laws-for-pre-training-agents-and-world-models/","published":"2024-11-01","authors":["Tim Pearce","Tabish Rashid","Dave Bignell","Raluca Stevenson","Sam Devlin","Katja Hofmann"],"abstract":"The performance of embodied agents has been shown to improve by increasing model parameters, dataset size, and compute. 
This has been demonstrated in domains from robotics to video games, when generative learning objectives on offline datasets (pre-training) are used to model an agent's behavior (imitation learning) or their environment (world modeling). This paper characterizes the role of scale in these tasks more precisely. Going beyond the simple intuition that bigger is better', we show that the same types of power laws found in language modeling (e.g. between loss and optimal model size), also arise in world modeling and imitation learning. However, the coefficients of these laws are heavily influenced by the tokenizer, task \\&architecture -- this has important implications on the optimal sizing of models and data. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sciagent-tool-augmented-language-models-for-scientific-reasoning","title":"SciAgent: Tool-augmented Language Models for Scientific Reasoning","url":"https://www.microsoft.com/en-us/research/publication/sciagent-tool-augmented-language-models-for-scientific-reasoning/","published":"2024-11-01","authors":["Yubo Ma","Zhibin Gou","Junheng Hao","Ruochen Xu","Shuohang Wang","Liangming Pan","Yujiu Yang","Yixin Cao","Aixin Sun"],"abstract":"Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). 
To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs’ abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, Sci...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks","title":"Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks","url":"https://www.microsoft.com/en-us/research/publication/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/","published":"2024-11-01","authors":["Adam Fourney","Gagan Bansal","Hussein Mozannar","Cheng Tan","Eduardo Salinas","Erkang (Eric) Zhu","Friederike Niedtner","Grace Proebsting","Griffin 
Bassman","Jack Gerrits","Jacob Alber","Peter Chang"],"abstract":"Modern AI agents, driven by advances in large foundation models, promise to enhance our productivity and transform our lives by augmenting our knowledge and capabilities. To achieve this vision, AI agents must effectively plan, perform multi-step reasoning and actions, respond to novel observations, and recover from errors, to successfully complete complex tasks across a wide range of scenarios. In this work, we introduce Magentic-One, a high-performing open-source agentic system for solving such tasks. Magentic-One uses a multi-agent architecture where a lead agent, the Orchestrator , plans, tracks progress, and re-plans to recover from errors. Throughout task execution, the Orchestrator also directs other specialized agents to perform tasks as needed, such as operating a web browser, navigating local files, or writing and executing Python code. Our experiments show that Magentic-One ac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Tech Report","Artificial intelligence","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cocost-automatic-complex-code-generation-with-online-searching-and-correctness-testing","title":"CoCoST: Automatic Complex Code Generation with Online Searching and Correctness 
Testing","url":"https://www.microsoft.com/en-us/research/publication/cocost-automatic-complex-code-generation-with-online-searching-and-correctness-testing/","published":"2024-11-01","authors":["Xinyi He","Jiaru Zou","Yun Lin","Mengyu Zhou","Shi Han","Zejian Yuan","Dongmei Zhang"],"abstract":"Large Language Models have revolutionized code generation ability by converting natural language descriptions into executable code. However, generating complex code within real-world scenarios remains challenging due to intricate structures, subtle bugs, understanding of advanced data types, and lack of supplementary contents. To address these challenges, we introduce the CoCoST framework, which enhances complex code generation by online searching for more information with planned queries and correctness testing for code refinement. Moreover, CoCoST serializes the complex inputs and outputs to improve comprehension and generates test cases to ensure the adaptability for real-world applications. CoCoST is validated through rigorous experiments on the DS-1000 and ClassEval datasets. 
Experimental results show that CoCoST substantially improves the quality of complex code generation, highlig...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4407457236","title":"Enhancements in Data Querying: Applying MMR-Integrated In-Context Learning to LLM-based Text-to-SQL","url":"https://doi.org/10.1109/cac63892.2024.10865575","published":"2024-11-01","authors":["Junfang Li","Ming Huang","Zeng Zhenyu","Yang Chuniie"],"abstract":"In the modern business ecosystem, informed decision-making relies heavily on effective data utilization. To enable more intuitive and efficient data access, Text-to-SQL technologies serve as a crucial bridge. This paper introduces an improvement by integrating the Maximal Marginal Relevance (MMR) with In-Context Learning (ICL) to improve the performance of Large Language Models (LLMs) in generating SQL queries. This research focuses on strategically deploying MMR within ICL frameworks, optimizing prompt construction by balancing relevance and diversity of examples. This method was rigorously tested on datasets from domains such as tobacco and automotive marketing, as well as the cross-domain SPIDER benchmark. 
Results demonstrate that the proposed method excels in both domain-specific and complex questions.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cac63892.2024.10865575","openalex_id":"https://openalex.org/W4407457236","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8475608825683594},{"id":"https://openalex.org/C510870499","display_name":"SQL","score":0.7897548675537109},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5888535380363464},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.46794289350509644},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.4156855344772339},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.39140868186950684},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4405058830","title":"Agentic AI in Computer Vision Domain - Recent Advances and Prospects","url":"https://doi.org/10.55248/gengpi.5.1124.3309","published":"2024-11-01","authors":["Daniel Ogbu"],"abstract":"The field of computer vision has witnessed significant strides in recent years, driven by advances in agentic artificial intelligence [AI].Agentic AI, which refers to systems capable of autonomous decision-making and goal-directed behaviour, has transformed the capabilities of computer vision 
applications.This paper explores recent breakthroughs and the prospective future of agentic AI in the computer vision domain.The discussion begins with an overview of traditional computer vision models and their limitations in adaptability and decision-making.The integration of agentic AI has led to the development of more dynamic models capable of learning and making decisions in complex environments, which are critical for applications such as autonomous vehicles, medical imaging, and surveillance systems.Recent advances, such as reinforcement learning frameworks and self-supervised learning techn...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.55248/gengpi.5.1124.3309","openalex_id":"https://openalex.org/W4405058830","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.49451562762260437},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4385651648044586},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42069071531295776},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.391457736492157},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3283500075340271},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.30292588472366333},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.08503204584121704},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":10}},{"id":"openalex:W4403982385","title":"MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model","url":"https://doi.org/10.1016/j.patcog.2024.111100","published":"2024-11-01","authors":["Zhenghao Zhang","Shengfan Zhang","Zuozhuo Dai","Zilong Dong","Siyu Zhu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2024.111100","openalex_id":"https://openalex.org/W4403982385","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Fudan University"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.755249559879303},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6935222148895264},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6611849069595337},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6236445903778076},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5514633059501648},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5409955978393555},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.415394127368927},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.07664680480957031}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403966141","title":"Enhancing Customer Experience with Generative AI in Financial Services Contact Centers","url":"http://dx.doi.org/10.32628/cseit241051084","published":"2024-11-01","authors":["Santhosh Kumar 
Ganesan"],"abstract":"This article explores the transformative impact of Generative Artificial Intelligence (Gen AI) on customer service operations in financial services contact centers. Through a comprehensive case study approach, we examine the implementation of a Gen AI system designed to handle complex financial queries, provide personalized guidance, and ensure regulatory compliance. The article investigates the system's architecture, key features, and integration with existing infrastructure, while analyzing its performance across critical metrics such as First-Call Resolution rates, call handling times, customer satisfaction scores, and operational costs. Our article reveals significant improvements in these areas, with FCR rates increasing by 50%, call handling times decreasing by 40%, and customer satisfaction scores improving by 45%. The article also addresses the challenges and ethical consideratio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.32628/cseit241051084","openalex_id":"https://openalex.org/W4403966141","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C139043278","display_name":"Financial services","score":0.5465065240859985},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.5189575552940369},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.4922310411930084},{"id":"https://openalex.org/C10138342","display_name":"Finance","score":0.3653390407562256},{"id":"https://openalex.org/C162853370","display_name":"Marketing","score":0.34602147340774536},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.26386070251464844},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.22264185547828674}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4405329769","title":"Emerging trends: evaluating general purpose foundation models","url":"https://doi.org/10.1017/s1351324924000068","published":"2024-11-01","authors":["Kenneth Church","Omar Alonso"],"abstract":"Abstract We suggest that foundation models are general purpose solutions similar to general purpose programmable microprocessors, where fine-tuning and prompt-engineering are analogous to coding for microprocessors. Evaluating general purpose solutions is not like hypothesis testing. We want to know how well the machine will perform on an unknown program with unknown inputs for unknown users with unknown budgets and unknown utility functions. This paper is based on an invited talk by John Mashey, “Lessons from SPEC,” at an ACL-2021 workshop on benchmarking. Mashey started by describing Standard Performance Evaluation Corporation (SPEC), a benchmark that has had more impact than benchmarks in our field because SPEC addresses an import commercial question: which CPU should I buy? In addition, SPEC can be interpreted to show that CPUs are 50,000 faster than they were 40 years ago. 
It is rem...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1017/s1351324924000068","openalex_id":"https://openalex.org/W4405329769","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Northeastern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8800621032714844},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7017180919647217},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5447675585746765},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3339337110519409},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.06093543767929077},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.05353185534477234}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"official:fa2172e6303a7215","title":"Open-World Task and Motion Planning via Vision-Language Model Inferred Constraints","url":"https://research.nvidia.com/publication/2024-11_open-world-task-and-motion-planning-vision-language-model-inferred-constraints","published":"2024-11","authors":["Nishanth Kumar","William Shen","Fabio Ramos","Dieter Fox","Tomás Lozano-Pérez","Leslie Pack Kaelbling","Caelan Garrett"],"abstract":"Official NVIDIA Research publication. 
CORL","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["CORL","language model"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=0"}},{"id":"official:e55cde71c2314f92","title":"FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model","url":"https://research.nvidia.com/publication/2024-11_fastadasp-multitask-adapted-efficient-inference-large-speech-language-model","published":"2024-11","authors":["Yichen Lu","Jiaqi Song","Huck Yang","Shinji Watanabe"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["language model","efficient"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=0"}},{"id":"official:9b30753c00cad11e","title":"DRC-Coder: Automated DRC Checker Code Generation Using LLM Autonomous Agent","url":"https://research.nvidia.com/publication/2024-11_drc-coder-automated-drc-checker-code-generation-using-llm-autonomous-agent","published":"2024-11","authors":["Chen-Chia Chang","Chia-Tung (Mark) Ho","Yaguang Li","Yiran Chen","Mark Haoxing Ren"],"abstract":"Official NVIDIA Research 
publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","agent"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=0"}},{"id":"official:d22a4c64723f5365","title":"Guiding Long-Horizon Task and Motion Planning with Vision Language Models","url":"https://research.nvidia.com/publication/2024-11_guiding-long-horizon-task-and-motion-planning-vision-language-models","published":"2024-11","authors":["Zhutian Yang","Caelan Garrett","Dieter Fox","Tomás Lozano-Pérez","Leslie Pack Kaelbling"],"abstract":"Official NVIDIA Research publication. CORL ICRA","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["CORL ICRA"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=0"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bridging-geometric-states-via-generative-modeling","title":"Bridging Geometric States via Generative Modeling","url":"https://www.microsoft.com/en-us/research/publication/bridging-geometric-states-via-generative-modeling/","published":"2024-10-31","authors":["Shengjie Luo","Yixian Xu","Di He","Shuxin Zheng","Tie-Yan Liu","Liwei Wang"],"abstract":"The accurate prediction of geometric 
state evolution in complex systems is critical for advancing scientific domains such as quantum chemistry and material modeling. Traditional experimental and computational methods face challenges in terms of environmental constraints and computational demands, while current deep learning approaches still fall short in terms of precision and generality. In this work, we introduce the Geometric Diffusion Bridge (GDB), a novel generative modeling framework that accurately bridges initial and target geometric states. GDB leverages a probabilistic approach to evolve geometric state distributions, employing an equivariant diffusion bridge derived by a modified version of Doob's $h$-transform for connecting geometric states. This tailored diffusion process is anchored by initial and target geometric states as fixed endpoints and governed by equivariant trans...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","geometric state evolution","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403939369","title":"An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models","url":"https://doi.org/10.1007/978-3-031-73004-7_2","published":"2024-10-31","authors":["Liang Chen","Haozhe Zhao","Tianyu Liu","Shuai Bai","Junyang Lin","Chang Zhou","Baobao 
Chang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73004-7_2","openalex_id":"https://openalex.org/W4403939369","cited_by_count":61,"quality_score":67,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8304414749145508},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6307051181793213},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.6224875450134277},{"id":"https://openalex.org/C117896860","display_name":"Acceleration","score":0.5489197969436646},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.49419906735420227},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4836534261703491},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4121347665786743},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3506511151790619}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":61}},{"id":"official:121828b47af90f23","title":"Digitizing Touch with an Artificial Multimodal Fingertip","url":"https://ai.meta.com/research/publications/digitizing-touch-with-an-artificial-multimodal-fingertip/","published":"2024-10-31","authors":["Mike Lambeta","Tingfan Wu","Ali Sengül","Victoria Rose Most","Nolan Black","Kevin Sawyer","Romeo Mercado","Haozhi Qi","Alexander Sohn","Byron Taylor","Norb Tydingco","Gregg Kammerer"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Human & Machine Intelligence","Robotics"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=9"}},{"id":"openalex:W4403943534","title":"MCFC: A Momentum-Driven Clicked Feature Compressed Pre-trained Language Model for Information Retrieval","url":"https://doi.org/10.1007/978-981-97-9431-7_6","published":"2024-10-31","authors":["Dongyang Li","Ruixue Ding","Pengjun Xie","Xiaofeng He"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-9431-7_6","openalex_id":"https://openalex.org/W4403943534","cited_by_count":0,"quality_score":45,"matched_keywords":["language model","retrieval"],"author_affiliations":["Alibaba Group (China)","East China Normal University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8396925926208496},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6245232224464417},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47594523429870605},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.34123876690864563},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4403943312","title":"Editing Personality For Large Language Models","url":"https://doi.org/10.1007/978-981-97-9434-8_19","published":"2024-10-31","authors":["Shengyu Mao","Xiaohan Wang","Mengru Wang","Yong Jiang","Pengjun Xie","Fei Huang","Ningyu Zhang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-9434-8_19","openalex_id":"https://openalex.org/W4403943312","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8713973760604858},{"id":"https://openalex.org/C187288502","display_name":"Personality","score":0.5694308876991272},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4339945912361145},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4059658646583557},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39952173829078674},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3754052519798279},{"id":"https://openalex.org/C188147891","display_name":"Cognitive 
science","score":0.3419148325920105},{"id":"https://openalex.org/C11171543","display_name":"Psychoanalysis","score":0.08596405386924744}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4403938930","title":"Unified Medical Image Pre-training in Language-Guided Common Semantic Space","url":"https://doi.org/10.1007/978-3-031-73004-7_8","published":"2024-10-31","authors":["Xiaoxuan He","Yifan Yang","Xinyang Jiang","Xufang Luo","Haoji Hu","Siyun Zhao","Dongsheng Li","Yuqing Yang","Lili Qiu"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73004-7_8","openalex_id":"https://openalex.org/W4403938930","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8780763745307922},{"id":"https://openalex.org/C2986420190","display_name":"Semantic space","score":0.5949928164482117},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5887503623962402},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5383286476135254},{"id":"https://openalex.org/C2778572836","display_name":"Space (punctuation)","score":0.42338377237319946},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.41049572825431824},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.33781832456588745},{"id":"https://openalex.org/C111919701","display_name":"Operating 
system","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403938692","title":"Sparse Mixture of Experts Language Models Excel in Knowledge Distillation","url":"https://doi.org/10.1007/978-981-97-9437-9_7","published":"2024-10-31","authors":["Haiyang Xu","Haoxiang Liu","Wei Gong","Xianjun Deng","Hai Wang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-9437-9_7","openalex_id":"https://openalex.org/W4403938692","cited_by_count":0,"quality_score":41,"matched_keywords":["distillation"],"author_affiliations":["Alibaba Group (China)","Anhui Science and Technology University","Huazhong University of Science and Technology","Southeast University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8374878764152527},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.6843519806861877},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4422638416290283},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4319651126861572},{"id":"https://openalex.org/C43617362","display_name":"Chromatography","score":0.06565886735916138},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4403940682","title":"Modeling Comparative Logical Relation with Contrastive Learning for Text 
Generation","url":"https://doi.org/10.1007/978-981-97-9440-9_9","published":"2024-10-31","authors":["Yuhao Dan","Junfeng Tian","Jie Zhou","Ming Yan","Ji Zhang","Qin Chen","Liang He"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-9440-9_9","openalex_id":"https://openalex.org/W4403940682","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","East China Normal University","Shanghai Institute of Technology","Xiaomi (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8744490146636963},{"id":"https://openalex.org/C25343380","display_name":"Relation (database)","score":0.7526608109474182},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5628002882003784},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5216532349586487},{"id":"https://openalex.org/C2985684807","display_name":"Text generation","score":0.42995917797088623},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.33166998624801636},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.325929194688797},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.14537501335144043}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403943413","title":"Object-Oriented Anchoring and Modal Alignment in Multimodal Learning","url":"https://doi.org/10.1007/978-3-031-72973-7_11","published":"2024-10-31","authors":["Shibin Mei","Bingbing Ni","Hang Wang","Chenglong Zhao","Fuqing Hu","Zhiming Pi","Bilian 
Ke"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72973-7_11","openalex_id":"https://openalex.org/W4403943413","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Renji Hospital","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8511174917221069},{"id":"https://openalex.org/C18483071","display_name":"Anchoring","score":0.7713645100593567},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6792069673538208},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5709042549133301},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4919719398021698},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4482003152370453},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4061990976333618},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.1798732876777649}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:jv6x6mz5ecwwqfy2woifabpy","title":"Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs","url":"https://machinelearning.apple.com/research/cultural-translation","published":"2024-10-30","authors":["Simone Conia","Daniel Lee","Min Li","Umar Farooq Minhas","Saloni Potdar","Yunyao Li"],"abstract":"Translating text that contains entity names is a challenging task, as cultural-related references can vary significantly across languages. 
These variations may also be caused by transcreation, an adaptation process that entails more than transliteration and word-for-word translation. In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, the first large-scale, manually-created benchmark...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4403901714","title":"Developing a Research Center for Artificial Intelligence in Medicine","url":"https://doi.org/10.1016/j.mcpdig.2024.07.005","published":"2024-10-30","authors":["Curtis P. Langlotz","Johanna Inhyang Kim","Nigam H. Shah","Matthew P. Lungren","David B. Larson","Somalee Datta","Fei Fei Li","Ruth O’Hara","Thomas J. Montine","Robert A. Harrington","Garry E. Gold"],"abstract":"Artificial intelligence (AI) and machine learning (ML) are driving innovation in biosciences and are already affecting key elements of medical scholarship and clinical care. Many schools of medicine are capitalizing on the promise of these new technologies by establishing academic units to catalyze and grow research and innovation in AI/ML. At Stanford University, we have developed a successful model for an AI/ML research center with support from academic leaders, clinical departments, extramural grants, and industry partners. 
The Center for Artificial Intelligence in Medicine and Imaging uses the following 4 key tactics to support AI/ML research: project-based learning opportunities that build interdisciplinary collaboration; internal grant programs that catalyze extramural funding; infrastructure that facilitates the rapid creation of large multimodal AI-ready clinical data sets; and e...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.mcpdig.2024.07.005","openalex_id":"https://openalex.org/W4403901714","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Artificial Intelligence in Medicine (Canada)","Cornell University","Microsoft (United States)","Stanford Medicine","Stanford University"],"concepts":[{"id":"https://openalex.org/C2779463800","display_name":"Center (category theory)","score":0.705155074596405},{"id":"https://openalex.org/C2777532764","display_name":"Research center","score":0.4382709562778473},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37421008944511414},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.3517419695854187},{"id":"https://openalex.org/C19527891","display_name":"Medical physics","score":0.32122957706451416},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.30764660239219666},{"id":"https://openalex.org/C142724271","display_name":"Pathology","score":0.10736554861068726},{"id":"https://openalex.org/C8010536","display_name":"Crystallography","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4403908341","title":"LaMI-DETR: Open-Vocabulary Detection with Language Model 
Instruction","url":"https://doi.org/10.1007/978-3-031-73337-6_18","published":"2024-10-30","authors":["Penghui Du","Yu Wang","Yifan Sun","Luting Wang","Yue Liao","Gang Zhang","Errui Ding","Yan Wang","Jingdong Wang","Si Liu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73337-6_18","openalex_id":"https://openalex.org/W4403908341","cited_by_count":7,"quality_score":48,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Beihang University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8807590007781982},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.6969667673110962},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5916054844856262},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5170343518257141},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.21867460012435913},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4403917938","title":"High Efficiency Image Compression for Large Visual-Language Models","url":"https://doi.org/10.1109/tcsvt.2024.3488181","published":"2024-10-30","authors":["Binzhe Li","Shurun Wang","Shiqi Wang","Yan Ye"],"abstract":"In recent years, large visual language models (LVLMs) have shown impressive performance and promising generalization capability in multi-modal tasks, thus replacing humans as receivers of visual information in various application scenarios. 
In this paper, we pioneer to propose a variable bitrate image compression scheme consisting of a pre-editing module and an end-to-end codec to achieve promising rate-accuracy performance for different LVLMs. In particular, instead of optimizing an adaptive pre-editing network towards a particular task or several representative tasks, we propose a new optimization strategy tailored for LVLMs, which is designed based on the representation and discrimination capability with token-level distortion and rank. The pre-editing module and the variable bitrate end-to-end image codec are jointly trained by the losses based on semantic tokens of the large model,....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2024.3488181","openalex_id":"https://openalex.org/W4403917938","cited_by_count":2,"quality_score":43,"matched_keywords":["compression"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","City University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C13481523","display_name":"Image compression","score":0.6810287833213806},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6688500642776489},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5758013725280762},{"id":"https://openalex.org/C78548338","display_name":"Data compression","score":0.558894157409668},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5501948595046997},{"id":"https://openalex.org/C54243161","display_name":"Texture compression","score":0.468772828578949},{"id":"https://openalex.org/C180016635","display_name":"Compression (physics)","score":0.46773654222488403},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.4211844503879547}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4403921858","title":"AddressCLIP: Empowering Vision-Language Models for City-Wide Image Address Localization","url":"https://doi.org/10.1007/978-3-031-73390-1_5","published":"2024-10-30","authors":["Shixiong Xu","Chenghao Zhang","Lubin Fan","Gaofeng Meng","Shiming Xiang","Jieping Ye"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73390-1_5","openalex_id":"https://openalex.org/W4403921858","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beijing Academy of Artificial Intelligence","Chinese Academy of Sciences","Institut de Recherche et d’Innovation","Shandong Institute of Automation","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8487256765365601},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5849453210830688},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.556194543838501},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5222721695899963},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.33191823959350586}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403924111","title":"WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video 
Editing","url":"https://doi.org/10.1007/978-3-031-73116-7_3","published":"2024-10-30","authors":["Yutang Feng","Sicheng Gao","Yuxiang Bao","Xiaodi Wang","Shumin Han","Juan Zhang","Baochang Zhang","Angela Yao"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73116-7_3","openalex_id":"https://openalex.org/W4403924111","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beihang University","National University of Singapore"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8230045437812805},{"id":"https://openalex.org/C157202957","display_name":"Image warping","score":0.7542707324028015},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6138690114021301},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5796509385108948},{"id":"https://openalex.org/C1893757","display_name":"Inversion (geology)","score":0.5314347147941589},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4574204981327057},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.38892894983291626},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.37615349888801575}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/slowfast-vgen-slow-fast-learning-for-action-driven-long-video-generation","title":"SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video 
Generation","url":"https://www.microsoft.com/en-us/research/publication/slowfast-vgen-slow-fast-learning-for-action-driven-long-video-generation/","published":"2024-10-29","authors":["Yining Hong","Beide Liu","Maxine Wu","Yuanhao Zhai","Kai-Wei Chang","Linjie Li","K. Lin","Chung-Ching Lin","Jianfeng Wang","Zhengyuan Yang","Yingnian Wu","Lijuan Wang"],"abstract":"Human beings are endowed with a complementary learning system, which bridges the slow learning of general world dynamics with fast storage of episodic memory from a new experience. Previous video generation models, however, primarily focus on slow learning by pre-training on vast amounts of data, overlooking the fast learning phase crucial for episodic memory storage. This oversight leads to inconsistencies across temporally distant frames when generating longer videos, as these frames fall beyond the model's context window. To this end, we introduce SlowFast-VGen, a novel dual-speed learning system for action-driven long video generation. Our approach incorporates a masked conditional video diffusion model for the slow learning of world dynamics, alongside an inference-time fast learning strategy based on a temporal LoRA module. 
Specifically, the fast learning process updates its tempor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Computer Vision and Pattern Recognition","1970-01-01","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tamgen-drug-design-with-target-aware-molecule-generation-through-a-chemical-language-model","title":"TamGen: drug design with target-aware molecule generation through a chemical language model","url":"https://www.microsoft.com/en-us/research/publication/tamgen-drug-design-with-target-aware-molecule-generation-through-a-chemical-language-model/","published":"2024-10-29","authors":["Kehan Wu","Yingce Xia","Pan Deng","Renhe Liu","Yuan Zhang","Han Guo","Yumeng Cui","Qizhi Pei","Lijun Wu","Shufang Xie","Si Chen","Xi Lu"],"abstract":"Generative drug design facilitates the creation of compounds effective against pathogenic target proteins. This opens up the potential to discover novel compounds within the vast chemical space and fosters the development of innovative therapeutic strategies. However, the practicality of generated molecules is often limited, as many designs focus on a narrow set of drug-related properties, failing to improve the success rate of subsequent drug discovery process. 
To overcome these challenges, we develop TamGen, a method that employs a GPT-like chemical language model and enables target-aware molecule generation and compound refinement. We demonstrate that the compounds generated by TamGen have improved molecular quality and viability. Additionally, we have integrated TamGen into a drug discovery pipeline and identified 14 compounds showing compelling inhibitory activity against the Tuberc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Medicine","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403878342","title":"TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering","url":"https://doi.org/10.1007/978-3-031-72652-1_23","published":"2024-10-29","authors":["Jingye Chen","Yupan Huang","Tengchao Lv","Lei Cui","Qifeng Chen","Furu Wei"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72652-1_23","openalex_id":"https://openalex.org/W4403878342","cited_by_count":30,"quality_score":67,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Microsoft (United States)","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.8395193815231323},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.756903886795044},{"id":"https://openalex.org/C195818886","display_name":"Expressive power","score":0.46203094720840454},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4351407289505005},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40336325764656067},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3874730169773102},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32998669147491455}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":30}},{"id":"apple:wl4gfmvbm45gib6pbmsqo5xi","title":"Computational Bottlenecks of Training Small-Scale Large Language Models","url":"https://machinelearning.apple.com/research/computational-bottlenecks","published":"2024-10-29","authors":["Saleh Ashkboos","Iman Mirzadeh","Keivan Alizadeh","Mohammad Hossein Sekhavat","Moin Nabi","Mehrdad Farajtabar","Fartash Faghri"],"abstract":"This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:rvoskcfm8z8yw010jnktutn1","title":"ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large 
Language Models","url":"https://machinelearning.apple.com/research/convkgyarn-datasets","published":"2024-10-29","authors":["Ronak Pradeep","Daniel Lee","Ali Mousavi","Jeff Pound","Yisi Sang","Jimmy Lin","Ihab Ilyas","Saloni Potdar","Mostafa Arefiyan","Yunyao Li"],"abstract":"The rapid evolution of Large Language Models (LLMs) and conversational assistants necessitates dynamic, scalable, and configurable conversational datasets for training and evaluation. These datasets must accommodate diverse user interaction modes, including text and voice, each presenting unique modeling challenges. Knowledge Graphs (KGs), with their structured and evolving nature, offer an ideal foundation for current and precise knowledge....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4403923016","title":"Adaptive Policy Regularization for Offline-to-Online Reinforcement Learning in HVAC Control","url":"https://doi.org/10.1145/3671127.3698163","published":"2024-10-29","authors":["Hsin‐Yu Liu","Bharathan Balaji","Rajesh K. Gupta","Dezhi Hong"],"abstract":"Reinforcement learning (RL)-based control methods have been extensively studied to improve building heating, ventilation, and air conditioning (HVAC) efficiency. Data-driven approaches demonstrate better transferability and scalability, making them useful in real-world applications. Most prior works focus on online learning requiring simulators or models of environment dynamics. However, transferring thermal simulators between environments is inefficient in practice. 
We build on recent works that employ offline training on static datasets from unknown policies. Pure offline RL is constrained by the replay buffer's distribution, we propose using offline-to-online RL to enhance pre-trained offline models through online adaptation to distribution shifts. We show that direct online fine-tuning deteriorates performance on offline policies. To address this, we propose automatically tuning the....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3671127.3698163","openalex_id":"https://openalex.org/W4403923016","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of California San Diego"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.8429933786392212},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.7100639343261719},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6638721823692322},{"id":"https://openalex.org/C122346748","display_name":"HVAC","score":0.6541897654533386},{"id":"https://openalex.org/C107464732","display_name":"Adaptive control","score":0.4964368939399719},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.44495901465415955},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3943358063697815},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.3650679886341095}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-models-can-provide-accurate-and-interpretable-incident-triage-2","title":"Large Language Models Can Provide Accurate and Interpretable Incident Triage","url":"https://www.microsoft.com/en-us/research/publication/large-language-models-can-provide-accurate-and-interpretable-incident-triage-2/","published":"2024-10-28","authors":["Zexin Wang","Jianhui Li","Minghua Ma","Ze Li","Yu Kang","Chaoyun Zhang","Chetan Bansal","Murali Chintalapati","S. Rajmohan","Qingwei Lin","Dongmei Zhang","Changhua Pei"],"abstract":"Large-scale cloud services frequently experience incidents that can have a significant impact on their stability. Incident triage is a critical process that assigns incidents to dedicated teams for resolution. However, traditional rule-based methods, commonly employed in various systems, have limitations due to a finite set of rules that necessitate continuous updates, leading to suboptimal performance. Current state-of-the-art approaches primarily rely on textual information, utilizing classifiers or unsupervised clustering. Unfortunately, the abundance of textual information, combined with considerable noise, presents a significant challenge to the accuracy of these methods. To tackle these challenges, we introduce COMET, an innovative system that utilizes an AutoExtractor to filter out non-critical logs and employs a Large Language Model (LLM) for keyword extraction. 
This approach eff...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","AIOps","Cloud computing","software engineering","Inproceedings (Conference)","Systems and networking","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sculpt-systematic-tuning-of-long-prompts","title":"SCULPT: Systematic Tuning of Long Prompts","url":"https://www.microsoft.com/en-us/research/publication/sculpt-systematic-tuning-of-long-prompts/","published":"2024-10-28","authors":["Shanu Kumar","Akhila Yesantarao Venkata","Shubhanshu Khandelwal","Bishal Santra","Parag Agrawal","Manish Gupta"],"abstract":"Prompt optimization is essential for effective utilization of large language models (LLMs) across diverse tasks. While existing optimization methods are effective in optimizing short prompts, they struggle with longer, more complex ones, often risking information loss and being sensitive to small perturbations. To address these challenges, we propose SCULPT (Systematic Tuning of Long Prompts), a framework that treats prompt optimization as a hierarchical tree refinement problem. SCULPT represents prompts as tree structures, enabling targeted modifications while preserving contextual integrity. It employs a Critic-Actor framework that generates reflections and applies actions to refine the prompt. 
Evaluations demonstrate SCULPT's effectiveness on long prompts, its robustness to adversarial perturbations, and its ability to generate high-performing prompts even without any initial human-wr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/icondm-text-guided-icon-set-expansion-using-diffusion-models","title":"IconDM: Text-Guided Icon Set Expansion Using Diffusion Models","url":"https://www.microsoft.com/en-us/research/publication/icondm-text-guided-icon-set-expansion-using-diffusion-models/","published":"2024-10-28","authors":["Jiawei Lin","Zhaoyun Jiang","Jiaqi Guo","Ting Liu","Shizhao Sun","Zijiang Yang","Jian-Guang Lou","Dongmei Zhang"],"abstract":"Icons are ubiquitous visual elements in graphic design, yet their creation is often complex and time-consuming. To resolve this problem, we draw inspiration from the booming text-to-image field and propose Text-Guided Icon Set Expansion, a novel task that helps users design high-quality icons using textual descriptions. Besides, users can control the style consistency of the created icons by inputting a few hand-crafted icons as style reference. Despite its practicality, the task poses two unique challenges. (i) Abstract Concept Visualization. 
Abstract concepts like technology and health are frequently encountered in icon creation, but their visualization is not straightforward and requires a grounding process that translates them into physical, easy-to-depict objects. (ii) Fine-grained Style Transfer. Unlike ordinary images, icons exhibit richer fine-grained stylistic elements, includin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer vision","Computer science","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4404952749","title":"Leveraging RAG-Enhanced Large Language Model for Semi-Supervised Log Anomaly Detection","url":"https://doi.org/10.1109/issre62328.2024.00026","published":"2024-10-28","authors":["Wanhao Zhang","Qianli Zhang","Enyu Yu","Yuxiang Ren","Yeqing Meng","Mingxi Qiu","Jilong Wang"],"abstract":"Log-based anomaly detection is critical in monitoring the operations of information systems and in the real-time reporting of system failures. Utilizing deep learning-based log anomaly detection methods facilitates effective detection of anomalies within logs. However, existing methods are greatly dependent on log parsers, and parsing errors can considerably affect downstream anomaly detection tasks. Additionally, methods that predict the next log event in a sequence are susceptible to the instability of sequences and the emergence of unseen logs as systems evolve, resulting in a higher false positive rate. 
In this paper, we put forward LogRAG, a semi-supervised log anomaly detection framework based on retrieval-augmented generation (RAG). This framework conducts phased detection using both Log Tokens and Log Templates to mitigate the impact of log parsing errors. It also utilizes a sing...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/issre62328.2024.00026","openalex_id":"https://openalex.org/W4404952749","cited_by_count":8,"quality_score":57,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6919431090354919},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6738691926002502},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47315868735313416},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3579104542732239}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"apple:qgqgoayxd4liwo844hy5pznp","title":"Smart Audit System Empowered by LLM","url":"https://machinelearning.apple.com/research/smart-audit","published":"2024-10-28","authors":["Xu Yao","Xiaoxu Wu","Xi Li","Huan Xu","Chenlei Li","Ping Huang","Si Li","Xiaoning Ma","Jiulong Shan"],"abstract":"Manufacturing quality audits are pivotal for ensuring high product standards in mass production environments. Traditional auditing processes, however, are labor-intensive and heavily reliant on human expertise, posing challenges in maintaining transparency, accountability, and continuous improvement across complex global supply chains. 
To address these challenges, we propose a smart audit system empowered by large language models (LLMs). Our...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ks4zt9b5l0hubfc9xuluv7wt","title":"Promoting Cross-Modal Representations to Improve Multimodal Foundation Models for Physiological Signals","url":"https://machinelearning.apple.com/research/modal-representations","published":"2024-10-28","authors":["Ching Fang","Christopher Sandino","Behrooz Mahasseni","Juri Minxha","Hadi Pouransari","Erdrin Azemi","Ali Moin","Ellen Zippi"],"abstract":"Many healthcare applications are inherently multimodal, involving several physiological signals. As sensors for these signals become more common, improving machine learning methods for multimodal healthcare data is crucial. Pretraining foundation models is a promising avenue for success. 
However, methods for developing foundation models in healthcare are still in early exploration and it is unclear which pretraining strategies are most effective...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4404955189","title":"Self-Evolutionary Group-wise Log Parsing Based on Large Language Model","url":"https://doi.org/10.1109/issre62328.2024.00016","published":"2024-10-28","authors":["Changhua Pei","Zihan Liu","Jianhui Li","Erhan Zhang","Le Zhang","Haiming Zhang","Wei Chen","Dan Pei","Gaogang Xie"],"abstract":"Log parsing involves extracting appropriate templates from semi-structured logs, providing foundational information for downstream log analysis tasks such as anomaly detection and log comprehension. Initially, the task of log parsing was approached by domain experts who manually designed heuristic rules to extract templates. However, the effectiveness of these manual rules deteriorates when certain characteristics of a new log dataset do not conform to the pre-designed rules. To address these issues, introducing large language models (LLM) into log parsing has yielded promising results. Nevertheless, there are two limitations: one is the reliance on manually annotated templates within the prompt, and the other is the low efficiency of log processing. 
To address these challenges, we propose a self-evolving method called SelfLog, which, on the one hand, uses similar <group, template> pairs...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/issre62328.2024.00016","openalex_id":"https://openalex.org/W4404955189","cited_by_count":6,"quality_score":51,"matched_keywords":["LLM","language model"],"author_affiliations":["Chinese Academy of Sciences","Computer Network Information Center","Tencent (China)","Tsinghua University","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7579345703125},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.7323382496833801},{"id":"https://openalex.org/C2781311116","display_name":"Group (periodic table)","score":0.4827134311199188},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48183026909828186},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47881776094436646},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4378376007080078},{"id":"https://openalex.org/C178790620","display_name":"Organic chemistry","score":0.0},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4404955314","title":"World Models: The Safety Perspective","url":"https://doi.org/10.1109/issrew63542.2024.00104","published":"2024-10-28","authors":["Zifan Zeng","Chongzhe Zhang","Feng Liu","Joseph Sifakis","Qunli Zhang","Shiming Liu","Peng Wang"],"abstract":"With the proliferation of the Large Language Model 
(LLM), the concept of World Models (WM) has recently attracted a great deal of attention in the AI research community, especially in the context of AI agents. It is arguably evolving into an essential foundation for building AI agent systems. A WM is intended to help the agent predict the future evolution of environmental states or help the agent fill in missing information so that it can plan its actions and behave safely. The safety property of WM plays a key role in their effective use in critical applications. In this work, we review and analyze the impacts of the current state-of-the-art in WM technology from the point of view of trustworthiness and safety based on a comprehensive survey and the fields of application envisaged. We provide an in-depth analysis of state-of-the-art WMs and derive technical research challenges and their...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/issrew63542.2024.00104","openalex_id":"https://openalex.org/W4404955314","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","agent"],"author_affiliations":["Centre National de la Recherche Scientifique","Huawei Technologies (China)","Huawei Technologies (Germany)","Institut polytechnique de Grenoble","Université Grenoble Alpes"],"concepts":[{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.6746959090232849},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5505818724632263},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.18139472603797913}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4403842036","title":"MotionChain: Conversational Motion 
Controllers via Multimodal Prompts","url":"https://doi.org/10.1007/978-3-031-73347-5_4","published":"2024-10-28","authors":["Biao Jiang","Xin Chen","Chi Zhang","Fukun Yin","Zhuoyuan Li","Gang Yu","Jiayuan Fan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73347-5_4","openalex_id":"https://openalex.org/W4403842036","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8698625564575195},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5683699250221252},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.517704963684082},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47657155990600586},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4252511262893677},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3913162350654602}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4404952881","title":"Multivariate Time Series Anomaly Detection based on Pre-trained Models with Dual-Attention Mechanism","url":"https://doi.org/10.1109/issrew63542.2024.00050","published":"2024-10-28","authors":["Yongqian Sun","Yang Guo","Minghan Liang","Xidao Wen","Junhua Kuang","Shenglin Zhang","Hongbo Li","Kaixu Xia","Dan Pei"],"abstract":"In major tech companies, monitoring server performance data with anomaly detection algorithms is crucial for assessing operational status. 
Existing models often require separate training or fine-tuning for each server due to generalization limitations, leading to increased storage, memory, and training costs. As the number of servers grows, this approach becomes impractical. To address this, we propose using pretrained language models for time series anomaly detection, leveraging their strong generalization capabilities. Specifically, we employ two pre-trained GPT-2 models as backbones and implement a two-stage fine-tuning strategy to retain learned knowledge while adapting to specific business data characteristics. Our experiments on multiple anomaly detection datasets demonstrate that our method achieves the best average F1-Score, outperforming the leading baseline by 7%.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/issrew63542.2024.00050","openalex_id":"https://openalex.org/W4404952881","cited_by_count":3,"quality_score":44,"matched_keywords":["memory"],"author_affiliations":["Nankai University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C161584116","display_name":"Multivariate statistics","score":0.7774991989135742},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.7460475564002991},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.7057834267616272},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6692022681236267},{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.6222620606422424},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.6207210421562195},{"id":"https://openalex.org/C89611455","display_name":"Mechanism 
(biology)","score":0.6177430152893066},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.5354279279708862}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4403842129","title":"Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models","url":"https://doi.org/10.1007/978-3-031-73347-5_13","published":"2024-10-28","authors":["Yu-Chu Yu","Chi-Pin Huang","J.C. Chen","Kai-Po Chang","Yung-Hsuan Lai","Fu-En Yang","Yu-Chiang Frank Wang"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73347-5_13","openalex_id":"https://openalex.org/W4403842129","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["National Taiwan University","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.824048638343811},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.7085456848144531},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5493785738945007},{"id":"https://openalex.org/C2776960227","display_name":"Knowledge transfer","score":0.5144127011299133},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.4766950309276581},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4303659498691559},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.37336498498916626},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.3444393575191498}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4407130330","title":"Following the Compass: LLM-Empowered Intent Translation with Manual Guidance","url":"https://doi.org/10.1109/icnp61940.2024.10858507","published":"2024-10-28","authors":["Lingqi Guo","Jingyu Wang","Jianyu Wu","Caijun Yan","Haifeng Sun","Zirui Zhuang","Qi Qi","Yifei Dong","Haibao Ren","Jianxin Liao"],"abstract":"Intent-Based Networking (IBN) represents a novel paradigm of network automation and intelligence that has gradually been applied to network management. While the emergence of Large Language Models (LLMs) has improved the current state of IBN, hardware heterogeneity and high network dynamics remain significant challenges. Hardware heterogeneity requires that IBN effectively manage a diverse range of devices. The high network dynamics demands that IBN align service needs with rapidly changing network resources. We propose LIT, a framework of LLM-empowered Intent Translation with manual guidance. Given the outstanding language understanding and generation capabilities of LLM, LIT utilizes it in intent translation task. To further address two prevalent problems encountered in IBN, we introduce manual guidance and Mixture of Experts (MoE). 
Under the guidance of the manual, LLM improves its ab...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icnp61940.2024.10858507","openalex_id":"https://openalex.org/W4407130330","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Beijing University of Posts and Telecommunications","Huawei Technologies (China)","State Key Laboratory of Networking and Switching Technology"],"concepts":[{"id":"https://openalex.org/C2778361833","display_name":"Compass","score":0.8611820936203003},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6946136951446533},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6883828639984131},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39706021547317505},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33968067169189453},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3247828781604767},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.08647674322128296},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.05247548222541809}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403842980","title":"MultiGen: Zero-Shot Image Generation from Multi-modal Prompts","url":"https://doi.org/10.1007/978-3-031-73242-3_17","published":"2024-10-28","authors":["Zhi-Fan Wu","Lianghua Huang","Wei Wang","Yanheng Wei","Liu 
Yu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73242-3_17","openalex_id":"https://openalex.org/W4403842980","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8144428133964539},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7622685432434082},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.744677722454071},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.6009917855262756},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.5928535461425781},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.55136638879776},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5250646471977234},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46976423263549805}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lora-vs-full-fine-tuning-an-illusion-of-equivalence","title":"LoRA vs Full Fine-tuning: An Illusion of Equivalence","url":"https://www.microsoft.com/en-us/research/publication/lora-vs-full-fine-tuning-an-illusion-of-equivalence/","published":"2024-10-27","authors":["Reece Shuttleworth","Jacob Andreas","Antonio Torralba","Pratyusha Sharma","Pratyusha Sharma"],"abstract":"Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. 
Recently, methods like Low-Rank Adaptation (LoRA) have been shown to effectively fine-tune LLMs with an extreme reduction in trainable parameters. But, \\emph{are their learned solutions really equivalent?} We study how LoRA and full-finetuning change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure: weight matrices trained with LoRA have new, high-ranking singular vectors, which we call \\emph{intruder dimensions}, while those trained with full fine-tuning do not. Further, we extend the finding that LoRA forgets less than full fine-tuning and find its forgetting is vastly localized to the intruder dimens...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/carmo-dynamic-criteria-generation-for-context-aware-reward-modelling","title":"CARMO: Dynamic Criteria Generation for Context-Aware Reward Modelling","url":"https://www.microsoft.com/en-us/research/publication/carmo-dynamic-criteria-generation-for-context-aware-reward-modelling/","published":"2024-10-27","authors":["Taneesh Gupta","Shivam Shandilya","Xuchao Zhang","Supriyo Ghosh","Chetan Bansal","Huaxiu Yao","Saravan Rajmohan"],"abstract":"Reward modeling in large language 
models is susceptible to reward hacking, causing models to latch onto superficial features such as the tendency to generate lists or unnecessarily long responses. In reinforcement learning from human feedback (RLHF) and more generally during post-training flawed reward signals often lead to outputs that optimize for these spurious correlates instead of genuine quality or correctness. We propose Context-Aware Reward Modeling (CARMO), a novel approach that first generates dynamic, context-relevant criteria to ground the reward model before producing reward scores. Unlike prior methods that rely on static rubrics, CARMO leverages large language models (LLMs) to adaptively create evaluation criteria such as logical consistency, clarity, and depth tailored to the user query. Our theoretical analysis shows that such criteria generation can mitigate reward hack...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403780613","title":"GPT4Video: A Unified Multimodal Large Language Model for lnstruction-Followed Understanding and Safety-Aware Generation","url":"https://doi.org/10.1145/3664647.3681464","published":"2024-10-26","authors":["Zhanyu Wang","Longyue Wang","Zhen Zhao","Minghao Wu","Chenyang Lyu","Huayang Li","Cai Deng","Luping Zhou","Shuming Shi","Zhaopeng 
Tu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681464","openalex_id":"https://openalex.org/W4403780613","cited_by_count":18,"quality_score":59,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","The University of Sydney"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7193135619163513},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4316878318786621},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3331208825111389}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":18}},{"id":"openalex:W4403792205","title":"mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model","url":"https://doi.org/10.1145/3664647.3681294","published":"2024-10-26","authors":["Anwen Hu","Yaya Shi","Haiyang Xu","Jiabo Ye","Qinghao Ye","Ming Yan","Chenliang Li","Qi Qian","Ji Zhang","Fei Huang"],"abstract":"Weak diagram analysis abilities of LLMs or Multimodal LLMs greatly limit their application scenarios for scientific academic paper writing. In this work, towards a more versatile copilot for academic paper writing, we mainly focus on strengthening the multi-modal diagram analysis ability of Multimodal LLMs. By parsing Latex source files of academic papers, we carefully build a multi-modal diagram understanding dataset M-Paper. By aligning diagrams in the paper with related paragraphs, we construct professional diagram analysis samples for training and evaluation. 
M-Paper is the first dataset to support joint comprehension of multiple scientific diagrams, including figures and tables in the format of images or Latex codes. Besides, to better align the copilot with the user's intention, we introduce the 'outline' as the control signal, which could be directly given by the user or revised b...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681294","openalex_id":"https://openalex.org/W4403792205","cited_by_count":11,"quality_score":56,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Bellevue Hospital Center","East China Normal University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6972851157188416},{"id":"https://openalex.org/C48419115","display_name":"Communication diagram","score":0.5573322176933289},{"id":"https://openalex.org/C186399060","display_name":"Diagram","score":0.47877228260040283},{"id":"https://openalex.org/C202446494","display_name":"Class diagram","score":0.4375545382499695},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4269561469554901},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4082998037338257},{"id":"https://openalex.org/C145644426","display_name":"Unified Modeling Language","score":0.211857408285141},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.08943355083465576}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4403791781","title":"Semantic Alignment for Multimodal Large Language 
Models","url":"https://doi.org/10.1145/3664647.3681014","published":"2024-10-26","authors":["Tao Wu","Mengze Li","Jingyuan Chen","Wei Ji","Lin Wang","Jinyang Gao","Kun Kuang","Zhou Zhao","Fei Wu"],"abstract":"Research on Multi-modal Large Language Models (MLLMs) towards the multi-image cross-modal instruction has received increasing attention and made significant progress, particularly in scenarios involving closely resembling images ( e.g., change captioning). Existing MLLMs typically follow a two-step process in their pipelines: first, extracting visual tokens independently for each input image, and then aligning these visual tokens from different images with the Large Language Model (LLM) in its textual feature space. However, the independent extraction of visual tokens for each image may result in different semantics being prioritized for different images in the first step, leading to a lack of preservation of linking information among images for subsequent LLM analysis. This issue becomes more serious in scenarios where significant variations exist among the images (e.g., visual storytel...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681014","openalex_id":"https://openalex.org/W4403791781","cited_by_count":10,"quality_score":55,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","National University of Singapore","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.817363440990448},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6560776829719543},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5446732044219971}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4403792011","title":"WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition","url":"https://doi.org/10.1145/3664647.3680960","published":"2024-10-26","authors":["Lianghui Zhu","Junwei Zhou","Yan Liu","Xin Hao","Wenyu Liu","Xinggang Wang"],"abstract":"Weakly-supervised visual recognition using inexact supervision is a critical yet challenging learning problem. It significantly reduces human labeling costs and traditionally relies on multi-instance learning and pseudo-labeling. This paper introduces WeakSAM and solves the weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment Anything Model (SAM). WeakSAM addresses two critical limitations in traditional WSOD retraining, i.e., pseudo ground truth (PGT) incompleteness and noisy PGT instances, through adaptive PGT generation and Region of Interest (RoI) drop regularization. It also addresses the SAM's shortcomings of requiring human prompts and category unawareness in object detection and segmentation. 
Our results indicate that WeakSAM significantly surpasses previous state-of-the-a...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3680960","openalex_id":"https://openalex.org/W4403792011","cited_by_count":17,"quality_score":54,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.630476713180542},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5151130557060242},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4879114329814911}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":17}},{"id":"openalex:W4403791410","title":"Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models","url":"https://doi.org/10.1145/3664647.3680576","published":"2024-10-26","authors":["Chaoya Jiang","Hongrui Jia","Mengfan Dong","Wei Ye","Haiyang Xu","Ming Yan","Ji Zhang","Shikun Zhang"],"abstract":"Large Vision-Language Models (LVLMs) exhibit remarkable capabilities but struggle with ''hallucinations''-inconsistencies between images and their descriptions. Previous hallucination evaluation studies on LVLMs have identified hallucinations in terms of objects, attributes, and relations but overlooked complex hallucinations that create an entire narrative around a fictional entity. In this paper, we introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination. 
We then utilize advanced LLMs to generate and filter fine-grained hallucinatory data consisting of various types of hallucinations, with a particular focus on event hallucinations, laying the groundwork for integrating discriminative and generative evaluation methods within our universal evaluation framework. The proposed benchmark distinctively assesses LVLMs' ability to tackle a broad spectrum o...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3680576","openalex_id":"https://openalex.org/W4403791410","cited_by_count":16,"quality_score":53,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.654334545135498},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.54647296667099},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3955506980419159},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.32341498136520386}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"arxiv:2407.18568","title":"Learning Spectral-Decomposited Tokens for Domain Generalized Semantic Segmentation","url":"http://arxiv.org/abs/2407.18568","published":"2024-10-26","authors":["Jingjun Yi","Qi Bi","Hao Zheng","Haolan Zhan","Wei Ji","Yawen Huang","Yuexiang Li","Yefeng Zheng"],"abstract":"The rapid development of Vision Foundation Model (VFM) brings inherent out-domain generalization for a variety of down-stream tasks. 
Among them, domain generalized semantic segmentation (DGSS) holds unique challenges as the cross-domain images share common pixel-wise content information but vary greatly in terms of the style. In this paper, we present a novel Spectral-dEcomposed Token (SET) learning framework to advance the frontier. Delving into further than existing fine-tuning token & frozen backbone paradigm, the proposed SET especially focuses on the way learning style-invariant features from these learnable tokens. Particularly, the frozen VFM features are first decomposed into the phase and amplitude components in the frequency space, which mainly contain the information of content and style, respectively, and then separately processed by learnable tokens for task-specific informa...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3664647.3680906","openalex_id":"https://openalex.org/W4403790959","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Guangxi Medical University","Monash University","Tencent (China)","Westlake University","Wuhan University","Yale University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8185542821884155},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5533317923545837},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5432390570640564},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5002422332763672},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4312061667442322},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.09241509437561035},{"id":"https://openalex.org/C134306372","display_name":"Mathematical 
analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"openalex:W4403780601","title":"Differential-Perceptive and Retrieval-Augmented MLLM for Change Captioning","url":"https://doi.org/10.1145/3664647.3681453","published":"2024-10-26","authors":["Xian Zhang","Haokun Wen","Jianlong Wu","Pengda Qin","Hui Xue","Liqiang Nie"],"abstract":"Change captioning involves describing the subtle changes between a pair of similar images. Although existing efforts have achieved compelling success, they overlook the potential of multimodal large language models (MLLMs) in tackling this challenging task. In this work, we aim to empower MLLMs with the capability to perceive subtle differences between paired images and enhance their performance in generating change captions. Specifically, we present a diFferentIal-perceptive aNd rEtRieval-augmented MLLM (FINER-MLLM) tailored for this task. In particular, FINER-MLLM leverages LoRA fine-tuned MLLM's image encoder to extract image patch features, enabling the capture of detailed image information. 
Subsequently, within MLLM's feature extraction, typically Q-Former, FINER-MLLM incorporates dual constraints: the intra-image feature independence constraint and the inter-image feature alignment...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681453","openalex_id":"https://openalex.org/W4403780601","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Alibaba Group (China)","Harbin Institute of Technology"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9483363628387451},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7745572328567505},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5256313681602478},{"id":"https://openalex.org/C93226319","display_name":"Differential (mechanical device)","score":0.5030972361564636},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4124978184700012},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.12230291962623596},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.05083784461021423},{"id":"https://openalex.org/C146978453","display_name":"Aerospace engineering","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403791844","title":"Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning","url":"https://doi.org/10.1145/3664647.3681270","published":"2024-10-26","authors":["Zhong Ji","Changxu Meng","Yan Zhang","Haoran Wang","Yanwei Pang","Jungong Han"],"abstract":"Mountains of researches 
center around the Remote Sensing Image-Text Retrieval (RSITR), aiming at retrieving the corresponding targets based on the given query. Among them, the transfer of Foundation Models (FMs), such as CLIP, to remote sensing domain shows promising results. However, existing FM-based approaches neglect the negative impact of weakly correlated sample pairs and the key distinctions among remote sensing texts, leading to biased and superficial exploration of sample pairs. To address these challenges, we propose a novel Eliminate Before Align strategy with Keyword Explicit Reasoning framework (EBAKER) for RSITR. Specifically, we devise an innovative Eliminate Before Align (EBA) strategy to filter out the weakly correlated sample pairs to mitigate their deviations from optimal embedding space during alignment. Moreover, we introduce a Keyword Explicit Reasoning (KER) module...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681270","openalex_id":"https://openalex.org/W4403791844","cited_by_count":6,"quality_score":47,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","Tianjin University","University of Sheffield"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7716807126998901},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6997497081756592},{"id":"https://openalex.org/C1667742","display_name":"Image retrieval","score":0.5216591358184814},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5112218856811523},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3205960988998413}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":6}},{"id":"arxiv:2408.03312","title":"MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation","url":"http://arxiv.org/abs/2408.03312","published":"2024-10-26","authors":["Xiaofeng Mao","Zhengkai Jiang","Qilin Wang","Chencan Fu","Jiangning Zhang","Jiafu Wu","Yabiao Wang","Chengjie Wang","Wei Li","Mingmin Chi"],"abstract":"Recent advancements in the field of Diffusion Transformers have substantially improved the generation of high-quality 2D images, 3D videos, and 3D shapes. However, the effectiveness of the Transformer architecture in the domain of co-speech gesture generation remains relatively unexplored, as prior methodologies have predominantly employed the Convolutional Neural Network (CNNs) or simple a few transformer layers. In an attempt to bridge this research gap, we introduce a novel Masked Diffusion Transformer for co-speech gesture generation, referred to as MDT-A2G, which directly implements the denoising process on gesture sequences. To enhance the contextual reasoning capability of temporally aligned speech-driven gestures, we incorporate a novel Masked Diffusion Transformer. 
This model employs a mask modeling scheme specifically designed to strengthen temporal relation learning among sequ...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3664647.3680684","openalex_id":"https://openalex.org/W4403445354","cited_by_count":5,"quality_score":46,"matched_keywords":["efficient"],"author_affiliations":["Fudan University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7600868940353394},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6664285063743591},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.6505401134490967},{"id":"https://openalex.org/C23224414","display_name":"Hidden Markov model","score":0.5256391763687134},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5248363614082336},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.524422287940979},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4835905432701111},{"id":"https://openalex.org/C68339613","display_name":"Speedup","score":0.4776628911495209}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4403781429","title":"A Method for Efficient Structured Data Generation with Large Language Models","url":"https://doi.org/10.1145/3688866.3689127","published":"2024-10-26","authors":["Zongzhi Hou","Ruohan Zhao","Zhongyang Li","Zheng Wang","Yizhen Wu","Junwei Gou","Zhifeng Zhu"],"abstract":"With the rapid development of large language model technology, we find ourselves at an interesting juncture regarding the importance of data. 
The textual data samples from these large unsupervised models are often of poor quality, which in turn produces substandard results. Implicitly, this means that the model struggles to learn the exact underlying structure of the data distribution without supervision, which can manifest as output lacking fidelity and relevance to real data distributions. In order to overcome some of these limitations in data-driven text generation tasks, this paper presents a Efficient Data Generation System (EDGS) for multimodal structured data generation.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3688866.3689127","openalex_id":"https://openalex.org/W4403781429","cited_by_count":1,"quality_score":46,"matched_keywords":["language model","efficient"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7827080488204956},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.42648160457611084},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3756501078605652}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403780662","title":"EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second","url":"https://doi.org/10.1145/3664647.3680689","published":"2024-10-26","authors":["Hao Wang","Shangwei Guo","Jialing He","Kangjie Chen","Shudong Zhang","Tianwei Zhang","Tao Xiang"],"abstract":"Text-to-image (T2I) diffusion models enjoy great popularity and many individuals and companies build their applications based on publicly released T2I diffusion models. 
Previous studies have demonstrated that backdoor attacks can elicit T2I diffusion models to generate unsafe target images through textual triggers. However, existing backdoor attacks typically demand substantial tuning data for poisoning, limiting their practicality and potentially degrading the overall performance of T2I diffusion models. To address these issues, we propose EvilEdit, a training-free and data-free backdoor attack against T2I diffusion models. EvilEdit directly edits the projection matrices in the cross-attention layers to achieve projection alignment between a trigger and the corresponding backdoor target. We preserve the functionality of the backdoored model using a protected whitelist to ensure the sema...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3680689","openalex_id":"https://openalex.org/W4403780662","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Chongqing University","Huawei Technologies (China)","Nanyang Technological University"],"concepts":[{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.6327252388000488},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6180596351623535},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5585477948188782},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3661002814769745},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3550654947757721},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.3265058398246765},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.11682212352752686},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.07042881846427917}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4403791610","title":"CustomNet: Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models","url":"https://doi.org/10.1145/3664647.3681396","published":"2024-10-26","authors":["Ziyang Yuan","Mingdeng Cao","Xintao Wang","Zhongang Qi","Chun Yuan","Ying Shan"],"abstract":"Incorporating a customized object into image generation presents an attractive feature in text-to-image (T2I) generation. Some methods finetune T2I models for each object individually at test-time, which tend to be overfitted and time-consuming. Others train an extra encoder to extract object visual information for customization efficiently but struggle to preserve the object's identity. To address these limitations, we present CustomNet, a unified encoder-based object customization framework that explicitly incorporates 3D novel view synthesis capabilities into the customization process. This integration facilitates the adjustment of spatial positions and viewpoints, producing diverse outputs while effectively preserving the object's identity. To train our model effectively, we propose a dataset construction pipeline to better handle real-world objects and complex backgrounds. 
Additiona...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681396","openalex_id":"https://openalex.org/W4403791610","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Tencent (China)","The University of Tokyo","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C2776035091","display_name":"Viewpoints","score":0.7744036912918091},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7234125137329102},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.6628932356834412},{"id":"https://openalex.org/C182365436","display_name":"Variable (mathematics)","score":0.5485336780548096},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5045112371444702},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.44844701886177063},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4351697862148285},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4145389199256897}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4403791464","title":"Large Point-to-Gaussian Model for Image-to-3D Generation","url":"https://doi.org/10.1145/3664647.3680920","published":"2024-10-26","authors":["Longfei Lu","Huachen Gao","Tao Dai","Yaohua Zha","Hou Zhi","Junta Wu","Shu‐Tao 
Xia"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3680920","openalex_id":"https://openalex.org/W4403791464","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Peng Cheng Laboratory","Shenzhen University","Tencent (China)","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6128159761428833},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5243014097213745},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.5088053345680237},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4989287853240967},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42612963914871216},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4177979826927185},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.34246310591697693},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1855221688747406}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4403780506","title":"LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description","url":"https://doi.org/10.1145/3664647.3688992","published":"2024-10-26","authors":["Yizhang Jin","Jian Li","Jiangning Zhang","Jianlong Hu","Zhenye Gan","Xin Tan","Yong Liu","Yabiao Wang","Chengjie Wang","Lizhuang Ma"],"abstract":"Visual Spatial Description (VSD) aims to generate texts that describe the spatial relationships between objects within images. 
Traditional visual spatial relationship classification (VSRC) methods typically output the spatial relationship between two objects in an image, often neglecting world knowledge and lacking general language capabilities. In this paper, we propose a Large Language-and-Vision Assistant for Visual Spatial Description, named LLaVA-VSD, which is designed for the classification, description, and open-ended description of visual spatial relationships. Specifically, the model first constructs a visual spatial instruction-following dataset using given figure-caption pairs for the three tasks. It then employs LoRA to fine-tune a Large Language and Vision Assistant for VSD, which has 13 billion parameters and supports high-resolution images. Finally, a large language model....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3688992","openalex_id":"https://openalex.org/W4403780506","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["East China Normal University","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7575814723968506},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5181344747543335},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4281710684299469}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4403791703","title":"Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models","url":"https://doi.org/10.1145/3664647.3680779","published":"2024-10-26","authors":["Yubo Wang","Chaohu 
Liu","Yanqiu Qu","Haoyu Cao","Deqiang Jiang","Linli Xu"],"abstract":"Large vision-language models (LVLMs) integrate visual information into large language models, showcasing remarkable multi-modal conversational capabilities. However, the visual modules introduces new challenges in terms of robustness for LVLMs, as attackers can craft adversarial images that are visually clean but may mislead the model to generate incorrect answers. In general, LVLMs rely on vision encoders to transform images into visual tokens, which are crucial for the language models to perceive image contents effectively. Therefore, we are curious about one question: Can LVLMs still generate correct responses when the encoded visual tokens are attacked and disrupting the visual information? To this end, we propose a non-targeted attack method referred to as VT-Attack (Visual Tokens Attack), which constructs adversarial examples from multiple perspectives, with the goal of comprehensi...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3680779","openalex_id":"https://openalex.org/W4403791703","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7984194159507751},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.780575156211853},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.6596603989601135},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5583046674728394},{"id":"https://openalex.org/C2780878386","display_name":"Visual 
language","score":0.5242643356323242},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4755798876285553},{"id":"https://openalex.org/C178253425","display_name":"Visual perception","score":0.4683125913143158},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.09119760990142822}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4403780681","title":"VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation","url":"https://doi.org/10.1145/3664647.3681695","published":"2024-10-26","authors":["Rongjie Huang","Yongqi Wang","Ruofan Hu","Xiaoshan Xu","Zhiqing Hong","Dongchao Yang","Xize Cheng","Zehan Wang","Ziyue Jiang","Zhenhui Ye","Luping Liu","Siqi Zheng"],"abstract":"Voice large language models (LLMs) cast voice synthesis as a language modeling task in a discrete space, and have demonstrated significant progress to date. Despite the recent success, the current development of voice LLMs in low-resource applications is hampered by data scarcity and high computational cost. In this work, we propose VoiceTuner, with a self-supervised pre-training and efficient fine-tuning approach for low-resource voice generation. Specifically, 1) to mitigate data scarcity, we leverage large-scale unlabeled dataset and pre-train VoiceTuner-SSL without pre-defined applications, which can be fine-tuned in downstream tasks; 2) to further reduce the high training cost in complete fine-tuning, we introduce a multiscale transformer adapter to effectively update only around 1% parameters as a plug-and-play module. 
Experimental results demonstrate that VoiceTuner-SSL presents s...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681695","openalex_id":"https://openalex.org/W4403780681","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong","University of Hong Kong","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7151068449020386},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.7130247354507446},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5236061215400696},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3567584753036499},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403791333","title":"ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks","url":"https://doi.org/10.1145/3664647.3681529","published":"2024-10-26","authors":["Zejun Li","Ye Wang","Mengfei Du","Qingwen Liu","Binhao Wu","Jiwen Zhang","Chengxing Zhou","Zhihao Fan","Jie Fu","Jingjing Chen","Zhongyu Wei","Xuanjing Huang"],"abstract":"Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). 
Benefiting from the strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluated. Most existing multi-modal benchmarks require task-oriented input-output formats, posing great challenges to automatically assess the free-form text output of LVLMs. To effectively leverage the annotations available and reduce the manual efforts required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats. Through systematic data collection and reformulation, we present ReForm-Eval benchmark, offering substantial data for evaluat...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681529","openalex_id":"https://openalex.org/W4403791333","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Fudan University","Hong Kong University of Science and Technology","Northeastern University","Shanghai Center for Brain Science and Brain-Inspired Technology","Shanghai Institute for Science of Science"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7721630930900574},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.658083438873291},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4523348808288574},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4515690505504608},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4388885498046875},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.41319432854652405},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.11024200916290283},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09920558333396912}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2409.05076","title":"PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions","url":"http://arxiv.org/abs/2409.05076","published":"2024-10-26","authors":["Yudong Zhang","Ruobing Xie","Jiansheng Chen","Xingwu Sun","Yu Wang"],"abstract":"Large Vision-Language Models (LVLMs) have demonstrated their powerful multimodal capabilities. However, they also face serious safety problems, as adversaries can induce robustness issues in LVLMs through the use of well-designed adversarial examples. Therefore, LVLMs are in urgent need of detection tools for adversarial examples to prevent incorrect responses. In this work, we first discover that LVLMs exhibit regular attention patterns for clean images when presented with probe questions. We propose an unconventional method named PIP, which utilizes the attention patterns of one randomly selected irrelevant probe question (e.g., \"Is there a clock''') to distinguish adversarial examples from clean examples. 
Regardless of the image to be tested and its corresponding question, PIP only needs to perform one additional inference of the image to be tested and the probe question, and then ach...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3664647.3685510","openalex_id":"https://openalex.org/W4403617527","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tsinghua University","University of Macau","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.871050238609314},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7843216061592102},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6197569370269775},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6019086837768555},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5317334532737732},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4413614273071289},{"id":"https://openalex.org/C207347870","display_name":"Gesture","score":0.44069257378578186},{"id":"https://openalex.org/C2780586882","display_name":"Simple (philosophy)","score":0.4271838963031769}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"arxiv:2407.17779","title":"DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction","url":"http://arxiv.org/abs/2407.17779","published":"2024-10-26","authors":["Chaofan Gan","Yuanpeng Tu","Yuxi Li","Weiyao Lin"],"abstract":"With the recent burst of 2D and 3D data, cross-modal retrieval has 
attracted increasing attention recently. However, manual labeling by non-experts will inevitably introduce corrupted annotations given ambiguous 2D/3D content. Though previous works have addressed this issue by designing a naive division strategy with hand-crafted thresholds, their performance generally exhibits great sensitivity to the threshold value. Besides, they fail to fully utilize the valuable supervisory signals within each divided subset. To tackle this problem, we propose a Divide-and-conquer 2D-3D cross-modal Alignment and Correction framework (DAC), which comprises Multimodal Dynamic Division (MDD) and Adaptive Alignment and Correction (AAC). Specifically, the former performs accurate sample division by adaptive credibility modeling for each sample based on the compensation information within multimodal loss....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3664647.3680859","openalex_id":"https://openalex.org/W4402618476","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7985048890113831},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6378918886184692},{"id":"https://openalex.org/C71559656","display_name":"Divide and conquer algorithms","score":0.637474536895752},{"id":"https://openalex.org/C774472","display_name":"Margin (machine learning)","score":0.5891338586807251},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5486181974411011},{"id":"https://openalex.org/C2780767217","display_name":"Generality","score":0.5012204647064209},{"id":"https://openalex.org/C198531522","display_name":"Sample (material)","score":0.4274721145629883},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.4196809232234955}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403780696","title":"Bilateral Adaptive Cross-Modal Fusion Prompt Learning for CLIP","url":"https://doi.org/10.1145/3664647.3681218","published":"2024-10-26","authors":["Qiang Wang","Ke Yan","Shouhong Ding"],"abstract":"In the realm of CLIP adaptation through prompt learning, it is important to emphasize the pivotal role that the proper alignment of visual and textual representations plays when adapting the CLIP to downstream tasks. We propose that the proper alignment for downstream tasks is determined by the flexibility of the interaction between cross-modal information, which compensates for the absence of contrastive loss during the adaptation process. However, the current prompt learning methods, such as isolated modifications to the visual or language branches of CLIP or the employment of uni-directional cross-modal fusion, are not sufficient to explore the full potential of the mutual interaction between visual and textual modalities. 
To overcome this limitation, we propose a new paradigm for the CLIP prompt learning community, named Bilateral Adaptive Cross-Modal Fusion Prompt Learning (Bloom),....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681218","openalex_id":"https://openalex.org/W4403780696","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7073938250541687},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6627947092056274},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.48041701316833496},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3749317526817322},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.08186429738998413},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403780521","title":"See or Guess: Counterfactually Regularized Image Captioning","url":"https://doi.org/10.1145/3664647.3681458","published":"2024-10-26","authors":["Qian Cao","Xu Chen","Ruihua Song","Xiting Wang","Xinting Huang","Yuchen Ren"],"abstract":"Image captioning, which generates natural language descriptions of images, is a crucial task in vision-language research. 
Previous models have typically addressed this task by aligning the generative capabilities of machines with humans through statistical fitting existing datasets. While effective for normal images, they may struggle to accurately describe those where certain parts of the image are obscured or edited, unlike humans who excel in such cases. These weaknesses, including hallucinations and limited interpretability, often hinder performance in scenarios with shifted association patterns. In this paper, we present a generic image captioning framework that employs causal inference to make existing models more capable of interventional tasks, and counterfactually explainable. Our approach includes two variants leveraging either total effect or natural direct effect. Integrating...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681458","openalex_id":"https://openalex.org/W4403780521","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Renmin University of China","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.8879663944244385},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6794896721839905},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.6195283532142639},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47171327471733093},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3865639865398407}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403780651","title":"Bridging Gaps in Content and Knowledge for Multimodal Entity 
Linking","url":"https://doi.org/10.1145/3664647.3681661","published":"2024-10-26","authors":["Pengfei Luo","Tong Xu","Che Liu","Suojuan Zhang","Linli Xu","Minglei Li","Enhong Chen"],"abstract":"Multimodal Entity Linking (MEL) aims to address the ambiguity in multimodal mentions and associate them with Multimodal Knowledge Graphs (MMKGs). Existing works primarily focus on designing multimodal interaction and fusion mechanisms to enhance the performance of MEL. However, these methods still overlook two crucial gaps within the MEL task. One is the content discrepancy between mentions and entities, manifested as uneven information density. The other is the knowledge gap, indicating insufficient knowledge extraction and reasoning during the linking process. To bridge these gaps, we propose a novel framework FissFuse, as well as a plug-and-play knowledge-aware re-ranking method KAR. Specifically, FissFuse collaborates with the Fission and Fusion branches, establishing dynamic features for each mention-entity pair and adaptively learning multimodal interactions to alleviate content di...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681661","openalex_id":"https://openalex.org/W4403780651","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","PLA Army Engineering University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.8650641441345215},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.675767719745636},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3382376432418823},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.3240954279899597},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4405272689","title":"Adaptive Batch Budget for LLM Inference","url":"https://doi.org/10.1109/ubmk63289.2024.10773573","published":"2024-10-26","authors":["Çağrı Yeşil","Berhan Türkü Ay","Funda Ay Ak","Öykü Berfin Mercan","Oğuzhan Nefesoğlu"],"abstract":"Large Language Models (LLMs) are developing at a rapid pace, which requires the development of effective inference approaches to keep up with the increasing demand for computational resources. This study investigates the inference capabilities of LLMs, with a specific emphasis on the prefill and decode stages. We evaluate and contrast multiple cutting-edge techniques, such as vLLM, SplitFuse, and Sarathi's chunked-prefill strategy. Every strategy has its own advantages in maximizing GPU utilization and enhancing throughput during inference. We present an innovative method for scheduling tasks that adapts the allocation of resources according to the proportion of prefill and decode requests. This strategy aims to enhance the efficiency of current methods by improving the time per output token (TPOT) and throughput while maintaining competitive time to first token (TTFT) metrics. 
T...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ubmk63289.2024.10773573","openalex_id":"https://openalex.org/W4405272689","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6204821467399597},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6188427209854126},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.243057519197464}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4403791195","title":"Aspects are Anchors: Towards Multimodal Aspect-based Sentiment Analysis via Aspect-driven Alignment and Refinement","url":"https://doi.org/10.1145/3664647.3681189","published":"2024-10-26","authors":["Z. J. Chen","Zhihong Zhu","Wanshi Xu","Yunyan Zhang","Xian Wu","Yefeng Zheng"],"abstract":"Given coupled sentence image pairs, Multimodal Aspect-based Sentiment Analysis (MABSA) aims to detect aspect terms and predict their sentiment polarity. While existing methods have made great efforts in aligning images and text for improved MABSA performance, they still struggle to effectively mitigate the challenge of the noisy correspondence problem (NCP): the text description is often not well-aligned with the visual content. To alleviate NCP, in this paper, we introduce Aspect-driven Alignment and Refinement (ADAR), which is a two-stage coarse-to-fine alignment framework. 
In the first stage, ADAR devises a novel Coarse-to-fine Aspect-driven Alignment Module, which introduces Optimal Transport (OT) to learn the coarse-grained alignment between visual and textual features. Then the adaptive filter bin is applied to remove the irrelevant image regions at a fine-grained level; In the sec...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681189","openalex_id":"https://openalex.org/W4403791195","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Peking University Shenzhen Hospital","Tencent (China)","Westlake University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.709791362285614},{"id":"https://openalex.org/C66402592","display_name":"Sentiment analysis","score":0.5585562586784363},{"id":"https://openalex.org/C60051680","display_name":"Aspect-oriented programming","score":0.5156344175338745},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3829609751701355},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33786481618881226},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.19440603256225586},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.062285542488098145}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403791857","title":"Multimodal Inplace Prompt Tuning for Open-set Object Detection","url":"https://doi.org/10.1145/3664647.3681275","published":"2024-10-26","authors":["Guilin Li","Mengdan Zhang","Xiawu Zheng","Peixian Chen","Zihan Wang","Yunhang Shen","Mingchen Zhuge","Chenglin Wu","Fei 
Chao","Ke Li","Xing Sun","Rongrong Ji"],"abstract":"The integration of large language models into open-world detection frameworks significantly improves versatility in new environments. Prompt representations derived from these models help establish classification boundaries for both base and novel categories within open-world detectors. However, we are the first to discover that directly fine-tuning language models in detection systems results in redundant attention patterns and leads to suboptimal prompt representations. In order to fully leverage the capabilities of large language models and augment prompt encoding for detection, this study introduces a redundancy assessment metric to identify uniform attention patterns. Furthermore, in areas with high redundancy, we incorporate multimodal inplace prompt tuning (MIPT) to enrich the text prompt with visual clues. Experimental results validate the efficacy of our MIPT framework, achievin...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681275","openalex_id":"https://openalex.org/W4403791857","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["East China Normal University","King Abdullah University of Science and Technology","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7159332036972046},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.5468778014183044},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.4822300374507904},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.48025229573249817},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4624856114387512},{"id":"https://openalex.org/C42357961","display_name":"Open set","score":0.4298502802848816},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.42554229497909546},{"id":"https://openalex.org/C71681937","display_name":"Object-class detection","score":0.4206992983818054}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4403780757","title":"Attentive Linguistic Tracking in Diffusion Models for Training-free Text-guided Image Editing","url":"https://doi.org/10.1145/3664647.3680594","published":"2024-10-26","authors":["Bingyan Liu","Chengyu Wang","Jun Huang","Kui Jia"],"abstract":"Building on recent breakthroughs in diffusion-based text-to-image synthesis (TIS), training-free text-guided image editing (TIE) has emerged as an indispensable aspect of modern image editing practices. This technique involves the modification of features within attention layers to alter objects or their attributes within images during the generation process. Despite its utility, current image editing algorithms face challenges, particularly when editing multiple objects in an image. In this paper, we introduce VICTORIA, a novel approach that augments TIE by incorporating linguistic knowledge into the manipulation of attention maps during image generation. VICTORIA capitalizes on mechanisms within self-attention layers to ensure spatial consistency between source and target images. 
Further, we design a novel loss function that refines cross-attention maps, ensuring their alignment with l...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3680594","openalex_id":"https://openalex.org/W4403780757","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong, Shenzhen","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6964297294616699},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.572944700717926},{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.5523633360862732},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.5459765791893005},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5354118347167969},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.45953914523124695},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44794678688049316},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.44500917196273804}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4403791842","title":"VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer","url":"https://doi.org/10.1145/3664647.3681271","published":"2024-10-26","authors":["Humen Zhong","Zhibo Yang","Zhaohai Li","Peng Wang","Jun Tang","Wenqing Cheng","Cong Yao"],"abstract":"Text recognition is an inherent integration of vision and language, encompassing the 
visual texture in stroke patterns and the semantic context among the character sequences. Towards advanced text recognition, there are three key challenges: (1) an encoder capable of representing the visual and semantic distributions; (2) a decoder that ensures the alignment between vision and semantics; and (3) consistency in the framework during pre-training, if it exists, and fine-tuning. Inspired by masked autoencoding, a successful pre-training strategy in both vision and language, we propose an innovative scene text recognition approach, named VL-Reader. The novelty of the VL-Reader lies in the pervasive interplay between vision and language throughout the entire process. Concretely, we first introduce a Masked Visual-Linguistic Reconstruction (MVLR) objective, which aims at simultaneously modeling...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681271","openalex_id":"https://openalex.org/W4403791842","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7459452748298645},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.5245074033737183},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5036527514457703},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4162719249725342},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4085143208503723},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.36821043491363525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403791263","title":"Neighbor Does Matter: Curriculum Global Positive-Negative Sampling for Vision-Language Pre-training","url":"https://doi.org/10.1145/3664647.3681502","published":"2024-10-26","authors":["Bin Huang","Feng He","Qi Wang","Hong Chen","Guohao Li","Zhifan Feng","Xin Wang","Wenwu Zhu"],"abstract":"Sampling strategies have been widely adopted in Vision-Language Pre-training (VLP) and have achieved great success recently. However, the sampling strategies adopted by current VLP works are limited in two ways: i) they only focus on negative sampling, ignoring the importance of more informative positive samples; ii) their sampling strategies are conducted in the local in-batch level, which may lead to sub-optimal results. To tackle these problems, in this paper, we propose a curriculum-based Global Positive-Negative Sampling (GPN-S) framework for vision-language pre-training, which conducts both positive and negative sampling in the global level, grounded on the notion of neighborhood relationships. Additionally, we incorporate curriculum learning into our sampling strategy, progressively increasing the complexity of samples as the training progresses. 
Specifically, our proposed GPN-S f...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681502","openalex_id":"https://openalex.org/W4403791263","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Baidu (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6942251324653625},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6358365416526794},{"id":"https://openalex.org/C47177190","display_name":"Curriculum","score":0.5682166814804077},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.5063961744308472},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4953669309616089},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33629530668258667},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.31365811824798584},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.274061381816864}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403792005","title":"COMD: Training-free Video Motion Transfer With Camera-Object Motion Disentanglement","url":"https://doi.org/10.1145/3664647.3680600","published":"2024-10-26","authors":["Teng Hu","Jiangning Zhang","Ran Yi","Yating Wang","Jieyu Weng","Hongrui Huang","Yabiao Wang","Lizhuang Ma"],"abstract":"The emergence of diffusion models has greatly propelled the progress in image and video generation. 
Recently, some efforts have been made in controllable video generation, including text-to-video, image-to-video generation, video editing, and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate substantial computation resources due to the large amount of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control, preventing the realization of some specific camera controls, such as various camera movements in films. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model, w...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3680600","openalex_id":"https://openalex.org/W4403792005","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Shanghai Jiao Tong University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7326889038085938},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7156761288642883},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.6778500080108643},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6699721813201904},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5609862804412842},{"id":"https://openalex.org/C2776175482","display_name":"Transfer (computing)","score":0.42723917961120605},{"id":"https://openalex.org/C173608175","display_name":"Parallel 
computing","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403791983","title":"OpenLEAF: A Novel Benchmark for Open-Domain Interleaved Image-Text Generation","url":"https://doi.org/10.1145/3664647.3685511","published":"2024-10-26","authors":["Jie An","Zhengyuan Yang","Linjie Li","Jianfeng Wang","Kevin Lin","Zicheng Liu","Lijuan Wang","Jiebo Luo"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3685511","openalex_id":"https://openalex.org/W4403791983","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Bellevue College","Microsoft (United States)","University of Rochester"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7884998321533203},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7604715824127197},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5515329241752625},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5492770671844482},{"id":"https://openalex.org/C2993776861","display_name":"Open domain","score":0.5482041835784912},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4919855296611786},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.35955190658569336},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3498821258544922}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4403791798","title":"FLIP-80M: 80 Million Visual-Linguistic Pairs for Facial Language-Image Pre-Training","url":"https://doi.org/10.1145/3664647.3681287","published":"2024-10-26","authors":["Yudong Li","Xianxu Hou","Zheng Dezhi","Linlin Shen","Zhe Zhao"],"abstract":"While significant progress has been made in multi-modal learning driven by large-scale image-text datasets, there is still a noticeable gap in the availability of such datasets within the facial domain. To facilitate and advance the field of facial representation learning, we present FLIP-80M, a large-scale visual-linguistic dataset comprising over 80 million face images paired with text descriptions. FLIP-80M is constructed by leveraging the large openly available image-text-pair dataset LAION-5B and a mixed-method approach to filter face-related pairs from both visual and linguistic perspectives. Our curation process involves face detection, face caption classification, text de-noising, and synthesis-based image augmentation. As a result, FLIP-80M stands as the largest face-text dataset to date. 
To evaluate the potential of our dataset, we fine-tune the CLIP model using the proposed FL...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664647.3681287","openalex_id":"https://openalex.org/W4403791798","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Shenzhen Academy of Robotics","Shenzhen University","Tencent (China)","Xi’an Jiaotong-Liverpool University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5823956727981567},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.545491635799408},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5006978511810303},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49319881200790405},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4726009964942932},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4162767827510834},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.1762717366218567},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multi-field-adaptive-retrieval","title":"Multi-Field Adaptive Retrieval","url":"https://www.microsoft.com/en-us/research/publication/multi-field-adaptive-retrieval/","published":"2024-10-25","authors":["Millicent Li","Tongfei Chen","Ben Van Durme","Patrick Xia"],"abstract":"Document retrieval for tasks such as search and retrieval-augmented generation 
typically involves datasets that are unstructured: free-form text without explicit internal structure in each document. However, documents can have a structured form, consisting of fields such as an article title, message body, or HTML header. To address this gap, we introduce Multi-Field Adaptive Retrieval (MFAR), a flexible framework that accommodates any number of and any type of document indices on structured data. Our framework consists of two main steps: (1) the decomposition of an existing document into fields, each indexed independently through dense and lexical methods, and (2) learning a model which adaptively predicts the importance of a field by conditioning on the document query, allowing on-the-fly weighting of the most likely field(s). We find that our approach allows for the optimized use of de...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Computation and Language","Computer science","Information retrieval","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/kahani-culturally-nuanced-visual-storytelling-pipeline-for-non-western-cultures","title":"Kahani: Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures","url":"https://www.microsoft.com/en-us/research/publication/kahani-culturally-nuanced-visual-storytelling-pipeline-for-non-western-cultures/","published":"2024-10-25","authors":["Hamna .","D. 
Sudharsan","Agrima Seth","Ritvik Budhiraja","Deepika Khullar","Vyshak Jain","Kalika Bali","Aditya Vashistha","Sameer Segal"],"abstract":"Large Language Models (LLMs) and Text-To-Image (T2I) models have demonstrated the ability to generate compelling text and visual stories. However, their outputs predominantly reflect the sensibilities of Western ideologies, often resulting in an outsider’s gaze on other cultures. As a result, non-Western communities have to put extra effort into generating culturally specific stories. To address this challenge, we developed a visual storytelling tool called Kahani that generates culturally grounded visual stories for non-Western cultures. Our tool leverages off-the-shelf models GPT-4 Turbo and Stable Diffusion XL (SDXL). By using Chain of Thought (CoT) and T2I prompting techniques, we capture the cultural context from user’s prompt and generate vivid descriptions of the characters and scene compositions. To evaluate the effectiveness of Kahani, we conducted a comparative user study with....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Human language technologies","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:vg1zu8xxddnqnyxxnc3h5ple","title":"Divide-or-Conquer? 
Which Part Should You Distill Your LLM?","url":"https://machinelearning.apple.com/research/divide-conquer","published":"2024-10-25","authors":["Zhuofeng Wu","He Bai","Aonan Zhang","Jiatao Gu","VG Vinod Vydiswaran","Navdeep Jaitly","Yizhe Zhang"],"abstract":"Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothesize that the decomposition should be easier to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4403770638","title":"StyleAdapter: A Unified Stylized Image Generation Model","url":"https://doi.org/10.1007/s11263-024-02253-x","published":"2024-10-25","authors":["Zhouxia Wang","Xintao Wang","Liangbin Xie","Zhongang Qi","Ying Shan","Wenping Wang","Ping Luo"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-024-02253-x","openalex_id":"https://openalex.org/W4403770638","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Hong Kong","University of 
Macau"],"concepts":[{"id":"https://openalex.org/C38935604","display_name":"Stylized fact","score":0.8699504137039185},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6483376026153564},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.6014143228530884},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5779997706413269},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5712032914161682},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.4748295247554779},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.43465885519981384},{"id":"https://openalex.org/C139719470","display_name":"Macroeconomics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4403769907","title":"DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment","url":"https://doi.org/10.1007/978-3-031-72751-1_27","published":"2024-10-25","authors":["Yunpeng Bai","Xintao Wang","Yan‐Pei Cao","Yixiao Ge","Chun Yuan","Ying Shan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72751-1_27","openalex_id":"https://openalex.org/W4403769907","cited_by_count":11,"quality_score":48,"matched_keywords":[],"author_affiliations":["Kuaishou (China)","Tencent (China)","The University of Texas at Austin","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.8425556421279907},{"id":"https://openalex.org/C522805319","display_name":"Electroencephalography","score":0.5703759789466858},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5387222766876221},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5295031070709229},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.5023550987243652},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.48867467045783997},{"id":"https://openalex.org/C55020928","display_name":"Image quality","score":0.481904536485672},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4672738313674927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4403779178","title":"DreamView: Injecting View-Specific Text Guidance Into Text-to-3D Generation","url":"https://doi.org/10.1007/978-3-031-72698-9_21","published":"2024-10-25","authors":["Junkai Yan","Yipeng Gao","Qize Yang","Xihan Wei","Xuansong Xie","Ancong Wu","Wei‐Shi Zheng"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72698-9_21","openalex_id":"https://openalex.org/W4403779178","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Ministry of Education of the People's Republic of China","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8321129083633423},{"id":"https://openalex.org/C2985684807","display_name":"Text 
generation","score":0.6077420711517334},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48612093925476074},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4602697193622589},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.40200895071029663}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"arxiv:2410.21276","title":"GPT-4o System Card","url":"https://huggingface.co/papers/2410.21276","published":"2024-10-25","authors":["OpenAI","Aaron Hurst","Adam Lerer","Adam P. Goucher","Adam Perelman","Aditya Ramesh","Aidan Clark","AJ Ostrow","Akila Welihinda","Alan Hayes","Alec Radford","Aleksander Mądry"],"abstract":"GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50\\% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. 
In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, w...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/not-all-heads-matter-a-head-level-kv-cache-compression-method-with-integrated-retrieval-and-reasoning","title":"Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning","url":"https://www.microsoft.com/en-us/research/publication/not-all-heads-matter-a-head-level-kv-cache-compression-method-with-integrated-retrieval-and-reasoning/","published":"2024-10-24","authors":["Yu Fu","Zefan Cai","Abedelkadir Asi","Wayne Xiong","Yue Dong","Wen Xiao"],"abstract":"Key-Value (KV) caching is a common technique to enhance the computational efficiency of Large Language Models (LLMs), but its memory overhead grows rapidly with input length. Prior work has shown that not all tokens are equally important for text generation, proposing layer-level KV cache compression to selectively retain key information. Recognizing the distinct roles of attention heads in generation, we propose HeadKV, a head-level KV cache compression method, and HeadKV-R2, which leverages a novel contextual reasoning ability estimation for compression. Our approach operates at the level of individual heads, estimating their importance for contextual QA tasks that require both retrieval and reasoning capabilities. 
Extensive experiments across diverse benchmarks (LongBench, LooGLE), model architectures (e.g., Llama-3-8B-Instruct, Mistral-7B-Instruct), and long-context abilities tests d...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01","memory","retrieval","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/little-giants-synthesizing-high-quality-embedding-data-at-scale","title":"Little Giants: Synthesizing High-Quality Embedding Data at Scale","url":"https://www.microsoft.com/en-us/research/publication/little-giants-synthesizing-high-quality-embedding-data-at-scale/","published":"2024-10-24","authors":["Haonan Chen","Liang Wang","Nan Yang","Yutao Zhu","Ziliang Zhao","Furu Wei","Zhicheng Dou"],"abstract":"Synthetic data generation has become an increasingly popular way of training models without the need for large, manually labeled datasets. For tasks like text embedding, synthetic data offers diverse and scalable training examples, significantly reducing the cost of human annotation. However, most current approaches rely heavily on proprietary models like GPT-4, which are expensive and inefficient for generating large-scale embedding data. In this paper, we introduce SPEED, a framework that aligns open-source small models (8B) to efficiently generate large-scale synthetic embedding data. 
Through supervised fine-tuning, preference optimization, and self-improvement, SPEED enables small open-source models to produce high-quality data. Remarkably, SPEED uses only less than 1/10 of the GPT API calls, outperforming the state-of-the-art embedding model E5mistral when both are trained solely on...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Search and information retrieval","Computer science","1970-01-01","preference","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-potential-and-value-of-ai-chatbot-in-personalized-cognitive-training","title":"The Potential and Value of AI Chatbot in Personalized Cognitive Training","url":"https://www.microsoft.com/en-us/research/publication/the-potential-and-value-of-ai-chatbot-in-personalized-cognitive-training/","published":"2024-10-24","authors":["Zilong Wang","Nan Chen","Luna K. Qiu","Ling Yue","Geli Guo","Yang Ou","Shiqi Jiang","Yuqing Yang","Lili Qiu"],"abstract":"In recent years, the rapid aging of the global population has led to an increase in cognitive disorders, such as Alzheimer's disease, presenting significant public health challenges. Although no effective treatments currently exist to reverse Alzheimer's, prevention and early intervention, including cognitive training, are critical. This report explores the potential of AI chatbots in enhancing personalized cognitive training. 
We introduce ReMe, a web-based framework designed to create AI chatbots that facilitate cognitive training research, specifically targeting episodic memory tasks derived from personal life logs. By leveraging large language models, ReMe provides enhanced user-friendly, interactive, and personalized training experiences. Case studies demonstrate ReMe's effectiveness in engaging users through life recall and open-ended language puzzles, highlighting its potential to....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","personalized","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:125","title":"Why Does the Effective Context Length of LLMs Fall Short?","url":"https://seed.bytedance.com/en/research/why-does-the-effective-context-length-of-llms-fall-short","published":"2024-10-24","authors":["Chenxin An","Jun Zhang","Ming Zhong","Lei Li","Shansan Gong","Yao Luo","Jingjing Xu","Lingpeng Kong"],"abstract":"Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs). However, recent work reveals that the effective context lengths of open-source LLMs often fall short, typically not exceeding half of their training lengths. 
In this work, we attribute this limitation to the left-skewed frequency distribution of relative positions formed in LLMs pretraining and post-training stages, which impedes their ability to effectively gather distant information. To address this challenge, we introduce ShifTed Rotray position embeddING (STRING). STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within their existing training lengths. Experimental results show that without additional training, STRING dramatically improves the performance...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","ICLR 2025","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4403727066","title":"Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition","url":"https://doi.org/10.1007/978-3-031-73414-4_11","published":"2024-10-24","authors":["Masashi Hatano","Ryo Hachiuma","Ryo Fujii","Hideo Saitô"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73414-4_11","openalex_id":"https://openalex.org/W4403727066","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Keio University","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.8700261116027832},{"id":"https://openalex.org/C2987834672","display_name":"Action recognition","score":0.6479383111000061},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6340028643608093},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.6161530017852783},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5583140254020691},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.45964929461479187},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3830227255821228},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3456918001174927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4403705795","title":"Noise Calibration: Plug-and-Play Content-Preserving Video Enhancement Using Pre-trained Video Diffusion Models","url":"https://doi.org/10.1007/978-3-031-72764-1_18","published":"2024-10-24","authors":["Qinyu Yang","Haoxin Chen","Yong Zhang","Menghan Xia","Xiaodong Cun","Zhixun Su","Ying Shan"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72764-1_18","openalex_id":"https://openalex.org/W4403705795","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Dalian University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.8583167791366577},{"id":"https://openalex.org/C165838908","display_name":"Calibration","score":0.601748526096344},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5835959911346436},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.5431330800056458},{"id":"https://openalex.org/C65483669","display_name":"Video processing","score":0.48818784952163696},{"id":"https://openalex.org/C2780070844","display_name":"Plug and play","score":0.48571476340293884},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.44988730549812317},{"id":"https://openalex.org/C4924752","display_name":"Plug-in","score":0.4420281648635864}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/videowebarena-evaluating-long-context-multimodal-agents-with-video-understanding-web-tasks","title":"VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks","url":"https://www.microsoft.com/en-us/research/publication/videowebarena-evaluating-long-context-multimodal-agents-with-video-understanding-web-tasks/","published":"2024-10-23","authors":["Lawrence Jang","Yinheng Li","Charles Ding","Justin Lin","Paul Pu Liang","Dan Zhao","Rogerio Bonatti","Kazuhito Koishida"],"abstract":"Videos are often used to learn or extract the necessary information to complete tasks in ways different than what text and static imagery alone can provide. However, many existing agent benchmarks neglect long-context video understanding, instead focusing on text or static image inputs. To bridge this gap, we introduce VideoWebArena (VideoWA), a benchmark for evaluating the capabilities of long-context multimodal agents for video understanding. 
VideoWA consists of 2,021 web agent tasks based on manually crafted video tutorials, which total almost four hours of content. For our benchmark, we define a taxonomy of long-context video-based agent tasks with two main areas of focus: skill retention and factual retention. While skill retention tasks evaluate whether an agent can use a given human demonstration to complete a task efficiently, the factual retention task evaluates whether an agent...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Graphics and multimedia","Benchmarking","Computer science","multimodal agent","video understanding","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/fast-constrained-sampling-in-pre-trained-diffusion-models","title":"Fast constrained sampling in pre-trained diffusion models","url":"https://www.microsoft.com/en-us/research/publication/fast-constrained-sampling-in-pre-trained-diffusion-models/","published":"2024-10-23","authors":["Alexandros Graikos","Nebojsa Jojic","Dimitris Samaras"],"abstract":"Large denoising diffusion models, such as Stable Diffusion, have been trained on billions of image-caption pairs to perform text-conditioned image generation. As a byproduct of this training, these models have acquired general knowledge about image statistics, which can be useful for other inference tasks. However, when confronted with sampling an image under new constraints, e.g. 
generating the missing parts of an image, using large pre-trained text-to-image diffusion models is inefficient and often unreliable. Previous approaches either utilized backpropagation through the denoiser network, making them significantly slower and more memory-demanding than simple text-to-image generation, or only enforced the constraint locally, failing to capture critical long-range correlations in the sampled image. In this work, we propose an algorithm that enables fast, high-quality generation under a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Diffusion models","1970-01-01","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:ojpfll8pu19crf7qaaiu3n75","title":"Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison","url":"https://machinelearning.apple.com/research/data-centric-rlhf","published":"2024-10-23","authors":["Judy Hanwen Shen","Archit Sharma","Jun Qin"],"abstract":"The goal of aligning language models to human preferences requires data that reveal these preferences. Ideally, time and money can be spent carefully collecting and tailoring bespoke preference data to each downstream application. However, in practice, a select few publicly available preference datasets are often used to train reward models for reinforcement learning from human feedback (RLHF). 
While new preference datasets are being introduced...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:qdr3c8jkji4fhtvpmmew98xq","title":"MUSCLE: A Model Update Strategy for Compatible LLM Evolution","url":"https://machinelearning.apple.com/research/model-compatibility","published":"2024-10-23","authors":["Jessica Echterhoff","Fartash Faghri","Raviteja Vemulapalli","Ting-Yao Hu","Chun-Liang Li","Oncel Tuzel","Hadi Pouransari"],"abstract":"Large Language Models (LLMs) are regularly updated to enhance performance, typically through changes in data or architecture. Within the update process, developers often prioritize improving overall performance metrics, paying less attention to maintaining compatibility with earlier model versions. 
Instance-level degradation (instance regression) of performance from one model version to the next can interfere with a user's mental model of the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4403674578","title":"Prior Preserved Text-to-Image Personalization Without Image Regularization","url":"https://doi.org/10.1109/tcsvt.2024.3485236","published":"2024-10-23","authors":["Z. H. Wang","Ouxiang Li","Tan Wang","Longhui Wei","Yanbin Hao","Xiang Wang","Qi Tian"],"abstract":"The current state-of-the-art text-to-image (T2I) models have found numerous applications, driven by their ability to produce photorealistic images. Concept learning, as one notable application, aims to enable T2I models to generate personalized content and better enable users to create images according to their interests. Nevertheless, the process of concept learning often involves model fine-tuning, which in turn brings the potential risk of overfitting. Such overfitting causes the T2I model to have reduced output diversity and results in poor editability. To mitigate the overfitting problem, we introduce two simple yet effective designs, namely masked textual inversion (MaskTI) and text regularization (TextReg). MaskTI is a variant of vanilla textual inversion that forces the learnable identifier to only attend to the class descriptor. 
This modification can effectively reduce the overf...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2024.3485236","openalex_id":"https://openalex.org/W4403674578","cited_by_count":6,"quality_score":51,"matched_keywords":["personalized","personalization"],"author_affiliations":["Huawei Technologies (China)","Nanyang Technological University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6896010637283325},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.598981499671936},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5343189835548401},{"id":"https://openalex.org/C2776135515","display_name":"Regularization (linguistics)","score":0.5156856775283813},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5041722059249878},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.47655633091926575},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.437140554189682},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.12969380617141724}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4403628189","title":"QAIE: LLM-based Quantity Augmentation and Information Enhancement for few-shot Aspect-Based Sentiment Analysis","url":"https://doi.org/10.1016/j.ipm.2024.103917","published":"2024-10-22","authors":["Hengyang Lu","Tianci Liu","Rui Cong","Jun Yang","Qiang Gan","Wei Fang","Xiao‐Jun 
Wu"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.ipm.2024.103917","openalex_id":"https://openalex.org/W4403628189","cited_by_count":30,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Intelligent Health (United Kingdom)","Jiangnan University","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.7952237725257874},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4894014894962311},{"id":"https://openalex.org/C66402592","display_name":"Sentiment analysis","score":0.4495196044445038},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3227289915084839},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.31742504239082336},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.17435234785079956},{"id":"https://openalex.org/C191897082","display_name":"Metallurgy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":30}},{"id":"bytedance-seed:224","title":"Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering","url":"https://seed.bytedance.com/en/research/merging-loras-like-playing-lego-pushing-the-modularity-of-lora-to-extremes-through-rank-wise-clustering","published":"2024-10-22","authors":["Ziyu Zhao","Tao Shen","Didi Zhu","Zexi Li","Jing Su","Xuwu Wang","Kun Kuang","Fei Wu"],"abstract":"Low-Rank Adaptation (LoRA) has emerged as a popular technique for fine-tuning large language models (LLMs) to various domains due to its modular design and widespread availability on platforms like Huggingface. 
This modularity has sparked interest in combining multiple LoRAs to enhance LLM capabilities. However, existing methods for LoRA composition primarily focus on task-specific adaptations that require additional training, and current model merging techniques often fail to fully leverage LoRA's modular nature, leading to parameter interference and performance degradation. In this paper, we investigate the feasibility of disassembling and reassembling multiple LoRAs at a finer granularity, analogous to assembling LEGO blocks. We introduce the concept of Minimal Semantic Units (MSUs), where the parameters corresponding to each rank in LoRA function as independent units. These MSUs demo...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Core Machine Learning","LLM","ICLR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-continual-fine-tuning-for-enhancing-language-ability-in-large-language-model","title":"Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model","url":"https://www.microsoft.com/en-us/research/publication/exploring-continual-fine-tuning-for-enhancing-language-ability-in-large-language-model/","published":"2024-10-21","authors":["Divyanshu Aggarwal","Sankarshan Damle","Navin Goyal","Satya Lokam","Sunayana Sitaram"],"abstract":"A common challenge towards the adaptability of Large Language Models (LLMs) is their ability to learn new languages over time without hampering the model's performance on languages in 
which the model is already proficient (usually English). Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to enable the model to adapt to downstream tasks with varying data distributions and time shifts. This paper focuses on the language adaptability of LLMs through CFT. We study a two-phase CFT process in which an English-only end-to-end fine-tuned LLM from Phase 1 (predominantly Task Ability) is sequentially fine-tuned on a multilingual dataset -- comprising task data in new languages -- in Phase 2 (predominantly Language Ability). We observe that the similarity'' of Phase 2 tasks with Phase 1 determines the LLM's adaptability. For similar phase-wise datasets, the LLM after P...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Security, privacy, and cryptography","Computer science","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403582420","title":"RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models","url":"https://doi.org/10.1145/3627673.3680016","published":"2024-10-20","authors":["Z.H. Wang","Z.G Liu","Y. Zhang","Aoxiao Zhong","Jihong Wang","Fengbin Yin","Lunting Fan","Lingfei Wu","Qingsong Wen"],"abstract":"Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently. 
However, current methods are still reliant on manual workflow settings and do not unleash LLMs' decision-making and environment interaction capabilities. We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage. Running on an internally deployed model rather than GPT families, RCAgent is capable of free-form data collection and comprehensive analysis with tools. Our framework combines a variety of enhancements, including a unique Self-Consistency for action trajectories, and a suite of methods for context management, stabilization, and importing domain knowledge. Our experiments show RCAgent's evident and consistent superiority over ReAct across all aspects of RCA--predicting root causes, solutions, evidence, and....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3680016","openalex_id":"https://openalex.org/W4403582420","cited_by_count":40,"quality_score":79,"matched_keywords":["LLM","language model","agent"],"author_affiliations":["Alibaba Group (China)","Bellevue Hospital Center","Nanjing University","Tsinghua University","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7450974583625793},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.6954582333564758},{"id":"https://openalex.org/C130963320","display_name":"Root cause analysis","score":0.6173240542411804},{"id":"https://openalex.org/C171078966","display_name":"Root (linguistics)","score":0.5737395286560059},{"id":"https://openalex.org/C84945661","display_name":"Root cause","score":0.4715844690799713},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.33906251192092896},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3235276937484741},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.12589192390441895}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":40}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/protnote-a-multimodal-method-for-protein-function-annotation-2","title":"ProtNote: a multimodal method for protein-function annotation","url":"https://www.microsoft.com/en-us/research/publication/protnote-a-multimodal-method-for-protein-function-annotation-2/","published":"2024-10-20","authors":["Samir Char","Nathaniel Corley","Sarah Alamdari","Kevin Kaichuang Yang","Ava P. Amini"],"abstract":"Understanding the protein sequence-function relationship is essential for advancing protein biology and engineering. However, fewer than 1% of known protein sequences have human-verified functions. While deep learning methods have demonstrated promise for protein function prediction, current models are limited to predicting only those functions on which they were trained. Here, we introduce ProtNote, a multimodal deep learning model that leverages free-form text to enable both supervised and zero-shot protein function prediction. ProtNote not only maintains near state-of-the-art performance for annotations in its train set, but also generalizes to unseen and novel functions in zero-shot test settings. 
We envision that ProtNote will enhance protein function discovery by enabling scientists to use free text inputs, without restriction to predefined labels – a necessary capability for navig...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1093/bioinformatics/btaf170","openalex_id":"https://openalex.org/W4409525624","cited_by_count":10,"quality_score":78,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","Biology"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/automated-proof-generation-for-rust-code-via-self-evolution","title":"Automated Proof Generation for Rust Code via Self-Evolution","url":"https://www.microsoft.com/en-us/research/publication/automated-proof-generation-for-rust-code-via-self-evolution/","published":"2024-10-20","authors":["Tianyu Chen","Shuai Lu","Shan Lu","Yeyun Gong","Chenyuan Yang","Xuheng Li","Md Rakib Hossain Misu","Hao Yu","Nan Duan","Peng Cheng","Fan Yang","Shuvendu Lahiri"],"abstract":"Ensuring correctness is crucial for code generation. Formal verification offers a definitive assurance of correctness, but demands substantial human effort in proof construction and hence raises a pressing need for automation. The primary obstacle lies in the severe lack of data - there is much less proof than code for LLMs to train upon. 
In this paper, we introduce SAFE, a novel framework that overcomes the lack of human-written proof to enable automated proof generation of Rust code. SAFE establishes a self-evolving cycle where data synthesis and fine-tuning collaborate to enhance the model capability, leveraging the definitive power of a symbolic verifier in telling correct proof from incorrect ones. SAFE also re-purposes the large number of synthesized incorrect proofs to train the self-debugging capability of the fine-tuned models, empowering them to fix incorrect proofs based on th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Systems and networking","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/1-bit-ai-infra-part-1-1-fast-and-lossless-bitnet-b1-58-inference-on-cpus","title":"1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs","url":"https://www.microsoft.com/en-us/research/publication/1-bit-ai-infra-part-1-1-fast-and-lossless-bitnet-b1-58-inference-on-cpus/","published":"2024-10-20","authors":["Jinheng Wang","Hansong Zhou","Ting Song","Shaoguang Mao","Shuming Ma","Hongyu Wang","Yan Xia","Furu Wei"],"abstract":"Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption. 
These developments also enable local LLM deployment across a broad range of devices. In this work, we introduce bitnet.cpp, a tailored software stack designed to unlock the full potential of 1-bit LLMs. Specifically, we develop a set of kernels to support fast and lossless inference of ternary BitNet b1.58 LLMs on CPUs. Extensive experiments demonstrate that bitnet.cpp achieves significant speedups, ranging from 2.37x to 6.17x on x86 CPUs and from 1.37x to 5.07x on ARM CPUs, across various model sizes. The code is available at https://github.com/microsoft/BitNet .","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403578219","title":"SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model","url":"https://doi.org/10.1145/3627673.3679760","published":"2024-10-20","authors":["Lingyue Fu","Hao Guan","Kounianhua Du","Jianghao Lin","Wei Xia","Weinan Zhang","Ruiming Tang","Yasheng Wang","Yong Yu"],"abstract":"Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question, which is a crucial task in intelligent tutoring systems (ITS). In educational KT scenarios, transductive ID-based methods often face severe data sparsity and cold start problems, where interactions between individual students and questions are sparse, and new questions and concepts consistently arrive in the database. 
In addition, existing KT models only implicitly consider the correlation between concepts and questions, lacking direct modeling of the more complex relationships in the heterogeneous graph of concepts and questions. In this paper, we propose a <u>S</u>tructure-aware <u>IN</u>ductive <u>K</u>nowledge <u>T</u>racing model with large language model (dubbed SINKT), which, for the first time, introduces large language models (LLMs) and realizes inductive knowledge tracing. Fir...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679760","openalex_id":"https://openalex.org/W4403578219","cited_by_count":22,"quality_score":63,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7499144077301025},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5419988632202148},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34794312715530396}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":22}},{"id":"openalex:W4403577519","title":"MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models","url":"https://doi.org/10.1145/3627673.3679599","published":"2024-10-20","authors":["Yunjia Xi","Weiwen Liu","Jianghao Lin","Bo Chen","Ruiming Tang","Weinan Zhang","Yong Yu"],"abstract":"Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through multi-round natural language dialogues. 
However, most existing CRS models mainly focus on dialogue comprehension and preferences mining from the current dialogue session, overlooking user preferences in historical dialogue sessions. The preferences embedded in historical sessions and the current session exhibit continuity and sequentiality, and we refer to such CRSs as sequential CRSs. In this work, we leverage memory-enhanced LLMs to model the preference continuity, addressing two key issues: (1) redundancy and noise in historical dialogue sessions, and (2) the cold-start users problem. Thus, we propose a <u>Memo</u>ry-enhanced <u>C</u>onversational <u>R</u>ecommender <u>S</u>ystem Framework with Large Language Models (dubbed MemoCRS), consisting of user-specific me...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679599","openalex_id":"https://openalex.org/W4403577519","cited_by_count":13,"quality_score":62,"matched_keywords":["personalized","preference","memory"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8346378803253174},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8032559156417847},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.43677476048469543},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3890644907951355},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3649395704269409},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.23406982421875}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":13}},{"id":"openalex:W4403577498","title":"Generative AI and Retrieval-Augmented Generation (RAG) Systems for Enterprise","url":"https://doi.org/10.1145/3627673.3680117","published":"2024-10-20","authors":["Anbang Xu","Tan Yu","Min Du","Pritam Gundecha","Yufan Guo","Xinliang Zhu","May Wang","Ping Li","Xinyun Chen"],"abstract":"This workshop introduces generative AI applications for enterprise, with a focus on retrieval-augmented generation (RAG) systems. Generative AI is a field of artificial intelligence that can create new content and solve complex problems. RAG systems are a novel generative AI technique that combines information retrieval with text generation to generate rich and diverse responses. RAG systems can leverage enterprise data, which is often specific, structured, and dynamic, to provide customized solutions for various domains. However, enterprise data also poses challenges such as scalability, security, and data quality. This workshop convenes researchers and practitioners to explore RAG and other generative AI systems in real-world enterprise scenarios, fostering knowledge exchange, collaboration, and identification of future directions. 
Relevant to the CIKM community, the workshop intersect...","companies":["Amazon","NVIDIA"],"matched_orgs":["Amazon","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3680117","openalex_id":"https://openalex.org/W4403577498","cited_by_count":5,"quality_score":58,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)","Google (United States)","Nvidia (United States)","Palo Alto Networks (United States)","Visual Editor Consultants (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7696688175201416},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5888023376464844},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5199229121208191},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4667380750179291},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3436152935028076}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4403582426","title":"RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model","url":"https://doi.org/10.1145/3627673.3680042","published":"2024-10-20","authors":["Peiwen Li","Xin Wang","Zeyang Zhang","Yuan Meng","Fang Shen","Y Li","Jialong Wang","Yang Li","Wenwu Zhu"],"abstract":"In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for operation and maintenance of systems, facilitating downstream industrial tasks such as root cause analysis. 
Temporal causal discovery, as an emerging method, aims to identify temporal causal relations between variables directly from observations by utilizing interventional data. However, existing methods mainly focus on synthetic datasets with heavy reliance on interventional targets and ignore the textual information hidden in real-world systems, failing to conduct causal discovery for real industrial scenarios. To tackle this problem, in this paper we investigate temporal causal discovery in industrial scenarios, which faces two critical challenges: how to discover causal relations without the interventional targets that are costly to obtain in practice, and how to discover ca...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3680042","openalex_id":"https://openalex.org/W4403582426","cited_by_count":11,"quality_score":56,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7006944417953491},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4508405327796936},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.43246781826019287},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39464834332466125},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.32297709584236145},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.1792580485343933}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":11}},{"id":"openalex:W4403577837","title":"Content-Based Collaborative Generation for Recommender Systems","url":"https://doi.org/10.1145/3627673.3679692","published":"2024-10-20","authors":["Yidan Wang","Zhaochun Ren","Weiwei Sun","Jiyuan Yang","Zhixiang Liang","Xin Chen","Ruobing Xie","Su Yan","Xu Zhang","Pengjie Ren","Zhumin Chen","Xin Xin"],"abstract":"Generative models have emerged as a promising utility to enhance recommender systems. It is essential to model both item content and user-item collaborative interactions in a unified generative framework for better recommendation. Although some existing large language model (LLM)-based methods contribute to fusing content information and collaborative signals, they fundamentally rely on textual language generation, which is not fully aligned with the recommendation task. How to integrate content knowledge and collaborative interaction signals in a generative framework tailored for item recommendation is still an open research challenge.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679692","openalex_id":"https://openalex.org/W4403577837","cited_by_count":8,"quality_score":53,"matched_keywords":["LLM","language model"],"author_affiliations":["Leiden University","Shandong University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8802312612533569},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7799369096755981},{"id":"https://openalex.org/C21569690","display_name":"Collaborative filtering","score":0.45381706953048706},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.4185553193092346},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.40146663784980774},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3393714129924774}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4403577370","title":"Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models","url":"https://doi.org/10.1145/3627673.3679673","published":"2024-10-20","authors":["Derong Xu","Ziheng Zhang","Zhihong Zhu","Zhenxi Lin","Qidong Liu","Xian Wu","Tong Xu","Wanyu Wang","Yuyang Ye","Xiangyu Zhao","Enhong Chen","Yefeng Zheng"],"abstract":"Model editing aims to precisely alter the behaviors of large language models (LLMs) in relation to specific knowledge, while leaving unrelated knowledge intact. This approach has proven effective in addressing issues of hallucination and outdated information in LLMs. However, the potential of using model editing to modify knowledge in the medical field remains largely unexplored, even though resolving hallucination is a pressing need in this area. Our observations indicate that current methods face significant challenges in dealing with specialized and complex knowledge in medical domain. Therefore, we propose MedLaSA, a novel Layer-wise Scalable Adapter strategy for medical model editing. MedLaSA harnesses the strengths of both adding extra parameters and locate-then-edit methods for medical model editing. 
We utilize causal tracing to identify the association of knowledge in neurons acr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679673","openalex_id":"https://openalex.org/W4403577370","cited_by_count":14,"quality_score":51,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Peking University","Tencent (China)","University of Science and Technology of China","Westlake University","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7687460780143738},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5406522154808044},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4132958650588989},{"id":"https://openalex.org/C2778638050","display_name":"Explanatory model","score":0.4117937684059143},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37308967113494873},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.37094858288764954},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.15683963894844055},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.07358887791633606}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4403577781","title":"SOUP: A Unified Shopping Query Suggestion Framework to Optimize Language Model with User Preference","url":"https://doi.org/10.1145/3627673.3679995","published":"2024-10-20","authors":["Xv Meng","Zhaohui Luo","Xinxin Wang","Wen Jiang","Wei Ning","Shuhan Qi"],"abstract":"The shopping query suggestion offers personalized 
queries to users and plays a crucial role in search engines. However, existing shopping query suggestion methods suffer from poor task generalization and limited semantic comprehension problems. This paper presents a comprehensive framework for the shopping query suggestion that effectively addresses the shortcomings of existing approaches. Our proposed framework leverages a generative language model and fine-grained preference alignment to enhance semantic comprehension and improve the quality of generated queries. Our key contributions include the introduction of a personalized prompt set for diverse query suggestion tasks, the integration of interaction behavior time to capture user query interests, and the utilization of reinforcement learning techniques to align user preferences. Experimental results demonstrate enhancements in diffe...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679995","openalex_id":"https://openalex.org/W4403577781","cited_by_count":0,"quality_score":49,"matched_keywords":["language model","personalized","preference"],"author_affiliations":["Alibaba Group (China)","Harbin Institute of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8446516990661621},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.6693879961967468},{"id":"https://openalex.org/C192028432","display_name":"Query language","score":0.5871340036392212},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5214654207229614},{"id":"https://openalex.org/C157692150","display_name":"Query optimization","score":0.5196908116340637},{"id":"https://openalex.org/C96956885","display_name":"RDF query 
language","score":0.4748803973197937},{"id":"https://openalex.org/C164120249","display_name":"Web search query","score":0.4072956442832947},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36428162455558777}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2405.16089","title":"Towards Completeness-Oriented Tool Retrieval for Large Language Models","url":"http://arxiv.org/abs/2405.16089","published":"2024-10-20","authors":["Changle Qu","Sunhao Dai","Xiaochi Wei","Hengyi Cai","Shuaiqiang Wang","Dawei Yin","Jun Xu","Ji-Rong Wen"],"abstract":"Recently, integrating external tools with Large Language Models (LLMs) has gained significant attention as an effective strategy to mitigate the limitations inherent in their pre-training data. However, real-world systems often incorporate a wide array of tools, making it impractical to input all tools into LLMs due to length limitations and latency constraints. Therefore, to fully exploit the potential of tool-augmented LLMs, it is crucial to develop an effective tool retrieval system. Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions, frequently leading to the retrieval of redundant, similar tools. Consequently, these methods fail to provide a complete set of diverse tools necessary for addressing the multifaceted problems encountered by LLMs. 
In this paper, we propose a novel modelagnostic <u>CO</u> llaborative <u>L</u> ear...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3627673.3679847","openalex_id":"https://openalex.org/W4399115306","cited_by_count":7,"quality_score":48,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Computing Technology","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C17231256","display_name":"Completeness (order theory)","score":0.860185980796814},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6617096066474915},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4962003827095032},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.482991099357605},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4277118444442749},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.14335274696350098},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4403582506","title":"ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation","url":"https://doi.org/10.1145/3627673.3679789","published":"2024-10-20","authors":["Jizheng Chen","Kounianhua Du","Jianghao Lin","Bo Chen","Ruiming Tang","Weinan Zhang","Yong Yu"],"abstract":"Large language models have been flourishing in the natural language processing (NLP) domain, and their potential for recommendation has been paid much attention to. 
Despite the intelligence shown by the recommendation-oriented finetuned models, LLMs struggle to fully understand the user behavior patterns due to their innate weakness in interpreting numerical features and the overhead for long context, where the temporal relations among user behaviors, subtle quantitative signals among different ratings, and various side features of items are not well explored. Existing works only fine-tune a sole LLM on given text data without introducing that important information to it, leaving these problems unsolved. In this paper, we propose ELCoRec to Enhance Language understanding with Co-Propagation of numerical and categorical features for Recommendation. Concretely, we propose to inject the pre...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679789","openalex_id":"https://openalex.org/W4403582506","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","preference"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7180041074752808},{"id":"https://openalex.org/C5274069","display_name":"Categorical variable","score":0.7139339447021484},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47642335295677185},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3905561864376068},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.15903151035308838}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4403577794","title":"Preliminary Study on Incremental Learning for Large Language 
Model-based Recommender Systems","url":"https://doi.org/10.1145/3627673.3679922","published":"2024-10-20","authors":["Tianhao Shi","Yang Zhang","Zhijian Xu","Chong Chen","Fuli Feng","Xiangnan He","Qi Tian"],"abstract":"Adapting Large Language Models for Recommendation (LLM4Rec) has shown promising results. However, the challenges of deploying LLM4Rec in real-world scenarios remain largely unexplored. In particular, recommender models need incremental adaptation to evolving user preferences, while the suitability of traditional incremental learning methods within LLM4Rec remains ambiguous due to the unique characteristics of Large Language Models (LLMs).","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679922","openalex_id":"https://openalex.org/W4403577794","cited_by_count":5,"quality_score":46,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)","National University of Singapore","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8680604696273804},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8472532033920288},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42138975858688354},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41696518659591675},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3872421979904175},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.34892868995666504}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":5}},{"id":"openalex:W4403582631","title":"Scaling Vison-Language Foundation Model to 12 Billion Parameters in Baidu Dynamic Image Advertising","url":"https://doi.org/10.1145/3627673.3680014","published":"2024-10-20","authors":["Xinyu Zhao","Kang Zhao","Zhipeng Jin","Yi Yang","Wen Tao","Xiaodong Chen","Cong Han","Shuanglong Li","Lin Liu"],"abstract":"Dynamic image advertising is an add-on service in search advertising that matches visuals to search ads in real-time. However, the image matching system encompasses various sub-tasks with different objectives, increasing the complexity of achieving global optimization. Besides, prevalent long-tailed data poses a challenge to the multimodal representation learning in dynamic image advertising. Recently, vision-language pre-trained models have achieved remarkable performance across a variety of multimodal tasks, and implemented as the foundational representation model in electronic business scenarios. In this paper, to improve multimodal content understanding in Dynamic Image adVERtising, we present a viSion-language rEpresentation model (referred to as DIVERSE) that learns on cross-view and cross-token contrastive loss. 
Moreover, with large-scale curated advertising image-text data and ex...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3680014","openalex_id":"https://openalex.org/W4403582631","cited_by_count":0,"quality_score":45,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6168017387390137},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5996964573860168},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5247775912284851},{"id":"https://openalex.org/C112698675","display_name":"Advertising","score":0.44572579860687256},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.42338141798973083},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3681545555591583},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3282463550567627},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3216797113418579}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2501.07166","title":"Natural Language-Assisted Multi-modal Medication Recommendation","url":"http://arxiv.org/abs/2501.07166","published":"2024-10-20","authors":["Jie Tan","Yu Rong","Kangfei Zhao","Tian Bian","Tingyang Xu","Junzhou Huang","Hong Cheng","Helen Meng"],"abstract":"Combinatorial medication recommendation (CMR) is a fundamental task of healthcare, which offers opportunities for clinical physicians to provide more precise prescriptions for patients with intricate health conditions, particularly in the 
scenarios of long-term medical care. Previous research efforts have sought to extract meaningful information from electronic health records (EHRs) to facilitate combinatorial medication recommendations. Existing learning-based approaches further consider the chemical structures of medications, but ignore the textual medication descriptions in which the functionalities are clearly described. Furthermore, the textual knowledge derived from the EHRs of patients remains largely underutilized. To address these issues, we introduce the Natural Language-Assisted Multi-modal Medication Recommendation (NLA-MMR), a multimodal alignment framework designed to learn...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679529","openalex_id":"https://openalex.org/W4403582711","cited_by_count":4,"quality_score":45,"matched_keywords":["long-term"],"author_affiliations":["Alibaba Group (China)","Beijing Institute of Technology","Chinese University of Hong Kong","The University of Texas at Arlington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7424559593200684},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6341732144355774},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.5656108856201172},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.5482490658760071},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5253906846046448},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3745967745780945},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.34432029724121094},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.07625985145568848}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403582537","title":"Multi-Stage Refined Visual Captioning for Baidu Ad Creatives Generation","url":"https://doi.org/10.1145/3627673.3679969","published":"2024-10-20","authors":["Yi Yang","Xinyu Zhao","Kang Zhao","Zhipeng Jin","Wen Tao","Lin Liu","Shuanglong Li"],"abstract":"High-quality multimodal training data is of critical importance for improving of multimodal model performance. However, the utilization of web-crawled vision-caption pairs is hindered by the presence of noise and irrelevance, as well as a lack of Chinese data. Large Language Models (LLM) and Large Multimodal Models (LMM) has demonstrated promising performance in cross-modal understanding and generation. In light of this, we propose a Chinese visual captioning pipeline for the synthesis of high-quality data. Our pipeline is comprised of two phases: the initial training of an encoder for visual understanding; and the subsequent fine-tuning of a captioning model in a two-stage iterative human-in-the-loop process, where the captioning model incorporates the pre-trained vision encoder and LLM by a visual cross-attention querying transformer. 
Extensive experiments have been conducted to valida...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679969","openalex_id":"https://openalex.org/W4403582537","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9237326383590698},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7365555167198181},{"id":"https://openalex.org/C146357865","display_name":"Stage (stratigraphy)","score":0.6015027761459351},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3846586346626282},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.33731609582901},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.07896378636360168},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.06232738494873047},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403582433","title":"EASE: Learning Lightweight Semantic Feature Adapters from Large Language Models for CTR Prediction","url":"https://doi.org/10.1145/3627673.3680048","published":"2024-10-20","authors":["Zexuan Qiu","Jieming Zhu","Yankai Chen","Guohao Cai","Weiwen Liu","Zhenhua Dong","Irwin King"],"abstract":"Recent studies highlight the potential of large language models (LLMs) to enhance content integration in recommender systems by leveraging their semantic understanding capabilities. 
However, directly incorporating LLMs into an online inference pipeline significantly increases computation costs for large-scale deployment, posing a practical challenge in balancing their benefits and costs. In this work, we propose the EASE framework, which enriches and aligns semantic feature embeddings using LLMs during the training phase while establishing a lightweight inference pipeline that does not directly involve LLMs. Specifically, we train a semantic adapter to align item features with LLMs and simultaneously enrich semantic embeddings through reconstruction tasks from LLMs. During inference, we retain only the item feature encoder and lightweight semantic adapter, thereby eliminating the computa...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3680048","openalex_id":"https://openalex.org/W4403582433","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.847842812538147},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6450921297073364},{"id":"https://openalex.org/C2781122975","display_name":"Semantic feature","score":0.5762469172477722},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5727073550224304},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5411635041236877},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.338986873626709},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.07233962416648865},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403577831","title":"UniEmbedding: Learning Universal Multi-Modal Multi-Domain Item Embeddings via User-View Contrastive Learning","url":"https://doi.org/10.1145/3627673.3680098","published":"2024-10-20","authors":["Boqi Dai","Zhaocheng Du","Jieming Zhu","Jintao Xu","D. Zou","Quanyu Dai","Zhenhua Dong","Rui Zhang","Hai-Tao Zheng"],"abstract":"Learning high-quality item embeddings is crucial for recommendation tasks such as matching and ranking. However, existing methods often rely on ID-based item embeddings learned end-to-end with downstream recommendation models, which may suffer from overfitting and limited generalizability. In this paper, we aim to learn universal item embeddings (dubbed UniEmbedding) that capture multi-modal semantics, generalize across multiple domains, and serve different downstream tasks. To achieve this goal, we introduce the UniEmbedding pretraining framework, which includes three modules: a domain-aware multi-modal adapter, a user-view projection module, and contrastive learning objectives across domains. Compared to naive ID embeddings, UniEmbedding provides rich semantic information that generalizes more effectively across domains. 
Unlike multi-modal embeddings directly extracted from off-the-she...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3680098","openalex_id":"https://openalex.org/W4403577831","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Huazhong University of Science and Technology","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7528262138366699},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6704192757606506},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5468345880508423},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5246853232383728},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38329678773880005},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3294399678707123},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.14550325274467468},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4403577842","title":"Workshop on Generative AI for E-commerce","url":"https://doi.org/10.1145/3627673.3679087","published":"2024-10-20","authors":["Mansi Ranjit Mane","Djordje Gligorijevic","Dingxian Wang","Behzed Shahrasbi","Topojoy Biswas","Evren Körpeoğlu","Marios Savvides"],"abstract":"The \"Gen AI for E-commerce\" workshop explores the role of Generative Artificial Intelligence in transforming e-commerce 
through enhanced user experience and operational efficiency. E-commerce companies grapple with multiple challenges such as lack of quality content for products, subpar user experience, sparse datasets etc. Gen AI offers significant potential to address these complexities. Yet, deploying these technologies at scale presents challenges such as hallucination in data, excessive costs, increased latency response, and limited generalization in sparse data environments. This workshop will bring together experts from academia and industry to discuss these challenges and opportunities, aiming to showcase case studies, breakthroughs, and insights into practical implementations of Gen AI in e-commerce.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679087","openalex_id":"https://openalex.org/W4403577842","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","ResearchWorks (United States)","Walmart (United States)","eBay (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7086315751075745},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7019545435905457},{"id":"https://openalex.org/C78597825","display_name":"E-commerce","score":0.6333493590354919},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3678036630153656},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.28576499223709106}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403582757","title":"Language Models-enhanced Semantic Topology Representation Learning For Temporal Knowledge Graph 
Extrapolation","url":"https://doi.org/10.1145/3627673.3679602","published":"2024-10-20","authors":["T. J. Zhang","Tongya Zheng","Zhenbang Xiao","Zulong Chen","Liangyue Li","Zunlei Feng","Dongxiang Zhang","Mingli Song"],"abstract":"Temporal Knowledge Graph (TKG) extrapolation aims to predict future missing facts based on historical information, which has exhibited both semantics and topology of events. The mainstream methods have advanced the prediction performance by exploring the potential of topology representations of TKGs based on dedicated temporal Graph Neural Networks (GNNs). Until recently, few Language Models (LM) based methods have attempted to model the semantic representations of TKGs, however, lacking specific designs for the topology information. Therefore, we propose a Semantic TOpology REpresentation learning (STORE) framework enhanced by LMs to bridge the gap between the semantics and topology of TKGs. Firstly, we tackle the challenge of long historical facts modeling by a time-aware sampling based on semantic priors to extract concise yet precise facts. 
Secondly, we handle the challenge of the in...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627673.3679602","openalex_id":"https://openalex.org/W4403582757","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","City University of Macau","Data Assurance and Communication Security","Hangzhou City University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C132459708","display_name":"Extrapolation","score":0.7821546196937561},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.751203715801239},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5258344411849976},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5253925919532776},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.44479626417160034},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4342382550239563},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42987504601478577},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.4273624122142792}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403576627","title":"Incremental Image Generation with Diffusion Models by Label Embedding Initialization and Fusion","url":"https://doi.org/10.1145/3688859.3690084","published":"2024-10-20","authors":["Bing Li","Dongdong Ren","Hao Liu","TingHao Yu","Chenglei Peng","Yang Gao"],"abstract":"Recent diffusion models have excelled in image generation, but adapting them incrementally to new, unseen classes 
remains difficult. This paper presents LeifDM, an incremental diffusion model that efficiently adapts pre-trained models to new classes in data-scarce environments, minimizing catastrophic forgetting. LeifDM employs a two-stage, classifier-free framework: first, it learns new weights for unseen classes through a fusion strategy, then applies these as initial label embeddings for conditional image generation. This enables LeifDM to produce diverse and realistic images of new classes, even with limited data.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3688859.3690084","openalex_id":"https://openalex.org/W4403576627","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nanjing University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C114466953","display_name":"Initialization","score":0.7867566347122192},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.6885066628456116},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6538727879524231},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5590201616287231},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.549772322177887},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5428900122642517},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5231862664222717},{"id":"https://openalex.org/C69744172","display_name":"Image fusion","score":0.5175279974937439}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/spaceblender-creating-context-rich-collaborative-spaces-through-generative-3d-scene-blending","title":"SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending","url":"https://www.microsoft.com/en-us/research/publication/spaceblender-creating-context-rich-collaborative-spaces-through-generative-3d-scene-blending/","published":"2024-10-19","authors":["Nels Numan","Shwetha Rajaram","Balasaravanan Thoravi Kumaravel","Nicolai Marquardt","Andrew D. Wilson"],"abstract":"There is increased interest in using generative AI to create 3D spaces for Virtual Reality (VR) applications. However, today's models produce artificial environments, falling short of supporting collaborative tasks that benefit from incorporating the user's physical context. To generate environments that support VR telepresence, we introduce SpaceBlender, a novel pipeline that utilizes generative AI techniques to blend users' physical surroundings into unified virtual spaces. This pipeline transforms user-provided 2D images into context-rich 3D environments through an iterative process consisting of depth estimation, mesh alignment, and diffusion-based space completion guided by geometric priors and adaptive text prompts. 
In a preliminary within-subjects study, where 20 participants performed a collaborative VR affinity diagramming task in pairs, we compared SpaceBlender with a generic v...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3654777.3676361","openalex_id":"https://openalex.org/W4403334489","cited_by_count":11,"quality_score":83,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Graphics and multimedia","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft","Michigan United","Microsoft (United States)","Microsoft Research (United Kingdom)","University College London","University of Michigan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403564788","title":"Mitigating Hallucination in Visual-Language Models via Re-balancing Contrastive Decoding","url":"https://doi.org/10.1007/978-981-97-8620-6_33","published":"2024-10-19","authors":["Xiaoyu Liang","Jiayuan Yu","Lianrui Mu","Jiedong Zhuang","Jiaqi Hu","Yuchen Yang","Jiangnan Ye","Lu Lu","Jian Chen","Haoji Hu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-8620-6_33","openalex_id":"https://openalex.org/W4403564788","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.8496865034103394},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.7464226484298706},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47625041007995605},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42287930846214294},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32950887084007263},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.1552901566028595},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403562107","title":"$$A^{3}R$$: Vision Language Pre-training by Attentive Alignment and Attentive Reconstruction","url":"https://doi.org/10.1007/978-981-97-8620-6_9","published":"2024-10-19","authors":["Yusong Hu","Yuting Gao","Zihan Xu","Ke Li","Xialei Liu"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-981-97-8620-6_9","openalex_id":"https://openalex.org/W4403562107","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Nankai University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7976508736610413},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5750513672828674},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5028047561645508},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.45272544026374817},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3843250572681427},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.15466144680976868},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.039295494556427},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4403536522","title":"What Makes a High-Quality Training Dataset for Large Language Models: A Practitioners' Perspective","url":"https://doi.org/10.1145/3691620.3695061","published":"2024-10-18","authors":["Xiao Yu","Zexian Zhang","Feifei Niu","Xing Hu","Xin Xia","John Grundy"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable performance in various application domains, largely due to their self-supervised pre-training on extensive high-quality text datasets. However, despite the importance of constructing such datasets, many leading LLMs lack documentation of their dataset construction and training procedures, leaving LLM practitioners with a limited understanding of what makes a high-quality training dataset for LLMs. To fill this gap, we initially identified 18 characteristics of high-quality LLM training datasets, as well as 10 potential data pre-processing methods and 6 data quality assessment methods, through detailed interviews with 13 experienced LLM professionals. We then surveyed 219 LLM practitioners from 23 countries across 5 continents. 
We asked our survey respondents to rate the importance of these characteristics, provide a rationale for....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3691620.3695061","openalex_id":"https://openalex.org/W4403536522","cited_by_count":14,"quality_score":55,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Monash University","University of Ottawa","Wuhan University of Technology","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.8010203838348389},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7542845010757446},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6293609142303467},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.6262152194976807},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.44480592012405396},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4185892343521118},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38670986890792847},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.37598901987075806}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4403536437","title":"GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph","url":"https://doi.org/10.1145/3691620.3695054","published":"2024-10-18","authors":["Wei Liu","Ailun Yu","Daoguang Zan","Bo Shen","Wei Zhang","Hantao Zhao","Zhi Jin","Qianxiang Wang"],"abstract":"The performance 
of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages LLMs' general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of completion target more accurately through code context graph (CCG) that consists of control-flow, data- and control-dependence between code statements, a more structured way to capture the completion target context than the sequence-based context used in exist...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3691620.3695054","openalex_id":"https://openalex.org/W4403536437","cited_by_count":11,"quality_score":52,"matched_keywords":["retrieval"],"author_affiliations":["Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Software","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.785294771194458},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6092088222503662},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5048900246620178},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4877341687679291},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3847748935222626},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.3420073091983795},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.28373605012893677},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4403527710","title":"Creating a biomedical knowledge base by addressing GPT inaccurate responses and benchmarking context","url":"https://doi.org/10.1101/2024.10.16.618663","published":"2024-10-18","authors":["Shelby S. Darnell","Rupert W. Overall","Andrea Guarracino","Vincenza Colonna","Flavia Villani","Erik Garrison","Arun Isaac","Priscilla Muli","Frederick Muriuki Muriithi","Alexander Kabui","Munyoki Kilyungi","Felix Lisso"],"abstract":"We created GNQA, a generative pre-trained transformer (GPT) knowledge base driven by a performant retrieval augmented generation (RAG) with a focus on aging, dementia, Alzheimer's and diabetes. We uploaded a corpus of three thousand peer reviewed publications on these topics into the RAG. To address concerns about inaccurate responses and GPT 'hallucinations', we implemented a context provenance tracking mechanism that enables researchers to validate responses against the original material and to get references to the original papers. To assess the effectiveness of contextual information we collected evaluations and feedback from both domain expert users and 'citizen scientists' on the relevance of GPT responses. A key innovation of our study is automated evaluation by way of a RAG assessment system (RAGAS). 
RAGAS combines human expert assessment with AI-driven evaluation to measure the....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2024.10.16.618663","openalex_id":"https://openalex.org/W4403527710","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["African Institute for Development Policy","Cornell University","Humboldt-Universität zu Berlin","Nairobi Hospital","Nvidia (United Kingdom)","Nvidia (United States)","Pwani University","Strathmore University","University College London","University of Nairobi","University of Tennessee Health Science Center","Wageningen University & Research"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.9311842322349548},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6725990772247314},{"id":"https://openalex.org/C4554734","display_name":"Knowledge base","score":0.5066400170326233},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.46427246928215027},{"id":"https://openalex.org/C42058472","display_name":"Base (topology)","score":0.4360124468803406},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43006694316864014},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.40355241298675537},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.2907847762107849}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403536732","title":"Giving Every Modality a Voice in Microservice Failure Diagnosis via Multimodal Adaptive 
Optimization","url":"https://doi.org/10.1145/3691620.3695489","published":"2024-10-18","authors":["Tao Lei","Shenglin Zhang","Zedong Jia","Jinrui Sun","Minghua Ma","Zhengdan Li","Yongqian Sun","Canqun Yang","Yuzhi Zhang","Dan Pei"],"abstract":"Microservice systems are inherently complex and prone to failures, which can significantly impact user experience. Existing diagnostic approaches based on single-modal data such as logs, metrics, or traces cannot comprehensively capture failure patterns. For those multimodal data-based failure diagnosis methods, the dominant modality can overshadow others, hindering low-yield modalities from fully leveraging their characteristics. This paper proposes Medicine, a modal-independent microservice failure diagnosis framework based on multimodal adaptive optimization. It encodes different modalities separately to retain their unique features and employs adaptive optimization to adjust the learning pace between modalities, thereby enhancing overall diagnostic performance. 
Experimental results demonstrate that Medicine outperforms existing single-modal and multimodal diagnostic approaches on thr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3691620.3695489","openalex_id":"https://openalex.org/W4403536732","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Nankai University","National Engineering Research Center for Information Technology in Agriculture","National Supercomputing Center of Tianjin","National University of Defense Technology","Tianjin University","Tianjin haihe hospital","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7088093757629395},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.6865127086639404},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.46803879737854004},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.29392433166503906}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/fine-grained-verifiers-preference-modeling-as-next-token-prediction-in-vision-language-alignment","title":"Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment","url":"https://www.microsoft.com/en-us/research/publication/fine-grained-verifiers-preference-modeling-as-next-token-prediction-in-vision-language-alignment/","published":"2024-10-17","authors":["Chenhang Cui","An Zhang","Yiyang Zhou","Zhaorun Chen","Gelei Deng","Huaxiu Yao","Tat-Seng Chua"],"abstract":"The recent advancements in 
large language models (LLMs) and pre-trained vision models have accelerated the development of vision-language large models (VLLMs), enhancing the interaction between visual and linguistic modalities. Despite their notable success across various domains, VLLMs face challenges in modality alignment, which can lead to issues like hallucinations and unsafe content generation. Current alignment techniques often rely on coarse feedback and external datasets, limiting scalability and performance. In this paper, we propose FiSAO (Fine-Grained Self-Alignment Optimization), a novel self-alignment method that utilizes the model's own visual encoder as a fine-grained verifier to improve vision-language alignment without the need for additional data. By leveraging token-level feedback from the vision encoder, FiSAO significantly improves vision-language alignment, even sur...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computation and Language","Computer science","Computer Vision and Pattern Recognition","Vision-language models","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2410.13848","title":"Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and 
Generation","url":"https://huggingface.co/papers/2410.13848","published":"2024-10-17","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W4403510251","title":"iBTC: An Image-Assisting Binary and Triangle Combined Descriptor for Place Recognition by Fusing LiDAR and Camera Measurements","url":"https://doi.org/10.1109/lra.2024.3483032","published":"2024-10-17","authors":["Zuhao Zou","Chunran Zheng","Chongjian Yuan","Shunbo Zhou","Kaiwen Xue","Fu Zhang"],"abstract":"In this work, we introduce a novel multimodal descriptor, the image-assisting binary and triangle combined (iBTC) descriptor, which fuses LiDAR (Light Detection and Ranging) and camera measurements for 3D place recognition. The inherent invariance of a triangle to rigid transformations inspires us to design triangle-based descriptors. We first extract distinct 3D key points from both LiDAR and camera measurements and organize them into triplets to form triangles. By utilizing the lengths of the sides of these triangles, we can create triangle descriptors, enabling the rapid retrieval of similar triangles from a database. By encoding the geometric and visual details at the triangle vertices into binary descriptors, we augment the triangle descriptors with richer local information. This enrichment process empowers our descriptors to reject mis-matched triangle pairs. 
Consequently, the rema...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2024.3483032","openalex_id":"https://openalex.org/W4403510251","cited_by_count":7,"quality_score":48,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7647197246551514},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.7509549856185913},{"id":"https://openalex.org/C51399673","display_name":"Lidar","score":0.6479222178459167},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4977739155292511},{"id":"https://openalex.org/C48372109","display_name":"Binary number","score":0.4864809215068817},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4658867418766022},{"id":"https://openalex.org/C87335442","display_name":"Local binary patterns","score":0.4450916051864624},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3754696249961853}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/unearthing-skill-level-insights-for-understanding-trade-offs-of-foundation-models","title":"Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/unearthing-skill-level-insights-for-understanding-trade-offs-of-foundation-models/","published":"2024-10-16","authors":["Mazda Moayeri","Vidhisha Balachandran","Varun Chandrasekaran","Safoora Yousefi","Thomas Fel","S. 
Feizi","Besmira Nushi","Neel Joshi","Vibhav Vineet"],"abstract":"With models getting stronger, evaluations have grown more complex, testing multiple skills in one benchmark and even in the same instance at once. However, skill-wise performance is obscured when inspecting aggregate accuracy, under-utilizing the rich signal modern benchmarks contain. We propose an automatic approach to recover the underlying skills relevant for any evaluation instance, by way of inspecting model-generated rationales. After validating the relevance of rationale-parsed skills and inferring skills for $46$k instances over $12$ benchmarks, we observe many skills to be common across benchmarks, resulting in the curation of hundreds of skill-slices (i.e. sets of instances testing a common skill). Inspecting accuracy over these slices yields novel insights on model trade-offs: e.g., compared to GPT-4o and Claude 3.5 Sonnet, on average, Gemini 1.5 Pro is $18\\%$ more accurate in...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","foundation models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/seerattention-learning-intrinsic-sparse-attention-in-your-llms","title":"SeerAttention: Learning Intrinsic Sparse Attention in Your 
LLMs","url":"https://www.microsoft.com/en-us/research/publication/seerattention-learning-intrinsic-sparse-attention-in-your-llms/","published":"2024-10-16","authors":["Yizhao Gao","Zhichen Zeng","Dayou Du","Shijie Cao","Hayden Kwok-Hay So","Ting Cao","Fan Yang","Mao Yang"],"abstract":"Attention is the cornerstone of modern Large Language Models (LLMs). Yet its quadratic complexity limits the efficiency and scalability of LLMs, especially for those with a long-context window. A promising approach addressing this limitation is to leverage the sparsity in attention. However, existing sparsity-based solutions predominantly rely on predefined patterns or heuristics to approximate sparsity. This practice falls short to fully capture the dynamic nature of attention sparsity in language-based tasks. This paper argues that attention sparsity should be learned rather than predefined. To this end, we design SeerAttention, a new Attention mechanism that augments the conventional attention with a learnable gate that adaptively selects significant blocks in an attention map and deems the rest blocks sparse. Such block-level sparsity effectively balances accuracy and speedup. 
To ena...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Systems and networking","Computer science","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:00daad6e1d861990","title":"Movie Gen: A Cast of Media Foundation Models","url":"https://ai.meta.com/research/publications/movie-gen-a-cast-of-media-foundation-models/","published":"2024-10-16","authors":["Movie Gen Team"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Speech & Audio","Computer Vision","media"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=9"}},{"id":"openalex:W4403488180","title":"On the Cultural Gap in Text-to-Image Generation","url":"https://doi.org/10.3233/faia240581","published":"2024-10-16","authors":["Bingshuai Liu","Longyue Wang","Chenyang Lyu","Yong Zhang","Jinsong Su","Shuming Shi","Zhaopeng Tu"],"abstract":"One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are 
rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model’s ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-t...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.3233/faia240581","openalex_id":"https://openalex.org/W4403488180","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Dublin City University","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5600407123565674},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3130851686000824},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.24816229939460754}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmed-rag-versatile-multimodal-rag-system-for-medical-vision-language-models","title":"MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/mmed-rag-versatile-multimodal-rag-system-for-medical-vision-language-models/","published":"2024-10-15","authors":["Peng Xia","Kangyu Zhu","Haoran Li","Weijia Shi","Sheng Wang","Linjun Zhang","James Zou","Huaxiu Yao"],"abstract":"Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. However, these models often suffer from factual hallucination, which can lead to incorrect diagnoses. Fine-tuning and retrieval-augmented generation (RAG) have emerged as methods to address these issues. However, the amount of high-quality data and distribution shifts between training data and deployment data limit the application of fine-tuning methods. Although RAG is lightweight and effective, existing RAG-based approaches are not sufficiently general to different medical domains and can potentially cause misalignment issues, both between modalities and between the model and the ground truth. 
In this paper, we prop...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computation and Language","Computer science","Computer Vision and Pattern Recognition","Vision-language models","1970-01-01","preference","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cream-consistency-regularized-self-rewarding-language-models","title":"CREAM: Consistency Regularized Self-Rewarding Language Models","url":"https://www.microsoft.com/en-us/research/publication/cream-consistency-regularized-self-rewarding-language-models/","published":"2024-10-15","authors":["Zhaoyang Wang","Weilei He","Zhiyuan Liang","Xuchao Zhang","Chetan Bansal","Ying Wei","Weitong Zhang","Huaxiu Yao"],"abstract":"Recent self-rewarding large language models (LLM) have successfully applied LLM-as-a-Judge to iteratively improve the alignment performance without the need of human annotations for preference data. These methods commonly utilize the same LLM to act as both the policy model (which generates responses) and the reward model (which scores and ranks those responses). The ranked responses are then used as preference pairs to train the LLM via direct alignment technologies (e.g. DPO). However, it is noteworthy that throughout this process, there is no guarantee of accuracy in the rewarding and ranking, which is critical for ensuring accurate rewards and high-quality preference data. 
Empirical results from relatively small LLMs (e.g., 7B parameters) also indicate that improvements from self-rewarding may diminish after several iterations in certain situations, which we hypothesize is due to acc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/safree-training-free-and-adaptive-guard-for-safe-text-to-image-and-video-generation","title":"SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation","url":"https://www.microsoft.com/en-us/research/publication/safree-training-free-and-adaptive-guard-for-safe-text-to-image-and-video-generation/","published":"2024-10-15","authors":["Jaehong Yoon","Shoubin Yu","Vaidehi Patil","Huaxiu Yao","Mohit Bansal"],"abstract":"Recent advances in diffusion models have significantly enhanced their ability to generate high-quality images and videos, but they have also increased the risk of producing unsafe content. Existing unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges: (1) They cannot instantly remove harmful concepts without training. (2) Their safe generation capabilities depend on collected training data. (3) They alter model weights, risking degradation in quality for content unrelated to toxic concepts. 
To address these, we propose SAFREE, a novel, training-free approach for safe T2I and T2V, that does not alter the model's weights. Specifically, we detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace, thereby filtering out harmful content while preserv...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Computer Vision and Pattern Recognition","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/porover-improving-safety-and-reducing-overrefusal-in-large-language-models-with-overgeneration-and-preference-optimization","title":"POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization","url":"https://www.microsoft.com/en-us/research/publication/porover-improving-safety-and-reducing-overrefusal-in-large-language-models-with-overgeneration-and-preference-optimization/","published":"2024-10-15","authors":["Batuhan K. Karaman","I. Zabir","Alon Benhaim","Vishrav Chaudhary","M. Sabuncu","Xia Song"],"abstract":"Balancing safety and usefulness in large language models has become a critical challenge in recent years. Models often exhibit unsafe behavior or adopt an overly cautious approach, leading to frequent overrefusal of benign prompts, which reduces their usefulness. 
Addressing these issues requires methods that maintain safety while avoiding overrefusal. In this work, we examine how the overgeneration of training data using advanced teacher models (e.g., GPT-4o), including responses to both general-purpose and toxic prompts, influences the safety and overrefusal balance of instruction-following language models. Additionally, we present POROver, a strategy to use preference optimization methods in order to reduce overrefusal, via employing a superior teacher model's completions. Our results show that overgenerating completions for general-purpose prompts significantly improves the balance be...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-graph-foundation-models-training-on-knowledge-graphs-enables-transferability-to-general-graphs","title":"Towards Graph Foundation Models: Training on Knowledge Graphs Enables Transferability to General Graphs","url":"https://www.microsoft.com/en-us/research/publication/towards-graph-foundation-models-training-on-knowledge-graphs-enables-transferability-to-general-graphs/","published":"2024-10-15","authors":["Kai Wang","Siqiang Luo","Caihua Shan","Yifei Shen"],"abstract":"Inspired by the success of large language models, there is a trend toward developing graph foundation models to conduct 
diverse downstream tasks in various domains. However, current models often require extra fine-tuning to apply their learned structural and semantic representations to new graphs, which limits their versatility. Recent breakthroughs in zero-shot inductive reasoning on knowledge graphs (KGs), offer us a new perspective on extending KG reasoning to general graph applications. In this paper, we introduce SCR, a unified graph reasoning framework designed to train on knowledge graphs and effectively generalize across a wide range of graph tasks and domains. We begin by designing the task-specific KG structures to establish a unified topology for different task formats. Then we propose semantic-conditioned message passing, a novel mechanism addressing the inherent semantic iso...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:ut42tzx6q2ogfl2wp07n6485","title":"CAMPHOR: Collaborative Agents for Multi-Input Planning and High-Order Reasoning On Device","url":"https://machinelearning.apple.com/research/collaborative-agents","published":"2024-10-15","authors":["Yicheng Fu","Raviteja Anantha","Jianpeng Cheng"],"abstract":"While server-side Large Language Models (LLMs) demonstrate proficiency in tool integration and complex reasoning, deploying Small Language Models (SLMs) directly on devices brings opportunities to improve latency and privacy but also introduces 
unique challenges for accuracy and memory. We introduce CAMPHOR, an innovative on-device SLM multi-agent framework designed to handle multiple user inputs and reason over personal context locally, ensuring...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["memory","agent","multi-agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-instruction-following-in-language-models-through-activation-steering","title":"Improving Instruction-Following in Language Models through Activation Steering","url":"https://www.microsoft.com/en-us/research/publication/improving-instruction-following-in-language-models-through-activation-steering/","published":"2024-10-14","authors":["Alessandro Stolfo","Vidhisha Balachandran","Safoora Yousefi","Eric Horvitz","Besmira Nushi"],"abstract":"The ability to follow instructions is crucial for numerous real-world applications of language models. In pursuit of deeper insights and more powerful capabilities, we derive instruction-specific vector representations from language models and use them to steer models accordingly. These vectors are computed as the difference in activations between inputs with and without instructions, enabling a modular approach to activation steering. We demonstrate how this method can enhance model adherence to constraints such as output format, length, and word inclusion, providing inference-time control over instruction following. 
Our experiments across four models demonstrate how we can use the activation vectors to guide models to follow constraints even without explicit instructions and to enhance performance when instructions are present. Additionally, we explore the compositionality of activatio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403395420","title":"A Unified Framework for Multi-Domain CTR Prediction via Large Language Models","url":"https://doi.org/10.1145/3698878","published":"2024-10-14","authors":["Zichuan Fu","Xiangyang Li","Chuhan Wu","Yichao Wang","Kuicai Dong","Xiangyu Zhao","Mengchen Zhao","Huifeng Guo","Ruiming Tang"],"abstract":"Multi-Domain Click-Through Rate (MDCTR) prediction is crucial for online recommendation platforms, which involves providing personalized recommendation services to users in different domains. However, current MDCTR models are confronted with the following limitations. Firstly, due to varying data sparsity in different domains, models can easily be dominated by some specific domains, which leads to significant performance degradation in other domains (i.e., the “seesaw phenomenon”). Secondly, when new domain emerges, the scalability of existing methods is limited, making it difficult to adapt to the dynamic growth of the domain. 
Traditional MDCTR models usually use one-hot encoding for semantic information such as product titles, thus losing rich semantic information and leading to insufficient generalization of the model. In this article, we propose a novel solution Uni-CTR to address th...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3698878","openalex_id":"https://openalex.org/W4403395420","cited_by_count":20,"quality_score":69,"matched_keywords":["LLM","language model","personalized"],"author_affiliations":["City University of Hong Kong","Huawei Technologies (China)","Huawei Technologies (Sweden)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6395179629325867},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5322938561439514},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.41753703355789185},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3509979546070099},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.12019041180610657},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":20}},{"id":"openalex:W4405778626","title":"Language-Guided Pattern Formation for Swarm Robotics with Multi-Agent Reinforcement Learning","url":"https://doi.org/10.1109/iros58592.2024.10801665","published":"2024-10-14","authors":["Hsu-Shen Liu","So Kuroki","Tadashi Kozuno","Wei-Fang Sun","Chun‐Yi Lee"],"abstract":"This paper explores leveraging the vast knowledge encoded in Large Language Models (LLMs) to tackle pattern formation 
challenges for swarm robotics systems. A new framework, named LGPF (Language-Guided Pattern Formation), is proposed to address these challenges. The framework breaks down the pattern formation into two key components: pattern synthesis and swarm robotics control. For the former, this study utilizes the exceptional few-shot generalizability of LLMs to translate high-level natural language descriptions into the desired spatial pattern coordinates. This approach allows for overcoming previous limitations in representing and designing complex patterns. The framework further employs a centralized training with decentralized execution (CTDE) based multiagent reinforcement learning (MARL) approach to control the swarm robots in forming the specified pattern while avoiding collis...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iros58592.2024.10801665","openalex_id":"https://openalex.org/W4405778626","cited_by_count":3,"quality_score":48,"matched_keywords":["agent","multi-agent"],"author_affiliations":["National Tsing Hua University","Nvidia (United States)","Omron (Japan)"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7775523662567139},{"id":"https://openalex.org/C169337768","display_name":"Swarm robotics","score":0.7537664175033569},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6522926688194275},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6413041353225708},{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.5908257365226746},{"id":"https://openalex.org/C181335050","display_name":"Swarm 
behaviour","score":0.5461822152137756},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.36084023118019104},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3302106261253357}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4403434444","title":"Utility-Oriented Knowledge Graph Accuracy Estimation with Limited Annotations: A Case Study on DBpedia","url":"https://doi.org/10.1609/hcomp.v12i1.31605","published":"2024-10-14","authors":["Stefano Marchesin","Gianmaria Silvello","Ómar Alonso"],"abstract":"Knowledge Graphs (KGs) are essential for applications like search, recommendation, and virtual assistants, where their accuracy directly impacts effectiveness. However, due to their large-scale and ever-evolving nature, it is impractical to manually evaluate all KG contents. We propose a framework that employs sampling, estimation, and active learning to audit KG accuracy in a cost-effective manner. The framework prioritizes KG facts based on their utility to downstream tasks. We applied the framework to DBpedia and gathered annotations from both expert and layman annotators. We also explored the potential of Large Language Models (LLMs) as KG evaluators, showing that while they can perform comparably to low-quality human annotators, they tend to overestimate KG accuracy. As such, LLMs are currently insufficient to replace human crowdworkers in the evaluation process. 
The results also pr...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/hcomp.v12i1.31605","openalex_id":"https://openalex.org/W4403434444","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Padua"],"concepts":[{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.6922296285629272},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.657012939453125},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5781174302101135},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.48554012179374695},{"id":"https://openalex.org/C69075417","display_name":"Linked data","score":0.4735024571418762},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.19468623399734497},{"id":"https://openalex.org/C2129575","display_name":"Semantic Web","score":0.17557558417320251}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4403392337","title":"Response-Aided Score-Matching Representative Approaches for Big Data Analysis and Model Selection under Generalized Linear Models","url":"https://doi.org/10.3390/a17100456","published":"2024-10-14","authors":["Duo Zheng","Keren Li","Jie Yang"],"abstract":"In this paper, we propose an efficient method called the response-aided score-matching representative (RASMR) approach to facilitate massive data model selection and data analysis with generalized linear models (GLMs) and a predetermined data partition due to data localization. 
Similar to the original score-matching representative (SMR) approach, RASMR constructs an artificial data point, called the representative, for each data block. It then fits a GLM on the representative dataset, which provides not only an efficient approach for massive data analysis but also an ideal solution in response to privacy concerns by avoiding the transfer of sensitive data. By further splitting the data blocks according to the values of the response variables, RASMR can obtain more accurate parameter estimates than SMR. Furthermore, by theoretical justifications and simulation studies, we show that RASMR....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/a17100456","openalex_id":"https://openalex.org/W4403392337","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","Bellevue Hospital Center","University of Alabama at Birmingham","University of Illinois Chicago"],"concepts":[{"id":"https://openalex.org/C41587187","display_name":"Generalized linear model","score":0.6081534624099731},{"id":"https://openalex.org/C93959086","display_name":"Model selection","score":0.5910696387290955},{"id":"https://openalex.org/C163175372","display_name":"Linear model","score":0.5689339637756348},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.5613092184066772},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5470269322395325},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5261679887771606},{"id":"https://openalex.org/C75684735","display_name":"Big 
data","score":0.4643099308013916},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.37816551327705383}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4405787568","title":"Geolocation on Cartographic Maps with Multi-Modal Fusion","url":"https://doi.org/10.1109/iros58592.2024.10801404","published":"2024-10-14","authors":["Mengjie Zhou","Liu Liu","Yiran Zhong","Andrew Calway"],"abstract":"We explore the geolocation problem, aiming to localize ground-view images on cartographic maps, without the need of any GPS priors. This task mimics the human wayfinding ability and offers high scalability and robustness by using the compact and semantic representations of maps. Current methods often rely on 2D maps to encode dense contextual information for ground-to-map matching. In this paper, we lift ground-to-map matching to a 2.5D space, where heights of structures (e.g. buildings) provide richer geometric information to guide the matching process. We propose a new approach to learning representative embeddings from multi-modal data. Specifically, we establish a projection relationship between 2D and 2.5D space. The projection is further used to combine multi-modal features from the 2D and 2.5D maps using an effective pixel-to-point fusion method. 
By encoding crucial geometric cues...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iros58592.2024.10801404","openalex_id":"https://openalex.org/W4405787568","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Shanghai Artificial Intelligence Laboratory","University of Bristol"],"concepts":[{"id":"https://openalex.org/C22041718","display_name":"Geolocation","score":0.9096565246582031},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6341920495033264},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.6080363988876343},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5758318901062012},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.4730740487575531},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.45846614241600037},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.42951512336730957},{"id":"https://openalex.org/C13280743","display_name":"Geodesy","score":0.38101017475128174}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2410.10934","title":"Agent-as-a-Judge: Evaluate Agents with Agents","url":"https://huggingface.co/papers/2410.10934","published":"2024-10-14","authors":["Mingchen Zhuge","Changsheng Zhao","Dylan Ashley","Wenyi Wang","Dmitrii Khizbullin","Yunyang Xiong","Zechun Liu","Ernie Chang","Raghuraman Krishnamoorthi","Yuandong Tian","Yangyang Shi","Vikas Chandra"],"abstract":"Contemporary evaluation techniques are inadequate for agentic systems. 
These approaches either focus exclusively on final outcomes -- ignoring the step-by-step nature of agentic systems, or require excessive manual labour. To address this, we introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems. This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process. We apply the Agent-as-a-Judge to the task of code generation. To overcome issues with existing benchmarks and provide a proof-of-concept testbed for Agent-as-a-Judge, we present DevAI, a new benchmark of 55 realistic automated AI development tasks. It includes rich manual annotations, like a total of 365 hierarchical user requirements. We benchmark three of the popular agentic systems u...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","agent"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmie-massive-multimodal-interleaved-comprehension-benchmark-for-large-vision-language-models","title":"MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models","url":"https://www.microsoft.com/en-us/research/publication/mmie-massive-multimodal-interleaved-comprehension-benchmark-for-large-vision-language-models/","published":"2024-10-13","authors":["Peng Xia","Siwei Han","Shi Qiu","Yiyang Zhou","Zhaoyang Wang","Wenhao Zheng","Zhaorun Chen","Chenhang Cui","Mingyu Ding","Linjie Li","Lijuan Wang","Huaxiu Yao"],"abstract":"Interleaved multimodal comprehension and generation, enabling models to produce and 
interpret both images and text in arbitrary sequences, have become a pivotal area in multimodal learning. Despite significant advancements, the evaluation of this capability remains insufficient. Existing benchmarks suffer from limitations in data scale, scope, and evaluation depth, while current evaluation metrics are often costly or biased, lacking in reliability for practical applications. To address these challenges, we introduce MMIE, a large-scale knowledge-intensive benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs). MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts. It supports both interleaved inputs a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Computer Vision and Pattern Recognition","Vision-language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/kblam-knowledge-base-augmented-language-model-2","title":"KBLaM: Knowledge Base augmented Language Model","url":"https://www.microsoft.com/en-us/research/publication/kblam-knowledge-base-augmented-language-model-2/","published":"2024-10-13","authors":["Xi Wang","Liana Mikaelyan","Taketomo Isazawa","James Hensman"],"abstract":"In this paper, we propose Knowledge Base augmented 
Language Model (KBLaM), a new method for augmenting Large Language Models (LLMs) with external knowledge. KBLaM works with a knowledge base (KB) constructed from a corpus of documents, transforming each piece of knowledge in the KB into continuous key-value vector pairs via pre-trained sentence encoders with linear adapters and integrating them into pre-trained LLMs via a specialized rectangular attention mechanism. Unlike Retrieval-Augmented Generation, KBLaM eliminates external retrieval modules, and unlike in-context learning, its computational overhead scales linearly with KB size rather than quadratically. Our approach enables integrating a large KB of more than 10K triples into an 8B pre-trained LLM of only 8K context window on one single A100 80GB GPU and allows for dynamic updates without model fine-tuning or retraining. Experime...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","LLM","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/blendscape-enabling-end-user-customization-of-video-conferencing-environments-through-generative-ai","title":"BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI","url":"https://www.microsoft.com/en-us/research/publication/blendscape-enabling-end-user-customization-of-video-conferencing-environments-through-generative-ai/","published":"2024-10-11","authors":["Shwetha 
Rajaram","Nels Numan","Balasaravanan Thoravi Kumaravel","Nicolai Marquardt","Andrew D. Wilson"],"abstract":"Today's video-conferencing tools support a rich range of professional and social activities, but their generic meeting environments cannot be dynamically adapted to align with distributed collaborators' needs. To enable end-user customization, we developed BlendScape, a rendering and composition system for video-conferencing participants to tailor environments to their meeting context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users' physical or digital backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an exploratory study with 15 end-users, we investigated whether and how they would find value in using generative AI to customize video-conferencing environments. Participants envisioned using a system like BlendScape to facilitate collaborative a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3654777.3676326","openalex_id":"https://openalex.org/W4403333805","cited_by_count":11,"quality_score":83,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Graphics and multimedia","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft","Michigan United","Microsoft (United States)","Microsoft Research (United Kingdom)","University College London","University of Michigan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:axvif5pyh8qsp5s2obtl62th","title":"GSM-Symbolic: Understanding the Limitations of 
Mathematical Reasoning in Large Language Models","url":"https://machinelearning.apple.com/research/gsm-symbolic","published":"2024-10-11","authors":["Iman Mirzadeh","Keivan Alizadeh","Hooman Shahrokhi","Oncel Tuzel","Samy Bengio","Mehrdad Farajtabar"],"abstract":"Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4406356471","title":"Transformer-Based Self-Supervised Learning and Distillation for Medical Image Classification: Improving Colorectal Cancer Detection on NCT-CRC-HE-100K with Swin-T V2","url":"https://doi.org/10.1109/cbase64041.2024.10824558","published":"2024-10-11","authors":["Meng Li"],"abstract":"In this work, we present a novel approach to colorectal cancer tissue classification using Swin-Transformer V2 on the NCT-CRC-HE-100K dataset. This study is the first to apply Swin-Transformer V2 in this domain, leveraging its advanced architecture to achieve state-of-the-art performance. Building upon the success of previous self-supervised learning methods like MoBY, which demonstrated strong results on natural images, we extend these techniques to medical datasets. 
We perform self-supervised pretraining on a wide range of tumor-related datasets, incorporating advanced data augmentation strategies, such as random cropping and 2x magnification, to address the multi-scale nature of histopathological images. After pretraining, we employ a progressive layer-wise distillation technique, transferring knowledge from a large teacher model to a more efficient student model. This method dynamica...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cbase64041.2024.10824558","openalex_id":"https://openalex.org/W4406356471","cited_by_count":3,"quality_score":48,"matched_keywords":["efficient","distillation"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.59776771068573},{"id":"https://openalex.org/C526805850","display_name":"Colorectal cancer","score":0.5764474272727966},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5541924238204956},{"id":"https://openalex.org/C75294576","display_name":"Contextual image classification","score":0.4708525538444519},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4004132151603699},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3893229365348816},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3277119994163513},{"id":"https://openalex.org/C121608353","display_name":"Cancer","score":0.2888525128364563}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/controllable-safety-alignment-inference-time-adaptation-to-diverse-safety-requirements","title":"Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements","url":"https://www.microsoft.com/en-us/research/publication/controllable-safety-alignment-inference-time-adaptation-to-diverse-safety-requirements/","published":"2024-10-10","authors":["Jingyu (Jack) Zhang","Ahmed Elgohary","Ahmed Magooda","Daniel Khashabi","Ben Van Durme"],"abstract":"The current paradigm for safety alignment of large language models (LLMs) follows a one-size-fits-all approach: the model refuses to interact with any content deemed unsafe by the model provider. This approach lacks flexibility in the face of varying social norms across cultures and regions. In addition, users may have diverse safety needs, making a model with static safety standards too restrictive to be useful, as well as too costly to be re-aligned. We propose Controllable Safety Alignment (CoSA), a framework designed to adapt models to diverse safety requirements without re-training. Instead of aligning a fixed model, we align models to follow safety configs -- free-form natural language descriptions of the desired safety behaviors -- that are provided as part of the system prompt. 
To adjust model safety behavior, authorized users only need to modify such safety configs at inference....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:167","title":"Reward-Augmented Data Enhances Direct Preference Alignment of LLMs","url":"https://seed.bytedance.com/en/research/reward-augmented-data-enhances-direct-preference-alignment-of-llms","published":"2024-10-10","authors":["Shenao Zhang","Zhihan Liu","Boyi Liu","Yufeng Zhang","Yingxiang Yang","Yongfei Liu","Liyu Chen","Tao Sun","Zhaoran Wang"],"abstract":"Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often overlook the qualitative aspects of responses. Striving to maximize the implicit reward gap between the chosen and the slightly inferior rejected responses can cause overfitting and unnecessary unlearning of the high-quality rejected responses. The unawareness of the reward scores also drives the LLM to indiscriminately favor the low-quality chosen responses and fail to generalize to responses with the highest rewards, which are sparse in data. 
To overcome these shortcomings, our study introduces reward-conditioned LLM policies that discern and learn from the entire spectrum of response quality within the dataset, helping extrapolate to more optimal regi...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine Learning","LLM","ICML 2025","preference"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4403277141","title":"GPT-4V(ision) for Robotics: Multimodal Task Planning From Human Demonstration","url":"https://doi.org/10.1109/lra.2024.3477090","published":"2024-10-09","authors":["Naoki Wake","Atsushi Kanehira","Kazuhiro Sasabuchi","Jun Takamatsu","Katsushi Ikeuchi"],"abstract":"We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision), to facilitate one-shot visual teaching for robotic manipulation. This system analyzes videos of humans performing tasks and outputs executable robot programs that incorporate insights into affordances. The process begins with GPT-4 V analyzing the videos to obtain textual explanations of environmental and action details. A GPT-4-based task planner then encodes these details into a symbolic task plan. Subsequently, vision systems spatially and temporally ground the task plan in the videos—objects are identified using an open-vocabulary object detector, and hand-object interactions are analyzed to pinpoint moments of grasping and releasing. 
This spatiotemporal grounding allows for the gathering of affordance information (e.g., grasp types, waypoints, and body postures) critical for robot execution...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2024.3477090","openalex_id":"https://openalex.org/W4403277141","cited_by_count":57,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Microsoft (United States)","Robotics Research (United States)"],"concepts":[{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.7312030792236328},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.661120593547821},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6384459137916565},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5236296653747559},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4994330406188965},{"id":"https://openalex.org/C4441509","display_name":"Multimodal therapy","score":0.45457762479782104},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.2644472122192383},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.24995484948158264}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":57}},{"id":"bytedance-seed:222","title":"KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks","url":"https://seed.bytedance.com/en/research/kor-bench-benchmarking-language-models-on-knowledge-orthogonal-reasoning-tasks","published":"2024-10-09","authors":["Kaijing Ma","Xinrun Du","Yunran Wang","Haoran Zhang","Zhoufutu Wen","Xingwei Qu","Jian Yang","Jiaheng 
Liu","Minghao Liu","Xiang Yue","Wenhao Huang","Ge Zhang"],"abstract":"In this paper, we introduce Knowledge-Orthogonal Reasoning (KOR), a concept aimed at minimizing reliance on domain-specific knowledge, enabling more accurate evaluation of models' reasoning abilities in out-of-distribution settings. Based on this concept, we propose the Knowledge-Orthogonal Reasoning Benchmark (KOR-Bench), encompassing five task categories: Operation, Logic, Cipher, Puzzle, and Counterfactual. KOR-Bench emphasizes models' effectiveness in applying new rule descriptions to solve novel rule-driven questions. O1-Preview and O1-Mini achieve accuracies of 72.88% and 70.16%, surpassing Claude-3.5-Sonnet and GPT-4o (58.96% and 58.00%), highlighting the effectiveness of KOR-Bench. We perform detailed analyses, identifying bottlenecks in the Cipher task with Stepwise Prompting, where two rounds of Self-Correction yield optimal results. We evaluate performance across three integra...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Databases","ICLR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:neakiyju70tnnl29p39mtttg","title":"On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization","url":"https://machinelearning.apple.com/research/reward-generalization","published":"2024-10-09","authors":["Yong Lin","Skyler Seto","Maartje ter Hoeve","Katherine Metcalf","Barry-John Theobald","Xuan Wang","Yizhe Zhang","Chen Huang","Tong Zhang"],"abstract":"Reinforcement Learning from Human Feedback 
(RLHF) is an effective approach for aligning language models to human preferences. Central to RLHF is learning a reward function for scoring human preferences. Two main approaches for learning a reward model are 1) training an explicit reward model as in RLHF, and 2) using an implicit reward learned from preference data through methods such as Direct Preference Optimization (DPO). Prior work has shown...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["preference"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:hdw16yix0esi2fqov9vohy65","title":"Depth Pro: Sharp Monocular Metric Depth in Less Than a Second","url":"https://machinelearning.apple.com/research/depth-pro","published":"2024-10-09","authors":["Aleksei Bochkovskii","Amaël Delaunoy","Hugo Germain","Marcel Santos","Yichao Zhou","Stephan R. Richter","Vladlen Koltun"],"abstract":"We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. 
These characteristics are enabled by...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4403220611","title":"Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models","url":"https://doi.org/10.1145/3640457.3688104","published":"2024-10-08","authors":["Yunjia Xi","Weiwen Liu","Jianghao Lin","Xiaoling Cai","Hong Zhu","Jieming Zhu","Bo Chen","Ruiming Tang","Weinan Zhang","Yong Yu"],"abstract":"Recommender system plays a vital role in various online services. However, its insulated nature of training and deploying separately within a specific closed domain limits its access to open-world knowledge. Recently, the emergence of large language models (LLMs) has shown promise in bridging this gap by encoding extensive world knowledge and demonstrating reasoning capabilities. Nevertheless, previous attempts to directly use LLMs as recommenders cannot meet the inference latency demand of industrial recommender systems. In this work, we propose an Open-World Knowledge Augmented Recommendation Framework with Large Language Models, dubbed KAR, to acquire two types of external knowledge from LLMs — the reasoning knowledge on user preferences and the factual knowledge on items. We introduce factorization prompting to elicit accurate reasoning on user preferences. 
The generated reasoning an...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3640457.3688104","openalex_id":"https://openalex.org/W4403220611","cited_by_count":95,"quality_score":79,"matched_keywords":["LLM","news","efficient"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7452124357223511},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.37700921297073364},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3658730089664459},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.333984375}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":95}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/farmer-chat-scaling-ai-powered-agricultural-services-for-smallholder-farmers","title":"Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers","url":"https://www.microsoft.com/en-us/research/publication/farmer-chat-scaling-ai-powered-agricultural-services-for-smallholder-farmers/","published":"2024-10-08","authors":["Namita Singh","Jacqueline Wang'ombe","Nereah Okanga","Tetyana Zelenska","Jona Repishti","Jayasankar G K","Sanjeev Mishra","Rajsekar Manokaran","Vineet Singh","Mohammed Irfan Rafiq","Rikin Gandhi","Akshay Nambi"],"abstract":"Small and medium-sized agricultural holders face challenges like limited access to localized, timely information, impacting productivity and sustainability. 
Traditional extension services, which rely on in-person agents, struggle with scalability and timely delivery, especially in remote areas. We introduce FarmerChat, a generative AI-powered chatbot designed to address these issues. Leveraging Generative AI, FarmerChat offers personalized, reliable, and contextually relevant advice, overcoming limitations of previous chatbots in deterministic dialogue flows, language support, and unstructured data processing. Deployed in four countries, FarmerChat has engaged over 15,000 farmers and answered over 300,000 queries. This paper highlights how FarmerChat's innovative use of GenAI enhances agricultural service scalability and effectiveness. Our evaluation, combining quantitative analysis and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Technology for emerging markets","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403221739","title":"CoST: Contrastive Quantization based Semantic Tokenization for Generative Recommendation","url":"https://doi.org/10.1145/3640457.3688178","published":"2024-10-08","authors":["Jieming Zhu","Mengqun Jin","Qijiong Liu","Zexuan Qiu","Zhenhua Dong","Xiu Li"],"abstract":"Embedding-based retrieval serves as a dominant approach to candidate item matching for industrial recommender systems. 
With the success of generative AI, generative retrieval has recently emerged as a new retrieval paradigm for recommendation, which casts item retrieval as a generation problem. Its model consists of two stages: semantic tokenization and autoregressive generation. The first stage involves item tokenization that constructs discrete semantic tokens to index items, while the second stage autoregressively generates semantic tokens of candidate items. Therefore, semantic tokenization serves as a crucial preliminary step for training generative recommendation models. Existing research usually employs a vector quantizier with reconstruction loss (e.g., RQ-VAE) to obtain semantic tokens of items, but this method fails to capture the essential neighborhood relationships that are v...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3640457.3688178","openalex_id":"https://openalex.org/W4403221739","cited_by_count":10,"quality_score":55,"matched_keywords":["retrieval","quantization"],"author_affiliations":["Chinese University of Hong Kong","Hong Kong Polytechnic University","Huawei Technologies (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8044155240058899},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6211783289909363},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6014668941497803},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5965169072151184},{"id":"https://openalex.org/C176982825","display_name":"Lexical analysis","score":0.48957642912864685},{"id":"https://openalex.org/C28855332","display_name":"Quantization (signal 
processing)","score":0.43393439054489136},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.09630972146987915}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4403221563","title":"FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction","url":"https://doi.org/10.1145/3640457.3688106","published":"2024-10-08","authors":["Hangyu Wang","Jianghao Lin","Xiangyang Li","Bo Chen","Chenxu Zhu","Ruiming Tang","Weinan Zhang","Yong Yu"],"abstract":"Click-through rate (CTR) prediction plays as a core function module in various personalized online services. The traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality, which capture the collaborative signals via feature interaction modeling. But the one-hot encoding discards the semantic information included in the textual features. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality obtained by hard prompt templates and adopts PLMs to extract the semantic knowledge. However, PLMs often face challenges in capturing field-wise collaborative signals and distinguishing features with subtle textual differences. 
In this paper, to leverage the benefits of both paradigms and meanwhile overcome their limitations, we propose to conduct Fi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3640457.3688106","openalex_id":"https://openalex.org/W4403221563","cited_by_count":12,"quality_score":53,"matched_keywords":["personalized"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7453623414039612},{"id":"https://openalex.org/C2776591724","display_name":"Flip","score":0.5105987191200256},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47642120718955994},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.41361522674560547},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0},{"id":"https://openalex.org/C190283241","display_name":"Apoptosis","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4403221778","title":"The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation","url":"https://doi.org/10.1145/3640457.3688107","published":"2024-10-08","authors":["Zekai Qu","Ruobing Xie","Chaojun Xiao","Zhanhui Kang","Xingwu Sun"],"abstract":"Sequential recommendation (SR) has seen significant advancements with the help of Pre-trained Language Models (PLMs). 
Some PLM-based SR models directly use PLM to encode user historical behavior’s text sequences to learn user representations, while there is seldom an in-depth exploration of the capability and suitability of PLM in behavior sequence modeling. In this work, we first conduct extensive model analyses between PLMs and PLM-based SR models, discovering great underutilization and parameter redundancy of PLMs in behavior sequence modeling. Inspired by this, we explore different lightweight usages of PLMs in SR, aiming to maximally stimulate the ability of PLMs for SR while satisfying the efficiency and usability demands of practical systems. We discover that adopting behavior-tuned PLMs for item initializations of conventional ID-based SR models is the most economical framework o...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3640457.3688107","openalex_id":"https://openalex.org/W4403221778","cited_by_count":6,"quality_score":47,"matched_keywords":["language model"],"author_affiliations":["China University of Geosciences (Beijing)","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7035186290740967},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5073708891868591},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37432312965393066},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3657037615776062},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34629860520362854}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":6}},{"id":"openalex:W4403222013","title":"Analyzing User Preferences and Quality Improvement on Bing's WebPage Recommendation Experience with Large Language Models","url":"https://doi.org/10.1145/3640457.3688062","published":"2024-10-08","authors":["J.J. Shah","Gang Luo","Jialin Liu","Amey Barapatre","Fan Wu","Chuck Wang","Hongzhi Li"],"abstract":"Explore Further @ Bing (Web Recommendations) is a web-scale query independent webpage-to-webpage recommendation system with an index size of over 200 billion webpages. Due to the significant variability in webpage quality across the web and the reliance of our system on learning soleley user behavior (clicks), our production system was susceptible to serving clickbait and low-quality recommendations. Our team invested several months in developing and shipping several improvements that utilize LLM-generated recommendation quality labels to enhance our ranking stack to improve the nature of the recommendations we show to our users. Another key motivation behind our efforts was to go beyond merely surfacing relevant webpages, focusing instead on prioritizing more useful and authoritative content that delivers value to users based on their implied intent. 
We demonstrate how large language mo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3640457.3688062","openalex_id":"https://openalex.org/W4403222013","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Microsoft (United States)","Microsoft Research Asia (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8057816028594971},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.6474486589431763},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5748195052146912},{"id":"https://openalex.org/C21959979","display_name":"Web page","score":0.52670818567276},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.48648640513420105},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.35467684268951416},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.0},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4403221761","title":"TLRec: A Transfer Learning Framework to Enhance Large Language Models for Sequential Recommendation Tasks","url":"https://doi.org/10.1145/3640457.3691710","published":"2024-10-08","authors":["Jiaye Lin","Shuang Peng","Zhong Zhang","Peilin Zhao"],"abstract":"Recently, Large Language Models (LLMs) have garnered significant attention in recommendation systems, improving recommendation performance through in-context learning or parameter-efficient fine-tuning. 
However, cross-domain generalization, i.e., model training in one scenario (source domain) but inference in another (target domain), is underexplored. In this paper, we present TLRec, a transfer learning framework aimed at enhancing LLMs for sequential recommendation tasks. TLRec specifically focuses on text inputs to mitigate the challenge of limited transferability across diverse domains, offering promising advantages over traditional recommendation models that heavily depend on unique identities (IDs) like user IDs and item IDs. Moreover, we leverage the source domain data to further enhance LLMs’ performance in the target domain. Initially, we employ powerful closed-source LLMs (e.g.,...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3640457.3691710","openalex_id":"https://openalex.org/W4403221761","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","Tsinghua University","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8357220888137817},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.6693949699401855},{"id":"https://openalex.org/C2776175482","display_name":"Transfer (computing)","score":0.47491559386253357},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.379228413105011},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3619433641433716},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3592793941497803},{"id":"https://openalex.org/C173608175","display_name":"Parallel 
computing","score":0.06808087229728699}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4403210799","title":"MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field","url":"https://doi.org/10.1109/tvcg.2024.3476331","published":"2024-10-08","authors":["Zijiang Yang","Zhongwei Qiu","Chang Xu","Dongmei Fu"],"abstract":"3D style transfer aims to generate stylized views of 3D scenes with specified styles, which requires high-quality generating and keeping multi-view consistency. Existing methods still suffer the challenges of high-quality stylization with texture details and stylization with multimodal guidance. In this paper, we reveal that the common training method of stylization with NeRF, which generates stylized multi-view supervision by 2D style transfer models, causes the same object in supervision to show various states (color tone, details, etc.) in different views, leading NeRF to tend to smooth the texture details, further resulting in low-quality rendering for 3D multi-style transfer. To tackle these problems, we propose a novel Multimodal-guided 3D Multi-style transfer of NeRF, termed MM-NeRF. 
First, MM-NeRF projects multimodal guidance into a unified space to keep the multimodal styles con...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2024.3476331","openalex_id":"https://openalex.org/W4403210799","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","The University of Sydney","University of Science and Technology Beijing"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8176856637001038},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.6758487820625305},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6020533442497253},{"id":"https://openalex.org/C38935604","display_name":"Stylized fact","score":0.4989891052246094},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4895525872707367},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.48864561319351196},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.4539409279823303},{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.44638678431510925}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/differential-transformer","title":"Differential Transformer","url":"https://www.microsoft.com/en-us/research/publication/differential-transformer/","published":"2024-10-07","authors":["Tianzhu Ye","Li Dong","Yuqing Xia","Yutao Sun","Yi Zhu","Gao Huang","Furu Wei"],"abstract":"Transformer tends to overallocate attention to 
irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attention patterns. Experimental results on language modeling show that Diff Transformer outperforms Transformer in various settings of scaling up model size and training tokens. More intriguingly, it offers notable advantages in practical applications, such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers. By being less distracted by irrelevant context, Diff Transformer can mitigate hallucination in question answeri...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Tech Report","Artificial intelligence","Computation and Language","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403177297","title":"Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation","url":"https://doi.org/10.1007/s11263-024-02245-x","published":"2024-10-07","authors":["Huan Liu","Zichang Tan","Qiang Chen","Yunchao Wei","Yao Zhao","Jingdong 
Wang"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-024-02245-x","openalex_id":"https://openalex.org/W4403177297","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beijing Jiaotong University"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.752886176109314},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6614856719970703},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6000533103942871},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.5726112127304077},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4320608377456665},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.42930951714515686},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.20806542038917542},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.20052754878997803}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"openalex:W4405601284","title":"Reinforest: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models","url":"https://doi.org/10.1109/scam63643.2024.00026","published":"2024-10-07","authors":["Anthony Saieva","Saikat Chakraborty","Gail E. Kaiser"],"abstract":"This paper introduces a novel code-to-code search technique that enhances the performance of Large Language Models (LLMs) by including both static and dynamic features as well as utilizing both similar and dissimilar examples during training. 
We present the first-ever code search method that encodes dynamic runtime information during training without the need to execute either the corpus under search or the search query at inference time and the first code search technique that trains on both positive and negative reference samples. To validate the efficacy of our approach, we perform a set of studies demonstrating the capability of enhanced LLMs to perform cross-language code-to-code search. Our evaluation demonstrates that the effectiveness of our approach is consistent across various model architectures and programming languages. We outperform the state-of-the-art cross-language searc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/scam63643.2024.00026","openalex_id":"https://openalex.org/W4405601284","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Columbia University","IBM (United States)","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.789757251739502},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6082568764686584},{"id":"https://openalex.org/C130318100","display_name":"Semantic similarity","score":0.5877425074577332},{"id":"https://openalex.org/C103278499","display_name":"Similarity (geometry)","score":0.5001578330993652},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4554268419742584},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44855907559394836},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4483790993690491},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.3884798288345337}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/i-code-studio-a-configurable-and-composable-framework-for-integrative-ai","title":"i-Code Studio: A Configurable and Composable Framework for Integrative AI","url":"https://www.microsoft.com/en-us/research/publication/i-code-studio-a-configurable-and-composable-framework-for-integrative-ai/","published":"2024-10-06","authors":["Yuwei Fang","Mahmoud Khademi","Chenguang Zhu","Ziyi Yang","Reid Pryzant","Yichong Xu","Yao Qian","Takuya Yoshioka","Lu Yuan","Michael Zeng","Xuedong Huang"],"abstract":"Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities. Integrative AI is one important direction to approach AGI, through combining multiple models to tackle complex multimodal tasks. However, there is a lack of a flexible and composable platform to facilitate efficient and effective model composition and coordination. In this paper, we propose the i-Code Studio, a configurable and composable framework for Integrative AI. The i-Code Studio orchestrates multiple pre-trained models in a finetuning-free fashion to conduct complex multimodal tasks. 
Instead of simple model composition, the i-Code Studio provides an integrative, flexible, and composable setting for developers to quickly and easily compose cutting-edge services and technologies tailored to their specific req...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Human language technologies","Multimodal Large Language Models","Natural language processing","1970-01-01","retrieval","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-evaluating-llms-capabilities-as-functional-approximators-a-bayesian-perspective","title":"On Evaluating LLMs' Capabilities as Functional Approximators: A Bayesian Perspective","url":"https://www.microsoft.com/en-us/research/publication/on-evaluating-llms-capabilities-as-functional-approximators-a-bayesian-perspective/","published":"2024-10-06","authors":["Shoaib Ahmed Siddiqui","Yanzhi Chen","Juyeon Heo","Menglin Xia","Adrian Weller"],"abstract":"Recent works have successfully applied Large Language Models (LLMs) to function modeling tasks. However, the reasons behind this success remain unclear. In this work, we propose a new evaluation framework to comprehensively assess LLMs' function modeling abilities. 
By adopting a Bayesian perspective of function modeling, we discover that LLMs are relatively weak in understanding patterns in raw data, but excel at utilizing prior knowledge about the domain to develop a strong understanding of the underlying function. Our findings offer new insights about the strengths and limitations of LLMs in the context of function modeling. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flsa-learning-semantic-structures-in-document-collections-using-foundation-models-2","title":"fLSA: Learning Semantic Structures in Document Collections Using Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/flsa-learning-semantic-structures-in-document-collections-using-foundation-models-2/","published":"2024-10-06","authors":["Weijia Xu","Nebojsa Jojic","Nicolas Le Roux"],"abstract":"Humans can learn to solve new tasks by inducing high-level strategies from example solutions to similar problems and then adapting these strategies to solve unseen problems. Can we use large language models to induce such high-level structure from example documents or solutions? We introduce fLSA, a foundation-model-based Latent Semantic Analysis method that iteratively clusters and tags document segments based on document-level contexts. 
These tags can be used to model the latent structure of given documents and for hierarchical sampling of new texts. Our experiments on story writing, math, and multi-step reasoning datasets demonstrate that fLSA tags are more informative in reconstructing the original texts than existing tagging methods. Moreover, when used for hierarchical sampling, fLSA tags help expand the output space in the right directions that lead to correct solutions more often...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2025.emnlp-main.1290","openalex_id":"https://openalex.org/W4416035638","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/__trashed-7","title":"DermaVQA: A Multilingual Visual Question Answering Dataset for Dermatology","url":"https://www.microsoft.com/en-us/research/publication/__trashed-7/","published":"2024-10-04","authors":["Wen-wai Yim","Yujuan Fu","Zhaoyi Sun","Asma Ben Abacha","Meliha Yetisgen","Fei Xia"],"abstract":"Remote medical care has become commonplace with the establishment of patient portals, the maturation of web technologies, and the proliferation of personal devices. However, though on-demand care provides convenience and expands patient access, this same phenomenon may lead to increased workload for healthcare providers. 
Drafting candidate responses may help speed up physician workflows answering electronic messages. One specialty that may benefit from the latest multi-modal vision-language foundational models is dermatology. However, there is no existing dataset that incorporate dermatological health queries along with user-generated images. In this work, we contribute a new dataset, DermaVQA (https://osf.io/72rp3/), for the task of dermatology question answering and we benchmark the performance of state-of-the-art multi-modal models on multilingual response generation using relevant mu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-72086-4_20","openalex_id":"https://openalex.org/W4403089285","cited_by_count":5,"quality_score":77,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computer science","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Washington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:f22a7d15f23574b6","title":"Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents","url":"https://ai.meta.com/research/publications/beyond-turn-based-interfaces-synchronous-llms-as-full-duplex-dialogue-agents/","published":"2024-10-04","authors":["Bandhav Veluri","Benjamin Peloquin","Bokai Yu","Hongyu Gong","Shyam Gollakota"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Human & Machine Intelligence","Conversational AI"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=9"}},{"id":"openalex:W4403122935","title":"Generative AI: Redefining the Future of Software Engineering","url":"https://doi.org/10.1109/ms.2024.3441889","published":"2024-10-04","authors":["Anita Carleton","Davide Falessi","Hongyu Zhang","Xin Xia"],"abstract":"This special issue about Generative AI (GAI) for software engineering refers to applying generative models and algorithms in software development, testing, maintenance and evolution. 
This special issue features five articles where you will see some examples of GAI adoption and discuss research directions and challenges as GAI continues to experience transformative growth and application.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ms.2024.3441889","openalex_id":"https://openalex.org/W4403122935","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Chongqing University","Huawei Technologies (China)","Software Engineering Institute","University of Rome Tor Vergata"],"concepts":[{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.5845305919647217},{"id":"https://openalex.org/C182500959","display_name":"Social software engineering","score":0.5651377439498901},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5327921509742737},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5234536528587341},{"id":"https://openalex.org/C529173508","display_name":"Software development","score":0.48133453726768494},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.41620492935180664},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.37423935532569885},{"id":"https://openalex.org/C186846655","display_name":"Software construction","score":0.3556695580482483}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"arxiv:2410.03083","title":"Scaling Parameter-Constrained Language Models with Quality Data","url":"https://huggingface.co/papers/2410.03083","published":"2024-10-04","authors":["Ernie Chang","Matteo Paltenghi","Yang Li","Pin-Jie Lin","Changsheng 
Zhao","Patrick Huber","Zechun Liu","Rastislav Rabatin","Yangyang Shi","Vikas Chandra"],"abstract":"Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we extend the conventional understanding of scaling law by offering a microscopic view of data quality within the original formulation -- effective training tokens -- which we posit to be a critical determinant of performance for parameter-constrained language models. Specifically, we formulate the proposed term of effective training tokens to be a combination of two readily-computed indicators of text: (i) text diversity and (ii) syntheticity as measured by a teacher model. We pretrained over 200 models of 25M to 1.5B parameters on a diverse set of sampled, synthetic data, and estimated the constants that relate text quality, model size, traini...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/irgen-generative-modeling-for-image-retrieval","title":"IRGen: Generative Modeling for Image Retrieval","url":"https://www.microsoft.com/en-us/research/publication/irgen-generative-modeling-for-image-retrieval/","published":"2024-10-03","authors":["Yidan Zhang","Ting Zhang","Dong Chen","Yujing Wang","Qi Chen","Xing Xie","Hao Sun","Weiwei Deng","Qi Zhang","Fan Yang","Mao Yang","Qingmin Liao"],"abstract":"While generative modeling has become prevalent across numerous research fields, its 
integration into the realm of image retrieval remains largely unexplored and underjustified. In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling and employing a sequence-to-sequence model. This approach is harmoniously aligned with the current trend towards unification in research, presenting a cohesive framework that allows for end-to-end differentiable searching. This, in turn, facilitates superior performance via direct optimization techniques. The development of our model, dubbed IRGen, addresses the critical technical challenge of converting an image into a concise sequence of semantic units, which is pivotal for enabling efficient and effective search. Extensive experiments demonstrate that our model achieves state-of-the-art performance on th...","companies":["Microsoft","Baidu"],"matched_orgs":["Microsoft","Baidu"],"company_groups":["company_us","company_china"],"company_regions":["US","China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-72633-0_2","openalex_id":"https://openalex.org/W4404600716","cited_by_count":8,"quality_score":96,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","retrieval","efficient"],"author_affiliations":["Microsoft","Baidu (China)","Beijing Normal University","Bunkyo University","Microsoft (United States)","The University of Tokyo","Tsinghua University","University Town of Shenzhen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/toolgen-unified-tool-retrieval-and-calling-via-generation","title":"ToolGen: Unified Tool Retrieval and Calling via 
Generation","url":"https://www.microsoft.com/en-us/research/publication/toolgen-unified-tool-retrieval-and-calling-via-generation/","published":"2024-10-03","authors":["Renxi Wang","Xudong Han","Lei Ji","Shu Wang","Timothy Baldwin","Haonan Li"],"abstract":"As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is constrained by context length and requires separate, often inefficient, retrieval mechanisms. We introduce ToolGen, a paradigm shift that integrates tool knowledge directly into the LLM's parameters by representing each tool as a unique token. This enables the LLM to generate tool calls and arguments as part of its next token prediction capabilities, seamlessly blending tool invocation with language generation. Our framework allows the LLM to access and utilize a vast amount of tools with no additional retrieval step, significantly enhancing both performance and scalability. 
Experimental results with over 47,000 tools show that ToolGen not only achieves...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01","LLM","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403099947","title":"An AI Agent for Fully Automated Multi‐Omic Analyses","url":"https://doi.org/10.1002/advs.202407094","published":"2024-10-03","authors":["Juexiao Zhou","Bin Zhang","Guowei Li","Xiuying Chen","Haoyang Li","Xiaopeng Xu","Siyuan Chen","Wenjia He","Chencheng Xu","Liwei Liu","Xin Gao"],"abstract":"With the fast-growing and evolving omics data, the demand for streamlined and adaptable tools to handle bioinformatics analysis continues to grow. In response to this need, Automated Bioinformatics Analysis (AutoBA) is introduced, an autonomous AI agent designed explicitly for fully automated multi-omic analyses based on large language models (LLMs). AutoBA simplifies the analytical process by requiring minimal user input while delivering detailed step-by-step plans for various bioinformatics tasks. AutoBA's unique capacity to self-design analysis processes based on input data variations further underscores its versatility. Compared with online bioinformatic services, AutoBA offers multiple LLM backends, with options for both online and local usage, prioritizing data security and user privacy. 
In comparison to ChatGPT and open-source LLMs, an automated code repair (ACR) mechanism in Auto...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/advs.202407094","openalex_id":"https://openalex.org/W4403099947","cited_by_count":54,"quality_score":75,"matched_keywords":["LLM","agent"],"author_affiliations":["Huawei Technologies (China)","King Abdullah University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.7895081639289856},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7578089237213135},{"id":"https://openalex.org/C3913047","display_name":"sync","score":0.6125421524047852},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5435360670089722},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5354447960853577},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.45679640769958496},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.414425253868103},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4126374125480652}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":54}},{"id":"bytedance-seed:118","title":"Loong: Generating Minute-level Long Videos with Autoregressive Language Models","url":"https://seed.bytedance.com/en/research/loong-generating-minute-level-long-videos-with-autoregressive-language-models","published":"2024-10-03","authors":["Yuqing Wang","Tianwei Xiong","Daquan Zhou","Zhijie Lin","Yang Zhao","Bingyi Kang","Jiashi Feng","Xihui Liu"],"abstract":"It is desirable but challenging to generate 
content-rich long videos in the scale of minutes. Autoregressive large language models (LLMs) have achieved great success in generating coherent and long sequences of tokens in the domain of natural language processing, while the exploration of autoregressive LLMs for video generation is limited to generating short videos of several seconds. In this work, we conduct a deep analysis of the challenges that prevent autoregressive LLM-based video generators from generating long videos. Based on the observations and analysis, we propose Loong, a new autoregressive LLM-based video generator that can generate minute-long videos. Specifically, we model the text tokens and video tokens as a unified sequence for autoregressive LLMs and train the model from scratch. We propose progressive short-to-long training with a loss re-weighting scheme to mitigate....","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","arXiv","LLM"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:116","title":"Video Instruction Tuning With Synthetic Data","url":"https://seed.bytedance.com/en/research/video-instruction-tuning-with-synthetic-data","published":"2024-10-03","authors":["Yuanhan Zhang","Jinming Wu","Wei Li","Bo Li","Zejun Ma","Ziwei Liu","Chunyuan Li"],"abstract":"The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. 
To address this, we propose an alternative approach by creating a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K. This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA. By training on this dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM. Our experiments demonstrate that LLaVA-Video achieves strong performance across various video benchmarks, highlighting the effectiveness of our dataset. We plan to release the dataset, its generation pipeline, and the model checkpoints. External paper link: https://arxiv.org/abs/2410.02713","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Multimodal","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:baidu:2410.02743","title":"MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions","url":"https://huggingface.co/papers/2410.02743","published":"2024-10-03","authors":["Baidu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","baidu"],"author_affiliations":["Baidu"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page 
https://huggingface.co/baidu/papers"}},{"id":"arxiv:2410.16820","title":"AttriPrompter: Auto-Prompting With Attribute Semantics for Zero-Shot Nuclei Detection via Visual-Language Pre-Trained Models","url":"http://arxiv.org/abs/2410.16820","published":"2024-10-03","authors":["Yongjian Wu","Yang Zhou","Jiya Saiyin","Bingzheng Wei","Maode Lai","Jianzhong Shou","Yan Xu"],"abstract":"Large-scale visual-language pre-trained models (VLPMs) have demonstrated exceptional performance in downstream object detection through text prompts for natural scenes. However, their application to zero-shot nuclei detection on histopathology images remains relatively unexplored, mainly due to the significant gap between the characteristics of medical images and the web-originated text-image pairs used for pre-training. This paper aims to investigate the potential of the object-level VLPM, Grounded Language-Image Pre-training (GLIP), for zero-shot nuclei detection. Specifically, we propose an innovative auto-prompting pipeline, named AttriPrompter, comprising attribute generation, attribute augmentation, and relevance sorting, to avoid subjective manual prompt design. 
AttriPrompter utilizes VLPMs' text-to-image alignment to create semantically rich text prompts, which are then fed into....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmi.2024.3473745","openalex_id":"https://openalex.org/W4403095503","cited_by_count":4,"quality_score":45,"matched_keywords":["distillation"],"author_affiliations":["Alibaba Group (China)","Beihang University","Biomechanics Institute of Valencia","Chinese Academy of Medical Sciences & Peking Union Medical College","National Cancer Center","National Clinical Research","Peking Union Medical College Hospital","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.7016332149505615},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6627486944198608},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6577494740486145},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6506270170211792},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.6421186923980713},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5367342829704285},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.49796199798583984},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.46732938289642334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/editroom-llm-parameterized-graph-diffusion-for-composable-3d-room-layout-editing","title":"EditRoom: LLM-parameterized Graph 
Diffusion for Composable 3D Room Layout Editing","url":"https://www.microsoft.com/en-us/research/publication/editroom-llm-parameterized-graph-diffusion-for-composable-3d-room-layout-editing/","published":"2024-10-02","authors":["Kaizhi Zheng","Xiaotong Chen","Xuehai He","Jing Gu","Linjie Li","Zhengyuan Yang","K. Lin","Jianfeng Wang","Lijuan Wang","Xin Eric Wang","Linjie Li"],"abstract":"Given the steep learning curve of professional 3D software and the time-consuming process of managing large 3D assets, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming. However, recent approaches to language-guided 3D scene editing either require manual interventions or focus only on appearance modifications without supporting comprehensive scene layout changes. In response, we propose Edit-Room, a unified framework capable of executing a variety of layout edits through natural language commands, without requiring manual intervention. Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes using a diffusion-based method, enabling six types of edits: rotate, translate, scale, replace, add, and remove. 
To address the lack of data for language-guided 3D scene edi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Graphics and multimedia","Human-computer interaction","3D graphics","Computer science","Computer Vision and Pattern Recognition","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:115","title":"HybridFlow: A Flexible and Efficient RLHF Framework","url":"https://seed.bytedance.com/en/research/hybridflow-a-flexible-and-efficient-rlhf-framework","published":"2024-10-02","authors":["Guangming Sheng","Chi Zhang","Zilingfeng Ye","Xibin Wu","Wang Zhang","Ru Zhang","Yanghua Peng","Haibin Lin","Chuan Wu"],"abstract":"Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies between the NNs. RLHF complicates the dataflow by expanding each node into a distributed LLM training or generation program, and each edge into a many-to-many multicast. Traditional RL frameworks execute the dataflow using a single controller to instruct both intra-node computation and inter-node communication, which can be inefficient in RLHF due to large control dispatch overhead for distributed intra-node computation. 
Existing RLHF systems adopt a multi-controller paradigm, which can be inflexible due to nesting distributed computation and data communication. We propose HybridFlow, which combines single-controller and multi-controller...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Reinforcement Learning","System Research","Infrastructures","EuroSys 2025","LLM","language model","memory","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/designing-staged-evaluation-workflows-for-llms-integrating-domain-experts-lay-users-and-model-generated-evaluation-criteria","title":"Designing Staged Evaluation Workflows for LLMs: Integrating Domain Experts, Lay Users, and Model-Generated Evaluation Criteria","url":"https://www.microsoft.com/en-us/research/publication/designing-staged-evaluation-workflows-for-llms-integrating-domain-experts-lay-users-and-model-generated-evaluation-criteria/","published":"2024-10-02","authors":["Annalisa Szymanski","S. A. Gebreegziabher","Oghenemaro Anuyah","Ronald A. Metoyer","T. Li"],"abstract":"Large Language Models (LLMs) are increasingly utilized for domain-specific tasks, yet evaluating their outputs remains challenging. A common strategy is to apply evaluation criteria to assess alignment with domain-specific standards, yet little is understood about how criteria differ across sources or where each type is most useful in the evaluation process. 
This study investigates criteria developed by domain experts, lay users, and LLMs to identify their complementary roles within an evaluation workflow. Results show that experts produce fact-based criteria with long-term value, lay users emphasize usability with a shorter-term focus, and LLMs target procedural checks for immediate task requirements. We also examine how criteria evolve between a priori and a posteriori phases, noting drift across stages as well as convergence in the a posteriori phase. Based on our observations, we pro...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3772318.3790897","openalex_id":"https://openalex.org/W4403863136","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","long-term"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Notre Dame"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403081537","title":"AccDiffusion: An Accurate Method for Higher-Resolution Image Generation","url":"https://doi.org/10.1007/978-3-031-72658-3_3","published":"2024-10-02","authors":["Zhihang Lin","Mingbao Lin","Meng Zhao","Rongrong 
Ji"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72658-3_3","openalex_id":"https://openalex.org/W4403081537","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8325080871582031},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5365669131278992},{"id":"https://openalex.org/C138268822","display_name":"Resolution (logic)","score":0.5251662135124207},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.46857190132141113},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4179246425628662},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.40159958600997925}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/metareflection-learning-instructions-for-language-agents-using-past-reflections","title":"METAREFLECTION: Learning Instructions for Language Agents using Past Reflections","url":"https://www.microsoft.com/en-us/research/publication/metareflection-learning-instructions-for-language-agents-using-past-reflections/","published":"2024-10-01","authors":["Priyanshu Gupta","Shashank Kirtania","Ananya Singha","Sumit Gulwani","Arjun Radhakrishna","Sherry Shi","Gustavo Soares"],"abstract":"The popularity of Large Language Models (LLMs) have unleashed a new age of Language Agents for solving a diverse range of tasks. 
While contemporary frontier LLMs are capable enough to power reasonably good Language agents, the closed-API model makes it hard to improve in cases they perform sub-optimally. To address this, recent works have explored ways to improve their performance using techniques like self-reflection and prompt optimization. Unfortunately, techniques like self-reflection can be used only in an online setup, while contemporary prompt optimization techniques are designed and tested to work on simple tasks. To this end, we introduce MetaReflection, a novel offline reinforcement learning technique that enhances the performance of Language Agents by augmenting a semantic memory based on experiential learnings from past trials. We demonstrate the efficacy of MetaReflection by...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","automatic prompt engineering","Language Agents","Language model","1970-01-01","LLM","memory","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/navigating-the-unknown-a-chat-based-collaborative-interface-for-personalized-exploratory-tasks","title":"Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks","url":"https://www.microsoft.com/en-us/research/publication/navigating-the-unknown-a-chat-based-collaborative-interface-for-personalized-exploratory-tasks/","published":"2024-10-01","authors":["Yingzhe Peng","Xiaoting Qin","Zhiyang 
Zhang","Jue Zhang","Qingwei Lin 林庆维","Xu Yang","Dongmei Zhang","Saravan Rajmohan","Qi Zhang"],"abstract":"The rise of large language models (LLMs) has revolutionized user interactions with knowledge-based systems, enabling chatbots to synthesize vast amounts of information and assist with complex, exploratory tasks. However, LLM-based chatbots often struggle to provide personalized support, particularly when users start with vague queries or lack sufficient contextual information. This paper introduces the Collaborative Assistant for Personalized Exploration (CARE), a system designed to enhance personalization in exploratory tasks by combining a multi-agent LLM framework with a structured user interface. CARE's interface consists of a Chat Panel, Solution Panel, and Needs Panel, enabling iterative query refinement and dynamic solution generation. The multi-agent framework collaborates to identify both explicit and implicit user needs, delivering tailored, actionable solutions. In a wi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","personalized","personalization","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ironies-of-generative-ai-understanding-and-mitigating-productivity-loss-in-human-ai-interaction","title":"Ironies of Generative AI: Understanding and Mitigating Productivity Loss in Human-AI 
Interaction","url":"https://www.microsoft.com/en-us/research/publication/ironies-of-generative-ai-understanding-and-mitigating-productivity-loss-in-human-ai-interaction/","published":"2024-10-01","authors":["Auste Simkute","Lev Tankelevitch","Victor Kewenig","Ava Elizabeth Scott","Abigail Sellen","Sean Rintel"],"abstract":"Generative AI (GenAI) systems offer opportunities to increase user productivity in many tasks, such as programming and writing. However, while they boost productivity in some studies, many others show that users are working ineffectively with GenAI systems and losing productivity. Despite the apparent novelty of these usability challenges, these ‘ironies of automation’ have been observed for over three decades in Human Factors research on the introduction of automation in domains such as aviation, automated driving, and intelligence. We draw on this extensive research alongside recent GenAI user studies to outline four key reasons for productivity loss with GenAI systems: a shift in users’ roles from production to evaluation, unhelpful restructuring of workflows, interruptions, and a tendency for automation to make easy tasks easier and hard tasks harder. 
We then suggest how Human Factor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Social sciences","Human–computer interaction","1970-01-01","Computer science","personalization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/medimageinsight-an-open-source-embedding-model-for-general-domain-medical-imaging","title":"MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging","url":"https://www.microsoft.com/en-us/research/publication/medimageinsight-an-open-source-embedding-model-for-general-domain-medical-imaging/","published":"2024-10-01","authors":["Noel Codella","Yu Gu","Shrey Jain","Ho Hin Lee","Asma Ben Abacha","Alberto Santamaria-Pang","Will Guyman","Natieek Sangani","Sheng Zhang","Hoifung Poon","Stephanie Hyland","Shruthi Bannur"],"abstract":"In this work, we present MedImageInsight, an open-source medical imaging embedding model. MedImageInsight is trained on medical images with associated text and labels across a diverse collection of domains, including X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography. Rigorous evaluations demonstrate MedImageInsight's ability to achieve state-of-the-art (SOTA) or human expert level performance across classification, image-image search, and fine-tuning tasks. 
Specifically, on public datasets, MedImageInsight achieves SOTA in CT 3D medical image retrieval, as well as SOTA in disease classification and search for chest X-ray, dermatology, and OCT imaging. Furthermore, MedImageInsight achieves human expert performance in bone age estimation (on both public and partner data), as well as AUC above 0.9 in most other domains. When paired with a text....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Unpublished","Artificial intelligence","Computer vision","Medical, health and genomics","Healthcare","Medicine","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dont-just-string-tokens-stack-them-improving-multimodal-transformers-with-layer-stack","title":"DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs","url":"https://www.microsoft.com/en-us/research/publication/dont-just-string-tokens-stack-them-improving-multimodal-transformers-with-layer-stack/","published":"2024-10-01","authors":["Lingchen Meng","Jianwei Yang","Rui Tian","Xiyang Dai","Zuxuan Wu","Jianfeng Gao","Yu-Gang Jiang"],"abstract":"Large multimodal models (LMMs) have shown tremendous improvements over the past year for multimodal understanding and reasoning. Currently, most (if not all) of the works attempt to connect vision and LLMs by feeding into a large language model (LLM) a string of visual tokens extracted from pretrained vision encoders (e.g., CLIP).
Nevertheless, such a strategy brings considerable compute and memory overhead to the original LLMs due to extra visual tokens, which is particularly significant for high-resolution images and videos. Despite some efforts to mitigate this with sophisticated token compressions, the methods usually struggle to reach a good trade-off between efficacy and efficiency. In this work, we propose a new strategy for connecting vision and language transformers in large multimodal models (LMMs). Instead of stringing visual tokens as a sequence, we stack the visual tokens....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Multimodal Large Language Models","LLM","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-steering-and-verification-in-ai-assisted-data-analysis-with-interactive-task-decomposition","title":"Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition","url":"https://www.microsoft.com/en-us/research/publication/improving-steering-and-verification-in-ai-assisted-data-analysis-with-interactive-task-decomposition/","published":"2024-10-01","authors":["Majeed Kazemitabaar","Jack Williams","Ian Drosos","Tovi Grossman","Austin Henley","Carina Negreanu","Advait Sarkar"],"abstract":"LLM-powered tools like ChatGPT Data Analysis, have the potential to help users tackle the challenging task of data analysis 
programming, which requires expertise in data processing, programming, and statistics. However, our formative study (n=15) uncovered serious challenges in verifying AI-generated results and steering the AI (i.e., guiding the AI system to produce the desired output). We developed two contrasting approaches to address these challenges. The first (Stepwise) decomposes the problem into step-by-step subgoals with pairs of editable assumptions and code until task completion, while the second (Phasewise) decomposes the entire problem into three editable, logical phases: structured input/output assumptions, execution plan, and code. A controlled, within-subjects experiment (n=18) compared these systems against a conversational baseline. Users reported significantly greater....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Human-computer interaction","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exact-teaching-ai-agents-to-explore-with-reflective-mcts-and-exploratory-learning","title":"ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning","url":"https://www.microsoft.com/en-us/research/publication/exact-teaching-ai-agents-to-explore-with-reflective-mcts-and-exploratory-learning/","published":"2024-10-01","authors":["Xiao Yu","Baolin Peng","Vineeth Vajipey","Hao Cheng","Michel Galley","Jianfeng Gao","Zhou 
Yu"],"abstract":"Autonomous agents have demonstrated significant potential in automating complex multistep decision-making tasks. However, even state-of-the-art vision-language models (VLMs), such as GPT-4o, still fall short of human-level performance, particularly in intricate web environments and long-horizon tasks. To address these limitations, we present ExACT, an approach to combine test-time search and self-learning to build o1-like models for agentic applications. We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test time algorithm designed to enhance AI agents' ability to explore decision space on the fly. R-MCTS extends traditional MCTS by 1) incorporating contrastive reflection, allowing agents to learn from past interactions and dynamically improve their search efficiency; and 2) using multi-agent debate for reliable state evaluation. Next, we introduce Exploratory Learn...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Tech Report","Artificial intelligence","Human language technologies","Human-computer interaction","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/what-can-foundation-models-embeddings-do","title":"What can Foundation Models’ Embeddings do?","url":"https://www.microsoft.com/en-us/research/publication/what-can-foundation-models-embeddings-do/","published":"2024-10-01","authors":["Linjie Li","Jianwei Yang","Lijuan Wang"],"abstract":"Foundation models possess strong 
capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity. As shown in Fig.1, a lightweight transformer interface without tuning any foundation model weights is enough for segmentation, grounding, and retrieval in an interleaved manner. The proposed interface has the following favorable attributes: (1) Generalizable. It applies to various tasks spanning retrieval, segmentation, etc., under the same architecture and weights. (2) Interleavable. With the benefit of multi-task multi-modal training, the proposed interface creates an interleaved shared embedding space. (3) Extendable. The proposed interface is adaptive to new tasks, and new mo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Computer science","foundation models","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/redcode-multi-dimensional-safety-benchmark-for-code-agents","title":"RedCode: Risky Code Execution and Generation Benchmark for Code Agents","url":"https://www.microsoft.com/en-us/research/publication/redcode-multi-dimensional-safety-benchmark-for-code-agents/","published":"2024-10-01","authors":["Chengquan Guo","Xun Liu","Chulin Xie","Andy Zhou","Yi Zeng","Zinan Lin","Dawn Song","Bo Li"],"abstract":"With the 
rapidly increasing capabilities and adoption of code agents for AI-assisted coding and software development, safety and security concerns, such as generating or executing malicious code, have become significant barriers to the real-world deployment of these agents. To provide comprehensive and practical evaluations on the safety of code agents, we propose RedCode, an evaluation platform with benchmarks grounded in four key principles: real interaction with systems, holistic evaluation of unsafe code generation and execution, diverse input formats, and high-quality safety scenarios and tests. RedCode consists of two parts to evaluate agents' safety in unsafe code execution and generation: (1) RedCode-Exec provides challenging code prompts in Python as inputs, aiming to evaluate code agents' ability to recognize and handle unsafe code. We then map the Python code to other programm...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","AI agents","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/finding-inductive-loop-invariants-using-large-language-models","title":"Leveraging LLMs for Program Verification","url":"https://www.microsoft.com/en-us/research/publication/finding-inductive-loop-invariants-using-large-language-models/","published":"2024-10-01","authors":["Adharsh Kamath","Aditya Senthilnathan","Saikat Chakraborty","Pantazis 
Deligiannis","Shuvendu Lahiri","Akash Lal","Aseem Rastogi","Subhajit Roy","Rahul Sharma"],"abstract":"Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about program's runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting the entire program, thus are indispensable artifacts in a formal proof of correctness. Finding inductive loop invariants is an undecidable problem, and despite a long history of research towards practical solutions, it remains far from a solved problem. This paper investigates the capabilities of the Large Language Models (LLMs) in offering a new solution towards this old, yet important problem. To that end, we first curate a dataset of verification problems on programs with loops. Next, we desig...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/scaling-the-codebook-size-of-vq-gan-to-100000-with-a-utilization-rate-of-99","title":"Scaling the Codebook Size of VQ-GAN to 100,000 with a Utilization Rate of 
99%","url":"https://www.microsoft.com/en-us/research/publication/scaling-the-codebook-size-of-vq-gan-to-100000-with-a-utilization-rate-of-99/","published":"2024-10-01","authors":["Fangyun Wei","Dong Chen"],"abstract":"In the realm of image quantization exemplified by VQGAN, the process encodes images into discrete tokens drawn from a codebook with a predefined size. Recent advancements, particularly with LLAMA 3, reveal that enlarging the codebook significantly enhances model performance. However, VQGAN and its derivatives, such as VQGAN-FC (Factorized Codes) and VQGAN-EMA, continue to grapple with challenges related to expanding the codebook size and enhancing codebook utilization. For instance, VQGAN-FC is restricted to learning a codebook with a maximum size of 16,384, maintaining a typically low utilization rate of less than 12% on ImageNet. In this work, we propose a novel image quantization model named VQGAN-LC (Large Codebook), which extends the codebook size to 100,000, achieving an utilization rate exceeding 99%. 
Unlike previous methods that optimize each codebook entry, our approach begins w...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Image generation","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rextime-a-benchmark-suite-for-reasoning-across-time-in-videos","title":"ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos","url":"https://www.microsoft.com/en-us/research/publication/rextime-a-benchmark-suite-for-reasoning-across-time-in-videos/","published":"2024-10-01","authors":["Jr-Jen Chen","Yu-Chien Liao","Hsi-Che Lin","Yu-Chu Yu","Yen-Chun Chen","Yu-Chiang Frank Wang"],"abstract":"We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e. human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning, requiring advanced understanding of cause-and-effect relationships across video segments, poses significant challenges to even the frontier multimodal large language models. To facilitate this evaluation, we develop an automated pipeline for generating temporal reasoning question-answer pairs, significantly reducing the need for labor-intensive manual annotations. 
Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. Evaluation results show that while frontier large language models outper...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Benchmarking","temporal reasoning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/not-all-tokens-are-what-you-need-for-pretraining","title":"Not All Tokens Are What You Need for Pretraining","url":"https://www.microsoft.com/en-us/research/publication/not-all-tokens-are-what-you-need-for-pretraining/","published":"2024-10-01","authors":["Yeyun Gong","Xiao Liu","Yelong Shen","Ruochen Xu","Jian Jiao","Nan Duan","Weizhu Chen"],"abstract":"Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that \"Not all tokens in a corpus are equally important for language model training\". Our initial analysis examines token-level training dynamics of language model, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that aligned with the desired distribution. 
This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on tokens with higher scores. When continual pretraining on 15B OpenWebMath corpus, Rho-1 yields an absolute im...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","language model training","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multimodal-large-language-models-make-text-to-image-generative-models-align-better","title":"Multimodal Large Language Models Make Text-to-Image Generative Models Align Better","url":"https://www.microsoft.com/en-us/research/publication/multimodal-large-language-models-make-text-to-image-generative-models-align-better/","published":"2024-10-01","authors":["Xun Wu","Shaohan Huang","Furu Wei"],"abstract":"Recent studies have demonstrated the exceptional potentials of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts. Despite these advances, current human preference datasets are either prohibitively expensive to construct or suffer from a lack of diversity in preference dimensions, resulting in limited applicability for instruction tuning in open-source text-to-image generative models and hinder further exploration. 
To address these challenges and promote the alignment of generative models through instruction tuning, we leverage multimodal large language models to create VisionPrefer, a high-quality and fine-grained preference dataset that captures multiple preference aspects. We aggregate feedback from AI annotators across four aspects: prompt-following, aesthetic, fidelity, and harmlessness...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Computer vision","text-to-image generation","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mesa-extrapolation-a-weave-position-encoding-method-for-enhanced-extrapolation-in-llms","title":"Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs","url":"https://www.microsoft.com/en-us/research/publication/mesa-extrapolation-a-weave-position-encoding-method-for-enhanced-extrapolation-in-llms/","published":"2024-10-01","authors":["Jingjing Liu"],"abstract":"Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their max training lengths. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. 
Our findings reveal that with meticulous weave position, PE can indeed be extended beyond effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk. This method not only retains competitive performance but als...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","large language models","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flash-a-workflow-automation-agent-for-diagnosing-recurring-incidents","title":"FLASH: A Workflow Automation Agent for Diagnosing Recurring Incidents","url":"https://www.microsoft.com/en-us/research/publication/flash-a-workflow-automation-agent-for-diagnosing-recurring-incidents/","published":"2024-10-01","authors":["Xuchao Zhang","Tanish Mittal","Chetan Bansal","Rujia Wang","Minghua Ma","Zhixin Ren","Hao Huang","Saravan Rajmohan"],"abstract":"Recurring incidents, typically raised by system monitors, often occur repeatedly, demanding significant human effort for troubleshooting. Automating the diagnosis process for these recurring incidents is crucial for minimizing service downtime, reducing customer impact, and decreasing manual labor. 
While recent agent approaches based on Large Language Models (LLMs) have demonstrated effectiveness in handling complex tasks requiring multiple logical steps, they still suffer from the reliability issue due to a lack of specific diagnostic knowledge. To enhance diagnostic reliability, we propose a workFLow Automation agent with Status supervision and Hindsight integration (FLASH), which significantly improves diagnostic accuracy by incorporating status supervision to break down the complex instructions into manageable pieces aligned with identified status. Moreover, we generate hindsight usi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Programming languages and software engineering","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/boosting-text-to-video-generative-model-with-mllms-feedback","title":"Boosting Text-to-Video Generative Model with MLLMs Feedback","url":"https://www.microsoft.com/en-us/research/publication/boosting-text-to-video-generative-model-with-mllms-feedback/","published":"2024-10-01","authors":["Xun Wu","Shaohan Huang","Furu Wei"],"abstract":"Recent advancements in text-to-video generative models, such as Sora, have showcased impressive capabilities. These models have attracted significant interest for their potential applications. 
However, they often rely on extensive datasets of variable quality, which can result in generated videos that lack aesthetic appeal and do not accurately reflect the input text prompts. A promising approach to mitigate these issues is to leverage Reinforcement Learning from Human Feedback (RLHF), which aims to align the outputs of text-to-video generative with human preferences. However, the considerable costs associated with manual annotation have led to a scarcity of comprehensive preference datasets. In response to this challenge, our study begins by investigating the efficacy of Multimodal Large Language Models (MLLMs) generated annotations in capturing video preferences, discovering a high deg...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Reinforcement learning","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403013301","title":"Current State of Community-Driven Radiological AI Deployment in Medical Imaging","url":"https://doi.org/10.2196/55833","published":"2024-10-01","authors":["Vikash Gupta","Barbaros S. Erdal","Carolina Ramirez","Ralf Floca","Brad Genereaux","Sidney Bryson","Christopher P. Bridge","Jens Kleesiek","Felix Nensa","Rickmer Braren","Khaled Younis","Tobias Penzkofer"],"abstract":"Artificial intelligence (AI) has become commonplace in solving routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. 
AI has been shown to improve efficiency in medical image generation, processing, and interpretation, and various such AI models have been developed across research laboratories worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. The goal of this paper is to give an overview of the intersection of AI and medical imaging landscapes. We also want to inform the readers about the importance of using standards in their radiology workflow and the challenges associated with deploying AI models in the clinical workflow. The main focus of this paper is to....","companies":["Microsoft","NVIDIA"],"matched_orgs":["Microsoft","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2196/55833","openalex_id":"https://openalex.org/W4403013301","cited_by_count":16,"quality_score":65,"matched_keywords":[],"author_affiliations":["Berlin Institute of Health at Charité - Universitätsmedizin Berlin","Charité - Universitätsmedizin Berlin","German Cancer Research Center","Goethe University Frankfurt","Heidelberg University","Jacksonville College","King's College London","King's College School","Massachusetts General Hospital","Mayo Clinic in Florida","Microsoft (United States)","National Computing Centre (United Kingdom)","Nvidia (United States)","Seoul National University Hospital","SimulConsult","Technical University of Munich","University Hospital Frankfurt","University of California, San Francisco","University of Pennsylvania","WinnMed"],"concepts":[{"id":"https://openalex.org/C43169469","display_name":"Preprint","score":0.8930966258049011},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.6496267914772034},{"id":"https://openalex.org/C190892606","display_name":"Radiological 
weapon","score":0.5706585049629211},{"id":"https://openalex.org/C48103436","display_name":"State (computer science)","score":0.5502569079399109},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.45636409521102905},{"id":"https://openalex.org/C19527891","display_name":"Medical physics","score":0.408285528421402},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3887367844581604},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.37262892723083496}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/wavecoder-widespread-and-versatile-enhanced-instruction-tuning-with-refined-data-generation","title":"WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation","url":"https://www.microsoft.com/en-us/research/publication/wavecoder-widespread-and-versatile-enhanced-instruction-tuning-with-refined-data-generation/","published":"2024-10-01","authors":["Zhaojian Yu","Xin Zhang","Ning Shang","Yangyu Huang","Can Xu","Yishujie Zhao","Wenxiang Hu","Qiufeng Yin"],"abstract":"Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can obtain impressive capabilities to address a wide range of code-related tasks. However, current instruction tuning methods for Code LLMs mainly focus on the traditional code generation task, resulting in poor performance in complex multi-task scenarios. In this paper, we concentrate on multiple code-related tasks and present WaveCoder, a series of Code LLMs trained with Widespread And Versatile Enhanced instruction data. 
To enable the models to tackle complex coderelated tasks, we propose a method to stably generate diverse, high-quality instruction data from open source code dataset in multitask scenarios and obtain CodeSeaXDataset, a dataset comprising 19,915 instruction instances across 4 code-related tasks, which is aimed at improving the generalization ability of Code LLM. Our experime...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/topic-conversation-relevance-tcr-dataset-and-benchmarks","title":"Topic-Conversation Relevance (TCR) Dataset and Benchmarks","url":"https://www.microsoft.com/en-us/research/publication/topic-conversation-relevance-tcr-dataset-and-benchmarks/","published":"2024-10-01","authors":["Yaran Fan","Jamie Pool","Senja Filipi","Ross Cutler"],"abstract":"Workplace meetings are vital to organizational collaboration, yet a large percentage of meetings are rated as ineffective. To help improve meeting effectiveness by understanding if the conversation is on topic, we create a comprehensive Topic-Conversation Relevance (TCR) Dataset that covers a variety of domains and meeting styles. The TCR dataset includes 1,500 unique meetings, 22,000 words in transcripts, and over 15,000 meeting topics, sourced from both newly collected Speech Interruption Meeting (SIM) data and existing public datasets. 
Along with the text data, we also open-source scripts to generate synthetic meetings or create augmented meetings from the TCR dataset to enhance the data diversity. For each data source, benchmarks are created using GPT-4 to evaluate the model accuracy in understanding transcription-topic relevance.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Human-computer interaction","organizational collaboration"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/meta-diffub-a-contextualized-sequence-to-sequence-text-diffusion-model-with-meta-exploration","title":"Meta-Diffu$B$: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration","url":"https://www.microsoft.com/en-us/research/publication/meta-diffub-a-contextualized-sequence-to-sequence-text-diffusion-model-with-meta-exploration/","published":"2024-10-01","authors":["Kevin Lin"],"abstract":"The diffusion model, a new generative modeling paradigm, has achieved significant success in generating images, audio, video, and text. It has been adapted for sequence-to-sequence text generation (Seq2Seq) through DiffuSeq, termed S2S Diffusion. Existing S2S-Diffusion models predominantly rely on fixed or hand-crafted rules to schedule noise during the diffusion and denoising processes. However, these models are limited by non-contextualized noise, which fails to fully consider the characteristics of Seq2Seq tasks. 
In this paper, we propose the Meta-Diffu$B$ framework—a novel scheduler-exploiter S2S-Diffusion paradigm designed to overcome the limitations of existing S2S-Diffusion models. We employ Meta-Exploration to train an additional scheduler model dedicated to scheduling contextualized noise for each sentence. Our exploiter model, an S2S-Diffusion model, leverages the noise schedul...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","Diffusion models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/igor-image-goal-representations-are-the-atomic-control-units-for-foundation-models-in-embodied-ai","title":"IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI","url":"https://www.microsoft.com/en-us/research/publication/igor-image-goal-representations-are-the-atomic-control-units-for-foundation-models-in-embodied-ai/","published":"2024-10-01","authors":["Xiaoyu Chen","Junliang Guo","Tianyu He","Chuheng Zhang","Pushi Zhang","Derek Yang","Li Zhao","Jiang Bian"],"abstract":"We introduce Image-GOal Representations (IGOR), aiming to learn a unified, semantically consistent action space across human and various robots. Through this unified latent action space, IGOR enables knowledge transfer among large-scale robot and human activity data. We achieve this by compressing visual changes between an initial image and its goal state into latent actions. 
IGOR allows us to generate latent action labels for internet-scale video data. This unified latent action space enables the training of foundation policy and world models across a wide variety of tasks performed by both robots and humans. We demonstrate that: (1) IGOR learns a semantically consistent action space for both human and robots, characterizing various possible motions of objects representing the physical interaction knowledge; (2) IGOR can “migrate” the movements of the object in the one video to other vi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Robotics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gorilla-teaching-llms-to-use-tools","title":"Gorilla: Teaching LLMs to Use Tools","url":"https://www.microsoft.com/en-us/research/publication/gorilla-teaching-llms-to-use-tools/","published":"2024-10-01","authors":["Xin Wang"],"abstract":"Large Language Models (LLMs) have seen an im-pressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. How-ever, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today’s state-of- the-art LLMs such as GPT-4 largely due to their unawareness of what APIs are available and how to use them in a frequently updated toolset. 
We develop Gorilla, a finetuned LLaMA model that surpasses the performance of GPT-4 on writing API calls. When combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, enabling flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model’s ability, we introduce....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-large-scale-human-centric-benchmark-for-referring-expression-comprehension-in-the-lmm-era","title":"A Large-Scale Human-Centric Benchmark for Referring Expression Comprehension in the LMM Era","url":"https://www.microsoft.com/en-us/research/publication/a-large-scale-human-centric-benchmark-for-referring-expression-comprehension-in-the-lmm-era/","published":"2024-10-01","authors":["Fangyun Wei"],"abstract":"Prior research in human-centric AI has primarily addressed single-modality tasks like pedestrian detection, action recognition, and pose estimation. However, the emergence of large multimodal models (LMMs) such as GPT-4V has redirected attention towards integrating language with visual content. Referring expression comprehension (REC) represents a prime example of this multimodal approach. 
Current human-centric REC benchmarks, typically sourced from general datasets, fall short in the LMM era due to their limitations, such as insufficient testing samples, overly concise referring expressions, and limited vocabulary, making them inadequate for evaluating the full capabilities of modern REC models. In response, we present HC-RefLoCo (Human-Centric Referring Expression Comprehension with Long Context), a benchmark that includes 13,452 images, 24,129 instances, and 44,738 detailed annotation...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","large multimodal models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/alchemy-amplifying-theorem-proving-capability-through-symbolic-mutation","title":"Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation","url":"https://www.microsoft.com/en-us/research/publication/alchemy-amplifying-theorem-proving-capability-through-symbolic-mutation/","published":"2024-10-01","authors":["Shaonan Wu","Shuai Lu","Yeyun Gong","Nan Duan","Ping Wei"],"abstract":"Formal proofs are challenging to write even for experienced experts. Recent progress in Neural Theorem Proving (NTP) shows promise in expediting this process. However, the formal corpora available on the Internet are limited compared to the general text, posing a significant data scarcity challenge for NTP. 
To address this issue, this work proposes Alchemy, a general framework for data synthesis that constructs formal theorems through symbolic mutation. Specifically, for each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it. Subsequently, we mutate the candidate theorem by replacing the corresponding term in the statement with its equivalent form or antecedent. As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M. Furthermore, we perform continual pretraining and supervised...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Manual","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403021930","title":"Video Echoed in Harmony: Learning and Sampling Video-Integrated Chord Progression Sequences for Controllable Video Background Music Generation","url":"https://doi.org/10.1109/tcss.2024.3451515","published":"2024-10-01","authors":["Xinyi Tong","Sitong Chen","Peiyang Yu","Nian Liu","Hui Qv","Tao Ma","Bo Zheng","Feng Yu","Song‐Chun Zhu"],"abstract":"Automatically generating video background music mitigates the inefficiency and time-consuming drawbacks of current manual video editing. Two key challenges hinder the expansion of the inception of video-to-music tasks. 1) Limited availability of high-quality video–music datasets and annotations. 
2) Absence of music generation methods that consider actual musicality, which are controlled by interpretable factors based on music theory. In the article, we propose video echoed in harmony (VEH), a method for learning and sampling video-integrated chord progression sequences. Our approach adopts harmony, represented by chord progressions that are aligned with various music formats [musical instrument digital interface (MIDI), audio, and score], imitating chord precedence in human music composition. Visual-language models link visual features to chord progressions through genre labels and descr...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcss.2024.3451515","openalex_id":"https://openalex.org/W4403021930","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beijing Academy of Artificial Intelligence","Beijing Institute for General Artificial Intelligence","Central Conservatory of Music","China Aerodynamics Research and Development Center","China Aerospace Science and Industry Corporation (China)"],"concepts":[{"id":"https://openalex.org/C194147245","display_name":"Chord (peer-to-peer)","score":0.6498854756355286},{"id":"https://openalex.org/C2776453491","display_name":"Harmony (color)","score":0.6303609609603882},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6150108575820923},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.45311295986175537},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43375518918037415},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.39329397678375244},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.37472301721572876},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.1367141306400299}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:d7a3e839260afb25","title":"Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning","url":"https://research.nvidia.com/publication/2024-10_proto-clip-vision-language-prototypical-network-few-shot-learning","published":"2024-10","authors":["Jishnu Jaykumar P","Kamalesh Palanisamy","Yu-Wei Chao","Xinya Du","Yu Xiang"],"abstract":"Official NVIDIA Research publication. IROS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/iros58592.2024.10801660","openalex_id":"https://openalex.org/W4405785376","cited_by_count":8,"quality_score":64,"matched_keywords":["IROS"],"author_affiliations":["NVIDIA","Nvidia (United States)","The University of Texas at Dallas"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=1"}},{"id":"official:68e646678f20bd19","title":"Claude Haiku 3.5 and Sonnet 3.5 (new) System Card","url":"https://www-cdn.anthropic.com/c7822cdc35ad788ec87e14b3a9d45010f1f86c38.pdf","published":"2024-10","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Haiku 3.5 and Sonnet 3.5 
(new).","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Haiku 3.5 and Sonnet 3.5 (new)"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"openalex:W4403003238","title":"DynamiCrafter: Animating Open-Domain Images with Video Diffusion Priors","url":"https://doi.org/10.1007/978-3-031-72952-2_23","published":"2024-09-30","authors":["Jinbo Xing","Menghan Xia","Yong Zhang","Haoxin Chen","Wangbo Yu","Hanyuan Liu","Gongye Liu","Xintao Wang","Ying Shan","Tien‐Tsin Wong"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72952-2_23","openalex_id":"https://openalex.org/W4403003238","cited_by_count":74,"quality_score":67,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8586305379867554},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.6656001210212708},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6178666353225708},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.6164827346801758},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5304678678512573},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5169236063957214},{"id":"https://openalex.org/C107673813","display_name":"Bayesian probability","score":0.10909545421600342},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.07488089799880981}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":74}},{"id":"official:17a917f7e4a5548a","title":"Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG","url":"https://ai.meta.com/research/publications/ingest-and-ground-dispelling-hallucinations-from-continually-pretrained-llms-with-rag/","published":"2024-09-30","authors":["Chenhao Fang","Derek Larson","Shitong Zhu","Sophie Zeng","Wendy Summer","Yanqing Peng","Yuriy Hulovatyy","Rajeev Rao","Gabriel Forgues","Arya Pudota","Alex Goncalves","Hervé Robert"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=9"}},{"id":"openalex:W7128367332","title":"Adaptive Neural Feedback Methods for Bias and Weight Adjustment in Feed Forward Layers of LLMs","url":"https://doi.org/10.32628/ijsrst52310380","published":"2024-09-30","authors":["Sai Sukesh Reddy Tummuri"],"abstract":"Feed-forward layers constitute the dominant computational and parametric component of transformer-based Large Language Models (LLMs), yet they are a major source of training instability due to static bias terms, uncontrolled weight scaling, and 
activation distribution drift. Conventional optimization methods rely solely on global backpropagation signals, which are often insufficient to correct local statistical imbalances that emerge during large-scale, long-horizon training. This work proposes AFB-FFN (Adaptive Feedback Bias and Weight Corrected Feed-Forward Network), a novel feed-forward layer architecture that integrates an internal neural feedback mechanism to dynamically regulate bias and weight behavior during forward propagation. The proposed model introduces lightweight feedback units that generate bias correction vectors and weight gating signals conditioned on intermediate acti...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.32628/ijsrst52310380","openalex_id":"https://openalex.org/W7128367332","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6852999925613403},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.614799976348877},{"id":"https://openalex.org/C155032097","display_name":"Backpropagation","score":0.5957000255584717},{"id":"https://openalex.org/C117251300","display_name":"Parametric statistics","score":0.501800000667572},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.48330000042915344},{"id":"https://openalex.org/C47446073","display_name":"Control theory (sociology)","score":0.42179998755455017},{"id":"https://openalex.org/C38858127","display_name":"Feed forward","score":0.40149998664855957},{"id":"https://openalex.org/C106301342","display_name":"Entropy (arrow of 
time)","score":0.3817000091075897}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4402997354","title":"Parrot: Pareto-Optimal Multi-reward Reinforcement Learning Framework for Text-to-Image Generation","url":"https://doi.org/10.1007/978-3-031-72920-1_26","published":"2024-09-30","authors":["Seung Hyun Lee","Yinxiao Li","Junjie Ke","Innfarn Yoo","Han Zhang","Jiahui Yu","Qifei Wang","Fei Deng","Glenn Entis","Junfeng He","Gang Li","Sangpil Kim"],"abstract":"","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72920-1_26","openalex_id":"https://openalex.org/W4402997354","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Google (United Kingdom)","Google (United States)","Google DeepMind (United Kingdom)","Korea University","OpenAI (United States)","Rutgers, The State University of New Jersey"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.8186089396476746},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8075001239776611},{"id":"https://openalex.org/C2986314615","display_name":"Pareto optimal","score":0.6466482877731323},{"id":"https://openalex.org/C137635306","display_name":"Pareto principle","score":0.5637431740760803},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.527445375919342},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5246458649635315},{"id":"https://openalex.org/C126255220","display_name":"Mathematical optimization","score":0.30887019634246826},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.28165119886398315}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4402997464","title":"Idea2Img: Iterative Self-refinement with GPT-4V for Automatic Image Design and Generation","url":"https://doi.org/10.1007/978-3-031-72920-1_10","published":"2024-09-30","authors":["Zhengyuan Yang","Jianfeng Wang","Linjie Li","Kevin Lin","Chung-Ching Lin","Zicheng Liu","Lijuan Wang"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72920-1_10","openalex_id":"https://openalex.org/W4402997464","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8526561260223389},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5235233306884766},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44184398651123047},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3998892307281494},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.381686270236969},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.35526055097579956}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403002333","title":"Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models","url":"https://doi.org/10.1007/978-3-031-72952-2_25","published":"2024-09-30","authors":["Chen Ju","Haicheng Wang","Haozhe Cheng","Xu 
Chen","Zhonghua Zhai","Weilin Huang","Jinsong Lan","Shuai Xiao","Bo Zheng"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72952-2_25","openalex_id":"https://openalex.org/W4403002333","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8407117128372192},{"id":"https://openalex.org/C2776240298","display_name":"Turbo","score":0.6835007667541504},{"id":"https://openalex.org/C117896860","display_name":"Acceleration","score":0.5411369800567627},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3643677532672882},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3632965087890625},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3524113595485687},{"id":"https://openalex.org/C171146098","display_name":"Automotive engineering","score":0.11731305718421936},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.06955534219741821}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interactive-speculative-planning-enhance-agent-efficiency-through-co-design-of-system-and-user-interface","title":"Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User 
Interface","url":"https://www.microsoft.com/en-us/research/publication/interactive-speculative-planning-enhance-agent-efficiency-through-co-design-of-system-and-user-interface/","published":"2024-09-29","authors":["Wenyue Hua","Mengting Wan","Shashank Vadrevu","Ryan Nadel","Yongfeng Zhang","Chi Wang"],"abstract":"Agents, as user-centric tools, are increasingly deployed for human task delegation, assisting with a broad spectrum of requests by generating thoughts, engaging with user proxies, and producing action plans. However, agents based on large language models (LLMs) often face substantial planning latency due to two primary factors: the efficiency limitations of the underlying LLMs due to their large size and high demand, and the structural complexity of the agents due to the extensive generation of intermediate thoughts to produce the final output. Given that inefficiency in service provision can undermine the value of automation for users, this paper presents a human-centered efficient agent planning method -- Interactive Speculative Planning -- aiming at enhancing the efficiency of agent planning through both system design and human-AI interaction. 
Our approach advocates for the co-design....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Algorithms","Artificial intelligence","Computer science","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/social-conjuring-multi-user-runtime-collaboration-with-ai-in-building-virtual-3d-worlds","title":"Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds","url":"https://www.microsoft.com/en-us/research/publication/social-conjuring-multi-user-runtime-collaboration-with-ai-in-building-virtual-3d-worlds/","published":"2024-09-29","authors":["Amina Kobenova","C. DeVeaux","Samyak Parajuli","Andrzej Banburski-Fahey","Judith Amores","Jaron Lanier"],"abstract":"Generative artificial intelligence has shown promise in prompting virtual worlds into existence, yet little attention has been given to understanding how this process unfolds as social interaction. We present Social Conjurer, a framework for AI-augmented dynamic 3D scene co-creation, where multiple users collaboratively build and modify virtual worlds in real-time. Through an expanded set of interactions, including social and tool-based engagements as well as spatial reasoning, our framework facilitates the creation of rich, diverse virtual environments. 
Findings from a preliminary user study (N=12) provide insight into the user experience of this approach, how social contexts shape the prompting of spatial environments, and perspective on social applications of prompt-based 3D co-creation. In addition to highlighting the potential of AI-supported multi-user world creation and offering n...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/maia-2-a-unified-model-for-human-ai-alignment-in-chess","title":"Maia-2: A Unified Model for Human-AI Alignment in Chess","url":"https://www.microsoft.com/en-us/research/publication/maia-2-a-unified-model-for-human-ai-alignment-in-chess/","published":"2024-09-29","authors":["Zhenwei Tang","Difan Jiao","Reid McIlroy-Young","Jon Kleinberg","Siddhartha Sen","Ashton Anderson"],"abstract":"There are an increasing number of domains in which artificial intelligence (AI) systems both surpass human ability and accurately model human behavior. This introduces the possibility of algorithmically-informed teaching in these domains through more relatable AI partners and deeper insights into human decision-making. Critical to achieving this goal, however, is coherently modeling human behavior at various skill levels. 
Chess is an ideal model system for conducting research into this kind of human-AI alignment, with its rich history as a pivotal testbed for AI research, mature superhuman AI systems like AlphaZero, and precise measurements of skill via chess rating systems. Previous work in modeling human decision-making in chess uses completely independent models to capture human style at different skill levels, meaning they lack coherence in their ability to adapt to the full spectrum...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402952458","title":"LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models","url":"https://doi.org/10.1007/978-3-031-72775-7_2","published":"2024-09-29","authors":["Hao Zhang","Hongyang Li","Feng Li","Tianhe Ren","Xueyan Zou","Shilong Liu","Shijia Huang","Jianfeng Gao","Leizhang","Chunyuan Li","Jainwei Yang"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72775-7_2","openalex_id":"https://openalex.org/W4402952458","cited_by_count":32,"quality_score":67,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Hong Kong University of Science and Technology","Microsoft (United States)","South China University of Technology","Tsinghua 
University","University of Wisconsin–Madison"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8079761266708374},{"id":"https://openalex.org/C156325361","display_name":"Grounded theory","score":0.5340344905853271},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4476802945137024},{"id":"https://openalex.org/C168993435","display_name":"Ground","score":0.41184112429618835},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3398445248603821},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3235314190387726},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.11914825439453125},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.06618297100067139}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":32}},{"id":"openalex:W4402955558","title":"ST-LLM: Large Language Models Are Effective Temporal Learners","url":"https://doi.org/10.1007/978-3-031-72998-0_1","published":"2024-09-29","authors":["Ruyang Liu","Chen Li","Haoran Tang","Yixiao Ge","Ying Shan","Ge Li"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72998-0_1","openalex_id":"https://openalex.org/W4402955558","cited_by_count":22,"quality_score":63,"matched_keywords":["LLM"],"author_affiliations":["Peking University","Peng Cheng Laboratory","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.870120644569397},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.4050635099411011},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37410271167755127},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.35043203830718994}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":22}},{"id":"openalex:W4402961833","title":"Event Camera Data Dense Pre-training","url":"https://doi.org/10.1007/978-3-031-72775-7_17","published":"2024-09-29","authors":["Yan Yang","Liyuan Pan","Liu Liu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72775-7_17","openalex_id":"https://openalex.org/W4402961833","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Australian National University","Beijing Institute of Technology","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.859595537185669},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.643074095249176},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6219997406005859},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.41666334867477417},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4158868193626404},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4125572144985199},{"id":"https://openalex.org/C62520636","display_name":"Quantum 
mechanics","score":0.0},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4402943979","title":"Prompting Future Driven Diffusion Model for Hand Motion Prediction","url":"https://doi.org/10.1007/978-3-031-72667-5_10","published":"2024-09-28","authors":["Bowen Tang","Kaihao Zhang","Wenhan Luo","Wei Liu","Hongdong Li"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72667-5_10","openalex_id":"https://openalex.org/W4402943979","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Australian National University","Harbin Institute of Technology","Hong Kong University of Science and Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8359571695327759},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.5749840140342712},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5503022074699402},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44948065280914307},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.34616196155548096},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.06162169575691223},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4403075657","title":"D-Rax: Domain-Specific Radiologic 
Assistant Leveraging Multi-modal Data and eXpert Model Predictions","url":"https://doi.org/10.1007/978-3-031-73471-7_10","published":"2024-09-28","authors":["Hareem Nisar","Syed Muhammad Anwar","Zhifan Jiang","Abhijeet Parida","Ramon Sanchez","Vishwesh Nath","Holger R. Roth","Marius George Linguraru"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-73471-7_10","openalex_id":"https://openalex.org/W4403075657","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Children's National","George Washington University","Nvidia (United States)","Universidad Politécnica de Madrid"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8521827459335327},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.6713112592697144},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6597599387168884},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4489811360836029},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.3350261449813843},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3315524756908417},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/easy2hard-bench-standardized-difficulty-labels-for-profiling-llm-performance-and-generalization","title":"Easy2Hard-Bench: Standardized Difficulty Labels for 
Profiling LLM Performance and Generalization","url":"https://www.microsoft.com/en-us/research/publication/easy2hard-bench-standardized-difficulty-labels-for-profiling-llm-performance-and-generalization/","published":"2024-09-26","authors":["Mucong Ding","Chenghao Deng","Jocelyn Choo","Zichu Wu","Aakriti Agrawal","Avi Schwarzschild","Tianyi Zhou","Tom Goldstein","John Langford","A. Anandkumar","Furong Huang"],"abstract":"While generalization over tasks from easy to hard is crucial to profile language models (LLMs), the datasets with fine-grained difficulty annotations for each problem across a broad range of complexity are still blank. Aiming to address this limitation, we present Easy2Hard-Bench, a consistently formatted collection of 6 benchmark datasets spanning various domains, such as mathematics and programming problems, chess puzzles, and reasoning questions. Each problem within these datasets is annotated with numerical difficulty scores. To systematically estimate problem difficulties, we collect abundant performance data on attempts to each problem by humans in the real world or LLMs on the prominent leaderboard. 
Leveraging the rich performance data, we apply well-established difficulty ranking systems, such as Item Response Theory (IRT) and Glicko-2 models, to uniformly assign numerical diffic...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:9b44a1a37d3799a6","title":"Unveiling the Role of Pretraining in Direct Speech Translation","url":"https://ai.meta.com/research/publications/unveiling-the-role-of-pretraining-in-direct-speech-translation/","published":"2024-09-26","authors":["Belen Alastruey","Gerard I. Gállego","Marta R. 
Costa-jussa"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Speech & Audio","NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=9"}},{"id":"apple:zk0lik8e4f46s026e3iqcaj4","title":"Contextualization of ASR with LLM Using Phonetic Retrieval-Based Augmentation","url":"https://machinelearning.apple.com/research/asr-contextualization","published":"2024-09-26","authors":["Zhihong Lei","Xingyu Na","Mingbin Xu","Ernest Pusateri","Christophe Van Gysel","Yuanyuan Zhang","Shiyi Han","Zhen Huang"],"abstract":"Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input. However, it remains a challenge for the model to recognize personal named entities, such as contacts in a phone book, when the input modality is speech. 
In this work, we start with a speech recognition task and propose a retrieval-based solution to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4402859709","title":"Mutual Prompt Leaning for Vision Language Models","url":"https://doi.org/10.1007/s11263-024-02243-z","published":"2024-09-26","authors":["Sifan Long","Zhen Zhao","Junkun Yuan","Zichang Tan","Jiangjiang Liu","Jingyuan Feng","Shengsheng Wang","Jingdong Wang"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-024-02243-z","openalex_id":"https://openalex.org/W4402859709","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Baidu (China)","Jilin University","The University of Sydney","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.689325213432312},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5952543020248413},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5114092230796814},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4561121463775635},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.44266968965530396}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mnemosyne-parallelization-strategies-for-efficiently-serving-multi-million-context-length-llm-inference-requests-without-approximations","title":"Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations","url":"https://www.microsoft.com/en-us/research/publication/mnemosyne-parallelization-strategies-for-efficiently-serving-multi-million-context-length-llm-inference-requests-without-approximations/","published":"2024-09-25","authors":["Amey Agrawal","Junda Chen","Íñigo Goiri","Ramachandran Ramjee","Chaojie Zhang","Alexey Tumanov","Esha Choukse"],"abstract":"As large language models (LLMs) evolve to handle increasingly longer contexts, serving inference requests for context lengths in the range of millions of tokens presents unique challenges. While existing techniques are effective for training, they fail to address the unique challenges of inference, such as varying prefill and decode phases and their associated latency constraints - like Time to First Token (TTFT) and Time Between Tokens (TBT). Furthermore, there are no long context inference solutions that allow batching requests to increase the hardware utilization today.In this paper, we propose three key innovations for efficient interactive long context LLM inference, without resorting to any approximation: adaptive chunking to reduce prefill overheads in mixed batching, Sequence Pipeline Parallelism (SPP) to lower TTFT, and KV Cache Parallelism (KVP) to minimize TBT. 
These contribut...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402890475","title":"Foundation models in robotics: Applications, challenges, and the future","url":"https://doi.org/10.1177/02783649241281508","published":"2024-09-25","authors":["Roya Firoozi","Johnathan Tucker","Stephen Tian","Anirudha Majumdar","Jiankai Sun","Weiyu Liu","Yuke Zhu","Shuran Song","Ashish Kapoor","Karol Hausman","Brian Ichter","Danny Driess"],"abstract":"We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In contrast, foundation models pretrained on internet-scale data appear to have superior generalization capabilities, and in some instances display an emergent ability to find zero-shot solutions to problems that are not present in the training data. Foundation models may hold the potential to enhance various components of the robot autonomy stack, from perception to decision-making and control. For example, large language models can generate code or provide common sense reasoning, while vision-language models enable open-vocabulary visual recognition. 
However, significant open research challenges remain, particularly around the scarcity of robot-relevant tra...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1177/02783649241281508","openalex_id":"https://openalex.org/W4402890475","cited_by_count":154,"quality_score":67,"matched_keywords":[],"author_affiliations":["Google (United States)","Nvidia (United States)","Princeton University","Shanghai Jiao Tong University","Stanford University","Technische Universität Berlin","The University of Texas at Austin"],"concepts":[{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.7732030153274536},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6493630409240723},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.647380530834198},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.44942545890808105},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.42777904868125916},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.3638927936553955},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.10178869962692261},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":154}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rar-retrieval-augmented-retrieval-for-code-generation-in-low-resource-languages","title":"RAR: Retrieval Augmented Retrieval for Code Generation in Low Resource 
Languages","url":"https://www.microsoft.com/en-us/research/publication/rar-retrieval-augmented-retrieval-for-code-generation-in-low-resource-languages/","published":"2024-09-24","authors":["Avik Dutta","Mukul Singh","Sumit Gulwani","Vu Le","Gust Verbruggen"],"abstract":"Language models struggle in generating correct code for low resource programming languages, since these are underrepresented in training data. Popular approaches use either examples or documentation to improve the performance of these models. Instead of considering the independent retrieval of this information, we introduce retrieval augmented retrieval (RAR) as a two-step retrieval method for selecting relevant examples and documentation. Extensive experiments on two low resource languages (Power Query M and OfficeScript) show that RAR outperforms example or grammar retrieval techniques (2.81–26.14%). Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.emnlp-main.1199","openalex_id":"https://openalex.org/W4404781864","cited_by_count":1,"quality_score":85,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Programming languages and software engineering","Computer science","Natural language processing","1970-01-01","retrieval"],"author_affiliations":["Microsoft","Microsoft (Belgium)","Microsoft (United States)","Microsoft Research (India)","Microsoft Research (United Kingdom)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/one-to-many-testing-for-code-generation-from-just-natural-language","title":"One-to-many testing for code generation from (just) natural language","url":"https://www.microsoft.com/en-us/research/publication/one-to-many-testing-for-code-generation-from-just-natural-language/","published":"2024-09-24","authors":["Mansi Uniyal","Mukul Singh","Gust Verbruggen","Sumit Gulwani","Vu Le"],"abstract":"MBPP is a popular dataset for evaluating models on the task of code generation. Despite its popularity there are three problems with the original MBPP: (1) reliance on providing test cases to generate the right signature, (2) contamination of the exact phrasing being present in training datasets, and (3) poor alignment between instruction and evaluation testcases. To overcome this, we create MBUPP, by adapting the popular MBPP dataset for code generation from natural language to emphasize on the natural language aspect by evaluating generated code on multiple sets of assertions. Additionally, we update the text descriptions to remove ambiguity and instructions that are not evaluated by the assertions, like specific algorithms to use. This adapted dataset resolves the challenges around contamination, ambiguity and testcase alignment. 
Further, we compare popular open and closed weight mode...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Programming languages and software engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402758159","title":"Defining our future with generative AI","url":"https://doi.org/10.1038/s43588-024-00694-5","published":"2024-09-24","authors":["Siddharth Suri"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s43588-024-00694-5","openalex_id":"https://openalex.org/W4402758159","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7339126467704773},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3393491804599762},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.32307615876197815},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3165815472602844},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.28172218799591064},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.26654255390167236},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.2524171471595764}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4409133369","title":"Towards Faster Graph Partitioning via Pre-Training and Inductive Inference","url":"https://doi.org/10.1109/hpec62836.2024.10938459","published":"2024-09-23","authors":["Meng Qin","Chaorui Zhang","Yu Gao","Yibin Ding","Weipeng Jiang","Weixi Zhang","Wei Han","Bo Bai"],"abstract":"Graph partitioning (GP) is a classic problem that divides the node set of a graph into densely-connected blocks. Following the IEEE HPEC Graph Challenge and recent advances in pre-training techniques (e.g., large-language models), we propose PR-GPT (Pre-trained & Refined Graph ParTitioning) based on a novel pre-training & refinement paradigm. We first conduct the offline pre-training of a deep graph learning (DGL) model on small synthetic graphs with various topology properties. By using the inductive inference of DGL, one can directly generalize the pre-trained model (with frozen model parameters) to large graphs and derive feasible GP results. We also use the derived partition as a good initialization of an efficient GP method (e.g., InfoMap) to further refine the quality of partitioning. 
In this setting, the online generalization and refinement of PR-GPT can not only benefit from the....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/hpec62836.2024.10938459","openalex_id":"https://openalex.org/W4409133369","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["Central Research Institute","Hong Kong University of Science and Technology","Huawei Technologies (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7453426122665405},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6745091676712036},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5257885456085205},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.49606451392173767},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.45740604400634766},{"id":"https://openalex.org/C88230418","display_name":"Graph theory","score":0.43073606491088867},{"id":"https://openalex.org/C21563000","display_name":"Inductive reasoning","score":0.4282938838005066},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3703955113887787}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4402794143","title":"Advancing robots with greater dynamic dexterity: A large-scale multi-view and multi-modal dataset of human-human throw&catch of arbitrary objects","url":"https://doi.org/10.1177/02783649241275674","published":"2024-09-23","authors":["Lipeng Chen","Jianing Qiu","Lin Li","Xi Luo","Guoyi Chi","Yu Zheng"],"abstract":"Learning and imitating 
behavioral intelligence from human demonstrations is a promising approach towards the intuitive programming of robots for enhanced dynamic dexterity. However, there has been no publicly available dataset in this domain. To address this gap, we introduce the first large-scale dataset and recording framework specifically designed for studying human collaborative dynamic dexterity in throw&catch tasks. The dataset, named H 2 TC, contains 15,000 multi-view and multi-modal synchronized recordings of diverse Human-Human Throw-and-Catch activities. It involves 34 human subjects with typical motor abilities and a variety of 52 objects frequently manipulated through throw&catch in domestic and/or industrial scenarios. The dataset is supplemented with a hierarchy of manually annotated semantic and dense labels, such as the ground truth human body, hand and object motions cap...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1177/02783649241275674","openalex_id":"https://openalex.org/W4402794143","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Imperial College London","King's College London","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.6814022064208984},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6449421644210815},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6167473793029785},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5414199829101562},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.535705029964447},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3636510968208313},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.3397255837917328},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.13736045360565186}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4402718416","title":"IgGM: A Generative Model for Functional Antibody and Nanobody Design","url":"https://doi.org/10.1101/2024.09.19.613838","published":"2024-09-22","authors":["Rubo Wang","Fandi Wu","Xingyu Gao","Jiaxiang Wu","Peilin Zhao","Jianhua Yao"],"abstract":"A bstract Immunoglobulins are crucial proteins produced by the immune system to identify and bind to foreign substances, playing an essential role in shielding organisms from infections and diseases. Designing specific antibodies opens new pathways for disease treatment. With the rise of deep learning, AI-driven drug design has become possible, leading to several methods for antibody design. However, many of these approaches require additional conditions that differ from real-world scenarios, making it challenging to incorporate them into existing antibody design processes. Here, we introduce IgGM, a generative model for the de novo design of immunoglobulins with functional specificity. 
IgGM produces antibody sequences and structures simultaneously for a given antigen, consisting of three core components: a pre-trained language model for extracting sequence features, a feature learning m...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2024.09.19.613838","openalex_id":"https://openalex.org/W4402718416","cited_by_count":16,"quality_score":57,"matched_keywords":["language model"],"author_affiliations":["Chinese Academy of Sciences","Institute of Microelectronics","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6403594017028809},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4858417212963104},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.48256009817123413},{"id":"https://openalex.org/C159654299","display_name":"Antibody","score":0.4745541214942932},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.3245340585708618},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2664376199245453},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.21022284030914307},{"id":"https://openalex.org/C203014093","display_name":"Immunology","score":0.12408369779586792}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4402782221","title":"FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention","url":"https://doi.org/10.1007/s11263-024-02227-z","published":"2024-09-19","authors":["Guangxuan Xiao","Tianwei Yin","William T. 
Freeman","Frédo Durand","Song Han"],"abstract":"Abstract Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to the subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend identity among subjects. We present FastComposer which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual instructions with only forward passes . To address the identity blending problem in the multi-subject generation, FastComposer proposes cross-attention localization supervision during....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s11263-024-02227-z","openalex_id":"https://openalex.org/W4402782221","cited_by_count":109,"quality_score":75,"matched_keywords":["personalized","efficient"],"author_affiliations":["Massachusetts Institute of Technology","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.680243968963623},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5815449357032776},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5716103315353394},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5315746068954468},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.5268574953079224},{"id":"https://openalex.org/C2777855551","display_name":"Subject (documents)","score":0.5214535593986511},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.41013413667678833},{"id":"https://openalex.org/C161191863","display_name":"Library science","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":109}},{"id":"official:394fa46ab8849cca","title":"Qwen2.5-LLM: Extending the boundary of LLMs","url":"https://qwenlm.github.io/blog/qwen2.5-llm/","published":"2024-09-19","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDIntroduction In this blog, we delve into the details of our latest Qwen2.5 series language models. We have developed a range of decoder-only dense models, with seven of them open-sourced, spanning from 0.5B to 72B parameters. Our research indicates a significant interest among users in models within the 10-30B range for production use, as well as 3B models for mobile applications. 
To meet these demands, we are open-sourcing Qwen2.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"official:12bd6088ab057738","title":"Qwen2.5: A Party of Foundation Models!","url":"https://qwenlm.github.io/blog/qwen2.5/","published":"2024-09-19","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDIntroduction In the past three months since Qwen2’s release, numerous developers have built new models on the Qwen2 language models, providing us with valuable feedback. During this period, we have focused on creating smarter and more knowledgeable language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5. 
We are announcing what might be the largest opensource release in history!","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"official:724ed60ddfe8b822","title":"Qwen2.5-Math: The world's leading open-sourced mathematical LLMs","url":"https://qwenlm.github.io/blog/qwen2.5-math/","published":"2024-09-19","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DISCORD🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks. Introduction A month ago, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. 
Today, we have upgraded it and open-sourced Qwen2.5-Math series, including base models Qwen2.5-Math-1.5B/7B/72B, instruction-tuned models Qwen2.5-Math-1.5B/7B/72B-Instruct, and mathematical reward model Qwen2.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"official:a2ddcf2029a05f31","title":"Qwen2.5-Coder: Code More, Learn More!","url":"https://qwenlm.github.io/blog/qwen2.5-coder/","published":"2024-09-19","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDIntroduction In early April, we introduced CodeQwen1.5, which garnered significant attention from the community. Since then, we have been working to enhance the coding model. Today, we are excited to announce the release of the next generation of open-source coding models, Qwen2.5-Coder, and officially rename CodeQwen to Qwen-Coder. 
We think “Coder” is more human-like and agile, reflecting our vision of it becoming a true coding partner in the future.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4402582789","title":"A Survey on Video Diffusion Models","url":"https://doi.org/10.1145/3696415","published":"2024-09-18","authors":["Zhen Xing","Qijun Feng","Haoran Chen","Qi Dai","Han Hu","Hang Xu","Zuxuan Wu","Yu–Gang Jiang"],"abstract":"The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision, with the diffusion model playing a crucial role in this achievement. Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers, demonstrating exceptional performance not only in image generation and editing, but also in the realm of video-related research. However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain. To address this gap, this article presents a comprehensive review of video diffusion models in the AIGC era. Specifically, we begin with a concise introduction to the fundamentals and evolution of diffusion models. 
Subsequently, we present an overview of research on diffusion models in t...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1145/3696415","openalex_id":"https://openalex.org/W4402582789","cited_by_count":96,"quality_score":67,"matched_keywords":[],"author_affiliations":["Fudan University","Huawei Technologies (China)","Huawei Technologies (Sweden)","Microsoft Research Asia (China)","Shanghai Institute of Computing Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8571904897689819},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5808137059211731},{"id":"https://openalex.org/C94124525","display_name":"Categorization","score":0.5618060827255249},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5485804677009583},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5050826668739319},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4636100232601166},{"id":"https://openalex.org/C2780310081","display_name":"Video editing","score":0.4577736556529999},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.45505908131599426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":96}},{"id":"openalex:W4402592961","title":"Noise-Robust Vision-Language Pre-Training With Positive-Negative Learning","url":"https://doi.org/10.1109/tpami.2024.3462996","published":"2024-09-18","authors":["Zhenyu Huang","Mouxing Yang","Xinyan Xiao","Peng Hu","Xi Peng"],"abstract":"Vision-Language Pre-training (VLP) has shown promising performance in various tasks by learning a generic image-text representation 
space. However, most existing VLP methods encounter the Noisy Correspondence (NC) problem which refers to wrongly matched image-text pairs harvested from the wild. In this paper, we empirically study the influence of NC on the VLP model and obtain the following two observations. First, the NC will largely degrade the performance in downstream tasks even via fine-tuning, indicating the necessity of handling NC in the pre-training period. Second, the influence of NC varies in different pre-training objectives, suggesting the objective-customized solution for achieving NC robustness. Based on the above observations, we propose a novel NoisE-robust Vision-languagE pRe-training method (NEVER) to endow the VLP model with robustness against NC. In brief, NEVER firs...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2024.3462996","openalex_id":"https://openalex.org/W4402592961","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Baidu (China)","Chengdu University","Sichuan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6332820057868958},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6224743127822876},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.483020156621933},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4604414701461792},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4582591950893402},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4446980655193329},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.34545665979385376},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.34001463651657104}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/eureka-evaluating-and-understanding-large-foundation-models","title":"EUREKA: Evaluating and Understanding Large Foundation Models","url":"https://www.microsoft.com/en-us/research/publication/eureka-evaluating-and-understanding-large-foundation-models/","published":"2024-09-17","authors":["Vidhisha Balachandran","Jingya Chen","Neel Joshi","Besmira Nushi","Hamid Palangi","Eduardo Salinas","Vibhav Vineet","James Woffinden-Luey","Safoora Yousefi"],"abstract":"Rigorous and reproducible evaluation of large foundation models is critical for assessing the state of the art, informing next steps in model improvement, and for guiding scientific advances in Artificial Intelligence (AI). Evaluation is also important for informing the increasing number of application developers that build services on foundation models. The evaluation process has however become challenging in practice due to several reasons that require immediate attention from the community, including benchmark saturation, lack of transparency in the methods being deployed for measurement, development challenges in extracting the right measurements for generative tasks, and, more generally, the extensive number of capabilities that need to be considered for showing a well-rounded comparison across models. 
In addition, despite the overwhelming numbers of side-by-side capability evaluati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Tech Report","Artificial intelligence","Computer vision","Human language technologies","Machine learning","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402568552","title":"Multi-Modal 3D Object Detection by Box Matching","url":"https://doi.org/10.1109/tits.2024.3453963","published":"2024-09-17","authors":["Zhe Liu","Xiaoqing Ye","Zhikang Zou","Xinwei He","Xiao Tan","Errui Ding","Jingdong Wang","Xiang Bai"],"abstract":"Multi-modal 3D object detection has received growing attention as the information from different sensors like LiDAR and cameras are complementary. Most fusion methods for 3D detection rely on an accurate alignment and calibration between 3D point clouds and RGB images. However, such an assumption is not reliable in a real-world self-driving system, as the alignment between different modalities is easily affected by asynchronous sensors and disturbed sensor placement. We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection, which provides an alternative way for cross-modal feature alignment by learning the correspondence at the bounding box level to free up the dependency of calibration during inference. With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features. 
Ext...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tits.2024.3453963","openalex_id":"https://openalex.org/W4402568552","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Baidu (China)","Huazhong Agricultural University","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6224682927131653},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5366790294647217},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5320070385932922},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5217498540878296},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.4950965940952301},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.44443997740745544},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.4243980050086975},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.40849220752716064}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/retrievalattention-accelerating-long-context-llm-inference-via-vector-retrieval","title":"RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval","url":"https://www.microsoft.com/en-us/research/publication/retrievalattention-accelerating-long-context-llm-inference-via-vector-retrieval/","published":"2024-09-16","authors":["Di Liu","Meng Chen","Baotong Lu","Huiqiang Jiang","Qianxi Zhang","Qi Chen","Chengruidong 
Zhang","Bailu Ding","Kai Zhang","Chen Chen","Fan Yang","Yuqing Yang"],"abstract":"Transformer-based Large Language Models (LLMs) have become increasingly important. However, due to the quadratic time complexity of attention computation, scaling LLMs to longer contexts incurs extremely slow inference speed and high GPU memory consumption for caching key-value (KV) vectors. This paper proposes RetrievalAttention, a training-free approach to both accelerate attention computation and reduce GPU memory consumption. By leveraging the dynamic sparsity of attention mechanism, RetrievalAttention proposes to build approximate nearest neighbour search (ANNS) indexes for KV vectors in CPU memory and retrieve the most relevant ones through vector search during generation. Unfortunately, we observe that the off-the-shelf ANNS indexes are often ineffective for such retrieval tasks due to the out-of-distribution (OOD) between query vectors and key vectors in the attention mechanism.....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":100,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Systems and networking","Computation and Language","large language models","Machine learning","systems","1970-01-01","LLM","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/promptriever-instruction-trained-retrievers-can-be-prompted-like-language-models","title":"Promptriever: Instruction-Trained Retrievers Can Be Prompted Like 
Language Models","url":"https://www.microsoft.com/en-us/research/publication/promptriever-instruction-trained-retrievers-can-be-prompted-like-language-models/","published":"2024-09-16","authors":["Orion Weller","Ben Van Durme","Dawn Lawrie","Ashwin Paranjape","Yuhao Zhang","Jack Hessel"],"abstract":"Instruction-tuned language models (LM) are able to respond to imperative commands, providing a more natural user interface compared to their base counterparts. In this work, we present Promptriever, the first retrieval model able to be prompted like an LM. To train Promptriever, we curate and release a new instance-level instruction training set from MS MARCO, spanning nearly 500k instances. Promptriever not only achieves strong performance on standard retrieval tasks, but also follows instructions. We observe: (1) large gains (reaching SoTA) on following detailed relevance instructions (+14.3 p-MRR / +3.1 nDCG on FollowIR), (2) significantly increased robustness to lexical choices/phrasing in the query+instruction (+12.9 Robustness@10 on InstructIR), and (3) the ability to perform hyperparameter search via prompting to reliably improve retrieval performance (+1.4 average increase on BEI...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Computation and Language","Computer science","Information retrieval","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402796316","title":"Economics and Equity 
of Large Language Models: Health Care Perspective","url":"https://doi.org/10.2196/64226","published":"2024-09-16","authors":["Radha Nagarajan","Midori Kondo","Franz Salas","Emre Sezgın","Yuan Yao","Vanessa Klotzman","Sandip A. Godambe","Naqi Khan","Alfonso Limon","Graham Stephenson","Sharief Taraman","Nephi Walton"],"abstract":"Large language models (LLMs) continue to exhibit noteworthy capabilities across a spectrum of areas, including emerging proficiencies across the health care continuum. Successful LLM implementation and adoption depend on digital readiness, modern infrastructure, a trained workforce, privacy, and an ethical regulatory landscape. These factors can vary significantly across health care ecosystems, dictating the choice of a particular LLM implementation pathway. This perspective discusses 3 LLM implementation pathways-training from scratch pathway (TSP), fine-tuned pathway (FTP), and out-of-the-box pathway (OBP)-as potential onboarding points for health systems while facilitating equitable adoption. The choice of a particular pathway is governed by needs as well as affordability. 
Therefore, the risks, benefits, and economics of these pathways across 4 major cloud service providers (Amazon, M...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2196/64226","openalex_id":"https://openalex.org/W4402796316","cited_by_count":25,"quality_score":70,"matched_keywords":["LLM","long-term"],"author_affiliations":["Amazon (United States)","Children's Hospital of Orange County","Cognizant (United States)","Irvine University","Massachusetts Institute of Technology","Medical College of Wisconsin","NYU Langone Health","National Institutes of Health","Nationwide Children's Hospital","Office of Patient Care Services","Orthopaedic Specialty Institute","Scripps Research Institute","UC Irvine Health","University of California, Irvine"],"concepts":[{"id":"https://openalex.org/C43169469","display_name":"Preprint","score":0.910622239112854},{"id":"https://openalex.org/C199728807","display_name":"Equity (law)","score":0.6263935565948486},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.5780660510063171},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.5732361078262329},{"id":"https://openalex.org/C2250968","display_name":"Health equity","score":0.4110940098762512},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3521367609500885},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.33836957812309265},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.29512834548950195}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":25}},{"id":"arxiv:2409.10016","title":"AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature 
Parsing","url":"https://huggingface.co/papers/2409.10016","published":"2024-09-16","authors":["Huawei Ji","Cheng Deng","Bo Xue","Zhouyang Jin","Jiaxin Ding","Xiaoying Gan","Luoyi Fu","Xinbing Wang","Chenghu Zhou"],"abstract":"With the development of data-centric AI, the focus has shifted from model-driven approaches to improving data quality. Academic literature, as one of the crucial types, is predominantly stored in PDF formats and needs to be parsed into texts before further processing. However, parsing diverse structured texts in academic literature remains challenging due to the lack of datasets that cover various text structures. In this paper, we introduce AceParse, the first comprehensive dataset designed to support the parsing of a wide range of structured texts, including formulas, tables, lists, algorithms, and sentences with embedded mathematical expressions. Based on AceParse, we fine-tuned a multimodal model, named AceParser, which accurately parses various structured texts within academic literature. 
This model outperforms the previous state-of-the-art by 4.1% in terms of F1 score and by 5% in....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/learnings-from-a-large-scale-deployment-of-an-llm-powered-expert-in-the-loop-healthcare-chatbot","title":"Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot","url":"https://www.microsoft.com/en-us/research/publication/learnings-from-a-large-scale-deployment-of-an-llm-powered-expert-in-the-loop-healthcare-chatbot/","published":"2024-09-15","authors":["Bhuvan Sachdeva","Pragnya Ramjee","Geeta Fulari","Dr. Kaushik Murali","Mohit Jain"],"abstract":"Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. To address these, researchers released the Build Your Own expert Bot (BYOeB) platform, enabling developers to create LLM-powered chatbots with integrated expert verification. CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions. A pilot evaluation showed its potential; however the study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. 
Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52%...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Medical, health and genomics","Chatbot","Computer science","Human–computer interaction","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406265638","title":"Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models","url":"https://doi.org/10.1109/iiswc63097.2024.00026","published":"2024-09-15","authors":["Chakshu Moar","Faraz Tahmasebi","Michael Pellauer","Hyoukjun Kwon"],"abstract":"Recent large language models (LLMs) employ billions of parameters to enable broad problem-solving capabilities. Such language models also tend to be memory-bound because of the dominance of matrix-vector and matrix-matrix multiplications with low arithmetic intensity. Therefore, optimizing the memory footprint and traffic is an important optimization direction for LLMs today. Model compression methods such as quantization and parameter pruning have been actively explored to achieve memory footprint and traffic optimization. However, the accuracy-efficiency trade-off of rank pruning (i.e., low-rank decomposition) for LLMs is not well-understood yet. 
Therefore, in this work, we characterize the accuracy-efficiency trade-off of a low-rank decomposition method, specifically Tucker decomposition, on recent language models, including an open-source LLM, Llama 2.We formalize the low-rank decomp...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iiswc63097.2024.00026","openalex_id":"https://openalex.org/W4406265638","cited_by_count":1,"quality_score":58,"matched_keywords":["LLM","memory","compression","quantization","agent"],"author_affiliations":["Nvidia (United States)","University of California, Irvine"],"concepts":[{"id":"https://openalex.org/C124681953","display_name":"Decomposition","score":0.6296571493148804},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.6232246160507202},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5983108282089233},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32317960262298584},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.16947302222251892},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.08027186989784241},{"id":"https://openalex.org/C114614502","display_name":"Combinatorics","score":0.0},{"id":"https://openalex.org/C178790620","display_name":"Organic chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4406262129","title":"Procedures for Evaluating Classical, Quantum, and Hybrid Machine Learning Algorithms","url":"https://doi.org/10.1109/qce60285.2024.10427","published":"2024-09-15","authors":["P.K. Mishra","R. W. 
Lessard","Indranil Roychoudhury"],"abstract":"Quantum machine learning (QML), an emerging discipline with applications in various domains, has the potential to dramatically improve deep learning while reducing model complexity. Quantum computers are well suited for machine learning (ML) tasks, especially because ML models are rapidly growing and require large amounts of data to learn. For example, classical computers that train large language and foundation models typically require supercomputers. If a need exists to double the number of model parameters, one will have to double the size of the supercomputer needed to train it. A quantum computer's parameter or state space grows exponentially with the number of qubits. This means doubling the number of model parameters only requires adding one quantum bit. Another advantage is that quantum computers can use entanglement and interference to potentially learn patterns with less data.....","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/qce60285.2024.10427","openalex_id":"https://openalex.org/W4406262129","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Menlo School","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6863620281219482},{"id":"https://openalex.org/C84114770","display_name":"Quantum","score":0.47950679063796997},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.4360157251358032},{"id":"https://openalex.org/C137019171","display_name":"Quantum algorithm","score":0.42402997612953186},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38254639506340027},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.35369935631752014},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.08518779277801514},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0758049488067627}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4402516174","title":"Interpretable Failure Localization for Microservice Systems Based on Graph Autoencoder","url":"https://doi.org/10.1145/3695999","published":"2024-09-13","authors":["Yongqian Sun","Zihan Lin","Binpeng Shi","Shenglin Zhang","Shiyu Ma","Pengxiang Jin","Zhenyu Zhong","Lemeng Pan","Yicheng Guo","Dan Pei"],"abstract":"Accurate and efficient localization of root cause instances in large-scale microservice systems is of paramount importance. Unfortunately, prevailing methods face several limitations. Notably, some recent methods rely on supervised learning which necessitates a substantial amount of labeled data. However, labeling root cause instances is time-consuming and laborious, especially with multiple modalities of data including logs, traces, metrics, and so on. Moreover, some approaches favor deep learning for localization but lack interpretability and continuous improvement mechanisms. To address the above challenges, we propose DeepHunt , a novel root cause localization method based on multimodal data analysis. 
Firstly, DeepHunt introduces root cause score (RCS) by integrating reconstruction errors and failure propagation patterns (upstream–downstream relationships), imparting interpretability...","companies":["Alibaba/Qwen","Huawei/Noah"],"matched_orgs":["Alibaba/Qwen","Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3695999","openalex_id":"https://openalex.org/W4402516174","cited_by_count":18,"quality_score":71,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Huawei Technologies (China)","Nankai University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8055951595306396},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.7084013223648071},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5546207427978516},{"id":"https://openalex.org/C2778505942","display_name":"Microservices","score":0.530820369720459},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3867630064487457},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.2997371554374695},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.1484096348285675},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.08790639042854309}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":18}},{"id":"bytedance-seed:107","title":"Seed-Music: A Unified Framework for High Quality and Controlled Music Generation","url":"https://seed.bytedance.com/en/research/seed-music-a-unified-framework-for-high-quality-and-controlled-music-generation","published":"2024-09-13","authors":["Ye Bai","Haonan Chen","Jitong 
Chen","Zhuo Chen","Yi Deng","Xiaohong Dong","Lamtharn Hantrakul","Weituo Hao","Qingqing Huang","Zhongyi Huang","Dongya Jia","Feihu La"],"abstract":"We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and postproduction editing. For controlled music generation, our system enables vocal music generation with performance controls from multi-modal inputs, including style descriptions, audio references, musical scores, and voice prompts. For postproduction editing, it offers interactive tools for editing lyrics and vocal melodies directly in the generated audio.We encourage readers to listen to demo audio examples at https://team.doubao.com/seed-music. External paper link: https://arxiv.org/abs/2409.09214","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Speech&Audio","Speech","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/windows-agent-arena-evaluating-multi-modal-os-agents-at-scale","title":"Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale","url":"https://www.microsoft.com/en-us/research/publication/windows-agent-arena-evaluating-multi-modal-os-agents-at-scale/","published":"2024-09-11","authors":["Rogerio Bonatti","Dan Zhao","Francesco Bonacci","Dillon Dupont","Sara Abdali","Yinheng Li","Yadong 
Lu","Justin Wagle","Kazuhito Koishida","A. Bucker","Lawrence Jang","Zack Hui"],"abstract":"Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning. However, measuring agent performance in realistic environments remains a challenge since: (i) most benchmarks are limited to specific modalities or domains (e.g. text-only, web navigation, Q&A, coding) and (ii) full benchmark evaluations are slow (on order of magnitude of days) given the multi-step sequential nature of tasks. To address these challenges, we introduce the Windows Agent Arena: a reproducible, general environment focusing exclusively on the Windows operating system (OS) where agents can operate freely within a real Windows OS and use the same wide range of applications, tools, and web browsers available to human users when solving tasks. We adapt the OSWorld framework (Xie et al.,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:yhriiz1dk5zzemrbini0cal4","title":"Retrieval-Augmented Correction of Named Entity Speech Recognition Errors","url":"https://machinelearning.apple.com/research/retrieval-asr","published":"2024-09-11","authors":["Ernest Pusateri","Anmol Walia","Anirudh Kashi","Bortik Bandyopadhyay","Nadia Hyder","Sayantan Mahinder","Raviteja Anantha","Daben 
Liu","Sashank Gondala"],"abstract":"In recent years, end-to-end automatic speech recognition (ASR) systems have proven themselves remarkably accurate and performant, but these systems still have a significant error rate for entity names which appear infrequently in their training data. In parallel to the rise of end-to-end ASR systems, large language models (LLMs) have proven to be a versatile tool for various natural language processing (NLP) tasks. In NLP tasks where a database...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4402442326","title":"UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing","url":"https://doi.org/10.1145/3650212.3680342","published":"2024-09-11","authors":["Yifeng He","Jiabo Huang","Yuyang Rong","Yiwen Guo","Ethan Wang","Hao Chen"],"abstract":"The remarkable capability of large language models (LLMs) in generating high-quality code has drawn increasing attention in the software testing community. However, existing code LLMs often demonstrate unsatisfactory capabilities in generating accurate, complete tests since they were trained on code snippets collected without differentiating between code for testing and for other purposes. In this paper, we present a large-scale dataset, UniTSyn, which can enhance LLMs for Unit Test Synthesis. Associating tests with the tested functions is crucial for LLMs to infer the expected behavior and the logic paths to be verified. 
By leveraging Language Server Protocol, UniTSyn achieves the challenging goal of collecting focal-test pairs without per-project execution setups or per-language heuristics, which tend to be fragile and difficult to scale. Containing 2.7 million focal-test pairs across....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3650212.3680342","openalex_id":"https://openalex.org/W4402442326","cited_by_count":10,"quality_score":51,"matched_keywords":["LLM"],"author_affiliations":["Tencent (China)","University of California, Davis"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6957310438156128},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6404274702072144},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.402762770652771},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36074984073638916},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.332863986492157},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.06286969780921936},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/policy-filtration-for-rlhf-to-mitigate-noise-in-reward-models","title":"Policy Filtration for RLHF to Mitigate Noise in Reward Models","url":"https://www.microsoft.com/en-us/research/publication/policy-filtration-for-rlhf-to-mitigate-noise-in-reward-models/","published":"2024-09-10","authors":["Chuheng Zhang","Wei Shen","Li Zhao","Xuyun 
Zhang","Xiaolong Xu","Wanchun Dou","Jiang Bian"],"abstract":"While direct policy optimization methods exist, pioneering LLMs are fine-tuned with reinforcement learning from human feedback (RLHF) to generate better responses under the supervision of a reward model learned from preference data. One major challenge of RLHF is the inaccuracy of the intermediate reward model, especially in the tasks that requires complex reasoning for the reward model to score a response. We find that the reliability of the reward model varies across responses assigned with different rewards. This motivates us to filter the samples whose rewards may be unreliable to improve the signal-to-noise ratio during policy learning, resulting in Policy Filtration for Proximal Policy Optimization (PF-PPO). To choose a proper policy filtering strategy, we use the coefficient of determination (R2) between the rewards and actual scores on filtered samples as the metrics to help us f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-we-count-on-llms-the-fixed-effect-fallacy-and-claims-of-gpt-4-capabilities","title":"Can We Count on LLMs? 
The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities","url":"https://www.microsoft.com/en-us/research/publication/can-we-count-on-llms-the-fixed-effect-fallacy-and-claims-of-gpt-4-capabilities/","published":"2024-09-10","authors":["Thomas Ball","Shuo Chen","Cormac Herley"],"abstract":"In this paper we explore evaluation of LLM capabilities. We present measurements of GPT-4 performance on several deterministic tasks; each task involves a basic calculation and takes as input parameter some element drawn from a large well-defined population (e.g., count elements in a list, multiply two k-digit numbers, etc). We examine several conditions per-task and perform enough trials so that statistically significant differences can be detected. This allows us to investigate the sensitivity of task-accuracy both to query phrasing and input parameter population. We find that seemingly trivial modifications in the task-prompt or input population can yield differences far larger than can be explained by sampling effects. 
For example, performance on a simple list-counting task varies with query-phrasing and list-length, but also with list composition (i.e., the thing-to-be-counted) and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:nq0aa2uw45v2v31mq30211e5","title":"Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs","url":"https://machinelearning.apple.com/research/ferretui-mobile","published":"2024-09-10","authors":["Keen You","Haotian Zhang","Eldon Schoop","Floris Weers","Amanda Swearngin","Jeffrey Nichols","Yinfei Yang","Zhe Gan"],"abstract":"Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet, these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. 
Given that UI screens typically exhibit a more...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/overview-of-the-mediqa-magic-task-at-imageclef-2024-multimodal-and-generative-telemedicine-in-dermatology","title":"Overview of the MEDIQA-MAGIC Task at ImageCLEF 2024: Multimodal And Generative TelemedICine in Dermatology","url":"https://www.microsoft.com/en-us/research/publication/overview-of-the-mediqa-magic-task-at-imageclef-2024-multimodal-and-generative-telemedicine-in-dermatology/","published":"2024-09-09","authors":["Wen-wai Yim","Asma Ben Abacha","Yujuan Fu","Zhaoyi Sun","Meliha Yetisgen","Fei Xia"],"abstract":"Multimodal processing and language generation require models to internally represent both language and vision, and then generate contextually appropriate responses. To do so with arbitrary images and textual inputs in the medical field, requires additional high performance and fidelity. This paper presents the overview of the MEDIQA-MAGIC shared task at ImageCLEF 2024. In this dermatological visual question-answering (VQA) task, participants receive the input of an image and a textual consumer health query, and are expected to output a textual medical answer. A total of twenty two runs were submitted with a variety of general language-vision models and fine-tuned models, with the best team achieving 8.969 BLEU points. We hope that the findings and insights explored here will inspire future research directions to support improved patient care. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:mq6gl2qfi4z04f9w3xlo5uqk","title":"UI-JEPA: Towards Active Perception of User Intent Through Onscreen User Activity","url":"https://machinelearning.apple.com/research/ui-intent","published":"2024-09-09","authors":["Yicheng Fu","Raviteja Anantha","Prabal Vashisht","Jianpeng Cheng","Etai Littwin"],"abstract":"Generating user intent from a sequence of user interface (UI) actions is a core challenge in comprehensive UI understanding. 
Recent advancements in multimodal large language models (MLLMs) have led to substantial progress in this area, but their demands for extensive model parameters, computing power, and high latency makes them impractical for scenarios requiring lightweight, on-device solutions with low latency or heightened privacy....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4402351150","title":"Cap4Video++: Enhancing Video Understanding With Auxiliary Captions","url":"https://doi.org/10.1109/tpami.2024.3410329","published":"2024-09-09","authors":["Wenhao Wu","Xiaohan Wang","Haipeng Luo","Jingdong Wang","Yi Yang","Wanli Ouyang"],"abstract":"Understanding videos, especially aligning them with textual data, presents a significant challenge in computer vision. The advent of vision-language models (VLMs) like CLIP has sparked interest in leveraging their capabilities for enhanced video understanding, showing marked advancements in both performance and efficiency. However, current methods often neglect vital user-generated metadata such as video titles. In this paper, we present Cap4Video++, a universal framework that leverages auxiliary captions to enrich video understanding. More recently, we witness the flourishing of large language models (LLMs) like ChatGPT. 
Cap4Video++ harnesses the synergy of vision-language models (VLMs) and large language models (LLMs) to generate video captions, utilized in three key phases: (i) Input stage employs Semantic Pair Sampling to extract beneficial samples from captions, aiding contrastive l...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2024.3410329","openalex_id":"https://openalex.org/W4402351150","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","Beijing Academy of Artificial Intelligence","Shanghai Artificial Intelligence Laboratory","Stanford University","The University of Sydney","University of Chinese Academy of Sciences","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.736961305141449},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5845194458961487},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4679614007472992},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39814338088035583},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.36640506982803345},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.35003921389579773}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4402350740","title":"Siamese Neural Network and Multimodal Data Fusion Approach for Small-Sample Learning in Industrial Soft Sensor Modeling","url":"https://doi.org/10.1109/jsen.2024.3451148","published":"2024-09-09","authors":["Yuchen Zhao","Zhe Liu","Yan Feng","Chunjie Yang","Yang 
Cao","Yaoyao Bao","Jiayun Mao","Siwei Lou","Hang Xiao"],"abstract":"In industrial scenarios, soft sensor modeling often faces the challenge of underfitting due to limited available data. Therefore, in the field of industrial soft sensing, it is crucial to develop solutions for regression problems based on small-sample learning. One potentially feasible approach is to pair the data used for modeling to generate a larger set of data pairs, and then utilize Siamese neural networks (SNNs) to extract additional information from the paired data, thereby addressing the underfitting issues. Moreover, the integration of data from multiple sources can also help alleviate the problem of underfitting. Based on these ideas, this article proposes an SNN-based multimodal data fusion framework (SNN-MMDFF) for tackling the challenges of small-sample learning in industrial soft sensor modeling. The SNN-MDDFF employs end-to-end multimodal feature extraction (MMFE) networks...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jsen.2024.3451148","openalex_id":"https://openalex.org/W4402350740","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","State Key Laboratory of Industrial Control Technology"],"concepts":[{"id":"https://openalex.org/C115575686","display_name":"Soft sensor","score":0.7331565618515015},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.7016814947128296},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.6861639022827148},{"id":"https://openalex.org/C198531522","display_name":"Sample (material)","score":0.566369891166687},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.5637917518615723},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5502175092697144},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.46971243619918823},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.41636741161346436}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4402335002","title":"CLMTR: a generic framework for contrastive multi-modal trajectory representation learning","url":"https://doi.org/10.1007/s10707-024-00528-6","published":"2024-09-07","authors":["Anqi Liang","Bin Yao","Jiong Xie","Wenli Zheng","Yanyan Shen","Qiqi Ge"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s10707-024-00528-6","openalex_id":"https://openalex.org/W4402335002","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6485580801963806},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6356885433197021},{"id":"https://openalex.org/C13662910","display_name":"Trajectory","score":0.5742748975753784},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.5272157192230225},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47831442952156067},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4259323477745056},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.38168108463287354},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.35125651955604553}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"apple:wl6guvfa36ibp97iij2h1upp","title":"CTRLorALTer: Conditional LoRAdapter for Efficient Zero-Shot Control & Altering of T2I Models","url":"https://machinelearning.apple.com/research/conditional-loradapter","published":"2024-09-06","authors":["Nick Stracke","Stefan Andreas Baumann","Josh Susskind","Miguel Angel Bautista Martin","Björn Ommer"],"abstract":"Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images. However, guiding the generative process of these models to consider detailed forms of conditioning reflecting style and/or structure information remains an open problem. 
In this paper, we present LoRAdapter, an approach that unifies both style and structure conditioning under the same formulation using a novel...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:4fe94c31026152f1","title":"Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model","url":"https://ai.meta.com/research/publications/transfusion-predict-the-next-token-and-diffuse-images-with-one-multi-modal-model/","published":"2024-09-05","authors":["Chunting Zhou","Lili Yu","Arun Babu","Kushal Tirumala","Michihiro Yasunaga","Leonid Shamis","Jacob Kahn","Luke Zettlemoyer","Omer Levy","Xuezhe Ma"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=10"}},{"id":"openalex:W4402217122","title":"An approximation theory framework for measure-transport sampling algorithms","url":"https://doi.org/10.1090/mcom/4013","published":"2024-09-04","authors":["Ricardo Baptista","Bamdad Hosseini","Nikola Kovachki","Youssef Marzouk","Amir Sagiv"],"abstract":"This article 
presents a general approximation-theoretic framework to analyze measure transport algorithms for probabilistic modeling. A primary motivating application for such algorithms is sampling—a central task in statistical inference and generative modeling. We provide a priori error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps and on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the distance between two maps to the distance (or divergence) between the pushforward measures they define. We present a series of applications of our framework,...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1090/mcom/4013","openalex_id":"https://openalex.org/W4402217122","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["California Institute of Technology","Decision Systems (United States)","Mathematical Systems & Solutions (United States)","Nvidia (United States)","Technion – Israel Institute of Technology","University of Washington"],"concepts":[{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.8140257596969604},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.6312480568885803},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.46325430274009705},{"id":"https://openalex.org/C28826006","display_name":"Applied mathematics","score":0.45130455493927},{"id":"https://openalex.org/C2777686260","display_name":"Calculus 
(dental)","score":0.43261903524398804},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.40743309259414673},{"id":"https://openalex.org/C126255220","display_name":"Mathematical optimization","score":0.3682364225387573},{"id":"https://openalex.org/C121864883","display_name":"Statistical physics","score":0.34054431319236755}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4409310963","title":"High Efficient Neural Network for the Segmentation and Detection of Brain Tumors","url":"https://doi.org/10.1109/globalaisummit62156.2024.10947986","published":"2024-09-04","authors":["Divya Beeram","Saigurudatta Pamulaparthyvenkata","Dinesh Gottipalli","Sarika Mulukuntla","Shiak Abdul Kareen","T. Saravanan"],"abstract":"A malignant brain tumour is a tumour that has spread across the brain and is threatening human health. Correctly dividing tumors into subtypes and classes is essential for later prognosis and therapy planning. Identifying a brain tumor can be a tedious and error-prone process, hence radiologists need to use automation whenever possible. This paper presents conditional deep learning for structural multimodal MRIs of the brain to perform tumor categorization using a residual network, survival rate forecasting, and dissection, to name a few. To begin, we recommend a segmentation method that separates non-overlapping regions using a combination of conditional random fields and convolutional neural networks. Using these patches, finding the tumor takes hardly no time at all. Errors multiply if their scopes cross. 
In the paper's second section, the authors provide a method of feature mapping u...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/globalaisummit62156.2024.10947986","openalex_id":"https://openalex.org/W4409310963","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","GITAM University","San Jose State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6953067183494568},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6259315013885498},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.6023569703102112},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5630166530609131},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.4377460479736328},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3679566979408264}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mars-a-financial-market-simulation-engine-powered-by-generative-foundation-model","title":"MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model","url":"https://www.microsoft.com/en-us/research/publication/mars-a-financial-market-simulation-engine-powered-by-generative-foundation-model/","published":"2024-09-03","authors":["Junjie Li","Yang Liu","Weiqing Liu","Shikai Fang","Lewen Wang","Chang Xu","Jiang Bian"],"abstract":"Generative models aim to simulate realistic effects of various actions across different contexts, from text generation 
to visual effects. Despite efforts to build real-world simulators, leveraging generative models for virtual worlds, like financial markets, remains underexplored. In financial markets, generative models can simulate market effects of various behaviors, enabling interaction with market scenes and players, and training strategies without financial risk. This simulation relies on the finest structured data in financial market like orders thus building the finest realistic simulation. We propose Large Market Model (LMM), an order-level generative foundation model, for financial market simulation, akin to language modeling in the digital world. Our financial Market Simulation engine (MarS), powered by LMM, addresses the need for realistic, interactive and controllable order g...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Economics","Computer science","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402188320","title":"Triple disentangled representation learning for multimodal affective analysis","url":"https://doi.org/10.1016/j.inffus.2024.102663","published":"2024-09-03","authors":["Ying Zhou","Xuefeng Liang","Han Chen","Yin Zhao","Xin Chen","Lida 
Yu"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.inffus.2024.102663","openalex_id":"https://openalex.org/W4402188320","cited_by_count":22,"quality_score":59,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beijing Normal University","Xidian University"],"concepts":[{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.8260864019393921},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.812568187713623},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6525865793228149},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5745633244514465},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5442015528678894},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5362752079963684},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5290631055831909},{"id":"https://openalex.org/C190470478","display_name":"Invariant (physics)","score":0.46297261118888855}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":22}},{"id":"openalex:W4402172467","title":"ReLS: Retrieval Is Efficient Knowledge Transfer For Logic Synthesis","url":"https://doi.org/10.1145/3670474.3685946","published":"2024-09-03","authors":["Rongjian Liang","Chia-Tung Ho","Anthony Agnesina","Wen-Hao Liu","Haoxing Ren"],"abstract":"Design houses manage extensive collections of legacy synthesis data containing valuable knowledge crucial for enhancing synthesis efficiency and performance. 
Effectively encapsulating and transferring this knowledge is a pressing research challenge. Previous methods incorporate synthesis knowledge into neural networks trained on legacy data, often necessitating retraining for updated databases and fine-tuning for new designs. This study introduces ReLS, a retrieval-powered logic synthesis framework that integrates network weight-captured knowledge with information stored in dynamic synthesis databases. ReLS employs a graph neural network-based And-Inverter Graphs (AIGs) encoder to convert new AIGs into vector representations for similarity searches against legacy AIGs. Top-ranked recipes from similar AIGs are retrieved and re-ranked to offer initial solutions for advanced synthesis optim...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3670474.3685946","openalex_id":"https://openalex.org/W4402172467","cited_by_count":1,"quality_score":46,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6739683151245117},{"id":"https://openalex.org/C2776175482","display_name":"Transfer (computing)","score":0.422067254781723},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3651544451713562},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.08293715119361877}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/proceedings-of-the-first-workshop-on-large-language-models-for-evaluation-in-information-retrieval-llm4eval-2024","title":"Proceedings of The First Workshop on Large Language Models 
for Evaluation in Information Retrieval (LLM4Eval 2024)","url":"https://www.microsoft.com/en-us/research/publication/proceedings-of-the-first-workshop-on-large-language-models-for-evaluation-in-information-retrieval-llm4eval-2024/","published":"2024-09-02","authors":["Clemencia Siro","Mohammad Aliannejadi","Hossein A. Rahmani","Nick Craswell","Charles L. A. Clarke","Guglielmo Faggioli","Bhaskar Mitra","Paul Thomas","Emine Yilmaz"],"abstract":"About The Workshop: Large language models (LLMs) have demonstrated increasing task-solving abilities not present in smaller models. Utilizing the capabilities and responsibilities of LLMs for automated evaluation (LLM4Eval) has recently attracted considerable attention in multiple research communities. For instance, LLM4Eval models have been studied in the context of automated judgments, natural language generation, and retrieval augmented generation systems. We believe that the information retrieval community can significantly contribute to this growing research area by designing, implementing, analyzing, and evaluating various aspects of LLMs with applications to LLM4Eval tasks. 
The main goal of LLM4Eval workshop is to bring together researchers from industry and academia to discuss various aspects of LLMs for evaluation in information retrieval, including automated judgments, retrieva...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Search and information retrieval","automatic evaluation","Information retrieval","large language models","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cmmd-contrastive-multi-modal-diffusion-for-video-audio-conditional-modeling","title":"CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling","url":"https://www.microsoft.com/en-us/research/publication/cmmd-contrastive-multi-modal-diffusion-for-video-audio-conditional-modeling/","published":"2024-09-01","authors":["Ruihan Yang","Hannes Gamper","Sebastian Braun"],"abstract":"We introduce a multi-modal diffusion model tailored for the bi-directional conditional generation of video and audio. We propose a joint contrastive training loss to improve the synchronization between visual and auditory occurrences. We present experiments on two datasets to evaluate the efficacy of our proposed model. The assessment of generation quality and alignment performance is carried out from various angles, encompassing both objective and subjective metrics. 
Our findings demonstrate that the proposed model outperforms the baseline in terms of quality and generation speed through introduction of our novel cross-modal easy fusion architectural block. Furthermore, the incorporation of the contrastive loss results in improvements in audio-visual alignment, particularly in the high-correlation video-to-audio generation task. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-93806-1_16","openalex_id":"https://openalex.org/W4410915437","cited_by_count":2,"quality_score":82,"matched_keywords":["Inproceedings (Conference)","Audio and Acoustics","Graphics and multimedia","Audio signal processing","Computer vision","Machine learning","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)","University of California, Irvine"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/intention-is-all-you-need","title":"Intention Is All You Need","url":"https://www.microsoft.com/en-us/research/publication/intention-is-all-you-need/","published":"2024-09-01","authors":["Advait Sarkar"],"abstract":"Among the many narratives of the transformative power of Generative AI is one that sees in the world a latent nation of programmers who need to wield nothing but intentions and natural language to render their ideas in software. In this paper, this outlook is problematised in two ways. First, it is observed that generative AI is not a neutral vehicle of intention. 
Multiple recent studies paint a picture of the “mechanised convergence” phenomenon, namely, that generative AI has a homogenising effect on intention. Second, it is observed that the formation of intention itself is immensely challenging. Constraints, materiality, and resistance can offer paths to design metaphors for intentional tools. Finally, existentialist approaches to intention are discussed and possible implications for programming are proposed in the form of a speculative, illustrative set of intentional programming pra...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Programming languages and software engineering","Human–computer interaction","Programming language","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pam-prompting-audio-language-models-for-audio-quality-assessment","title":"PAM: Prompting Audio-Language Models for Audio Quality Assessment","url":"https://www.microsoft.com/en-us/research/publication/pam-prompting-audio-language-models-for-audio-quality-assessment/","published":"2024-09-01","authors":["Soham Deshmukh","Dareen Alharthi","Benjamin Elizalde","Hannes Gamper","Mahmoud Al Ismail","Rita Singh","Bhiksha Raj","Huaming Wang"],"abstract":"While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. 
Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an audio input and a text prompt related to quality, an ALM can be used to calculate a similarity score between the two. Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks. Contrary to other\"reference-free\"metrics, PAM does not require computing embeddings on a reference dataset nor training a task-specific model on a costly set of human listening scores. We extensively evaluate the reliability of PAM against established metrics and human listening scores on four tasks: text-to-audio (TT...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Audio and Speech Processing","Generative AI","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/domino-eliminating-communication-in-llm-training-via-generic-tensor-slicing-and-overlapping","title":"Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping","url":"https://www.microsoft.com/en-us/research/publication/domino-eliminating-communication-in-llm-training-via-generic-tensor-slicing-and-overlapping/","published":"2024-09-01","authors":["Guanhua Wang","Chengming Zhang","Zheyu Shen","Ang Li","Olatunji Ruwase"],"abstract":"Given the popularity 
of generative AI, Large Language Models (LLMs) often consume hundreds or thousands of GPUs for parallelizing and accelerating the training process. Communication overhead becomes more pronounced when training LLMs at scale. To eliminate communication overhead in distributed LLM training, we propose Domino, which provides a generic scheme to hide communication behind computation. By breaking data dependency of a single batch training into smaller independent pieces, Domino pipelines these independent pieces training and provides generic strategy of fine-grained communication and computation overlapping. Extensive results show that, comparing with Megatron-LM, Domino achieves up to 1.3x speedup for LLM training on Nvidia DGX-H100 GPUs.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Tech Report","Artificial intelligence","Systems and networking","Computer network","Distributed computing","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cosmic-data-efficient-instruction-tuning-for-speech-in-context-learning","title":"COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning","url":"https://www.microsoft.com/en-us/research/publication/cosmic-data-efficient-instruction-tuning-for-speech-in-context-learning/","published":"2024-09-01","authors":["Jing Pan","Jian Wu","Yashesh Gaur","Sunit Sivasankaran","Zhuo Chen","Shujie Liu","Jinyu Li"],"abstract":"We present a cost-effective method to integrate speech into a large language model 
(LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. Using GPT-3.5, we generate Speech Comprehension Test Question-Answer (SQA) pairs from speech transcriptions for supervised instruction tuning. With under 30 million trainable parameters and only 450 hours of English speech data, COSMIC demonstrates emerging capabilities in instruction-following and in-context learning. Equipped with such capabilities, COSMIC achieves a maximum 33.18 BLEU score in 0-shot EN-to-X speech to text translation (S2TT) and a significant boost in the 1-shot setting. Additionally, there is an average 25.8% relative Word Error Rate (WER) reduction for 1-shot cross-domain adaptation. COSMIC exhibits a significant automatic speech recognition (ASR) accurac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","1970-01-01","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-delegates-with-a-dual-focus-ensuring-privacy-and-strategic-self-disclosure","title":"AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure","url":"https://www.microsoft.com/en-us/research/publication/ai-delegates-with-a-dual-focus-ensuring-privacy-and-strategic-self-disclosure/","published":"2024-09-01","authors":["Xi Chen","Zhiyang Zhang","Fangkai Yang","Xiaoting Qin","Chao Du","Xi Cheng","Hangxin Liu","Qingwei 
Lin 林庆维","Saravan Rajmohan","Dongmei Zhang","Qi Zhang"],"abstract":"Large language model (LLM)-based AI delegates are increasingly utilized to act on behalf of users, assisting them with a wide range of tasks through conversational interfaces. Despite their advantages, concerns arise regarding the potential risk of privacy leaks, particularly in scenarios involving social interactions. While existing research has focused on protecting privacy by limiting the access of AI delegates to sensitive user information, many social scenarios require disclosing private details to achieve desired outcomes, necessitating a balance between privacy protection and disclosure. To address this challenge, we conduct a pilot study to investigate user preferences for AI delegates across various social relations and task scenarios, and then propose a novel AI delegate system that enables privacy-conscious self-disclosure. Our user study demonstrates that the proposed AI dele...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/retrieval-augmented-generation-rag-and-beyond-a-comprehensive-survey-on-how-to-make-your-llms-use-external-data-more-wisely","title":"Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More 
Wisely","url":"https://www.microsoft.com/en-us/research/publication/retrieval-augmented-generation-rag-and-beyond-a-comprehensive-survey-on-how-to-make-your-llms-use-external-data-more-wisely/","published":"2024-09-01","authors":["Siyun Zhao","Yuqing Yang","Zilong Wang","Zhiyuan He","Luna K. Qiu","Lili Qiu"],"abstract":"Large language models (LLMs) augmented with external data have demonstrated remarkable capabilities in completing real-world tasks. Techniques for integrating external data into LLMs, such as Retrieval-Augmented Generation (RAG) and fine-tuning, are gaining increasing attention and widespread application. Nonetheless, the effective deployment of data-augmented LLMs across various specialized fields presents substantial challenges. These challenges encompass a wide range of issues, from retrieving relevant data and accurately interpreting user intent to fully harnessing the reasoning capabilities of LLMs for complex tasks. We believe that there is no one-size-fits-all solution for data-augmented LLM applications. 
In practice, underperformance often arises from a failure to correctly identify the core focus of a task or because the task inherently requires a blend of multiple capabilities....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gaussian-flow-bridges-for-audio-domain-transfer-with-unpaired-data","title":"Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data","url":"https://www.microsoft.com/en-us/research/publication/gaussian-flow-bridges-for-audio-domain-transfer-with-unpaired-data/","published":"2024-09-01","authors":["Eloi Moliner","Sebastian Braun","Hannes Gamper"],"abstract":"Audio domain transfer is the process of modifying audio signals to match characteristics of a different domain, while retaining the original content. Examples include transferring room acoustics or altering audio effects such as distortion. This paper investigates the potential of Gaussian Flow Bridges, an emerging approach in generative modeling, for these problems. The presented framework addresses the transport problem across different distributions of audio signals through the implementation of a series of two deterministic probability flows. The proposed framework facilitates manipulation of the target distribution properties through a continuous control variable, which defines a certain aspect of the target domain. 
Notably, this approach does not rely on paired examples for training. To address identified challenges on maintaining the speech content consistent, we recommend a train...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Audio and Acoustics","Audio and Speech Processing","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/an-investigation-of-noise-robustness-for-flow-matching-based-zero-shot-tts","title":"An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS","url":"https://www.microsoft.com/en-us/research/publication/an-investigation-of-noise-robustness-for-flow-matching-based-zero-shot-tts/","published":"2024-09-01","authors":["Xiaofei Wang","Sefik Emre Eskimez","Manthan Thakker","Hemin Yang","Zirun Zhu","Min Tang","Yufei Xia","Jinzhu Li","sheng zhao","Jinyu Li","Naoyuki Kanda"],"abstract":"Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audio generated from noisy audio prompts within the context of flow-matching-based zero-shot TTS. 
Our investigation includes comprehensive training strategies: unsupervised pre-training with masked speech denoising, multi-speaker detection and DNSMOS-based data filtering on the pre-training data, and fine-tuning with random noise mixing. The results of our experiments demonstrate significant improvements in intelligibility, speaker similarity, and overall audio quality compared to the approach of ap...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Human language technologies","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402112155","title":"Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment","url":"https://doi.org/10.21437/interspeech.2024-335","published":"2024-09-01","authors":["Paarth Neekhara","Shehzeen Hussain","Subhankar Ghosh","Jason Li","Boris Ginsburg"],"abstract":"Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers.However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token.We examine these challenges in an encoder-decoder transformer model and find that certain cross-attention heads in such models implicitly learn the text and speech alignment when trained for predicting speech 
tokens for a given text.To make the alignment more robust, we propose techniques utilizing CTC loss and attention priors that encourage monotonic cross-attention over the text tokens.Our guided attention training technique does not introduce any...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.21437/interspeech.2024-335","openalex_id":"https://openalex.org/W4402112155","cited_by_count":7,"quality_score":52,"matched_keywords":["LLM","language model"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7966787815093994},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6795821189880371},{"id":"https://openalex.org/C72169020","display_name":"Monotonic function","score":0.4871253967285156},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.42228686809539795},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4102162718772888},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.11900126934051514},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4402228277","title":"A benchmark approach and dataset for large-scale lane mapping from MLS point clouds","url":"https://doi.org/10.1016/j.jag.2024.104139","published":"2024-09-01","authors":["Xiaoxin Mi","Zhen Dong","Zhipeng Cao","Bisheng Yang","Zhen Cao","Chao Zheng","Jantien Stoter","Liangliang Nan"],"abstract":"Accurate lane maps with semantics are crucial 
for various applications, such as high-definition maps (HD Maps), intelligent transportation systems (ITS), and digital twins. Manual annotation of lanes is labor-intensive and costly, prompting researchers to explore automatic lane extraction methods. This paper presents an end-to-end large-scale lane mapping method that considers both lane geometry and semantics. This study represents lane markings as polylines with uniformly sampled points and associated semantics, allowing for adaptation to varying lane shapes. Additionally, we propose an end-to-end network to extract lane polylines from mobile laser scanning (MLS) data, enabling the inference of vectorized lane instances without complex post-processing. The network consists of three components: a feature encoder, a column proposal generator, and a lane information decoder. The feature en...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.jag.2024.104139","openalex_id":"https://openalex.org/W4402228277","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Delft University of Technology","State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing","Tencent (China)","Wuhan University of Technology"],"concepts":[{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7225989103317261},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6583979725837708},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.6064738631248474},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.5597549676895142},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.5359922647476196},{"id":"https://openalex.org/C28719098","display_name":"Point 
(geometry)","score":0.43667083978652954},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4107493758201599},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.3564232587814331}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4402565929","title":"1942O Application of GigaPath: An open-weight billion-parameter AI foundation model based on a novel vision transformer architecture for cancer mutation prediction and TME analysis","url":"https://doi.org/10.1016/j.annonc.2024.08.2028","published":"2024-09-01","authors":["Carlo Bifulco","H. Poon","Naoto Usuyama","H. Xu","S. Wang","Brian Piening","Rom S. Leidner"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.annonc.2024.08.2028","openalex_id":"https://openalex.org/W4402565929","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Providence Hospital","Providence Portland Medical Center","University of Washington","University of Washington Applied Physics Laboratory"],"concepts":[{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.7702701091766357},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5300120115280151},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32485800981521606},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.10801327228546143},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.08183988928794861},{"id":"https://openalex.org/C119599485","display_name":"Electrical 
engineering","score":0.06729426980018616},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4402508545","title":"The Interplay Between Generative AI and 5G-Advanced Toward 6G","url":"http://dx.doi.org/10.1109/mnet.2024.3429692","published":"2024-09-01","authors":["Xingqin Lin","Mingzhe Chen","Taesang Yoo","Yue Wang","Lina Bariah","Nguyen H. Tran","Kaibin Huang"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mnet.2024.3429692","openalex_id":"https://openalex.org/W4402508545","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["China Telecom","China Telecom (China)","Nvidia (United States)","Qualcomm (United States)","Technology Innovation Institute","The University of Sydney","University of Hong Kong","University of Miami"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.689652681350708},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5968196392059326},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3885532319545746},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.33290016651153564}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:60781642fb7d77b0","title":"DiffiT: Diffusion Vision Transformers for Image Generation","url":"https://research.nvidia.com/publication/2024-09_diffit-diffusion-vision-transformers-image-generation","published":"2024-09","authors":["Ali 
Hatamizadeh","Jiaming Song","Guilin Liu","Jan Kautz","Arash Vahdat"],"abstract":"Official NVIDIA Research publication. ECCV","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-73242-3_3","openalex_id":"https://openalex.org/W4403843325","cited_by_count":31,"quality_score":86,"matched_keywords":["ECCV"],"author_affiliations":["NVIDIA","Nvidia (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=1"}},{"id":"openalex:W4402082295","title":"Earnings call scripts generation with large language models: A study of few-shot prompting and fine-tuning methods","url":"https://doi.org/10.22541/au.172514199.95737319/v1","published":"2024-08-31","authors":["Sovik Kumar Nath","Yanyan Zhang","Jia Li"],"abstract":"Company earnings calls are crucial events that provide transparency into a company’s financial health and prospects. Large language models (LLMs) offer a promising approach to automatically generate the first draft of earnings call scripts from financial data and past examples. We evaluate two methods: 1) Few-shot prompt engineering with a state-of-the-art model, and 2) Fine-tuning a language model on earnings call transcript data. Our results indicate both approaches can produce coherent scripts covering key metrics, updates, and guidance. However, there are trade-offs in comprehensiveness, hallucinations, writing style, ease of use, and cost. 
We discuss the pros and cons of each method to guide practitioners on effectively leveraging large language models for this financial communication task.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.22541/au.172514199.95737319/v1","openalex_id":"https://openalex.org/W4402082295","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C61423126","display_name":"Scripting language","score":0.8393868803977966},{"id":"https://openalex.org/C2781426361","display_name":"Earnings","score":0.6927850842475891},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4753648638725281},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.4288567304611206},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3394280672073364},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3315544128417969},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3221518397331238},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.299527645111084}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/compositional-3d-aware-video-generation-with-llm-director","title":"Compositional 3D-aware Video Generation with LLM Director","url":"https://www.microsoft.com/en-us/research/publication/compositional-3d-aware-video-generation-with-llm-director/","published":"2024-08-30","authors":["Hanxin Zhu","Tianyu He","Anni Tang","Junliang Guo","Zhibo Chen","Jiang Bian"],"abstract":"Significant progress has 
been made in text-to-video generation through the use of powerful generative models and large-scale internet data. However, substantial challenges remain in precisely controlling individual concepts within the generated video, such as the motion and appearance of specific characters and the movement of viewpoints. In this work, we propose a novel paradigm that generates each concept in 3D representation separately and then composes them with priors from Large Language Models (LLM) and 2D diffusion models. Specifically, given an input textual prompt, our scheme consists of three stages: 1) We leverage LLM as the director to first decompose the complex query into several sub-prompts that indicate individual concepts within the video~(\\textit{e.g.}, scene, objects, motions), then we let LLM to invoke pre-trained expert models to obtain corresponding 3D representatio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/phi-3-technical-report-a-highly-capable-language-model-locally-on-your-phone","title":"Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone","url":"https://www.microsoft.com/en-us/research/publication/phi-3-technical-report-a-highly-capable-language-model-locally-on-your-phone/","published":"2024-08-30","authors":["Marah I Abdin","Sam 
Ade Jacobs","Ammar Ahmad Awan","Jyoti Aneja","Ahmed Awadallah","Hany Hassan Awadalla","Nguyen Bach","Amit Bahree","Arash Bakhtiari","Harkirat Behl","Alon Benhaim","Misha Bilenko"],"abstract":"We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Tech Report","Artificial intelligence","Computation and Language","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/training-ultra-long-context-language-model-with-fully-pipelined-distributed-transformer","title":"Training Ultra Long Context Language Model with Fully Pipelined Distributed 
Transformer","url":"https://www.microsoft.com/en-us/research/publication/training-ultra-long-context-language-model-with-fully-pipelined-distributed-transformer/","published":"2024-08-29","authors":["Jinghan Yao","Sam Ade Jacobs","Masahiro Tanaka","Olatunji Ruwase","A. Shafi","H. Subramoni","Dhabaleswar K. Panda"],"abstract":"Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on extremely long contexts demands considerable GPU resources and increased memory, leading to higher costs and greater complexity. Alternative approaches that introduce long context capabilities via downstream finetuning or adaptations impose significant design limitations. In this paper, we propose Fully Pipelined Distributed Transformer (FPDT) for efficiently training long-context LLMs with extreme hardware efficiency. For GPT and Llama models, we achieve a 16x increase in sequence length that can be trained on the same hardware compared to current state-of-the-art solutions. 
With our dedicated sequence chunk pipeline design, we can now train 8B LLM with...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Distributed, Parallel, and Cluster Computing","Machine learning","LLM","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/is-this-it-towards-ecologically-valid-benchmarks-for-situated-collaboration","title":"\"Is This It?\": Towards Ecologically Valid Benchmarks for Situated Collaboration","url":"https://www.microsoft.com/en-us/research/publication/is-this-it-towards-ecologically-valid-benchmarks-for-situated-collaboration/","published":"2024-08-29","authors":["Dan Bohus","Sean Andrist","Emily Bao","Eric Horvitz","Ann Paradiso"],"abstract":"We report initial work towards constructing ecologically valid benchmarks to assess the capabilities of large multimodal models for engaging in situated collaboration. In contrast to existing benchmarks, in which question-answer pairs are generated post hoc over preexisting or synthetic datasets via templates, human annotators, or large language models (LLMs), we propose and investigate an interactive system-driven approach, where the questions are generated by users in context, during their interactions with an end-to-end situated AI system. 
We illustrate how the questions that arise are different in form and content from questions typically found in existing embodied question answering (EQA) benchmarks and discuss new real-world challenge problems brought to the fore.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3686215.3690152","openalex_id":"https://openalex.org/W4403923056","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft","Michigan United","Microsoft (United States)","University of Michigan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:458f370fde8a4a15","title":"Qwen2-VL: To See the World More Clearly","url":"https://qwenlm.github.io/blog/qwen2-vl/","published":"2024-08-29","authors":["Alibaba/Qwen"],"abstract":"DEMO GITHUB HUGGING FACE MODELSCOPE API DISCORDAfter a year’s relentless efforts, today we are thrilled to release Qwen2-VL! Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model familities. 
Compared with Qwen-VL, Qwen2-VL has the capabilities of:SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/wildfeedback-aligning-llms-with-in-situ-user-interactions-and-feedback","title":"WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback","url":"https://www.microsoft.com/en-us/research/publication/wildfeedback-aligning-llms-with-in-situ-user-interactions-and-feedback/","published":"2024-08-28","authors":["Taiwei Shi","Zhuoer Wang","Longqi Yang","Ying-Chun Lin","Zexue He","Mengting Wan","Pei Zhou","Sujay Kumar Jauhar","Xiaofeng Xu","Xia Song","Jennifer Neville"],"abstract":"As large language models (LLMs) continue to advance, aligning these models with human preferences has emerged as a critical challenge. Traditional alignment methods, relying on human or LLM annotated datasets, are limited by their resource-intensive nature, inherent subjectivity, and the risk of feedback loops that amplify model biases. 
To overcome these limitations, we introduce WildFeedback, a novel framework that leverages real-time, in-situ user interactions to create preference datasets that more accurately reflect authentic human values. WildFeedback operates through a three-step process: feedback signal identification, preference data construction, and user-guided evaluation. We applied this framework to a large corpus of user-LLM conversations, resulting in a rich preference dataset that reflects genuine user preferences. This dataset captures the nuances of user preferences by i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","1970-01-01","LLM","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4401943272","title":"When Search Engine Services Meet Large Language Models: Visions and Challenges","url":"https://doi.org/10.1109/tsc.2024.3451185","published":"2024-08-28","authors":["Haoyi Xiong","将尚 渡辺","Yuchen Li","Xuhong Li","Mengnan Du","Shuaiqiang Wang","Dawei Yin","Sumi Helal"],"abstract":"Combining Large Language Models (LLMs) with search engine services marks a significant shift in the field of services computing, opening up new possibilities to enhance how we search for and retrieve information, understand content, and interact with internet services. This paper conducts an in-depth examination of how integrating LLMs with search engines can mutually benefit both technologies. 
We focus on two main areas: using search engines to improve LLMs (Search4LLM) and enhancing search engine functions using LLMs (LLM4Search). For Search4LLM, we investigate how search engines can provide diverse high-quality datasets for pre-training of LLMs, how they can use the most relevant documents to help LLMs learn to answer queries more accurately, how training LLMs with Learning-To-Rank (LTR) tasks can enhance their ability to respond with greater precision, and how incorporating recent se...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tsc.2024.3451185","openalex_id":"https://openalex.org/W4401943272","cited_by_count":46,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","New Jersey Institute of Technology","University of Bologna"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7657676935195923},{"id":"https://openalex.org/C133979268","display_name":"Vision","score":0.6677103042602539},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5101635456085205},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.24528288841247559},{"id":"https://openalex.org/C27206212","display_name":"Theology","score":0.0},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":46}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-the-weakness-of-large-language-model-agents-within-a-complex-android-environment","title":"Understanding the Weakness of Large Language Model Agents within a Complex Android 
Environment","url":"https://www.microsoft.com/en-us/research/publication/understanding-the-weakness-of-large-language-model-agents-within-a-complex-android-environment/","published":"2024-08-25","authors":["Mingzhe Xing","Rongkai Zhang","Hui Xue","Qi Chen","Fan Yang","Zhen Xiao"],"abstract":"Large language models (LLMs) have empowered intelligent agents to execute intricate tasks within domain-specific software such as browsers and games. However, when applied to general-purpose software systems like operating systems, LLM agents face three primary challenges. Firstly, the action space is vast and dynamic, posing difficulties for LLM agents to maintain an up-to-date understanding and deliver accurate responses. Secondly, real-world tasks often require inter-application cooperation, demanding farsighted planning from LLM agents. Thirdly, agents need to identify optimal solutions aligning with user constraints, such as security concerns and preferences. These challenges motivate AndroidArena, an environment and benchmark designed to evaluate LLM agents on a modern operating system. 
To address high-cost of manpower, we design a scalable and semi-automated method to construct t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/decoding-the-ai-pen-techniques-and-challenges-in-detecting-ai-generated-text","title":"Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text","url":"https://www.microsoft.com/en-us/research/publication/decoding-the-ai-pen-techniques-and-challenges-in-detecting-ai-generated-text/","published":"2024-08-24","authors":["Sara Abdali","Richard Anarfi","CJ Barberan","Jia He"],"abstract":"Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text. However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices. In this study, we delve into these challenges, explore existing strategies for mitigating them, with a particular emphasis on identifying AI-generated text as the ultimate solution. Additionally, we assess the feasibility of detection from a theoretical perspective and propose novel research directions to address the current limitations in this domain. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3637528.3671463","openalex_id":"https://openalex.org/W4392736762","cited_by_count":12,"quality_score":100,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Social sciences","Language model","large language models","Natural language processing","Social Science","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/intelligent-router-for-llm-workloads-improving-performance-through-workload-aware-scheduling","title":"Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling","url":"https://www.microsoft.com/en-us/research/publication/intelligent-router-for-llm-workloads-improving-performance-through-workload-aware-scheduling/","published":"2024-08-24","authors":["A. Parayil","Ankur Mallick","Esha Choukse","Xiaoting Qin","Jue Zhang","Íñigo Goiri","Rujia Wang","Chetan Bansal","Victor Ruehle","Saravan Rajmohan","Kunal Jain","Anoop Kulkarni"],"abstract":"Large Language Model (LLM) workloads have distinct prefill and decode phases with different compute and memory requirements which should ideally be accounted for when scheduling input queries across different LLM instances in a cluster. However existing scheduling algorithms treat LLM workloads as monolithic jobs without considering the distinct characteristics of the two phases in each workload. This leads to sub-optimal scheduling and increased response latency. 
In this work, we propose a heuristic-guided reinforcement learning-based intelligent router for data-driven and workload-aware scheduling. Our router leverages a trainable response-length predictor, and a novel formulation for estimating the impact of mixing different workloads to schedule queries across LLM instances and achieve over 11% lower end-to-end latency than existing approaches.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Article (Journal)","Artificial intelligence","Systems and networking","Computer science","LLM","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4401856679","title":"AutoWebGLM: A Large Language Model-based Web Navigating Agent","url":"https://doi.org/10.1145/3637528.3671620","published":"2024-08-24","authors":["Hanyu Lai","Xiao Liu","Iat Long Iong","Shuntian Yao","Yuxuan Chen","Pengbo Shen","Hao Yu","Hanchen Zhang","Xiaohan Zhang","Yuxiao Dong","Jie Tang"],"abstract":"Large language models (LLMs) have fueled many intelligent web agents, but most existing ones perform far from satisfying in real-world web navigation tasks due to three factors: (1) the complexity of HTML text data (2) versatility of actions on webpages, and (3) task difficulty due to the open-domain nature of the web. In light of these challenges, we develop the open AutoWebGLM based on ChatGLM3-6B. AutoWebGLM can serve as a powerful automated web navigation agent that outperform GPT-4. 
Inspired by human browsing patterns, we first design an HTML simplification algorithm to represent webpages with vital information preserved succinctly. We then employ a hybrid human-AI method to build web browsing data for curriculum training. Finally, we bootstrap the model by reinforcement learning and rejection sampling to further facilitate webpage comprehension, browser operations, and efficient ta...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671620","openalex_id":"https://openalex.org/W4401856679","cited_by_count":30,"quality_score":79,"matched_keywords":["language model","efficient","agent"],"author_affiliations":["Beijing University of Posts and Telecommunications","Tsinghua University","University of Chinese Academy of Sciences","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8012771606445312},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.5088112950325012},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4018076956272125},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3605506420135498},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.32277965545654297},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3210896849632263}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":30}},{"id":"openalex:W4401863256","title":"Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey","url":"https://doi.org/10.1145/3637528.3671473","published":"2024-08-24","authors":["Qijiong Liu","Jieming 
Zhu","Yanting Yang","Quanyu Dai","Zhaocheng Du","Xiao-Ming Wu","Zhou Zhao","Rui Zhang","Zhenhua Dong"],"abstract":"Personalized recommendation serves as a ubiquitous channel for users to discover information tailored to their interests. However, traditional recommendation models primarily rely on unique IDs and categorical features for user-item matching, potentially overlooking the nuanced essence of raw item contents across multiple modalities such as text, image, audio, and video. This underutilization of multimodal data poses a limitation to recommender systems, especially in multimedia services like news, music, and short-video platforms. The recent advancements in large multimodal models offer new opportunities and challenges in developing content-aware recommender systems. This survey seeks to provide a comprehensive exploration of the latest advancements and future trajectories in multimodal pretraining, adaptation, and generation techniques, as well as their applications in enhancing recomme...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671473","openalex_id":"https://openalex.org/W4401863256","cited_by_count":35,"quality_score":75,"matched_keywords":["personalized","news"],"author_affiliations":["Hong Kong Polytechnic University","Huawei Technologies (China)","Huazhong University of Science and Technology","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8197417259216309},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7871261239051819},{"id":"https://openalex.org/C139807058","display_name":"Adaptation 
(eye)","score":0.6839501261711121},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6497020721435547},{"id":"https://openalex.org/C5274069","display_name":"Categorical variable","score":0.5583914518356323},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.46471986174583435},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4619036316871643},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.45747941732406616}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":35}},{"id":"openalex:W4401863940","title":"LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models","url":"https://doi.org/10.1145/3637528.3671810","published":"2024-08-24","authors":["Aoxiao Zhong","Dengyao Mo","Guiyang Liu","Jinbu Liu","Qingda Lu","Qi Zhou","Jiesheng Wu","Quanzheng Li","Qingsong Wen"],"abstract":"Logs are ubiquitous digital footprints, playing an indispensable role in system diagnostics, security analysis, and performance optimization. The extraction of actionable insights from logs is critically dependent on the log parsing process, which converts raw logs into structured formats for downstream analysis. Yet, the complexities of contemporary systems and the dynamic nature of logs pose significant challenges to existing automatic parsing techniques. The emergence of Large Language Models (LLM) offers new horizons. With their expansive knowledge and contextual prowess, LLMs have been transformative across diverse applications. Building on this, we introduce LogParser-LLM, a novel log parser integrated with LLM capabilities. 
This union seamlessly blends semantic insights with statistical nuances, obviating the need for hyper-parameter tuning and labeled training data, while ensurin...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671810","openalex_id":"https://openalex.org/W4401863940","cited_by_count":37,"quality_score":75,"matched_keywords":["LLM","efficient"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Bellevue Hospital Center","Harvard University","Massachusetts General Hospital"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7721778750419617},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.7434560656547546},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.5125965476036072},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4592478275299072},{"id":"https://openalex.org/C60690694","display_name":"Bottom-up parsing","score":0.4378282129764557},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4072182774543762},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3602360486984253},{"id":"https://openalex.org/C42560504","display_name":"Top-down parsing","score":0.23404759168624878}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":37}},{"id":"arxiv:2404.11457","title":"Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era","url":"http://arxiv.org/abs/2404.11457","published":"2024-08-24","authors":["Sunhao Dai","Xu Chen","Shicheng Xu","Liang Pang","Zhenhua Dong","Jun Xu"],"abstract":"With the rapid 
advancements of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of biases and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing works on emerging and pressing bias and unfairness issues in IR systems when the integration of LLMs. We first unify bias and unfairness issues as distribution mismatch problems, providing a groundwork for categorizing various mitigation strategies through distribution alignment. Subsequently, we systematically delve into the specific bias and unfairness issues arising from three critical stages of LLMs integration into IR systems: data collection, model development,....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3637528.3671458","openalex_id":"https://openalex.org/W4394947904","cited_by_count":94,"quality_score":75,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Chinese Academy of Sciences","Huawei Technologies (China)","Institute of Computing Technology","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5009019374847412},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.46944376826286316},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.38109809160232544}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":94}},{"id":"openalex:W4401857375","title":"A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language 
Models","url":"https://doi.org/10.1145/3637528.3671470","published":"2024-08-24","authors":["Wenqi Fan","Yujuan Ding","Liangbo Ning","Shijie Wang","Hengyun Li","Dawei Yin","Tat‐Seng Chua","Qing Li"],"abstract":"As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than solely relying on the model's internal knowledge, to augment t...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671470","openalex_id":"https://openalex.org/W4401857375","cited_by_count":498,"quality_score":71,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","Hong Kong Polytechnic University","National University of Singapore"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.605438232421875},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.36853307485580444},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.36259862780570984}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":498}},{"id":"openalex:W4401857430","title":"A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)","url":"https://doi.org/10.1145/3637528.3671474","published":"2024-08-24","authors":["Yashar Deldjoo","Zhankui He","Julian McAuley","Anton Korikov","Scott Sanner","Arnau Ramisa","Renè Vidal","Maheswaran Sathiamoorthy","Atoosa Kasirzadeh","Silvia Milano"],"abstract":"Traditional recommender systems typically use user-item rating histories as their main data source. However, deep generative models now have the capability to model and sample from complex data distributions, including user-item interactions, text, images, and videos, enabling novel recommendation tasks. This comprehensive, multidisciplinary survey connects key advancements in RS using Generative Models (Gen-RecSys), covering: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS. Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges. 
This survey accompanies a \"tutorial\" presented at ACM KDD'24, with supporting materials provided at: https://encr.pw/vDhLq.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1145/3637528.3671474","openalex_id":"https://openalex.org/W4401857430","cited_by_count":85,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Institut für Urheber- und Medienrecht","Ludwig-Maximilians-Universität München","Polytechnic University of Bari","University of California San Diego","University of Edinburgh","University of Toronto"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8317124843597412},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7851645946502686},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.611080527305603},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5581235289573669},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5245233774185181},{"id":"https://openalex.org/C26517878","display_name":"Key (lock)","score":0.466518759727478},{"id":"https://openalex.org/C22467394","display_name":"Multidisciplinary approach","score":0.4526020884513855},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.38614386320114136}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":85}},{"id":"openalex:W4401863317","title":"UrbanGPT: Spatio-Temporal Large Language Models","url":"https://doi.org/10.1145/3637528.3671578","published":"2024-08-24","authors":["Zhonghang Li","Lianghao Xia","Jiabin Tang","Yong Xu","Lei Shi","Long Xia","Dawei Yin","Chao 
Huang"],"abstract":"Spatio-temporal prediction aims to forecast and gain insights into the ever-changing dynamics of urban environments across both time and space. Its purpose is to anticipate future patterns, trends, and events in diverse facets of urban life, including transportation, population movement, and crime rates. Although numerous efforts have been dedicated to developing neural network techniques for accurate predictions on spatio-temporal data, it is important to note that many of these methods heavily depend on having sufficient labeled data to generate precise spatio-temporal representations. Unfortunately, the issue of data scarcity is pervasive in practical urban sensing scenarios. In certain cases, it becomes challenging to collect any labeled data from downstream scenarios, intensifying the problem further. Consequently, it becomes necessary to build a spatio-temporal model that can exhib...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671578","openalex_id":"https://openalex.org/W4401863317","cited_by_count":114,"quality_score":67,"matched_keywords":[],"author_affiliations":["Baidu (China)","South China University of Technology","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7274243831634521},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3787016272544861},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3263387382030487}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":114}},{"id":"openalex:W4401857190","title":"Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User 
Feedback","url":"https://doi.org/10.1145/3637528.3671703","published":"2024-08-24","authors":["Guipeng Xv","Xinyu Li","Ruobing Xie","Chen Lin","Chong Liu","Feng Xia","Zhanhui Kang","Leyu Lin"],"abstract":"Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1)noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content and user feedback. To tackle these challenges, we propose Denoising and Aligning Multi-modal Recommender System (DA-MRS). To mitigate noise in multi-modal content, DA-MRS first constructs item-item graphs determined by consistent content similarity across modalities. To denoise user feedback, DA-MRS associates the probability of observed feedback with multi-modal content and devises a denoised BPR loss. Furthermore, DA-MRS implements Alignment guided by User preference to enhance task-specific item representation and Alignment guided by graded Item relations to provide finer-grained alignment. 
Extensive exp...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671703","openalex_id":"https://openalex.org/W4401857190","cited_by_count":24,"quality_score":65,"matched_keywords":["preference"],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.8889474272727966},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.793365478515625},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.7727357149124146},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.46423426270484924},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.33778589963912964},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3296854496002197},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.06668496131896973},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":24}},{"id":"openalex:W4401863300","title":"Enhancing On-Device LLM Inference with Historical Cloud-Based LLM Interactions","url":"https://doi.org/10.1145/3637528.3671679","published":"2024-08-24","authors":["Yucheng Ding","Chaoyue Niu","Fan Wu","Shaojie Tang","Chengfei Lyu","Guihai Chen"],"abstract":"Many billion-scale large language models (LLMs) have been released for resource-constraint mobile devices to provide local LLM inference service when cloud-based powerful LLMs are not available. 
However, the capabilities of current on-device LLMs still lag behind those of cloud-based LLMs, and how to effectively and efficiently enhance on-device LLM inference becomes a practical requirement. We thus propose to collect the user's historical interactions with the cloud-based LLM and build an external datastore on the mobile device for enhancement using nearest neighbors search. Nevertheless, the full datastore improves the quality of token generation at the unacceptable expense of much slower generation speed. To balance performance and efficiency, we propose to select an optimal subset of the full datastore within the given size limit, the optimization objective of which is proven to be s...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671679","openalex_id":"https://openalex.org/W4401863300","cited_by_count":12,"quality_score":57,"matched_keywords":["LLM","memory"],"author_affiliations":["Alibaba Group (China)","Shanghai Jiao Tong University","The University of Texas at Dallas"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.7770925760269165},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6658616065979004},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6427605748176575},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3312171399593353},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.22718337178230286},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.10519823431968689}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":12}},{"id":"arxiv:2407.15431","title":"Pre-Training and Prompting for Few-Shot Node Classification on Text-Attributed Graphs","url":"http://arxiv.org/abs/2407.15431","published":"2024-08-24","authors":["Huanjing Zhao","Beining Yang","Yukuo Cen","Junyu Ren","C. Zhang","Yuxiao Dong","Evgeny Kharlamov","Shu Zhao","Jie Tang"],"abstract":"The text-attributed graph (TAG) is one kind of important real-world graph-structured data with each node associated with raw texts. For TAGs, traditional few-shot node classification methods directly conduct training on the pre-processed node features and do not consider the raw texts. The performance is highly dependent on the choice of the feature pre-processing method. In this paper, we propose P2TAG, a framework designed for few-shot node classification on TAGs with graph pre-training and prompting. P2TAG first pre-trains the language model (LM) and graph neural network (GNN) on TAGs with self-supervised loss. To fully utilize the ability of language models, we adapt the masked language modeling objective for our framework. 
The pre-trained model is then used for the few-shot node classification with a mixed prompt method, which simultaneously considers both text and graph information...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671952","openalex_id":"https://openalex.org/W4401863472","cited_by_count":11,"quality_score":52,"matched_keywords":["language model"],"author_affiliations":["Anhui University","Robert Bosch (Germany)","Tsinghua University","University of Edinburgh","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8196965456008911},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.6236764192581177},{"id":"https://openalex.org/C62611344","display_name":"Node (physics)","score":0.5571590065956116},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5344376564025879},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.5006718635559082},{"id":"https://openalex.org/C2993807640","display_name":"Attention network","score":0.47490179538726807},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4610674977302551},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44909343123435974}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4401863338","title":"Generative Auto-bidding via Conditional Diffusion Modeling","url":"https://doi.org/10.1145/3637528.3671526","published":"2024-08-24","authors":["Jiayan Guo","Yusen Huo","Zhilin Zhang","Tianyu Wang","Chuan Yu","Jian Xu","Bo Zheng","Yan Zhang"],"abstract":"Auto-bidding plays a crucial role in facilitating 
online advertising by automatically providing bids for advertisers. Reinforcement learning (RL) has gained popularity for auto-bidding. However, most current RL auto-bidding methods are modeled through the Markovian Decision Process (MDP), which assumes the Markovian state transition. This assumption restricts the ability to perform in long horizon scenarios and makes the model unstable when dealing with highly random online advertising environments. To tackle this issue, this paper introduces AI-Generated Bidding (AIGB), a novel paradigm for auto-bidding through generative modeling. In this paradigm, we propose DiffBid, a conditional diffusion modeling approach for bid generation. DiffBid directly models the correlation between the return and the entire trajectory, effectively avoiding error propagation across time steps in long horizons...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671526","openalex_id":"https://openalex.org/W4401863338","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6820242404937744},{"id":"https://openalex.org/C9233905","display_name":"Bidding","score":0.5990119576454163},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5690651535987854},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4913999140262604},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.431957870721817},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4179420471191406},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.07484111189842224},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"arxiv:2407.09395","title":"Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce","url":"http://arxiv.org/abs/2407.09395","published":"2024-08-24","authors":["Zhe Lin","Jiwei Tan","Dan Ou","Xi Chen","Shaowei Yao","Bo Zheng"],"abstract":"Text relevance or text matching of query and product is an essential technique for the e-commerce search system to ensure that the displayed products can match the intent of the query. Many studies focus on improving the performance of the relevance model in search system. Recently, pre-trained language models like BERT have achieved promising performance on the text relevance task. While these models perform well on the offline test dataset, there are still obstacles to deploy the pre-trained language model to the online system as their high latency. The two-tower model is extensively employed in industrial scenarios, owing to its ability to harmonize performance with computational efficiency. Regrettably, such models present an opaque ''black box'' nature, which prevents developers from making special optimizations. 
In this paper, we raise deep Bag-of-Words (DeepBoW) model, an efficie...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671559","openalex_id":"https://openalex.org/W4400668487","cited_by_count":5,"quality_score":50,"matched_keywords":["language model","efficient"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8465554118156433},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.7126708030700684},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5135798454284668},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.49402791261672974},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45326727628707886},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44370636343955994},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.42887312173843384},{"id":"https://openalex.org/C90805587","display_name":"Word (group theory)","score":0.4215978682041168}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4401864397","title":"Dual-Assessment Driven Pruning: Iterative Optimizing Layer-wise Sparsity for Large Language Model","url":"https://doi.org/10.1145/3637528.3671780","published":"2024-08-24","authors":["Qinghui Sun","Weilun Wang","Zhu Yanni","Shenghuan He","Hao Yi","Zehua Cai","Hong Liu"],"abstract":"Large Language Models (LLMs) have demonstrated efficacy in various domains, but deploying these models is economically challenging due to extensive 
parameter counts. Numerous efforts have been dedicated to reducing the parameter count of these models without compromising performance, employing a technique known as model pruning. Conventional pruning methods assess the significance of weights within individual layers and typically apply uniform sparsity levels across all layers, potentially neglecting the varying significance of each layer. To address this oversight, we first propose a dual-assessment driven pruning strategy that employs both intra-layer metric and global performance metric to comprehensively evaluate the impact of pruning. Then our method leverages an iterative optimization algorithm to find the optimal layer-wise sparsity distribution, thereby minimally impacting model....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671780","openalex_id":"https://openalex.org/W4401864397","cited_by_count":3,"quality_score":48,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7264138460159302},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.7192856073379517},{"id":"https://openalex.org/C2779227376","display_name":"Layer (electronics)","score":0.6761136651039124},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5471634864807129},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.5196841359138489},{"id":"https://openalex.org/C159694833","display_name":"Iterative method","score":0.5003900527954102},{"id":"https://openalex.org/C2993148961","display_name":"Dual 
layer","score":0.4969046413898468},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.3682038187980652}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4401863565","title":"When Box Meets Graph Neural Network in Tag-aware Recommendation","url":"https://doi.org/10.1145/3637528.3671973","published":"2024-08-24","authors":["Fake Lin","Ziwei Zhao","Xi Zhu","Da Zhang","Shitian Shen","Xueying Li","Tong Xu","Suojuan Zhang","Enhong Chen"],"abstract":"Last year has witnessed the re-flourishment of tag-aware recommender systems supported by the LLM-enriched tags. Unfortunately, though large efforts have been made, current solutions may fail to describe the diversity and uncertainty inherent in user preferences with only tag-driven profiles. Recently, with the development of geometry-based techniques, e.g., box embeddings, the diversity of user preferences now could be fully modeled as the range within a box in high dimension space. However, defect still exists as these approaches are incapable of capturing high-order neighbor signals, i.e., semantic-rich multi-hop relations within the user-tag-item tripartite graph, which severely limits the effectiveness of user modeling. 
To deal with this challenge, in this paper, we propose a novel framework, called BoxGNN, to perform message aggregation via combinations of logical operations, there...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671973","openalex_id":"https://openalex.org/W4401863565","cited_by_count":6,"quality_score":47,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","PLA Army Engineering University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7780932188034058},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.5385969281196594},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5357970595359802},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39520007371902466},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.32171404361724854},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.25369247794151306}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4401862778","title":"Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness","url":"https://doi.org/10.1145/3637528.3672009","published":"2024-08-24","authors":["Dingrong Wang","Hitesh Sapkota","Zhiqiang Tao","Qi Yu"],"abstract":"Prior research on neural architecture search (NAS) for adversarial robustness has revealed that a lightweight and adversarially robust sub-network could exist in a non-robust large teacher network. 
Such a sub-network is generally discovered based on heuristic rules to perform neural architecture search. However, heuristic rules are inadequate to handle diverse adversarial attacks and different \"teacher\" network capacity. To address this key challenge, we propose Reinforced Compressive Neural Architecture Search (RC-NAS), aiming to achieve Versatile Adversarial Robustness. Specifically, we define novel task settings that compose datasets, adversarial attacks, and teacher network configuration. Given diverse tasks, we develop an innovative dual-level training paradigm that consists of a meta-training and a fine-tuning phase to effectively expose the RL agent to diverse attack scenarios (in...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3672009","openalex_id":"https://openalex.org/W4401862778","cited_by_count":2,"quality_score":47,"matched_keywords":["compression","agent"],"author_affiliations":["Amazon (United States)","Rochester Institute of Technology"],"concepts":[{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7234077453613281},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6462773084640503},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.476815402507782},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4763486683368683},{"id":"https://openalex.org/C37736160","display_name":"Adversarial 
system","score":0.4302988350391388},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.08768793940544128},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4401857231","title":"Pre-trained KPI Anomaly Detection Model Through Disentangled Transformer","url":"https://doi.org/10.1145/3637528.3671522","published":"2024-08-24","authors":["Zhaoyang Yu","Changhua Pei","Xin Wang","Minghua Ma","Chetan Bansal","Saravan Rajmohan","Qingwei Lin","Dongmei Zhang","Xidao Wen","Jianhui Li","Gaogang Xie","Dan Pei"],"abstract":"In large-scale online service systems, numerous Key Performance Indicators (KPIs), such as service response time and error rate, are gathered in a time-series format. KPI Anomaly Detection (KAD) is a critical data mining problem due to its widespread applications in real-world scenarios. However, KAD faces the challenges of dealing with KPI heterogeneity and noisy data. We propose KAD-Disformer, a KPI Anomaly Detection approach through Disentangled Transformer. KAD-Disformer pre-trains a model on existing accessible KPIs, and the pre-trained model can be effectively \"fine-tuned\" to unseen KPI using only a handful of samples from the unseen KPI. We propose a series of innovative designs, including disentangled projection for transformer, unsupervised few-shot fine-tuning (uTune), and denoising modules, each of which significantly contributes to the overall performance. 
Our extensive exper...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671522","openalex_id":"https://openalex.org/W4401857231","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Computer Network Information Center","Microsoft (United States)","Microsoft Research Asia (China)","Stony Brook University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7630633115768433},{"id":"https://openalex.org/C135510737","display_name":"Performance indicator","score":0.7289056777954102},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.6709850430488586},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.6114195585250854},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5091692805290222},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4602455496788025},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1478050947189331},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4401863391","title":"Large Language Model with Curriculum Reasoning for Visual Concept Recognition","url":"https://doi.org/10.1145/3637528.3671653","published":"2024-08-24","authors":["Y. J. 
Zhang","Xin Wang","Hong Chen","Jiapei Fan","Weigao Wen","Hui Xue","Hong Mei","Wenwu Zhu"],"abstract":"Visual concept recognition aims to capture the basic attributes of an image and reason about the relationships among them to determine whether the image satisfies a certain concept, and has been widely used in various tasks such as human action recognition and image risk warning. Most existing works adopt deep neural networks for visual concept recognition, which are black-box and incomprehensible to humans, thus making them unacceptable for sensitive domains such as prohibited event detection and risk early warning etc. To address this issue, we propose to combine large language model (LLM) with explainable symbolic reasoning via curriculum reweighting to increase the interpretability and accuracy of visual concept recognition in this paper. However, realizing this goal is challenging given that i) the performance of symbolic representations are limited by the lack of annotated reasonin...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671653","openalex_id":"https://openalex.org/W4401863391","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","Peking University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7958911657333374},{"id":"https://openalex.org/C9616225","display_name":"Semantic reasoner","score":0.7220512628555298},{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.6615392565727234},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6319867372512817},{"id":"https://openalex.org/C2777508537","display_name":"Visual 
reasoning","score":0.5736147165298462},{"id":"https://openalex.org/C193221554","display_name":"Commonsense reasoning","score":0.5638020634651184},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4717234969139099},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43128150701522827}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4401862830","title":"Killing Two Birds with One Stone: Cross-modal Reinforced Prompting for Graph and Language Tasks","url":"https://doi.org/10.1145/3637528.3671742","published":"2024-08-24","authors":["Wenyuan Jiang","Wenwei Wu","Le Zhang","Zixuan Yuan","Jian Xiang","Jingbo Zhou","Hui Xiong"],"abstract":"In recent years, Graph Neural Networks (GNNs) and Large Language Models (LLMs) have exhibited remarkable capability in addressing different graph learning and natural language tasks, respectively. Motivated by this, integrating LLMs with GNNs has been increasingly studied to acquire transferable knowledge across modalities, which leads to improved empirical performance in language and graph domains. However, existing studies mainly focused on a single-domain scenario by designing complicated integration techniques to manage multimodal data effectively. Therefore, a concise and generic learning framework for multi-domain tasks, i.e., graph and language domains, is highly desired yet remains under-exploited due to two major challenges. First, the language corpus of downstream tasks differs significantly from graph data, making it hard to bridge the knowledge gap between modalities. 
Second,...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671742","openalex_id":"https://openalex.org/W4401862830","cited_by_count":4,"quality_score":45,"matched_keywords":["agent"],"author_affiliations":["Baidu (China)","Hong Kong University of Science and Technology","University of Hong Kong","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6527752876281738},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6066024303436279},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5116429924964905},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33076879382133484},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32479023933410645},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.181723952293396},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4401863796","title":"CrossLight: Offline-to-Online Reinforcement Learning for Cross-City Traffic Signal Control","url":"https://doi.org/10.1145/3637528.3671927","published":"2024-08-24","authors":["Sun Qian","Rui Zha","Le Zhang","Jingbo Zhou","Yu Mei","Zhiling Li","Hui Xiong"],"abstract":"The recent advancements in Traffic Signal Control (TSC) have highlighted the potential of Reinforcement Learning (RL) as a promising solution to alleviate traffic congestion. 
Current research in this area primarily concentrates on either online or offline learning strategies, aiming to create optimized policies for specific cities. Nevertheless, the transferability of these policies to new cities is impeded by constraints such as the limited availability of high-quality data and the expensive and risky exploration process. To this end, in this paper, we present an innovative cross-city Traffic Signal Control (TSC) paradigm called CrossLight. Our approach involves meta training using offline data from source cities and adaptively fine-tuning in the target city. This novel methodology aims to address the challenges of transferring TSC policies across different cities effectively. In our pr...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671927","openalex_id":"https://openalex.org/W4401863796","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Baidu (China)","Hong Kong University of Science and Technology","University of Hong Kong","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.8065498471260071},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6398353576660156},{"id":"https://openalex.org/C2987419075","display_name":"Traffic signal","score":0.6134769916534424},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.5604655146598816},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.43001317977905273},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3408932685852051},{"id":"https://openalex.org/C79403827","display_name":"Real-time 
computing","score":0.19764631986618042},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4401863351","title":"Pre-train and Refine: Towards Higher Efficiency in K-Agnostic Community Detection without Quality Degradation","url":"https://doi.org/10.1145/3637528.3671686","published":"2024-08-24","authors":["Meng Qin","Chaorui Zhang","Yu Gao","Weixi Zhang","Dit‐Yan Yeung"],"abstract":"Community detection (CD) is a classic graph inference task that partitions nodes of a graph into densely connected groups. While many CD methods have been proposed with either impressive quality or efficiency, balancing the two aspects remains a challenge. This study explores the potential of deep graph learning to achieve a better trade-off between the quality and efficiency of K-agnostic CD, where the number of communities K is unknown. We propose PRoCD (Pre-training & Refinement fOr Community Detection), a simple yet effective method that reformulates K-agnostic CD as the binary node pair classification. PRoCD follows a pre-training & refinement paradigm inspired by recent advances in pre-training techniques. We first conduct the offline pre-training of PRoCD on small synthetic graphs covering various topology properties. 
Based on the inductive inference across graphs, we then general...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671686","openalex_id":"https://openalex.org/W4401863351","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Hong Kong University of Science and Technology","Huawei Technologies (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C114466953","display_name":"Initialization","score":0.7798164486885071},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.735034704208374},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7243624925613403},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5414602160453796},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5357363224029541},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5097216963768005},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5095551013946533},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4755786955356598}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4401857309","title":"MISP: A Multimodal-based Intelligent Server Failure Prediction Model for Cloud Computing Systems","url":"https://doi.org/10.1145/3637528.3671568","published":"2024-08-24","authors":["Xianting Lu","Yunong Wang","Yu Fu","Qi Sun","Xuhua Ma","Xudong Zheng","Cheng Zhuo"],"abstract":"Traditional server failure prediction methods predominantly rely on single-modality data such as system logs or system status curves. 
This reliance may lead to an incomplete understanding of system health and impending issues, proving inadequate for the complex and dynamic landscape of contemporary cloud computing environments. The potential of multimodal data to provide comprehensive insights is widely acknowledged, yet the lack of a holistic dataset and the challenges inherent in integrating features from both structured and unstructured data have impeded the exploration of multimodal-based server failure prediction. Addressing these challenges, this paper presents an industrial-scale, comprehensive dataset for server failure prediction, comprising nearly 80 types of structured and unstructured data sourced from real-world industrial cloud systems 1. Building on this resource, we intro...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671568","openalex_id":"https://openalex.org/W4401857309","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Lanzhou University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.775584876537323},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7539501190185547},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.49498528242111206},{"id":"https://openalex.org/C93996380","display_name":"Server","score":0.47744423151016235},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.44992929697036743},{"id":"https://openalex.org/C2987335383","display_name":"Cloud 
server","score":0.44317159056663513},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.43182867765426636},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.4253285229206085}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4401857420","title":"Controllable Multi-Behavior Recommendation for In-Game Skins with Large Sequential Model","url":"https://doi.org/10.1145/3637528.3671572","published":"2024-08-24","authors":["Yanjie Gou","Yuanzhou Yao","Zhao Zhang","Yiqing Wu","Yi Hu","Fuzhen Zhuang","Jiangming Liu","Yongjun Xu"],"abstract":"Online games often house virtual shops where players can acquire character skins. Our task is centered on tailoring skin recommendations across diverse scenarios by analyzing historical interactions such as clicks, usage, and purchases. Traditional multi-behavior recommendation models employed for this task are limited. They either only predict skins based on a single type of behavior or merely recommend skins for target behavior type/task. These models lack the ability to control predictions of skins that are associated with different scenarios and behaviors. To overcome these limitations, we utilize the pretraining capabilities of Large Sequential Models (LSMs) coupled with a novel stimulus prompt mechanism and build a controllable multi-behavior recommendation (CMBR) model. 
In our approach, the pretraining ability is used to encapsulate users' multi-behavioral sequences into the repre...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671572","openalex_id":"https://openalex.org/W4401857420","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Beihang University","Chinese Academy of Sciences","Institute of Computing Technology","Tencent (China)","University of Chinese Academy of Sciences","Yunnan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8501471281051636},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5408978462219238},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.461481511592865},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4571024179458618},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4221784472465515},{"id":"https://openalex.org/C78639753","display_name":"Behavioral modeling","score":0.41552940011024475},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.0},{"id":"https://openalex.org/C187736073","display_name":"Management","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4401863583","title":"Enhancing Asymmetric Web Search through Question-Answer Generation and Ranking","url":"https://doi.org/10.1145/3637528.3671517","published":"2024-08-24","authors":["Dezhi Ye","J. 
Liu","Jiabin Fan","Bowen Tian","Tianhua Zhou","Xiang Chen","Jin Ma"],"abstract":"This paper addresses the challenge of the semantic gap between user queries and web content, commonly referred to as asymmetric text matching, within the domain of web search. By leveraging BERT for reading comprehension, current algorithms enable significant advancements in query understanding, but still encounter limitations in effectively resolving the asymmetrical ranking problem due to model comprehension and summarization constraints.To tackle this issue, we propose the QAGR (Question-Answer Generation and Ranking) method, comprising an offline module called QAGeneration and an online module called QARanking. The QAGeneration module utilizes large language models (LLMs) to generate high-quality question-answering pairs for each web page. This process involves two steps: generating question-answer pairs and performing verification to eliminate irrelevant questions, resulting in high...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671517","openalex_id":"https://openalex.org/W4401863583","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.6895073652267456},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6746923327445984},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5894419550895691},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.43421199917793274},{"id":"https://openalex.org/C97854310","display_name":"Search 
engine","score":0.417098730802536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4401863765","title":"Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning","url":"http://dx.doi.org/10.1145/3637528.3671618","published":"2024-08-24","authors":["Amit Sharma","Hua Li","Xue Li","Jian Jiao"],"abstract":"Given an input query, a recommendation model is trained using user feedback data (e.g., click data) to output a ranked list of items. In real-world systems, besides accuracy, an important consideration for a new model is novelty of its top-k recommendations w.r.t. an existing deployed model. However, novelty of top-k items is a difficult goal to optimize a model for, since it involves a non-differentiable sorting operation on the model's predictions. Moreover, novel items, by definition, do not have any user feedback data. Given the semantic capabilities of large language models, we address these problems using a reinforcement learning (RL) formulation where large language models provide feedback for the novel items. However, given millions of candidate items, the sample complexity of a standard RL algorithm can be prohibitively high. 
To reduce sample complexity, we reduce the top-k list...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671618","openalex_id":"https://openalex.org/W4401863765","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (India)"],"concepts":[{"id":"https://openalex.org/C2778738651","display_name":"Novelty","score":0.9067846536636353},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.8397990465164185},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7390843629837036},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.4943857192993164},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4913720190525055},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.437904953956604},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4036981761455536},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3477672338485718}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4401863531","title":"Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training","url":"https://doi.org/10.1145/3637528.3671513","published":"2024-08-24","authors":["Haonan Chen","Zhicheng Dou","Xuetong Hao","Yunhao Tao","Shiren Song","Zhenli Sheng"],"abstract":"Cloud solutions have gained significant popularity in the technology industry as they offer a combination of services and tools to tackle specific problems. 
However, despite their widespread use, the task of identifying appropriate company customers for a specific target solution to the sales team of a solution provider remains a complex business problem that existing matching systems have yet to adequately address. In this work, we study the B2B solution matching problem and identify two main challenges of this scenario: (1) the modeling of complex multi-field features and (2) the limited, incomplete, and sparse transaction data. To tackle these challenges, we propose a framework CAMA, which is built with a hierarchical multi-field matching structure as its backbone and supplemented by three data augmentation strategies and a contrastive pre-training objective to compensate for the impe...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671513","openalex_id":"https://openalex.org/W4401863531","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.7429788112640381},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7054712772369385},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5754456520080566},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5296563506126404},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.4726133644580841},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.39337411522865295},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.10634180903434753},{"id":"https://openalex.org/C111919701","display_name":"Operating 
system","score":0.08974385261535645}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4401857178","title":"The 4th KDD Workshop on Deep Learning for Spatiotemporal Data, Applications, and Systems (DeepSpatial'24)","url":"https://doi.org/10.1145/3637528.3671501","published":"2024-08-24","authors":["Zhe Jiang","Liang Zhao","Xun Zhou","Junbo Zhang","Shashi Shekhar","Jieping Ye"],"abstract":"Over the last decades, a rapidly growing volume of spatiotemporal data has been collected from smartphones and GPS, terrestrial, seaborne, airborne, and spaceborne sensors, as well as computational simulations. Meanwhile, advances in deep learning technologies, especially the recent breakthroughs of generative AI and foundation models such as Large Language Models (LLMs) and Large Vision Models (LVMs), have achieved tremendous success in natural language processing and computer vision applications. There is growing anticipation of the same level of accomplishment of AI on spatiotemporal data in tackling grand societal challenges, such as national water resource management, monitoring coastal hazards, energy and food security, as well as mitigation and adaptation to climate change. 
When deep learning, especially emerging foundation models, intersects spatiotemporal data in scientific doma...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671501","openalex_id":"https://openalex.org/W4401857178","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Emory University","Harbin Institute of Technology","Jingdong (China)","University of Florida","University of Minnesota"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.751146137714386},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.5117034912109375},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46712666749954224},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4169728457927704},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3392350375652313}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4401857427","title":"The 10th Mining and Learning from Time Series Workshop: From Classical Methods to LLMs","url":"https://doi.org/10.1145/3637528.3671489","published":"2024-08-24","authors":["Sanjay Purushotham","Dongjin Song","Qingsong Wen","Jun Huan","Cong Shen","Stefan Zohren","Yuriy Nevmyvaka"],"abstract":"Time series data has become ubiquitous across various fields such as healthcare, finance, entertainment, and transportation, driven by advancements in sensing technologies that enable continuous monitoring and recording. 
This growth in data size and complexity presents new challenges for traditional analysis techniques, necessitating the development of advanced, interdisciplinary temporal mining algorithms. The goals of this workshop are to: (1) highlight significant challenges in learning and mining from time series data, such as irregular sampling, spatiotemporal structures, and uncertainty quantification; (2) discuss recent developments in algorithmic, theoretical, statistical, and systems-based approaches for addressing these challenges, including both classical methods and large language models (LLMs); and (3) synergize research efforts by exploring both new and open problems in tim...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671489","openalex_id":"https://openalex.org/W4401857427","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Morgan Stanley (United States)","University of Connecticut","University of Maryland, Baltimore County","University of Oxford","University of Virginia"],"concepts":[{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.7270594835281372},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.657802939414978},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.5491552352905273},{"id":"https://openalex.org/C2778137410","display_name":"Government (linguistics)","score":0.535266637802124},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.5111380219459534},{"id":"https://openalex.org/C171686336","display_name":"Topic model","score":0.4503617584705353},{"id":"https://openalex.org/C512170562","display_name":"Entertainment","score":0.4208815395832062},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.36788636445999146}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4401857335","title":"KDD workshop on Evaluation and Trustworthiness of Generative AI Models","url":"https://doi.org/10.1145/3637528.3671481","published":"2024-08-24","authors":["Yuan Ling","Shujing Dong","Yarong Feng","Zongyi Joe Liu","George Karypis","Chandan K. Reddy"],"abstract":"The KDD workshop on Evaluation and Trustworthiness of Generative AI Models aims to address the critical need for reliable generative AI technologies by exploring comprehensive evaluation strategies. This workshop will delve into various aspects of assessing generative AI models, including Large Language Models (LLMs) and diffusion models, focusing on trustworthiness, safety, bias, fairness, and ethical considerations. With an emphasis on interdisciplinary collaboration, the workshop will feature invited talks, peer-reviewed paper presentations, and panel discussions to advance the state of the art in generative AI evaluation.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3637528.3671481","openalex_id":"https://openalex.org/W4401857335","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","University of Minnesota","Virginia Tech"],"concepts":[{"id":"https://openalex.org/C153701036","display_name":"Trustworthiness","score":0.7836251258850098},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7165724039077759},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6885635256767273},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.49916696548461914},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4415610730648041},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4012581408023834},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.38008731603622437},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"apple:gx5r7ee2zrkr1eegla53zwtc","title":"Positional Description for Numerical Normalization","url":"https://machinelearning.apple.com/research/positional-description","published":"2024-08-23","authors":["Deepanshu Gupta","Javier Latorre"],"abstract":"We present a Positional Description Scheme (PDS) tailored for digit sequences, integrating placeholder value information for each digit. Given the structural limitations of subword tokenization algorithms, language models encounter critical Text Normalization (TN) challenges when handling numerical tasks. 
Our schema addresses this challenge through straightforward pre-processing, preserving the model architecture while significantly simplifying...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4401809506","title":"Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration","url":"https://doi.org/10.1109/tse.2024.3445338","published":"2024-08-23","authors":["Matteo Paltenghi","Rahul Pandita","Austin Z. Henley","Albert Ziegler"],"abstract":"Recent neural models of code, such as OpenAI Codex and AlphaCode, have demonstrated remarkable proficiency at code generation due to the underlying attention mechanism. However, it often remains unclear how the models actually process code, and to what extent their reasoning and the way their attention mechanism scans the code matches the patterns of developers. A poor understanding of the model reasoning process limits the way in which current neural models are leveraged today, so far mostly for their raw prediction. To fill this gap, this work studies how the processed attention signal of three open large language models - CodeGen, InCoder and GPT-J - agrees with how developers look at and explore code when each answers the same sensemaking questions about code. 
Furthermore, we contribute an open-source eye-tracking dataset comprising 92 manually-labeled sessions from 25 developers eng...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tse.2024.3445338","openalex_id":"https://openalex.org/W4401809506","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of Stuttgart"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8505386114120483},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5649844408035278},{"id":"https://openalex.org/C120936955","display_name":"Empirical research","score":0.493510365486145},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4890623986721039},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3679161071777344},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.36778613924980164},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3204239308834076},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4401717397","title":"MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples","url":"https://doi.org/10.1145/3688804","published":"2024-08-21","authors":["Tao Chen","Enwei Zhang","Yuting Gao","Ke Li","Xing Sun","Yan Zhang","Hui Li","Rongrong Ji"],"abstract":"Although In-Context Learning (ICL) brings remarkable performance gains to Large Language Models (LLMs), the improvements remain lower than fine-tuning on downstream tasks. 
This paper introduces Multi-Modal In-Context Tuning (MMICT), a novel multi-modal fine-tuning paradigm that boosts multi-modal fine-tuning by fully leveraging the promising ICL capability of Multi-Modal LLMs (MM-LLMs). We propose the Multi-Modal Hub (M-Hub), a unified module that captures various multi-modal features according to different inputs and objectives. Based on M-Hub, MMICT enables MM-LLMs to learn from in-context visual-guided textual features and subsequently generate outputs conditioned on the textual-guided visual features. Moreover, leveraging the flexibility of M-Hub, we design a variety of in-context demonstrations. Extensive experiments on a diverse range of downstream multi-modal tasks demonstrate tha...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3688804","openalex_id":"https://openalex.org/W4401717397","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8818835020065308},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.7764791250228882},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6883336305618286},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.4793722927570343},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34937015175819397},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3491642475128174},{"id":"https://openalex.org/C192562407","display_name":"Materials 
science","score":0.08046367764472961},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.06281688809394836}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"arxiv:2408.11810","title":"Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models","url":"https://huggingface.co/papers/2408.11810","published":"2024-08-21","authors":["Chun-Yen Shih","Li-Xuan Peng","Jia-Wei Liao","Ernie Chu","Cheng-Fu Chou","Jun-Cheng Chen"],"abstract":"Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them. However, the ease of text-based image editing introduces significant risks, such as malicious editing for scams or intellectual property infringement. Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations. These methods are costly and specifically target prevalent Latent Diffusion Models (LDMs), while Pixel-domain Diffusion Models (PDMs) remain largely unexplored and robust against such attacks. Our work addresses this gap by proposing a novel attack framework, AtkPDM. AtkPDM is mainly composed of a feature representation attacking loss that exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of adversarial images. 
Exten...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"official:7b39dd2b9474fb27","title":"Lumos : Empowering Multimodal LLMs with Scene Text Recognition","url":"https://ai.meta.com/research/publications/lumos-empowering-multimodal-llms-with-scene-text-recognition/","published":"2024-08-20","authors":["Ashish Shenoy","Yichao Lu","Srihari Jayakumar","Debojeet Chatterjee","Mohsen Moslehpour","Pierce Chuang","Abhay Harpale","Vikas Bhardwaj","Di Xu","Shicong Zhao","Ankit Ramchandani","Luna Dong"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=10"}},{"id":"openalex:W4410719129","title":"Strategic Integration of LangChain, Hugging Face Transformers, and OpenAI for Document Intelligence Systems","url":"https://doi.org/10.32628/ijsrset25121177","published":"2024-08-20","authors":["Oyejide Timothy Odofin","Abraham Ayodeji Abayomi","Ejielo Ogbuefi","Jeffrey Chidera Ogeawuchi","Oluwasanmi Segun Adanigbo","Toluwase Peter Gbenle"],"abstract":"This paper explores the strategic integration of LangChain, Hugging Face Transformers, and OpenAI's models to 
enhance document intelligence systems. Document intelligence, a vital component in automating document understanding, processing, and reasoning, benefits from the synergy between these advanced natural language processing (NLP) tools. LangChain’s chaining capabilities, Hugging Face's pretrained models, and OpenAI’s foundation models are leveraged to automate and optimize document-related tasks such as retrieval, classification, summarization, and anomaly detection. This research presents a comprehensive overview of the key technologies and their combined power in transforming industries such as law, finance, and research. Through architectural design and case studies, the paper demonstrates how this integration can streamline complex workflows, reduce operational costs, and enhan...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.32628/ijsrset25121177","openalex_id":"https://openalex.org/W4410719129","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)","DXC Technology (United States)","First Data (United States)","SKA Observatory","Texas Instruments (United States)"],"concepts":[{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.5088484287261963},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.481794536113739},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.47631606459617615},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4256758689880371},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.2939213812351227},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.13070398569107056},{"id":"https://openalex.org/C119599485","display_name":"Electrical 
engineering","score":0.08932921290397644},{"id":"https://openalex.org/C36289849","display_name":"Social science","score":0.07349279522895813}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4401684116","title":"3D Molecular Pocket-based Generation with Token-only Large Language Model","url":"https://doi.org/10.26434/chemrxiv-2024-0ckgt-v2","published":"2024-08-19","authors":["Jike Wang","Hao Luo","Rui Qin","Mingyang Wang","Meijing Fang","Odin Zhang","Qiaolin Gou","Qun Su","Chao Shen","Ziyi You","Xiaozhe Wan","Liwei Liu"],"abstract":"Designing innovative molecular structures tailored to specific protein targets represents a fundamental challenge in drug discovery. Most existing approaches based on graph neural networks for generating three-dimensional (3D) molecules within protein pockets often produce molecules with invalid configurations, suboptimal drug-like qualities and limited synthesizability, while also requiring extended generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation. Initially, leveraging the architecture of large language models, we treat both two-dimensional (2D) and 3D molecular representations as linguistic expressions and pre-train the model on an extensive dataset. This approach enables the model to comprehensively understand the 2D and 3D characteristics of large-scale molecules. 
Subsequently, we fine-tune th...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.26434/chemrxiv-2024-0ckgt-v2","openalex_id":"https://openalex.org/W4401684116","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7201759219169617},{"id":"https://openalex.org/C74187038","display_name":"Drug discovery","score":0.6177342534065247},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.6068486571311951},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.476842999458313},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45013371109962463},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.4496699273586273},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.44631466269493103},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3456956744194031}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-the-necessity-of-world-knowledge-for-mitigating-missing-labels-in-extreme-classification","title":"On the Necessity of World Knowledge for Mitigating Missing Labels in Extreme Classification","url":"https://www.microsoft.com/en-us/research/publication/on-the-necessity-of-world-knowledge-for-mitigating-missing-labels-in-extreme-classification/","published":"2024-08-18","authors":["Jatin Prakash","Anirudh Buvanesh","Bishal Santra","D. 
Saini","Sachin Yadav","Jian Jiao","Yashoteja Prabhu","Amit Sharma","Manik Varma"],"abstract":"Extreme Classification (XC) aims to map a query to the most relevant documents from a very large document set. XC algorithms used in real-world applications typically learn this mapping from datasets curated from implicit feedback, such as user clicks. However, these datasets often suffer from missing labels. In this work, we observe that systematic missing labels lead to missing knowledge, which is critical for modelling relevance between queries and documents. We formally show that this absence of knowledge is hard to recover using existing methods such as propensity weighting and data imputation strategies that solely rely on the training dataset. While Large Language Models (LLMs) provide an attractive solution to augment the missing knowledge, leveraging them in applications with low latency requirements and large document sets is challenging. To mitigate missing knowledge at scale,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3690624.3709290","openalex_id":"https://openalex.org/W4409150274","cited_by_count":1,"quality_score":69,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","retrieval"],"author_affiliations":["Microsoft","Bellevue Hospital Center","Microsoft (United States)","Microsoft Research (India)","Mila - Quebec Artificial Intelligence Institute","New York University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2404.15506","title":"Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-Shot Metric Depth and Surface Normal 
Estimation","url":"http://arxiv.org/abs/2404.15506","published":"2024-08-16","authors":["Mu Hu","Wei Yin","Chi Zhang","Zhipeng Cai","Xiaoxiao Long","Hao Chen","Kaixuan Wang","Gang Yu","Chunhua Shen","Shaojie Shen"],"abstract":"We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric depth and surface normal estimation from single images, critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve zero-shot generalization through affine-invariant depths, but fail to recover real-world metric scale. Conversely, current normal estimation techniques struggle with zero-shot performance due to insufficient labeled data. We propose targeted solutions for both metric depth and normal estimation. For metric depth, we present a canonical camera space transformation module that resolves metric ambiguity across various camera models and large-scale datasets, which can be easily integrated into existing monocular models. 
For surface normal estimation, we introduce a joint depth-normal optimiza...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1109/tpami.2024.3444912","openalex_id":"https://openalex.org/W4395481595","cited_by_count":145,"quality_score":67,"matched_keywords":[],"author_affiliations":["HKU-Pasteur Research Pole","Hong Kong University of Science and Technology","Intel (United States)","Tencent (China)","The University of Adelaide","University of Hong Kong","Westlake University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.630105197429657},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.5757448077201843},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5699898600578308},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5676501393318176},{"id":"https://openalex.org/C92757383","display_name":"Affine transformation","score":0.440822571516037},{"id":"https://openalex.org/C65909025","display_name":"Monocular","score":0.43806764483451843},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.36391153931617737},{"id":"https://openalex.org/C2524010","display_name":"Geometry","score":0.16119793057441711}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":145}},{"id":"apple:jp6ac3ekj79079qkkhsd6npn","title":"ReALM: Reference Resolution as Language Modeling","url":"https://machinelearning.apple.com/research/realm-reference","published":"2024-08-16","authors":["Joel Moniz","Soundarya Krishnan","Melis Ozyildirim","Prathamesh Saraf","Halim Cagri Ates","Yuan Zhang","Hong Yu"],"abstract":"Reference 
resolution is an important problem, one that is essential to understand and successfully handle contexts of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in reference resolution, particularly for...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llexus-an-ai-agent-system-for-incident-management","title":"LLexus: an AI agent system for incident management","url":"https://www.microsoft.com/en-us/research/publication/llexus-an-ai-agent-system-for-incident-management/","published":"2024-08-15","authors":["Pedro Las-Casas","Alok Kumbhare","Rodrigo Fonseca","Sharad Agarwal"],"abstract":"When operating a software service on a cloud, the complexity of keeping multiple distributed components responsive is a significant challenge for engineering teams. Engineers frequently rely on Troubleshooting Guides (TSGs) to navigate how to mitigate performance or outage incidents. However, the effectiveness of TSGs is often hindered by their length, implicit reliance on tribal knowledge, and the variable quality of their content. 
This paper introduces LLexus, an agent-based AI system to automate the execution of TSGs.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Systems and networking","Agent AI","Computer network","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-methodology-for-using-large-language-models-to-create-user-friendly-applications-for-medicaid-redetermination-and-other-social-services","title":"A Methodology for Using Large Language Models to Create User-Friendly Applications for Medicaid Redetermination and Other Social Services","url":"https://www.microsoft.com/en-us/research/publication/a-methodology-for-using-large-language-models-to-create-user-friendly-applications-for-medicaid-redetermination-and-other-social-services/","published":"2024-08-15","authors":["Sumanth Ratna","Bill Weeks","Juan M. Lavista Ferres","Aneesh Chopra","Mayana Pereira"],"abstract":"Background Following the unwinding of Medicaid’s continuous enrollment provision, states must redetermine Medicaid eligibility, creating uncertainty about coverage [ 1 ] and the widespread administrative removal of beneficiaries from rolls [ 2 ]. Existing research demonstrates that Large Language Models (LLMs) can automate clinical trial eligibility query extraction [ 3 ], generation [ 4 ], and classification [ 5 ]. 
Given that Medicaid redetermination follows eligibility rules similar to those in clinical trials, we thought LLMs might help with Medicaid redetermination, as well. Therefore, using the State of Washington, South Carolina, and North Dakota as examples, we applied LLMs to extract Medicaid rules from publicly available documents and transform those rules into a web application that could allow users to determine whether they are eligible for Medicaid. This paper describes the....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4406356737","title":"Precise Image Editing with Multimodal Agents","url":"https://doi.org/10.1109/prai62207.2024.10826725","published":"2024-08-15","authors":["Bin Fu","Chi Zhang","Fukun Yin","Cheng Pei","Zebiao Huang"],"abstract":"The rapid advancements in large language models (LLMs) have revolutionized the field of artificial intelligence, enabling the development of sophisticated agents capable of performing complex tasks. The emergence of multimodal LLMs, such as GPT-4 Vision, has further expanded the possibilities by allowing agents to process and understand visual data directly. However, current end-to-end solutions for image content editing often fall short in terms of stability, precision, and interpretability. 
Motivated by these limitations, we propose a novel framework that leverages the capabilities of multimodal agents to execute precise image editing tasks in a sequential and logical manner. Our approach integrates a comprehensive suite of advanced image editing tools into the action space of the agent, enabling it to interact directly with these tools through their APIs. Additionally, we introduce a....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/prai62207.2024.10826725","openalex_id":"https://openalex.org/W4406356737","cited_by_count":1,"quality_score":42,"matched_keywords":["agent"],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7468134760856628},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.5317417979240417},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5076624155044556},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4768870174884796},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4383352994918823},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41287875175476074},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3503495454788208}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/deepseek-prover-v1-5-harnessing-proof-assistant-feedback-for-reinforcement-learning-and-monte-carlo-tree-search","title":"DeepSeek-Prover-V1.5: Harnessing Proof Assistant 
Feedback for Reinforcement Learning and Monte-Carlo Tree Search","url":"https://www.microsoft.com/en-us/research/publication/deepseek-prover-v1-5-harnessing-proof-assistant-feedback-for-reinforcement-learning-and-monte-carlo-tree-search/","published":"2024-08-14","authors":["Huajian Xin","Z. Ren","Jun-Mei Song","Zhihong Shao","Wanjia Zhao","Haocheng Wang","Bo Liu (Benjamin Liu)","Liyue Zhang","Xuan Lu","Qiushi Du","W. Gao","Qihao Zhu"],"abstract":"We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths. 
DeepSeek-Prover-V1.5 demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the test set of the high school level min...","companies":["Microsoft","DeepSeek"],"matched_orgs":["Microsoft","DeepSeek"],"company_groups":["company_us","company_china"],"company_regions":["US","China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01","HuggingFace org papers","deepseek-ai","language model"],"author_affiliations":["Microsoft","DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/does-reasoning-emerge-examining-the-probabilities-of-causation-in-large-language-models","title":"Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/does-reasoning-emerge-examining-the-probabilities-of-causation-in-large-language-models/","published":"2024-08-14","authors":["Javier González","Aditya Nori"],"abstract":"Recent advances in AI have been significantly driven by the capabilities of large language models (LLMs) to solve complex problems in ways that resemble human thinking. However, there is an ongoing debate about the extent to which LLMs are capable of actual reasoning. Central to this debate are two key probabilistic concepts that are essential for connecting causes to their effects: the probability of necessity (PN) and the probability of sufficiency (PS). 
This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms using these probabilistic measures. By viewing LLMs as abstract machines that process information through a natural language interface, we examine the conditions under which it is possible to compute suitable approximations of PN and PS. Our research marks an important s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-flexible-visual-relationship-segmentation","title":"Towards Flexible Visual Relationship Segmentation","url":"https://www.microsoft.com/en-us/research/publication/towards-flexible-visual-relationship-segmentation/","published":"2024-08-14","authors":["Fangrui Zhu","Jianwei Yang","Huaizu Jiang"],"abstract":"Visual relationship understanding has been studied separately in human-object interaction (HOI) detection, scene graph generation (SGG), and referring relationships (RR) tasks. Given the complexity and interconnectedness of these tasks, it is crucial to have a flexible framework that can effectively address these tasks in a cohesive manner. 
In this work, we propose FleVRS, a single model that seamlessly integrates the above three aspects in standard and promptable visual relationship segmentation, and further possesses the capability for open-vocabulary segmentation to adapt to novel scenarios. FleVRS leverages the synergy between text and image modalities, to ground various types of relationships from images and use textual features from vision-language models to visual conceptual understanding. Empirical validation across various datasets demonstrates that our framework outperforms exi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","human-object interaction"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4401541163","title":"LLMs-based machine translation for E-commerce","url":"https://doi.org/10.1016/j.eswa.2024.125087","published":"2024-08-13","authors":["Dehong Gao","Kaidi Chen","Ben Chen","H.-F. 
Dai","Linbo Jin","Wen Jiang","Wei Ning","Shanqing Yu","Qi Xuan","Xiaoyan Cai","Libin Yang","Zhen Wang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.eswa.2024.125087","openalex_id":"https://openalex.org/W4401541163","cited_by_count":28,"quality_score":65,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Northwestern Polytechnical University","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7279295325279236},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6004949808120728},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.5977216362953186},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4051506221294403},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.09331649541854858},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0},{"id":"https://openalex.org/C105580179","display_name":"Messenger RNA","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":28}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mutual-reasoning-makes-smaller-llms-stronger-problem-solvers","title":"Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers","url":"https://www.microsoft.com/en-us/research/publication/mutual-reasoning-makes-smaller-llms-stronger-problem-solvers/","published":"2024-08-13","authors":["Zhenting Qi","Mingyuan Ma","Jiahang Xu","Li Lyna Zhang","Fan Yang","Mao Yang"],"abstract":"This paper 
introduces rStar, a self-play mutual reasoning approach that significantly improves reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments the Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher quality reasoning trajectories. Next, another SLM, with capabilities similar to the target SLM, acts as a discriminator to verify each trajectory generated by the target SLM. The mutually agreed reasoning trajectories are considered mutual consistent, thus are more likely to be correct. Extensive experiments across five SLMs demonstrate rStar can effectively solve diverse reasoning problems, including GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. Remarkably, rStar boosts GSM8K accuracy fr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4401626633","title":"Research on Image Generation Optimization based Deep Learning","url":"https://doi.org/10.20944/preprints202408.0927.v1","published":"2024-08-13","authors":["Hao Yan","Zixiang Wang","Bo Shi","Yi Zhao","Yang Zhang","Ranran Lyu"],"abstract":"Image generation optimization is an important research direction in the field of deep learning, which aims to improve the performance of image generation models and the quality of generated images. 
In recent years, researchers have made significant progress in image generation optimization with the development of deep generative models such as generative adversarial networks (GANs) and variational autoencoders (VAEs). These models are able to generate high-quality, realistic images by learning the distribution of image data. In this study, a deep learning-based image generation optimization model was adopted, which combined the advantages of GAN and VAE. The model architecture consists of a generator and a discriminator, where the generator is responsible for generating the image and the discriminator is used to judge the authenticity of the image. In addition, the model also introduces....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.20944/preprints202408.0927.v1","openalex_id":"https://openalex.org/W4401626633","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Northeastern University","Syracuse University","Boston University","Film Independent"],"concepts":[{"id":"https://openalex.org/C2779803651","display_name":"Discriminator","score":0.7754721641540527},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7607980370521545},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7294962406158447},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.6888653039932251},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.6705636382102966},{"id":"https://openalex.org/C112972136","display_name":"Stability (learning theory)","score":0.559567391872406},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5428334474563599},{"id":"https://openalex.org/C39890363","display_name":"Generative 
grammar","score":0.5065985918045044}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4401538057","title":"A Distinct Approach to Clinical GenAI Oversight","url":"https://doi.org/10.31219/osf.io/vm6zy","published":"2024-08-13","authors":["Fabio A. Thiers","Kimberly Lucy"],"abstract":"Policymakers are determined to regulate clinical Generative AI (GenAI) solutions, but progress has been hampered by the commingling with regulatory approaches originally designed for Narrow AI (NarAI). This article clarifies this matter by describing the distinctive function and risk profile of GenAI models in healthcare settings. It elaborates why regulatory frameworks crafted for NarAI oversight are not adequate for GenAI because of their distinct nature. A first principle analysis is then used to delineate the pivotal role that healthcare organizations will need to take in GenAI oversight. 
Finally, it describes a distinct approach to clinical GenAI regulation that combines centralized benchmarking of GenAI models with the ISO/IEC 42001 certification of AI Management Systems (AIMS) implemented in healthcare organizations.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.31219/osf.io/vm6zy","openalex_id":"https://openalex.org/W4401538057","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.764640212059021},{"id":"https://openalex.org/C46304622","display_name":"Certification","score":0.6828526258468628},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.5989924669265747},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.5977823734283447},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.467311829328537},{"id":"https://openalex.org/C195094911","display_name":"Process management","score":0.4170709252357483},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.4040168523788452},{"id":"https://openalex.org/C112930515","display_name":"Risk analysis (engineering)","score":0.39575183391571045}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"apple:xn1r8i9d1h5kmo3kxuybhj5k","title":"APE: Active Prompt Engineering - Identifying Informative Few-Shot Examples for LLMs","url":"https://machinelearning.apple.com/research/ape-active-prompt-engineering","published":"2024-08-12","authors":["Kun Qian","Farima Fatahi Bayat","Anton Belyi","Yash Govind","Rahul Khot","Katherine Luna","Azadeh Nikfarjam","Xiaoguang Qi","Yisi 
Sang","Fei Wu","Victor (AIML) Zhang","Yunyao Li"],"abstract":"Prompt engineering is an iterative procedure that often requires extensive manual efforts to formulate suitable instructions for effectively directing large language models (LLMs) in specific tasks. Incorporating few-shot examples is a vital and efficacious approach to provide LLMs with precise and tangible instructions, leading to improved LLM performance. Nonetheless, identifying the most informative demonstrations for LLMs is labor-intensive,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4404103133","title":"A Novel Summarization Framework based on Reference-Free Evaluation of Multiple Large Language Models","url":"http://dx.doi.org/10.1109/metacom62920.2024.00047","published":"2024-08-12","authors":["Wei Feng","Huan Zhao","Min Zhang","Hao Yang","Wei Tang"],"abstract":"Recently many researchers study the problem of summarizing the abstraction of medium news utilizing Large Language Models (LLMs). However, the single model usually produces results with some flaws. This paper studies how to make fusion of results from multiple LLMs. Our main contribution is summarized as follows. First, we trained multiple summarization models from multiple LLMs, including Meta-Llama-3-8B-Instruct, Qwen2-7B-Instruct, and GLM-4-9b-chat. Second, we design a new reference-free evaluation metric, which could make fusion of results from multiple big models. Third, we make experiments to evaluate the performance of our proposed framework. 
The experiments show that the fusion solution multiple models could produce better results than single model. The summarization result produced by this fusion solution is more consistent with human evaluation terms of coherence, consistency,....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/metacom62920.2024.00047","openalex_id":"https://openalex.org/W4404103133","cited_by_count":0,"quality_score":41,"matched_keywords":["news"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.9065783023834229},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8237031698226929},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5264920592308044},{"id":"https://openalex.org/C134714966","display_name":"Multi-document summarization","score":0.5223742127418518},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4133307635784149}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lut-tensor-core-lookup-table-enables-efficient-low-bit-llm-inference-acceleration","title":"LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration","url":"https://www.microsoft.com/en-us/research/publication/lut-tensor-core-lookup-table-enables-efficient-low-bit-llm-inference-acceleration/","published":"2024-08-11","authors":["Zhiwen Mo","Lei Wang","Jianyu Wei","Zhichen Zeng","Shijie Cao","Lingxiao Ma","Naifeng Jing","Ting Cao","Jilong Xue","Fan Yang","Mao Yang"],"abstract":"As large language model (LLM) 
inference demands ever-greater resources, there is a rapid growing trend of using low-bit weights to shrink memory usage and boost inference efficiency. However, these low-bit LLMs introduce the need for mixed-precision matrix multiplication (mpGEMM), which is a crucial yet under-explored operation that involves multiplying lower-precision weights with higher-precision activations. Unfortunately, current hardware does not natively support mpGEMM, resulting in indirect and inefficient dequantization-based implementations. To address the mpGEMM requirements in low-bit LLMs, we explored the lookup table (LUT)-based approach for mpGEMM. However, a conventional LUT implementation falls short of its potential. To fully harness the power of LUT-based mpGEMM, we introduce LUT Tensor Core, a software-hardware co-design optimized for low-bit LLM inference. Specificall...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM","language model","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interpretable-user-satisfaction-estimation-for-conversational-systems-with-large-language-models","title":"Interpretable User Satisfaction Estimation for Conversational Systems with Large Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/interpretable-user-satisfaction-estimation-for-conversational-systems-with-large-language-models/","published":"2024-08-11","authors":["Ying-Chun Lin","Jennifer Neville","Jack W. Stokes","Longqi Yang","Tara Safavi","Mengting Wan","Scott Counts","Siddharth Suri","Reid Andersen","Xiaofeng Xu","Deepak Gupta","Sujay Kumar Jauhar"],"abstract":"Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. 
The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has hi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Computer science","Information retrieval","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/call-me-when-necessary-llms-can-efficiently-and-faithfully-reason-over-structured-environments","title":"Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments","url":"https://www.microsoft.com/en-us/research/publication/call-me-when-necessary-llms-can-efficiently-and-faithfully-reason-over-structured-environments/","published":"2024-08-11","authors":["Sitao Cheng","Ziyuan Zhuang","Yong Xu","Fangkai Yang","Chaoyun Zhang","Xiaoting Qin","Xiang Huang","Ling Chen","Qingwei Lin 林庆维","Dongmei Zhang","Saravan Rajmohan","Qi Zhang"],"abstract":"Large Language Models (LLMs) have shown potential in reasoning over structured environments, e.g., knowledge graphs and tables. Such tasks typically require multi-hop reasoning, i.e., match natural language utterance with instances in the environment. Previous works adopt LLMs to incrementally build a reasoning path, where LLMs either invoke tools or pick up items by step-by-step interacting with the environment. 
We propose Reasoning-Path-Editing (Readi), a novel framework where LLMs can efficiently and faithfully reason over structured environments. In Readi, LLMs initially generate a reasoning path given a query, and edit the path only when necessary. We instantiate the path on structured environments and provide feedback to edit the path if anything goes wrong. Experimental results on three KGQA and two TableQA datasets show the effectiveness of Readi, significantly surpassing previou...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Knowledge graph","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:83241bebdf175cb9","title":"LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models","url":"https://ai.meta.com/research/publications/lm-transparency-tool-interactive-tool-for-analyzing-transformer-language-models/","published":"2024-08-11","authors":["Igor Tufanov","Karen Hambardzumyan","Javier Ferrando","Lena Voita"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta 
AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=10"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vullibgen-generating-names-of-vulnerability-affected-packages-via-a-large-language-model","title":"VulLibGen: Generating Names of Vulnerability-Affected Packages via a Large Language Model","url":"https://www.microsoft.com/en-us/research/publication/vullibgen-generating-names-of-vulnerability-affected-packages-via-a-large-language-model/","published":"2024-08-09","authors":["Tianyu Chen","Lin Li","Liuchuan Zhu","Zongyang Li","Xueqing Liu","Guangtai Liang","Qianxiang Wang","Tao Xie"],"abstract":"Security practitioners maintain vulnerability reports (e.g., GitHub Advisory) to help developers mitigate security risks. An important task for these databases is automatically extracting structured information mentioned in the report, e.g., the affected software packages, to accelerate the defense of the vulnerability ecosystem. However, it is challenging for existing work on affected package identification to achieve a high accuracy. One reason is that all existing work focuses on relatively smaller models, thus they cannot harness the knowledge and semantic capabilities of large language models. To address this limitation, we propose VulLibGen, the first method to use LLM for affected package identification. In contrast to existing work, VulLibGen proposes the novel idea to directly generate the affected package. 
To improve the accuracy, VulLibGen employs supervised fine-tuning (SFT),...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","Computer science","1970-01-01","LLM","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llmjudge-llms-for-relevance-judgments","title":"LLMJudge: LLMs for Relevance Judgments","url":"https://www.microsoft.com/en-us/research/publication/llmjudge-llms-for-relevance-judgments/","published":"2024-08-09","authors":["Hossein A Rahmani","Emine Yilmaz","Nick Craswell","Bhaskar Mitra","Paul Thomas","Charles L A Clarke","Mohammad Aliannejadi","Clemencia Siro","Guglielmo Faggioli"],"abstract":"The LLMJudge challenge is organized as part of the LLM4Eval workshop at SIGIR 2024. Test collections are essential for evaluating information retrieval (IR) systems. The evaluation and tuning of a search system is largely based on relevance labels, which indicate whether a document is useful for a specific search and user. However, collecting relevance judgments on a large scale is costly and resource-intensive. Consequently, typical experiments rely on third-party labelers who may not always produce accurate annotations. The LLMJudge challenge aims to explore an alternative approach by using LLMs to generate relevance judgments. Recent studies have shown that LLMs can generate reliable relevance judgments for search systems. 
However, it remains unclear which LLMs can match the accuracy of human labelers, which prompts are most effective, how fine-tuned open-source LLMs compare to closed...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Search and information retrieval","Information retrieval","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:b375a949e0a2c83a","title":"Qwen2-Audio: Chat with Your Voice!","url":"https://qwenlm.github.io/blog/qwen2-audio/","published":"2024-08-09","authors":["Alibaba/Qwen"],"abstract":"To achieve the objective of building an AGI system, the model should be capable of understanding information from different modalities. Thanks to the rapid development of large language models, LLMs are now capable of understanding language and reasoning. Previously we have taken a step forward to extend our LLM, i.e., Qwen, to more modalities, including vision and audio, and built Qwen-VL and Qwen-Audio. 
Today, we release Qwen2-Audio, the next version of Qwen-Audio, which is capable of accepting audio and text inputs and generating text outputs.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4401442820","title":"RemixFormer++: A Multi-Modal Transformer Model for Precision Skin Tumor Differential Diagnosis With Memory-Efficient Attention","url":"https://doi.org/10.1109/tmi.2024.3441012","published":"2024-08-09","authors":["Jing Xu","Kai Huang","Lianzhen Zhong","Yuan Gao","Kai Sun","Wei Liu","Yanjie Zhou","Wenchao Guo","Yuan Guo","Yuanqiang Zou","Yuping Duan","Le Lü"],"abstract":"Diagnosing malignant skin tumors accurately at an early stage can be challenging due to ambiguous and even confusing visual characteristics displayed by various categories of skin tumors. To improve diagnosis precision, all available clinical data from multiple sources, particularly clinical images, dermoscopy images, and medical history, could be considered. Aligning with clinical practice, we propose a novel Transformer model, named RemixFormer++ that consists of a clinical image branch, a dermoscopy image branch, and a metadata branch. Given the unique characteristics inherent in clinical and dermoscopy images, specialized attention strategies are adopted for each type. Clinical images are processed through a top-down architecture, capturing both localized lesion details and global contextual information. 
Conversely, dermoscopy images undergo a bottom-up processing with two-level hier...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmi.2024.3441012","openalex_id":"https://openalex.org/W4401442820","cited_by_count":9,"quality_score":54,"matched_keywords":["memory","efficient"],"author_affiliations":["Alibaba Group (China)","Central South University","Tianjin University","Xiangya Hospital Central South University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7340993881225586},{"id":"https://openalex.org/C93518851","display_name":"Metadata","score":0.6399582624435425},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6194265484809875},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.49971556663513184},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.47306978702545166},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.4391821324825287},{"id":"https://openalex.org/C2779974597","display_name":"Clinical Practice","score":0.43246781826019287},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.4169369041919708}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4401435830","title":"BrainSegFounder: Towards 3D foundation models for neuroimage segmentation","url":"https://doi.org/10.1016/j.media.2024.103301","published":"2024-08-08","authors":["J. Charles Cox","Peng Liu","Skylar E. Stolte","Yunchao Yang","Kang Liu","Kyle B. 
See","Huiwen Ju","Ruogu Fang"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.media.2024.103301","openalex_id":"https://openalex.org/W4401435830","cited_by_count":43,"quality_score":67,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","University of Florida"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7372431755065918},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.7313569784164429},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7065721750259399},{"id":"https://openalex.org/C58693492","display_name":"Neuroimaging","score":0.5892887115478516},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.5241553783416748},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5108487010002136},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.46569952368736267},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3527422547340393}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":43}},{"id":"official:2e9bb4d1f7803fd7","title":"Introducing Qwen2-Math","url":"https://qwenlm.github.io/blog/qwen2-math/","published":"2024-08-08","authors":["Alibaba/Qwen"],"abstract":"🚨 This model mainly supports English. We will release bilingual (English and Chinese) math models soon. 
Introduction Over the past year, we have dedicated significant effort to researching and enhancing the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems. Today, we are delighted to introduce a series of math-specific large language models of our Qwen2 series, Qwen2-Math and Qwen2-Math-Instruct-1.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4401436581","title":"Large Language Model Influence on Management Reasoning: A Randomized Controlled Trial","url":"https://doi.org/10.1101/2024.08.05.24311485","published":"2024-08-07","authors":["Ethan Goh","Robert J. Gallo","Eric Strong","Yingjie Weng","Hannah Kerman","Jason Freed","Joséphine A. Cool","Zahir Kanjee","Kathleen P. Lane","Andrew S. Parsons","Neera Ahuja","Eric Horvitz"],"abstract":"Importance: Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning with no clear right answers is unknown. Objective: To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources. Design: Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024. Setting: Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States. 
Participants: 92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine. Intervention: Five expert-developed clinical case vignettes were presented with multiple open-ended management questions an...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2024.08.05.24311485","openalex_id":"https://openalex.org/W4401436581","cited_by_count":10,"quality_score":55,"matched_keywords":["LLM","language model"],"author_affiliations":["Beth Israel Deaconess Medical Center","Center for Innovation","Harvard University","Intel (United States)","Kaiser Permanente","Microsoft (United States)","Stanford Medicine","Stanford University","University of Minnesota Medical Center","University of Virginia","VA Palo Alto Health Care System"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.48927685618400574},{"id":"https://openalex.org/C168563851","display_name":"Randomized controlled trial","score":0.4478762745857239},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33291059732437134},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.08881253004074097},{"id":"https://openalex.org/C141071460","display_name":"Surgery","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"apple:o4hd5c8jazv46k46hdwynx3b","title":"Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling","url":"https://machinelearning.apple.com/research/recipe-for-compute","published":"2024-08-06","authors":["Pratyush Maini","Skyler Seto","Richard Bai","David Grangier","Yizhe Zhang","Navdeep Jaitly"],"abstract":"Large language models are 
trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance of both compute and data, which grows with the size of the model being trained. This is infeasible both because of the large compute costs and duration associated with pre-training, and the impending scarcity of high-quality data on the web. In this...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ij2e2jutiq42yzspt1j9ktsb","title":"KGLens: Towards Efficient and Effective Knowledge Probing of Large Language Models with Knowledge Graphs","url":"https://machinelearning.apple.com/research/kglens-towards-efficient","published":"2024-08-06","authors":["Daniel Zheng","Richard Bai","Yizhe Zhang","Yi (Siri) Su","Xiaochuan Niu","Navdeep Jaitly"],"abstract":"This paper was accepted at the Workshop Towards Knowledgeable Language Models at ACL 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page 
https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/player-driven-emergence-in-llm-driven-game-narrative","title":"Player-Driven Emergence in LLM-Driven Game Narrative","url":"https://www.microsoft.com/en-us/research/publication/player-driven-emergence-in-llm-driven-game-narrative/","published":"2024-08-05","authors":["Xiangyu Peng","Jessica Quaye","Sudha Rao","Weijia Xu","Portia Botchway","Chris Brockett","Nebojsa Jojic","Gabriel DesGarennes","Ken Lobb","Michael Xu","Jorge J. G. Leandro","Claire Jin"],"abstract":"We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player’s gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. 
Players that created the most emergent nodes tended to be those that often enjoy games that facilitate discovery, expl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","AI in games","Natural language processing","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/geneva-generating-and-visualizing-branching-narratives-using-llms","title":"GENEVA: GENErating and Visualizing branching narratives using LLMs","url":"https://www.microsoft.com/en-us/research/publication/geneva-generating-and-visualizing-branching-narratives-using-llms/","published":"2024-08-05","authors":["Jorge J. G. Leandro (jorgeleandro)","Sudha Rao","Michael Xu","Weijia Xu","Nebojsa Jojic","Chris Brockett","Bill Dolan"],"abstract":"Dialogue-based Role Playing Games (RPGs) require powerful storytelling. The narratives of these may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. \\textbf{GENEVA}, a prototype tool, generates a rich narrative graph with branching and reconverging storylines that match a high-level narrative description and constraints provided by the designer. A large language model (LLM), GPT-4, is used to generate the branching narrative and to render it in a graph format in a two-step process. 
We illustrate the use of GENEVA in generating new branching narratives for four well-known stories under different contextual constraints. This tool has the potential to assist in game development, simulations, and other applications with game-like properties. Link to the GENEVA tool: Visualizing....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Human language technologies","AI in games","Natural language processing","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:k9dq1v6vwejmbht27jnyy27x","title":"LLM in a Flash: Efficient Large Language Model Inference with Limited Memory","url":"https://machinelearning.apple.com/research/efficient-large-language","published":"2024-08-05","authors":["Keivan Alizadeh","Iman Mirzadeh","Dmitry Belenko","S. Karen Khatamifard","Minsik Cho","Carlo C Del Mundo","Mohammad Rastegari","Mehrdad Farajtabar"],"abstract":"Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory requirements present challenges, especially for devices with limited DRAM capacity. 
This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters in flash memory, but bringing them on demand to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["LLM","language model","memory","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:eijegvxznv725e7yjw3toarv","title":"Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation","url":"https://machinelearning.apple.com/research/direct-large-language","published":"2024-08-05","authors":["Aiwei Liu","Haoping Bai","Zhiyun Lu","Xiang Kong","Simon Wang","Jiulong Shan","Meng Cao","Lijie Wen"],"abstract":"Aligning large language models (LLMs) with human expectations without human-annotated preference data is an important problem. In this paper, we propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs, which could achieve better performance on LLaMA2-7B and LLaMA2-13B compared to RLAIF. 
Based on this, we propose an automatic alignment method, Direct Large Model...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["language model","preference","distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4401943488","title":"Training Interactive Agent in Large FPS Game Map with Rule-enhanced Reinforcement Learning","url":"https://doi.org/10.1109/cog60054.2024.10645654","published":"2024-08-05","authors":["Chen Zhang","Huan Hu","Yuan Zhou","Qiyang Cao","Ruochen Liu","Wenya Wei","Elvis S. Liu"],"abstract":"In the realm of competitive gaming, 3D first-person shooter (FPS) games have gained immense popularity, prompting the development of game AI systems to enhance gameplay. However, deploying game AI in practical scenarios still poses challenges, particularly in large-scale and complex FPS games. In this paper, we focus on the practical deployment of game AI in the online multiplayer competitive 3D FPS game called Arena Breakout, developed by Tencent Games. We propose a novel gaming AI system named Private Military Company Agent (PMCA), which is interactable within a large game map and engages in combat with players while utilizing tactical advantages provided by the surrounding terrain. To address the challenges of navigation and combat in modern 3D FPS games, we introduce a method that combines navigation mesh (Navmesh) and shooting-rule with deep reinforcement learning (NSRL). 
The integr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cog60054.2024.10645654","openalex_id":"https://openalex.org/W4401943488","cited_by_count":2,"quality_score":43,"matched_keywords":["agent"],"author_affiliations":["Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.8430594205856323},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7572660446166992},{"id":"https://openalex.org/C67203356","display_name":"Reinforcement","score":0.45395344495773315},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4497249126434326},{"id":"https://openalex.org/C47932503","display_name":"Error-driven learning","score":0.44225892424583435},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4304671883583069},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4116588830947876},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3461375832557678}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cachegen-fast-context-loading-for-language-model-applications-via-kv-cache-streaming","title":"CacheGen: Fast Context Loading for Language Model Applications via KV Cache Streaming","url":"https://www.microsoft.com/en-us/research/publication/cachegen-fast-context-loading-for-language-model-applications-via-kv-cache-streaming/","published":"2024-08-04","authors":["Yuhan Liu","Hanchen Li","Yihua Cheng","Siddhant Ray","Yuyang 
Huang","Qizheng Zhang","Kuntai Du","Jiayi Yao","Shan Lu","Ganesh Ananthanarayanan","Michael Maire","Henry Hoffmann"],"abstract":"As large language models (LLMs) take on complex tasks, their inputs are supplemented with longer contexts that incorporate domain knowledge or user-specific information. Yet using long contexts poses a challenge for responsive LLM systems, as nothing can be generated until the whole context is processed by the LLM. While the context-processing delay can be reduced by reusing the KV cache of a context across different inputs, fetching the KV cache, which contains large tensors, over the network can cause extra network delays. CacheGen is a fast context-loading module for LLM systems. First, CacheGen uses a custom tensor encoder, which embraces KV cache’s distributional properties, to encode a KV cache into more compact bitstream representations with negligible encoding/decoding overhead. This reduces the bandwidth demand to fetch the KV cache. Second, to maintain low context-loading de...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01","LLM","language model","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmiu-multimodal-multi-image-understanding-for-evaluating-large-vision-language-models","title":"MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/mmiu-multimodal-multi-image-understanding-for-evaluating-large-vision-language-models/","published":"2024-08-04","authors":["Fanqing Meng","Jin Wang","Chuanhao Li","Quanfeng Lu","Hao Tian","Jiaqi Liao","Xizhou Zhu","Jifeng Dai","Yu Qiao","Ping Luo","Kaipeng Zhang","Wenqi Shao"],"abstract":"The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi-image LVLMs have begun to address this need. However, their evaluation has not kept pace with their development. To fill this gap, we introduce the Multimodal Multi-image Understanding (MMIU) benchmark, a comprehensive evaluation suite designed to assess LVLMs across a wide range of multi-image tasks. MMIU encompasses 7 types of multi-image relationships, 52 tasks, 77K images, and 11K meticulously curated multiple-choice questions, making it the most extensive benchmark of its kind. Our evaluation of 24 popular LVLMs, including both open-source and proprietary models, reveals significant challenges in multi-image comprehension, particularly in tasks involving spatial understanding. 
Even the most advanced models, such a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Vision-language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/self-enhancing-video-data-management-system-for-compositional-events-with-large-language-models-technical-report","title":"Self-Enhancing Video Data Management System for Compositional Events with Large Language Models [Technical Report]","url":"https://www.microsoft.com/en-us/research/publication/self-enhancing-video-data-management-system-for-compositional-events-with-large-language-models-technical-report/","published":"2024-08-04","authors":["Enhao Zhang","Nicole Sullivan","Brandon Haynes","Ranjay Krishna","Magdalena Balazinska"],"abstract":"Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. VOCAL-UDF automatically identifies and constructs missing modules and encapsulates them as user-defined functions (UDFs), thus expanding its querying capabilities. To achieve this, we formulate a unified UDF model that leverages large language models (LLMs) to aid in new UDF generation. 
VOCAL-UDF handles a wide range of concepts by supporting both program-based UDFs (i.e., Python functions generated by LLMs) and distilled-model UDFs (lightweight vision models distilled from strong pretrained models). To resolve the inherent ambiguity in user intent, VOCAL-UDF g...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Data platforms and analytics","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:gq6ykts3mv5a46yvgkqbz9gl","title":"BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks","url":"https://machinelearning.apple.com/research/biscuit-scaffolding-llm","published":"2024-08-03","authors":["Ruijia Cheng","Titus Barik","Alan Leung","Fred Hohman","Jeffrey Nichols"],"abstract":"This paper was accepted at IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page 
https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autogen-studio-a-no-code-developer-tool-for-building-and-debugging-multi-agent-systems","title":"AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems","url":"https://www.microsoft.com/en-us/research/publication/autogen-studio-a-no-code-developer-tool-for-building-and-debugging-multi-agent-systems/","published":"2024-08-02","authors":["Victor Dibia","Jingya Chen","Gagan Bansal","Suff Syed","Adam Fourney","Erkang (Eric) Zhu","Chi Wang","Saleema Amershi"],"abstract":"Multi-agent systems, where multiple agents (generative AI models + tools) collaborate, are emerging as an effective pattern for solving long-running, complex tasks in numerous domains. However, specifying their parameters (such as models, tools, and orchestration mechanisms etc,.) and debugging them remains challenging for most developers. To address this challenge, we present AUTOGEN STUDIO, a no-code developer tool for rapidly prototyping, debugging, and evaluating multi-agent workflows built upon the AUTOGEN framework. AUTOGEN STUDIO offers a web interface and a Python API for representing LLM-enabled agents using a declarative (JSON-based) specification. It provides an intuitive drag-and-drop UI for agent workflow specification, interactive evaluation and debugging of workflows, and a gallery of reusable agent components. 
We highlight four design principles for no-code multi-agent de...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Generative AI","Human–computer interaction","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/i-want-it-that-way-enabling-interactive-decision-support-using-large-language-models-and-constraint-programming","title":"\"I Want It That Way\": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming","url":"https://www.microsoft.com/en-us/research/publication/i-want-it-that-way-enabling-interactive-decision-support-using-large-language-models-and-constraint-programming/","published":"2024-08-01","authors":["Connor Lawless","Jakob Schoeffer","Lindy Le","Kael Rowan","Shilad Sen","Cristina St. Hill","Jina Suh","Bahar Sarrafzadeh"],"abstract":"A critical factor in the success of many decision support systems is the accurate modeling of user preferences. Psychology research has demonstrated that users often develop their preferences during the elicitation process, highlighting the pivotal role of system-user interaction in developing personalized systems. This paper introduces a novel approach, combining Large Language Models (LLMs) with Constraint Programming to facilitate interactive decision support. 
We study this hybrid framework through the lens of meeting scheduling, a time-consuming daily activity faced by a multitude of information workers. We conduct three studies to evaluate the novel framework, including a diary study to characterize contextual scheduling preferences, a quantitative evaluation of the system’s performance, and a user study to elicit insights with a technology probe that encapsulates our framework. Our...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3685053","openalex_id":"https://openalex.org/W4401218824","cited_by_count":19,"quality_score":103,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Computer science","1970-01-01","LLM","personalized","preference"],"author_affiliations":["Microsoft","Cornell University","Macalester College","Microsoft (United States)","The University of Texas at Austin"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/report-on-the-1st-workshop-on-large-language-model-for-evaluation-in-information-retrieval-llm4eval-2024-at-sigir-2024","title":"Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024","url":"https://www.microsoft.com/en-us/research/publication/report-on-the-1st-workshop-on-large-language-model-for-evaluation-in-information-retrieval-llm4eval-2024-at-sigir-2024/","published":"2024-08-01","authors":["Hossein A. Rahmani","Clemencia Siro","Mohammad Aliannejadi","Nick Craswell","Charles L. A. 
Clarke","Guglielmo Faggioli","Bhaskar Mitra","Paul Thomas","Emine Yilmaz"],"abstract":"The first edition of the workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) took place in July 2024, co-located with the ACM SIGIR Conference 2024 in the USA (SIGIR 2024). The aim was to bring information retrieval researchers together around the topic of LLMs for evaluation in information retrieval that gathered attention with the advancement of large language models and generative AI. Given the novelty of the topic, the workshop was focused around multi-sided discussions, namely panels and poster sessions of the accepted proceedings papers.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3722449.3722461","openalex_id":"https://openalex.org/W4408184531","cited_by_count":6,"quality_score":90,"matched_keywords":["Unpublished","Artificial intelligence","Search and information retrieval","automatic evaluation","Information retrieval","large language models","language model","retrieval"],"author_affiliations":["Microsoft","Microsoft (Canada)","Microsoft (United States)","Seattle University","University College London","University of Amsterdam","University of Padua","University of Waterloo"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lets-fix-this-together-conversational-debugging-with-github-copilot","title":"Let’s Fix this Together: Conversational Debugging with GitHub 
Copilot","url":"https://www.microsoft.com/en-us/research/publication/lets-fix-this-together-conversational-debugging-with-github-copilot/","published":"2024-08-01","authors":["Yasharth Bajpai","Bhavya Chopra","Param Biyani","Cagri Aslan","Sumit Gulwani","Dustin Coleman","Chris Parnin","Arjun Radhakrishna","Gustavo Soares"],"abstract":"Despite advancements in IDE tooling, code understanding , generation, and automated repair, debugging continues to present significant challenges. Existing debugging strategies available to developers in literature are often too mechanical and rigid for day-to-day issues. Recent advances in Large Language Models (LLMs) promise practical solutions that allow for more free-form debugging strategies. While LLMs offer satisfactory assistance in some cases, they often leap to action without sufficient context, making implicit assumptions and providing inaccurate responses. Moreover, the dialogue between developers and LLMs predominantly takes the form of question-answer pairs, placing the burden of formulating the correct questions and sustaining multi-turn conversations on the developer. 
We introduce R OBIN , a novel multi-agent conversational AI- assistant within GitHub Copilot Chat, specif...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Programming languages and software engineering","Human Computer Interaction","1970-01-01","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autogen-enabling-next-gen-llm-applications-via-multi-agent-conversation-framework","title":"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation","url":"https://www.microsoft.com/en-us/research/publication/autogen-enabling-next-gen-llm-applications-via-multi-agent-conversation-framework/","published":"2024-08-01","authors":["Qingyun Wu","Gagan Bansal","Jieyu Zhang","Yiran Wu","Beibin Li","Erkang (Eric) Zhu","Li Jiang","Xiaoyun Zhang","Shaokun Zhang","Ahmed Awadallah","Ryen W. White","Doug Burger"],"abstract":"We present AutoGen, an open-source framework that allows developers to build LLM applications by composing multiple agents to converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. It also enables developers to create flexible agent behaviors and conversation patterns for different applications using both natural language and code. 
AutoGen serves as a generic infrastructure and is widely used by AI practitioners and researchers to build diverse applications of various complexities and LLM capacities. We demonstrate the framework’s effectiveness with several pilot applications, on domains ranging from mathematics and coding to question-answering, supply-chain optimization, online decision-making, and entertainment. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Human–computer interaction","1970-01-01","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tabularis-revilio-converting-text-to-tables","title":"Tabularis Revilio: Converting Text to Tables","url":"https://www.microsoft.com/en-us/research/publication/tabularis-revilio-converting-text-to-tables/","published":"2024-08-01","authors":["Mukul Singh","Sumit Gulwani","Vu Le","Gust Verbruggen"],"abstract":"Copying tables from documents and applications without proper tabular support, like PDF documents, web pages or images, surprisingly remains a challenge. In this paper, we present Revilio, a novel neurosymbolic system for reconstructing tables when their column boundaries have been lost. 
Revilio addresses this task by detecting headers, generating an initial table sketch using a large language model, and using that sketch as a guiding representation during an enumerate-and-test strategy that evaluates syntactic and semantic table structures. We evaluate Revilio on a diverse set of datasets, demonstrating significant improvements over existing table parsing methods. Revilio outperforms traditional techniques in both accuracy and scalability, handling large tables with over 100,000 rows. Our experiments find an increase in reconstruction accuracy by 5.8–11.3% over both neural and symbolic....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Programming languages and software engineering","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/productivity-implications-for-generative-ai-role-based-prompts-as-a-networked-hermeneutic","title":"Commentary: Productivity implications for generative AI role-based prompts as a networked hermeneutic","url":"https://www.microsoft.com/en-us/research/publication/productivity-implications-for-generative-ai-role-based-prompts-as-a-networked-hermeneutic/","published":"2024-08-01","authors":["Sean Rintel"],"abstract":"Commentary for Membership categorisation, sociological description and role prompt engineering with ChatGPT - William Housley, Patrik Dahl, 2024 As Housley and Dahl (2024) 
demonstrate, role-based prompts for Generative AI (GenAI) systems are based on vernacular resources of membership categorization and action description, representing a networked hermeneutic of lay and professional sociology. As a Microsoft Human-Computer Interaction researcher, I see three implications for designing GenAI systems for productivity. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Social sciences","Social Science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/causal-reasoning-and-large-language-models-opening-a-new-frontier-for-causality","title":"Causal Reasoning and Large Language Models: Opening a New Frontier for Causality","url":"https://www.microsoft.com/en-us/research/publication/causal-reasoning-and-large-language-models-opening-a-new-frontier-for-causality/","published":"2024-08-01","authors":["Emre Kiciman","Robert Osazuwa Ness","Amit Sharma","Chenhao Tan"],"abstract":"The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a \"behavorial\" study of LLMs to benchmark their capability in generating causal arguments. 
Across a wide range of tasks, we find that LLMs can generate text corresponding to correct causal arguments with high probability, surpassing the best-performing existing methods. Algorithms based on GPT-3.5 and 4 outperform existing algorithms on a pairwise causal discovery task (97%, 13 points gain), counterfactual reasoning task (92%, 20 points gain) and event causality (86% accuracy in determining necessary and sufficient causes in vignettes). We perform robustness checks across tasks and show that the capabilities cannot be explained by dataset memorization alone, especia...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Causal inference","large language model","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/automatic-bug-detection-in-llm-powered-text-based-games-using-llms","title":"Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs","url":"https://www.microsoft.com/en-us/research/publication/automatic-bug-detection-in-llm-powered-text-based-games-using-llms/","published":"2024-08-01","authors":["Claire Jin","Sudha Rao","Xiangyu Peng","Portia Botchway","Jessica Quaye","Chris Brockett","Bill Dolan"],"abstract":"Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). 
However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still lacking. To address this, we propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need for collecting additional data such as post-play surveys. Applied to a text-based game DejaBoom!, our approach effectively identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computation and Language","Gaming","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-glitch-in-the-matrix-locating-and-detecting-language-model-grounding-with-fakepedia","title":"A Glitch in the Matrix? 
Locating and Detecting Language Model Grounding with Fakepedia","url":"https://www.microsoft.com/en-us/research/publication/a-glitch-in-the-matrix-locating-and-detecting-language-model-grounding-with-fakepedia/","published":"2024-08-01","authors":["Giovanni Monea","Maxime Peyrard","Martin Josifoski","Vishrav Chaudhary","Jason Eisner","Emre Kiciman","Hamid Palangi","Barun Patra","Robert West"],"abstract":"Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context. Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling. Favoring the contextual information is critical for retrieval-augmented generation methods, which enrich the context with up-to-date information, hoping that grounding can rectify outdated or noisy stored knowledge. We present a novel method to study grounding abilities using Fakepedia, a novel dataset of counterfactual texts constructed to clash with a model’s internal parametric knowledge. 
In this study, we introduce Fakepedia, a counterfactual dataset designed to evaluate grounding abilities when the internal parametric knowledge clashes with the contextual inf...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","LLM","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/natural-language-decomposition-and-interpretation-of-complex-utterances","title":"Natural Language Decomposition and Interpretation of Complex Utterances","url":"https://www.microsoft.com/en-us/research/publication/natural-language-decomposition-and-interpretation-of-complex-utterances/","published":"2024-08-01","authors":["Harsh Jhamtani","Hao Fang","Patrick Xia","Eran Levy","Jacob Andreas","Ben Van Durme"],"abstract":"Natural language interfaces often require supervised data to translate user requests into programs, database queries, or other structured intent representations. During data collection, it can be difficult to anticipate and formalize the full range of user needs -- for example, in a system designed to handle simple requests (like find my meetings tomorrow or move my meeting with my manager to noon ) , users may also express more elaborate requests (like swap all my calls on Monday and Tuesday ). We introduce an approach for equipping a simple language-to-code model to handle complex utterances via a process of hierarchical natural language decomposition. 
Our approach uses a pre-trained language model to decompose a complex utterance into a sequence of smaller natural language steps, then interprets each step using the language-to-code model. To test our approach, we collect and release D...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/collaborative-quest-completion-with-llm-driven-non-player-characters-in-minecraft","title":"Collaborative Quest Completion with LLM-driven Non-Player Characters in Minecraft","url":"https://www.microsoft.com/en-us/research/publication/collaborative-quest-completion-with-llm-driven-non-player-characters-in-minecraft/","published":"2024-08-01","authors":["Sudha Rao","Weijia Xu","Michael Xu","Jorge J. G. Leandro","Ken Lobb","Gabriel DesGarennes","Chris Brockett","Bill Dolan"],"abstract":"The use of generative AI in video game development is on the rise, and as the conversational and other capabilities of large language models continue to improve, we expect LLM-driven non-player characters (NPCs) to become widely deployed. In this paper, we seek to understand how human players collaborate with LLM-driven NPCs to accomplish in-game goals. We design a minigame within Minecraft where a player works with two GPT4-driven NPCs to complete a quest. 
We perform a user study in which 28 Minecraft players play this minigame and share their feedback. On analyzing the game logs and recordings, we find that several patterns of collaborative behavior emerge from the NPCs and the human players. We also report on the current limitations of language-only models that do not have rich game-state or visual understanding. We believe that this preliminary study and analysis will inform future g...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computation and Language","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/everything-of-thoughts-defying-the-law-of-penrose-triangle-for-thought-generation-2","title":"Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation","url":"https://www.microsoft.com/en-us/research/publication/everything-of-thoughts-defying-the-law-of-penrose-triangle-for-thought-generation-2/","published":"2024-08-01","authors":["Ruomeng Ding","Chaoyun Zhang","Lu Wang","Yong Xu","Minghua Ma","Wei Zhang","Si Qin","Saravan Rajmohan","Qingwei Lin 林庆维","Dongmei Zhang"],"abstract":"Recent advancements in Large Language Models (LLMs) have revolutionized decision-making by breaking down complex problems into more manageable language sequences referred to as ”thoughts”. An effective thought design should consider three key perspectives: performance, efficiency, and flexibility. 
However, existing thought can at most exhibit two of these attributes. To address these limitations, we introduce a novel thought prompting approach called ”Everything of Thoughts”(XoT) to defy the law of” Penrose triangle of existing thought paradigms. XoT leverages pretrained reinforcement learning and Monte Carlo Tree Search (MCTS) to incorporate external domain knowledge into thoughts, thereby enhancing LLMs’ capabilities and enabling them to generalize to unseen problems efficiently. Through the utilization of the MCTS-LLM collaborative thought revision framework, this approach autonomousl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-llms-mathematical-reasoning-in-financial-document-question-answering","title":"Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering","url":"https://www.microsoft.com/en-us/research/publication/evaluating-llms-mathematical-reasoning-in-financial-document-question-answering/","published":"2024-08-01","authors":["Pragya Srivastava","Manuj Malik","Vivek Gupta","Tanuja Ganu","Dan Roth"],"abstract":"Large Language Models (LLMs), excel in natural language understanding, but their capability for complex mathematical reasoning with an amalgamation of structured tables and unstructured text is uncertain. 
This study explores LLMs' mathematical reasoning on four financial tabular question-answering datasets: TATQA, FinQA, ConvFinQA, and Multihiertt. Through extensive experiments with various models and prompting techniques, we assess how LLMs adapt to complex tables and mathematical tasks. We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps. The results provide insights into LLMs' capabilities and limitations in handling complex mathematical scenarios for semi-structured tables. Ultimately, we introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/benchmarking-data-science-agents","title":"Benchmarking Data Science Agents","url":"https://www.microsoft.com/en-us/research/publication/benchmarking-data-science-agents/","published":"2024-08-01","authors":["Yuge Zhang","Qiyang Jiang","Xingyu Han","Nan Chen","Yuqing Yang","Kan Ren"],"abstract":"In the era of data-driven decision-making, the complexity of data analysis necessitates advanced expertise and tools of data science, presenting significant challenges even for specialists. 
Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing. Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical process. In this paper, we introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents throughout the entire data science lifecycle. Incorporating a novel bootstrapped annotation method, we streamline dataset preparation, improve the evaluation coverage, and expand benchmarking comprehensiveness. Our findings uncover prevalent obstacles and provide critical in...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gems-generative-expert-metric-system-through-iterative-prompt-priming","title":"GEMS: Generative Expert Metric System through Iterative Prompt Priming","url":"https://www.microsoft.com/en-us/research/publication/gems-generative-expert-metric-system-through-iterative-prompt-priming/","published":"2024-08-01","authors":["Ti-Chung Cheng","Carmen Badea","Christian Bird","Tom Zimmermann","Robert DeLine","Nicole Forsgren","Denae Ford"],"abstract":"Across domains, metrics and measurements are fundamental to identifying challenges, informing decisions, and resolving 
conflicts. Despite the abundance of data available in this information age, not only can it be challenging for a single expert to work across multi-disciplinary data, but non-experts can also find it unintuitive to create effective measures or transform theories into context-specific metrics that are chosen appropriately. This technical report addresses this challenge by examining software communities within large software corporations, where different measures are used as proxies to locate counterparts within the organization to transfer tacit knowledge. We propose a prompt-engineering framework inspired by neural activities, demonstrating that generative models can extract and summarize theories and perform basic reasoning, thereby transforming concepts into context-a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Tech Report","Artificial intelligence","Programming languages and software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:r87s0zj1u3oxp0brb1p0blby","title":"Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages","url":"https://machinelearning.apple.com/research/contrastive-alignment-instructions","published":"2024-08-01","authors":["Zhuoyuan Mao","Yen Yu"],"abstract":"This article introduces contrastive alignment instructions (AlignInstruct) to address two challenges in machine translation (MT) on large language models (LLMs). One is the expansion of supported languages to previously unseen ones.
The second relates to the lack of data in low-resource languages. Model fine-tuning through MT instructions (MTInstruct) is a straightforward approach to the first challenge. However, MTInstruct is limited by weak...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4401827344","title":"MKEAH： Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering","url":"https://doi.org/10.1016/j.vrih.2023.06.002","published":"2024-08-01","authors":["Heng Zhang","Zhihua Wei","Guanming Liu","Rui Wang","Ruibin Mu","Chuanbao Liu","Aiquan Yuan","Guodong Cao","Ning Hu"],"abstract":"External knowledge representations play an essential role in knowledge-based visual question and answering to better understand complex scenarios in the open world. Recent entity-relationship embedding approaches are deficient in representing some complex relations, resulting in a lack of topic-related knowledge and redundancy in topic-irrelevant information. To this end, we propose MKEAH: Multimodal Knowledge Extraction and Accumulation on Hyperplanes. To ensure that the lengths of the feature vectors projected onto the hyperplane compare equally and to filter out sufficient topic-irrelevant information, two losses are proposed to learn the triplet representations from the complementary views: range loss and orthogonal loss. To interpret the capability of extracting topic-related knowledge, we present the Topic Similarity (TS) between topic and entity-relations. 
Experimental results dem...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.vrih.2023.06.002","openalex_id":"https://openalex.org/W4401827344","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Tongji University"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.8045403957366943},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7639228105545044},{"id":"https://openalex.org/C68693459","display_name":"Hyperplane","score":0.6712489724159241},{"id":"https://openalex.org/C120567893","display_name":"Knowledge extraction","score":0.546191930770874},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5406069755554199},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44285038113594055},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4193114638328552},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36693769693374634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4401507000","title":"Introduction to the Special Issue on AI-Generated Content for Multimedia","url":"https://doi.org/10.1109/tcsvt.2024.3427488","published":"2024-08-01","authors":["Shengxi Li","Xuelong Li","Leonardo Chiariglione","Jiebo Luo","Wenwu Wang","Zhengyuan Yang","Danilo P. Mandic","Hamido Fujita"],"abstract":"Our world is becoming rapidly dependent on data of increasing complexity, diversity, and volume which calls for robust and powerful tools to process such big data. 
Probabilistic generative models fulfill this goal by learning latent characteristic data relations, especially for the recent emergence of large-scale deep generative models that are able to create realistic content, namely, artificial intelligence-generated content (AIGC). The applications of AIGC span across various domains, and witness rich potential in multimedia content creation, including dialog generation, text-to-speech conversion, image/video generation, and cross-modal content generation.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2024.3427488","openalex_id":"https://openalex.org/W4401507000","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Beihang University","CEDEO (Italy)","Imperial College London","Iwate Prefectural University","Microsoft (United States)","Northwestern Polytechnical University","University of Rochester","University of Surrey"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8063111901283264},{"id":"https://openalex.org/C173853756","display_name":"Dialog box","score":0.763995885848999},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.5941298007965088},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5836972594261169},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.56960129737854},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4931325614452362},{"id":"https://openalex.org/C2778152352","display_name":"Content (measure theory)","score":0.48290133476257324},{"id":"https://openalex.org/C98045186","display_name":"Process 
(computing)","score":0.4721427261829376}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4401214577","title":"Achieving Inclusive Healthcare through Integrating Education and Research with AI and Personalized Curricula","url":"https://doi.org/10.1101/2024.07.31.24311182","published":"2024-08-01","authors":["Amir Bahmani","Kexin Cha","Arash Alavi","Amit Rai Dixit","Antony Ross","Ryan J. Park","Francesca Goncalves","Shirley Ma","Paul Saxman","Ramesh Nair","Ramin Akhavan-Sarraf","Xin Zhou"],"abstract":"Background: Precision medicine promises significant health benefits but faces challenges such as complex data management and analytics, interdisciplinary collaboration, and education of researchers, healthcare professionals, and participants. Addressing these needs requires the integration of computational experts, engineers, designers, and healthcare professionals to develop user-friendly systems and shared terminologies. The widespread adoption of large language models (LLMs) such as Generative Pretrained Transformer (GPT) and Claude highlights the importance of making complex data accessible to non-specialists. 
Methods: We evaluated the Stanford Data Ocean (SDO) precision medicine training program's learning outcomes, AI Tutor performance, and learner satisfaction by assessing self-rated competency on key learning objectives through pre- and post-learning surveys, along with formative...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2024.07.31.24311182","openalex_id":"https://openalex.org/W4401214577","cited_by_count":0,"quality_score":41,"matched_keywords":["personalized"],"author_affiliations":["Amazon (United States)","Cardiovascular Institute of the South","Human Longevity (United States)","Martin Luther King, Jr. Community Hospital","Stanford University","University of California San Diego"],"concepts":[{"id":"https://openalex.org/C47177190","display_name":"Curriculum","score":0.5868974328041077},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5536344051361084},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.5524008274078369},{"id":"https://openalex.org/C1668388","display_name":"Data management","score":0.50674968957901},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.5027694702148438},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47676974534988403},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.46306341886520386},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.4554221034049988}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:709f7a8392c9531d","title":"LLM Pruning and Distillation in Practice: The Minitron 
Approach","url":"https://research.nvidia.com/publication/2024-08_llm-pruning-and-distillation-practice-minitron-approach","published":"2024-08","authors":["Sharath Turuvekere Sreenivas","Saurav Muralidharan","Raviraj Joshi","Marcin Chochowski","Mostofa Patwary","Mohammad Shoeybi","Bryan Catanzaro","Jan Kautz","Pavlo Molchanov"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","distillation"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=1"}},{"id":"official:e99b8a3845f96832","title":"GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators","url":"https://research.nvidia.com/publication/2024-08_gentranslate-large-language-models-are-generative-multilingual-speech-and","published":"2024-08","authors":["Yuchen Hu","Chen Chen","Huck Yang","Ruizhe Li","Zhehuai Chen","Eng Siong Chng"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=1"}},{"id":"openalex:W4401176820","title":"Crux: GPU-Efficient Communication Scheduling for Deep 
Learning Training","url":"https://doi.org/10.1145/3651890.3672239","published":"2024-07-31","authors":["Jiamin Cao","Yu Guan","Kun Qian","Jiaqi Gao","Wencong Xiao","Jianbo Dong","Binzhang Fu","Dennis Cai","Ennan Zhai"],"abstract":"Deep learning training (DLT), e.g., large language model (LLM) training, has become one of the most important services in multitenant cloud computing. By deeply studying in-production DLT jobs, we observed that communication contention among different DLT jobs seriously influences the overall GPU computation utilization, resulting in the low efficiency of the training cluster. In this paper, we present Crux, a communication scheduler that aims to maximize GPU computation utilization by mitigating the communication contention among DLT jobs. Maximizing GPU computation utilization for DLT, nevertheless, is NP-Complete; thus, we formulate and prove a novel theorem to approach this goal by GPU intensity-aware communication scheduling. Then, we propose an approach that prioritizes the DLT flows with high GPU computation intensity, reducing potential communication contention. 
Our 96-GPU testbe...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3651890.3672239","openalex_id":"https://openalex.org/W4401176820","cited_by_count":28,"quality_score":77,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8313450217247009},{"id":"https://openalex.org/C206729178","display_name":"Scheduling (production processes)","score":0.5201323628425598},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.49967360496520996},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4529725909233093},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4458690583705902},{"id":"https://openalex.org/C173608175","display_name":"Parallel computing","score":0.4188828468322754},{"id":"https://openalex.org/C162324750","display_name":"Economics","score":0.0},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":28}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/omniparser-for-pure-vision-based-gui-agent","title":"OmniParser for Pure Vision Based GUI Agent","url":"https://www.microsoft.com/en-us/research/publication/omniparser-for-pure-vision-based-gui-agent/","published":"2024-07-31","authors":["Yadong Lu","Jianwei Yang","Yelong Shen","Ahmed Awadallah"],"abstract":"The recent success of large vision language models shows great potential in driving the agent system operating on user interfaces. 
However, we argue that the power multimodal models like GPT-4V as a general agent on multiple operating systems across different applications is largely underestimated due to the lack of a robust screen parsing technique capable of: 1) reliably identifying interactable icons within the user interface, and 2) understanding the semantics of various elements in a screenshot and accurately associate the intended action with the corresponding region on the screen. To fill these gaps, we introduce OmniParser, a comprehensive method for parsing user interface screenshots into structured elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface. We first cura...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Generative AI","large language models","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4401176799","title":"Alibaba HPN: A Data Center Network for Large Language Model Training","url":"https://doi.org/10.1145/3651890.3672265","published":"2024-07-31","authors":["Kun Qian","Yongqing Xi","Jiamin Cao","Jiaqi Gao","Yichi Xu","Yu Guan","Binzhang Fu","Xuemei Shi","Fangbo Zhu","Rui Miao","Chao Wang","Peng Wang"],"abstract":"This paper presents HPN, Alibaba Cloud's data center network for large language model (LLM) training.
Due to the differences between LLMs and general cloud computing (e.g., in terms of traffic patterns and fault tolerance), traditional data center networks are not well-suited for LLM training. LLM training produces a small number of periodic, bursty flows (e.g., 400Gbps) on each host. This characteristic of LLM training predisposes Equal-Cost Multi-Path (ECMP) to hash polarization, causing issues such as uneven traffic distribution. HPN introduces a 2-tier, dual-plane architecture capable of interconnecting 15K GPUs within one Pod, typically accommodated by the traditional 3-tier Clos architecture. Such a new architecture design not only avoids hash polarization but also greatly reduces the search space for path selection. Another challenge in LLM training is that its requirement for GPU...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3651890.3672265","openalex_id":"https://openalex.org/W4401176799","cited_by_count":100,"quality_score":75,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)","Bellevue Hospital Center"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7081571817398071},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5301671028137207},{"id":"https://openalex.org/C2779463800","display_name":"Center (category theory)","score":0.5148288607597351},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.4770326316356659},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3770461976528168},{"id":"https://openalex.org/C115903868","display_name":"Software 
engineering","score":0.27906328439712524},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":100}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/virchow-2-scaling-self-supervised-mixed-magnification-models-in-pathology","title":"Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology","url":"https://www.microsoft.com/en-us/research/publication/virchow-2-scaling-self-supervised-mixed-magnification-models-in-pathology/","published":"2024-07-31","authors":["Eric Zimmermann","Eugene Vorontsov","Julian Viret","Adam Casson","Michal Zelechowski","George Shaikovski","Neil Tenenholtz","Jimmy Hall","Thomas J. Fuchs","Nicolo Fusi","Siqi Liu","Kristen Severson"],"abstract":"Foundation models are rapidly being developed for computational pathology applications. However, it remains an open question which factors are most important for downstream performance with data scale and diversity, model size, and training algorithm all playing a role. In this work, we present the result of scaling both data and model size, surpassing previous studies in both dimensions, and introduce two new models: Virchow2, a 632M parameter vision transformer, and Virchow2G, a 1.85B parameter vision transformer, each trained with 3.1M histopathology whole slide images. To support this scale, we propose domain-inspired adaptations to the DINOv2 training algorithm, which is quickly becoming the default method in self-supervised learning for computational pathology. We achieve state of the art performance on twelve tile-level tasks, as compared to the top performing competing models. 
Ou...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Computer vision","Medical, health and genomics","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:95","title":"ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development","url":"https://seed.bytedance.com/en/research/bytecheckpoint-a-unified-checkpointing-system-for-large-foundation-model-development","published":"2024-07-29","authors":["Borui Wan","Mingji Han","Yiyao Sheng","Yanghua Peng","Haibin Lin","Mofan Zhang","Zhichao Lai","Menghan Yu","Junda Zhang","Zuquan Song","Xin Liu","Chuan Wu"],"abstract":"Checkpointing to preserve training states is crucial during the development of Large Foundation Models (LFMs), for training resumption upon various failures or changes in GPU resources and parallelism configurations. In addition, saved checkpoints are dispatched to evaluation tasks or transferred across different training stages (e.g., from pre-training to post-training). All these scenarios require resharding distributed checkpoints from one parallelism to another. In production, different LFMs are trained with various frameworks and storage backends, depending on model sizes and training scales. A high-performance checkpointing system is needed to enable efficient checkpoint management at scale. This paper presents ByteCheckpoint, an industrial-grade checkpointing system for large-scale LFM training. 
ByteCheckpoint employs a parallelism-agnostic checkpoint representation that enables e...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["System Research","Infrastructures","NSDI 25","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mixture-of-nested-experts-adaptive-processing-of-visual-tokens","title":"Mixture of Nested Experts: Adaptive Processing of Visual Tokens","url":"https://www.microsoft.com/en-us/research/publication/mixture-of-nested-experts-adaptive-processing-of-visual-tokens/","published":"2024-07-29","authors":["Gagan Jain","Nidhi Hegde","Aditya Kusupati","Arsha Nagrani","Shyamal Buch","Prateek Jain","Anurag Arnab","Sujoy Paul"],"abstract":"The visual medium (images and videos) naturally contains a large amount of information redundancy, thereby providing a great opportunity for leveraging efficiency in processing. While Vision Transformer (ViT) based models scale effectively to large data regimes, they fail to capitalize on this inherent redundancy, leading to higher computational costs. Mixture of Experts (MoE) networks demonstrate scalability while maintaining same inference-time costs, but they come with a larger parameter footprint. We present Mixture of Nested Experts (MoNE), which utilizes a nested structure for experts, wherein individual experts fall on an increasing compute-accuracy curve. 
Given a compute budget, MoNE learns to dynamically choose tokens in a priority order, and thus redundant tokens are processed through cheaper nested experts. Using this framework, we achieve equivalent performance as the baselin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:9ed2b6a33d57193f","title":"Factorizing Text-to-Video Generation by Explicit Image Conditioning","url":"https://ai.meta.com/research/publications/factorizing-text-to-video-generation-by-explicit-image-conditioning/","published":"2024-07-29","authors":["Rohit Girdhar","Mannat Singh","Andrew Brown","Quentin Duval","Samaneh Azadi","Saketh Rambhatla","Mian Akbar Shah","Xi Yin","Devi Parikh","Ishan Misra"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer Vision","Core Machine Learning"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=11"}},{"id":"apple:n48shhemxxvtge1usa7ei9lq","title":"Apple Intelligence Foundation Language 
Models","url":"https://machinelearning.apple.com/research/apple-intelligence-foundation-language-models","published":"2024-07-29","authors":["Apple"],"abstract":"We present foundation language models developed to power Apple Intelligence features, including a ∼3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2407.19884","title":"Preliminary WMT24 Ranking of General MT Systems and LLMs","url":"http://arxiv.org/abs/2407.19884","published":"2024-07-29","authors":["Tom Kocmi","Eleftherios Avramidis","Rachel Bawden","Ondřej Bojar","Anton Dvorkovich","Christian Federmann","Mark Fishel","Markus Freitag","Thamme Gowda","Roman Grundkiewicz","Barry Haddow","Marzena Karpinska"],"abstract":"This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking will be a human evaluation, which is superior to the automatic ranking and supersedes it. 
The purpose of this report is not to interpret any findings but only provide preliminary results to the participants of the General MT task that may be useful during the writing of the system submission.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.48550/arxiv.2407.19884","openalex_id":"https://openalex.org/W4401202125","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Board of the Swiss Federal Institutes of Technology","Charles University","Dublin City University","ETH Zurich","Edinburgh College","German Research Centre for Artificial Intelligence","Google (United States)","IU International University of Applied Sciences","Johns Hopkins University","Microsoft (Germany)","Microsoft (United Kingdom)","Microsoft (United States)","University of Edinburgh","University of Massachusetts Amherst","University of Tartu","Árni Magnússon Institute for Icelandic Studies"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information retrieval)","score":0.9137792587280273},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6894132494926453},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.545294463634491},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4581522047519684},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37909507751464844},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.35033607482910156},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.10885593295097351},{"id":"https://openalex.org/C201995342","display_name":"Systems 
engineering","score":0.04357248544692993}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"apple:dcp8udoljxh2e5e74j665d1x","title":"DataComp-LM: In Search of the Next Generation of Training Sets for Language Models","url":"https://machinelearning.apple.com/research/datacomp-lm-search","published":"2024-07-26","authors":["Jeffrey Li","Alex Fang","Hadi Pour Ansari","Fartash Faghri","Alaaeldin Mohamed Elnouby Ali","Alexander Toshev","Vaishaal Shankar","Georgios Smyrnis","Matt Jordan","Maor Igvi","Alex Dimakis","Hanlin Zhang"],"abstract":"We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. 
Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4401024268","title":"APIMig: A Project-Level Cross-Multi-Version API Migration Framework Based on Evolution Knowledge Graph","url":"https://doi.org/10.24963/ijcai.2024/829","published":"2024-07-26","authors":["Li Kuang","Qi Xie","HaiYang Yang","Yang Yang","Xiang Wei","HaoYue Kang","YingJie Xia"],"abstract":"API migration is essential for software maintenance due to the rapid evolution of third-party libraries where API elements may change continuously through updates. There are two main challenges for API migration at the project level, especially across multiple versions: 1) lack of specific library evolution knowledge across multi-version; 2) difficulty in identifying the chain of changes at the project level. This paper proposes a project-level cross-multi-version API migration framework APIMig. We first construct an API evolution knowledge graph (KG) to capture changes between adjacent library versions and then derive coherent cross-version API evolution knowledge by KG reasoning. Second, we design a chain exploration algorithm to track the chain of changes and aggregate the affected code segments. 
Finally, a large language model is employed in completing API migration by providing the....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2024/829","openalex_id":"https://openalex.org/W4401024268","cited_by_count":6,"quality_score":47,"matched_keywords":["language model"],"author_affiliations":["Brigham and Women's Hospital","Central South University","Electronics Research Institute","Hangzhou Dianzi University","Harvard University","Tencent (China)","Zhejiang University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6767661571502686},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5140069723129272},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35723090171813965},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.32037949562072754},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.17677319049835205}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4401016984","title":"FGNN2: A Powerful Pretraining Framework for Learning the Logic Functionality of Circuits","url":"https://doi.org/10.1109/tcad.2024.3434464","published":"2024-07-26","authors":["Ziyi Wang","Chen Bai","Zhuolun He","Guangliang Zhang","Qiang Xu","Tsung-Yi Ho","Yu Huang","Bei Yu"],"abstract":"Learning feasible representation from raw gate-level circuits is essential for incorporating machine learning techniques in logic synthesis, physical design, or verification. 
Existing structure-based learning methods tend to concentrate mainly on the graph topology, often neglecting logic functionality. This oversight frequently results in a failure to capture the underlying semantics, thereby limiting their overall applicability. To address the concern, we propose a novel circuit representation learning framework, FGNN2, that utilizes a contrastive scheme to effectively extract generic functionality knowledge. We construct a comprehensive pretraining dataset through a customized circuit augmentation scheme. We have also developed a novel contrastive loss function to capture the relative functional distance between different circuits, and to generate representations that are invariant to...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcad.2024.3434464","openalex_id":"https://openalex.org/W4401016984","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6128014922142029},{"id":"https://openalex.org/C118524514","display_name":"Computer architecture","score":0.5155205726623535},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.4454286992549896},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34251829981803894},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4401023829","title":"Consistency-Aware Padding for 
Incomplete Multi-Modal Alignment Clustering Based on Self-Repellent Greedy Anchor Search","url":"https://doi.org/10.24963/ijcai.2024/659","published":"2024-07-26","authors":["Shubin Ma","Liang Zhao","Mingdong Lu","Yifan Guo","Bo Xu"],"abstract":"Multi-modal representation is faithful and highly effective in describing real-world data samples' characteristics by describing their complementary information. However, the collected data often exhibits incomplete and misaligned characteristics due to factors such as inconsistent sensor frequencies and device malfunctions. Existing research has not effectively addressed the issue of filling missing data in scenarios where multiview data are both imbalanced and misaligned. Instead, it relies on class-level alignment of the available data. Thus, it results in some data samples not being well-matched, thereby affecting the quality of data fusion. In this paper, we propose the Consistency-Aware Padding for Incomplete Multi-Modal Alignment Clustering Based on Self-Repellent Greedy Anchor Search(CAPIMAC) to tackle the problem of filling imbalanced and misaligned data in multi-modal datasets....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.24963/ijcai.2024/659","openalex_id":"https://openalex.org/W4401023829","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Dalian University of Technology","Huazhong University of Science and Technology","King University","Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7664430141448975},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.5223661065101624},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5122360587120056},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.47413206100463867},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4636014699935913},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.45789462327957153},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.2692982256412506}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces","title":"Generative AI in Real-World Workplaces","url":"https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/","published":"2024-07-25","authors":["Sonia Jaffe","Neha Parikh Shah","Jenna Butler","Alex Farach","Alexia Cambon","Brent Hecht","Michael Schwarz","Jaime Teevan"],"abstract":"This report presents the most recent findings of Microsoft’s research initiative on AI and Productivity, which seeks to measure and understand the productivity gains associated with LLM-powered productivity tools like Microsoft Copilot. The report synthesizes research results from over a dozen recent studies conducted by researchers at Microsoft, with a focus on studies of generative AI in actual workplace environments. One of these is, to our knowledge, the largest, randomized controlled trial of the introduction of generative AI into organizations. Overall, the research suggests that generative AI is already aiding workers in becoming more productive in their day-to-day jobs in significant ways. However, the influence of generative AI is subject to variation by role, function, and organization and is contingent upon adoption and utilization. 
The report explores these variations and und...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Tech Report","Artificial intelligence","Economics","Human-computer interaction","Search and information retrieval","Social sciences","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-models-as-co-pilots-for-causal-inference-in-medical-studies","title":"Large Language Models as Co-Pilots for Causal Inference in Medical Studies","url":"https://www.microsoft.com/en-us/research/publication/large-language-models-as-co-pilots-for-causal-inference-in-medical-studies/","published":"2024-07-25","authors":["Ahmed Alaa","Rachael V. Phillips","Emre Kiciman","Laura B. Balzer","M. V. D. Laan","Maya Petersen"],"abstract":"The validity of medical studies based on real-world clinical data, such as observational studies, depends on critical assumptions necessary for drawing causal conclusions about medical interventions. Many published studies are flawed because they violate these assumptions and entail biases such as residual confounding, selection bias, and misalignment between treatment and measurement times. Although researchers are aware of these pitfalls, they continue to occur because anticipating and addressing them in the context of a specific study can be challenging without a large, often unwieldy, interdisciplinary team with extensive expertise. 
To address this expertise gap, we explore the use of large language models (LLMs) as co-pilot tools to assist researchers in identifying study design flaws that undermine the validity of causal inferences. We propose a conceptual framework for LLMs as cau...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:yk0kmxwtmfu1632ksn5s9hkn","title":"LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference","url":"https://machinelearning.apple.com/research/dynamic-token-pruning","published":"2024-07-25","authors":["Qichen Fu","Minsik Cho","Thomas Merth","Sachin Mehta","Mohammad Rastegari","Mahyar Najibi"],"abstract":"This paper was accepted at the Efficient Systems for Foundation Models Workshop at ICML 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:n5lp8rfvfgu4ulv714j1iycs","title":"Pre-Trained Foundation Model Representations to Uncover Breathing Patterns in 
Speech","url":"https://machinelearning.apple.com/research/pretrained-foundation-model","published":"2024-07-25","authors":["Vikramjit Mitra","Anirban Chatterjee","Ke Zhai","Helen Weng","Ayuko Hill","Nicole Hay","Christopher Webb","Jamie Cheng","Erdrin Azemi"],"abstract":"The process of human speech production involves coordinated respiratory action to elicit acoustic speech signals. Typically, speech is produced when air is forced from the lungs and is modulated by the vocal tract, where such actions are interspersed by moments of breathing in air (inhalation) to refill the lungs again. Respiratory rate (𝑅𝑅) is a vital metric that is used to assess the overall health, fitness, and general well-being of an...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4400985903","title":"GenUSD: 3D scene generation made easy","url":"https://doi.org/10.1145/3641520.3665306","published":"2024-07-25","authors":["Tsung-Yi Lin","Chen-Hsuan Lin","Yin Cui","Yunhao Ge","Seungjun Nah","Arun Mallya","Zekun Hao","Yifan Ding","Hanzi Mao","Zhaoshuo Li","Yen-Chen Lin","Xiaohui Zeng"],"abstract":"We introduce GenUSD, an end-to-end text-to-scene generation framework that transforms natural language queries into realistic 3D scenes, including 3D objects and layouts. The process involves two main steps: 1) A Large Language Model (LLM) generates a scene layout hierarchically. It first proposes a high-level plan to decompose the scene into multiple functionally and spatially distinct subscenes. 
Then, for each subscene, the LLM proposes objects with detailed positions, poses, sizes, and descriptions. To manage complex object relationships and intricate scenes, we introduce object layout design meta functions as tools for the LLM. 2) A novel text-to-3D model generates each 3D object with surface meshes and high-resolution texture maps based on the LLM’s descriptions. The assembled 3D assets form the final 3D scene, represented as a Universal Scene Description (USD) format. GenUSD ensure...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3641520.3665306","openalex_id":"https://openalex.org/W4400985903","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","language model"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7001807689666748},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.40625521540641785},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36995238065719604},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3620825409889221}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4400969939","title":"FactoryDecoder: Expertise-Free Digital Twin Generation and Modification Tool","url":"https://doi.org/10.1145/3641234.3671031","published":"2024-07-25","authors":["Jiachun Du","Hanjin Zhong","Liang Zhou","Jianye Li"],"abstract":"Digital twins are essential visualization applications in the manufacturing field. However, their development requires specialized 3D engineers, which often complicates modifications during the operation stage. 
To address this complexity, we introduce FactoryDecoder, a development tool for non-expert users in 3D engineering to generate and modify digital twins using natural language inputs. FactoryDecoder converts users' descriptions of production line into hierarchical asset codes, facilitating the automated layout and simplified modification of digital twins. Furthermore, if the system finds that the 3D asset library lacks appropriate device representations, it will automatically use a 3D mesh generator to create new ones. We evaluate the performance of large language models (LLMs) to optimize FactoryDecoder's capabilities. Preliminary user studies highlight FactoryDecoder's effectiven...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3641234.3671031","openalex_id":"https://openalex.org/W4400969939","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7499879002571106},{"id":"https://openalex.org/C76178495","display_name":"Asset (computer security)","score":0.578840434551239},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5599791407585144},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5037879347801208},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.4373264014720917},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.3464750051498413},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.32209354639053345},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.1730092465877533}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/q-sparse-all-large-language-models-can-be-fully-sparsely-activated","title":"Q-Sparse: All Large Language Models can be Fully Sparsely-Activated","url":"https://www.microsoft.com/en-us/research/publication/q-sparse-all-large-language-models-can-be-fully-sparsely-activated/","published":"2024-07-24","authors":["Hongyu Wang","Shuming Ma","Ruiping Wang","Furu Wei"],"abstract":"We introduce, Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through-estimator to the training. We also introduce Block Q-Sparse for batch training and inference. The key results from this work are, (1) Q-Sparse can achieve results comparable to those of baseline LLMs while being much more efficient at inference time; (2) We present an inference-optimal scaling law for sparsely-activated LLMs; (3) Q-Sparse is effective in different settings, including training-from-scratch, continue-training of off-the-shelf LLMs, and finetuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs (e.g., BitNet b1.58). 
Particularly, the synergy of....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Machine learning","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4405469551","title":"Enhancing reservoir simulation workflows with generative AI for expert model building, quality control, and interpretation","url":"https://doi.org/10.1190/image2024-4099981.1","published":"2024-07-24","authors":["Klaus Wiegand","K. Mukundakrishnan","M. Bedewi","V. Ananthan","Dan Kahn","D. Tishechkin","Marcos Kajita"],"abstract":"The successful and timely development of a reservoir simulation model for a particular asset as well as the interpretation of the simulation results requires not only expert domain knowledge in reservoir engineering, but also great experience in using the toolchain associated with the simulation workflow. One crucial task is the construction of the input file deck for the numerical reservoir simulator based on the results of previous steps in the workflow. While most or all of the input deck can be generated by tools, it is the detailed understanding and fine tuning of the numerous model parameters, along with the quality control of the model’s input, that determines the difference between success and failure.We are developing a generative AI assistant to lower the barrier for building and deploying numerical simulation models on the GPU-based reservoir simulation platform ECHELON. 
This....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1190/image2024-4099981.1","openalex_id":"https://openalex.org/W4405469551","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Stone Ridge Technology (United States)"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7454381585121155},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7325916290283203},{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.680039644241333},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6216362118721008},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5650737285614014},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44280755519866943},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.4139867424964905},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.39133983850479126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:fbec1d28009aa6f0","title":"Imagine yourself: Tuning-Free Personalized Image Generation","url":"https://ai.meta.com/research/publications/imagine-yourself-tuning-free-personalized-image-generation/","published":"2024-07-23","authors":["Zecheng He","Bo Sun","Felix Xu","Haoyu Ma","Ankit Ramchandani","Vincent Cheung","Siddharth Shah","Anmol Kalia","Ning Zhang","Peizhao Zhang","Roshan Sumbaly","Peter Vajda"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer Vision","personalized"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=11"}},{"id":"openalex:W4400915096","title":"LORE++: Logical location regression network for table structure recognition with pre-training","url":"https://doi.org/10.1016/j.patcog.2024.110816","published":"2024-07-23","authors":["Rujiao Long","Hangdi Xing","Zhibo Yang","Qi Zheng","Zhi Yu","Fei Huang","Cong Yao"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2024.110816","openalex_id":"https://openalex.org/W4400915096","cited_by_count":14,"quality_score":51,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.701384961605072},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.614406406879425},{"id":"https://openalex.org/C45235069","display_name":"Table (database)","score":0.5901957750320435},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5728534460067749},{"id":"https://openalex.org/C83546350","display_name":"Regression","score":0.43881338834762573},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.4300999045372009},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3373726010322571},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.259132444858551}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4400927300","title":"Highly adaptive multi-modal image matching based on tuning-free filtering and enhanced sketch features","url":"https://doi.org/10.1016/j.inffus.2024.102599","published":"2024-07-23","authors":["Yifan Liao","Pengjie Tao","Qi Chen","Lei Wang","Tao Ke"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.inffus.2024.102599","openalex_id":"https://openalex.org/W4400927300","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["China University of Geosciences","Cloud Computing Center","Eastern Institute of Technology, Ningbo","Huawei Technologies (China)","Wuhan University"],"concepts":[{"id":"https://openalex.org/C2779231336","display_name":"Sketch","score":0.8465067148208618},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8261415958404541},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7849425673484802},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.6542875170707703},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.6142075657844543},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5589902997016907},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5430826544761658},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3840015232563019}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dejavu-kv-cache-streaming-for-fast-fault-tolerant-generative-llm-serving","title":"DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving","url":"https://www.microsoft.com/en-us/research/publication/dejavu-kv-cache-streaming-for-fast-fault-tolerant-generative-llm-serving/","published":"2024-07-22","authors":["Foteini Strati","Sara Mcallister","Amar Phanishayee","Jakub Tarnawski","Ana Klimovic"],"abstract":"Distributed LLM serving is costly and often underutilizes hardware accelerators due to three key challenges: bubbles in pipeline-parallel deployments caused by the bimodal latency of prompt and token processing, GPU memory overprovisioning, and long recovery times in case of failures. In this paper, we propose DéjàVu, a system to address all these challenges using a versatile and efficient KV cache streaming library (DéjàVuLib). Using DéjàVuLib, we propose and implement efficient prompt-token disaggregation to reduce pipeline bubbles, microbatch swapping for efficient GPU memory management, and state replication for fault-tolerance. We highlight the efficacy of these solutions on a range of large models across cloud deployments. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","large language models","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mgit-a-model-versioning-and-management-system-3","title":"MGit: A Model Versioning and Management System","url":"https://www.microsoft.com/en-us/research/publication/mgit-a-model-versioning-and-management-system-3/","published":"2024-07-22","authors":["Wei Hao","Daniel Mendoza","Rafael da Silva","Deepak Narayanan","Amar Phanishayee","Asaf Cidon","Junfeng Yang"],"abstract":"New ML models are often derived from existing ones (e.g., through fine-tuning, quantization or distillation), forming an ecosystem where models are related to each other and can share structure or even parameter values. Managing such a large and evolving ecosystem of model derivatives is challenging. For instance, the overhead of storing all such models is high, and models may inherit bugs from related models, complicating error attribution and debugging. In this paper, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on related models. 
MGit introduces a lineage graph that records the relationships between models, optimizations to efficiently store model parameters, and abstractions over this lineage graph that facilitate model testing, updating and collaboration. We find that MGit works well in practice: MGit is...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","quantization","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4400889241","title":"A foundation model for clinical-grade computational pathology and rare cancers detection","url":"https://doi.org/10.1038/s41591-024-03141-0","published":"2024-07-22","authors":["Eugene Vorontsov","Alican Bozkurt","Adam Casson","George Shaikovski","Michal Zelechowski","Kristen Severson","Eric Zimmermann","James M. Hall","Neil Tenenholtz","Nicolò Fusi","Ellen Yang","Philippe Mathieu"],"abstract":"The analysis of histopathology images with artificial intelligence aims to enable clinical decision support systems and precision medicine. The success of such applications depends on the ability to model the diverse patterns observed in pathology images. To this end, we present Virchow, the largest foundation model for computational pathology to date. 
In addition to the evaluation of biomarker prediction and cell identification, we demonstrate that a large foundation model enables pan-cancer detection, achieving 0.95 specimen-level area under the (receiver operating characteristic) curve across nine common and seven rare cancers. Furthermore, we show that with less training data, the pan-cancer detector built on Virchow can achieve similar performance to tissue-specific clinical-grade models in production and outperform them on some rare variants of cancer. Virchow's performance gains h...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41591-024-03141-0","openalex_id":"https://openalex.org/W4400889241","cited_by_count":336,"quality_score":67,"matched_keywords":[],"author_affiliations":["American International Group (United States)","Memorial Sloan Kettering Cancer Center","Microsoft (United States)","St George Hospital","University of Rochester"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5346029996871948},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5099989175796509},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5073289275169373},{"id":"https://openalex.org/C2777522853","display_name":"Digital pathology","score":0.49913668632507324},{"id":"https://openalex.org/C2781197716","display_name":"Biomarker","score":0.47583985328674316},{"id":"https://openalex.org/C121608353","display_name":"Cancer","score":0.46020883321762085},{"id":"https://openalex.org/C2985322473","display_name":"Cancer detection","score":0.45207005739212036},{"id":"https://openalex.org/C58471807","display_name":"Receiver operating 
characteristic","score":0.4458172023296356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":336}},{"id":"official:fb162d40887541d7","title":"The Llama 3 Herd of Models","url":"https://ai.meta.com/research/publications/the-llama-3-herd-of-models/","published":"2024-07-22","authors":["Llama team"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Human & Machine Intelligence","Conversational AI"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=11"}},{"id":"official:64e811baa13f51d8","title":"CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models","url":"https://ai.meta.com/research/publications/cyberseceval-3-advancing-the-evaluation-of-cybersecurity-risks-and-capabilities-in-large-language-models/","published":"2024-07-22","authors":["Shengye Wan","Cyrus Nikolaidis","Daniel Song","David Molnar","James Crnkovich","Jayson Grace","Manish Bhatt","Sahana Chennabasappa","Spencer Whitman","Stephanie Ding","Vlad Ionescu","Yue Li"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Systems 
Research"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=11"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/stealing-part-of-a-production-language-model","title":"Stealing Part of a Production Language Model","url":"https://www.microsoft.com/en-us/research/publication/stealing-part-of-a-production-language-model/","published":"2024-07-21","authors":["Nicholas Carlini","Daniel Paleka","Krishnamurthy Dj Dvijotham","Thomas Steinke","Jonathan Hayase","A. Feder Cooper","Katherine Lee","Matthew Jagielski","Milad Nasr","Arthur Conmy","Itay Yona","Eric Wallace"],"abstract":"We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/is-behavior-cloning-all-you-need-understanding-horizon-in-imitation-learning","title":"Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning","url":"https://www.microsoft.com/en-us/research/publication/is-behavior-cloning-all-you-need-understanding-horizon-in-imitation-learning/","published":"2024-07-19","authors":["Dylan Foster","Adam Block","Dipendra Misra"],"abstract":"Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive text generation. The simplest approach to IL, behavior cloning (BC), is thought to incur sample complexity with unfavorable quadratic dependence on the problem horizon, motivating a variety of different online algorithms that attain improved linear horizon dependence under stronger assumptions on the data and the learner's access to the expert. We revisit the apparent gap between offline and online IL from a learning-theoretic perspective, with a focus on general policy classes up to and including deep neural networks. 
Through a new analysis of behavior cloning with the logarithmic loss, we show that it is possible to achieve horizon-independent sample complexity in offline IL....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Imitation learning","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-context-aware-preference-modeling-for-language-models","title":"Improving Context-Aware Preference Modeling for Language Models","url":"https://www.microsoft.com/en-us/research/publication/improving-context-aware-preference-modeling-for-language-models/","published":"2024-07-19","authors":["Silviu Pitis","Ziang Xiao","Nicolas Le Roux","Alessandro Sordoni"],"abstract":"While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. 
We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specif...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","natural language models","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/as-generative-models-improve-people-adapt-their-prompts","title":"As Generative Models Improve, People Adapt Their Prompts","url":"https://www.microsoft.com/en-us/research/publication/as-generative-models-improve-people-adapt-their-prompts/","published":"2024-07-18","authors":["E. Jahani","Benjamin S. Manning","Joe Zhang","Hong-Yi TuYe","Mohammed Alsobay","Christos Nicolaides","Siddharth Suri","David Holtz"],"abstract":"In an online experiment with N = 1891 participants, we collected and analyzed over 18,000 prompts to explore how the importance of prompting will change as the capabilities of generative AI models continue to improve. Each participant in our experiment was randomly and blindly assigned to use one of three text-to-image diffusion models: DALL-E 2, its more advanced successor DALL-E 3, or a version of DALL-E 3 with automatic prompt revision. 
Participants were then asked to write prompts to reproduce a target image as closely as possible in 10 consecutive tries. We find that task performance was higher for participants using DALL-E 3 than for those using DALL-E 2. This performance gap corresponds to a noticeable difference in the similarity of participants' images to their target images, and was caused in equal measure by: (1) the increased technical capabilities of DALL-E 3, and (2) endoge...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.2139/ssrn.4899363","openalex_id":"https://openalex.org/W4400943826","cited_by_count":1,"quality_score":73,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Economics","Generative AI"],"author_affiliations":["Microsoft","Massachusetts Institute of Technology","Microsoft (United States)","Microsoft Research New York City (United States)","Moscow Institute of Thermal Technology","University of California, Berkeley"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2407.13833","title":"Phi-3 Safety Post-Training: Aligning Language Models with a \"Break-Fix\" Cycle","url":"https://huggingface.co/papers/2407.13833","published":"2024-07-18","authors":["Emman Haider","Daniel Perez-Becker","Thomas Portet","Piyush Madan","Amit Garg","David Majercak","Wen Wen","Dongwoo Kim","Ziyi Yang","Jianwen Zhang","Hiteshi Sharma","Blake Bullwinkel"],"abstract":"Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. 
As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3 series of language models. We utilized a \"break-fix\" cycle, performing multiple rounds of dataset curation, safety post-training, benchmarking, red teaming, and vulnerability identification to cover a variety of harm areas in both single and multi-turn scenarios. Our results indicate that this approach iteratively improved the performance of the Phi-3 models across a wide range of responsible AI benchmarks.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["language model"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4400734331","title":"Learning together: Towards foundation models for machine learning interatomic potentials with meta-learning","url":"https://doi.org/10.1038/s41524-024-01339-x","published":"2024-07-17","authors":["Alice E. A. Allen","Nicholas Lubbers","Sakib Matin","Justin S. Smith","Richard A. Messerly","Sergei Tretiak","Kipton Barros"],"abstract":"Abstract The development of machine learning models has led to an abundance of datasets containing quantum mechanical (QM) calculations for molecular and material systems. However, traditional training methods for machine learning models are unable to leverage the plethora of data available as they require that each dataset be generated using the same QM method. 
Taking machine learning interatomic potentials (MLIPs) as an example, we show that meta-learning techniques, a recent advancement from the machine learning community, can be used to fit multiple levels of QM theory in the same training process. Meta-learning changes the training procedure to learn a representation that can be easily re-trained to new tasks with small amounts of data. We then demonstrate that meta-learning enables simultaneously training to multiple large organic molecule datasets. As a proof of concept, we examin...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41524-024-01339-x","openalex_id":"https://openalex.org/W4400734331","cited_by_count":41,"quality_score":67,"matched_keywords":[],"author_affiliations":["Center for Integrated Nanotechnologies","Los Alamos National Laboratory","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.7797703742980957},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7259359359741211},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7013102769851685},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6991320252418518},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.41229650378227234},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":41}},{"id":"official:4025902845cdbf2c","title":"VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion 
Models","url":"https://ai.meta.com/research/publications/vfusion3d-learning-scalable-3d-generative-models-from-video-diffusion-models/","published":"2024-07-17","authors":["Junlin Han","Filippos Kokkinos","Philip Torr"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Graphics","Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=11"}},{"id":"official:d36c5932e3d3a156","title":"Prover-Verifier Games improve legibility of language model outputs","url":"https://openai.com/index/prover-verifier-games-improve-legibility","published":"2024-07-17","authors":["OpenAI"],"abstract":"Discover how prover-verifier games improve the legibility of language model outputs, making AI solutions clearer, easier to verify, and more trustworthy for both humans and machines.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Research","language model"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"apple:k6rdwxuwgv7jxkq0wm04t2gf","title":"Projected Language Models: A Large Model Pre-Segmented Into Smaller Ones","url":"https://machinelearning.apple.com/research/projected-language-models","published":"2024-07-17","authors":["David Grangier","Angelos 
Katharopoulos","Pierre Ablin","Awni Hannun"],"abstract":"This paper has been accepted at the Foundation Models in the Wild workshop at ICML 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4400726600","title":"Open-Vocabulary Category-Level Object Pose and Size Estimation","url":"https://doi.org/10.1109/lra.2024.3430156","published":"2024-07-17","authors":["Junhao Cai","Yisheng He","Weihao Yuan","Siyu Zhu","Zilong Dong","Liefeng Bo","Qifeng Chen"],"abstract":"This letter studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for this task. Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation. It includes additional annotations for the symmetry axis of each category, which help resolve symmetric ambiguity. Apart from the large-scale dataset, we find another key factor to enabling such generalizability is leveraging the strong prior knowledge in pre-trained visual-language foundation models. 
We then propose a framework built on pre-train...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2024.3430156","openalex_id":"https://openalex.org/W4400726600","cited_by_count":11,"quality_score":52,"matched_keywords":["agent"],"author_affiliations":["Alibaba Group (China)","Fudan University","Hong Kong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.6355171799659729},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.6320856809616089},{"id":"https://openalex.org/C52102323","display_name":"Pose","score":0.6148557066917419},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.5686829686164856},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5581221580505371},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5356554388999939},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47142547369003296},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3399769961833954}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"apple:gulgzd26liwkd89h56nra93b","title":"Improving GFlowNets for Text-to-Image Diffusion Alignment","url":"https://machinelearning.apple.com/research/improving-gflownets","published":"2024-07-17","authors":["Dinghuai Zhang","Yizhe Zhang","Jiatao Gu","Ruixiang Zhang","Josh Susskind","Navdeep Jaitly","Shuangfei Zhai"],"abstract":"This paper was accepted at the Foundation Models in the Wild workshop at ICML 
2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4400720992","title":"Deciphering language disturbances in schizophrenia: A study using fine-tuned language models","url":"https://doi.org/10.1016/j.schres.2024.07.016","published":"2024-07-17","authors":["Renyu Li","Minne Cao","Da‐Wei Fu","Wei Wei","Dequan Wang","Zhaoxia Yuan","Ruofei Hu","Wei Deng"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.schres.2024.07.016","openalex_id":"https://openalex.org/W4400720992","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Alibaba Group (Cayman Islands)","Alibaba Group (China)","Hangzhou Seventh Peoples Hospital","Nanjing Brain Hospital","Universidad Politécnica de Madrid","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2776412080","display_name":"Schizophrenia (object-oriented programming)","score":0.7160613536834717},{"id":"https://openalex.org/C151956035","display_name":"Logistic regression","score":0.6258202791213989},{"id":"https://openalex.org/C161584116","display_name":"Multivariate statistics","score":0.5164773464202881},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.502678394317627},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.49246343970298767},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4839401841163635},{"id":"https://openalex.org/C142853389","display_name":"Association (psychology)","score":0.46053436398506165},{"id":"https://openalex.org/C2780135775","display_name":"Positive and Negative Syndrome Scale","score":0.460157185792923}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/formalizing-natural-language-intent-into-program-specifications-via-large-language-models","title":"Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?","url":"https://www.microsoft.com/en-us/research/publication/formalizing-natural-language-intent-into-program-specifications-via-large-language-models/","published":"2024-07-16","authors":["Madeline Endres","Sarah Fakhoury","Saikat Chakraborty","Shuvendu Lahiri"],"abstract":"Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a programs intent. However, there is typically no guarantee that a programs implementation and natural language documentation are aligned. In the case of a conflict, leveraging information in code-adjacent natural language has the potential to enhance fault localization, debugging, and code trustworthiness. In practice, however, this information is often underutilized due to the inherent ambiguity of natural language which makes natural language intent challenging to check programmatically. The emergent abilities of Large Language Models (LLMs) have the potential to facilitate the translation of natural language intent to programmatically checkable assertions. 
However, it is unclear if LLMs can correctly translate informal natural languag...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3660791","openalex_id":"https://openalex.org/W4400582518","cited_by_count":26,"quality_score":102,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Programming language (88)","software engineering","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)","Seattle University","University of Michigan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/on-overcoming-miscalibrated-conversational-priors-in-llm-based-chatbots","title":"On Overcoming Miscalibrated Conversational Priors in LLM-based Chatbots","url":"https://www.microsoft.com/en-us/research/publication/on-overcoming-miscalibrated-conversational-priors-in-llm-based-chatbots/","published":"2024-07-16","authors":["Christine Herlihy","Jennifer Neville","Tobias Schnabel","Adith Swaminathan"],"abstract":"We explore the use of Large Language Model (LLM-based) chatbots to power recommender systems. We observe that the chatbots respond poorly when they encounter under-specified requests (e.g., they make incorrect assumptions, hedge with a long response, or refuse to answer). 
We conjecture that such miscalibrated response tendencies (i.e., conversational priors) can be attributed to LLM fine-tuning using annotators --- single-turn annotations may not capture multi-turn conversation utility, and the annotators' preferences may not even be representative of users interacting with a recommender system.We first analyze public LLM chat logs to conclude that query under-specification is common. Next, we study synthetic recommendation problems with configurable latent item utilities and frame them as Partially Observed Decision Processes (PODP). We find that pre-trained LLMs can be sub-optimal for....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Machine learning","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:aex2gvaexpy8tti65lqdz7pa","title":"Ferretv2: An Improved Baseline for Referring and Grounding","url":"https://machinelearning.apple.com/research/ferretv2","published":"2024-07-16","authors":["Haotian Zhang","Haoxuan You","Philipp Dufter","Bowen Zhang","Chen Chen","Hong-You Chen","Tsu-Jui Fu","William Yang Wang","Shih-Fu Chang","Zhe Gan","Yinfei Yang"],"abstract":"While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it poses certain limitations: constrained by the pre-trained fixed visual encoder and failed to perform well on broader tasks. 
In this work, we unveil Ferret-v2, a significant upgrade to Ferret, with three key designs. (1) Any resolution grounding and referring: A flexible approach that effortlessly...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/slip-securing-llms-ip-using-weights-decomposition","title":"SLIP: Securing LLMs IP Using Weights Decomposition","url":"https://www.microsoft.com/en-us/research/publication/slip-securing-llms-ip-using-weights-decomposition/","published":"2024-07-15","authors":["Yehonathan Refael","Adam Hakim","Lev Greenberg","Tal Aviv","Satya Lokam","Ben Fishman","Shachar Seidman"],"abstract":"Large language models (LLMs) have recently seen widespread adoption, in both academia and industry. As these models grow, they become valuable intellectual property (IP), reflecting enormous investments by their owners. Moreover, the high cost of cloud-based deployment has driven interest towards deployment to edge devices, yet this risks exposing valuable parameters to theft and unauthorized use. Current methods to protect models' IP on the edge have limitations in terms of practicality, loss in accuracy, or suitability to requirements. In this paper, we introduce a novel hybrid inference algorithm, named SLIP, designed to protect edge-deployed models from theft. SLIP is the first hybrid protocol that is both practical for real-world applications and provably secure, while having zero accuracy degradation and minimal impact on latency. 
It involves partitioning the model between two comp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Security, privacy, and cryptography","Computer science","mathematics"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/automated-root-causing-of-cloud-incidents-using-in-context-learning-with-gpt-4","title":"Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4","url":"https://www.microsoft.com/en-us/research/publication/automated-root-causing-of-cloud-incidents-using-in-context-learning-with-gpt-4/","published":"2024-07-15","authors":["Xuchao Zhang","Supriyo GHOSH","Chetan Bansal","Rujia Wang","Minghua Ma","Yu Kang","Saravan Rajmohan"],"abstract":"Root Cause Analysis (RCA) plays a pivotal role in the incident diagnosis process for cloud services, requiring on-call engineers to identify the primary issues and implement corrective actions to prevent future recurrences. Improving the incident RCA process is vital for minimizing service downtime, customer impact and manual toil. Recent advances in artificial intelligence have introduced state-of-the-art Large Language Models (LLMs) like GPT-4, which have proven effective in tackling various AIOps problems, ranging from code authoring to incident management. 
Nonetheless, the GPT-4 model's immense size presents challenges when trying to fine-tune it on user data because of the significant GPU resource demand and the necessity for continuous model fine-tuning with the emergence of new data. To address the high cost of fine-tuning LLM, we propose an in-context learning approach for automa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/x-lifecycle-learning-for-cloud-incident-management-using-llms","title":"X-lifecycle Learning for Cloud Incident Management using LLMs","url":"https://www.microsoft.com/en-us/research/publication/x-lifecycle-learning-for-cloud-incident-management-using-llms/","published":"2024-07-15","authors":["Drishti Goel","Fiza Husain","Aditya Singh","Supriyo GHOSH","A. Parayil","Chetan Bansal","Xuchao Zhang","Saravan Rajmohan"],"abstract":"Incident management for large cloud services is a complex and tedious process and requires significant amount of manual efforts from on-call engineers (OCEs). OCEs typically leverage data from different stages of the software development lifecycle [SDLC] (e.g., codes, configuration, monitor data, service properties, service dependencies, trouble-shooting documents, etc.) to generate insights for detection, root causing and mitigating of incidents. 
Recent advancements in large language models [LLMs] (e.g., ChatGPT, GPT-4, Gemini) created opportunities to automatically generate contextual recommendations to the OCEs assisting them to quickly identify and mitigate critical issues. However, existing research typically takes a silo-ed view for solving a certain task in incident management by leveraging data from a single stage of SDLC. In this paper, we demonstrate that augmenting additional....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402979749","title":"CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial Expression Recognition","url":"https://doi.org/10.1109/icme57554.2024.10687508","published":"2024-07-15","authors":["Hanting Li","Hongjing Niu","Zhaoqing Zhu","Feng Zhao"],"abstract":"As one of the most informative behaviors of humans, facial expressions are often compound and variable, which is manifested by the fact that different people may express the same expression in very different ways. However, most facial expression recognition (FER) methods still use one-hot or soft labels as the supervision, which lack sufficient semantic descriptions of facial expressions and are less interpretable. 
Recently, contrastive vision-language pre-training models (e.g., CLIP) use text as the supervision and have injected new vitality into various computer vision tasks, benefiting from the rich semantics in text. Therefore, we propose CLIPER, a unified framework for both static and dynamic facial Expression Recognition based on CLIP. Besides, we introduce multiple expression text descriptors (METD) to learn fine-grained expression representations and a two-stage training paradigm...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme57554.2024.10687508","openalex_id":"https://openalex.org/W4402979749","cited_by_count":40,"quality_score":67,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6500217914581299},{"id":"https://openalex.org/C195704467","display_name":"Facial expression","score":0.5935878753662109},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.487490177154541},{"id":"https://openalex.org/C90559484","display_name":"Expression (computer science)","score":0.4714670479297638},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38174593448638916},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3634706139564514},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3502596616744995},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.14268732070922852}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":40}},{"id":"apple:col4t2rb5bvyzovvenybhm3x","title":"CodeAct: Your LLM Agent Acts Better when Generating Code","url":"https://machinelearning.apple.com/research/codeact","published":"2024-07-15","authors":["Xingyao Wang","Yangyi Chen","Lifan Yuan","Yizhe Zhang","Yunzhu Li","Hao Peng","Ji Heng"],"abstract":"Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","language model","agent"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4400663251","title":"A systematic evaluation of GPT-4V's multimodal capability for chest X-ray image analysis","url":"https://doi.org/10.1016/j.metrad.2024.100099","published":"2024-07-15","authors":["Yunyi Liu","Yingshu Li","Zhanyu Wang","Xinyu Liang","Lingqiao Liu","Lei Wang","Lei Wang","Leyang Cui","Zhaopeng Tu","Longyue Wang","Longyue Wang","Luping Zhou"],"abstract":"This work evaluates GPT-4V’s multimodal capability for medical image analysis, focusing on three representative tasks radiology report generation, medical visual question answering, and medical visual grounding. 
For the evaluation, a set of prompts is designed for each task to induce the corresponding capability of GPT-4V to produce sufficiently good outputs. Three evaluation ways including quantitative analysis, human evaluation, and case study are employed to achieve an in-depth and extensive evaluation. Our evaluation shows that GPT-4V excels in understanding medical images can generate high-quality radiology reports and effectively answer questions about medical images. Meanwhile, it is found that its performance for medical visual grounding needs to be substantially improved. In addition, we observe the discrepancy between the evaluation outcome from quantitative analysis and that f...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.metrad.2024.100099","openalex_id":"https://openalex.org/W4400663251","cited_by_count":20,"quality_score":57,"matched_keywords":[],"author_affiliations":["Guangzhou University of Chinese Medicine","Tencent (China)","The University of Adelaide","The University of Sydney","University of Wollongong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6800072193145752},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6256070137023926},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5609726905822754},{"id":"https://openalex.org/C3018395757","display_name":"Evaluation methods","score":0.4846108853816986},{"id":"https://openalex.org/C95986675","display_name":"Quantitative analysis (chemistry)","score":0.47866925597190857},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.4619027078151703},{"id":"https://openalex.org/C2779530757","display_name":"Quality 
(philosophy)","score":0.43850177526474},{"id":"https://openalex.org/C19527891","display_name":"Medical physics","score":0.41478702425956726}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":20}},{"id":"apple:yd8voaoy8fhbhj1nioy9t9kj","title":"Whispering Experts: Toxicity Mitigation in Pre-trained Language Models by Dampening Expert Neurons","url":"https://machinelearning.apple.com/research/whispering-experts","published":"2024-07-15","authors":["Xavier Suau Cuadros","Pieter Delobelle","Rin Metcalf Susa","Armand Joulin","Nick Apostoloff","Luca Zappella","Pau Rodriguez Lopez"],"abstract":"An important issue with Large Language Models (LLMs) is their undesired ability to generate toxic language. In this work, we show that the neurons responsible for toxicity can be determined by their power to discriminate toxic sentences, and that toxic language can be mitigated by reducing their activation levels proportionally to this power. 
We propose AUROC adaptation (AURA), an intervention that can be applied to any pre-trained LLM to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ey53xb7zgv8apmd9gg0h2wyq","title":"Revealing the Utilized Rank of Subspaces of Learning in Neural Networks","url":"https://machinelearning.apple.com/research/revealing-utilized-rank","published":"2024-07-15","authors":["Isha Garg","Christian Koguchi","Eshan Verma","Daniel Ulbricht"],"abstract":"This paper has been accepted at the Efficient Systems for Foundation Models workshop at ICML 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4402980487","title":"Multi-modal Learnable Queries for Image Aesthetics Assessment","url":"https://doi.org/10.1109/icme57554.2024.10687472","published":"2024-07-15","authors":["Zhiwei Xiong","Yunfan Zhang","Zhiqi Shen","Peiran Ren","Han Yu"],"abstract":"Image aesthetics assessment (IAA) is attracting wide interest with the prevalence of social media. The problem is challenging due to its subjective and ambiguous nature. 
Instead of directly extracting aesthetic features solely from the image, user comments associated with an image could potentially provide complementary knowledge that is useful for IAA. With existing large-scale pre-trained models demonstrating strong capabilities in extracting high-quality transferable visual and textual features, learnable queries are shown to be effective in extracting useful features from the pre-trained visual features. Therefore, in this paper, we propose MMLQ, which utilizes multi-modal learnable queries to extract aesthetics-related features from multi-modal pre-trained features. Extensive experimental results demonstrate that MMLQ achieves new state-of-the-art performance on multimodal IAA, beat...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme57554.2024.10687472","openalex_id":"https://openalex.org/W4402980487","cited_by_count":0,"quality_score":41,"matched_keywords":["media"],"author_affiliations":["Alibaba Group (China)","Nanyang Technological University"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7971612215042114},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6178740859031677},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5044821500778198},{"id":"https://openalex.org/C107038049","display_name":"Aesthetics","score":0.4695499539375305},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4555862545967102},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4186943769454956},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.1431545913219452},{"id":"https://openalex.org/C192562407","display_name":"Materials 
science","score":0.11511081457138062}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4401991032","title":"Enhancing Visual Wake Word Spotting with Pretrained Model and Feature Balance Scaling","url":"http://dx.doi.org/10.1109/icmew63481.2024.10645389","published":"2024-07-15","authors":["Xuandong Huang","Shangfei Wang","Jinghao Yan","Kai Tang","Pengfei Hu"],"abstract":"Wake word spotting mainly focus on audio modality or audio-visual multimodal exploration. The visual modality delivers stable outcomes under poor acoustic conditions, making visual wake word spotting an emerging and challenging task. However, challenges such as overfitting due to subject dependence and performance decrease from imbalances between positive and negative samples still exist in visual wake word spotting. This paper introduces an efficient and robust visual wake word spotting system. Notably, a pretrained visual lipreading sequence encoder is employed to extract more effective lip movement features, allowing the model to focus on lip movement patterns and prevent overfitting. Additionally, we propose feature balance scaling, which adjusts the feature value ranges of both positive and negative samples during training. 
This scaling method can be easily applied to wake word spot...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icmew63481.2024.10645389","openalex_id":"https://openalex.org/W4401991032","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C2779506182","display_name":"Spotting","score":0.903628945350647},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7232455015182495},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.6308907866477966},{"id":"https://openalex.org/C168031717","display_name":"Balance (ability)","score":0.5311043858528137},{"id":"https://openalex.org/C48939323","display_name":"Wake","score":0.5298713445663452},{"id":"https://openalex.org/C90805587","display_name":"Word (group theory)","score":0.5111334919929504},{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.5006487369537354},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49279382824897766}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4402981109","title":"Cross-Lingual Transfer for Natural Language Inference via Multilingual Prompt Translator","url":"http://dx.doi.org/10.1109/icme57554.2024.10687356","published":"2024-07-15","authors":["Xiaoyu Qiu","Yuechen Wang","Jiaxin Shi","Wengang Zhou","Houqiang Li"],"abstract":"Based on multilingual pre-trained models, cross-lingual transfer with prompt learning has shown promising effectiveness, where soft prompt learned in a source language is transferred to target 
languages for downstream tasks, particularly in the low-resource scenario. To efficiently transfer soft prompt, we propose a novel framework, Multilingual Prompt Translator (MPT), where a multilingual prompt translator is introduced to properly process crucial knowledge embedded in prompt by changing language knowledge while retaining task knowledge. More concretely, we first train prompt in source language and employ translator to translate it into target prompt. Besides, we extend an external corpus as auxiliary data, on which an alignment task for predicted answer probability is designed to convert language knowledge, thereby equipping target prompt with multilingual knowledge. In few-shot setti...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icme57554.2024.10687356","openalex_id":"https://openalex.org/W4402981109","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8223845958709717},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6676849126815796},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5721980333328247},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5458524227142334},{"id":"https://openalex.org/C2776175482","display_name":"Transfer (computing)","score":0.4495125114917755},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.41403281688690186},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.34806281328201294},{"id":"https://openalex.org/C173608175","display_name":"Parallel 
computing","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-models-can-accurately-predict-searcher-preferences","title":"Large Language Models Can Accurately Predict Searcher Preferences","url":"https://www.microsoft.com/en-us/research/publication/large-language-models-can-accurately-predict-searcher-preferences/","published":"2024-07-14","authors":["Paul Thomas","Seth Spielman","Nick Craswell","Bhaskar Mitra"],"abstract":"Much of the evaluation and tuning of a search system relies on relevance labels—annotations that say whether a document is useful for a given search and searcher. Ideally these come from real searchers, but it is hard to collect this data at scale, so typical experiments rely on third-party labellers who may or may not produce accurate annotations. Label quality is managed with ongoing auditing, training, and monitoring. We discuss an alternative approach. We take careful feedback from real searchers and use this to select a large language model (LLM), and prompt, that agrees with this feedback; the LLM can then produce labels at scale. Our experiments show LLMs are as accurate as human labellers and as useful for finding the best systems and hardest queries. LLM performance varies with prompt features, but also varies unpredictably with simple paraphrases.
This unpredictability reinf...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3626772.3657707","openalex_id":"https://openalex.org/W4400526908","cited_by_count":131,"quality_score":114,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Information retrieval","large language model","Machine learning","LLM","language model"],"author_affiliations":["Microsoft","Microsoft (Canada)","Microsoft (United States)","Microsoft Research Montréal (Canada)","Seattle University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm4eval-large-language-model-for-evaluation-in-ir","title":"LLM4Eval: Large Language Model for Evaluation in IR","url":"https://www.microsoft.com/en-us/research/publication/llm4eval-large-language-model-for-evaluation-in-ir/","published":"2024-07-14","authors":["Hossein A. Rahmani","Clemencia Siro","Mohammad Aliannejadi","Nick Craswell","Charles L. A. Clarke","Guglielmo Faggioli","Bhaskar Mitra","Paul Thomas","Emine Yilmaz"],"abstract":"Large language models (LLMs) have demonstrated increasing task-solving abilities not present in smaller models. Utilizing the capabilities and responsibilities of LLMs for automated evaluation (LLM4eval) has recently attracted considerable attention in multiple research communities. For instance, LLM4eval models have been studied in the context of automated judgments, natural language generation, and retrieval augmented generation systems. 
We believe that the information retrieval community can significantly contribute to this growing research area by designing, implementing, analyzing, and evaluating various aspects of LLMs with applications to LLM4eval tasks. The main goal of LLM4eval workshop is to bring together researchers from industry and academia to discuss various aspects of LLMs for evaluation in information retrieval, including automated judgments, retrieval-augmented generati...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3626772.3657992","openalex_id":"https://openalex.org/W4400528870","cited_by_count":19,"quality_score":87,"matched_keywords":["Inproceedings (Conference)","Search and information retrieval","language model","retrieval"],"author_affiliations":["Microsoft","Amazon (United Kingdom)","Amsterdam University of the Arts","Bellevue Hospital Center","Microsoft (Canada)","Microsoft (United States)","University College London","University of Amsterdam","University of Padua","University of Waterloo","Seattle University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hey-thats-my-model-introducing-chain-hash-an-llm-fingerprinting-technique","title":"Hey, That's My Model! 
Introducing Chain & Hash, An LLM Fingerprinting Technique","url":"https://www.microsoft.com/en-us/research/publication/hey-thats-my-model-introducing-chain-hash-an-llm-fingerprinting-technique/","published":"2024-07-14","authors":["Mark Russinovich","Ahmed Salem","Yanan Cai"],"abstract":"Growing concerns over the theft and misuse of Large Language Models (LLMs) have heightened the need for effective fingerprinting, which links a model to its original version to detect misuse. In this paper, we define five key properties for a successful fingerprint: Transparency, Efficiency, Persistence, Robustness, and Unforgeability. We introduce a novel fingerprinting framework that provides verifiable proof of ownership while maintaining fingerprint integrity. Our approach makes two main contributions. First, we propose a Chain and Hash technique that cryptographically binds fingerprint prompts with their responses, ensuring no adversary can generate colliding fingerprints and allowing model owners to irrefutably demonstrate their creation. 
Second, we address a realistic threat model in which instruction-tuned models' output distribution can be significantly altered through meta-prom...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/clave-an-adaptive-framework-for-evaluating-values-of-llm-generated-responses","title":"CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses","url":"https://www.microsoft.com/en-us/research/publication/clave-an-adaptive-framework-for-evaluating-values-of-llm-generated-responses/","published":"2024-07-14","authors":["Jing Yao","Xiaoyuan Yi","Xing Xie"],"abstract":"The rapid progress in Large Language Models (LLMs) poses potential risks such as generating unethical content. Assessing LLMs' values can help expose their misalignment, but relies on reference-free evaluators, e.g., fine-tuned LLMs or close-source ones like GPT-4, to identify values reflected in generated responses. Nevertheless, these evaluators face two challenges in open-ended value evaluation: they should align with changing human value definitions with minimal annotation, against their own bias (adaptability), and detect varying value expressions and scenarios robustly (generalizability). 
To handle these challenges, we introduce CLAVE, a novel framework which integrates two complementary LLMs, a large one to extract high-level value concepts from a few human labels, leveraging its extensive knowledge and generalizability, and a smaller one fine-tuned on such concepts to better alig...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4400606525","title":"How Can Recommender Systems Benefit from Large Language Models: A Survey","url":"https://doi.org/10.1145/3678004","published":"2024-07-13","authors":["Jianghao Lin","Xinyi Dai","Yunjia Xi","Weiwen Liu","Bo Chen","Hao Zhang","Yong Liu","Chuhan Wu","Xiangyang Li","Chenxu Zhu","Huifeng Guo","Yong Yu"],"abstract":"With the rapid development of online services and web applications, recommender systems (RS) have become increasingly indispensable for mitigating information overload and matching users’ information needs by providing personalized suggestions over items. Although the RS research community has made remarkable progress over the past decades, conventional recommendation models (CRM) still have some limitations, e.g., lacking open-domain world knowledge, and difficulties in comprehending users’ underlying preferences and motivations. 
Meanwhile, large language models (LLM) have shown impressive general intelligence and human-like capabilities for various natural language processing (NLP) tasks, which mainly stem from their extensive open-world knowledge, logical and commonsense reasoning abilities, as well as their comprehension of human culture and society. Consequently, the emergence of LL...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3678004","openalex_id":"https://openalex.org/W4400606525","cited_by_count":154,"quality_score":75,"matched_keywords":["LLM","personalized"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8554190397262573},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6685795783996582},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.4313771724700928},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.41119813919067383},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3813563585281372}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":154}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/codeplan-repository-level-coding-using-llms-and-planning-2","title":"CodePlan: Repository-level Coding using LLMs and Planning","url":"https://www.microsoft.com/en-us/research/publication/codeplan-repository-level-coding-using-llms-and-planning-2/","published":"2024-07-12","authors":["Ramakrishna Bairi","Atharv Sonwane","Aditya Kanade","Vageesh D C","Arun Shankar Iyer","Suresh Parthasarathy","Sriram Rajamani","B. 
Ashok","Shashank Shet"],"abstract":"Software engineering activities such as package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications to a codebase, involve pervasively editing the entire repository of code. We formulate these activities as repository-level coding tasks. Recent tools like GitHub Copilot, which are powered by Large Language Models (LLMs), have succeeded in offering high-quality solutions to localized coding problems. Repository-level coding tasks are more involved and cannot be solved directly using LLMs, since code within a repository is inter-dependent and the entire repository may be too large to fit into the prompt. We frame repository-level coding as a planning problem and present a task-agnostic, neuro-symbolic framework called CodePlan to solve it. CodePlan synthesizes a multi-step chain-of-edits (plan), where each step results in a c...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Programming languages and software engineering","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4400573519","title":"MotionCtrl: A Unified and Flexible Motion Controller for Video Generation","url":"https://doi.org/10.1145/3641519.3657518","published":"2024-07-12","authors":["Zhouxia Wang","Ziyang Yuan","Xintao Wang","Yaowei Li","Tianshui Chen","Menghan Xia","Ping Luo","Ying Shan"],"abstract":"Motions in a video primarily consist of camera motion, 
induced by camera movement, and object motion, resulting from object movement. Accurate control of both camera and object motion is essential for video generation. However, existing works either mainly focus on one type of motion or do not clearly distinguish between the two, limiting their control capabilities and diversity. Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion. The architecture and training strategy of MotionCtrl are carefully devised, taking into account the inherent properties of camera motion, object motion, and imperfect training data. Compared to previous methods, MotionCtrl offers three main advantages: 1) It effectively and independently controls camera motion and object motion, enabl...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3641519.3657518","openalex_id":"https://openalex.org/W4400573519","cited_by_count":76,"quality_score":67,"matched_keywords":[],"author_affiliations":["Guangdong University of Technology","Nanyang Technological University","Peking University","Tencent (China)","Tsinghua University","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7076044082641602},{"id":"https://openalex.org/C203479927","display_name":"Controller (irrigation)","score":0.49976110458374023},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.482793927192688},{"id":"https://openalex.org/C145565327","display_name":"Motion control","score":0.45808306336402893},{"id":"https://openalex.org/C128840427","display_name":"Motion compensation","score":0.42526838183403015},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.38648778200149536},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32543492317199707},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.14610806107521057}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":76}},{"id":"apple:bqwczzm2oecc51xh0s60wxba","title":"Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation","url":"https://machinelearning.apple.com/research/superposition-prompting","published":"2024-07-12","authors":["Thomas Merth","Qichen Fu","Mohammad Rastegari","Mahyar Najibikohnehshahri"],"abstract":"Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with respect to sequence length, making it expensive for deployment in some real-world text processing applications, such as retrieval-augmented generation (RAG). 
Additionally, LLMs also exhibit the \"distraction phenomenon,\" where irrelevant context in the prompt degrades...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["retrieval"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autoregressive-speech-synthesis-without-vector-quantization","title":"Autoregressive Speech Synthesis without Vector Quantization","url":"https://www.microsoft.com/en-us/research/publication/autoregressive-speech-synthesis-without-vector-quantization/","published":"2024-07-11","authors":["Lingwei Meng","Long Zhou","Shujie Liu","Sanyuan Chen","Bing Han","Shujie Hu","Yanqing Liu","Jinyu Li","Sheng Zhao","Xixin Wu","Helen Meng","Furu Wei"],"abstract":"We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which are originally designed for audio compression and sacrifice fidelity compared to mel-spectrograms. Specifically, (i) instead of cross-entropy loss, we apply regression loss with a proposed spectrogram flux loss function to model the probability distribution of the continuous-valued tokens. (ii) we have incorporated variational inference into MELLE to facilitate sampling mechanisms, thereby enhancing the output diversity and model robustness. 
Experiments demonstrate that, compared to the two-stage codec language models VALL-E and its variants, the single-stage MELLE mitigates robustness issues by avoiding the inherent flaws of...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Audio and Acoustics","Audio and Speech Processing","Computation and Language","sound","compression","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/accuracy-is-not-all-you-need","title":"Accuracy is Not All You Need","url":"https://www.microsoft.com/en-us/research/publication/accuracy-is-not-all-you-need/","published":"2024-07-11","authors":["Abhinav Dutta","Sanjeev Krishnan","Nipun Kwatra","Ramachandran Ramjee"],"abstract":"When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks.If the accuracies of the baseline model and the compressed model are close, it is assumed that there was negligible degradation in quality.However, even when the accuracy of baseline and compressed model are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in proportion.We conduct a detailed study of metrics across multiple compression techniques, models and datasets, demonstrating that the behavior of compressed models as visible to end-users is often significantly 
different from the baseline model, even when accuracy is similar.We further evaluate compressed models qualitatively and quantitatively using MT-Bench and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","compression","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4413207097","title":"Securing AI-Agentic Interactions via Multi-Agent Reinforcement Learning (MARL) with Secure Communication Protocols","url":"https://doi.org/10.60087/jaigs.v4i1.397","published":"2024-07-11","authors":["Mohan Vamsi Musunuru","Subba Rao"],"abstract":"The rapid deployment of AI-driven autonomous agents in critical applications has increased the importance of secure and reliable inter-agent communication. This study presents a novel framework that integrates Multi-Agent Reinforcement Learning (MARL) with secure communication protocols to enhance trust, privacy, and resilience in AI-agentic interactions. The proposed approach leverages decentralized training and policy sharing, while employing cryptographic techniques—such as end-to-end encryption and key exchange mechanisms—to safeguard information flow among agents. Experimental results in simulated environments demonstrate that our method not only maintains competitive task performance but also significantly reduces vulnerabilities to eavesdropping, message tampering, and adversarial manipulation. 
By aligning MARL coordination strategies with robust security mechanisms, this research...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.60087/jaigs.v4i1.397","openalex_id":"https://openalex.org/W4413207097","cited_by_count":1,"quality_score":46,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7487231492996216},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7451750040054321},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5316168665885925},{"id":"https://openalex.org/C2776788033","display_name":"Eavesdropping","score":0.5145737528800964},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5065523386001587},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.4965680241584778},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.4654465615749359},{"id":"https://openalex.org/C13687954","display_name":"Autonomous agent","score":0.46283650398254395}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"bytedance-seed:90","title":"Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition","url":"https://seed.bytedance.com/en/research/seed-asr-understanding-diverse-speech-and-contexts-with-llm-based-speech-recognition","published":"2024-07-10","authors":["Ye Bai","Jingping Chen","Jitong Chen","Wei Chen","Zhuo Chen","Chuang Ding","Linhao Dong","Qianqian Dong","Yujiao Du","Kepan Gao","Lu Gao","Yi Guo"],"abstract":"Modern automatic speech recognition (ASR) model is required to 
accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets,...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Speech&Audio","Speech","arXiv","LLM","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"bytedance-seed:221","title":"IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model","url":"https://seed.bytedance.com/en/research/ida-vlm-towards-movie-understanding-via-id-aware-large-vision-language-model","published":"2024-07-10","authors":["Yatai Ji","Shilong Zhang","Jie Wu","Peize Sun","Weifeng Chen","Xuefeng Xiao","Sidi Yang","Yujiu Yang","Ping Luo"],"abstract":"The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. 
Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across different scenes has not yet been explored, which is essential for understanding complex visual content, such as movies with multiple characters and intricate plots. Towards movie understanding, a critical initial step for LVLMs is to unleash the potential of character identities memory and recognition across multiple visual scenarios. To achieve the goal, we propose visual instruction tuning with ID reference and develop an ID-Aware Large Vision-Language Model, IDA-VLM. Furthermore, our research introduces a novel benchmark MM-ID, to examine LVLMs on instance IDs memory and recognition across four dimensions: matching, location, question-ans...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Computer Vision","Vision","ICLR 2025","language model","memory"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4400524696","title":"GraphGPT: Graph Instruction Tuning for Large Language Models","url":"https://doi.org/10.1145/3626772.3657775","published":"2024-07-10","authors":["Jiabin Tang","Yuhao Yang","Wei Wei","Lei Shi","Lixin Su","Suqi Cheng","Dawei Yin","Chao Huang"],"abstract":"Graph Neural Networks (GNNs) have evolved to understand graph structures through recursive exchanges and aggregations among nodes. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation. 
Traditional methods often depend on fine-tuning with task-specific labels, limiting their effectiveness when labeled data is scarce. Our research tackles this by advancing graph model generalization in zero-shot learning environments. Inspired by the success of large language models (LLMs), we aim to create a graph-oriented LLM capable of exceptional generalization across various datasets and tasks without relying on downstream graph data. We introduce the GraphGPT framework, which integrates LLMs with graph structural knowledge through graph instruction tuning. This framework includes a text-graph grounding component to link textual and graph structures and a...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3626772.3657775","openalex_id":"https://openalex.org/W4400524696","cited_by_count":132,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8175190091133118},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.510992169380188},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4675264358520508},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3357095718383789},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.27268266677856445}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":132}},{"id":"openalex:W4400484796","title":"Automated Unit Test Improvement using Large Language Models at Meta","url":"https://doi.org/10.1145/3663529.3663839","published":"2024-07-10","authors":["Nadia Alshahwan","Jubin 
Chheda","Anastasia Finogenova","Beliz Gokkaya","Mark Harman","Inna Harper","Alexandru Marginean","Shubho Sengupta","E. Wang"],"abstract":"This paper describes Meta’s TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Instagram and Facebook platforms. In an evaluation on Reels and Stories products for Instagram, 75% of TestGen-LLM’s test cases built correctly, 57% passed reliably, and 25% increased coverage. During Meta’s Instagram and Facebook test-a-thons, it improved 11.5% of all classes to which it was applied, with 73% of its recommendations being accepted for production deployment by Meta software engineers. We believe this is the first report on industrial scale deployment of LLM-generated code backed by....","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3663529.3663839","openalex_id":"https://openalex.org/W4400484796","cited_by_count":85,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Meta (United Kingdom)","Meta (United States)","University College London"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6872904300689697},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5791163444519043},{"id":"https://openalex.org/C148027188","display_name":"Unit testing","score":0.5239037275314331},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3337504267692566},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.2813045382499695},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.08754709362983704},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.08700257539749146},{"id":"https://openalex.org/C151730666","display_name":"Paleontology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":85}},{"id":"openalex:W4400530529","title":"A Field Guide to Automatic Evaluation of LLM-Generated Summaries","url":"https://doi.org/10.1145/3626772.3661346","published":"2024-07-10","authors":["Tempest A van Schaik","B. M. Pugh"],"abstract":"Large Language models (LLMs) are rapidly being adopted for tasks such as text summarization, in a wide range of industries. This has driven the need for scalable, automatic, reliable, and cost-effective methods to evaluate the quality of LLM-generated text. What is meant by evaluating an LLM is not yet well defined and there are widely different expectations about what kind of information evaluation will produce. Evaluation methods that were developed for traditional Natural Language Processing (NLP) tasks (before the rise of LLMs) remain applicable but are not sufficient for capturing high-level semantic qualities of summaries. Emerging evaluation methods that use LLMs to evaluate LLM-output, appear to be powerful but lacking in reliability. New elements of LLM generated text that were not an element of previous NLP tasks, such as the artifacts of hallucination, need to be considered. 
W...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3626772.3661346","openalex_id":"https://openalex.org/W4400530529","cited_by_count":27,"quality_score":68,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6816741824150085},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.6124576926231384},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.45397356152534485},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.09195095300674438},{"id":"https://openalex.org/C202444582","display_name":"Pure mathematics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":27}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nutime-numerically-multi-scaled-embedding-for-large-scale-time-series-pretraining","title":"NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining","url":"https://www.microsoft.com/en-us/research/publication/nutime-numerically-multi-scaled-embedding-for-large-scale-time-series-pretraining/","published":"2024-07-10","authors":["Chenguo Lin","Xumeng Wen","Wei Cao","Congrui Huang","Jiang Bian","Stephen Lin","Zhirong Wu"],"abstract":"Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. 
In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences. We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within each window. To embed scalar values that may possess arbitrary numerical amplitudes in a high-dimensional space, we propose a numerically multi-scaled embedding module enumerating all possible numerical scales for the scalars. The model undergoes pretraining with a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:109","title":"LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models","url":"https://seed.bytedance.com/en/research/llava-next-interleave-tackling-multi-image-video-and-3d-in-large-multimodal-models","published":"2024-07-10","authors":["Feng Li","Renrui Zhang","Hao Zhang","Yuanhan Zhang","Bo Li","Wei Li","Zejun Ma","Chunyuan Li"],"abstract":"Visual instruction tuning has made considerable strides in enhancing the capabilities of Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single-image tasks, their applications to multi-image scenarios remains less explored. 
Additionally, prior LMM research separately tackles different scenarios, leaving it impossible to generalize cross scenarios with new emerging capabilities. To this end, we introduce LLaVA-NeXT-Interleave, which simultaneously tackles Multi-image, Multi-frame (video), Multi-view (3D), and Multi-patch (single-image) scenarios in LMMs. To enable these capabilities, we regard the interleaved data format as a general template and compile the M4-Instruct dataset with 1,177.6k samples, spanning 4 primary domains with 14 tasks and 41 datasets. We also curate the LLaVA-Interleave Bench to comprehensively evaluate the multi-image performance of LMM...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Multimodal","ICLR 2025 Spotlight"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4400528710","title":"Multimodal Representation and Retrieval [MRR 2024]","url":"https://doi.org/10.1145/3626772.3657987","published":"2024-07-10","authors":["Xinliang Zhu","Arnab Dhua","Douglas Gray","İsmet Zeki Yalnız","Tan Yu","Mohamed Elhoseiny","Bryan A. Plummer"],"abstract":"Multimodal data is available in many applications like e-commerce production listings, social media posts and short videos. However, existing algorithms dealing with those types of data still focus on uni-modal representation learning by vision-language alignment and cross-modal retrieval. In this workshop, we target to bring a new retrieval problem where both queries and documents are multimodal. 
With the popularity of vision language modeling, large language models (LLMs), retrieval augmented generation (RAG), and multimodal LLM, we see a lot of new opportunities for multimodal representation and retrieval tasks. This event will be a comprehensive half-day workshop focusing on the subject of multimodal representation and retrieval. The agenda includes keynote speeches, oral presentations, and an interactive panel discussion.","companies":["Amazon","NVIDIA"],"matched_orgs":["Amazon","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3626772.3657987","openalex_id":"https://openalex.org/W4400528710","cited_by_count":2,"quality_score":63,"matched_keywords":["LLM","retrieval","media"],"author_affiliations":["Amazon (United States)","King Abdullah University of Science and Technology","Menlo School","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6503955125808716},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5425385236740112},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4661029577255249},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4108927249908447},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.33407795429229736},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.0},{"id":"https://openalex.org/C94625758","display_name":"Politics","score":0.0},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4400524654","title":"CorpusLM: Towards a Unified Language Model on Corpus 
for Knowledge-Intensive Tasks","url":"https://doi.org/10.1145/3626772.3657778","published":"2024-07-10","authors":["Xiaoxi Li","Zhicheng Dou","Yujia Zhou","Fangchao Liu"],"abstract":"Large language models (LLMs) have gained significant attention in various fields but prone to hallucination, especially in knowledge-intensive (KI) tasks. To address this, retrieval-augmented generation (RAG) has emerged as a popular solution to enhance factual accuracy. However, traditional retrieval modules often rely on large document index and disconnect with generative tasks. With the advent of generative retrieval (GR), language models can retrieve by directly generating document identifiers (DocIDs), offering superior performance in retrieval tasks. However, the potential relationship between GR and downstream tasks remains unexplored. In this paper, we propose CorpusLM, a unified language model that leverages external corpus to tackle various knowledge-intensive tasks by integrating generative retrieval, closed-book generation, and RAG through a unified greedy decoding process. 
W...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3626772.3657778","openalex_id":"https://openalex.org/W4400524654","cited_by_count":13,"quality_score":62,"matched_keywords":["language model","retrieval","efficient"],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.813626766204834},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5624891519546509},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4461214244365692},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4426132142543793},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.32186704874038696}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"apple:n3l7f168jfk17lbtuor0falx","title":"Accurate Knowledge Distillation via N-best Reranking","url":"https://machinelearning.apple.com/research/accurate-knowledge-distillation","published":"2024-07-10","authors":["Hendra Setiawan"],"abstract":"We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016) where we extract pseudo-labels for student model’s training data from top n-best hypotheses and leverage a diverse set of models with different inductive biases, objective functions or architectures, including some publicly-available large language models, to pick the highest-quality hypotheses as labels. 
The effectiveness of our proposal...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4400524512","title":"Unsupervised Large Language Model Alignment for Information Retrieval via Contrastive Feedback","url":"https://doi.org/10.1145/3626772.3657689","published":"2024-07-10","authors":["Qian Dong","Yiding Liu","Qingyao Ai","Zhijing Wu","Haitao Li","Yiqun Liu","Shuaiqiang Wang","Dawei Yin","Shaoping Ma"],"abstract":"Large language models (LLMs) have demonstrated remarkable capabilities across various research domains, including the field of Information Retrieval (IR). However, the responses generated by off-the-shelf LLMs tend to be generic, i.e., cannot capture the distinctiveness of each document with similar content. This limits the performance of LLMs in IR because finding and distinguishing relevant documents from substantial similar documents is a typical problem in many IR tasks. To address this issue, we propose an unsupervised alignment method, namely Reinforcement Learning from Contrastive Feedback (RLCF), empowering LLMs to generate both high-quality and context-specific responses. Our approach constructs unsupervised contrastive feedback signals based on similar document groups, and adopts a reward function, named group-wise reciprocal rank, to optimize LLMs. 
We conduct extensive experim...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3626772.3657689","openalex_id":"https://openalex.org/W4400524512","cited_by_count":7,"quality_score":52,"matched_keywords":["language model","retrieval"],"author_affiliations":["Baidu (China)","Beijing Institute of Technology","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8263869285583496},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5936888456344604},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5922753810882568},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4404805302619934},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.40406256914138794}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"apple:tuas1ka4k1ubs4xwgw3u154x","title":"TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles","url":"https://machinelearning.apple.com/research/toad","published":"2024-07-10","authors":["Yinhong Liu","Yimai Fang","David Vandyke","Nigel Collier"],"abstract":"In light of recent advances in large language models (LLMs), the expectations for the next generation of virtual assistants include enhanced naturalness and adaptability across diverse usage scenarios. However, the creation of high-quality annotated data for Task-Oriented Dialog (TOD) is recognized to be slow and costly. 
To address these challenges, we introduce Task-Oriented Automatic Dialogs (TOAD), a novel and scalable TOD dataset along with...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4400529471","title":"Enhancing Baidu Multimodal Advertisement with Chinese Text-to-Image Generation via Bilingual Alignment and Caption Synthesis","url":"https://doi.org/10.1145/3626772.3661350","published":"2024-07-10","authors":["Kang Zhao","Xinyu Zhao","Zhipeng Jin","Yi Yang","Wen Tao","Cong Han","Shuanglong Li","Lin Liu"],"abstract":"Recent advances in generative artificial intelligence have revolutionized information retrieval and content generation, opening up new opportunities for the e-commerce industry. In particular, text-to-image generation models offer a novel approach to guiding the image generation process using natural language input, which is inspiring for multimodal search advertising. Traditional multimodal search ads require advertisers to prepare ad creatives, such as ad images, which is time-consuming and requires uniform image specifications and content quality inspection. To this end, we propose a streamlined generation framework for search ad image creatives. First, we prepare a Chinese image caption model with high-quality image-caption pairs to bootstrap training data refinement. 
With curated high-quality images and synthesized descriptive captions, we then train a Chinese text-to-image generati...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3626772.3661350","openalex_id":"https://openalex.org/W4400529471","cited_by_count":5,"quality_score":46,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7338413000106812},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5247645378112793},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4520558714866638},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4316824674606323},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4217183291912079},{"id":"https://openalex.org/C112698675","display_name":"Advertising","score":0.338601291179657},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.07151120901107788}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4402263605","title":"Verification-Aided Learning of Neural Network Barrier Functions with Termination Guarantees","url":"https://doi.org/10.23919/acc60939.2024.10645043","published":"2024-07-10","authors":["Shaoru Chen","Lekan Molu","Mahyar Fazlyab"],"abstract":"Barrier functions are a general framework for establishing a safety guarantee for a system. However, there is no general method for finding these functions. 
To address this shortcoming, recent approaches use self-supervised learning techniques to learn these functions using training data that are periodically generated by a verification procedure, leading to a verification-aided learning framework. Despite its immense potential in automating barrier function synthesis, the verification-aided learning framework does not have termination guarantees and may suffer from a low success rate of finding a valid barrier function in practice. In this paper, we propose a holistic approach to address these drawbacks. With a convex formulation of the barrier function synthesis, we propose to first learn an empirically well-behaved neural network basis function and then apply a fine-tuning algorithm t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.23919/acc60939.2024.10645043","openalex_id":"https://openalex.org/W4402263605","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Johns Hopkins University","Microsoft (United States)","Microsoft Research New York City (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.755748450756073},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.6431148052215576},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37876737117767334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4400525222","title":"Short Video Ordering via Position Decoding and Successor Prediction","url":"https://doi.org/10.1145/3626772.3657795","published":"2024-07-10","authors":["Shiping Ge","Qiang Chen","Zhiwei Jiang","Yafeng Yin","Ziyao Chen","Qing 
Gu"],"abstract":"Short video collection is an easy way for users to consume coherent content on various online short video platforms, such as TikTok, YouTube, Douyin, and WeChat Channel. These collections cover a wide range of content, including online courses, TV series, movies, and cartoons. However, short video creators occasionally publish videos in a disorganized manner due to various reasons, such as revisions, secondary creations, deletions, and reissues, which often result in a poor browsing experience for users. Therefore, accurately reordering videos within a collection based on their content coherence is a vital task that can enhance user experience and presents an intriguing research problem in the field of video narrative reasoning. In this work, we curate a dedicated multimodal dataset for this Short Video Ordering (SVO) task and present the performance of some benchmark methods on the data...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3626772.3657795","openalex_id":"https://openalex.org/W4400525222","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Nanjing University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C75306776","display_name":"Successor cardinal","score":0.802837610244751},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.7177481651306152},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6807436943054199},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.6769959926605225},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42547935247421265},{"id":"https://openalex.org/C78780964","display_name":"Position 
paper","score":0.41344499588012695},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.36540117859840393},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.31703007221221924}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4401853180","title":"A Model and Query Language for Multi-modal Hybrid Query","url":"https://doi.org/10.1145/3676288.3676291","published":"2024-07-10","authors":["Chuan Hu","Zihao Zhao","Along Mao","Zhihong Shen"],"abstract":"As data grows exponentially, its diversity also increases, including both structured forms and unstructured forms like audio, images, and videos. Advances in AI have improved our ability to analyze unstructured data, leading to the use of multimodal hybrid queries that blend structured and unstructured data. However, database systems struggle due to the lack of adequate data models for multimodal data and languages for these hybrid queries. This paper extends the property graph model to represent multimodal data and their semantic information, introducing essential functions for hybrid graph queries. 
A high-level graph query language, CypherPlus, is presented, capable of expressing hybrid queries like “Give me the friends of the friends of Mary, who have blond hair and are younger than 30 years old.” A Neo4j-based implementation and experiments over synthetic and real-world datasets demo...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3676288.3676291","openalex_id":"https://openalex.org/W4401853180","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Computer Network Information Center","Huawei Technologies (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C96956885","display_name":"RDF query language","score":0.8675514459609985},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.823114275932312},{"id":"https://openalex.org/C157692150","display_name":"Query optimization","score":0.7808632850646973},{"id":"https://openalex.org/C192028432","display_name":"Query language","score":0.7708617448806763},{"id":"https://openalex.org/C192939062","display_name":"Sargable","score":0.7209348082542419},{"id":"https://openalex.org/C99016210","display_name":"Query expansion","score":0.7028946280479431},{"id":"https://openalex.org/C164120249","display_name":"Web search query","score":0.6345899701118469},{"id":"https://openalex.org/C117667704","display_name":"Object Query Language","score":0.6019907593727112}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2403.11202","title":"Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation 
framework","url":"https://huggingface.co/papers/2403.11202","published":"2024-07-10","authors":["Kaiyan Chang","Kun Wang","Nan Yang","Ying Wang","Dantong Jin","Wenlong Zhu","Zhirong Chen","Cangyuan Li","Hao Yan","Yunhao Zhou","Zhuoliang Zhao","Yuan Cheng"],"abstract":"Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of Chip Design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by LLMs. Additionally, the absence of a Verilog and Electronic Design Automation (EDA) script data augmentation framework significantly increases the time required to prepare the training dataset for LLM trainers. This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts. 
For Verilog generation, it translates Verilog files to an abstract syntax tree and then maps nodes to natural language wi...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/monitorassistant-simplifying-cloud-service-monitoring-via-large-language-models","title":"MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/monitorassistant-simplifying-cloud-service-monitoring-via-large-language-models/","published":"2024-07-09","authors":["Zhaoyang Yu","Minghua Ma","Chaoyun Zhang","Si Qin","Yu Kang","Chetan Bansal","Saravan Rajmohan","Yingnong Dang","Changhua Pei","Dan Pei","Qingwei Lin 林庆维","Dongmei Zhang"],"abstract":"In large-scale cloud service systems, monitoring metric data and conducting anomaly detection is an important way to maintain reliability and stability. However, great disparity exists between academic approaches and industrial practice to anomaly detection. Industry predominantly uses simple, efficient methods due to better interpretability and ease of implementation. In contrast, academically favor deep-learning methods, despite their advanced capabilities, face practical challenges in real-world applications. To address these challenges, this paper introduces MonitorAssistant, an end-to-end practical anomaly detection system via Large Language Models. 
MonitorAssistant automates model configuration recommendation achieving knowledge inheritance and alarm interpretation with guidance-oriented anomaly reports, facilitating a more intuitive engineer-system interaction through natural lang...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4400451984","title":"Predicting blood–brain barrier permeability of molecules with a large language model and machine learning","url":"https://doi.org/10.1038/s41598-024-66897-y","published":"2024-07-09","authors":["Eddie Huang","Jai‐Sing Yang","Ken Ying-Kai Liao","Warren C. W. Tseng","Chien-Yu Lee","Michelle Gill","Colin B. Compas","Simon See","Fuu‐Jen Tsai"],"abstract":"Predicting the blood-brain barrier (BBB) permeability of small-molecule compounds using a novel artificial intelligence platform is necessary for drug discovery. Machine learning and a large language model on artificial intelligence (AI) tools improve the accuracy and shorten the time for new drug development. The primary goal of this research is to develop artificial intelligence (AI) computing models and novel deep learning architectures capable of predicting whether molecules can permeate the human blood-brain barrier (BBB). The in silico (computational) and in vitro (experimental) results were validated by the Natural Products Research Laboratories (NPRL) at China Medical University Hospital (CMUH). 
The transformer-based MegaMolBART was used as the simplified molecular input line entry system (SMILES) encoder with an XGBoost classifier as an in silico method to check if a molecule co...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41598-024-66897-y","openalex_id":"https://openalex.org/W4400451984","cited_by_count":49,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["China Medical University","China Medical University Hospital","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.669058084487915},{"id":"https://openalex.org/C2775905019","display_name":"In silico","score":0.6629021167755127},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5922375917434692},{"id":"https://openalex.org/C2778402981","display_name":"Blood–brain barrier","score":0.5860844850540161},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.526630699634552},{"id":"https://openalex.org/C169258074","display_name":"Random forest","score":0.4489353597164154},{"id":"https://openalex.org/C74187038","display_name":"Drug discovery","score":0.4305747151374817},{"id":"https://openalex.org/C169903167","display_name":"Test set","score":0.4302378296852112}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":49}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/etalon-holistic-performance-evaluation-framework-for-llm-inference-systems","title":"Etalon: Holistic Performance Evaluation Framework for LLM Inference 
Systems","url":"https://www.microsoft.com/en-us/research/publication/etalon-holistic-performance-evaluation-framework-for-llm-inference-systems/","published":"2024-07-09","authors":["Amey Agrawal","Anmol Agarwal","Nitin Kedia","Jayashree Mohan","Souvik Kundu","Nipun Kwatra","Ramachandran Ramjee","Alexey Tumanov"],"abstract":"Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems are evaluated against conventional latency and throughput metrics (eg. TTFT, TBT, Normalised Latency and TPOT). However, these metrics fail to fully capture the nuances of LLM inference, leading to an incomplete assessment of user-facing performance crucial for real-time applications such as chat and translation. In this paper, we first identify the pitfalls of current performance metrics in evaluating LLM inference systems. We then propose Etalon, a comprehensive performance evaluation framework that includes fluidity-index -- a novel metric designed to reflect the intricacies of the LLM inference process and its impact on real-time user experience. 
Finally, we evaluate various existing open-source platforms and model-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Systems and networking","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4400409509","title":"TF²: Few-Shot Text-Free Training-Free Defect Image Generation for Industrial Anomaly Inspection","url":"https://doi.org/10.1109/tcsvt.2024.3424435","published":"2024-07-08","authors":["Qianzi Yu","Kai Zhu","Yang Cao","Feijie Xia","Yu Kang"],"abstract":"Anomaly inspection aims at identifying various defects in real time on modern industrial production lines. However, due to insufficient anomaly data, existing detectors cannot effectively accomplish the classification of defects, thereby failing to provide guidance for subsequent production. To address it, we propose TF2, a few-shot text-free training-free defect image generation method, which jointly models the image distribution of class-agnostic defects and backgrounds, achieving efficient semantic enhancement. Firstly, we propose the Response Alignment Strategy, which merges the reversed latent space of both defect-free and defective samples, generating new defect images not limited to textual descriptions yet with consistent content. 
Moreover, we introduce the Defect Moving Strategy and the Regional Average Loss to merge the reversed latent space of the moving areas and enhance the....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2024.3424435","openalex_id":"https://openalex.org/W4400409509","cited_by_count":17,"quality_score":58,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48898303508758545},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.46954450011253357},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.4298170804977417},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.4182608425617218},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.39639145135879517},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.2245543897151947},{"id":"https://openalex.org/C191897082","display_name":"Metallurgy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":17}},{"id":"openalex:W4403420890","title":"LogRAG: Semi-Supervised Log-based Anomaly Detection with Retrieval-Augmented Generation","url":"https://doi.org/10.1109/icws62655.2024.00129","published":"2024-07-07","authors":["Wanhao Zhang","Qianli Zhang","Enyu Yu","Yuxiang Ren","Yeqing Meng","Mingxi Qiu","Jilong Wang"],"abstract":"Log-based anomaly detection is critical in monitoring the operation of microservice systems and in the realtime reporting of system failures. 
Utilizing deep learning-based log anomaly detection methods facilitates effective detection of anomalies within logs. However, existing methods are greatly dependent on log parsers, and parsing errors can considerably affect downstream anomaly detection tasks. Additionally, methods that predict the next log event in a sequence are susceptible to the instability of sequences and the emergence of unseen logs as systems evolve, resulting in a higher false positive rate. In this paper, we propose a semi-supervised log anomaly detection framework based on retrieval-augmented generation (RAG). This framework conducts phased detection using both Log Tokens and Log Templates to mitigate the impact of log parsing errors. It also utilizes a single-class clas...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icws62655.2024.00129","openalex_id":"https://openalex.org/W4403420890","cited_by_count":8,"quality_score":57,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Huawei Technologies (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6455749869346619},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.5667755007743835},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46044763922691345},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.35362309217453003}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4401693274","title":"Predicting Uncertainty of Generative LLMs with MARS: Meaning-Aware Response 
Scoring","url":"http://dx.doi.org/10.1109/isit57864.2024.10619136","published":"2024-07-07","authors":["Yavuz Faruk Bakman","Duygu Nur Yaldiz","Baturalp Buyukates","Salman Avestimehr","Chenyang Tao","Dimitrios Dimitriadis"],"abstract":"Generative Large Language Models (LLMs) have recently been widely utilized for their unprecedented capabil-ities across many tasks. Considering their use in high-stakes environments and for mission-critical applications, the fact that LLMs often can generate inaccurate or misleading results can be potentially harmful, which motivates us to study the correctness of generative LLM outputs. Uncertainty Estimation (UE) in generative LLMs is a developing area, with state-of-the-art probability-based techniques frequently using length-normalized scoring. As an alternative to length-normalized scoring in UE, in this work, we propose Meaning-Aware Response Scoring (MARS). The key idea of MARS is to consider the semantic contribution of each token of the generated sequence to the context of the question during UE. 
Through extensive experiments on three question-answering datasets across five pret...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/isit57864.2024.10619136","openalex_id":"https://openalex.org/W4401693274","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Amazon (United States)","Southern California University for Professional Studies","University of Southern California"],"concepts":[{"id":"https://openalex.org/C2780876879","display_name":"Meaning (existential)","score":0.7197723388671875},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6910104751586914},{"id":"https://openalex.org/C83260615","display_name":"Mars Exploration Program","score":0.5612366199493408},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4932183623313904},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42252999544143677},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3409494161605835},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2739454507827759},{"id":"https://openalex.org/C87355193","display_name":"Astrobiology","score":0.18287113308906555}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4402260259","title":"Estimation of Downed Woody Time-Lag Fuel Loadings with Multimodal Remote Sensing Data and Ensemble Machine Learning Regression Model","url":"http://dx.doi.org/10.1109/igarss53475.2024.10641641","published":"2024-07-07","authors":["Riyaaz Uddien Shaik","Mohamad Alipour","Eric Rowell","Bharathan Balaji","Adam C. 
Watts","Ertuǧrul Taciroğlu"],"abstract":"Accurate fuel condition assessment is crucial for predicting fire behavior, enhancing operational decision support, and improving overall fire management. Our approach utilizes diverse data sources, such as Landsat-8 optical imagery, Sentinel-1 (C-band) SAR imagery, PALSAR (L-band) SAR imagery, and terrain features, to estimate time-lag fuel loadings (1 hour, 10 hours, and 100 hours). Optical data mainly captures the characteristics of leaf and forest canopy, while SAR data is more sensitive to forest vertical structures due to its strong penetrability. An ensemble model was trained on the Forest Inventory and Analysis (FIA) plots and spectral indices. Followed by, feature importance analysis and the inclusion of polynomial features were undertaken. The ensemble strategy, involving neural networks, decision trees, gradient boosting, and ensemble methods, achieved R<sup xmlns:mml=\"http://...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/igarss53475.2024.10641641","openalex_id":"https://openalex.org/W4402260259","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Desert Research Institute","US Forest Service","University of California, Los Angeles","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C75778745","display_name":"Lag","score":0.7361088991165161},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6067144274711609},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.5425804257392883},{"id":"https://openalex.org/C152877465","display_name":"Regression analysis","score":0.5387319922447205},{"id":"https://openalex.org/C2993377847","display_name":"Lag 
time","score":0.5349534749984741},{"id":"https://openalex.org/C83546350","display_name":"Regression","score":0.48415786027908325},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4755798876285553},{"id":"https://openalex.org/C169258074","display_name":"Random forest","score":0.4547211527824402}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4403125288","title":"Cloud Gaming Video Coding Algorithm Assisted by Pixel-Level Motion Vector Prediction Utilizing Camera Information","url":"https://doi.org/10.1109/ucom62433.2024.10695846","published":"2024-07-05","authors":["Haonan Sun","Yuan Chen","Yifan Wang","Fuzheng Yang","Kun Yang","Gaoxing Chen"],"abstract":"The rapid development of cloud gaming puts tremendous pressure on network bandwidth, highlighting the crucial role of video coding technology. Considering substantial disparities in video generation between natural recordings and those from video games, with game videos often characterized by frequent and rapid camera rotations, directly applying conventional video compression technologies to gaming videos is inadequate. Therefore, we propose a novel cloud gaming video coding algorithm assisted by pixel-level motion vector prediction. By meticulously analyzing the coordinate mapping process within the game rendering pipeline, we present an efficient pixel-level motion vector prediction method leveraging camera information, which is integrated in the H.266VVC standard test model (VTM) as an additional inter-frame mode to compete with existing modes for rate-distortion optimization. 
Experi...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ucom62433.2024.10695846","openalex_id":"https://openalex.org/W4403125288","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","compression"],"author_affiliations":["Alibaba Group (China)","Xidian University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8028329610824585},{"id":"https://openalex.org/C2779020251","display_name":"Motion vector","score":0.6659908294677734},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6198868751525879},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6146547198295593},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.6010133028030396},{"id":"https://openalex.org/C160633673","display_name":"Pixel","score":0.5402019619941711},{"id":"https://openalex.org/C174493125","display_name":"Quarter-pixel motion","score":0.468948096036911},{"id":"https://openalex.org/C10161872","display_name":"Motion estimation","score":0.45659077167510986}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4400316340","title":"Language-aware multiple datasets detection pretraining for DETRs","url":"https://doi.org/10.1016/j.neunet.2024.106506","published":"2024-07-04","authors":["Jing Hao","Song Chen"],"abstract":"Pretraining on large-scale datasets can boost the performance of object detectors while the annotated datasets for object detection are hard to scale up due to the high labor cost. 
What we possess are numerous isolated filed-specific datasets, thus, it is appealing to jointly pretrain models across aggregation of datasets to enhance data volume and diversity. In this paper, we propose a strong framework for utilizing Multiple datasets to pretrain DETR-like detectors, termed METR, without the need for manual label spaces integration. It converts the typical multi-classification in object detection into binary classification by introducing a pre-trained language model. Specifically, we design a category extraction module for extracting potential categories involved in an image and assign these categories into different queries by language embeddings. Each query is only responsible for pred...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.neunet.2024.106506","openalex_id":"https://openalex.org/W4400316340","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8641266822814941},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6381574869155884},{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.6016223430633545},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5780746340751648},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5428826808929443},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.4834914803504944},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.45878344774246216},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.45103439688682556}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4400321017","title":"Multi-modal conditioning for metal-organic frameworks generation using 3D modeling techniques","url":"https://doi.org/10.26434/chemrxiv-2024-w8fps","published":"2024-07-04","authors":["Junkil Park","Youhan Lee","Jihan Kim"],"abstract":"The design of porous materials with user-desired properties has been a great interest for the last few decades. However, the flexibility of target properties has been highly limited, and targeting multiple properties of diverse modalities simultaneously has been scarcely explored. Furthermore, although deep generative models have opened a new paradigm in materials generation, their incorporation into porous materials such as metal-organic frameworks (MOFs) has not been satisfactory due to their structural complexity. In this work, we introduce MOFFUSION, a latent diffusion model that addresses the aforementioned challenges. Signed distance functions (SDFs) were employed for the input representation of MOFs, marking their first usage in representing porous materials for generative models. 
Using the suitability of SDFs in describing complicated pore structures, MOFFUSION exhibited exceptio...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.26434/chemrxiv-2024-w8fps","openalex_id":"https://openalex.org/W4400321017","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Korea Advanced Institute of Science and Technology","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6211094260215759},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5794224739074707},{"id":"https://openalex.org/C5274069","display_name":"Categorical variable","score":0.5639376044273376},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5631058216094971},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.553218424320221},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5248863101005554},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.46972906589508057},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.461273193359375}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"arxiv:2407.03648","title":"High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching","url":"https://huggingface.co/papers/2407.03648","published":"2024-07-04","authors":["Gael Le Lan","Bowen Shi","Zhaoheng Ni","Sidd Srinivasan","Anurag Kumar","Brian Ellis","David Kant","Varun Nagaraja","Ernie Chang","Wei-Ning Hsu","Yangyang Shi","Vikas Chandra"],"abstract":"We introduce a simple and efficient text-controllable 
high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates the information loss drawback of discrete representations. Based on a diffusion transformer architecture trained on a flow-matching objective the model can generate and edit diverse high quality stereo samples of variable duration, with simple text descriptions. We also explore a new regularized latent inversion method for zero-shot test-time text-guided editing and demonstrate its superior performance over naive denoising diffusion implicit model (DDIM) inversion for variety of music editing prompts. Evaluations are conducted on both objective and subjective metrics and demonstrate that the proposed model is not only competitive to the evalua...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2407.04051","title":"FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs","url":"https://huggingface.co/papers/2407.04051","published":"2024-07-04","authors":["Tongyi SpeechTeam"],"abstract":"This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity. 
SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for over 50 languages, while CosyVoice excels in multi-lingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction-following capabilities. The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-t...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["LLM"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agentinstruct-toward-generative-teaching-with-agentic-flows","title":"AgentInstruct: Toward Generative Teaching with Agentic Flows","url":"https://www.microsoft.com/en-us/research/publication/agentinstruct-toward-generative-teaching-with-agentic-flows/","published":"2024-07-03","authors":["Arindam Mitra","Luciano Del Corro","Guoqing Zheng","Shweti Mahajan","Dany Rouhana","Andres Codas","Yadong Lu","Wei-ge Chen","Olga Vrousgou","Corby Rosset","Fillipe Silva","Hamed Khanpour"],"abstract":"Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers also raised concerns around model collapse and drawbacks of imitating other models. This discrepancy can be attributed to the fact that synthetic data varies in quality and diversity. Effective use of synthetic data usually requires significant human effort in curating the data. 
We focus on using synthetic data for post-training, specifically creating data by powerful models to teach a new skill or behavior to another model, we refer to this setting as Generative Teaching. We introduce AgentInstruct, an extensible agentic framework for automatically creating large amounts of diverse and high-quality synthetic data. AgentInstruct can create both the prompts and responses, using only raw data sources like text documen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Miscellaneous","Artificial intelligence","human language technologies"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/viseval-a-benchmark-for-data-visualization-in-the-era-of-large-language-models","title":"VisEval: A Benchmark for Data Visualization in the Era of Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/viseval-a-benchmark-for-data-visualization-in-the-era-of-large-language-models/","published":"2024-07-01","authors":["Nan Chen","Yuge Zhang","Jiahang Xu","Kan Ren","Yuqing Yang"],"abstract":"Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. 
However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. Firstly, we introduce a high-quality and large-scale dataset. This dataset includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Secondly, we advocate for a comprehensive automated evaluation methodology covering multiple dimensio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/tvcg.2024.3456320","openalex_id":"https://openalex.org/W4402401986","cited_by_count":34,"quality_score":102,"matched_keywords":["Article (Journal)","Human language technologies","Human-computer interaction","human language technologies","Human–computer interaction"],"author_affiliations":["Microsoft","Microsoft (United States)","ShanghaiTech University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mixdq-memory-efficient-few-step-text-to-image-diffusion-models-with-metric-decoupled-mixed-precision-quantization","title":"MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization","url":"https://www.microsoft.com/en-us/research/publication/mixdq-memory-efficient-few-step-text-to-image-diffusion-models-with-metric-decoupled-mixed-precision-quantization/","published":"2024-07-01","authors":["Tianchen Zhao","Xuefei Ning","Tongcheng Fang","Enshu Liu","Guyue Huang","Zinan Lin","Shengen Yan","Guohao Dai","Yu 
Wang"],"abstract":"Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenge for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduces the inference time by reducing the denoising steps. However, their memory consumptions are still excessive. The Post Training Quantization (PTQ) replaces high bit-width FP representation with low-bit integer values (INT4/8) , which is an effective and efficient technique to reduce the memory cost. However, when applying to few-step diffusion models, existing quantization methods face challenges in preserving both the image quality and text alignment. To address this issue, we propose an mixed-precision quantization framework - MixDQ. Firstly, We design specialized BOS-aware quantization method for highly sensitive text embedd...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-72630-9_17","openalex_id":"https://openalex.org/W4405003117","cited_by_count":7,"quality_score":99,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Diffusion models","Generative model","Machine learning","1970-01-01","memory","efficient","quantization"],"author_affiliations":["Microsoft","Infinitus (China)","Microsoft (United States)","Shanghai Jiao Tong University","Tsinghua University","University of California, Santa Barbara"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/differentially-private-synthetic-data-via-foundation-model-apis-2-text","title":"Differentially Private Synthetic Data via Foundation Model APIs 2: Text","url":"https://www.microsoft.com/en-us/research/publication/differentially-private-synthetic-data-via-foundation-model-apis-2-text/","published":"2024-07-01","authors":["Chulin Xie","Zinan Lin","Arturs Backurs","Sivakanth Gopi","Da Yu","Huseyin Inan","Harsha Nori","Haotian Jiang","Huishuai Zhang","Yin Tat Lee","Bo Li","Sergey Yekhanin"],"abstract":"Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of large language models (LLMs) on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e.g., GPT-3.5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models. 
In this work, we propose an augmented PE algorithm, named Aug-PE,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Mathematics","Security, privacy, and cryptography","data privacy","Differential privacy","NLP","Synthetic data","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rubicon-rubric-based-evaluation-of-domain-specific-human-ai-conversations","title":"RUBICON: Rubric-based Evaluation of Domain Specific Human-AI Conversations","url":"https://www.microsoft.com/en-us/research/publication/rubicon-rubric-based-evaluation-of-domain-specific-human-ai-conversations/","published":"2024-07-01","authors":["Param Biyani","Yasharth Bajpai","Arjun Radhakrishna","Gustavo Soares","Sumit Gulwani"],"abstract":"The evaluation of conversational assistants, such as GitHub Copilot Chat, poses a significant challenge for tool builders in the domain of Software Engineering. These assistants rely on language models and chat-based user experiences, making evaluating them according to the quality of the Human-AI conversations complicated. Exist ing general-purpose conversational quality metrics from literature are inadequate for assessing domain-specific dialogues due to their lack of context sensitivity. In this paper, we present RUBICON, a technique for evaluating domain-specific Human-AI conversations. 
RUBICON leverages large language models to generate rubrics for assessing conversation quality. It employs a selection process to choose the subset of rubrics based on their performance in scoring conversations. In our experiments, RUBICON effectively learns to differentiate conversation quality, achi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3664646.3664778","openalex_id":"https://openalex.org/W4400484844","cited_by_count":4,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Programming languages and software engineering","automatic evaluation","Machine learning","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (India)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/natural-language-to-class-level-code-generation-by-iterative-tool-augmented-reasoning-over-repository","title":"Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository","url":"https://www.microsoft.com/en-us/research/publication/natural-language-to-class-level-code-generation-by-iterative-tool-augmented-reasoning-over-repository/","published":"2024-07-01","authors":["Ajinkya Deshpande","Anmol Agarwal","Shashank Shet","Arun Iyer","Aditya Kanade","Ramakrishna Bairi","Suresh Parthasarathy"],"abstract":"LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level across various benchmarks. 
However, the complexities associated with creating code artifacts like classes, particularly within the context of real-world software repositories, remain underexplored. Prior research treats class-level generation as an isolated task, neglecting the intricate dependencies & interactions that characterize real-world software environments. To address this gap, we introduce RepoClassBench, a comprehensive benchmark designed to rigorously evaluate LLMs in generating complex, class-level code within real-world repositories. RepoClassBench includes \"Natural Language to Class generation\" tasks across Java, Python & C# from a selection of repositories. We ensure that each class in our dataset not only has cross-file dependencies within...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","large language models","Machine learning","software engineering","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-llms-be-fooled-investigating-vulnerabilities-in-llms","title":"Can LLMs be Fooled? 
Investigating Vulnerabilities in LLMs","url":"https://www.microsoft.com/en-us/research/publication/can-llms-be-fooled-investigating-vulnerabilities-in-llms/","published":"2024-07-01","authors":["Sara Abdali","Jia He","CJ Barberan","Richard Anarfi"],"abstract":"The advent of Large Language Models (LLMs) has garnered significant popularity and wielded immense power across various domains within Natural Language Processing (NLP). While their capabilities are undeniably impressive, it is crucial to identify and scrutinize their vulnerabilities especially when those vulnerabilities can have costly consequences. One such LLM, trained to provide a concise summarization from medical documents could unequivocally leak personal patient data when prompted surreptitiously. This is just one of many unfortunate examples that have been unveiled and further research is necessary to comprehend the underlying reasons behind such vulnerabilities. In this study, we delve into multiple sections of vulnerabilities which are model-based, training-time, inference-time vulnerabilities, and discuss mitigation strategies including “Model Editing” which aims at modifying...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Security, privacy, and cryptography","Social sciences","large language models","Natural language processing","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/language-guided-skill-learning-with-temporal-variational-inference","title":"Language-guided Skill Learning with Temporal Variational Inference","url":"https://www.microsoft.com/en-us/research/publication/language-guided-skill-learning-with-temporal-variational-inference/","published":"2024-07-01","authors":["Haotian Fu","Pratyusha Sharma","Elias Stengel-Eskin","George Konidaris","Nicolas Le Roux","Marc-Alexandre Côté","Xingdi Yuan"],"abstract":"We present an algorithm for skill discovery from expert demonstrations. The algorithm first utilizes Large Language Models (LLMs) to propose an initial segmentation of the trajectories. Following that, a hierarchical variational inference framework incorporates the LLM-generated segmentation information to discover reusable skills by merging trajectory segments. To further control the trade-off between compression and reusability, we introduce a novel auxiliary objective based on the Minimum Description Length principle that helps guide this skill discovery process. Our results demonstrate that agents equipped with our method are able to discover skills that help accelerate learning and outperform baseline skill learning approaches on new long-horizon tasks in BabyAI, a grid world navigation environment, as well as ALFRED, a household simulation environment. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01","LLM","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-llm-based-agents-for-root-cause-analysis","title":"Exploring LLM-based Agents for Root Cause Analysis","url":"https://www.microsoft.com/en-us/research/publication/exploring-llm-based-agents-for-root-cause-analysis/","published":"2024-07-01","authors":["Devjeet Roy","Xuchao Zhang","Rashi Bhave","Chetan Bansal","Pedro Las-Casas","Rodrigo Fonseca","Saravan Rajmohan"],"abstract":"The growing complexity of cloud based software systems has resulted in incident management becoming an integral part of the software development lifecycle. Root cause analysis (RCA), a critical part of the incident management process, is a demanding task for on-call engineers, requiring deep domain knowledge and extensive experience with a team's specific services. Automation of RCA can result in significant savings of time, and ease the burden of incident management on on-call engineers. Recently, researchers have utilized Large Language Models (LLMs) to perform RCA, and have demonstrated promising results. However, these approaches are not able to dynamically collect additional diagnostic information such as incident related logs, metrics or databases, severely restricting their ability to diagnose root causes. 
In this work, we explore the use of LLM based agents for RCA to address thi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","1970-01-01","LLM","retrieval","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nnscaler-constraint-guided-parallelization-plan-generation-for-deep-learning-training","title":"nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training","url":"https://www.microsoft.com/en-us/research/publication/nnscaler-constraint-guided-parallelization-plan-generation-for-deep-learning-training/","published":"2024-07-01","authors":["Zhiqi Lin","Youshan Miao","Quanlu Zhang","Fan Yang","Yi Zhu","Cheng Li","Saeed Maleki","Xu Cao","Ning Shang","Yilei Yang","Weijiang Xu","Mao Yang"],"abstract":"With the growing model size of deep neural networks (DNN), deep learning training is increasingly relying on handcrafted search spaces to find efficient parallelization execution plans. However, our study shows that existing search spaces exclude plans that significantly impact the training performance of well-known DNN models (e.g., AlphaFold2) under important settings, such as when handling large embedding tables in large language models.To address this problem, we propose nnScaler, a framework that generates efficient parallelization plans for deep learning training. 
Instead of relying on the existing search space, nnScaler advocates a more general approach that empowers domain experts to construct their own search space through three primitives, op-trans, op-assign, and op-order, which capture model transformation and the temporal-spatial scheduling of the transformed model of any pa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","Computer System","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-art-of-saying-no-contextual-noncompliance-in-language-models","title":"The Art of Saying No: Contextual Noncompliance in Language Models","url":"https://www.microsoft.com/en-us/research/publication/the-art-of-saying-no-contextual-noncompliance-in-language-models/","published":"2024-07-01","authors":["Faeze Brahman","Sachin Kumar","Vidhisha Balachandran","Pradeep Dasigi","Valentina Pyatkin","Abhilasha Ravichander","Sarah Wiegreffe","Nouha Dziri","K. Chandu","Jack Hessel","Yulia Tsvetkov","Noah A. Smith"],"abstract":"Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of\"unsafe\"queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. 
Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we exp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Chat-based language models","Computer science","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/report-on-the-search-futures-workshop-at-ecir-2024","title":"Report on The Search Futures Workshop at ECIR 2024","url":"https://www.microsoft.com/en-us/research/publication/report-on-the-search-futures-workshop-at-ecir-2024/","published":"2024-07-01","authors":["Leif Azzopardi","Charles L. A. Clarke","Paul Kantor","Bhaskar Mitra","Johanne R. Trippas","Zhaochun Ren"],"abstract":"The First Search Futures Workshop, in conjunction with the Forty-sixth European Conference on Information Retrieval (ECIR) 2024, looked into the future of search to ask questions such as: How can we harness the power of generative AI to enhance, improve and re-imagine Information Retrieval (IR)? 
What are the principles and fundamental rights that the field of Information Retrieval should strive to uphold? How can we build trustworthy IR systems in light of Large Language Models and their ability to generate content at super human speeds? What new applications and affordances does generative AI offer and enable, and can we go back to the future, and do what we only dreamed of previously? The workshop started with seventeen lightning talks from a diverse set speakers. Instead of conventional paper presentations, the lightning talks provided a rapid and concise overview of ideas, allowing s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Search and information retrieval","Information retrieval","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/certainly-uncertain-a-benchmark-and-metric-for-multimodal-epistemic-and-aleatoric-awareness","title":"Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness","url":"https://www.microsoft.com/en-us/research/publication/certainly-uncertain-a-benchmark-and-metric-for-multimodal-epistemic-and-aleatoric-awareness/","published":"2024-07-01","authors":["Khyathi Raghavi Chandu","Linjie Li","Anas Awadalla","Ximing Lu","J. 
Park","Jack Hessel","Lijuan Wang","Yejin Choi"],"abstract":"The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to make previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric confidence-weighted accuracy, that is well correlated w...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Vision-language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/core-resolving-code-quality-issues-using-llms","title":"CORE: Resolving Code Quality Issues using LLMs","url":"https://www.microsoft.com/en-us/research/publication/core-resolving-code-quality-issues-using-llms/","published":"2024-07-01","authors":["Nalin Wadhwa","Jui 
Pradhan","Atharv Sonwane","Surya Prakash Sahu","Nagarajan Natarajan","Aditya Kanade (kanadeaditya)","Suresh Parthasarathy (supartha)","Sriram Rajamani (sriram)"],"abstract":"As software projects progress, quality of code assumes paramount importance as it affects reliability, maintainability and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality issues. However, developers need to spend extra efforts to revise their code to improve code quality based on the tool findings. In this work, we investigate the use of (instruction-following) large language models (LLMs) to assist developers in revising code to resolve code quality issues.We present a tool, CORE (short for COde REvisions), architected using a pair of LLMs organized as a duo comprised of a proposer and a ranker. Providers of static analysis tools recommend ways to mitigate the tool warnings and developers follow them to revise their code. The proposer LLM of CORE takes the same set of recommendations and applies them to generate candidate...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","software engineering","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-vision-of-autonomic-computing-can-llms-make-it-a-reality","title":"The Vision of Autonomic Computing: Can LLMs Make It a 
Reality?","url":"https://www.microsoft.com/en-us/research/publication/the-vision-of-autonomic-computing-can-llms-make-it-a-reality/","published":"2024-07-01","authors":["Zhiyang Zhang","Fangkai Yang","Xiaoting Qin","Jue Zhang","Qingwei Lin 林庆维","Gong Cheng","Dongmei Zhang","Saravan Rajmohan","Qi Zhang"],"abstract":"The Vision of Autonomic Computing (ACV), proposed over two decades ago, envisions computing systems that self-manage akin to biological organisms, adapting seamlessly to changing environments. Despite decades of research, achieving ACV remains challenging due to the dynamic and complex nature of modern computing systems. Recent advancements in Large Language Models (LLMs) offer promising solutions to these challenges by leveraging their extensive knowledge, language understanding, and task automation capabilities. This paper explores the feasibility of realizing ACV through an LLM-based multi-agent framework for microservice management. We introduce a five-level taxonomy for autonomous service maintenance and present an online evaluation benchmark based on the Sock Shop microservice demo project to assess our framework's performance. 
Our findings demonstrate significant progress towards....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-coexplorer-technology-probe-a-generative-ai-powered-adaptive-interface-to-support-intentionality-in-planning-and-running-video-meetings","title":"The CoExplorer Technology Probe: A Generative AI-Powered Adaptive Interface to Support Intentionality in Planning and Running Video Meetings","url":"https://www.microsoft.com/en-us/research/publication/the-coexplorer-technology-probe-a-generative-ai-powered-adaptive-interface-to-support-intentionality-in-planning-and-running-video-meetings/","published":"2024-07-01","authors":["Gun Woo (Warren) Park","Payod Panda","Lev Tankelevitch","Sean Rintel"],"abstract":"Effective meetings are effortful, but traditional videoconferencing systems offer little support for reducing this effort across the meeting lifecycle. Generative AI (GenAI) has the potential to radically redefine meetings by augmenting intentional meeting behaviors. CoExplorer, our novel adaptive meeting prototype, preemptively generates likely phases that meetings would undergo, tools that allow capturing attendees' thoughts before the meeting, and for each phase, window layouts, and appropriate applications and files. 
Using CoExplorer as a technology probe in a guided walkthrough, we studied its potential in a sample of participants from a global technology company. Our findings suggest that GenAI has the potential to help meetings stay on track and reduce workload, although concerns were raised about users' agency, trust, and possible disruption to traditional meeting norms. We discu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Social sciences","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pre-gated-moe-an-algorithm-system-co-design-for-fast-and-scalable-mixture-of-expert-inference","title":"Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference","url":"https://www.microsoft.com/en-us/research/publication/pre-gated-moe-an-algorithm-system-co-design-for-fast-and-scalable-mixture-of-expert-inference/","published":"2024-07-01","authors":["Ranggi Hwang","Jianyu Wei","Shijie Cao","Changho Hwang","Xiaohu Tang","Ting Cao","Mao Yang"],"abstract":"Large language models (LLMs) based on transformers have made significant strides in recent years, the success of which is driven by scaling up their model size.Despite their high algorithmic performance, the computational and memory requirements of LLMs present unprecedented challenges. 
To tackle the high compute requirements of LLMs, the Mixture-of-Experts (MoE) architecture was introduced which is able to scale its model size without proportionally scaling up its computational requirements. Unfortunately, MoE's high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE's memory-hungry expert parameters to CPU memory fall short because the latency to migrate activated experts from CPU to GPU incurs high performance overhead. Our proposed Pre-gated MoE system effectively tackles the compute and memor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Systems and networking","systems","1970-01-01","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/arena-learning-build-data-flywheel-for-llms-post-training-via-simulated-chatbot-arena","title":"Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena","url":"https://www.microsoft.com/en-us/research/publication/arena-learning-build-data-flywheel-for-llms-post-training-via-simulated-chatbot-arena/","published":"2024-07-01","authors":["Haipeng Luo","Qingfeng Sun","Can Xu","Pu Zhao","Qingwei Lin 林庆维","Jianguang Lou","Shifeng Chen","Yansong Tang","Weizhu Chen"],"abstract":"Assessing the effectiveness of large language models (LLMs) presents significant challenges. 
The method of conducting human-annotated battles in an online Chatbot Arena has been recognized as a highly effective evaluative approach. However, this process is hindered by the costliness and time demands of human annotation, complicating the enhancement of LLMs via post-training. In this paper, we introduce '' Arena Learning '', an innovative offline strategy designed to simulate these arena battles. We have developed a comprehensive set of instructions for simulated battles and employ AI-driven annotations to assess battle outcomes, facilitating continuous improvement of the target model through both supervised fine-tuning and reinforcement learning. A crucial aspect of our methodology is ensuring precise evaluations and achieving consistency between offline simulations and online competitio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Algorithms","language model","preference","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/superbench","title":"SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation","url":"https://www.microsoft.com/en-us/research/publication/superbench/","published":"2024-07-01","authors":["Yifan Xiong","Yuting Jiang","Ziyue Yang","Lei Qu","Guoshuai Zhao","Shuguang Liu","Dong Zhong","Boris Pinzur","Jie Zhang","Yang Wang","Jithin Jose","Hossein Pourreza"],"abstract":"Reliability in cloud AI infrastructure is crucial for cloud service providers, prompting the 
widespread use of hardware redundancies. However, these redundancies can inadvertently lead to hidden degradation, so called \"gray failure\", for AI workloads, significantly affecting end-to-end performance and concealing performance issues, which complicates root cause analysis for failures and regressions.We introduce SuperBench, a proactive validation system for AI infrastructure that mitigates hidden degradation caused by hardware redundancies and enhances overall reliability. SuperBench features a comprehensive benchmark suite, capable of evaluating individual hardware components and representing most real AI workloads. It comprises a Validator which learns benchmark criteria to clearly pinpoint defective components. Additionally, SuperBench incorporates a Selector to balance validation time....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer Systems","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evaluating-llm-driven-user-intent-formalization-for-verification-aware-languages","title":"Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages","url":"https://www.microsoft.com/en-us/research/publication/evaluating-llm-driven-user-intent-formalization-for-verification-aware-languages/","published":"2024-07-01","authors":["Shuvendu Lahiri"],"abstract":"Verification-aware programming languages such as Dafny and F provide means to formally 
specify and prove properties of programs. Although the problem of checking an implementation against a specification can be defined mechanically, there is no algorithmic way of ensuring the correctness of the user-intent formalization for programs -- that a specification adheres to the user's intent behind the program. The intent or requirement is expressed informally in natural language and the specification is a formal artefact. The advent of large language models (LLMs) has made strides bridging the gap between informal intent and formal program implementations recently, driven in large parts due to benchmarks and automated metrics for evaluation.Recent work has proposed evaluating {\\it user-intent formalization} problem for mainstream programming languages~\\cite{endres-fse24}. However, such an appr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:203","title":"Rethinking optimization and architecture for tiny language models","url":"https://www.noahlab.com.hk/en/scientific_research/rethinking-optimization-and-architecture-for-tiny-language-models","published":"2024-07-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICML 2024. 
External paper link: https://dl.acm.org/doi/10.5555/3692070.3694016","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Model architecture and optimization","ICML 2024","2024"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-models-for-tabular-data-progresses-and-future-directions","title":"Large Language Models for Tabular Data: Progresses and Future Directions","url":"https://www.microsoft.com/en-us/research/publication/large-language-models-for-tabular-data-progresses-and-future-directions/","published":"2024-07-01","authors":["Haoyu Dong","Zhiruo Wang"],"abstract":"HaoAreYuDong/Large-Language-Models-for-Tabular-Data Tables contain a significant portion of the world's structured information. The ability to efficiently and accurately understand, process, reason about, analyze, and generate tabular data is critical for achieving Artificial General Intelligence (AGI) systems.However, despite their prevalence and importance, tables present unique challenges due to their structured nature and the diverse semantics embedded within them. Textual content, numerical values, visual formats, and even formulas in tables carry rich semantic information that is often underutilized due to the complexity of accurately interpreting and integrating.Fortunately, the advent of Large Language Models (LLMs) has opened new frontiers in natural language processing (NLP) and machine learning (ML), showing remarkable success in understanding and generating text, code, etc. 
A...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Data platforms and analytics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2407.01906","title":"Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models","url":"https://huggingface.co/papers/2407.01906","published":"2024-07-01","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"apple:e11nn365iw295yhnlrurpia2","title":"Applying RLAIF for Code Generation with API-usage in Lightweight LLMs","url":"https://machinelearning.apple.com/research/applying-rlaif","published":"2024-07-01","authors":["Sujan Dutta","Sayantan Mahinder","Raviteja Anantha","Bortik Bandyopadhyay"],"abstract":"This paper was accepted at the Natural Language Reasoning and Structured Explanations workshop at ACL 
2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:8d00ea9356caafc1","title":"Real-Time Anomaly Detection and Reactive Planning with Large Language Models","url":"https://research.nvidia.com/publication/2024-07_real-time-anomaly-detection-and-reactive-planning-large-language-models","published":"2024-07","authors":["Rohan Sinha","Amine Elhafsi","Christopher Agia","Matthew Foutter","Edward Schmerling","Marco Pavone"],"abstract":"Official NVIDIA Research publication. RSS","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["RSS"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=1"}},{"id":"official:f62fe4968bb1a1eb","title":"Breathing Life Into Sketches Using Text-to-Video Priors","url":"https://research.nvidia.com/publication/2024-07_breathing-life-sketches-using-text-video-priors","published":"2024-07","authors":["Rinon Gal","Yael Vinker","Yuval Alaluf","Amit Bermano","Daniel Cohen-Or","Ariel Shamir","Gal Chechik"],"abstract":"Official NVIDIA Research publication. 
CVPR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["CVPR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=1"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vall-e-r-robust-and-efficient-zero-shot-text-to-speech-synthesis-via-monotonic-alignment","title":"VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment","url":"https://www.microsoft.com/en-us/research/publication/vall-e-r-robust-and-efficient-zero-shot-text-to-speech-synthesis-via-monotonic-alignment/","published":"2024-06-30","authors":["Bing Han","Long Zhou","Shujie Liu","Sanyuan Chen","Lingwei Meng","Yanming Qian","Yanqing Liu","Sheng Zhao","Jinyu Li","Furu Wei"],"abstract":"With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings huge computational overhead to the inference process of autoregression. To address these issues, we propose VALL-E R, a robust and efficient zero-shot TTS system, building upon the foundation of VALL-E. Specifically, we introduce a phoneme monotonic alignment strategy to strengthen the connection between phonemes and acoustic sequence, ensuring a more precise alignment by constraining the acoustic tokens to match their associated phonemes. 
Furthermore, we employ a codec-merging approach to downsa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Audio and Acoustics","Audio and Speech Processing","LLM","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/efficient-expert-pruning-for-sparse-mixture-of-experts-language-models-enhancing-performance-and-reducing-inference-costs","title":"Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs","url":"https://www.microsoft.com/en-us/research/publication/efficient-expert-pruning-for-sparse-mixture-of-experts-language-models-enhancing-performance-and-reducing-inference-costs/","published":"2024-06-30","authors":["Enshu Liu","Junyi Zhu","Zinan Lin","Xuefei Ning","Matthew B. Blaschko","Shengen Yan","Guohao Dai","Huazhong Yang","Yu Wang"],"abstract":"The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster inference while maintaining performance. 
However, SMoE models still face limitations in broader deployment due to their large parameter counts and significant GPU memory requirements. In this work, we introduce a gradient-free evolutionary strategy named EEP (Efficient Expert Pruning) to enhance the pruning of experts in SMoE models. EEP relies solely on model inference (i.e., no gradient computation) and achieves greater sparsity while maintaining or even improving performance on downstream task...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Machine learning","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402353595","title":"Retrieval-Augmented Meta Learning for Low-Resource Text Classification","url":"https://doi.org/10.1109/ijcnn60899.2024.10651119","published":"2024-06-30","authors":["Rongsheng Li","Yangning Li","Yinghui Li","Chaiyut Luoyiching","Nannan Zhou","Hanjing Su","Haitao Zheng"],"abstract":"Meta-learning has achieved promising results in low-resource text classification, which aims to identify target classes by transferring knowledge from source classes through a series of small tasks called episodes. However, the current meta-learning algorithms that solely rely on learning from meta-training tasks may struggle to generalize well to meta-testing tasks.
To address this problem, we propose a method called Retrieval-Augmented Meta Learning (RAML) that utilizes external knowledge to compensate for the performance degradation when meta-training tasks do not adequately support meta-testing tasks. RAML first utilizes a retriever to retrieve knowledge relevant to the query from an external corpus, and then employs the Multi-View Passages Fusion Network to integrate the retrieved knowledge for performing few-shot classification. This network can effectively combine the probability....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn60899.2024.10651119","openalex_id":"https://openalex.org/W4402353595","cited_by_count":2,"quality_score":47,"matched_keywords":["retrieval","distillation"],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.782829999923706},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4934329092502594},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4895661771297455},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.47303956747055054},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4402352484","title":"From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation","url":"https://doi.org/10.1109/ijcnn60899.2024.10650457","published":"2024-06-30","authors":["Haofei Zhao","Yilun Liu","Shimin Tao","Weibin Meng","Yimeng Chen","Xiang Geng","Chang Su","Min 
Zhang","Hao Yang"],"abstract":"Machine Translation Quality Estimation (MTQE) is the task of estimating the quality of machine-translated text in real time without the need for reference translations, which is of great importance for the development of MT. After two decades of evolution, QE has yielded a wealth of results. This article provides a comprehensive overview of QE datasets, annotation methods, shared tasks, methodologies, challenges, and future research directions. It begins with an introduction to the background and significance of QE, followed by an explanation of the concepts and evaluation metrics for word-level QE, sentence-level QE, document-level QE, and explainable QE. The paper categorizes the methods developed throughout the history of QE into those based on handcrafted features, deep learning, and Large Language Models (LLMs), with a further division of deep learning-based methods into classic dee...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn60899.2024.10650457","openalex_id":"https://openalex.org/W4402352484","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Nanjing University","Northeastern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6739060878753662},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5903096795082092},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.5432882905006409},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.513619601726532},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.4967880845069885},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4907744824886322},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43289974331855774},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.11317875981330872}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4402352531","title":"Incremental Soft Pruning to Get the Sparse Neural Network During Training","url":"http://dx.doi.org/10.1109/ijcnn60899.2024.10650747","published":"2024-06-30","authors":["Kehan Zhu","Fuyi Hu","Yuanbin Ding","Yunyun Dong","Ruxin Wang"],"abstract":"The traditional three-stage pruning pipeline is first to train an original dense network, then identify redundant parts of the network for pruning based on the evaluation metrics of the pruning algorithm, and finally fine-tuning the pruned model, which is a time-consuming and computationally expensive process. Traditional pruning algorithms are greedy and aggressive, which may cause many important network connections to be pruned incorrectly, resulting in significant performance degradation. In this paper, we propose an incremental soft pruning during training method with the following characteristics: 1) Given the pruning rate of the network, a trained sub-network, which has performance comparable to the original network, can be obtained after training. 
2) We propose three incremental pruning rate growth functions and allow the network structure to be dynamically adjusted during trainin...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ijcnn60899.2024.10650747","openalex_id":"https://openalex.org/W4402352531","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Yunnan University"],"concepts":[{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.7481147646903992},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7360994815826416},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.7052563428878784},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.624324381351471},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5083793997764587},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4253210723400116},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41471850872039795},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3258116543292999}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/direct-preference-knowledge-distillation-for-large-language-models","title":"Direct Preference Knowledge Distillation for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/direct-preference-knowledge-distillation-for-large-language-models/","published":"2024-06-27","authors":["Yixing Li","Yuxian Gu","Li 
Dong","Dequan Wang","Yu Cheng","Furu Wei"],"abstract":"In the field of large language models (LLMs), Knowledge Distillation (KD) is a critical technique for transferring capabilities from teacher models to student models. However, existing KD methods face limitations and challenges in distillation of LLMs, including efficiency and insufficient measurement capabilities of traditional KL divergence. It is shown that LLMs can serve as an implicit reward function, which we define as a supplement to KL divergence. In this work, we propose Direct Preference Knowledge Distillation (DPKD) for LLMs. DPKD utilizes distribution divergence to represent the preference loss and implicit reward function. We re-formulate KD of LLMs into two stages: first optimizing and objective consisting of implicit reward and reverse KL divergence and then improving the preference probability of teacher outputs over student outputs. We conducted experiments and analysis....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","LLM","preference","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:898210c6562aa4fb","title":"Meta Large Language Model Compiler: Foundation Models of Compiler Optimization","url":"https://ai.meta.com/research/publications/meta-large-language-model-compiler-foundation-models-of-compiler-optimization/","published":"2024-06-27","authors":["Chris Cummins","Volker Seeker","Dejan Grubisic","Baptiste Rozière","Jonas 
Gehring","Gabriel Synnaeve","Hugh Leather"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Systems Research","language model"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=12"}},{"id":"apple:gd583733x0yh8ekcewc615wi","title":"Toward Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models","url":"https://machinelearning.apple.com/research/towards-robust-evaluation","published":"2024-06-26","authors":["Akchay Srivastava","Atif Memon"],"abstract":"Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora. Recent advances stem from the confluence of several factors, such as large-scale training datasets, deep learning techniques, and the rise of large language models. 
High-quality datasets are used to train models on realistic scenarios and enable the evaluation of the system on...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/access.2024.3446854","openalex_id":"https://openalex.org/W4401717597","cited_by_count":9,"quality_score":61,"matched_keywords":[],"author_affiliations":["Apple","Apple (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:c36afd7c67542366","title":"Neurons in Large Language Models: Dead, N-gram, Positional","url":"https://ai.meta.com/research/publications/neurons-in-large-language-models-dead-n-gram-positional/","published":"2024-06-25","authors":["Elena Voita","Javier Ferrando Monsonis","Christoforos Nalmpantis"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=12"}},{"id":"openalex:W4400053039","title":"Leveraging Intent Detection and Generative AI for Enhanced Customer Support","url":"https://doi.org/10.60087/jaigs.v5i1.178","published":"2024-06-25","authors":["Vamsi Katragadda"],"abstract":"Customer support plays a pivotal role in shaping customer satisfaction and fostering loyalty within any business. 
This paper delves into how the integration of intent detection and generative AI (GenAI) can transform customer support systems. At the core of this transformation is the ability to understand user intent, which is essential for directing customers effectively through the support funnel to the appropriate services. By employing sophisticated natural language processing (NLP) techniques, training LLM to perform RAG and machine learning models, businesses can precisely discern customer intents. This capability allows for the delivery of tailored, immediate responses. The paper further explores the methodologies employed, the advantages gained, and the challenges faced in the adoption of these advanced technologies in customer support systems.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.60087/jaigs.v5i1.178","openalex_id":"https://openalex.org/W4400053039","cited_by_count":13,"quality_score":54,"matched_keywords":["LLM"],"author_affiliations":["Menlo School","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6634549498558044},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.556186318397522},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.32060664892196655}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/t-mac-cpu-renaissance-via-table-lookup-for-low-bit-llm-deployment-on-edge","title":"T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on 
Edge","url":"https://www.microsoft.com/en-us/research/publication/t-mac-cpu-renaissance-via-table-lookup-for-low-bit-llm-deployment-on-edge/","published":"2024-06-24","authors":["Jianyu Wei","Shijie Cao","Ting Cao","Lingxiao Ma","Lei Wang","Yanyong Zhang","Mao Yang"],"abstract":"The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native support for mpGEMM, resort to dequantize weights for high precision computation. Such an indirect way can lead to a significant inference overhead. In this paper, we introduce T-MAC, an innovative lookup table(LUT)-based method designed for efficient low-bit LLM (i.e., weight-quantized LLM) inference on CPUs. T-MAC directly supports mpGEMM without dequantization, while simultaneously eliminating multiplications and reducing additions required. 
Specifically, T-MAC transforms the traditional data-typ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","Distributed, Parallel, and Cluster Computing","LLM","memory","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:iwjbtuauyd3l8fyp3gjn2lq5","title":"Hypernetworks for Personalizing ASR to Atypical Speech","url":"https://machinelearning.apple.com/research/hypernetworks-personalizing-asr","published":"2024-06-24","authors":["Max Müller-Eberstein","Dianna Yee","Karren Yang","Gautam Varma Mantena","Colin Lea"],"abstract":"Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general population models to atypical speech. However, these approaches assume a priori knowledge of the atypical speech disorder being adapted for -- the diagnosis of which requires expert knowledge that is not always available. 
Even given this knowledge, data scarcity and high inter/intra-speaker variability...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1162/tacl_a_00696","openalex_id":"https://openalex.org/W4402639795","cited_by_count":7,"quality_score":63,"matched_keywords":["efficient"],"author_affiliations":["Apple","Apple (United States)","IT University of Copenhagen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4404134046","title":"RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Model","url":"https://doi.org/10.1145/3649329.3657353","published":"2024-06-23","authors":["Yun-Da Tsai","Mingjie Liu","Haoxing Ren"],"abstract":"This paper presents RTLFixer, a novel framework enabling automatic syntax errors fixing for Verilog code with Large Language Models (LLMs). Despite LLM's promising capabilities, our analysis indicates that approximately 55% of errors in LLM-generated Verilog are syntax-related, leading to compilation failures. To tackle this issue, we introduce a novel debugging framework that employs Retrieval-Augmented Generation (RAG) and ReAct prompting, enabling LLMs to act as autonomous agents in interactively debugging the code with feedback. This framework demonstrates exceptional proficiency in resolving syntax errors, successfully correcting about 98.5% of compilation errors in our debugging dataset, comprising 212 erroneous implementations derived from the VerilogEval benchmark. 
Our method leads to 32.3% and 10.1% increase in pass@1 success rates in the VerilogEval-Machine and VerilogEval-Huma...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3649329.3657353","openalex_id":"https://openalex.org/W4404134046","cited_by_count":69,"quality_score":79,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.842897891998291},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.7173600196838379},{"id":"https://openalex.org/C60048249","display_name":"Syntax","score":0.6247459053993225},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4615499973297119},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3522903621196747}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":69}},{"id":"hf-org-paper:moonshotai:2407.00079","title":"Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving","url":"https://huggingface.co/papers/2407.00079","published":"2024-06-23","authors":["Moonshot/Kimi"],"abstract":"Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated cache of KVCache. The core of Mooncake is its KVCache-centric scheduler, which balances maximizing overall effective throughput while meeting latency-related Service Level Objectives (SLOs). 
Unlike traditional studies that assume all requests will be processed, Mooncake faces challenges due to highly overloaded scenarios. To mitigate these, we developed a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios. Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while...","companies":["Moonshot/Kimi"],"matched_orgs":["Moonshot/Kimi"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","moonshotai","LLM"],"author_affiliations":["Moonshot/Kimi"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/moonshotai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evolving-roles-and-workflows-of-creative-practitioners-in-the-age-of-generative-ai","title":"Evolving Roles and Workflows of Creative Practitioners in the Age of Generative AI","url":"https://www.microsoft.com/en-us/research/publication/evolving-roles-and-workflows-of-creative-practitioners-in-the-age-of-generative-ai/","published":"2024-06-22","authors":["Srishti Palani","Gonzalo Ramos"],"abstract":"Creative practitioners (like designers, software developers, and architects) have started to employ Generative AI models (GenAI) to produce text, images, and assets comparable to those made by people. While HCI research explores specific GenAI models and creativity support tools, little is known about practitioners’ evolving roles and workflows with GenAI models across a project’s stages. This knowledge is key to guide the development of the new generation of Creativity Support Tools. 
We contribute to this knowledge by employing a triangulated method to capture interviews, videos, and survey responses of creative practitioners reflecting on projects they completed with GenAI. Our observations let us derive a set of factors that capture practitioners’ perceived roles, challenges, benefits, and interaction patterns when creating with GenAI. From these factors, we offer insights and propose...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3635636.3656190","openalex_id":"https://openalex.org/W4399917581","cited_by_count":39,"quality_score":110,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Human–computer interaction","Machine learning","User experience design","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/impact-of-decentralized-learning-on-player-utilities-in-stackelberg-games","title":"Impact of Decentralized Learning on Player Utilities in Stackelberg Games","url":"https://www.microsoft.com/en-us/research/publication/impact-of-decentralized-learning-on-player-utilities-in-stackelberg-games/","published":"2024-06-21","authors":["Kate Donahue","Nicole Immorlica","Meena Jagadeesan","Brendan Lucier","Aleksandrs Slivkins"],"abstract":"When deployed in the world, a learning agent such as a recommender system or a chatbot often repeatedly interacts with another learning agent (such as a user) over time. 
In many such two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned. To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective. We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks (such as Stackelberg equilibrium payoffs) result in worst-case linear regret for at least one player. To better capture these systems, we construct a relaxed regret benchmark that is tolerant to small learning errors by agents. We show that standard learning algorithms fail to provide sublinear regret, and we develop algorithms to achieve near-optimal [l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/democratizing-protein-language-models-with-parameter-efficient-fine-tuning","title":"Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning","url":"https://www.microsoft.com/en-us/research/publication/democratizing-protein-language-models-with-parameter-efficient-fine-tuning/","published":"2024-06-20","authors":["Samuel Sledzieski","Meghana Kshirsagar","Minkyung Baek","Bonnie Berger","Rahul Dodhia","Juan M. 
Lavista Ferres"],"abstract":"Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in the size of models, these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we newly bring parameter-efficient fine-tuning methods to proteomics. Using the parameter-efficient method LoRA, we train new models for two important proteomic tasks: predicting protein-protein interactions (PPI) and predicting the symmetry of homooligomers. We show that for h...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1073/pnas.2405840121","openalex_id":"https://openalex.org/W4399849668","cited_by_count":72,"quality_score":110,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","1970-01-01","language model","memory","efficient"],"author_affiliations":["Microsoft","Massachusetts Institute of Technology","Microsoft (United States)","Seoul National University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/care-a-benchmark-suite-for-the-classification-and-retrieval-of-enzymes","title":"CARE: a Benchmark Suite for the Classification and Retrieval of 
Enzymes","url":"https://www.microsoft.com/en-us/research/publication/care-a-benchmark-suite-for-the-classification-and-retrieval-of-enzymes/","published":"2024-06-20","authors":["Jason Yang","Ariane Mora","Shengchao Liu","Bruce J. Wittmann","A. Anandkumar","Frances H. Arnold","Yisong Yue"],"abstract":"Enzymes are important proteins that catalyze chemical reactions. In recent years, machine learning methods have emerged to predict enzyme function from sequence; however, there are no standardized benchmarks to evaluate these methods. We introduce CARE, a benchmark and dataset suite for the Classification And Retrieval of Enzymes (CARE). CARE centers on two tasks: (1) classification of a protein sequence by its enzyme commission (EC) number and (2) retrieval of an EC number given a chemical reaction. For each task, we design train-test splits to evaluate different kinds of out-of-distribution generalization that are relevant to real use cases. For the classification task, we provide baselines for state-of-the-art methods. 
Because the retrieval task has not been previously formalized, we propose a method called Contrastive Reaction-EnzymE Pretraining (CREEP) as one of the first baselines....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Biology","Machine learning","1970-01-01","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:e9c8d422f664df0c","title":"Consistency Models","url":"https://openai.com/index/consistency-models","published":"2024-06-20","authors":["OpenAI"],"abstract":"Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Research"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"arxiv:2501.18110","title":"Lifelong 3D Mapping Framework for Hand-Held & Robot-Mounted LiDAR Mapping Systems","url":"http://arxiv.org/abs/2501.18110","published":"2024-06-20","authors":["Liudi Yang","Sai Manoj Prakhya","Senhua Zhu","Ziyuan Liu"],"abstract":"We propose a lifelong 3D mapping framework that is modular, cloud-native by design 
and more importantly, works for both hand-held and robot-mounted 3D LiDAR mapping systems. Our proposed framework comprises of dynamic point removal, multi-session map alignment, map change detection and map version control. First, our sensor-setup agnostic dynamic point removal algorithm works seamlessly with both hand-held and robot-mounted setups to produce clean static 3D maps. Second, the multi-session map alignment aligns these clean static maps automatically, without manual parameter fine-tuning, into a single reference frame, using a two stage approach based on feature descriptor matching and fine registration. Third, our novel map change detection identifies positive and negative changes between two aligned maps. Finally, the map version control maintains a single base map that represents the curr...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2024.3417113","openalex_id":"https://openalex.org/W4399849446","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Huawei Technologies (Germany)"],"concepts":[{"id":"https://openalex.org/C51399673","display_name":"Lidar","score":0.7410265207290649},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.48922157287597656},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.48692309856414795},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.46516183018684387},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3836252987384796},{"id":"https://openalex.org/C62649853","display_name":"Remote 
sensing","score":0.3027885854244232},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.21194088459014893}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmctagent-multi-modal-critical-thinking-agent-framework-for-complex-visual-reasoning","title":"MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning","url":"https://www.microsoft.com/en-us/research/publication/mmctagent-multi-modal-critical-thinking-agent-framework-for-complex-visual-reasoning/","published":"2024-06-19","authors":["Somnath Kumar","Yash Gadhia","Tanuja Ganu","Akshay Nambi"],"abstract":"Recent advancements in Multi-modal Large Language Models (MLLMs) have significantly improved their performance in tasks combining vision and language. However, challenges persist in detailed multi-modal understanding, comprehension of complex tasks, and reasoning over multi-modal information. This paper introduces MMCTAgent, a novel multi-modal critical thinking agent framework designed to address the inherent limitations of current MLLMs in complex visual reasoning tasks. Inspired by human cognitive processes and critical thinking, MMCTAgent iteratively analyzes multi-modal information, decomposes queries, plans strategies, and dynamically evolves its reasoning. 
Additionally, MMCTAgent incorporates critical thinking elements such as verification of final answers and self-reflection through a novel approach that defines a vision-based critic and identifies task-specific evaluation criter...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/raising-the-bar-investigating-the-values-of-large-language-models-via-generative-evolving-testing","title":"Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing","url":"https://www.microsoft.com/en-us/research/publication/raising-the-bar-investigating-the-values-of-large-language-models-via-generative-evolving-testing/","published":"2024-06-19","authors":["Han Jiang","Xiaoyuan Yi","Zhihua Wei","Shu Wang","Xing Xie"],"abstract":"Warning: Contains harmful model outputs. Despite significant advancements, the propensity of Large Language Models (LLMs) to generate harmful and unethical content poses critical challenges. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. 
Although numerous benchmarks have been constructed to assess social bias, toxicity, and ethical issues in LLMs, those static benchmarks suffer from evaluation chronoeffect, in which, as models rapidly evolve, existing benchmarks may leak into training data or become saturated, overestimating ever-developing LLMs. To tackle this problem, we propose GETA, a novel generative evolving testing approach based on adaptive testing methods in measurement theory. Unlike traditional adaptive testing methods that rely on a static test item pool, GETA probes the underlying moral boundaries of LLMs by dynamically gen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/instruction-pre-training-language-models-are-supervised-multitask-learners","title":"Instruction Pre-Training: Language Models are Supervised Multitask Learners","url":"https://www.microsoft.com/en-us/research/publication/instruction-pre-training-language-models-are-supervised-multitask-learners/","published":"2024-06-19","authors":["Daixuan Cheng","Yuxian Gu","Shaohan Huang","Junyu Bi","Minlie Huang","Furu Wei"],"abstract":"Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). 
However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage trends towards better generalization. In this paper, we explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augments massive raw corpora with instruction-response pairs to pre-train LMs. The instruction-response pairs are generated by an efficient instruction synthesizer built on open-source models. In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training. In pre-training from scratch, Instruction Pre-Training not only consistently enhances pre-trained base models but also benefits more from further instruction tuni...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:153","title":"SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words","url":"https://seed.bytedance.com/en/research/sd-eval-a-benchmark-dataset-for-spoken-dialogue-understanding-beyond-words","published":"2024-06-19","authors":["Junyi Ao","Yuancheng Wang","Xiaohai Tian","Dekun Chen","Jun Zhang","Lu Lu","Yuxuan Wang","Haizhou Li","Zhizheng Wu"],"abstract":"Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. 
This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, including speech. Although these models can be adept at recognizing and analyzing speech, they often fall short of generating appropriate responses. We argue that this is due to the lack of principles on task definition and model development, which requires open-source datasets and metrics suitable for model evaluation. To bridge the gap, we present SD-Eval, a benchmark dataset aimed at multidimensional evaluation of spoken dialogue understanding and generation. SD-Eval focuses on paralinguistic and env...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Speech&Audio","Speech","NeurIPS 2024","LLM"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/florence-2-advancing-a-unified-representation-for-a-variety-of-vision-tasks","title":"Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks","url":"https://www.microsoft.com/en-us/research/publication/florence-2-advancing-a-unified-representation-for-a-variety-of-vision-tasks/","published":"2024-06-19","authors":["Bin Xiao","Haiping Wu","Weijian Xu","Xiyang Dai","Houdong Hu","Yumao Lu","Michael Zeng","Ce Liu","Lu Yuan"],"abstract":"We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety 
of computer vision and vision-language tasks. While existing large vision models excel in transfer learning, they struggle to perform a diversity of tasks with simple instructions, a capability that implies handling the complexity of various spatial hierarchy and semantic granularity. Florence-2 was designed to take text-prompt as task instructions and generate desirable results in text forms, whether it be captioning, object detection, grounding or segmentation. This multi-task learning setup demands large-scale, high-quality annotated data. To this end, we co-developed FLD-5B that consists of 5.4 billion comprehensive visual annotations on 126 million images, using an iterative strategy of automated image annotation and model refinement. We adopted a sequence-to-sequence s...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4401163976","title":"A Novel Multimodal Human Activity Recognition based on Self-Attention Mechanism","url":"https://doi.org/10.1109/bmsb62888.2024.10608274","published":"2024-06-19","authors":["Xue Ding","Yaowen Mei","Bowen Cai","Yuetian Zhou","Jinyang Yu","Weiliang Xie","Ting Jiang"],"abstract":"Multimodal human activity recognition attracts wide attention in human-computer interaction. 
However, in the collected multimodal signals, not all modal signals contain useful feature information; some irrelevant and redundant information may negatively impact the model’s performance, reducing the accuracy of activity recognition. This paper designs a self-attention mechanism-based multimodal fusion network for combining $\\mathbf{W i}$ Fi signals and image streams based on video signals. The self-attention mechanism possesses the capability to capture spatiotemporal local features within multimodal signals. It dynamically learns the weights of different modalities, assigning higher weights to relatively important modalities. This process effectively fuses features extracted from individual modalities, resulting in a more comprehensive feature set. Through extensive experiments, we evalua...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/bmsb62888.2024.10608274","openalex_id":"https://openalex.org/W4401163976","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beijing University of Posts and Telecommunications","China Telecom","China Telecom (China)"],"concepts":[{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.7361471056938171},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7137479186058044},{"id":"https://openalex.org/C121687571","display_name":"Activity recognition","score":0.48068365454673767},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3908497393131256},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.38598430156707764},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.0},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rocov2-radiology-objects-in-context-version-2-an-updated-multimodal-image-dataset","title":"ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset","url":"https://www.microsoft.com/en-us/research/publication/rocov2-radiology-objects-in-context-version-2-an-updated-multimodal-image-dataset/","published":"2024-06-18","authors":["Johannes Rückert","Louise Bloch","Raphael Brüngel","Ahmad Idrissi-Yaghir","Henning Schäfer","Cynthia S. Schmidt","Sven Koitka","Obioma Pelka","Asma Ben Abacha","Alba García Seco de Herrera","Henning Müller","Peter A. Horn"],"abstract":"Automated medical image analysis systems often require large amounts of training data with high quality labels, which are difficult and time consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018, and adds 35,705 new images added to PMC since 2018. It further provides manually curated concepts for imaging modalities with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. 
The dataset is suitable for training image annotation models based on image-caption pairs, or for multi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1038/s41597-024-03496-6","openalex_id":"https://openalex.org/W4397028377","cited_by_count":42,"quality_score":102,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer vision","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft","Dortmund University of Applied Sciences and Arts","Essen University Hospital","HES-SO University of Applied Sciences and Arts Western Switzerland","Institut für Medizinische Informatik, Biometrie und Epidemiologie","Microsoft (United States)","University of Essex"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:lli14ho6r2v99d7ihkuqp9hy","title":"Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection","url":"https://machinelearning.apple.com/research/llm-fusion-low-rank","published":"2024-06-18","authors":["Shruti Palaskar","Oggi Rudovic","Sameer Dharur","Florian Pesce","Gautam Krishna","Aswin Sivaraman","Jack Berkowitz","Ahmed Hussen Abdelaziz","Saurabh Adya","Ahmed Tewfik"],"abstract":"Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. 
To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2406.12793","title":"ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools","url":"https://huggingface.co/papers/2406.12793","published":"2024-06-18","authors":["Team GLM","Aohan Zeng","Bin Xu","Bowen Wang","Chenhui Zhang","Da Yin","Diego Rojas","Guanyu Feng","Hanlin Zhao","Hanyu Lai","Hao Yu","Hongning Wang"],"abstract":"We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained on ten trillions of tokens mostly in Chinese and English, along with a small set of corpus from 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. 
Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in inst...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-rags-to-rich-parameters-probing-how-language-models-utilize-external-knowledge-over-parametric-information-for-factual-queries","title":"From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries","url":"https://www.microsoft.com/en-us/research/publication/from-rags-to-rich-parameters-probing-how-language-models-utilize-external-knowledge-over-parametric-information-for-factual-queries/","published":"2024-06-17","authors":["Hitesh Wadhwa","Rahul Seetharaman","Somyaa Aggarwal","Reshmi Ghosh","Samyadeep Basu","Soundararajan Srinivasan","Wenlong Zhao","Shreyas Chaudhari","Ehsan Aghazadeh","Reshmi Ghosh"],"abstract":"Retrieval Augmented Generation (RAG) enriches the ability of language models to reason using external context to augment responses for a given user prompt. This approach has risen in popularity due to practical applications in various applications of language models in search, question/answering, and chat-bots. However, the exact nature of how this approach works isn't clearly understood. 
In this paper, we mechanistically examine the RAG pipeline to highlight that language models take shortcut and have a strong bias towards utilizing only the context information to answer the question, while relying minimally on their parametric memory. We probe this mechanistic behavior in language models with: (i) Causal Mediation Analysis to show that the parametric memory is minimally utilized when answering a question and (ii) Attention Contributions and Knockouts to show that the last token residua...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Article (Journal)","Artificial intelligence","Human language technologies","Search and information retrieval","Technology for emerging markets","Computer science","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/explaining-clips-performance-disparities-on-data-from-blind-low-vision-users","title":"Explaining CLIP's performance disparities on data from blind/low vision users","url":"https://www.microsoft.com/en-us/research/publication/explaining-clips-performance-disparities-on-data-from-blind-low-vision-users/","published":"2024-06-17","authors":["Daniela Massiceti","Camilla Longden","Agnieszka Slowik","Samuel Wills","Martin Grayson","Cecily Morrison"],"abstract":"Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). 
Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used LMM likely to underpin many assistive technologies. Testing 25 CLIP variants in a zero-shot classification task, we find that their accuracy is 15 percentage points lower on average for images captured by BLV users than web-crawled images. This disparity stems from CLIP's sensitivities to 1) image content (e.g. not recognizing disability objects as well as other objects); 2) image quality (e.g. not being robust to lighting variation); and 3) text content (e.g. not recognizing objects described by tactile adjectives as well as visual ones). We delve deeper with a textual analysis of three common...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Human-computer interaction","Computer science","Zero shot learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bioclip-a-vision-foundation-model-for-the-tree-of-life","title":"BIOCLIP: A Vision Foundation Model for the Tree of Life","url":"https://www.microsoft.com/en-us/research/publication/bioclip-a-vision-foundation-model-for-the-tree-of-life/","published":"2024-06-17","authors":["Samuel Stevens","Jiaman Wu","Matthew J Thompson","Elizabeth G. Campolongo","Chan Hee Song","David Carlyn","Li Dong","W. Dahdul","Charles Stewart","Tanya Y. 
Berger-Wolf","Wei-Lun Chao","Yu Su"],"abstract":"Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is of timely need. To approach this, we curate and release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BioCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TreeOfLife-10M, namely the abundance and variety of images of plants, animals,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hierarchical-intra-modal-correlation-learning-for-label-free-3d-semantic-segmentation","title":"Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic 
Segmentation","url":"https://www.microsoft.com/en-us/research/publication/hierarchical-intra-modal-correlation-learning-for-label-free-3d-semantic-segmentation/","published":"2024-06-17","authors":["Xin Kang","Lei Chu","Jiahao Li","Xuejin Chen","Yan Lu"],"abstract":"Recent methods for label-free 3D semantic segmentation aim to assist 3D model training by leveraging the open-world recognition ability of pre-trained vision language models. However, these methods usually suffer from inconsistent and noisy pseudo-labels provided by the vision language models. To address this issue, we present a hierarchical intra-modal correlation learning framework that captures visual and geometric correlations in 3D scenes at three levels: intra-set, intra-scene, and inter-scene, to help learn more compact 3D representations. We refine pseudo-labels using intra-set correlations within each geometric consistency set and align features of visually and geometrically similar points using intra-scene and inter-scene correlation learning. We also introduce a feedback mechanism to distill the correlation learning capability into the 3D model. 
Experiments on both indoor and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Computer vision","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2406.11931","title":"DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence","url":"https://huggingface.co/papers/2406.11931","published":"2024-06-17","authors":["DeepSeek"],"abstract":"We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. 
In standard benchmark evaluations, DeepSeek-Coder-V2 achieves...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","deepseek-ai","language model"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W4402916263","title":"Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation","url":"https://doi.org/10.1109/cvprw63382.2024.00197","published":"2024-06-17","authors":["Mathis Petrovich","Or Litany","Umar Iqbal","Michael J. Black","Gül Varol","Xue Bin Peng","Davis Rempe"],"abstract":"Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To address this, we introduce the new problem of timeline control for text-driven motion synthesis, which provides an intuitive, yet fine-grained, input interface for users. Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap. This enables specifying the exact timings of each action and composing multiple actions in sequence or at overlapping intervals. 
To generate composite animations from a multi-track timeline,...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw63382.2024.00197","openalex_id":"https://openalex.org/W4402916263","cited_by_count":25,"quality_score":62,"matched_keywords":[],"author_affiliations":["Max Planck Institute for Intelligent Systems","Nvidia (United States)","Technion – Israel Institute of Technology","Université Gustave Eiffel","École nationale des ponts et chaussées","Laboratoire d'Informatique Gaspard-Monge","Simon Fraser University","Université Bourgogne Franche-Comté"],"concepts":[{"id":"https://openalex.org/C4438859","display_name":"Timeline","score":0.882610559463501},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6760708093643188},{"id":"https://openalex.org/C89992363","display_name":"Track (disk drive)","score":0.6310746669769287},{"id":"https://openalex.org/C145565327","display_name":"Motion control","score":0.47320741415023804},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.47170257568359375},{"id":"https://openalex.org/C2986578859","display_name":"Human motion","score":0.42055776715278625},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3997972309589386},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37916308641433716}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":25}},{"id":"apple:rterjevnvb7yy589u664i3zc","title":"Synthetic Query Generation using Large Language Models for Virtual Assistants","url":"https://machinelearning.apple.com/research/synthetic-query-gen-llm","published":"2024-06-17","authors":["Sonal Sannigrahi","Thiago Fraga da Silva","Youssef 
Oualil","Christophe Van Gysel"],"abstract":"This paper was accepted in the Industry Track at SIGIR 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3626772.3661355","openalex_id":"https://openalex.org/W4399597618","cited_by_count":8,"quality_score":60,"matched_keywords":[],"author_affiliations":["Apple","Apple (Germany)","Apple (United States)","Instituto Superior Técnico"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4402916267","title":"Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters","url":"https://doi.org/10.1109/cvprw63382.2024.00322","published":"2024-06-17","authors":["Isaac Corley","Caleb Robinson","Rahul Dodhia","Juan Lavista Ferres","Peyman Najafirad"],"abstract":"Research in self-supervised learning (SSL) with natural images has progressed rapidly in recent years and is now increasingly being applied to and benchmarked with datasets containing remotely sensed imagery. A common benchmark case is to evaluate SSL pre-trained model embeddings on datasets of remotely sensed imagery with small patch sizes, e.g., 32 × 32 pixels, whereas standard SSL pre-training takes place with larger patch sizes, e.g., 224 × 224. Furthermore, pre-training methods tend to use different image normalization preprocessing steps depending on the dataset. 
In this paper, we show, across seven satellite and aerial imagery datasets of varying resolution, that by simply following the preprocessing steps used in pre-training (precisely, image sizing and normalization methods), one can achieve significant performance improvements when evaluating the extracted features on downstre...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw63382.2024.00322","openalex_id":"https://openalex.org/W4402916267","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","The University of Texas at San Antonio"],"concepts":[{"id":"https://openalex.org/C56281022","display_name":"Resizing","score":0.8452509045600891},{"id":"https://openalex.org/C136886441","display_name":"Normalization (sociology)","score":0.7326372265815735},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.646624743938446},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3503209054470062},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.11460268497467041},{"id":"https://openalex.org/C19165224","display_name":"Anthropology","score":0.0},{"id":"https://openalex.org/C105639569","display_name":"Economic policy","score":0.0},{"id":"https://openalex.org/C2910001868","display_name":"European union","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"openalex:W4402916763","title":"Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models","url":"https://doi.org/10.1109/cvprw63382.2024.00230","published":"2024-06-17","authors":["Feipeng Ma","Yizhou Zhou","Yueyi Zhang","Siying Wu","Zheyu Zhang","Zilong He","Fengyun 
Rao","Xiaoyan Sun"],"abstract":"Inspired by the remarkable progress achieved by recent Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) take LLMs as their brains, and have achieved surprising results in many downstream tasks by training on a large amount of task-specific data. However, when faced with complex tasks that require the collaboration of multiple capabilities, existing MLLMs recollect training data and retrain the model, ignoring the systematic utilization of LLMs and their possessed capabilities learned in downstream tasks. Inspired by the way humans tackle complex questions, in this paper, we propose a novel framework called Task Navigator. In our framework, LLMs act as navigators to chart a viable path for solving complex tasks and guide MLLMs through the process step by step. Specifically, LLMs iteratively break down sub-problems and refine them to be more reasonable and answerable,...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvprw63382.2024.00230","openalex_id":"https://openalex.org/W4402916763","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Institute of Art","National Science Center","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8124674558639526},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7127981185913086},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.47654587030410767},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.44858384132385254},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.41245341300964355},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39271068572998047},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.10394850373268127},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.07948276400566101}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"arxiv:2406.11263","title":"Understanding the Collapse of LLMs in Model Editing","url":"https://huggingface.co/papers/2406.11263","published":"2024-06-17","authors":["Wanli Yang","Fei Sun","Jiajun Tan","Xinyu Ma","Du Su","Dawei Yin","Huawei Shen"],"abstract":"Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that contribute to the collapse: i) inconsistent handling of prefixed and unprefixed keys in the parameter update equation may result in very small denominators, causing excessively large parameter updates; ii) the subject of collapse cases is usually the first token, whose unprefixed key distribution significantly differs from the prefixed key distribution in autoregressive transformers, causing the aforementioned issue to materialize. 
To validate our findings, we propose a simple yet effective approac...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"arxiv:2406.11704","title":"Nemotron-4 340B Technical Report","url":"https://huggingface.co/papers/2406.11704","published":"2024-06-17","authors":["Nvidia","Bo Adler","Niket Agarwal","Ashwath Aithal","Dong H. Anh","Pallab Bhattacharya","Annika Brundyn","Jared Casper","Bryan Catanzaro","Sharon Clay","Jonathan Cohen","Sirshak Das"],"abstract":"We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. 
To further support open research and facilitat...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interpreting-user-requests-in-the-context-of-natural-language-standing-instructions","title":"Interpreting User Requests in the Context of Natural Language Standing Instructions","url":"https://www.microsoft.com/en-us/research/publication/interpreting-user-requests-in-the-context-of-natural-language-standing-instructions/","published":"2024-06-16","authors":["Nikita Moghe","Patrick Xia","Jacob Andreas","Jason Eisner","Ben Van Durme","Harsh Jhamtani"],"abstract":"Users of natural language interfaces, generally powered by Large Language Models (LLMs),often must repeat their preferences each time they make a similar request. We describe an approach to LLM-based dialogue modeling in which persistent user constraints and preferences -- collectively termed standing instructions -- as additional context for such interfaces. For example, when a user states\"I'm hungry\", a previously expressed preference for Persian food can be automatically added to the LLM prompt, influencing the search for relevant restaurants. We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains, where each dialogue is paired with a user profile (a set of users specific standing instructions) and corresponding structured representations (API calls). 
A key challenge in NLSI is to identify which subset of the standing instructions is appli...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","preference","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4402781025","title":"ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification","url":"https://doi.org/10.1109/cvpr52733.2024.01069","published":"2024-06-16","authors":["Jiangbo Shi","Chen Li","Tieliang Gong","Yefeng Zheng","Huazhu Fu"],"abstract":"Multiple instance learning (MIL)-based framework has become the mainstream for processing the whole slide image (WSI) with giga-pixel size and hierarchical image context in digital pathology. However, these methods heavily depend on a substantial number of bag-level labels and solely learn from the original slides, which are easily affected by variations in data distribution. Recently, vision language model (VLM)-based methods introduced the language prior by pre-training on large-scale pathological image-text pairs. However, the previous text prompt lacks the consideration of pathological prior knowledge, there-fore does not substantially boost the model's performance. Moreover, the collection of such pairs and the pre-training process are very time-consuming and source-intensive. 
To solve the above problems, we propose a dual-scale vision-language multiple instance learning (ViLa-MIL)....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01069","openalex_id":"https://openalex.org/W4402781025","cited_by_count":38,"quality_score":75,"matched_keywords":["LLM","language model"],"author_affiliations":["Agency for Science, Technology and Research","Institute of High Performance Computing","Tencent (China)","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7017116546630859},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6327126026153564},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical number)","score":0.6296062469482422},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5740470886230469},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5678577423095703},{"id":"https://openalex.org/C75294576","display_name":"Contextual image classification","score":0.5113658308982849},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.42738738656044006},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4147469997406006}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":38}},{"id":"openalex:W4402727444","title":"SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking","url":"https://doi.org/10.1109/cvpr52733.2024.02507","published":"2024-06-16","authors":["Xiaojun Hou","Jiazheng Xing","Yijie Qian","Yaowei Guo","Shuo Xin","Junhao Chen","Kai 
Tang","Rui Wang","Zhengkai Jiang","Liang Liu","Yong Liu"],"abstract":"Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and lacked generalized representation due to the scarcity of multimodal data. Therefore, recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However, the modality gap limits pre-trained knowledge recall, and the dominance of the RGB modality persists, preventing the full utilization of information from other modalities. To address these issues, we propose a novel symmetric multimodal tracking framework called SDSTrack. We introduce lightweight adaptation for efficient fine-tuning, which directly transfers the feature extraction ability from RGB to other domains with a small number of trainable parameters and integrates multimodal features in a bal...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02507","openalex_id":"https://openalex.org/W4402727444","cited_by_count":77,"quality_score":75,"matched_keywords":["efficient","distillation"],"author_affiliations":["Huzhou University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.7302269339561462},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6656678318977356},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5604814887046814},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5585981011390686},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.5444780588150024},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.5380185842514038},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5321534872055054},{"id":"https://openalex.org/C2775936607","display_name":"Tracking (education)","score":0.4934445917606354}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":77}},{"id":"openalex:W4402703076","title":"PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding","url":"https://doi.org/10.1109/cvpr52733.2024.00825","published":"2024-06-16","authors":["Zhen Li","Mingdeng Cao","Xintao Wang","Zhongang Qi","Ming‐Ming Cheng","Ying Shan"],"abstract":"Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing per-sonalized generation methods cannot simultaneously sat-isfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embed-ding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encap-sulate the characteristics of the same input ID comprehen-sively, but also accommodate the characteristics of differ-ent IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. 
Be-sides, to drive the training of our....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00825","openalex_id":"https://openalex.org/W4402703076","cited_by_count":103,"quality_score":75,"matched_keywords":["personalized","efficient"],"author_affiliations":["Nankai University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7232623100280762},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.7149875164031982},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.47441738843917847},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.29882699251174927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":103}},{"id":"openalex:W4402753874","title":"CogAgent: A Visual Language Model for GUI Agents","url":"https://doi.org/10.1109/cvpr52733.2024.01354","published":"2024-06-16","authors":["Wenyi Hong","Weihan Wang","Qingsong Lv","Jiazheng Xu","Wenmeng Yu","Junhui Ji","Yan Wang","Zihan Wang","Yuxiao Dong","Ming Ding","Jie Tang"],"abstract":"People are spending an enormous amount of time on dig-ital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens. Large language models (LLMs) such as ChatGPT can assist people in tasks like writing emails, but struggle to understand and interact with GUIs, thus limiting their potential to increase automation levels. In this paper, we introduce CogAgent, an 18-billion-parameter visual language model (VLM) specializing in GUI understanding and navigation. 
By utilizing both low-resolution and high-resolution image encoders, CogAgent supports input at a resolution of 1120 × 1120, enabling it to recognize tiny page elements and text. As a generalist visual language model, CogAgent achieves the state of the art on five text-rich and four general VQA benchmarks, including VQAv2, OK-VQA, TextVQA, ST-VQA, ChartQA, InfoVQA, DocVQA, MM-Vet, and POPE. CogAgent,...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01354","openalex_id":"https://openalex.org/W4402753874","cited_by_count":126,"quality_score":75,"matched_keywords":["LLM","language model"],"author_affiliations":["Tsinghua University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.808350145816803},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6157408356666565},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.46697068214416504},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3828444480895996},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3265524208545685}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":126}},{"id":"openalex:W4402716047","title":"YOLO-World: Real-Time Open-Vocabulary Object Detection","url":"https://doi.org/10.1109/cvpr52733.2024.01599","published":"2024-06-16","authors":["Tianheng Cheng","Lin Song","Yixiao Ge","Wenyu Liu","Xinggang Wang","Ying Shan"],"abstract":"The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools.
However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both acc...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01599","openalex_id":"https://openalex.org/W4402716047","cited_by_count":516,"quality_score":71,"matched_keywords":["efficient"],"author_affiliations":["Huazhong University of Science and Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7555388808250427},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.6076176762580872},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.549267590045929},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5342599153518677},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47760722041130066},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.16122940182685852},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.11211615800857544},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":516}},{"id":"openalex:W4402753969","title":"ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions","url":"https://doi.org/10.1109/cvpr52733.2024.00525","published":"2024-06-16","authors":["Chunlong Xia","Xinliang Wang","Feng Lv","Xin Hao","Yifeng Shi"],"abstract":"Although Vision Transformer (ViT) has achieved significant success in computer vision, it does not perform well in dense prediction tasks due to the lack of inner-patch information interaction and the limited diversity of feature scale. Most existing studies are devoted to designing vision-specific transformers to solve the above problems, which introduce additional pre-training costs. Therefore, we present a plain, pre-training-free, and feature-enhanced ViT back-bone with Convolutional Multi-scale feature interaction, named ViT-CoMer, which facilitates bidirectional interaction between CNN and transformer. 
Compared to the state-of-the-art, ViT-CoMer has the following advantages: (1) We inject spatial pyramid multi-receptive field convolutional features into the ViT architecture, which effectively alleviates the problems of limited local information interaction and single-feature repres...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00525","openalex_id":"https://openalex.org/W4402753969","cited_by_count":109,"quality_score":71,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.683384358882904},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6135214567184448},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5896464586257935},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5021345615386963},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4894053041934967},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.48174285888671875},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.4380190372467041},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.43769699335098267}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":109}},{"id":"openalex:W4402716381","title":"SEED-Bench: Benchmarking Multimodal Large Language Models","url":"https://doi.org/10.1109/cvpr52733.2024.01263","published":"2024-06-16","authors":["Bohao Li","Yuying Ge","Yixiao Ge","Guangzhi Wang","Rui Wang","Ruimao Zhang","Ying 
Shan"],"abstract":"Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3). However, existing MLLM benchmarks remain limited to assessing only models' comprehension ability of single image-text inputs, failing to keep up with the strides made in MLLMs. A comprehensive benchmark is imperative for investigating the progress and uncovering the limitations of current MLLMs. In this work, we categorize the capabilities of MLLMs into hierarchical levels from L0 to L4 based on....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01263","openalex_id":"https://openalex.org/W4402716381","cited_by_count":100,"quality_score":71,"matched_keywords":["efficient"],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.8006609678268433},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6905812621116638},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.532813310623169},{"id":"https://openalex.org/C154945302","display_name":"Artificial
intelligence","score":0.43323689699172974},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.059057652950286865},{"id":"https://openalex.org/C162853370","display_name":"Marketing","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":100}},{"id":"openalex:W4402727014","title":"MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding","url":"https://doi.org/10.1109/cvpr52733.2024.02061","published":"2024-06-16","authors":["Xu Cao","Tong Zhou","Yunsheng Ma","Wenqian Ye","Can Cui","Tang Kun","Zhipeng Cao","Kaizhao Liang","Ziran Wang","James M. Rehg","Chao Zheng"],"abstract":"Vision-language generative AI has demonstrated re-markable promise for empowering cross-modal scene understanding of autonomous driving and high-definition (HD) map systems. However, current benchmark datasets lack multi-modal point cloud, image, and language data pairs. Recent approaches utilize visual instruction learning and cross-modal prompt engineering to expand vision-language models into this domain. In this paper, we pro-pose a new vision-language benchmark that can be used to finetune traffic and HD map domain-specific foundation models. Specifically, we annotate and leverage large-scale, broad-coverage traffic and map data extracted from huge HD map annotations, and use CLIP and LLaMA-2 / Vi-cuna to finetune a baseline model with instruction-following data. 
Our experimental results across various algorithms reveal that while visual instruction-tuning large language models (LLM...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02061","openalex_id":"https://openalex.org/W4402727014","cited_by_count":35,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Purdue University West Lafayette","Tencent (China)","Universitas Ratu Samban","University of Illinois Urbana-Champaign","University of Virginia"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6770942211151123},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6485365629196167},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.6482911109924316},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5609879493713379},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5356466770172119},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.21496325731277466},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.19193118810653687}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":35}},{"id":"openalex:W4402782434","title":"HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting","url":"https://doi.org/10.1109/cvpr52733.2024.00635","published":"2024-06-16","authors":["Xian Liu","Xiaohang Zhan","Jiaxiang Tang","Ying Shan","Gang Zeng","Dahua Lin","Xihui Liu","Ziwei Liu"],"abstract":"Realistic 3D human generationfrom text prompts is a de-sirable yet challenging task. 
Existing methods optimize 3D representations like mesh or neural fields via score distil-lation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we pro-pose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with peri-odic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appear-ance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaus-sian densification and pruning process. 2...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00635","openalex_id":"https://openalex.org/W4402782434","cited_by_count":42,"quality_score":71,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7669708728790283},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43059271574020386},{"id":"https://openalex.org/C163716315","display_name":"Gaussian","score":0.41354283690452576},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.40978971123695374},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.09537658095359802},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":42}},{"id":"openalex:W4402728160","title":"Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis","url":"https://doi.org/10.1109/cvpr52733.2024.01393","published":"2024-06-16","authors":["Xin Zhou","Dingkang Liang","Wei Xu","Xingkui Zhu","Yihan Xu","Zhikang Zou","Xiang Bai"],"abstract":"Point cloud analysis has achieved outstanding performance by transferring point cloud pretrained models. However, existing methods for model adaptation usually update all model parameters, i.e., full fine-tuning paradigm, which is inefficient as it relies on high computational costs (e.g., training GPU memory) and massive storage space. In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal tradeoff between task performance and parameter efficiency. To achieve this goal, we freeze the parameters of the default pretrained models and then propose the Dynamic Adapter, which generates a dynamic scale for each token, considering the token significance to the downstream task. We further seamlessly integrate Dynamic Adapter with Prompt Tuning (DAPT) by constructing Internal Prompts, capturing the instance-specific features for interaction. 
Ex...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01393","openalex_id":"https://openalex.org/W4402728160","cited_by_count":25,"quality_score":70,"matched_keywords":["memory","efficient"],"author_affiliations":["Baidu (China)","Huazhong University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.7691425681114197},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6961719989776611},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5291327238082886},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.49647170305252075},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.4911455810070038},{"id":"https://openalex.org/C2776175482","display_name":"Transfer (computing)","score":0.4668498635292053},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4169250726699829},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.3397987484931946}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":25}},{"id":"openalex:W4402727613","title":"SmartEdit: Exploring Complex Instruction-Based Image Editing with Multimodal Large Language Models","url":"https://doi.org/10.1109/cvpr52733.2024.00799","published":"2024-06-16","authors":["Yuzhou Huang","Liangbin Xie","Xintao Wang","Ziyang Yuan","Xiaodong Cun","Yixiao Ge","Jiantao Zhou","Chao Dong","Rui Huang","Ruimao Zhang","Ying Shan"],"abstract":"Current instruction-based image editing methods, such as InstructPix2Pix, often fail to produce satisfactory results in complex scenarios due to their 
dependence on the simple CLIP text encoder in diffusion models. To rectify this, this paper introduces SmartEdit, a novel approach of instruction-based image editing that leverages Multimodal Large Language Models (MLLMs) to enhance its understanding and reasoning capabilities. However, direct integration of these elements still faces challenges in situations requiring complex reasoning. To mitigate this, we propose a Bidirectional Interaction Module (BIM) that enables comprehensive bidirectional information interactions between the input image and the MLLM output. During training, we initially incorporate perception data to boost the perception and understanding capabilities of diffusion models. Subsequently, we demonstrate that a small a...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00799","openalex_id":"https://openalex.org/W4402727613","cited_by_count":44,"quality_score":67,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Shenzhen Institutes of Advanced Technology","Tencent (China)","University of Macau"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8078844547271729},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.48303255438804626},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.44308194518089294},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4165796935558319},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35678887367248535},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.34223827719688416},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3260582685470581},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3244086503982544}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":44}},{"id":"openalex:W4402816858","title":"Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild","url":"https://doi.org/10.1109/cvpr52733.2024.02425","published":"2024-06-16","authors":["Fanghua Yu","Jinjin Gu","Zheyuan Li","Jinfan Hu","Xiangtao Kong","Xintao Wang","Jingwen He","Yu Qiao","Chao Dong"],"abstract":"We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up. Leveraging multi-modal techniques and advanced generative prior, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR provides the capability to restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. 
We also develop a restoration-guided sampling method to suppress the fidelity issue enco...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02425","openalex_id":"https://openalex.org/W4402816858","cited_by_count":85,"quality_score":67,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Hong Kong Polytechnic University","ShangHai JiAi Genetics & IVF Institute","Shanghai Artificial Intelligence Laboratory","Shenzhen Institutes of Advanced Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.838047981262207},{"id":"https://openalex.org/C2777352838","display_name":"Excellence","score":0.6730718612670898},{"id":"https://openalex.org/C106430172","display_name":"Image restoration","score":0.6231690645217896},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5143840909004211},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4034622013568878},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3194640278816223},{"id":"https://openalex.org/C9417928","display_name":"Image processing","score":0.23501646518707275},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1505885124206543}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":85}},{"id":"openalex:W4402733574","title":"SAI3D: Segment any Instance in 3D Scenes","url":"https://doi.org/10.1109/cvpr52733.2024.00317","published":"2024-06-16","authors":["Yingda Yin","Yuzheng Liu","Xiao Yang","Daniel Cohen‐Or","Jingwei Huang","Baoquan Chen"],"abstract":"Advancements in 3D instance segmentation have traditionally 
been tethered to the availability of annotated datasets, limiting their application to a narrow spectrum of object categories. Recent efforts have sought to harness vision-language models like CLIP for open-set semantic reasoning, yet these methods struggle to distinguish between objects of the same categories and rely on specific prompts that are not universally applicable. In this paper, we introduce SAI3D, a novel zero-shot 3D instance segmentation approach that synergistically leverages geometric priors and semantic cues derived from Segment Anything Model (SAM). Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations that are consistent with the multi-view SAM masks. Moreover, we design a hierarchical region-growing algorithm with a dynamic threshold...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00317","openalex_id":"https://openalex.org/W4402733574","cited_by_count":32,"quality_score":67,"matched_keywords":[],"author_affiliations":["Peking University","Tel Aviv University","Tencent (China)","École nationale des ponts et chaussées"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.666575014591217},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5016100406646729},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.443433552980423},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.43163496255874634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":32}},{"id":"openalex:W4402623742","title":"MIGC: Multi-Instance Generation 
Controller for Text-to-Image Synthesis","url":"https://doi.org/10.1109/cvpr52733.2024.00651","published":"2024-06-16","authors":["Dewei Zhou","You Li","Fan Ma","Xiaoting Zhang","Yi Yang"],"abstract":"We present a Multi-Instance Generation (MIG) task, simultaneously generating multiple instances with diverse controls in one image. Given a set of predefined coordinates and their corresponding descriptions, the task is to ensure that generated instances are accurately at the designated locations and that all instances' attributes adhere to their corresponding description. This broadens the scope of current research on Single-instance generation, elevating it to a more versatile and practical dimension. Inspired by the idea of divide and conquer, we introduce an innovative approach named Multi-Instance Generation Controller (MIGC) to address the challenges of the MIG task. Initially, we break down the MIG task into several subtasks, each involving the shading of a single instance. To ensure precise shading for each instance, we introduce an instance enhancement attention mechanism. 
Las...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00651","openalex_id":"https://openalex.org/W4402623742","cited_by_count":52,"quality_score":67,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6751507520675659},{"id":"https://openalex.org/C203479927","display_name":"Controller (irrigation)","score":0.4532330334186554},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.45053738355636597},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4052177369594574},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.36449918150901794},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0},{"id":"https://openalex.org/C6557445","display_name":"Agronomy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":52}},{"id":"openalex:W4402716423","title":"LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning","url":"https://doi.org/10.1109/cvpr52733.2024.02496","published":"2024-06-16","authors":["Sijin Chen","Xin Chen","Chi Zhang","Mingsheng Li","Gang Yu","Hao Fei","Hongyuan Zhu","Jiayuan Fan","Tao Chen"],"abstract":"Recent progress in Large Multimodal Models (LMM) has opened up great possibilities for various applications in the field of human-machine interactions. 
However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud representations of the 3D scene. Existing works seek help from multi-view images by projecting 2D features to 3D space, which inevitably leads to huge computational overhead and performance degradation. In this paper, we present LL3DA, a Large Language 3D Assistant that takes point cloud as the direct input and responds to both text instructions and visual interactions. The additional visual interaction enables LMMs to better comprehend human interactions with the 3D environment and further remove the ambiguities within plain...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02496","openalex_id":"https://openalex.org/W4402716423","cited_by_count":57,"quality_score":67,"matched_keywords":[],"author_affiliations":["Agency for Science, Technology and Research","Fudan University","Institute for Infocomm Research","National University of Singapore","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7676568627357483},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5673345923423767},{"id":"https://openalex.org/C99740376","display_name":"Interactive visual analysis","score":0.4257458448410034},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.32281196117401123},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.28542467951774597},{"id":"https://openalex.org/C59732488","display_name":"Visual 
analytics","score":0.2760390341281891}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":57}},{"id":"openalex:W4402753859","title":"Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection","url":"https://doi.org/10.1109/cvpr52733.2024.01024","published":"2024-06-16","authors":["Huan Liu","Zichang Tan","Chuangchuang Tan","Yunchao Wei","Jingdong Wang","Yao Zhao"],"abstract":"In this paper, we study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods, e.g., GANs and diffusion models. Cutting-edge solutions start to explore the benefits of pre-trained models, and mainly follow the fixed paradigm of solely training an attached classifier, e.g., combining frozen CLIP-ViT with a learnable linear layer in UniFD [43]. However, our analysis shows that such a fixed paradigm is prone to yield detectors with insufficient learning regarding forgery representations. We attribute the key challenge to the lack of forgery adaptation, and present a novel forgery-aware adaptive transformer approach, namely FatFormer. Based on the pre-trained vision-language spaces of CLIP, FatFormer introduces two core designs for the adaption to build generalized forgery representations. 
First, motivated by the fact that b...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01024","openalex_id":"https://openalex.org/W4402753859","cited_by_count":59,"quality_score":67,"matched_keywords":[],"author_affiliations":["Baidu (China)","Beijing Jiaotong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6681221723556519},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5499670505523682},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5085402727127075},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4712042212486267},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.35941094160079956},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.12345141172409058},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.09291410446166992},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.07047915458679199}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":59}},{"id":"openalex:W4402713106","title":"DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models","url":"https://doi.org/10.1109/cvpr52733.2024.00097","published":"2024-06-16","authors":["Yukang Cao","Yan‐Pei Cao","Kai Han","Ying Shan","Kwan-Yee K. Wong"],"abstract":"We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. 
While encouraging results have been reported by recent methods on text-guided 3D common object generation, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. We propose DreamAvatar to tackle this challenge, which utilizes a trainable NeRF for predicting density and color for 3D points and pretrained text-to-image diffusion models for providing 2D self-supervision. Specifically, we leverage the SMPL model to provide shape and pose guidance for the generation. We introduce a dual-observation-space design that involves the joint optimization of a canonical space and a posed space that are related by a learnable deformation field. This facilitates the generation of more complete tex...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00097","openalex_id":"https://openalex.org/W4402713106","cited_by_count":85,"quality_score":67,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C2777365542","display_name":"Avatar","score":0.903731107711792},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6706225872039795},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.6007338166236877},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.40572190284729004},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3795612156391144},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.33914801478385925},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0987439751625061},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":85}},{"id":"openalex:W4402753848","title":"DiffEditor: Boosting Accuracy and Flexibility on Diffusion-Based Image Editing","url":"https://doi.org/10.1109/cvpr52733.2024.00811","published":"2024-06-16","authors":["Chong Mou","Xintao Wang","Jiechong Song","Ying Shan","Jian Zhang"],"abstract":"Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years. Although owning diverse and high-quality generation capabilities, translating these abilities to fine-grained Image editing remains challenging. In this paper, we propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing: (1) in complex scenarios, editing results often lack editing accuracy and exhibit unexpected artifacts; (2) lack of flexibility to harmonize editing operations, e.g., imagine new content. In our solution, we introduce image prompts in fine-grained image editing, cooperating with the text prompt to better describe the editing content. To increase the flexibility while maintaining content consistency, we locally combine stochastic differential equation (SDE) into the ordinary differential equation (ODE) sampling. 
In addition, we incor...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00811","openalex_id":"https://openalex.org/W4402753848","cited_by_count":36,"quality_score":67,"matched_keywords":[],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.8149164319038391},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7354813814163208},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.5793376564979553},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5601097941398621},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.5160703063011169},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4836758077144623},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4037778377532959},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.32273751497268677}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":36}},{"id":"openalex:W4402754290","title":"A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re- Identification","url":"https://doi.org/10.1109/cvpr52733.2024.01642","published":"2024-06-16","authors":["Zexian Yang","Dayan Wu","Chenming Wu","Lin Zheng","Jingzi Gu","Weiping Wang"],"abstract":"Extensive advancements have been made in person ReID through the mining of semantic information. 
Nevertheless, existing methods that utilize semantic-parts from a single image modality do not explicitly achieve this goal. Witnessing the impressive capabilities in multimodal understanding of Vision Language Foundation Model CLIP, a recent two-stage CLIP-based method employs automated prompt engineering to obtain specific textual labels for classifying pedestrians. However, we note that the predefined soft prompts may be inadequate in expressing the entire visual context and struggle to generalize to unseen classes. This paper presents an end-to-end Prompt-driven Semantic Guidance (PromptSG) framework that harnesses the rich semantics inherent in CLIP. Specifically, we guide the model to attend to regions that are semantically faithful to the prompt. To provide personalized language descrip...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01642","openalex_id":"https://openalex.org/W4402754290","cited_by_count":26,"quality_score":67,"matched_keywords":["personalized"],"author_affiliations":["Baidu (China)","Chinese Academy of Sciences","Institute of Information Engineering"],"concepts":[{"id":"https://openalex.org/C2777113093","display_name":"Pedestrian","score":0.6917363405227661},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.6781541705131531},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6053794622421265},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43626198172569275},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4041183590888977},{"id":"https://openalex.org/C22212356","display_name":"Transport 
engineering","score":0.24497297406196594},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.19534450769424438},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":26}},{"id":"openalex:W4402727598","title":"LaRE<sup>2</sup>: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection","url":"https://doi.org/10.1109/cvpr52733.2024.01609","published":"2024-06-16","authors":["Yunpeng Luo","Junlong Du","Ke Yan","Shouhong Ding"],"abstract":"The evolution of Diffusion Models has dramatically improved image generation quality, making it increasingly difficult to differentiate between real and generated images. This development, while impressive, also raises significant privacy and security concerns. In response to this, we propose a novel Latent REconstruction error guided feature REfinement method (LaRE<sup xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">2</sup>) for detecting the diffusion-generated images. We come up with the Latent Reconstruction Error (LaRE), the first reconstruction-error based feature in the latent space for generated image detection. LaRE surpasses existing methods in terms of feature extraction efficiency while preserving crucial cues required to differentiate between the real and the fake. 
To exploit LaRE, we propose an Error-Guided feature REfinement module...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01609","openalex_id":"https://openalex.org/W4402727598","cited_by_count":29,"quality_score":66,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5521706938743591},{"id":"https://openalex.org/C141379421","display_name":"Iterative reconstruction","score":0.5339207649230957},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5157692432403564},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5063410997390747},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4693649113178253},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4670844078063965},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.34613388776779175},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.324576735496521}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":29}},{"id":"arxiv:2402.19014","title":"Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models","url":"http://arxiv.org/abs/2402.19014","published":"2024-06-16","authors":["Xin Li","Yunfei Wu","Xinghua Jiang","Zhihao Guo","Mingming Gong","Haoyu Cao","Yinsong Liu","Deqiang Jiang","Xing Sun"],"abstract":"Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding 
(VDU). Different from conventional vision-language tasks, VDU is specifically concerned with text-rich scenarios containing abundant document elements. Nevertheless, the importance of fine-grained features remains largely unexplored within the community of LVLMs, leading to suboptimal performance in text-rich scenarios. In this paper, we abbreviate it as the fine-grained feature collapse issue. With the aim of filling this gap, we propose a contrastive learning framework, termed Document Object COntrastive learning (DoCo), specifically tailored for the downstream tasks of VDU. DoCo leverages an auxiliary multimodal encoder to obtain the features of document objects and align them to the visual features generated....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01472","openalex_id":"https://openalex.org/W4402716457","cited_by_count":24,"quality_score":61,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7946881055831909},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6159994006156921},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5152738094329834},{"id":"https://openalex.org/C2780878386","display_name":"Visual language","score":0.5143693685531616},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.31720542907714844},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":24}},{"id":"openalex:W4402753983","title":"Learning to Rematch Mismatched Pairs for Robust 
Cross-Modal Retrieval","url":"https://doi.org/10.1109/cvpr52733.2024.02519","published":"2024-06-16","authors":["Haochen Han","Qinghua Zheng","Guang Dai","Minnan Luo","Jingdong Wang"],"abstract":"Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive multimodal data are harvested from the Internet, which inevitably contains Partially Mismatched Pairs (PMPs). Undoubtedly, such semantically irrelevant data will remarkably harm the cross-modal retrieval performance. Previous efforts tend to mitigate this problem by estimating a soft correspondence to down-weight the contribution of PMPs. In this paper, we aim to address this challenge from a new perspective: the potential semantic similarity among unpaired samples makes it possible to excavate useful knowledge from mismatched pairs. To achieve this, we propose L2RM, a general framework based on Optimal Transport (OT) that learns to rematch mismatched pairs. 
In detail, L2RM aims to generate refined alignments by seeking a minimal-cost transport plan a...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02519","openalex_id":"https://openalex.org/W4402753983","cited_by_count":18,"quality_score":59,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","State Grid Corporation of China (China)","Xi'an Jiaotong University"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7378479242324829},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6399540901184082},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48405924439430237},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.12494146823883057},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":18}},{"id":"openalex:W4402727313","title":"Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment","url":"https://doi.org/10.1109/cvpr52733.2024.02451","published":"2024-06-16","authors":["Ziyu Shan","Yujie Zhang","Qi Yang","Haichen Yang","Yiling Xu","Jenq–Neng Hwang","Xiaozhong Xu","Shan Liu"],"abstract":"No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference, which have achieved tremendous improvements due to the utilization of deep neural networks. However, learning-based NR-PCQA methods suffer from the scarcity of labeled data and usually perform suboptimally in terms of generalization. 
To solve the problem, we propose a novel contrastive pre-training framework tailored for PCQA (CoPA), which enables the pre-trained model to learn quality-aware representations from unlabeled data. To obtain anchors in the representation space, we project point clouds with different distortions into images and randomly mix their local patches to form mixed images with multiple distortions. Utilizing the generated anchors, we constrain the pretraining process via a quality-aware contrastive loss fol...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02451","openalex_id":"https://openalex.org/W4402727313","cited_by_count":21,"quality_score":58,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.768810510635376},{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.6497493982315063},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.5394886136054993},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.5325713753700256},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.514754056930542},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.4879305064678192},{"id":"https://openalex.org/C3020001037","display_name":"Quality assessment","score":0.4324919283390045},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3882037103176117}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":21}},{"id":"openalex:W4402715856","title":"OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers","url":"https://doi.org/10.1109/cvpr52733.2024.00053","published":"2024-06-16","authors":["Han Liang","Jiacheng Bao","Ruichi Zhang","Sihan Ren","Yuecheng Xu","Sibei Yang","Xin Chen","Jingyi Yu","Lan Xu"],"abstract":"We have recently seen tremendous progress in realistic text-to-motion generation. Yet, the existing methods often fail or produce implausible motions with unseen text inputs, which limits the applications. In this paper, we present OMG, a novel framework, which enables compelling motion generation from zero-shot open-vocabulary text prompts. Our key idea is to carefully tailor the pretrain-then-finetune paradigm into the text-to-motion generation. At the pre-training stage, our model improves the generation ability by learning the rich out-of-domain inherent motion traits. To this end, we scale up a large unconditional diffusion model up to 1B parameters, so as to utilize the massive unlabeled motion data up to over 20M motion instances. 
At the subsequent fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information, through a trainable...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00053","openalex_id":"https://openalex.org/W4402715856","cited_by_count":19,"quality_score":56,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7304158806800842},{"id":"https://openalex.org/C104114177","display_name":"Motion (physics)","score":0.6194924116134644},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.5636177062988281},{"id":"https://openalex.org/C145565327","display_name":"Motion control","score":0.5358527898788452},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4634622633457184},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4177328050136566},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.17488950490951538},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.11828950047492981}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"openalex:W4402727492","title":"HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion","url":"https://doi.org/10.1109/cvpr52733.2024.00181","published":"2024-06-16","authors":["Jingbo Zhang","Xiaoyu Li","Qi Zhang","Yan‐Pei Cao","Ying Shan","Jing Liao"],"abstract":"Generating a 3D human model from a single reference image is challenging because it requires inferring textures and geometries in invisible views while maintaining consistency 
with the reference image. Previous methods utilizing 3D generative models are limited by the availability of 3D training data. Optimization-based methods that lift text-to-image diffusion models to 3D generation often fail to preserve the texture details of the reference image, resulting in inconsistent appearances in different views. In this paper, we propose HumanRef, a 3D human generation framework from a single-view input. To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces a novel method called reference-guided score distillation sampling (Ref-SDS), which effectively incorporates image guidance into the generation process. Furthermore, we introduce region...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00181","openalex_id":"https://openalex.org/W4402727492","cited_by_count":15,"quality_score":56,"matched_keywords":["distillation"],"author_affiliations":["City University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5978763699531555},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4329400062561035},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.41946959495544434},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4102303385734558},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3772641718387604},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.07824906706809998},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":15}},{"id":"openalex:W4402753581","title":"Aligning and Prompting Everything All at Once for Universal Visual Perception","url":"https://doi.org/10.1109/cvpr52733.2024.01253","published":"2024-06-16","authors":["Yunhang Shen","Chaoyou Fu","Peixian Chen","Mengdan Zhang","Ke Li","Xing Sun","Yunsheng Wu","Shaohui Lin","Rongrong Ji"],"abstract":"Vision foundation models have been explored recently to build general-purpose vision systems. However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding. Another line of work that focuses on pixel-level tasks often encounters a large annotation gap of things and stuff, and suffers from mutual interference between foreground-object and background-class segmentation. In stark contrast to the prevailing methods, we present APE, a universal visual perception model for aligning and prompting everything all at once in an image to perform diverse tasks, i.e., detection, segmentation, and grounding, as an instance-level sentence-object matching paradigm. 
Specifically, APE advances the convergence of detection and grounding by reformulating...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01253","openalex_id":"https://openalex.org/W4402753581","cited_by_count":19,"quality_score":56,"matched_keywords":[],"author_affiliations":["East China Normal University","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.6819576025009155},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5539864897727966},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.49930477142333984},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.4258612394332886},{"id":"https://openalex.org/C178253425","display_name":"Visual perception","score":0.4160420596599579},{"id":"https://openalex.org/C180747234","display_name":"Cognitive psychology","score":0.37424761056900024},{"id":"https://openalex.org/C107038049","display_name":"Aesthetics","score":0.37395238876342773},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.36606043577194214}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"openalex:W4402727055","title":"OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation","url":"https://doi.org/10.1109/cvpr52733.2024.01542","published":"2024-06-16","authors":["Ganlong Zhao","Guanbin Li","Weikai Chen","Yizhou Yu"],"abstract":"Recent advances in Iterative Vision-and-Language Navigation (IVLN) introduce a more meaningful and practical paradigm of VLN by maintaining the 
agent's memory across tours of scenes. Although the long-term memory aligns better with the persistent nature of the VLN task, it poses more challenges on how to utilize the highly unstructured navigation memory with extremely sparse supervision. Towards this end, we propose OVER-NAV, which aims to go over and beyond the current arts of IVLN techniques. In particular, we propose to incorporate LLMs and open-vocabulary detectors to distill key information and establish correspondence between multi-modal signals. Such a mechanism introduces reliable cross-modal supervision and enables on-the-fly generalization to unseen scenes without the need of extra annotation and re-training. To fully exploit the interpreted navigation data, we further introduc...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01542","openalex_id":"https://openalex.org/W4402727055","cited_by_count":6,"quality_score":55,"matched_keywords":["memory","long-term","agent"],"author_affiliations":["Sun Yat-sen University","Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7669181227684021},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.6074674129486084},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6035662889480591},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5894904136657715},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5695040822029114},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.46647363901138306},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.08375048637390137},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4402727079","title":"VIT-LENS: Towards Omni-modal Representations","url":"https://doi.org/10.1109/cvpr52733.2024.02516","published":"2024-06-16","authors":["Weixian Lei","Yixiao Ge","Kun Yi","Jianfeng Zhang","Difei Gao","Dylan Sun","Yuying Ge","Ying Shan","Mike Zheng Shou"],"abstract":"Aiming to advance AI agents, large foundation models significantly improve reasoning and instruction execution, yet the current focus on vision and language neglects the potential of perceiving diverse modalities in open-world environments. However, the success of data-driven vision and language models is costly or even infeasible to be reproduced for rare modalities. In this paper, we present Vit-lens that facilitates efficient omni-modal representation learning by perceiving novel modalities with a pretrained- ViT and aligning them to a pre-defined space. Specifically, the modality-specific lens is tuned to project any-modal signals to an intermediate embedding space, which are then processed by a strong ViT with pre-trained visual knowledge. The encoded representations are optimized toward aligning with the modal-independent space, pre-defined by off-the-shelf foundation models. 
Vit-l...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02516","openalex_id":"https://openalex.org/W4402727079","cited_by_count":11,"quality_score":52,"matched_keywords":["efficient"],"author_affiliations":["National University of Singapore","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6489422917366028},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5888262987136841},{"id":"https://openalex.org/C15336307","display_name":"Lens (geology)","score":0.5141428112983704},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.32328933477401733},{"id":"https://openalex.org/C120665830","display_name":"Optics","score":0.23075062036514282},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.18175414204597473},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.13673850893974304},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4402753692","title":"Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transfomers","url":"https://doi.org/10.1109/cvpr52733.2024.02306","published":"2024-06-16","authors":["Sheng Yang","Jiawang Bai","Kuofeng Gao","Yong Yang","Yiming Li","Shu–Tao Xia"],"abstract":"Given the power of vision transformers, a new learning paradigm, pre-training and then prompting, makes it more efficient and effective to address downstream visual recognition tasks. 
In this paper, we identify a novel security threat towards such a paradigm from the perspective of backdoor attacks. Specifically, an extra prompt token, called the switch token in this work, can turn the backdoor mode on, i.e., converting a benign model into a backdoored one. Once under the backdoor mode, a specific trigger can force the model to predict a target class. It poses a severe risk to the users of cloud API, since the malicious behavior can not be activated and detected under the benign mode, thus making the attack very stealthy. To attack a pre-trained model, our proposed attack, named SWARM, learns a trigger and prompt tokens including a switch token. They are optimized with the clean loss w...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02306","openalex_id":"https://openalex.org/W4402753692","cited_by_count":7,"quality_score":52,"matched_keywords":["efficient","distillation"],"author_affiliations":["Tencent (China)","Tsinghua University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2781045450","display_name":"Backdoor","score":0.9832480549812317},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6412440538406372},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.6251682043075562},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.42913520336151123},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4179069995880127},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3474672734737396}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":7}},{"id":"openalex:W4402753779","title":"No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation","url":"https://doi.org/10.1109/cvpr52733.2024.00368","published":"2024-06-16","authors":["Xiangyang Zhu","Renrui Zhang","Bowei He","Ziyu Guo","Jiaming Liu","Han Xiao","Chaoyou Fu","Hao Dong","Peng Gao"],"abstract":"To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on ‘seen’ classes, and then evaluate their generalization performance on ‘unseen’ classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on ‘unseen’ classes. To tackle these issues, we propose a Non-parametric Network for few-shot 3D Segmentation, Seg-NN, and its Parametric variant, Seg-PN. Without training, Seg-NN extracts dense representations by hand-crafted filters and achieves comparable performance to existing parametric models. Due to the elimination of pre-training, Seg-NN can alleviate the domain gap issue and save a substantial amount of time. 
Based on Seg-NN, Seg-PN only requires training a lightweight QUEry-Support Transferring....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00368","openalex_id":"https://openalex.org/W4402753779","cited_by_count":14,"quality_score":51,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","City University of Hong Kong","Peking University","Shanghai Artificial Intelligence Laboratory","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.7792083024978638},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7205291986465454},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6603377461433411},{"id":"https://openalex.org/C117251300","display_name":"Parametric statistics","score":0.6060217618942261},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5767526626586914},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5518954396247864},{"id":"https://openalex.org/C2992734406","display_name":"One shot","score":0.47194236516952515},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.10049355030059814}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"openalex:W4402704527","title":"HRVDA: High-Resolution Visual Document Assistant","url":"https://doi.org/10.1109/cvpr52733.2024.01471","published":"2024-06-16","authors":["Chaohu Liu","Kun Yin","Haoyu Cao","Xinghua Jiang","Xin Li","Yinsong Liu","Deqiang Jiang","Xing Sun","Linli Xu"],"abstract":"Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated 
formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document understanding still leaves much room for improvement. This discrepancy is primarily attributed to the fact that visual document understanding is a fine-grained prediction task. In natural scenes, MLLMs typically use low-resolution images, leading to a substantial loss of visual information. Furthermore, general-purpose MLLMs do not excel in handling document-oriented instructions. In this paper, we propose a High-Resolution Visual Document Assistant (HRVDA), which bridges the gap between MLLMs and visual document understanding. This model employs a content filtering mechanism and an instruction filtering module to separately filter out the conte...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01471","openalex_id":"https://openalex.org/W4402704527","cited_by_count":10,"quality_score":51,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7463549971580505},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3930186927318573},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.38543227314949036},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3525822162628174}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4402660108","title":"GenesisTex: Adapting Image Denoising Diffusion to Texture 
Space","url":"https://doi.org/10.1109/cvpr52733.2024.00442","published":"2024-06-16","authors":["Chenjian Gao","Boyan Jiang","Xinghui Li","Yingpeng Zhang","Qian Yu"],"abstract":"We present GenesisTex, a novel method for synthesizing textures for 3D geometries from text descriptions. GenesisTex adapts the pretrained image diffusion model to texture space by texture space sampling. Specifically, we maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint. The sampled latent texture maps are then decoded into a final texture map. During the sampling process, we focus on both global and local consistency across multiple viewpoints: global consistency is achieved through the integration of style consistency mechanisms within the noise prediction network, and low-level consistency is achieved by dynamically aligning latent textures. Finally, we apply reference-based inpainting and img2img on denser views for texture refinement. 
Our approach overcomes the limitations of slow optimization in....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00442","openalex_id":"https://openalex.org/W4402660108","cited_by_count":7,"quality_score":48,"matched_keywords":["distillation"],"author_affiliations":["Beihang University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2983327147","display_name":"Image denoising","score":0.676183819770813},{"id":"https://openalex.org/C2781195486","display_name":"Texture (cosmology)","score":0.6129345297813416},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6112802028656006},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5819131731987},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.575102686882019},{"id":"https://openalex.org/C63099799","display_name":"Image texture","score":0.5742031335830688},{"id":"https://openalex.org/C163294075","display_name":"Noise reduction","score":0.5579572916030884},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5076433420181274}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4402716409","title":"Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs","url":"https://doi.org/10.1109/cvpr52733.2024.01306","published":"2024-06-16","authors":["Song Lin","Yukang Chen","Shuai Yang","Xiaohan Ding","Yixiao Ge","Ying-Cong Chen","Ying Shan"],"abstract":"This paper focuses on the high computational complexity in Large Language Models (LLMs), a significant challenge in both natural language processing (NLP) and multi-modal tasks. 
We propose Low-Rank Approximation for Sparse Attention (LoRA-Sparse), an innovative approach that strategically reduces this complexity. LoRA-Sparse introduces low-rank linear projection layers for sparse attention approximation. It utilizes an order-mimic training methodology, which is crucial for efficiently approximating the self-attention mechanism in LLMs. We empirically show that sparse attention not only reduces computational demands, but also enhances model performance in both NLP and multi-modal tasks. This surprisingly shows that redundant attention in LLMs might be non-beneficial. We extensively validate LoRA-Sparse through rigorous empirical studies in both NLP and multi-modal tasks, demonstratin...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01306","openalex_id":"https://openalex.org/W4402716409","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7175953388214111},{"id":"https://openalex.org/C164226766","display_name":"Rank (graph theory)","score":0.5723701119422913},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5046433210372925},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3249858021736145},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2605106830596924},{"id":"https://openalex.org/C114614502","display_name":"Combinatorics","score":0.1377272605895996},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.10343652963638306},{"id":"https://openalex.org/C188027245","display_name":"Polymer 
chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4402716341","title":"FreeMan: Towards Benchmarking 3D Human Pose Estimation Under Real-World Conditions","url":"https://doi.org/10.1109/cvpr52733.2024.02075","published":"2024-06-16","authors":["Jiong Wang","Fengyu Yang","Bingliang Li","Wenbo Gou","Danqi Yan","Ailing Zeng","Yijun Gao","Junle Wang","Yanqing Jing","Ruimao Zhang"],"abstract":"Estimating the 3D structure of the human body from natural scenes is a fundamental aspect of visual perception. 3D human pose estimation is a vital step in advancing fields like AIGC and human-robot interaction, serving as a crucial technique for understanding and interacting with human actions in real-world settings. However, the current datasets, often collected under single laboratory conditions using complex motion capture equipment and unvarying backgrounds, are insufficient. The absence of datasets on variable conditions is stalling the progress of this crucial task. To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under the real-world conditions. FreeMan was captured by synchronizing 8 smartphones across diverse scenarios. It comprises 11M frames from 8000 sequences, viewed from different perspectives. 
T...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02075","openalex_id":"https://openalex.org/W4402716341","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Institute for Development and Economic Analysis","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C86251818","display_name":"Benchmarking","score":0.9172430634498596},{"id":"https://openalex.org/C52102323","display_name":"Pose","score":0.6737784743309021},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6089320182800293},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.475299209356308},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.46521663665771484},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.33474233746528625},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.20857110619544983},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.12752535939216614}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4402780271","title":"MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos","url":"https://doi.org/10.1109/cvpr52733.2024.02069","published":"2024-06-16","authors":["Jielin Qiu","Jiacheng Zhu","William Han","Aditesh Kumar","Karthik Mittal","Claire Jin","Zhengyuan Yang","Linjie Li","Jianfeng Wang","Ding Zhao","Bo Li","Lijuan Wang"],"abstract":"Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction. 
Nonetheless, numerous limitations exist within existing public MSMO datasets, including insufficient maintenance, data inaccessibility, limited size, and the absence of proper categorization, which pose significant challenges. To address these challenges and provide a comprehensive dataset for this new direction, we have meticulously curated the MMSum dataset. Our new dataset features (1) Human-validated summaries for both video and textual content, providing superior human instruction and labels for mul-timodal learning. (2) Comprehensively and meticulously arranged categorization, spanning 17 principal categories and 170 subcategories to encapsulate a diverse array of real-world scenarios. (3) Benchmark tests performed on the proposed dataset to assess various tasks and methods, includ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02069","openalex_id":"https://openalex.org/W4402780271","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Microsoft (United States)","University of Chicago"],"concepts":[{"id":"https://openalex.org/C160174412","display_name":"Thumbnail","score":0.9633853435516357},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.9482266902923584},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8005921840667725},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4755527675151825},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.37523841857910156},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.18126928806304932}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4402753920","title":"Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception","url":"https://doi.org/10.1109/cvpr52733.2024.01883","published":"2024-06-16","authors":["Haoming Chen","Zhizhong Zhang","Yanyun Qu","Ruixin Zhang","Xin Tan","Yuan Xie"],"abstract":"An effective pre-training framework with universal 3D representations is extremely desired in perceiving large- scale dynamic scenes. However, establishing such an ideal framework that is both task-generic and label-efficient poses a challenge in unifying the representation of the same primitive across diverse scenes. The current contrastive 3D pre-training methods typically follow a frame-level consistency, which focuses on the 2D-3D relationships in each detached image. Such inconsiderate consistency greatly hampers the promising path of reaching an universal pre-training framework: (1) The cross-scene semantic self-conflict, i.e., the intense collision between primitive segments of the same semantics from different scenes; (2) Lacking a globally unified bond that pushes the cross-scene semantic consistency into 3D representation learning. 
To address above challenges, we propose a CSC....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01883","openalex_id":"https://openalex.org/W4402753920","cited_by_count":3,"quality_score":44,"matched_keywords":["efficient"],"author_affiliations":["East China Normal University","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C12725497","display_name":"Baseline (sea)","score":0.8126829862594604},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6376640796661377},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6078732013702393},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6014497876167297},{"id":"https://openalex.org/C26760741","display_name":"Perception","score":0.591204047203064},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4195926785469055},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.1812852919101715},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.10426333546638489}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4402716238","title":"Anchor-based Robust Finetuning of Vision-Language Models","url":"https://doi.org/10.1109/cvpr52733.2024.02542","published":"2024-06-16","authors":["Jinwei Han","Zhiwen Lin","Zhongyisun Sun","Yingguo Gao","Ke Yan","Shouhong Ding","Yuan Gao","Gui-Song Xia"],"abstract":"We aim at finetuning a vision-language model without hurting its out-of-distribution (OOD) generalization. 
We address two types of OOD generalization, i.e., i) domain shift such as natural to sketch images, and ii) zero-shot capability to recognize the category that was not contained in the finetune data. Arguably, the diminished OOD generalization after finetuning stems from the excessively simplified finetuning target, which only provides the class information, such as “a photo of a [CLASS]”. This is distinct from the process in that CLIP was pretrained, where there is abundant text supervision with rich semantic information. Therefore, we propose to compensate for the finetune process using auxiliary supervision with rich semantic information, which acts as anchors to preserve the OOD generalization. Specifically, two types of anchors are elaborated in our method, including i) text-co...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.02542","openalex_id":"https://openalex.org/W4402716238","cited_by_count":2,"quality_score":43,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6872648000717163},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4354054033756256}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4402754161","title":"UniGS: Unified Representation for Image Generation and Segmentation","url":"https://doi.org/10.1109/cvpr52733.2024.00603","published":"2024-06-16","authors":["Lu Qi","Lehan Yang","Weidong Guo","Yu Xu","Bo Du","Varun Jampani","Shuicheng Yan"],"abstract":"This paper introduces a novel unified representation of diffusion models for image 
generation and segmentation. Specifically, we use a colormap to represent entity-level masks, addressing the challenge of varying entity numbers while aligning the representation closely with the image RGB domain. Two novel modules, including the location-aware color palette and progressive dichotomy module, are proposed to support our mask representation. On the one hand, a location-aware palette guarantees the colors' consistency to entities' locations. On the other hand, the progressive dichotomy module can efficiently decode the synthesized colormap to high-quality entity-level masks in a depth-first binary search without knowing the cluster numbers. To tackle the issue of lacking large-scale segmentation training data, we employ an inpainting pipeline and then improve the flexibility of diffusion mode...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00603","openalex_id":"https://openalex.org/W4402754161","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of California, Merced","University of Sydney","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6809332966804504},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.6745703220367432},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6337937116622925},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6171292066574097},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5814300775527954},{"id":"https://openalex.org/C65885262","display_name":"Scale-space 
segmentation","score":0.523625910282135},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5174697041511536},{"id":"https://openalex.org/C25694479","display_name":"Segmentation-based object categorization","score":0.46476659178733826}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4402753568","title":"Generate Subgoal Images Before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts","url":"https://doi.org/10.1109/cvpr52733.2024.01327","published":"2024-06-16","authors":["Fei Ni","Jianye Hao","Shiguang Wu","Longxin Kou","Jiashun Liu","Yan Zheng","Bin Wang","Yuzheng Zhuang"],"abstract":"Robotics agents often struggle to understand and follow the multi-modal prompts in complex manipulation scenes which are challenging to be sufficiently and accurately described by text alone. Moreover, for long-horizon manipulation tasks, the deviation from general instruction tends to accumulate if lack of intermediate guidance from high-level subgoals. For this, we consider can we generate subgoal images before act to enhance the instruction following in long-horizon manipulation with multi-modal prompts? Inspired by the great success of diffusion model in image generation tasks, we propose a novel hierarchical framework named as CoTDiffusion that incorporates diffusion model as a high-level planner to convert the general and multimodal prompts into coherent visual subgoal plans, which further guide the low-level policy model before action execution. 
We design a semantic alignment modu...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.01327","openalex_id":"https://openalex.org/W4402753568","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5823490619659424},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.5817379951477051},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4974048435688019},{"id":"https://openalex.org/C199185054","display_name":"Chain (unit)","score":0.45193326473236084},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.38280975818634033},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.3696678876876831},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.35780835151672363},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2715988755226135}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4402716138","title":"FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-Shot Subject-Driven Generation","url":"https://doi.org/10.1109/cvpr52733.2024.00689","published":"2024-06-16","authors":["Pengchong Qiao","Lei Shang","Chang Liu","Baigui Sun","Xiangyang Ji","Jie Chen"],"abstract":"Recently, subject-driven generation has garnered significant interest due to its ability to personalize text-to-image generation. Typical works focus on learning the new subject's private attributes. 
However, an important fact has not been taken seriously that a subject is not an isolated new concept but should be a specialization of a certain category in the pre-trained model. This results in the subject failing to comprehensively inherit the attributes in its category, causing poor attribute-related generations. In this paper, motivated by object-oriented programming, we model the subject as a derived class whose base class is its semantic category. This modeling enables the subject to inherit public attributes from its category while learning its private attributes from the user-provided example. Specifically, we propose a plug-and-play method, Subject-Derived regularization (SuDe). I...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cvpr52733.2024.00689","openalex_id":"https://openalex.org/W4402716138","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Peking University","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2777855551","display_name":"Subject (documents)","score":0.6971499919891357},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6427575349807739},{"id":"https://openalex.org/C2777212361","display_name":"Class (philosophy)","score":0.6365243196487427},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6157805323600769},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3579134941101074},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.1235893964767456},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0},{"id":"https://openalex.org/C178790620","display_name":"Organic 
chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exposing-the-achilles-heel-evaluating-llms-ability-to-handle-mistakes-in-mathematical-reasoning","title":"Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning","url":"https://www.microsoft.com/en-us/research/publication/exposing-the-achilles-heel-evaluating-llms-ability-to-handle-mistakes-in-mathematical-reasoning/","published":"2024-06-15","authors":["Joykirat Singh","Akshay Nambi","Vibhav Vineet"],"abstract":"Large Language Models (LLMs) have been applied to Math Word Problems (MWPs) with transformative impacts, revolutionizing how these complex problems are approached and solved in various domains including educational settings. However, the evaluation of these models often prioritizes final accuracy, overlooking the crucial aspect of reasoning capabilities. This work addresses this gap by focusing on the ability of LLMs to detect and correct reasoning mistakes. We introduce a novel dataset MWP-MISTAKE, incorporating MWPs with both correct and incorrect reasoning steps generated through rule-based methods and smaller language models. Our comprehensive benchmarking reveals significant insights into the strengths and weaknesses of state-of-the-art models, such as GPT-4o, GPT-4, GPT-3.5Turbo, and others. 
We highlight GPT-4o's superior performance in mistake detection and rectification and the p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Unpublished","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4399666302","title":"BrainMass: Advancing Brain Network Analysis for Diagnosis With Large-Scale Self-Supervised Learning","url":"https://doi.org/10.1109/tmi.2024.3414476","published":"2024-06-14","authors":["Yanwu Yang","Chenfei Ye","Guinan Su","Ziyao Zhang","Zhikai Chang","Hairui Chen","Piu Chan","Yue Yu","Ting Ma"],"abstract":"Foundation models pretrained on large-scale datasets via self-supervised learning demonstrate exceptional versatility across various tasks. Due to the heterogeneity and hard-to-collect medical data, this approach is especially beneficial for medical image analysis and neuroscience research, as it streamlines broad downstream tasks without the need for numerous costly annotations. However, there has been limited investigation into brain network foundation models, limiting their adaptability and generalizability for broad neuroscience studies. In this study, we aim to bridge this gap. In particular, 1) we curated a comprehensive dataset by collating images from 30 datasets, which comprises 70,781 samples of 46,686 participants. 
Moreover, we introduce pseudo-functional connectivity (pFC) to further generates millions of augmented brain networks by randomly dropping certain timepoints of the...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmi.2024.3414476","openalex_id":"https://openalex.org/W4399666302","cited_by_count":32,"quality_score":67,"matched_keywords":[],"author_affiliations":["Capital Medical University","Chinese Academy of Sciences","Harbin Institute of Technology","Peng Cheng Laboratory","Shenzhen Institute of Information Technology","Shenzhen Institutes of Advanced Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6419236063957214},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5834196209907532},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5778336524963379},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.47778186202049255},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":32}},{"id":"official:b555e65fd18842e4","title":"ViP: A Differentially Private Foundation Model for Computer Vision","url":"https://ai.meta.com/research/publications/vip-a-differentially-private-foundation-model-for-computer-vision/","published":"2024-06-14","authors":["Yaodong Yu","Maziar Sanjabi","Yi Ma","Kamalika Chaudhuri","Chuan Guo"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Core Machine Learning"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=13"}},{"id":"apple:g8l2yuiyxkonh2ti3emps6h2","title":"Time Sensitive Knowledge Editing through Efficient Finetuning","url":"https://machinelearning.apple.com/research/time-sensitive-finetuning","published":"2024-06-14","authors":["Hugh Ge","Ali Mousavi","Edouard Grave","Armand Joulin","Kun Qian","Benjamin Han","Mostafa Arefiyan Khalilabad","Yunyao Li"],"abstract":"Large Language Models (LLMs) have demonstrated impressive capability in different tasks and are bringing transformative changes to many domains. However, keeping the knowledge in LLMs up-to-date remains a challenge once pretraining is complete. It is thus essential to design effective methods to both update obsolete knowledge and induce new knowledge into LLMs. 
Existing locate-and-edit knowledge editing (KE) method suffers from two limitations....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:38bc89aa4b12416c","title":"Decomposed evaluations of geographic disparities in text-to-image models","url":"https://ai.meta.com/research/publications/decomposed-evaluations-of-geographic-disparities-in-text-to-image-models/","published":"2024-06-14","authors":["Abhishek Sureddy","Dishant Padalia","Nandhinee Periyakaruppa","Oindrila Saha","Adina Williams","Adriana Romero Soriano","Megan Richards","Polina Kirichenko","Melissa Hall"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=12"}},{"id":"apple:d48adb0xs9vx7a29f9ia9mjc","title":"Server-side Rescoring of Spoken Entity-centric Knowledge Queries for Virtual Assistants","url":"https://machinelearning.apple.com/research/server-side-rescoring","published":"2024-06-14","authors":["Darien Zhang","Sashank Gondala","Thiago Fraga da Silva","Christophe Van Gysel"],"abstract":"On-device Virtual Assistants 
powered by Automated Speech Recognition (ASR) require effective knowledge integration for the challenging entity-rich query recognition. In this paper, we conduct an empirical study of modeling strategies for server-side rescoring of spoken information domain queries using various categories of Language Models (N-Gram word Language Models, sub-word neural LMs). We investigate the combination of on-device and...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/s10772-024-10102-y","openalex_id":"https://openalex.org/W4399368038","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple","Apple (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4399687862","title":"Design and evaluation of a global workspace agent embodied in a realistic multimodal environment","url":"https://doi.org/10.3389/fncom.2024.1352685","published":"2024-06-14","authors":["Rousslan Fernand Julien Dossa","Kai Arulkumaran","Arthur Juliani","Shuntaro Sasai","Ryota Kanai"],"abstract":"As the apparent intelligence of artificial neural networks (ANNs) advances, they are increasingly likened to the functional networks and information processing capabilities of the human brain. Such comparisons have typically focused on particular modalities, such as vision or language. The next frontier is to use the latest advances in ANNs to design and investigate scalable models of higher-level cognitive processes, such as conscious information access, which have historically lacked concrete and specific hypotheses for scientific evaluation. 
In this work, we propose and then empirically assess an embodied agent with a structure based on global workspace theory (GWT) as specified in the recently proposed \"indicator properties\" of consciousness. In contrast to prior works on GWT which utilized single modalities, our agent is trained to navigate 3D environments based on realistic audiovi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3389/fncom.2024.1352685","openalex_id":"https://openalex.org/W4399687862","cited_by_count":4,"quality_score":49,"matched_keywords":["memory","agent"],"author_affiliations":["Hoya (Japan)","Microsoft (United States)","Microsoft Research New York City (United States)"],"concepts":[{"id":"https://openalex.org/C58581272","display_name":"Workspace","score":0.9385663866996765},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7966284155845642},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.6714814901351929},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5871566534042358},{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.5716661810874939},{"id":"https://openalex.org/C20854674","display_name":"Cognitive architecture","score":0.5628411769866943},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.5438310503959656},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5044289827346802}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4399660826","title":"Machine learning on longitudinal multi-modal data enables the understanding and prognosis of Alzheimer’s disease 
progression","url":"https://doi.org/10.1016/j.isci.2024.110263","published":"2024-06-14","authors":["Suixia Zhang","Jing Yuan","Yu Sun","Fei Wu","Ziyue Liu","Fei‐Fei Zhai","Yaoyun Zhang","Judith Somekh","Mor Peleg","Yi‐Cheng Zhu","Zhengxing Huang"],"abstract":"WBA). The index of disease-related states provided a remarkable performance in predicting the time to conversion to AD dementia (C-Index: 0.923 ± 0.007). Our model shows potential for promoting the understanding of heterogeneous disease progression and early predicting the conversion time to AD dementia.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.isci.2024.110263","openalex_id":"https://openalex.org/W4399660826","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Alibaba Group (Cayman Islands)","Alibaba Group (China)","Chinese Academy of Medical Sciences & Peking Union Medical College","Peking Union Medical College Hospital","University of Haifa","Xinjiang Medical University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2779134260","display_name":"Disease","score":0.5831576585769653},{"id":"https://openalex.org/C3020672099","display_name":"Longitudinal data","score":0.47917547821998596},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.47387099266052246},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.44578611850738525},{"id":"https://openalex.org/C169760540","display_name":"Neuroscience","score":0.4234508275985718},{"id":"https://openalex.org/C502032728","display_name":"Alzheimer's disease","score":0.4134741425514221},{"id":"https://openalex.org/C188147891","display_name":"Cognitive 
science","score":0.4058288037776947},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.3829898238182068}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/videogui-a-benchmark-for-gui-automation-from-instructional-videos","title":"VideoGUI: A Benchmark for GUI Automation from Instructional Videos","url":"https://www.microsoft.com/en-us/research/publication/videogui-a-benchmark-for-gui-automation-from-instructional-videos/","published":"2024-06-13","authors":["Kevin Qinghong Lin","Linjie Li","Difei Gao","Qinchen Wu","Mingyi Yan","Zhengyuan Yang","Lijuan Wang","Mike Zheng Shou"],"abstract":"Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as \"Insert a new slide.\" In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks. Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software (e.g., Adobe Photoshop or Stable Diffusion WebUI) and complex activities (e.g., video editing). 
VideoGUI evaluates GUI assistants through a hierarchical process, allowing for identification of the specific levels at which they may fail: (i) high-level planning: reconstruct procedural subtasks from visual conditions without language descriptions; (ii) middle...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:2008baece530e11c","title":"Know When To Stop: A Study of Semantic Drift in Text Generation","url":"https://ai.meta.com/research/publications/know-when-to-stop-a-study-of-semantic-drift-in-text-generation/","published":"2024-06-13","authors":["Ava Spataru","Eric Hambro","Lena Voita","Nicola Cancedda"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=13"}},{"id":"arxiv:2511.23335","title":"Towards Improving Interpretability of Language Model Generation Through a Structured Knowledge Discovery 
Approach","url":"http://arxiv.org/abs/2511.23335","published":"2024-06-13","authors":["Shuqi Liu","Han Wu","Guanzhi Deng","Jianshu Chen","Xiaoyang Wang","Linqi Song"],"abstract":"Knowledge-enhanced text generation aims to enhance the quality of generated text by utilizing internal or external knowledge sources. While language models have demonstrated impressive capabilities in generating coherent and fluent text, the lack of interpretability presents a substantial obstacle. The limited interpretability of generated text significantly impacts its practical usability, particularly in knowledge-enhanced text generation tasks that necessitate reliability and explainability. Existing methods often employ domain-specific knowledge retrievers that are tailored to specific data characteristics, limiting their generalizability to diverse data types and tasks. To overcome this limitation, we directly leverage the two-tier architecture of structured knowledge, consisting of high-level entities and lowlevel knowledge triples, to design our task-agnostic structured knowledge....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jstsp.2024.3414147","openalex_id":"https://openalex.org/W4399619866","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["City University of Hong Kong","Seattle University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2781067378","display_name":"Interpretability","score":0.8963607549667358},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.748294472694397},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48253366351127625},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.4650175869464874},{"id":"https://openalex.org/C120567893","display_name":"Knowledge extraction","score":0.4606608748435974},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.45716914534568787},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3672335147857666},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.34155184030532837}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dataset-and-lessons-learned-from-the-2024-satml-llm-capture-the-flag-competition","title":"Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition","url":"https://www.microsoft.com/en-us/research/publication/dataset-and-lessons-learned-from-the-2024-satml-llm-capture-the-flag-competition/","published":"2024-06-12","authors":["Edoardo Debenedetti","Javier Rando","Daniel Paleka","Fineas Silaghi","Dragos Albastroiu","Niv Cohen","Yuval Lemberg","Reshmi Ghosh","Ahmed Salem","Rui Wen","Giovanni Cherubin","Santiago Zanella-Béguelin"],"abstract":"Large language model systems face important security risks from maliciously crafted messages that aim to overwrite the system's original instructions or leak private data. To study this problem, we organized a capture-the-flag competition at IEEE SaTML 2024, where the flag is a secret string in the LLM system prompt. The competition was organized in two phases. In the first phase, teams developed defenses to prevent the model from leaking the secret. During the second phase, teams were challenged to extract the secrets hidden for defenses proposed by the other teams. This report summarizes the main insights from the competition. 
Notably, we found that all defenses were bypassed at least once, highlighting the difficulty of designing a successful defense and the necessity for additional research to protect LLM systems. To foster future research in this direction, we compiled a dataset wit...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","Machine learning","Security and Privacy","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/aligning-vision-models-with-human-aesthetics-in-retrieval-benchmarks-and-algorithms","title":"Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms","url":"https://www.microsoft.com/en-us/research/publication/aligning-vision-models-with-human-aesthetics-in-retrieval-benchmarks-and-algorithms/","published":"2024-06-12","authors":["Miaosen Zhang","Yixuan Wei","Zhen Xing","Yifei Ma","Zuxuan Wu","Ji Li","Zheng Zhang","Qi Dai","Chong Luo","Xin Geng","Baining Guo"],"abstract":"Modern vision models are trained on very large noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent to output the desired results in certain aspects, e.g., visual aesthetic, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system. 
Advanced retrieval systems usually adopt a cascade of aesthetic models as re-rankers or filters, which are limited to low-level features like saturation and perform poorly when stylistic, cultural or knowledge contexts are involved. We find that utilizing the reasoning ability of large language models (LLMs) to rephrase the search query and extend the aesthetic expectations can make up for this shortcoming. Based on the above findings, we propose a preference-based reinforcement learning method...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","preference","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/position-towards-bidirectional-human-ai-alignment","title":"Position: Towards Bidirectional Human-AI Alignment","url":"https://www.microsoft.com/en-us/research/publication/position-towards-bidirectional-human-ai-alignment/","published":"2024-06-12","authors":["Hua Shen","Tiffany Knearem","Reshmi Ghosh","Kenan Alkiek","Kundan Krishna","Yachuan Liu","Ziqiao Ma","S. Petridis","Yi-Hao Peng","Li Qiwei","Sushrita Rakshit","Chenglei Si"],"abstract":"Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes \"alignment\" limits meaningful progress and cross-disciplinary collaboration. 
In this position paper, we argue that the research community should explicitly define and critically reflect on \"alignment\" to account for the bidirectional and dynamic relationship between humans and AI. Through a systematic review of over 400 papers spanning HCI, NLP, ML, and more, we examine how alignment is currently defined and operationalized. Building on this analysis, we introduce the Bidirectional Human-AI Alignment framework, which not only incorporates traditional efforts to align AI with human values but also introduces the critical, underexplored dimension of aligning humans with AI -- supporting cognitive, behavioral, an...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Social sciences","Computer science","1970-01-01","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4399566004","title":"An empirical study on the robustness of the segment anything model (SAM)","url":"https://doi.org/10.1016/j.patcog.2024.110685","published":"2024-06-12","authors":["Yuqing Wang","Yun Zhao","Linda Petzold"],"abstract":"The Segment Anything Model (SAM) is a foundation model for general image segmentation. Although it exhibits impressive performance predominantly on natural images, understanding its robustness against various image perturbations and domains is critical for real-world applications where such challenges frequently arise. In this study we conduct a comprehensive robustness investigation of SAM under diverse real-world conditions. 
Our experiments encompass a wide range of image perturbations. Our experimental results demonstrate that SAM’s performance generally declines under perturbed images, with varying degrees of vulnerability across different perturbations. By customizing prompting techniques and leveraging domain knowledge based on the unique characteristics of each dataset, the model’s resilience to these perturbations can be enhanced, addressing dataset-specific challenges. This work...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.patcog.2024.110685","openalex_id":"https://openalex.org/W4399566004","cited_by_count":44,"quality_score":67,"matched_keywords":[],"author_affiliations":["Meta (United States)","University of California, Santa Barbara"],"concepts":[{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.718802809715271},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5246163606643677},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3750767707824707},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.1589030921459198},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":44}},{"id":"openalex:W4399575748","title":"Meet generative AI your new shared decision-making assistant","url":"https://doi.org/10.1136/bmjebm-2023-112651","published":"2024-06-12","authors":["Glyn Elwyn","Padhraig Ryan","Daniel Blumkin","William B. 
Weeks"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1136/bmjebm-2023-112651","openalex_id":"https://openalex.org/W4399575748","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Dartmouth College","Dartmouth Institute for Health Policy and Clinical Practice","Eon Corporation (United States)","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6594197154045105},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5732053518295288},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40606823563575745}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4399571027","title":"Advancing DNA Language Models through Motif-Oriented Pre-Training with MoDNA","url":"https://doi.org/10.3390/biomedinformatics4020085","published":"2024-06-12","authors":["Weizhi An","Yuzhi Guo","Yatao Bian","Hehuan Ma","Jinyu Yang","Chunyuan Li","Junzhou Huang"],"abstract":"Acquiring meaningful representations of gene expression is essential for the accurate prediction of downstream regulatory tasks, such as identifying promoters and transcription factor binding sites. However, the current dependency on supervised learning, constrained by the limited availability of labeled genomic data, impedes the ability to develop robust predictive models with broad generalization capabilities. In response, recent advancements have pivoted towards the application of self-supervised training for DNA sequence modeling, enabling the adaptation of pre-trained genomic representations to a variety of downstream tasks. 
Departing from the straightforward application of masked language learning techniques to DNA sequences, approaches such as MoDNA enrich genome language modeling with prior biological knowledge. In this study, we advance DNA language models by utilizing the Motif...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/biomedinformatics4020085","openalex_id":"https://openalex.org/W4399571027","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Tencent (China)","The University of Texas at Arlington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6661725640296936},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5430492758750916},{"id":"https://openalex.org/C32276052","display_name":"Motif (music)","score":0.5235576033592224},{"id":"https://openalex.org/C2776207758","display_name":"Downstream (manufacturing)","score":0.49967169761657715},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4936140775680542},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4901019036769867},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4751424789428711},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.42215633392333984}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/noise-aware-differentially-private-regression-via-meta-learning","title":"Noise-Aware Differentially Private Regression via 
Meta-Learning","url":"https://www.microsoft.com/en-us/research/publication/noise-aware-differentially-private-regression-via-meta-learning/","published":"2024-06-11","authors":["Ossi Räisä","Stratis Markou","Matthew Ashman","Wessel Bruinsma","Marlon Tobaben","Antti Honkela","Richard Turner"],"abstract":"Many high-stakes applications require machine learning models that protect user privacy and provide well-calibrated, accurate predictions. While Differential Privacy (DP) is the gold standard for protecting user privacy, standard DP mechanisms typically significantly impair performance. One approach to mitigating this issue is pre-training models on simulated data before DP learning on the private data. In this work we go a step further, using simulated data to train a meta-learning model that combines the Convolutional Conditional Neural Process (ConvCNP) with an improved functional DP mechanism of Hall et al. [2013] yielding the DPConvCNP. DPConvCNP learns from simulated data how to map private data to a DP predictive model in one forward pass, and then provides accurate, well-calibrated predictions. 
We compare DPConvCNP with a DP Gaussian Process (GP) baseline with carefully tuned hyp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mmworld-towards-multi-discipline-multi-faceted-world-model-evaluation-in-videos","title":"MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos","url":"https://www.microsoft.com/en-us/research/publication/mmworld-towards-multi-discipline-multi-faceted-world-model-evaluation-in-videos/","published":"2024-06-11","authors":["Xuehai He","Weixi Feng","Kaizhi Zheng","Yujie Lu","Wanrong Zhu","Jiachen Li","Yue Fan","Jianfeng Wang","Linjie Li","Zhengyuan Yang","K. Lin","William Yang Wang"],"abstract":"Multimodal Language Language Models (MLLMs) demonstrate the emerging abilities of \"world models\"-- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multimodal video understanding. 
MMWorld distinguishes itself from previous video understanding benchmarks with two unique advantages: (1) multi-discipline, covering various disciplines that often require domain expertise for comprehensive understanding; (2) multi-faceted reasoning, including explanation, counterfactual thinking, future prediction, etc. MMWorld consists of a human-annotated dataset to evaluate MLLMs with questions about the whole videos and a synthetic dataset to analyze MLLMs within a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Multimodal Large Language Models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:229","title":"Autoregressive Pretraining with Mamba in Vision","url":"https://seed.bytedance.com/en/research/autoregressive-pretraining-with-mamba-in-vision","published":"2024-06-11","authors":["Sucheng Ren","Xianhang Li","Haoqin Tu","Feng Wang","Fangxun Shu","Lei Zhang","Jieru Mei","Linjie Yang","Peng Wang","Heng Wang","Alan Yuille","Cihang Xie"],"abstract":"The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. 
Efficiency-wise, the autoregressive nature can well capitalize on the Mamba's unidirectional recurrent structure, enabling faster overall training speed compared to other training strategies like mask modeling. Performance-wise, autoregressive pretraining equips the Mamba architecture with markedly higher accuracy over its supervised-trained counterparts and, more importantly, successfully unlocks its scaling potential to large and even huge model sizes. For example, with autoregressive pretraining, a base-size Mamba attains 83.2\\% ImageNet accuracy, outperforming its supervised counterpart by 2.0\\%; our...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","ICLR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/motion-consistency-model-accelerating-video-diffusion-with-disentangled-motion-appearance-distillation","title":"Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation","url":"https://www.microsoft.com/en-us/research/publication/motion-consistency-model-accelerating-video-diffusion-with-disentangled-motion-appearance-distillation/","published":"2024-06-10","authors":["Yuanhao Zhai","Kevin Lin","Zhengyuan Yang","Linjie Li","Jianfeng Wang","Chung-Ching Lin","David Doermann","Junsong Yuan","Lijuan Wang"],"abstract":"Image diffusion distillation achieves high-fidelity generation with very few sampling steps. 
However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation while improving frame appearance using abundant high-quality image data. We propose motion consistency model (MCM), a single-stage video diffusion distillation method that disentangles motion and appearance learning. Specifically, MCM includes a video consistency model that distills motion from the video teacher model, and an image discriminator that enhances frame appearance to match high-quality image data. This combination presents two challenges: (1) conflicting frame learning objectives, as v...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","diffusion distillation","1970-01-01","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4399521940","title":"Calibrated Language Models Must Hallucinate","url":"https://doi.org/10.1145/3618260.3649777","published":"2024-06-10","authors":["Adam Tauman Kalai","Santosh Vempala"],"abstract":"Recent language models generate false but plausible-sounding text with surprising frequency. Such “hallucinations” are an obstacle to the usability of language-based AI systems and can harm people who rely upon their outputs. 
This work shows that there is an inherent statistical lower-bound on the rate that pretrained language models hallucinate certain types of facts, having nothing to do with the transformer LM architecture or data quality. For “arbitrary” facts whose veracity cannot be determined from the training data, we show that hallucinations must occur at a certain rate for language models that satisfy a statistical calibration condition appropriate for generative language models. Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training dat...","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3618260.3649777","openalex_id":"https://openalex.org/W4399521940","cited_by_count":55,"quality_score":67,"matched_keywords":[],"author_affiliations":["Georgia Institute of Technology","OpenAI (United States)"],"concepts":[{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.82439124584198},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7209466695785522},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6015478372573853},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48197001218795776},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4333963990211487},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3843016028404236}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":55}},{"id":"openalex:W4399487288","title":"GeckOpt: LLM System Efficiency via Intent-Based Tool 
Selection","url":"https://doi.org/10.1145/3649476.3658784","published":"2024-06-10","authors":["Michael Fore","Simranjit Singh","Dimitrios Stamoulis"],"abstract":"In this preliminary study, we investigate a GPT-driven intent-based reasoning approach to streamline tool selection for large language models (LLMs) aimed at system efficiency. By identifying the intent behind user prompts at runtime, we narrow down the API toolset required for task execution, reducing token consumption by up to 24.6%. Early results on a real-world, massively parallel Copilot platform with over 100 GPT-4-Turbo nodes show cost reductions and potential towards improving LLM-based system efficiency.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3649476.3658784","openalex_id":"https://openalex.org/W4399487288","cited_by_count":5,"quality_score":46,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6865758895874023},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6555072069168091},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.22486558556556702}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4399487315","title":"Generalizable and Relation Sensitive Netlist Representation for Analog Circuit Design","url":"https://doi.org/10.1145/3649476.3658793","published":"2024-06-10","authors":["Surya Penmetsa","Fahad Rahman Amik","Zhanguang Zhang","Yingying Fu","Yingxue Zhang","Wulong Liu","Jianye Hao"],"abstract":"The problem of transistor sizing is challenging due to the large design space and complex 
performance trade-offs. Conventional black-box optimization methods, such as Bayesian optimization, cannot leverage past experience. In this paper, we propose a novel state representation for analog circuits to capture the heterogeneity of connections between different components. Component types are encoded through an embedding lookup table, which enables learning transferable knowledge across circuits. Experiments on various designs demonstrate that the agent with transfer learning can reduce runtime significantly in both fine-tuning and zero-shot transfer settings compared to current SOTA baselines. Notably, when learning from scratch, our agent achieves at least 21% higher Figures of Merit (FoM) compared to the SOTA method.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3649476.3658793","openalex_id":"https://openalex.org/W4399487315","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Huawei Technologies (Canada)","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C177650935","display_name":"Netlist","score":0.9839545488357544},{"id":"https://openalex.org/C25343380","display_name":"Relation (database)","score":0.7149447202682495},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6822479963302612},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6753822565078735},{"id":"https://openalex.org/C26490066","display_name":"Circuit extraction","score":0.4407193660736084},{"id":"https://openalex.org/C9390403","display_name":"Computer hardware","score":0.19042670726776123},{"id":"https://openalex.org/C23572009","display_name":"Equivalent circuit","score":0.18171781301498413},{"id":"https://openalex.org/C119599485","display_name":"Electrical 
engineering","score":0.16680976748466492}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cares-a-comprehensive-benchmark-of-trustworthiness-in-medical-vision-language-models","title":"CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models","url":"https://www.microsoft.com/en-us/research/publication/cares-a-comprehensive-benchmark-of-trustworthiness-in-medical-vision-language-models/","published":"2024-06-09","authors":["Peng Xia","Ze Chen","Juanxi Tian","Yangrui Gong","Ruibo Hou","Yue Xu","Zhenbang Wu","Zhiyuan Fan","Yiyang Zhou","Kangyu Zhu","Wenhao Zheng","Zhaoyang Wang"],"abstract":"Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. 
Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and fai...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computer science","Medical Large Vision Language Models","1970-01-01","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/discoveryworld-a-virtual-environment-for-developing-and-evaluating-automated-scientific-discovery-agents","title":"DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents","url":"https://www.microsoft.com/en-us/research/publication/discoveryworld-a-virtual-environment-for-developing-and-evaluating-automated-scientific-discovery-agents/","published":"2024-06-09","authors":["Peter Alexander Jansen","Marc-Alexandre Côté","Tushar Khot","Erin Bransom","Bhavana Dalvi","Bodhisattwa Prasad Majumder","Oyvind Tafjord","Peter Clark"],"abstract":"Automated scientific discovery promises to accelerate progress across scientific domains. However, developing and evaluating an AI agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery. 
DISCOVERYWORLD contains a variety of different challenges, covering topics as diverse as radioisotope dating, rocket science, and proteomics, to encourage development of general discovery skills rather than task-specific solutions. DISCOVERYWORLD itself is an inexpensive, simulated, text-based environment (with optional 2D visual overlay). It includes 120 different challenge tasks, spanning eight topics each with three levels of diffi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","AI agents","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cvqa-culturally-diverse-multilingual-visual-question-answering-benchmark","title":"CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark","url":"https://www.microsoft.com/en-us/research/publication/cvqa-culturally-diverse-multilingual-visual-question-answering-benchmark/","published":"2024-06-09","authors":["David Romero","Chenyang Lyu","Haryo Akbarianto Wibowo","Teresa Lynn","Injy Hamed","Aditya Nanda Kishore","Aishik Mandal","Alina Dragonetti","Artem Abzaliev","A. Tonja","Bontu Fufa Balcha","Chenxi Whitehouse"],"abstract":"Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. 
However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Vision-language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vall-e-2-neural-codec-language-models-are-human-parity-zero-shot-text-to-speech-synthesizers-2","title":"VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers","url":"https://www.microsoft.com/en-us/research/publication/vall-e-2-neural-codec-language-models-are-human-parity-zero-shot-text-to-speech-synthesizers-2/","published":"2024-06-07","authors":["Sanyuan Chen","Shujie Liu","Long Zhou","Yanqing Liu","Xu Tan","Jinyu Li","Sheng Zhao","Yao Qian","Furu Wei"],"abstract":"This paper introduces VALL-E 2, the latest advancement in 
neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in the decoding history. It not only stabilizes the decoding but also circumvents the infinite loop issue. Grouped Code Modeling organizes codec codes into groups to effectively shorten the sequence length, which not only boosts inference speed but also addresses the challenges of long sequence modeling. Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach h...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Audio and Acoustics","Audio and Speech Processing","Computation and Language","Computer science","Engineering","sound"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/offline-training-of-language-model-agents-with-functions-as-learnable-weights","title":"Offline Training of Language Model Agents with Functions as Learnable Weights","url":"https://www.microsoft.com/en-us/research/publication/offline-training-of-language-model-agents-with-functions-as-learnable-weights/","published":"2024-06-07","authors":["Shaokun Zhang","Jieyu 
Zhang","Jiale Liu","Linxin Song","Chi Wang","Ranjay Krishna","Qingyun Wu"],"abstract":"Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions. To facilitate the development of LLM agents, we present a novel paradigm of training LLM agents without modifying the LLM weights, which is particularly useful when the LLMs are difficult or inaccessible for modifications. Inspired by how humans continuously forge tools to adapt to real-world tasks, rather than change our biological structure to fit a static set of tools, we propose to progressively forge agent's functions to better solve the downstream tasks instead of modifying the LLM weights. By treating the functions as learnable agent parameters' and leveraging the fundamental idea of model training in artificial intelligence, we develop AgentOptimizer that employs the LLM to update agents' functi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:b1ef15a97858eab9","title":"Hello Qwen2","url":"https://qwenlm.github.io/blog/qwen2/","published":"2024-06-07","authors":["Alibaba/Qwen"],"abstract":"After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2.
This time, we bring to you: Pretrained and instruction-tuned models of 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B; Having been trained on data in 27 additional languages besides English and Chinese; State-of-the-art performance in a large number of benchmark evaluations; Significantly improved performance in coding and mathematics; Extended context length support up to 128K tokens with Qwen2-7B-Instruct and Qwen2-72B-Instruct.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4399422671","title":"AI red teaming","url":"https://doi.org/10.1117/12.3029883","published":"2024-06-07","authors":["Joe Lucas"],"abstract":"This is far more than just prompt injection and LLM jailbreaks. Apply a systematic lifecycle perspective to frame adversarial testing requirements and tactics for artificial intelligence (AI) systems. In this talk, we’ll explore why red teaming AI-enabled systems has some nuance that might not be covered during traditional testing and evaluation.
We’ll discuss how to build AI red teaming capabilities, measure their performance and effectiveness, and explore the future of adversarial testing.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/12.3029883","openalex_id":"https://openalex.org/W4399422671","cited_by_count":0,"quality_score":41,"matched_keywords":["LLM"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5842185616493225},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4598335921764374}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-vectorizer-llm-based-verified-loop-vectorizer","title":"LLM-Vectorizer: LLM-Based Verified Loop Vectorizer","url":"https://www.microsoft.com/en-us/research/publication/llm-vectorizer-llm-based-verified-loop-vectorizer/","published":"2024-06-06","authors":["Jubi Taneja","Avery Laird","Cong Yan","Madan Musuvathi","Shuvendu Lahiri"],"abstract":"Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that demands deep knowledge of specific architecture and compilers. In this paper, we evaluate the potential of large-language models (LLMs) to generate vectorized (Single Instruction Multiple Data) code from scalar programs that process individual array elements. 
We propose a novel finite-state-machine multi-agents based approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high-performance vectorized code with run-ti...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/maira-2-grounded-radiology-report-generation","title":"MAIRA-2: Grounded Radiology Report Generation","url":"https://www.microsoft.com/en-us/research/publication/maira-2-grounded-radiology-report-generation/","published":"2024-06-06","authors":["Shruthi Bannur","Kenza Bouzid","Daniel Coelho de Castro","Anton Schwaighofer","Sam Bond-Taylor","Maximilian Ilse","Fernando Pérez-García","Valentina Salvatelli","Harshita Sharma","Felix Meissen","Mercy Ranjit","Shaury Srivastav"],"abstract":"Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. Here, we extend report generation to include the localisation of individual findings on the image - a task we call grounded report generation. 
Prior work indicates that grounding is important for clarifying image understanding and interpreting AI-generated text. Therefore, grounded reporting stands to improve the utility and transparency of automated report drafting. To enable evaluation of grounded reporting, we propose a novel evaluation framework - RadFact - leveraging the reasoning capabilities of large language models (LLMs). RadFact assesses the factuality of individual generated sentences, as well as correctness of g...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Tech Report","Artificial intelligence","Medical, health and genomics","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/geogen","title":"GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions","url":"https://www.microsoft.com/en-us/research/publication/geogen/","published":"2024-06-06","authors":["Salvatore Esposito","Qingshan Xu","Kacper Kania","Charlie Hewitt","Octave Mariotti","Lohit Petikam","Julien Valentin","Arno Onken","Oisin Mac Aodha"],"abstract":"We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections. Most existing approaches predict volumetric density to render multi-view consistent images. By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained, limiting the quality and utility of the output meshes. 
To address this issue, we propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner. Initially, we reinterpret the volumetric density as a Signed Distance Function (SDF). This allows us to introduce useful priors to generate valid meshes. However, those priors prevent the generative model from learning details, limiting the applicability of the method to real-world scenarios. To alleviate that problem, we make the transformation learnable and constrain the rendered de...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Graphics and multimedia","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:069366d912f8a435","title":"Generalizing an LLM from 8k to 1M Context using Qwen-Agent","url":"https://qwenlm.github.io/blog/qwen-agent-2405/","published":"2024-06-06","authors":["Alibaba/Qwen"],"abstract":"We’ve created an agent using Qwen2 models with an 8k context size to understand documents with 1M tokens, surpassing RAG and native long-context models. 
This agent was also used to generate data for training new long-context Qwen models.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","agent"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-information-storage-and-transfer-in-multi-modal-large-language-models","title":"Understanding Information Storage and Transfer in Multi-modal Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/understanding-information-storage-and-transfer-in-multi-modal-large-language-models/","published":"2024-06-05","authors":["Samyadeep Basu","Martin Grayson","Cecily Morrison","Besmira Nushi","S. Feizi","Daniela Massiceti"],"abstract":"Understanding the mechanisms of information storage and transfer in Transformer-based models is important for driving model understanding progress. Recent work has studied these mechanisms for Large Language Models (LLMs), revealing insights on how information is stored in a model's parameters and how information flows to and from these parameters in response to specific prompts. However, these studies have not yet been extended to Multi-modal Large Language Models (MLLMs). Given their expanding capabilities and real-world use, we start by studying one aspect of these models -- how MLLMs process information in a factual visual question answering task. We use a constraint-based formulation which views a visual question as having a set of visual or textual constraints that the model's generated answer must satisfy to be correct (e.g. 
What movie directed by the director in this photo has wo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Transformer-based models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:dd7577b4fdcab462","title":"An Introduction to Vision-Language Modeling","url":"https://ai.meta.com/research/publications/an-introduction-to-vision-language-modeling/","published":"2024-06-05","authors":["Florian Bordes","Richard Pang","Anurag Ajay","Alexander C. 
Li","Adrien Bardes","Suzanne Petryk","Oscar Mañas","Zhiqiu Lin","Anas Mahmoud","Bargav Jayaraman","Mark Ibrahim","Melissa Hall"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Core Machine Learning"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=13"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/prise-learning-temporal-action-abstractions-as-a-sequence-compression-problem","title":"PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control","url":"https://www.microsoft.com/en-us/research/publication/prise-learning-temporal-action-abstractions-as-a-sequence-compression-problem/","published":"2024-06-04","authors":["Ruijie Zheng","Ching-An Cheng","Hal Daumé III","Furong Huang","Andrey Kolobov"],"abstract":"Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making. In this work, we propose a novel view that treats inducing temporal action abstractions as a sequence compression problem. To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains. We introduce an approach called Primitive Sequence Encoding (PRISE) that combines continuous action quantization with BPE to learn powerful action abstractions. 
We empirically show that high-level skills discovered by PRISE from a multitask set of robotic manipulation demonstrations significantly boost the learning performance of Behavior Cloning on downstream tasks. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Machine learning","Reinforcement learning","Robotics","1970-01-01","LLM","compression","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mitigate-position-bias-in-large-language-models-via-scaling-a-single-dimension","title":"Mitigate Position Bias in Large Language Models via Scaling a Single Dimension","url":"https://www.microsoft.com/en-us/research/publication/mitigate-position-bias-in-large-language-models-via-scaling-a-single-dimension/","published":"2024-06-04","authors":["Yijiong Yu","Huiqiang Jiang","Xufang Luo","Qianhui Wu","Chin-Yew Lin","Dongsheng Li","Yuqing Yang","Yongfeng Huang","Lili Qiu"],"abstract":"Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as\"lost in the middle\", a phenomenon that is especially pronounced in long-context scenarios, which indicates the placement of the key information in different positions of a prompt can significantly affect accuracy. 
This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling this positional hidden states. Experiments on the NaturalQuest...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Unpublished","Artificial intelligence","Human language technologies","LLMs","Natural language processing","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:71","title":"Seed-TTS: A Family of High-Quality Versatile Speech Generation Models","url":"https://seed.bytedance.com/en/research/seed-tts-a-family-of-high-quality-versatile-speech-generation-models","published":"2024-06-04","authors":["Philip Anastassiou","Jiawei Chen","Jitong Chen","Yuanzhe Chen","Zhuo Chen","Ziyi Chen","Jian Cong","Lelai Deng","Chuang Ding","Lu Gao","Mingqing Gong","Peisong Huang"],"abstract":"We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations.
With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoreg...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Speech&Audio","Speech","arXiv","language model","distillation"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/biomedparse-a-biomedical-foundation-model-for-image-parsing-of-everything-everywhere-all-at-once","title":"BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once","url":"https://www.microsoft.com/en-us/research/publication/biomedparse-a-biomedical-foundation-model-for-image-parsing-of-everything-everywhere-all-at-once/","published":"2024-06-04","authors":["Theodore Zhao","Yu Gu","Jianwei Yang","Naoto Usuyama","Ho Hin Lee","Tristan Naumann","Jianfeng Gao","Angela Crabtree","B. Piening","Carlo Bifulco","Mu-Hsin Wei","Hoifung Poon"],"abstract":"Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains.
Holistic image analysis comprises interdependent subtasks such as segmentation, detection, and recognition of relevant objects. Here, we propose BiomedParse, a biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition for 82 object types across 9 imaging modalities. Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting all relevant objects in an image through a text prompt, rather than requiring users to laboriously specify the bounding box for each object. We leveraged readily available natural-language labels or descriptions accompanying those datasets and use GPT-4 to harmonize the noisy, unstructured text information with establ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Computer vision","Computer science","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:vuocbug2ffq9o07e551p5nby","title":"Entity Disambiguation via Fusion Entity Decoding","url":"https://machinelearning.apple.com/research/entity-disambiguation-fusion-decoding","published":"2024-06-04","authors":["Junxiong Wang","Ali Mousavi","Omar Attia","Ronak Pradeep","Saloni Potdar","Alexander M. Rush","Umar Farooq Minhas","Yunyao Li"],"abstract":"Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). 
Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training and inefficient generation. Most importantly,...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sltrain-a-sparse-plus-low-rank-approach-for-parameter-and-memory-efficient-pretraining","title":"SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining","url":"https://www.microsoft.com/en-us/research/publication/sltrain-a-sparse-plus-low-rank-approach-for-parameter-and-memory-efficient-pretraining/","published":"2024-06-03","authors":["Andi Han","Jiaxiang Li","Wei Huang","Mingyi Hong","Akiko Takeda","Pratik Jawanpuria","Bamdev Mishra"],"abstract":"Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for efficient fine-tuning in terms of parameters and memory, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. 
The low-rank component is learned via matrix factorization, while for the sparse component, we employ a simple strategy of uniformly selecting the sparsity support at random and learning only the non-zero entries with the fixed support.....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Language model","1970-01-01","memory","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/leveraging-visual-tokens-for-extended-text-contexts-in-multi-modal-learning","title":"Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning","url":"https://www.microsoft.com/en-us/research/publication/leveraging-visual-tokens-for-extended-text-contexts-in-multi-modal-learning/","published":"2024-06-03","authors":["Alex Jinpeng Wang","Linjie Li","Yiqi Lin","Min Li","Lijuan Wang","Mike Zheng Shou"],"abstract":"Training models with longer in-context lengths is a significant challenge for multimodal model due to substantial GPU memory and computational costs. This exploratory study does not present state-of-the-art models; rather, it introduces an innovative method designed to increase in-context text length in multi-modality large language models (MLLMs) efficiently. We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens. 
This technique significantly reduces GPU memory usage and floating point operations (FLOPs) for both training and inferenceing stage. For instance, our method expands the pre-training in-context text length from 256 to 2048 tokens with nearly same FLOPs for a 56 billion parameter MOE model. Experimental results demonstrate that model trained with VisInContext delivers superior performance on common downstream b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Few Shot Learning","1970-01-01","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/medfuzz-exploring-the-robustness-of-large-language-models-in-medical-question-answering","title":"MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering","url":"https://www.microsoft.com/en-us/research/publication/medfuzz-exploring-the-robustness-of-large-language-models-in-medical-question-answering/","published":"2024-06-03","authors":["Robert Osazuwa Ness","Hayden Helm","Katie Matton","Sheng Zhang","Junaid Bajwa","Carey E. Priebe","Eric Horvitz"],"abstract":"Large language models (LLM) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that the performance generalizes to real-world clinical settings. 
Medical question-answering benchmarks rely on assumptions consistent with quantifying LLM performance but that may not hold in the open world of the clinic. Yet LLMs learn broad knowledge that can help the LLM generalize to practical conditions regardless of unrealistic assumptions in celebrated benchmarks. We seek to quantify how well LLM medical question-answering benchmark performance generalizes when benchmark assumptions are violated. Specifically, we present an adversarial method that we call MedFuzz (for medical fuzzing). MedFuzz attempts to modify benchmark questions in ways aimed at confounding the LLM. We demonstrate the approach by targeting strong assumptions...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:2a9324ad39a3bc9d","title":"CHAI: Clustered Head Attention for Efficient LLM Inference","url":"https://ai.meta.com/research/publications/chai-clustered-head-attention-for-efficient-llm-inference/","published":"2024-06-03","authors":["Saurabh Agarwal","Bilge Acun","Basil Hosmer","Mostafa Elhoushi","Yejin Lee","Carole-Jean Wu"],"abstract":"Official Meta AI publication 
page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Systems Research","LLM","efficient"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=13"}},{"id":"openalex:W4401753997","title":"Harness the Power of Generative AI in Healthcare with Amazon AI/ML Services","url":"https://doi.org/10.1109/ichi61247.2024.00070","published":"2024-06-03","authors":["Sherry Ding","V. Srinivasa Raman"],"abstract":"As a transformative and innovative technology, generative AI enables us to solve really complex problems and re-imagine how we do things. There are big opportunities in how healthcare companies and organizations will use it to transform the whole industry and deliver amazing experience for their customers. As a leading cloud computing company with over twenty years innovation in machine learning (ML), AWS has developed a set of AI/ML services that allow healthcare companies and organizations to unleash the power of generative AI. In this paper, we will introduce AWS generative AI service stack, highlight some commonly used AWS AI/ML services in building generative AI applications for the healthcare field. 
We will discuss architectures of two popular applications in healthcare: chatbot and intelligent document processing (IDP), to showcase how different services work together in generat...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/ichi61247.2024.00070","openalex_id":"https://openalex.org/W4401753997","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C535291247","display_name":"Amazon rainforest","score":0.8466651439666748},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6435828804969788},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.592764675617218},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5061904788017273},{"id":"https://openalex.org/C163258240","display_name":"Power (physics)","score":0.4224094748497009},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41890794038772583},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.11747479438781738},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4399534418","title":"WiP: A Solution for Reducing MLLM-Based Agent Interaction Overhead","url":"https://doi.org/10.1145/3662006.3662062","published":"2024-06-03","authors":["Wenjie Li","Xiaoyang Liu","Zi-yi Zheng","Jishun Wang","Ling Kang","Ming Fu"],"abstract":"Current Multi-modal LLM-based mobile agents are associated with concerns over high inference time and cost. 
We propose to tackle these issues by developing a lightweight UI Transition Graph (UTG) and locally executing automatic tasks. Specifically, we build a lightweight HTML-based UTG on both system-level and third-party applications, enabling the avoidance of computational overhead and laboriousness. Then we simplify the interaction phase with the LLM, and perform a local shortest path search on the UTG after a target option is derived from the LLM. The small-scale experiments demonstrate the benefits of our method.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3662006.3662062","openalex_id":"https://openalex.org/W4399534418","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","agent"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6656655073165894},{"id":"https://openalex.org/C2779960059","display_name":"Overhead (engineering)","score":0.6320207118988037},{"id":"https://openalex.org/C41550386","display_name":"Multi-agent system","score":0.42379823327064514},{"id":"https://openalex.org/C120314980","display_name":"Distributed computing","score":0.38400059938430786},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.15443912148475647},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.08375087380409241}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mediq-question-asking-llms-for-adaptive-and-reliable-clinical-reasoning","title":"MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical 
Reasoning","url":"https://www.microsoft.com/en-us/research/publication/mediq-question-asking-llms-for-adaptive-and-reliable-clinical-reasoning/","published":"2024-06-02","authors":["Shuyue Stella Li","Vidhisha Balachandran","Shangbin Feng","Jonathan Ilgen","Emma Pierson","Pang Wei Koh","Yulia Tsvetkov"],"abstract":"In high-stakes domains like clinical reasoning, AI assistants powered by large language models (LLMs) are yet to be reliable and safe. We identify a key obstacle towards reliability: existing LLMs are trained to answer any question, even with incomplete context in the prompt or insufficient parametric knowledge. We propose to change this paradigm to develop more careful LLMs that ask follow-up questions to gather necessary and sufficient information and respond reliably. We introduce MEDIQ, a framework to simulate realistic clinical interactions, which incorporates a Patient System and an adaptive Expert System. The Patient may provide incomplete information in the beginning; the Expert refrains from making diagnostic decisions when unconfident, and instead elicits missing details from the Patient via follow-up questions. 
To evaluate MEDIQ, we convert MEDQA and CRAFT-MD -- medical benchm...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/transferring-world-models-from-simulation-for-efficient-real-world-finetuning","title":"Rapidly Adapting Policies to the Real World via Simulation-Guided Fine-Tuning","url":"https://www.microsoft.com/en-us/research/publication/transferring-world-models-from-simulation-for-efficient-real-world-finetuning/","published":"2024-06-02","authors":["Patrick Yin","Tyler Westenbroek","Simran Bagaria","Kevin Huang","Ching-An Cheng","Andrey Kolobov","Abhishek Gupta"],"abstract":"Robot learning requires a considerable amount of data to realize it's promise of generalization. However, it can be challenging to actually collect the magnitude of data necessary for generalization entirely in the real world. Simulation can serve as a source of plentiful data with coverage over relevant states and actions, without requiring the burden of human data collection. However, the high-fidelity physics simulators are fundamentally misspecified approximations to reality, making direct zero-shot transfer challenging. This makes real-world finetuning of policies pretrained in simulation an attractive approach to robot learning. 
However, current finetuning methods simply use the simulator to provide a reasonable initialization for real-world learning. We go beyond this paradigm by demonstrating how the task structure extracted from its simulation can be used to effectively guide an...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Machine learning","Robotics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4400645754","title":"Pix2Planning: End-to-End Planning by Vision-language Model for Autonomous Driving on Carla Simulator","url":"https://doi.org/10.1109/iv55156.2024.10588479","published":"2024-06-02","authors":["Xiangru Mu","Tong Qin","Songan Zhang","Chunjing Xu","Ming Yang"],"abstract":"The end-to-end neural network has become a hot topic in recent years. Compared with traditional module-based solutions, the end-to-end paradigm is able to reduce the accumulated error and avoid information loss, so that it earns great attention in autonomous driving tasks. However, the current end-to-end network designs easily lose useful information during training due to the complexity of mapping high-dimensional visual observation to navigation waypoints. Since the future navigation point is reasoned from the former one, the planning task is like a sequence generation task. Inspired by the great power of the neural language model, we propose an end-to-end framework, which transfers the planning task as a language sequence generation task conditioned on pixel inputs. 
The proposed method firstly extracts and transforms the image feature from camera-view to bird-eye-view (BEV). Then the....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iv55156.2024.10588479","openalex_id":"https://openalex.org/W4400645754","cited_by_count":6,"quality_score":47,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7118424773216248},{"id":"https://openalex.org/C44154836","display_name":"Simulation","score":0.5681132674217224},{"id":"https://openalex.org/C2780689630","display_name":"Driving simulator","score":0.4491705894470215},{"id":"https://openalex.org/C74296488","display_name":"End-to-end principle","score":0.44660526514053345},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4083622395992279},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.39147114753723145},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.39102548360824585},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3561909794807434}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/table-gpt-table-fine-tuned-gpt-for-diverse-table-tasks","title":"Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks","url":"https://www.microsoft.com/en-us/research/publication/table-gpt-table-fine-tuned-gpt-for-diverse-table-tasks/","published":"2024-06-01","authors":["Peng Li","Yeye He","Dror Yashar","Weiwei Cui","Song 
Ge","Haidong Zhang","Danielle Rifinski Fainman","Dongmei Zhang","Surajit Chaudhuri"],"abstract":"Language models, such as GPT-3 and ChatGPT, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks, using instruction fine-tuning. However, when probing language models using a range of basic table-understanding tasks, we observe that today's language models are still sub-optimal in many table-related tasks, likely because they are pre-trained predominantly on one-dimensional natural-language texts, whereas relational tables are two-dimensional objects. In this work, we propose a new table fine-tuning'' paradigm, where we continue to train/fine-tune language models like GPT-3.5 and ChatGPT, using diverse table-tasks synthesized from real tables as training data, which is analogous to instruction fine-tuning'', but with the goal of enhancing language models' ability to understand tables and perform table tasks. We show that our resulting Ta...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3654979","openalex_id":"https://openalex.org/W4399175313","cited_by_count":46,"quality_score":98,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","1970-01-01"],"author_affiliations":["Microsoft","Georgia Institute of Technology","Microsoft (Israel)","Microsoft (United States)","Microsoft Research Asia (China)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/im-not-sure-but-examining-the-impact-of-large-language-models-uncertainty-expression-on-user-reliance-and-trust","title":"\"I'm Not Sure, But...\": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust","url":"https://www.microsoft.com/en-us/research/publication/im-not-sure-but-examining-the-impact-of-large-language-models-uncertainty-expression-on-user-reliance-and-trust/","published":"2024-06-01","authors":["Sunnie S. Y. Kim","Q. Vera Liao","Mihaela Vorvoreanu","Stephanie Ballard","Jennifer Wortman Vaughan"],"abstract":"Widely deployed large language models (LLMs) can produce convincing yet incorrect outputs, potentially misleading users who may rely on them as if they were correct. To reduce such overreliance, there have been calls for LLMs to communicate their uncertainty to end users. However, there has been little empirical work examining how users perceive and act upon LLMs' expressions of uncertainty. We explore this question through a large-scale, pre-registered, human-subject experiment (N=404) in which participants answer medical questions with or without access to responses from a fictional LLM-infused search engine. Using both behavioral and self-reported measures, we examine how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. 
We find that first-person expressions (e.g., \" I'm not sure, but...\" ) decrease participants'....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3630106.3658941","openalex_id":"https://openalex.org/W4396802056","cited_by_count":92,"quality_score":98,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM"],"author_affiliations":["Microsoft","Microsoft (Canada)","Microsoft (United States)","Princeton University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-llms-learn-by-teaching-a-preliminary-study","title":"Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study","url":"https://www.microsoft.com/en-us/research/publication/can-llms-learn-by-teaching-a-preliminary-study/","published":"2024-06-01","authors":["Xuefei Ning","Zifu Wang","Shiyao Li","Zinan Lin","Peiran Yao","Tianyu Fu","Matthew B. Blaschko","Guohao Dai","Huazhong Yang","Yu Wang"],"abstract":"Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. However, for humans, teaching not only improves students but also improves teachers. We ask: Can LLMs also learn by teaching (LbT)? If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In this paper, we provide a preliminary exploration of this ambitious agenda. We show that LbT ideas can be incorporated into existing LLM training/prompting pipelines and provide noticeable improvements. 
Specifically, we design three methods, each mimicking one of the three levels of LbT in humans: observing students' feedback, learning from the feedback, and learning iteratively, with the goals of improving answer accuracy without training and improving models' inherent capability with fine-tuni...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language model","large language models","Logical reasoning","Natural language processing","1970-01-01","LLM","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/splitwise-efficient-generative-llm-inference-using-phase-splitting","title":"Splitwise: Efficient generative LLM inference using phase splitting","url":"https://www.microsoft.com/en-us/research/publication/splitwise-efficient-generative-llm-inference-using-phase-splitting/","published":"2024-06-01","authors":["Pratyush Patel","Esha Choukse","Chaojie Zhang","Aashaka Shah","Íñigo Goiri","Saeed Maleki","Ricardo Bianchini"],"abstract":"Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs. Our characterization of LLM inference shows that each inference request undergoes two phases: a compute-intensive prompt computation phase and a memory-intensive token generation phase, each with distinct latency, throughput, memory, and power characteristics. 
Despite state of-the-art batching and scheduling, the token generation phase underutilizes compute resources. Unlike prompt computation, token generation does not need the compute capability of the latest GPUs and can be run with lower power and cost.Based on these insights, we propose Splitwise, a model deployment and scheduling technique that splits the two phases of LLM inference requests on to separate machines. Splitwise enables phase-specific resource management using hardware that is....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/mm.2025.3575361","openalex_id":"https://openalex.org/W4411055324","cited_by_count":2,"quality_score":86,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM","language model","memory","efficient"],"author_affiliations":["Microsoft","Microsoft (United States)","Palo Alto Institute","Palo Alto Research Center","University of Washington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/knowledge-distillation-in-automated-annotation-supervised-text-classification-with-llm-generated-training-labels","title":"Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels","url":"https://www.microsoft.com/en-us/research/publication/knowledge-distillation-in-automated-annotation-supervised-text-classification-with-llm-generated-training-labels/","published":"2024-06-01","authors":["Nick Pangakis","Sam Wolken"],"abstract":"Computational social science (CSS) practitioners often rely on human-labeled 
data to fine-tune supervised text classifiers. We assess the potential for researchers to augment or replace human-generated training data with surrogate training labels from generative large language models (LLMs). We introduce a recommended workflow and test this LLM application by replicating 14 classification tasks and measuring performance. We employ a novel corpus of English-language text classification data sets from recent CSS articles in high-impact journals. Because these data sets are stored in password-protected archives, our analyses are less prone to issues of contamination. For each task, we compare supervised classifiers fine-tuned using GPT-4 labels against classifiers fine-tuned with human annotations and against labels from GPT-4 and Mistral-7B with few-shot in-context learning. Our findings i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Social sciences","1970-01-01","LLM","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flasheval-towards-fast-and-accurate-evaluation-of-text-to-image-diffusion-generative-models","title":"FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models","url":"https://www.microsoft.com/en-us/research/publication/flasheval-towards-fast-and-accurate-evaluation-of-text-to-image-diffusion-generative-models/","published":"2024-06-01","authors":["Lin Zhao","Tianchen Zhao","Zinan 
Lin","Xuefei Ning","Guohao Dai","Huazhong Yang","Yu Wang"],"abstract":"In recent years, there has been significant progress in the development of text-to-image generative models. Evaluating the quality of the generative models is one essential step in the development process. Unfortunately, the evaluation process could consume a significant amount of computational resources, making the required periodic evaluation of model performance (e.g., monitoring training progress) impractical. Therefore, we seek to improve the evaluation efficiency by selecting the representative subset of the text-image dataset. We systematically investigate the design choices, including the section criteria (textural features or image-based metrics) and the selection granularity (prompt-level or set-level). We find that the insights from prior work on subset selection for training data do not generalize to this problem, and we propose FlashEval, an iterative search algorithm tailor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Diffusion models","Generative model","1970-01-01","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/its-like-a-rubber-duck-that-talks-back-understanding-generative-ai-assisted-data-analysis-workflows-through-a-participatory-prompting-study","title":"\"It's like a rubber duck that talks back\": Understanding Generative AI-Assisted Data Analysis Workflows 
through a Participatory Prompting Study","url":"https://www.microsoft.com/en-us/research/publication/its-like-a-rubber-duck-that-talks-back-understanding-generative-ai-assisted-data-analysis-workflows-through-a-participatory-prompting-study/","published":"2024-06-01","authors":["Ian Drosos","Advait Sarkar","Xiaotong Xu","Carina Negreanu","Sean Rintel","Lev Tankelevitch"],"abstract":"Generative AI tools can help users with many tasks. One such task is data analysis, which is notoriously challenging for non-expert end-users due to its expertise requirements, and where AI holds much potential, such as finding relevant data sources, proposing analysis strategies, and writing analysis code. To understand how data analysis workflows can be assisted or impaired by generative AI, we conducted a study (n=15) using Bing Chat via participatory prompting. Participatory prompting is a recently developed methodology in which users and researchers reflect together on tasks through co-engagement with generative AI. In this paper we demonstrate the value of the participatory prompting method. 
We found that generative AI benefits the information foraging and sensemaking loops of data analysis in specific ways, but also introduces its own barriers and challenges, arising from the diff...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Social sciences","Human–computer interaction","Social Science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/solving-data-centric-tasks-using-large-language-models","title":"Solving Data-centric Tasks using Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/solving-data-centric-tasks-using-large-language-models/","published":"2024-06-01","authors":["Shraddha Barke","Christian Poelitz","Carina Negreanu","Ben Zorn","José Cambronero","Andy Gordon","Vu Le","Elnaz Nouri","Nadia Polikarpova","Advait Sarkar","Brian Slininger","Neil Toronto"],"abstract":"Large language models are rapidly replacing help forums like StackOverflow, and are especially helpful to non-professional programmers and end users. These users are often interested in data-centric tasks , like spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including data. But how do we decide how much data and which data to include in the prompt?This paper makes two contributions towards answering this question. 
First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a novel cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and tha...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Natural language processing","software engineering","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/position-what-can-large-language-models-tell-us-about-time-series-analysis","title":"Position: What Can Large Language Models Tell Us about Time Series Analysis","url":"https://www.microsoft.com/en-us/research/publication/position-what-can-large-language-models-tell-us-about-time-series-analysis/","published":"2024-06-01","authors":["Ming Jin","Yifan Zhang","Wei Chen","Kexin Zhang","Yuxuan Liang","Bin Yang","Jindong Wang","Shirui Pan","Qingsong Wen"],"abstract":"Time series analysis is essential for comprehending the complexities inherent in various realworld systems and applications. Although large language models (LLMs) have recently made significant strides, the development of artificial general intelligence (AGI) equipped with time series analysis capabilities remains in its nascent phase. 
Most existing time series models heavily rely on domain knowledge and extensive model tuning, predominantly focusing on prediction tasks. In this paper, we argue that current LLMs have the potential to revolutionize time series analysis, thereby promoting efficient decision-making and advancing towards a more universal form of time series analytical intelligence. Such advancement could unlock a wide range of possibilities, including time series modality switching and question answering. We encourage researchers and practitioners to recognize the potential....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/language-models-can-be-deductive-solvers","title":"Language Models can be Deductive Solvers","url":"https://www.microsoft.com/en-us/research/publication/language-models-can-be-deductive-solvers/","published":"2024-06-01","authors":["Jiazhan Feng","Ruochen Xu","Junheng Hao","Hiteshi Sharma","Yelong Shen","Dongyan Zhao","Weizhu Chen"],"abstract":"Logical reasoning is a fundamental aspect of human intelligence and a key component of tasks like problem-solving and decision-making. Recent advancements have enabled Large Language Models (LLMs) to potentially exhibit reasoning capabilities, but complex logical reasoning remains a challenge. 
The state-of-the-art, solver-augmented language models, use LLMs to parse natural language logical questions into symbolic representations first and then adopt external logical solvers to take in the symbolic representations and output the answers. Despite their impressive performance, any parsing errors will inevitably result in the failure of the execution of external logical solvers and no answer to the logical questions. In this paper, we introduce LoGiPT, a novel language model that directly internalizes and emulates the reasoning processes of logical solvers and avoids parsing errors by learn...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sigma-secure-gpt-inference-with-function-secret-sharing","title":"SIGMA: Secure GPT Inference with Function Secret Sharing","url":"https://www.microsoft.com/en-us/research/publication/sigma-secure-gpt-inference-with-function-secret-sharing/","published":"2024-06-01","authors":["Kanav Gupta","Neha Jawalkar","Ananta Mukherjee","Nishanth Chandran","Divya Gupta","Ashish Panwar","Rahul Sharma"],"abstract":"Secure 2-party computation (2PC) enables secure inference that offers protection for both proprietary machine learning (ML) models and sensitive inputs to them. 
However, the existing secure inference solutions suffer from high latency and communication overheads, particularly for transformers. Function secret sharing (FSS) is a recent paradigm for obtaining efficient 2PC protocols with a preprocessing phase. We provide SIGMA, the first end-to-end system for secure transformer inference based on FSS. By constructing new FSS-based protocols for complex machine learning functionalities, such as Softmax and GeLU, and also accelerating their computation on GPUs, SIGMA improves the latency of secure inference of transformers by 11-19x over the state-of-the-art that uses preprocessing and GPUs. We present the first secure inference of generative pre-trained transformer (GPT) models. In particul...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Security, privacy, and cryptography","Computer science","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/overview-of-the-mediqa-m3g-2024-shared-task-on-multilingual-multimodal-medical-answer-generation","title":"Overview of the MEDIQA-M3G 2024 Shared Task on Multilingual Multimodal Medical Answer Generation","url":"https://www.microsoft.com/en-us/research/publication/overview-of-the-mediqa-m3g-2024-shared-task-on-multilingual-multimodal-medical-answer-generation/","published":"2024-06-01","authors":["Wen-wai Yim","Asma Ben Abacha","Yujuan Fu","Zhaoyi Sun","Fei Xia","Meliha Yetisgen","Martin 
Krallinger"],"abstract":"Remote patient care provides opportunities for expanding medical access, saving healthcare costs, and offering on-demand convenient services. In the MEDIQA-M3G 2024 Shared Task, researchers explored solutions for the specific task of dermatological consumer health visual question answering, where user generated queries and images are used as input and a free-text answer response is generated as output. In this novel challenge, eight teams with a total of 48 submissions were evaluated across three language test sets. In this work, we provide a summary of the dataset, as well as results and approaches. We hope that the insights learned here will inspire future research directions that can lead to technology that deburdens clinical workload and improves care. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:180","title":"Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model","url":"https://www.noahlab.com.hk/en/scientific_research/evolution-of-heuristics-towards-efficient-automatic-algorithm-design-using-large-language-model","published":"2024-06-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICML 24. 
External paper link: https://arxiv.org/pdf/2401.02051","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Industry Intelligence","ICML 24","2024","language model","efficient"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/peekaboo-interactive-video-generation-via-masked-diffusion-2","title":"Peekaboo: Interactive video generation via masked-diffusion","url":"https://www.microsoft.com/en-us/research/publication/peekaboo-interactive-video-generation-via-masked-diffusion-2/","published":"2024-06-01","authors":["Yash Jain","Anshul Nasery","Vibhav Vineet","Harkirat Behl"],"abstract":"Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applications and creativity. In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal control. We present PEEKABOO, a novel masked attention module, which seamlessly integrates with current video generation models offering control without the need for additional training or inference overhead. To facilitate future research, we also introduce a comprehensive benchmark for interactive video generation. This benchmark offers a standardized framework for the community to assess the efficacy of emerging interactive video generation models. 
Our extensive qualitative and quantitative assessme...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","Computer Vision and Pattern Recognition"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/overview-of-the-mediqa-corr-2024-shared-task-on-medical-error-detection-and-correction","title":"Overview of the MEDIQA-CORR 2024 Shared Task on Medical Error Detection and Correction","url":"https://www.microsoft.com/en-us/research/publication/overview-of-the-mediqa-corr-2024-shared-task-on-medical-error-detection-and-correction/","published":"2024-06-01","authors":["Asma Ben Abacha","Wen-wai Yim","Yujuan Fu","Zhaoyi Sun","Fei Xia","Meliha Yetisgen"],"abstract":"Automatic detection and correction of medical errors enables a more rigorous validation of medical documentation as well as clinical notes generated by large language models. Such solutions can ensure the accuracy and medical coherence of clinical texts and enhance patient care and health outcomes. The MEDIQA-CORR 2024 shared task focused on detecting and correcting different types of medical errors in clinical texts. Seventeen teams participated in the shared task and experimented with a broad range of approaches and models. In this paper, we describe the MEDIQA-CORR task, datasets, and the participants’ results and methods. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/is-in-context-learning-in-large-language-models-bayesian-a-martingale-perspective","title":"Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective","url":"https://www.microsoft.com/en-us/research/publication/is-in-context-learning-in-large-language-models-bayesian-a-martingale-perspective/","published":"2024-06-01","authors":["Fabian Falck","Ziyu Wang","Chris Holmes"],"abstract":"In-context learning (ICL) has emerged as a particularly remarkable characteristic of Large Language Models (LLM): given a pretrained LLM and an observed dataset, LLMs can make predictions for new data points from the same distribution without fine-tuning. Numerous works have postulated ICL as approximately Bayesian inference, rendering this a natural hypothesis. In this work, we analyse this hypothesis from a new angle through the martingale property, a fundamental requirement of a Bayesian learning system for exchangeable data. We show that the martingale property is a necessary condition for unambiguous predictions in such scenarios, and enables a principled, decomposed notion of uncertainty vital in trustworthy, safety-critical systems. 
We derive actionable checks with corresponding theory and test statistics which must hold if the martingale property is satisfied. We also examine if....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4403215639","title":"Generative AI as Economic Agents","url":"https://doi.org/10.1145/3699824.3699832","published":"2024-06-01","authors":["Nicole Immorlica","Brendan Lucier","Aleksandrs Slivkins"],"abstract":"Traditionally, AI has been modeled within economics as a technology that impacts payoffs by reducing costs or refining information for human agents. Our position is that, in light of recent advances in generative AI, it is increasingly useful to model AI itself as an economic agent. In our framework, each user is augmented with an AI agent and can consult the AI prior to taking actions in a game. 
The AI agent and the user have potentially different information and preferences over the communication, which can result in equilibria that are qualitatively different than in settings without AI.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3699824.3699832","openalex_id":"https://openalex.org/W4403215639","cited_by_count":11,"quality_score":52,"matched_keywords":["agent"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research New England (United States)","Microsoft Research New York City (United States)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7354899048805237},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.37449222803115845},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3716384768486023}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"official:c602e86b3c8d29a5","title":"Claude Sonnet 3.5 System Card","url":"https://www-cdn.anthropic.com/fed9cc193a14b84131812372d8d5857f8f304c52/Model_Card_Claude_3_Addendum.pdf","published":"2024-06","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude Sonnet 3.5.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude Sonnet 3.5"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic 
system cards page https://www.anthropic.com/system-cards"}},{"id":"official:886929ffc6d3f26c","title":"RegionGPT: Towards Region Understanding Vision Language Model","url":"https://research.nvidia.com/publication/2024-06_regiongpt-towards-region-understanding-vision-language-model","published":"2024-06","authors":["Qiushan Guo","Shalini De Mello","Hongxu Danny Yin","Wonmin Byeon","Ka Chun Cheung","Yizhou Yu","Ping Luo","Sifei Liu"],"abstract":"Official NVIDIA Research publication. CVPR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["CVPR","language model"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=2"}},{"id":"official:317b52c8925cdcd8","title":"Large Language Model (LLM) for Standard Cell Layout Design Optimization","url":"https://research.nvidia.com/publication/2024-06_large-language-model-llm-standard-cell-layout-design-optimization","published":"2024-06","authors":["Chia-Tung (Mark) Ho","Mark Haoxing Ren"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","language model"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page 
https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=2"}},{"id":"official:ac75e04bc4cc6e45","title":"Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling","url":"https://research.nvidia.com/publication/2024-06_motion-i2v-consistent-and-controllable-image-video-generation-explicit-motion","published":"2024-06","authors":["Xiaoyu Shi","Zhaoyang Huang","Fu-Yun Wang","Weikang Bian","Dasong Li","Yi Zhang","Manyuan Zhang","Ka Chun Cheung","Simon See","Hongwei Qin","Jifeng Dai","Hongsheng Li"],"abstract":"Official NVIDIA Research publication. SIGGRAPH","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["SIGGRAPH"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=2"}},{"id":"official:878eb2914d354ed2","title":"An Empirical Study of Mamba-based Language Models","url":"https://research.nvidia.com/publication/2024-06_empirical-study-mamba-based-language-models","published":"2024-06","authors":["Roger Waleffe","Wonmin Byeon","Duncan Riach","Brandon Norick","Vijay Korthikanti","Tri Dao","Albert Gu","Ali Hatamizadeh","Sudhakar Singh","Deepak Narayanan","Garvit Kulshreshtha","Vartika Singh"],"abstract":"Official NVIDIA Research 
publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=2"}},{"id":"openalex:W4399518864","title":"The impact of generative artificial intelligence on socioeconomic inequalities and policy making","url":"https://doi.org/10.1093/pnasnexus/pgae191","published":"2024-05-31","authors":["Valerio Capraro","Austin Lentsch","Daron Acemoğlu","Selin Akgün","Aisel Akhmedova","Ennio Bilancini","Jean‐François Bonnefon","Pablo Brañas‐Garza","Luigi Butera","Karen M. Douglas","Jim A. C. Everett","Gerd Gigerenzer"],"abstract":", it might improve diagnostics and accessibility, but could deepen pre-existing inequalities. In each section, we cover a specific topic, evaluate existing research, identify critical gaps, and recommend research directions, including explicit trade-offs that complicate the derivation of a priori hypotheses. We conclude with a section highlighting the role of policymaking to maximize generative AI's potential to reduce inequalities while mitigating its harmful effects. We discuss strengths and weaknesses of existing policy frameworks in the European Union, the United States, and the United Kingdom, observing that each fails to fully confront the socioeconomic challenges we have identified. We propose several concrete policies that could promote shared prosperity through the advancement of generative AI. 
This article emphasizes the need for interdisciplinary collaborations to understand a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/pnasnexus/pgae191","openalex_id":"https://openalex.org/W4399518864","cited_by_count":263,"quality_score":67,"matched_keywords":[],"author_affiliations":["Aarhus University","Bocconi University","Brigham Young University","Copenhagen Business School","Economic and Social Research Institute","IIT@MIT","IMT School for Advanced Studies Lucca","Massachusetts Institute of Technology","Max Planck Institute for Human Development","Michigan State University","Microsoft (United States)","Monash University","New York University","Norwegian School of Economics","The University of Queensland","Toulouse School of Economics","Universidad Loyola Andalucía","University of Cambridge","University of Kent","University of Klagenfurt","University of Massachusetts Boston","University of Milano-Bicocca","University of Pennsylvania","University of Turin","University of York","Vrije Universiteit Amsterdam","Australian Regenerative Medicine Institute","Center for Interdisciplinary Studies","Centre for Economic Policy Research","Max Planck Society","Microsoft Research New York City (United States)","National Bureau of Economic Research","New York Law School","Trinity College Dublin","Universidad Loyola","University of Modena and Reggio Emilia","University of Oxford"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7552961111068726},{"id":"https://openalex.org/C45555294","display_name":"Inequality","score":0.6461290121078491},{"id":"https://openalex.org/C147077947","display_name":"Socioeconomic status","score":0.6287858486175537},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.32217228412628174},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3161502480506897},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.2725818455219269},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.23506346344947815},{"id":"https://openalex.org/C149923435","display_name":"Demography","score":0.13527706265449524}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":263}},{"id":"apple:ssqkm9ylv7q5r8hg5hu1zlvo","title":"Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation","url":"https://machinelearning.apple.com/research/embedding-pose-graph","published":"2024-05-31","authors":["Hugues Thomas","Jian Zhang"],"abstract":"This paper presents the Embedding Pose Graph (EPG), an innovative method that combines the strengths of foundation models with a simple 3D representation suitable for robotics applications. Addressing the need for efficient spatial understanding in robotics, EPG provides a compact yet powerful approach by attaching foundation model features to the nodes of a pose graph. 
Unlike traditional methods that rely on bulky data formats like voxel grids...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2405.18721","title":"Correctable Landmark Discovery via Large Models for Vision-Language Navigation","url":"http://arxiv.org/abs/2405.18721","published":"2024-05-31","authors":["Bingqian Lin","Yunshuang Nie","Ziming Wei","Yi Zhu","Hang Xu","Shikui Ma","Jianzhuang Liu","Xiaodan Liang"],"abstract":"Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem, by introducing a novel correctable landmark discovery scheme based on two large models ChatGPT and CLIP. 
Specifically, we use ChatGPT to provide rich open-world landmark cooccurrence commonsense, and conduct CLIP-driven landmark discovery based on these commonsense p...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1109/tpami.2024.3407759","openalex_id":"https://openalex.org/W4399198074","cited_by_count":13,"quality_score":54,"matched_keywords":["agent"],"author_affiliations":["Huawei Technologies (China)","Quanta Computer (China)","Shenzhen Institutes of Advanced Technology","Sun Yat-sen University"],"concepts":[{"id":"https://openalex.org/C2780297707","display_name":"Landmark","score":0.9522619247436523},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7678979635238647},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5426573753356934},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.47728338837623596},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.4626380503177643},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.36578959226608276},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.33882877230644226},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.09302827715873718}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4399423300","title":"Speak From Heart: An Emotion-Guided LLM-Based Multimodal Method for Emotional Dialogue Generation","url":"https://doi.org/10.1145/3652583.3658104","published":"2024-05-30","authors":["Chenxiao Liu","Zheyong Xie","Sirui Zhao","Jin Zhou","Tong Xu","Minglei Li","Enhong 
Chen"],"abstract":"Recent advancements in Large Language Models~(LLMs) have greatly enhanced the generation capabilities of dialogue systems. However, progress on emotional expression during dialogues might be still limited, especially when capturing and processing the multimodal cues for emotional expression. Therefore, it is urgent to fully adapt the multimodal understanding ability and transferability of LLMs to enhance the emotional-oriented multimodal processing capabilities. To that end, in this paper, we propose a novel Emotion-Guided Multimodal Dialogue model based on LLM, termed ELMD. Specifically, to enhance the emotional expression ability of LLMs, our ELMD customizes an emotional retrieval module, which mainly provides appropriate response demonstration for LLM in understanding emotional context. Subsequently, a two-stage training strategy is proposed, founded on previous demonstration support,...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3652583.3658104","openalex_id":"https://openalex.org/W4399423300","cited_by_count":20,"quality_score":65,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Huawei Technologies (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6031089425086975},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5755674839019775},{"id":"https://openalex.org/C90559484","display_name":"Expression (computer science)","score":0.5498121976852417},{"id":"https://openalex.org/C143110190","display_name":"Emotional expression","score":0.5384248495101929},{"id":"https://openalex.org/C180747234","display_name":"Cognitive 
psychology","score":0.5029608607292175},{"id":"https://openalex.org/C61272859","display_name":"Transferability","score":0.4605514109134674},{"id":"https://openalex.org/C4441509","display_name":"Multimodal therapy","score":0.4483611583709717},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3901299834251404}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":20}},{"id":"apple:oag12q7d4wtg77n24rhovb9u","title":"CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement","url":"https://machinelearning.apple.com/research/clip-model-zoo-experts","published":"2024-05-30","authors":["Mohammadreza Salehi","Mehrdad Farajtabar","Maxwell Horton","Fartash Faghri","Hadi Pouransari","Raviteja Vemulapalli","Oncel Tuzel","Ali Farhadi","Mohammad Rastegari","Sachin Mehta"],"abstract":"Contrastive language image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities. This paper studies the following question: Can we augment CLIP training with task-specific vision models from model zoos to improve its visual representations? 
Towards this end, we leverage open-source...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4399422997","title":"Multi-modal Entity Alignment via Position-enhanced Multi-label Propagation","url":"https://doi.org/10.1145/3652583.3658085","published":"2024-05-30","authors":["Wei Tang","Yuanyi Wang"],"abstract":"Multi-modal Entity Alignment (MMEA) refers to utilizing multiple modalities such as text, images, videos, etc., to match entities from multiple knowledge graphs. Compared to single-modal entity alignment, multi-modal entity alignment can provide a more comprehensive description of entity semantics and improve matching accuracy. Currently, research efforts are directed towards the development of sophisticated deep learning models, such as graph neural networks, that can effectively capture and integrate the multi-modal features of entities for entity alignment tasks. While these models have shown promising results, they tend to focus on capturing only the local structure of entities, leading to the challenge of subgraph isomorphism. Moreover, the complexity of these models often hinders their scalability. 
To address these limitations, this paper proposes a non-neural, position-enhanced mu...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3652583.3658085","openalex_id":"https://openalex.org/W4399422997","cited_by_count":1,"quality_score":42,"matched_keywords":["long-term"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8363543152809143},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6277996897697449},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5635189414024353},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.5325638055801392},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5162946581840515},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5144476890563965},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.4777059555053711},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3614429235458374}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/parrot-efficient-serving-of-llm-based-applications-with-semantic-variable","title":"Parrot: Efficient Serving of LLM-based Applications with Semantic Variable","url":"https://www.microsoft.com/en-us/research/publication/parrot-efficient-serving-of-llm-based-applications-with-semantic-variable/","published":"2024-05-29","authors":["Chaofan Lin","Zhenhua Han","Chengruidong Zhang","Yuqing Yang","Fan Yang","Chen Chen","Lili 
Qiu"],"abstract":"The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strength of LLM and conventional software. Diverse LLM applications from different tenants could design complex workflows using multiple LLM requests to accomplish one task. However, they have to use the over-simplified request-level API provided by today's public LLM services, losing essential application-level information. Public LLM services have to blindly optimize individual LLM requests, leading to sub-optimal end-to-end performance of LLM applications.This paper introduces Parrot, an LLM service system that focuses on the end-to-end experience of LLM-based applications. Parrot proposes Semantic Variable, a unified abstraction to expose application-level knowledge to public LLM services. A Semantic Variable annotates an input/output....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/autodroid-llm-powered-task-automation-in-android","title":"Autodroid: LLM-Powered Task Automation in Android","url":"https://www.microsoft.com/en-us/research/publication/autodroid-llm-powered-task-automation-in-android/","published":"2024-05-29","authors":["Hao Wen","Yuanchun Li","Guohong Liu","Shanhui Zhao","Tao Yu","Toby 
Jia-Jun Li","Shiqi Jiang","Yunhao Liu","Yaqin Zhang","Yunxin Liu"],"abstract":"Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or endusers. The recent advance of large language models (LLMs) in language understanding and reasoning inspires us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual efforts. The key insight is to combine the commonsense knowledge of LLMs and domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionalit...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Systems and networking","1970-01-01","LLM","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/instruction-guided-visual-masking","title":"Instruction-Guided Visual Masking","url":"https://www.microsoft.com/en-us/research/publication/instruction-guided-visual-masking/","published":"2024-05-29","authors":["Jinliang Zheng","Jianxiong Li","Si Cheng","Yinan 
Zheng","Jiaming Li","Jihao Liu","Yu Liu","Jingjing Liu","Xianyuan Zhan"],"abstract":"Instruction following is crucial in contemporary LLM. However, when extended to multimodal setting, it often suffers from misalignment between specific textual instruction and targeted local region of an image. To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMM and robot model. By constructing visual masks for instruction-irrelevant regions, IVM-enhanced multimodal models can effectively focus on task-relevant image regions to better align with complex instructions. Specifically, we design a visual masking data generation pipeline and create an IVM-Mix-1M dataset with 1 million image-instruction pairs. We further introduce a new learning technique, Discriminator Weighted Supervised Learning (DWSL) for preferential....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/divide-and-conquer-meets-consensus-unleashing-the-power-of-functions-in-code-generation","title":"Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code 
Generation","url":"https://www.microsoft.com/en-us/research/publication/divide-and-conquer-meets-consensus-unleashing-the-power-of-functions-in-code-generation/","published":"2024-05-29","authors":["Jingchang Chen","Hongxuan Tang","Zheng Chu","Qianglong Chen","Zekun Wang","Ming Liu","Bing Qin"],"abstract":"Despite recent progress made by large language models in code generation, they still struggle with programs that meet complex requirements. Recent work utilizes plan-and-solve decomposition to decrease the complexity and leverage self-tests to refine the generated program. Yet, planning deep-inside requirements in advance can be challenging, and the tests need to be accurate to accomplish self-improvement. To this end, we propose FunCoder, a code generation framework incorporating the divide-and-conquer strategy with functional consensus. Specifically, FunCoder recursively branches off sub-functions as smaller goals during code generation, represented by a tree hierarchy. These sub-functions are then composited to attain more complex objectives. Additionally, we designate functions via a consensus formed by identifying similarities in program behavior, mitigating error propagation. 
FunCo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Code generation","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/participation-in-the-age-of-foundation-models","title":"Participation in the age of foundation models","url":"https://www.microsoft.com/en-us/research/publication/participation-in-the-age-of-foundation-models/","published":"2024-05-28","authors":["Harini Suresh","Emily Tseng","Meg Young","Mary L. Gray","Emma Pierson","Karen Levy"],"abstract":"Growing interest and investment in the capabilities of foundation models has positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to marginalized stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context - how do we apply these approaches to foundation models, which are, by design, disconnected from context? Our paper interrogates this question.First, we examine existing attempts at incorporating participation into foundation models. 
We highlight the tension between participation and scale, demonstrating that it is intractable for impacted communities to meaningfully shape a foundation model th...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Social sciences","Computer science","Computers and Society","foundation models","human-computer interaction","Machine learning","1970-01-01","journalism"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/think-before-you-act-decision-transformers-with-internal-working-memory","title":"Think Before You Act: Decision Transformers with Internal Working Memory","url":"https://www.microsoft.com/en-us/research/publication/think-before-you-act-decision-transformers-with-internal-working-memory/","published":"2024-05-28","authors":["Jikun Kang","Romain Laroche","Xingdi Yuan","Adam Trischler","Xuefei Liu","Jie Fu"],"abstract":"Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model’s performance on previous tasks. 
In contrast to LLMs’ implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and meta-world object manipul...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:76","title":"Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment","url":"https://seed.bytedance.com/en/research/seeing-the-image-prioritizing-visual-correlation-by-contrastive-alignment","published":"2024-05-28","authors":["Xin Xiao","Bohong Wu","Jiacong Wang","Chunyuan Li","Xun Zhou","Haoyuan Guo"],"abstract":"Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner. Despite being simple and effective, this method results in sub-optimal cross-modal alignment by over-emphasizing the text tokens that are less correlated with or even contradictory with the input images. In this paper, we advocate for assigning distinct contributions for each text token based on its visual correlation. 
Specifically, we present by contrasting image inputs, the difference in prediction logits on each text token provides strong guidance of visual correlation. We therefore introduce Contrastive ALignment (CAL), a simple yet effective re-weighting strategy that prioritizes training visually correlated tokens. Our experimental results demonstrate that CAL consistently improves different types of VLMs across different resolutions and model sizes on variou...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Multimodal","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bridging-the-gap-dynamic-learning-strategies-for-improving-multilingual-performance-in-llms","title":"Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs","url":"https://www.microsoft.com/en-us/research/publication/bridging-the-gap-dynamic-learning-strategies-for-improving-multilingual-performance-in-llms/","published":"2024-05-28","authors":["Somnath Kumar","Vaibhav Balloli","Mercy Ranjit","Kabir Ahuja","Tanuja Ganu","Sunayana Sitaram","Kalika Bali","Akshay Nambi"],"abstract":"Large language models (LLMs) are at the forefront of transforming numerous domains globally. However, their inclusivity and effectiveness remain limited for non-Latin scripts and low-resource languages. This paper tackles the imperative challenge of enhancing the multilingual performance of LLMs without extensive training or fine-tuning. 
Through systematic investigation and evaluation of diverse languages using popular question-answering (QA) datasets, we present novel techniques that unlock the true potential of LLMs in a polyglot landscape. Our approach encompasses three key strategies that yield significant improvements in multilingual proficiency. First, by meticulously optimizing prompts tailored for polyglot LLMs, we unlock their latent capabilities, resulting in substantial performance boosts across languages. Second, we introduce a new hybrid approach that synergizes LLM Retrieva...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/promptwizard-task-aware-agent-driven-prompt-optimization-framework","title":"PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework","url":"https://www.microsoft.com/en-us/research/publication/promptwizard-task-aware-agent-driven-prompt-optimization-framework/","published":"2024-05-27","authors":["Eshaan Agarwal","Joykirat Singh","Vivek Dani","Raghav Magazine","Tanuja Ganu","Akshay Nambi"],"abstract":"Large language models (LLMs) have transformed AI across diverse domains, with prompting being central to their success in guiding model outputs. However, manual prompt engineering is both labor-intensive and domain-specific, necessitating the need for automated solutions. 
We introduce PromptWizard, a novel, fully automated framework for discrete prompt optimization, utilizing a self-evolving, self-adapting mechanism. Through a feedback-driven critique and synthesis process, PromptWizard achieves an effective balance between exploration and exploitation, iteratively refining both prompt instructions and in-context examples to generate human-readable, task-specific prompts. This guided approach systematically improves prompt quality, resulting in superior performance across 45 tasks. PromptWizard excels even with limited training data, smaller LLMs, and various LLM architectures. Additiona...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Computer science","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4401751457","title":"MR to CT Synthesis Using 3d Latent Diffusion","url":"https://doi.org/10.1109/isbi56570.2024.10635137","published":"2024-05-27","authors":["Austin Tapp","Abhijeet Parida","Can Zhao","Van Lam","Natasha Leporé","Syed Muhammad Anwar","Marius George Linguraru"],"abstract":"Diffusion probabilistic models are recognized for generating realistically appearing synthetic images, but producing 3D medical images remains computationally intensive. Further, latent diffusion for synthesizing medical volumes has focused on generating images of the same modality as training data. 
This study introduces cross-modality synthesis using 3D latent diffusion for generating computed tomography (CT) volumes from magnetic resonance imaging (MRI). We train an autoencoder to reconstruct CT via denoising diffusion probabilistic models using a novel MRI-CT latent space. The image generation method is formulated so that the user may produce synthetic CT (sCT), preserving anatomical features, or further noise the latent space, to generate CTs with similar but unique anatomical features, all without model retraining or tuning. Evaluation on public adult and private pediatric datasets....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/isbi56570.2024.10635137","openalex_id":"https://openalex.org/W4401751457","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Children's Hospital of Los Angeles","Nvidia (United States)","University of Southern California"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5354722738265991},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4901467263698578},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33159157633781433},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.12979549169540405},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.06386330723762512}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4401752267","title":"Improving Mitosis Detection on Histopathology Images Using Large Vision-Language Models","url":"https://doi.org/10.1109/isbi56570.2024.10635613","published":"2024-05-27","authors":["Ruiwen Ding","James M. 
Hall","Neil Tenenholtz","Kristen Severson"],"abstract":"In certain types of cancerous tissue, mitotic count has been shown to be associated with tumor proliferation, poor prognosis, and therapeutic resistance. Due to the high inter-rater variability of mitotic counting by pathologists, convolutional neural networks (CNNs) have been employed to reduce the subjectivity of mitosis detection in hematoxylin and eosin (H&E)-stained whole slide images. However, most existing models have performance that lags behind expert panel review and only incorporate visual information. In this work, we demonstrate that pre-trained large-scale vision-language models that leverage both visual features and natural language improve mitosis detection accuracy. We formulate the mitosis detection task as an image captioning task and a visual question answering (VQA) task by including metadata such as tumor and scanner types as context. The effectiveness of our pipeli...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/isbi56570.2024.10635613","openalex_id":"https://openalex.org/W4401752267","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of California, Los Angeles"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7021129727363586},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.535214900970459},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.46462899446487427},{"id":"https://openalex.org/C544855455","display_name":"Histopathology","score":0.43179991841316223},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.0},{"id":"https://openalex.org/C142724271","display_name":"Pathology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/promptfix-you-prompt-and-we-fix-the-photo","title":"PromptFix: You Prompt and We Fix the Photo","url":"https://www.microsoft.com/en-us/research/publication/promptfix-you-prompt-and-we-fix-the-photo/","published":"2024-05-26","authors":["Yongsheng Yu","Ziyun Zeng","Hang Hua","Jianlong Fu","Jiebo Luo"],"abstract":"Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the development of models that effectively recognize and execute user-customized instructions, particularly in low-level tasks. Moreover, the stochastic nature of the diffusion process leads to deficiencies in image generation or editing tasks that require the detailed preservation of the generated images. To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks. 
First, we construct a large-scale instruction-following dataset that covers comprehensive image-processing tasks, including low-level tasks, image editing, and object....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","Image generation","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/inversionview-a-general-purpose-method-for-reading-information-from-neural-activations","title":"InversionView: A General-Purpose Method for Reading Information from Neural Activations","url":"https://www.microsoft.com/en-us/research/publication/inversionview-a-general-purpose-method-for-reading-information-from-neural-activations/","published":"2024-05-26","authors":["Xinting Huang","Madhur Panwar","Navin Goyal","Michael Hahn"],"abstract":"The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations. In this paper, we argue that this information is embodied by the subset of inputs that give rise to similar activations. Computing such subsets is nontrivial as the input space is exponentially large. We propose InversionView, which allows us to practically inspect this subset by sampling from a trained decoder model conditioned on activations. This helps uncover the information content of activation vectors, and facilitates understanding of the algorithms implemented by transformer models. 
We present four case studies where we investigate models ranging from small transformers to GPT-2. In these studies, we demonstrate the characteristics of our method, show the distinctive advantages it offers, and provide causally verified circuits.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","neural networks"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/crafting-interpretable-embeddings-by-asking-llms-questions","title":"Crafting Interpretable Embeddings by Asking LLMs Questions","url":"https://www.microsoft.com/en-us/research/publication/crafting-interpretable-embeddings-by-asking-llms-questions/","published":"2024-05-25","authors":["Vinamra Benara","Chandan Singh","John X. Morris","Richard Antonello","Ion Stoica","Alexander Huth","Jianfeng Gao"],"abstract":"Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. Training QA-Emb reduces to selecting a set of underlying questions rather than learning model weights. 
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli. QA-Emb significantly outperforms an established interpretable baseline, and does so while requiring very few questions. This paves the way towards building flexible feature spaces that can concre...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Biology","Computer science","large language models","Natural language processing","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:4f1c9097b0461016","title":"DOC-RAG: ASR Language Model Personalization with Domain-Distributed Co-occurrence Retrieval Augmentation","url":"https://ai.meta.com/research/publications/doc-rag-asr-language-model-personalization-with-domain-distributed-co-occurrence-retrieval-augmentation/","published":"2024-05-24","authors":["Zhe Liu"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Speech & Audio","NLP","language model","personalization","retrieval"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search 
https://ai.meta.com/results/?content_types%5B0%5D=publication&page=13"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/culturepark-boosting-cross-cultural-understanding-in-large-language-models","title":"CulturePark: Boosting Cross-cultural Understanding in Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/culturepark-boosting-cross-cultural-understanding-in-large-language-models/","published":"2024-05-23","authors":["Cheng Li","Damien Teney","Linyi Yang","Qingsong Wen","Xing Xie","Jindong Wang"],"abstract":"Cultural bias is pervasive in many large language models (LLMs), largely due to the deficiency of data representative of different cultures. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating from platforms such as Wikipedia and social media. However, these approaches are highly dependent on real-world data and human annotations, making them costly and difficult to scale. Inspired by cognitive theories on social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication with LLM-based agents playing roles in different cultures. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. 
Using CulturePark, we generated 41,000 cultural samples to f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","media","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/efficient-adversarial-training-in-llms-with-continuous-attacks","title":"Efficient Adversarial Training in LLMs with Continuous Attacks","url":"https://www.microsoft.com/en-us/research/publication/efficient-adversarial-training-in-llms-with-continuous-attacks/","published":"2024-05-23","authors":["Sophie Xhonneux","Alessandro Sordoni","Stephan Günnemann","Gauthier Gidel","Leo Schwinn"],"abstract":"Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve robustness against such attacks. Yet, in the context of LLMs, current methods for adversarial training are hindered by the high computational costs required to perform discrete adversarial attacks at each training iteration. We address this problem by instead calculating adversarial attacks in the continuous embedding space of the LLM, which is orders of magnitudes more efficient. 
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: the first makes the model robust on continuous embedding attacks computed on an adversarial behaviour dataset; the second ensures the usefulness of the final model by fine-tuning on utility data. Moreover, we introdu...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/opportunities-and-risks-of-large-language-models-in-psychiatry","title":"Opportunities and risks of large language models in psychiatry","url":"https://www.microsoft.com/en-us/research/publication/opportunities-and-risks-of-large-language-models-in-psychiatry/","published":"2024-05-23","authors":["Nick Obradovich","S. Khalsa","Waqas U. Khan","Jina Suh","Roy H. Perlis","Olusola A Ajilore","Martin P. Paulus"],"abstract":"The integration of large language models (LLMs) into mental healthcare and research heralds a potentially transformative shift, one offering enhanced access to care, efficient data collection, and innovative therapeutic tools. This paper reviews the development, function, and burgeoning use of LLMs in psychiatry, highlighting their potential to enhance mental healthcare through improved diagnostic accuracy, personalized care, and streamlined administrative processes. 
It is also acknowledged that LLMs introduce challenges related to computational demands, potential for misinterpretation, and ethical concerns, necessitating the development of pragmatic frameworks to ensure their safe deployment. We explore both the promise of LLMs in enriching psychiatric care and research through examples such as predictive analytics and therapy chatbots and risks including labor substitution, privacy con...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Mental health","personalized","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/quantifying-the-gain-in-weak-to-strong-generalization","title":"Quantifying the Gain in Weak-to-Strong Generalization","url":"https://www.microsoft.com/en-us/research/publication/quantifying-the-gain-in-weak-to-strong-generalization/","published":"2024-05-23","authors":["Moses Charikar","Chirag Pabbaraju","Kirankumar Shiragur"],"abstract":"Recent advances in large language models have shown capabilities that are extraordinary and near-superhuman. These models operate with such complexity that reliably evaluating and aligning them proves challenging for humans. This leads to the natural question: can guidance from weak models (like humans) adequately direct the capabilities of strong models? In a recent and somewhat surprising work, Burns et al. 
(2023) empirically demonstrated that when strong models (like GPT-4) are finetuned using labels generated by weak supervisors (like GPT-2), the strong models outperform their weaker counterparts -- a phenomenon they term weak-to-strong generalization. In this work, we present a theoretical framework for understanding weak-to-strong generalization. Specifically, we show that the improvement in performance achieved by strong models over their weaker counterparts is quantified by the m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:77","title":"Unveiling the Tapestry of Consistency in Large Vision-Language Models","url":"https://seed.bytedance.com/en/research/unveiling-the-tapestry-of-consistency-in-large-vision-language-models","published":"2024-05-23","authors":["Yuan Zhang","Fei Xiao","Tao Huang","Chun-Kai Fan","Hongyuan Dong","Jiawen Li","Jiacong Wang","Kuan Cheng","Shanghang Zhang","Haoyuan Guo"],"abstract":"Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts in different sizes of solution spaces, LVLMs fail to always give consistent answers regarding the same knowledge point. This inconsistency of answers between different solution spaces is prevalent in LVLMs and erodes trust. 
To this end, we provide a multi-modal benchmark ConBench, to intuitively analyze how LVLMs perform when the solution space of a prompt revolves around a knowledge point. Based on the ConBench tool, we are the first to reveal the tapestry and get the following findings: (1) In the discriminate realm, the larger the solution space of the prompt, the lower the accuracy of the answers. (2) Establish the relationship between the discriminative and generative realms: the accuracy of th...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:deepseek-ai:2405.14333","title":"DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data","url":"https://huggingface.co/papers/2405.14333","published":"2024-05-23","authors":["DeepSeek"],"abstract":"Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data. To address this issue, we introduce an approach to generate extensive Lean 4 proof data derived from high-school and undergraduate-level mathematical competition problems. This approach involves translating natural language problems into formal statements, filtering out low-quality statements, and generating proofs to create synthetic data. 
After fine-tuning the DeepSeekMath 7B model on this synthetic dataset, which comprises 8 million formal statements with proofs, our model achieved whole-proof generation accuracies of 46.3% with 64 samples and 52% cumulatively on the Lean 4 miniF2F test, surpassing the basel...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-whole-slide-foundation-model-for-digital-pathology-from-real-world-data","title":"A whole-slide foundation model for digital pathology from real-world data","url":"https://www.microsoft.com/en-us/research/publication/a-whole-slide-foundation-model-for-digital-pathology-from-real-world-data/","published":"2024-05-22","authors":["Hanwen Xu","Naoto Usuyama","Jaspreet Bagga","Sheng Zhang","Rajesh Rao","Tristan Naumann","Cliff Wong","Zelalem Gero","Javier González","Yu Gu","Yanbo Xu","Mu-Hsin Wei"],"abstract":"Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context. 
Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Medical, health and genomics","Medicine","Pathology"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/small-language-models-for-application-interactions-a-case-study","title":"Small Language Models for Application Interactions: A Case Study","url":"https://www.microsoft.com/en-us/research/publication/small-language-models-for-application-interactions-a-case-study/","published":"2024-05-22","authors":["Beibin Li","Yi Zhang","Sébastien Bubeck","Jeevan Pathuri","Ishai Menache"],"abstract":"We study the efficacy of Small Language Models (SLMs) in facilitating application usage through natural language interactions. Our focus here is on a particular internal application used in Microsoft for cloud supply chain fulfilment. Our experiments show that small models can outperform much larger ones in terms of both accuracy and running time, even when fine-tuned on small datasets. 
Alongside these results, we also highlight SLM-based system design considerations.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","small language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/xrag-extreme-context-compression-for-retrieval-augmented-generation-with-one-token","title":"xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token","url":"https://www.microsoft.com/en-us/research/publication/xrag-extreme-context-compression-for-retrieval-augmented-generation-with-one-token/","published":"2024-05-21","authors":["Xin Cheng","Xun Wang","Xingxing Zhang","Tao Ge","Si-Qing Chen","Furu Wei","Huishuai Zhang","Dongyan Zhao"],"abstract":"This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation. xRAG reinterprets document embeddings in dense retrieval--traditionally used solely for retrieval--as features from the retrieval modality. By employing a modality fusion methodology, xRAG seamlessly integrates these embeddings into the language model representation space, effectively eliminating the need for their textual counterparts and achieving an extreme compression rate. In xRAG, the only trainable component is the modality bridge, while both the retriever and the language model remain frozen. 
This design choice allows for the reuse of offline-constructed document embeddings and preserves the plug-and-play nature of retrieval augmentation. Experimental results demonstrate that xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks, adapta...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","language model","retrieval","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4398183095","title":"Protecting scientific integrity in an age of generative AI","url":"https://doi.org/10.1073/pnas.2407886121","published":"2024-05-21","authors":["Wolfgang Blau","Vinton G. Cerf","Juan Enriquez","Joseph S. Francisco","Urs Gasser","Mary L. Gray","Mark Greaves","Barbara J. Grosz","Kathleen Hall Jamieson","Gerald H. Haug","John L. 
Hennessy","Eric Horvitz"],"abstract":"ISSN:0027-8424","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"editorial","doi":"https://doi.org/10.1073/pnas.2407886121","openalex_id":"https://openalex.org/W4398183095","cited_by_count":70,"quality_score":67,"matched_keywords":[],"author_affiliations":["BG Group (United Kingdom)","Carnegie Mellon University","Columbia University","Executive Office of the President","German National Academy of Sciences Leopoldina","Google (United States)","Harvard University","Harvard University Press","Indiana University Bloomington","Lawrence Berkeley National Laboratory","Massachusetts Institute of Technology","Microsoft (United States)","Microsoft Research New England (United States)","National Academy of Sciences","Stanford University","Technical University of Munich","The Francis Crick Institute","The University of Texas at Austin","University of California, Berkeley","University of Michigan","University of Pennsylvania"],"concepts":[{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.6860346794128418},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6561134457588196},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.5092722177505493},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.47427377104759216},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.3999381959438324},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3970794081687927},{"id":"https://openalex.org/C46312422","display_name":"Communication","score":0.3738376498222351},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.31187203526496887}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":70}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/to-err-is-human-how-about-medical-large-language-models-comparing-pre-trained-language-models-for-medical-assessment-errors-and-reliability","title":"To Err Is Human, How about Medical Large Language Models? Comparing Pre-trained Language Models for Medical Assessment Errors and Reliability","url":"https://www.microsoft.com/en-us/research/publication/to-err-is-human-how-about-medical-large-language-models-comparing-pre-trained-language-models-for-medical-assessment-errors-and-reliability/","published":"2024-05-20","authors":["Wen-wai Yim","Yujuan Fu","Asma Ben Abacha","Meliha Yetisgen"],"abstract":"Unpredictability, especially unpredictability with unknown error characteristics, is a highly undesirable trait, particularly in medical patient care applications. Although large pre-trained language models (LLM) have been applied to a variety of unseen tasks with highly competitive and successful results, their sensitivity to language inputs and resulting performance variability is not well-studied. In this work, we test state-of-the-art pre-trained language models from a variety of families to characterize their error generation and reliability in medical assessment ability. Particularly, we experiment with general medical assessment multiple choice tests, as well as their open-ended and true-false alternatives. 
We also profile model consistency, error agreements with each other and to humans; and finally, quantify their ability to recover and explain errors. The findings in this work....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Medical, health and genomics","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:uh72y08ai0a7kogdi7p1jfm0","title":"Automatic Creative Selection with Cross-Modal Matching","url":"https://machinelearning.apple.com/research/automatic-creative-selection","published":"2024-05-20","authors":["Alex Kim","Jia Huang","Rob Monarch","Jerry Kwac","Anikesh Kamath","Parmeshwar (Parry) Khurd","Kailash Thiyagarajan","Goodman Gu"],"abstract":"Application developers advertise their Apps by creating product pages with App images, and bidding on search terms. It is then crucial for App images to be highly relevant with the search terms. Solutions to this problem require an image-text matching model to predict the quality of the match between the chosen image and the search terms. 
In this work, we present a novel approach to matching an App image to search terms based on fine-tuning a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2405.11724","title":"Token-wise Influential Training Data Retrieval for Large Language Models","url":"https://huggingface.co/papers/2405.11724","published":"2024-05-20","authors":["Huawei Lin","Jikai Long","Zhaozhuo Xu","Weijie Zhao"],"abstract":"Given a Large Language Model (LLM) generation, how can we identify which training data led to this generation? In this paper, we proposed RapidIn, a scalable framework adapting to LLMs for estimating the influence of each training data. The proposed framework consists of two stages: caching and retrieval. First, we compress the gradient vectors by over 200,000x, allowing them to be cached on disk or in GPU/CPU memory. Then, given a generation, RapidIn efficiently traverses the cached gradients to estimate the influence within minutes, achieving over a 6,326x speedup. Moreover, RapidIn supports multi-GPU parallelization to substantially accelerate caching and retrieval. 
Our empirical result confirms the efficiency and effectiveness of RapidIn.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":43,"matched_keywords":["LLM","language model","memory","retrieval"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/diffusion-for-world-modeling-visual-details-matter-in-atari","title":"Diffusion for World Modeling: Visual Details Matter in Atari","url":"https://www.microsoft.com/en-us/research/publication/diffusion-for-world-modeling-visual-details-matter-in-atari/","published":"2024-05-19","authors":["Eloi Alonso","Adam Jelley","Vincent Micheli","Anssi Kanervisto","A. Storkey","Tim Pearce","François Fleuret"],"abstract":"World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffusion models have become a dominant approach for image generation, challenging well-established methods modeling discrete latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance. 
DIAMOND achieves a m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","reinforcement learning agents","1970-01-01","efficient","compression","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mora-high-rank-updating-for-parameter-efficient-fine-tuning","title":"MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning","url":"https://www.microsoft.com/en-us/research/publication/mora-high-rank-updating-for-parameter-efficient-fine-tuning/","published":"2024-05-19","authors":["Ting Jiang","Shaohan Huang","Shengyue Luo","Zihan Zhang","Haizhen Huang","Furu Wei","Weiwei Deng","Feng Sun","Qi Zhang","Deqing Wang","Fuzhen Zhuang"],"abstract":"Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve it, we introduce the corresponding non-parameter operators to reduce the input dimension and increase the output dimension for the square matrix. 
Furthermore, these operators ensure that the weight can be merged back into LLMs, which makes our method can be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical re...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","Machine learning","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4400236422","title":"Calculating Color Differences of Images via Siamese Neural Network","url":"http://dx.doi.org/10.1109/iscas58744.2024.10558454","published":"2024-05-19","authors":["Yixuan Gao","Xiongkuo Min","Xiaohong Liu","Lei Sun","Yonglin Luo","Zuowei Cao","Guangtao Zhai"],"abstract":"Recently, the color difference (CD) of standard dynamic range (SDR) images has attracted the attention of researchers. It is worth noting that due to the development of high dynamic range (HDR) image generation technology, the CD of the SDR and HDR images is also worth in-depth research. This is because the HDR image generated from an original SDR image may have changes in color. Some color changes can give people a comfortable impression, but this may also change the information originally expressed in the SDR image. Therefore, this paper researches the CD of the original SDR image and the generated HDR image, and proposes a network to predict the CDs of SDR and HDR image pairs. Specifically, we first build a SDR-HDR image CD dataset. 
The dataset contains 504 SDR and HDR image pairs, where HDR images are generated from the SDR images using five HDR image generation methods. Second, we p...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iscas58744.2024.10558454","openalex_id":"https://openalex.org/W4400236422","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7048593163490295},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.6371829509735107},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6133908033370972},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5340681672096252},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3753437399864197},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.34501802921295166}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/synthetic-test-collections-for-retrieval-evaluation","title":"Synthetic Test Collections for Retrieval Evaluation","url":"https://www.microsoft.com/en-us/research/publication/synthetic-test-collections-for-retrieval-evaluation/","published":"2024-05-18","authors":["Hossein A. Rahmani","Nick Craswell","Emine Yilmaz","Bhaskar Mitra","Daniel Campos"],"abstract":"Test collections play a vital role in evaluation of information retrieval (IR) systems. 
Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recently gained significant attention in various applications. In IR, while previous work exploited the capabilities of LLMs to generate synthetic queries or documents to augment training data and improve the performance of ranking models, using LLMs for constructing synthetic test collections is relatively unexplored. Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems. In this paper, we comprehensively investiga...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3626772.3657942","openalex_id":"https://openalex.org/W4400526284","cited_by_count":24,"quality_score":120,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","automatic evaluation","Benchmarking","Information retrieval","Language model","large language models","Synthetic data","LLM","retrieval"],"author_affiliations":["Microsoft","Amazon (United Kingdom)","Bellevue Hospital Center","Microsoft (Canada)","Microsoft (United States)","University College London"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-modular-llms-by-building-and-reusing-a-library-of-loras","title":"Towards Modular LLMs by Building and Reusing a 
Library of LoRAs","url":"https://www.microsoft.com/en-us/research/publication/towards-modular-llms-by-building-and-reusing-a-library-of-loras/","published":"2024-05-17","authors":["O. Ostapenko","Zhan Su","E. Ponti","Laurent Charlin","Nicolas Le Roux","Matheus Pereira","Lucas Caccia","Alessandro Sordoni"],"abstract":"The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks. We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such library. We benchmark existing approaches to build this library and introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters for new inputs without the need for retraining. 
We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, ve...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-assouad-fano-and-le-cam-toward-unified-lower-bounds-for-statistical-estimation-and-interactive-decision-making","title":"Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability","url":"https://www.microsoft.com/en-us/research/publication/beyond-assouad-fano-and-le-cam-toward-unified-lower-bounds-for-statistical-estimation-and-interactive-decision-making/","published":"2024-05-17","authors":["Fan Chen","Dylan Foster","Yanjun Han","Jian Qian","Alexander Rakhlin","Yunbei Xu"],"abstract":"In this paper, we develop a unified framework for lower bound methods in statistical estimation and interactive decision making. Classical lower bound techniques---such as Fano's inequality, Le Cam's method, and Assouad's lemma---have been central to the study of minimax risk in statistical estimation, yet they are insufficient for the analysis of methods that collect data in an interactive manner. 
The recent minimax lower bounds for interactive decision making via the Decision-Estimation Coefficient (DEC) appear to be genuinely different from the classical methods. We propose a unified view of these distinct methodologies through a general algorithmic lower bound method. We further introduce a novel complexity measure, decision coverage, which facilitates the derivation of new lower bounds for interactive decision making. In particular, we establish necessary and sufficient complexity m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","statistical estimation","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4397012899","title":"Align vision-language semantics by multi-task learning for multi-modal summarization","url":"https://doi.org/10.1007/s00521-024-09908-3","published":"2024-05-17","authors":["Chenhao Cui","Xinnian Liang","Shuangzhi Wu","Zhoujun Li"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s00521-024-09908-3","openalex_id":"https://openalex.org/W4397012899","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Beihang University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic 
summarization","score":0.8774203062057495},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7344877123832703},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6675122380256653},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5690015554428101},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5394251346588135},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5274488925933838},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49365171790122986},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.18201297521591187}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/lean-attention-hardware-aware-scalable-attention-mechanism-for-the-decode-phase-of-transformers","title":"LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers","url":"https://www.microsoft.com/en-us/research/publication/lean-attention-hardware-aware-scalable-attention-mechanism-for-the-decode-phase-of-transformers/","published":"2024-05-16","authors":["Rya Sanovar","Srikant Bharadwaj","Renee St. Amant","Victor Ruehle","Saravan Rajmohan"],"abstract":"Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has increased steadily reaching billions of parameters. These huge models are memory hungry and incur significant inference latency even on cutting edge AI-accelerators, such as GPUs. 
Specifically, the time and memory complexity of the attention operation is quadratic in terms of the total context length, i.e., prompt and output tokens. Thus, several optimizations such as key-value tensor caching and FlashAttention computation have been proposed to deliver the low latency demands of applications relying on such large models. However, these techniques do not cater to the computationally distinct nature of different phases during inference. To that end, we propose LeanAttention, a scala...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Hardware and devices","Systems and networking","Computer science","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396929025","title":"An in-depth evaluation of federated learning on biomedical natural language processing for information extraction","url":"https://doi.org/10.1038/s41746-024-01126-4","published":"2024-05-15","authors":["Le Peng","Gaoxiang Luo","Sicheng Zhou","Jiandong Chen","Ziyue Xu","Ju Sun","Rui Zhang"],"abstract":"Language models (LMs) such as BERT and GPT have revolutionized natural language processing (NLP). However, the medical field faces challenges in training LMs due to limited data access and privacy constraints imposed by regulations like the Health Insurance Portability and Accountability Act (HIPPA) and the General Data Protection Regulation (GDPR). Federated learning (FL) offers a decentralized solution that enables collaborative learning while ensuring data privacy. 
In this study, we evaluated FL on 2 biomedical NLP tasks encompassing 8 corpora using 6 LMs. Our results show that: (1) FL models consistently outperformed models trained on individual clients' data and sometimes performed comparably with models trained with polled data; (2) with the fixed number of total data, FL models training with more clients produced inferior performance but pre-trained transformer-based models exhibi...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41746-024-01126-4","openalex_id":"https://openalex.org/W4396929025","cited_by_count":39,"quality_score":67,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","University of Minnesota","University of Pennsylvania"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7426158785820007},{"id":"https://openalex.org/C2778306010","display_name":"Health Insurance Portability and Accountability Act","score":0.7413277626037598},{"id":"https://openalex.org/C63000827","display_name":"Software portability","score":0.6186211109161377},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5600792169570923},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4527299106121063},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.43928804993629456},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4359763264656067},{"id":"https://openalex.org/C204854418","display_name":"Polling","score":0.43122488260269165}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":39}},{"id":"openalex:W4396941612","title":"<i>MTPret</i>: Improving X-Ray Image Analytics With Multitask 
Pretraining","url":"https://doi.org/10.1109/tai.2024.3400750","published":"2024-05-15","authors":["Weibin Liao","Qingzhong Wang","Xuhong Li","Yi Liu","Zeyu Chen","Siyu Huang","Dejing Dou","Yanwu Xu","Haoyi Xiong"],"abstract":"While deep neural networks (DNNs) have been widely used in various X-ray image analytics tasks such as classification, segmentation, detection, etc., there frequently needs to collect and annotate a huge amount of training data to train a model for every single task. In this work, we proposed a multi-task self-supervised pre-training strategy MTPret to improve the performance of DNNs in various X-ray analytics tasks. MTPret first trains the backbone to learn visual representations from multiple datasets of different tasks through contrastive learning, then MTPret leverages a multi-task continual learning to learn discriminative features from various downstream tasks. To evaluate the performance of MTPret, we collected eleven X-ray image datasets from different body parts, such as heads, chest, lungs, bones, and etc., for various tasks to pre-train backbones, and fine-tuned the networks o...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tai.2024.3400750","openalex_id":"https://openalex.org/W4396941612","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Baidu (China)","Harvard University Press"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8524391651153564},{"id":"https://openalex.org/C97931131","display_name":"Discriminative model","score":0.6836367249488831},{"id":"https://openalex.org/C114466953","display_name":"Initialization","score":0.6818903684616089},{"id":"https://openalex.org/C27158222","display_name":"Generalizability 
theory","score":0.6691714525222778},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6677219867706299},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.634063720703125},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.5927294492721558},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5783153772354126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4400908709","title":"Comparison of Machine Learning Algorithms and Large Language Models for Product Categorization","url":"https://doi.org/10.1109/siu61531.2024.10600809","published":"2024-05-15","authors":["Abdullah İhsanoğlu","Mounes Zaval","Olcay Taner Yıldız"],"abstract":"This study explores the efficacy of traditional machine learning algorithms and Large Language Models (LLMs) in automating product categorization for online e-commerce platforms. By comparing these methodologies, we assess their performance in classifying a diverse range of product listings. Our findings indicate that for this context, LLMs offer similar performance in understanding and categorizing complex textual data to traditional machine learning techniques, suggesting that use of LLMs in this context may be unnecessary, and that the trade-off ultimately comes down to the operational costs and resource consumption of each model. 
This work contributes to the field by providing insights into the capabilities and limitations of current text categorization techniques in the context of rapidly expanding online marketplaces.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/siu61531.2024.10600809","openalex_id":"https://openalex.org/W4400908709","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C94124525","display_name":"Categorization","score":0.7822595238685608},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7471712231636047},{"id":"https://openalex.org/C90673727","display_name":"Product (mathematics)","score":0.5700221061706543},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5494361519813538},{"id":"https://openalex.org/C2986744138","display_name":"Text categorization","score":0.5261770486831665},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5208740234375},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4850313067436218},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.42766910791397095}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4396912460","title":"Practical applications of advanced cloud services and generative AI systems in medical image analysis","url":"https://doi.org/10.54254/2755-2721/64/20241361","published":"2024-05-14","authors":["Jingyu Xu","Binbin Wu","Jiaxin Huang","Yulu Gong","Yifan Zhang","Bo Liu"],"abstract":"The medical field is one of the important fields in the application of 
artificial intelligence technology. With the explosive growth and diversification of medical data, as well as the continuous improvement of medical needs and challenges, artificial intelligence technology is playing an increasingly important role in the medical field. Artificial intelligence technologies represented by computer vision, natural language processing, and machine learning have been widely penetrated into diverse scenarios such as medical imaging, health management, medical information, and drug research and development, and have become an important driving force for improving the level and quality of medical services. The article explores the transformative potential of generative AI in medical imaging, emphasizing its ability to generate synthetic data, enhance images, aid in anomaly detection, and facil...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.54254/2755-2721/64/20241361","openalex_id":"https://openalex.org/W4396912460","cited_by_count":28,"quality_score":65,"matched_keywords":[],"author_affiliations":["Amazon (United States)","American Society of Heating, Refrigerating, and Air-Conditioning Engineers","Northern Arizona University","Software Engineering Institute","Trine University","Tsinghua University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6031461358070374},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.5271453857421875},{"id":"https://openalex.org/C534262118","display_name":"Medical diagnosis","score":0.5248727202415466},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5206117630004883},{"id":"https://openalex.org/C70587473","display_name":"Transformative 
learning","score":0.5068104863166809},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5023305416107178},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.497130423784256},{"id":"https://openalex.org/C31601959","display_name":"Medical imaging","score":0.47886964678764343}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":28}},{"id":"apple:rmytoxm67fl9kw9ab22ozz0w","title":"KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation","url":"https://machinelearning.apple.com/research/kv-runahead","published":"2024-05-14","authors":["Minsik Cho","Mohammad Rastegari","Devang Naik"],"abstract":"Large Language Model or LLM inference has two phases, the prompt (or prefill) phase to output the first token and the extension (or decoding) phase to the generate subsequent tokens. In this work, we propose an efficient parallelization scheme, KV-Runahead to accelerate the prompt phase. The key observation is that the extension phase generates tokens faster than the prompt phase because of key-value cache (KV-cache). 
Hence, KV-Runahead...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","language model","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:b69866d83547e367","title":"Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding","url":"https://huggingface.co/papers/2405.08748","published":"2024-05-14","authors":["Tencent Hunyuan"],"abstract":"- Abstract - 🎉 Hunyuan-DiT Key Features - Chinese-English Bilingual DiT Architecture - Multi-turn Text2Image Generation - 📈 Comparisons - 🎥 Visualization - 📜 Requirements - 🛠 Dependencies and Installation - 🧱 Download Pretrained Models - :truck: Training - Data Preparation - Full Parameter Training - LoRA - 🔑 Inference - Using Gradio - Using Diffusers - Using Command Line - More Configurations - Using ComfyUI - 🚀 Acceleration (for Linux) - 🔗 BibTeX","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_report"],"source":"official_report","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Tencent/Hunyuan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ms-marco-web-search-a-large-scale-information-rich-web-dataset-with-millions-of-real-click-labels","title":"MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click 
Labels","url":"https://www.microsoft.com/en-us/research/publication/ms-marco-web-search-a-large-scale-information-rich-web-dataset-with-millions-of-real-click-labels/","published":"2024-05-13","authors":["Qi Chen","Xiubo Geng","Corby Rosset","Carolyn Buractaon","Jingwen Lu","Tao Shen","Kun Zhou","Chenyan Xiong","Yeyun Gong","Paul Bennett","Nick Craswell","Xing Xie"],"abstract":"Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of downstream tasks and encourages research in various areas, such as generic end-to-end neural indexer models, generic embedding models, and next generation information access system with large language models. MS MARCO Web Search offers a retrieval benchmark with three web retrieval challenge tasks that demands innovations in both machine learning and information retrieval system research domains. 
As the first dataset that meets large, real and rich data requirements, MS MARCO Web Search paves the wa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3589335.3648327","openalex_id":"https://openalex.org/W4396843721","cited_by_count":7,"quality_score":91,"matched_keywords":["Inproceedings (Conference)","Search and information retrieval","Systems and networking","Evaluation of retrieval results","Information retrieval","Information systems","Relevance assessment","retrieval"],"author_affiliations":["Microsoft","Carnegie Mellon University","ETH Zurich","Microsoft (Norway)","Microsoft (United States)","Microsoft Research (India)","Microsoft Research Asia (China)","Photon Spot (United States)","University of Technology Sydney"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4400909953","title":"Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation","url":"https://doi.org/10.1109/icde60146.2024.00118","published":"2024-05-13","authors":["Bowen Zheng","Yupeng Hou","Hongyu Lu","Yu Chen","Wayne Xin Zhao","Ming Chen","Ji-Rong Wen"],"abstract":"Recently, large language models (LLMs) have shown great potential in recommender systems, either improving existing recommendation models or serving as the backbone. However, there exists a large semantic gap between LLMs and recommender systems, since items to be recommended are often indexed by discrete identifiers (item ID) out of the LLM's vocabulary. 
In essence, LLMs capture language semantics while recommender systems imply collaborative semantics, making it difficult to sufficiently leverage the model capacity of LLMs for recommendation. To address this challenge, in this paper, we propose a new LLM-based recommendation model called LC-Rec, which can better integrate language and collaborative semantics for recommender systems. Our approach can directly generate items from the entire item set for recommendation, without relying on candidate items. Specifically, we make two major c...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde60146.2024.00118","openalex_id":"https://openalex.org/W4400909953","cited_by_count":76,"quality_score":75,"matched_keywords":["LLM","quantization"],"author_affiliations":["Renmin University of China","Tencent (China)","University of California, San Diego"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.773774266242981},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.6923025846481323},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4584159553050995},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4471280872821808}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":76}},{"id":"openalex:W4401416547","title":"ZAPP! 
Zonotope Agreement of Prediction and Planning for Continuous-Time Collision Avoidance with Discrete-Time Dynamics","url":"https://doi.org/10.1109/icra57147.2024.10610953","published":"2024-05-13","authors":["Luca Paparusso","Shreyas Kousik","Edward Schmerling","Francesco Braghin","Marco Pavone"],"abstract":"The past few years have seen immense progress on two fronts that are critical to safe, widespread mobile robot deployment: predicting uncertain motion of multiple agents, and planning robot motion under uncertainty. However, the numerical methods required on each front have resulted in a mismatch of representation for prediction and planning. In prediction, numerical tractability is usually achieved by coarsely discretizing time, and by representing multimodal multi-agent interactions as distributions with infinite support. On the other hand, safe planning typically requires very fine time discretization, paired with distributions with compact support, to reduce conservativeness and ensure numerical tractability. The result is, when existing predictors are coupled with planning and control, one may often find unsafe motion plans. 
This paper proposes ZAPP (Zonotope Agreement of Prediction...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra57147.2024.10610953","openalex_id":"https://openalex.org/W4401416547","cited_by_count":3,"quality_score":48,"matched_keywords":["agent","multi-agent"],"author_affiliations":["Georgia Institute of Technology","Nvidia (United States)","Politecnico di Milano"],"concepts":[{"id":"https://openalex.org/C2780864053","display_name":"Collision avoidance","score":0.7985966205596924},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6548936367034912},{"id":"https://openalex.org/C121704057","display_name":"Collision","score":0.5689517259597778},{"id":"https://openalex.org/C145912823","display_name":"Dynamics (music)","score":0.4315928518772125},{"id":"https://openalex.org/C47446073","display_name":"Control theory (sociology)","score":0.40685394406318665},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.22295576333999634},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.18591323494911194},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.08989012241363525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4396878071","title":"EvCap: Element-Aware Video Captioning","url":"https://doi.org/10.1109/tcsvt.2024.3399933","published":"2024-05-13","authors":["Sheng Liu","Annan Li","Yuwei Zhao","Jiahao Wang","Yunhong Wang"],"abstract":"Video captioning is a multi-modal task across computer vision and natural language processing. Previous methods generally follow two paradigms, i.e. template-based and sequence-based. 
Template-based methods can generate relatively accurate elements (e.g. humans, objects, or actions) to complete a template caption, but with a rather limited vocabulary and syntactic structure; in contrast, sequence-based methods generate more natural descriptions like humans but easily suffer element errors due to their heavy dependence on visual features that often contain much distracting information. In this work, we draw lessons from the element extraction manner in template-based methods and propose a novel Element-aware video Captioning (EvCap) framework that applies linguistic features beyond general visual features to consolidate model awareness of specific elements under the sequence-based paradig...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2024.3399933","openalex_id":"https://openalex.org/W4396878071","cited_by_count":11,"quality_score":48,"matched_keywords":[],"author_affiliations":["Beihang University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.8722955584526062},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7849608659744263},{"id":"https://openalex.org/C200288055","display_name":"Element (criminal law)","score":0.47518980503082275},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.4437185823917389},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.40103086829185486},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.39084383845329285},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38133835792541504},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.24831250309944153}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4401414505","title":"CppFlow: Generative Inverse Kinematics for Efficient and Robust Cartesian Path Planning","url":"https://doi.org/10.1109/icra57147.2024.10611724","published":"2024-05-13","authors":["Jeremy Morgan","David Millard","Gaurav S. Sukhatme"],"abstract":"In this work we present CppFlow - a novel and performant planner for the Cartesian Path Planning problem, which finds valid trajectories up to 129x faster than current methods, while also succeeding on more difficult problems where others fail. At the core of the proposed algorithm is the use of a learned, generative Inverse Kinematics solver, which is able to efficiently produce promising entire candidate solution trajectories on the GPU. Precise, valid solutions are then found through classical approaches such as differentiable programming, global search, and optimization. In combining approaches from these two paradigms we get the best of both worlds - efficient approximate solutions from generative AI which are made exact using the guarantees of traditional planning and optimization. 
We evaluate our system against other state of the art methods on a set of established baselines as we...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icra57147.2024.10611724","openalex_id":"https://openalex.org/W4401414505","cited_by_count":5,"quality_score":46,"matched_keywords":["efficient"],"author_affiliations":["Amazon (United States)","University of Southern California"],"concepts":[{"id":"https://openalex.org/C17816587","display_name":"Inverse kinematics","score":0.810308039188385},{"id":"https://openalex.org/C39920418","display_name":"Kinematics","score":0.7019960284233093},{"id":"https://openalex.org/C81074085","display_name":"Motion planning","score":0.6744983196258545},{"id":"https://openalex.org/C16038011","display_name":"Cartesian coordinate system","score":0.6629959344863892},{"id":"https://openalex.org/C207467116","display_name":"Inverse","score":0.6095180511474609},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5757569074630737},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.472321480512619},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.3432193994522095}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4400909784","title":"Scaling Up Multivariate Time Series Pre-Training with Decoupled Spatial-Temporal Representations","url":"https://doi.org/10.1109/icde60146.2024.00057","published":"2024-05-13","authors":["Rui Zha","Le Zhang","Shuangli Li","Jingbo Zhou","Tong Xu","Hui Xiong","Enhong Chen"],"abstract":"Data scale has been acknowledged as a crucial factor for enhancing the generalization and effectiveness of pre-training models. 
While existing methods of multivariate time series pre-training are primarily limited to a single specific dataset, scaling to a larger scenario that includes multiple diverse datasets (e.g., multi-region data) remains a substantial challenge. In this paper, we present a novel Decoupled Spatial-Temporal Representation Learning (DeSTR) framework to serve as the backbone network for investigating the data scaling capability of multivariate time series pre-training architectures. Specifically, DeSTR utilizes two separate encoders to capture both the temporal dynamics within each time series and the spatial correlations among multiple variables. The obtained representations of distinct modalities are then fed into a Spatial-Guided Temporal Transformer to equip the t...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icde60146.2024.00057","openalex_id":"https://openalex.org/W4400909784","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Baidu (China)","Nature Inspires Creativity Engineers Lab","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.704900860786438},{"id":"https://openalex.org/C161584116","display_name":"Multivariate statistics","score":0.6883879899978638},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.6364186406135559},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6264342069625854},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.5795110464096069},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.48932427167892456},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.44347310066223145},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3439386487007141}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4396843889","title":"One-step Reach: LLM-based Keyword Generation for Sponsored Search Advertising","url":"https://doi.org/10.1145/3589335.3651943","published":"2024-05-12","authors":["Yang Wang","Zheyi Sha","Kylie Lin","Chaobing Feng","Kunhong Zhu","Lipeng Wang","Xuewu Jiao","Fei Huang","Chao Ye","D. He","Zhi Guo","Shuanglong Li"],"abstract":"Query keyword matching plays a crucial role in sponsored search advertising by retrieving semantically related keywords of the user query to target relevant advertisements. Conventional technical solutions adopt the retrieve-judge-then-rank retrieval framework structured in cascade funnels. However, it has limitations in accurately depicting the semantic relevance between the query and keyword, and the cumulative funnel losses result in unsatisfactory precision and recall. To address the above issues, this paper proposes a Large Language Model (LLM)-based keyword generation method (LKG) to reach related keywords from the search query in one step. LKG models the query keyword matching as an end-to-end keyword generation task based on the LLM through multi-match prompt tuning. 
Moreover, it employs the feedback tuning and the prefix tree-based constrained beam search to improve the generati...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589335.3651943","openalex_id":"https://openalex.org/W4396843889","cited_by_count":0,"quality_score":49,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7088559865951538},{"id":"https://openalex.org/C2988412617","display_name":"Keyword search","score":0.6217876672744751},{"id":"https://openalex.org/C187687199","display_name":"Search advertising","score":0.41003769636154175},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4011009633541107},{"id":"https://openalex.org/C512338625","display_name":"Online advertising","score":0.3292887508869171},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.2723485827445984},{"id":"https://openalex.org/C110875604","display_name":"The Internet","score":0.08376455307006836}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"arxiv:2402.18899","title":"Aligning Language Models for Versatile Text-based Item Retrieval","url":"http://arxiv.org/abs/2402.18899","published":"2024-05-12","authors":["Yuxuan Lei","Jianxun Lian","Jing Yao","Mingqi Wu","Defu Lian","Xing Xie"],"abstract":"This paper addresses the gap between general-purpose text embeddings and the specific demands of item retrieval tasks. We demonstrate the shortcomings of existing models in capturing the nuances necessary for zero-shot performance on item retrieval tasks. 
To overcome these limitations, we propose generating an in-domain dataset from ten tasks tailored to unlocking models' representation ability for item retrieval. Our empirical studies demonstrate that fine-tuning embedding models on the dataset leads to remarkable improvements in a variety of retrieval tasks. We also illustrate the practical application of our refined model in a conversational setting, where it enhances the capabilities of LLM-based Recommender Agents like Chat-Rec. Our code is available at https://github.com/microsoft/RecAI.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589335.3651468","openalex_id":"https://openalex.org/W4396843788","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Microsoft (United States)","Microsoft Research Asia (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8313430547714233},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6309552192687988},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.5938031673431396},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45005157589912415},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.44174379110336304},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.34448331594467163}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2404.02249","title":"RAT: Retrieval-Augmented Transformer for Click-Through Rate 
Prediction","url":"http://arxiv.org/abs/2404.02249","published":"2024-05-12","authors":["Yushen Li","Jinpeng Wang","Tao Dai","Jieming Zhu","Jun Yuan","Rui Zhang","Shu‐Tao Xia"],"abstract":"Predicting click-through rates (CTR) is a fundamental task for Web applications, where a key issue is to devise effective models for feature interactions. Current methodologies predominantly concentrate on modeling feature interactions within an individual sample, while overlooking the potential cross-sample relationships that can serve as a reference context to enhance the prediction. To make up for such deficiency, this paper develops a Retrieval-Augmented Transformer (RAT), aiming to acquire fine-grained feature interactions within and across samples. By retrieving similar samples, we construct augmented input for each target sample. We then build Transformer layers with cascaded attention to capture both intra- and cross-sample feature interactions, facilitating comprehensive reasoning for improved CTR prediction while retaining efficiency. 
Extensive experiments....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3589335.3651550","openalex_id":"https://openalex.org/W4393967983","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Peng Cheng Laboratory","Shenzhen University","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7105386257171631},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5870278477668762},{"id":"https://openalex.org/C115174607","display_name":"Click-through rate","score":0.429929256439209},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33438989520072937},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.28472864627838135},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.14110633730888367},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.08696916699409485},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.0726199746131897}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4396843600","title":"Modeling User Viewing Flow using Large Language Models for Article Recommendation","url":"https://doi.org/10.1145/3589335.3648305","published":"2024-05-12","authors":["Zhenghao Liu","Zulong Chen","M.-L. 
Zhang","Shaoyang Duan","Hong Wen","Liangyue Li","Nan Li","Yu Gu","Ge Yu"],"abstract":"This paper proposes the USer ViewING FLow ModEling (SINGLE) method for the article recommendation task, which models the user constant preference and instant interest from user-clicked articles. Specifically, we first employ a user constant viewing flow modeling method to summarize the user's general interest to recommend articles. In this case, we utilize Large Language Models (LLMs) to capture constant user preferences from previously clicked articles, such as skills and positions. Then we design the user instant viewing flow modeling method to build interactions between user-clicked article history and candidate articles. It attentively reads the representations of user-clicked articles and aims to learn the user's different interest views to match the candidate article. Our experimental results on the Alibaba Technology Association (ATA) website show the advantage of SINGLE, achievin...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589335.3648305","openalex_id":"https://openalex.org/W4396843600","cited_by_count":4,"quality_score":45,"matched_keywords":["preference"],"author_affiliations":["Alibaba Group (China)","Northeastern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8311599493026733},{"id":"https://openalex.org/C38349280","display_name":"Flow (mathematics)","score":0.47696125507354736},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4338805079460144},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3759438395500183},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.35296350717544556},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3404494524002075},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.3302342891693115},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4396844278","title":"Synslator: An Interactive Machine Translation Tool with Online Learning","url":"https://doi.org/10.1145/3589335.3651240","published":"2024-05-12","authors":["Jiayi Wang","Ke Wang","Fengming Zhou","Chengyu Wang","Zhiyong Fu","Zeyu Feng","Yu Qing Zhao","Yuqi Zhang"],"abstract":"Interactive Machine Translation (IMT) advances the computer-aided translation (CAT) paradigm, enabling collaboration between machine translation systems and human translators for high-quality outputs. This paper presents Synslator, a CAT tool designed for IMT and proficient in online learning with real-time translation memories. Synslator accommodates different CAT service deployments by integrating two neural translation models for online learning and a language model to boost translation fluency interactively. Our evaluations demonstrate the system's online learning effectiveness, showing a 13% increase in post-editing efficiency with Synslator's interactive features. 
A tutorial video is provided at: https://youtu.be/K0vRsb2lTt8.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589335.3651240","openalex_id":"https://openalex.org/W4396844278","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","University College London","Wilmington University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8151682615280151},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.5928215384483337},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.512652575969696},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49451157450675964},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.471231073141098},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3528115153312683},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.34283447265625},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4396843981","title":"Large Language Models for Graphs: Progresses and Directions","url":"https://doi.org/10.1145/3589335.3641251","published":"2024-05-12","authors":["Chao Huang","Xubin Ren","Jiabin Tang","Dawei Yin","Nitesh V. Chawla"],"abstract":"Graph neural networks (GNNs) have emerged as fundamental methods for handling structured graph data in various domains, including citation networks, molecule prediction, and recommender systems. 
They enable the learning of informative node or graph representations, which are crucial for tasks such as link prediction and node classification in the context of graphs. To achieve high-quality graph representation learning, certain essential factors come into play: clean labels, accurate graph structures, and sufficient initial node features. However, real-world graph data often suffer from noise and sparse labels, while different datasets have unique feature constructions. These factors significantly impact the generalization capabilities of graph neural networks, particularly when faced with unseen tasks. Recently, due to the efficient text processing and task generalization capability of la...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589335.3641251","openalex_id":"https://openalex.org/W4396843981","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Baidu (China)","University of Hong Kong","University of Notre Dame"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8070157766342163},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5539869666099548},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5034522414207458},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.5015866756439209},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.4876321852207184},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4768287241458893},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.46861395239830017},{"id":"https://openalex.org/C59404180","display_name":"Feature 
learning","score":0.4174310564994812}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multimodal-healthcare-ai-identifying-and-designing-clinically-relevant-vision-language-applications-for-radiology","title":"Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology","url":"https://www.microsoft.com/en-us/research/publication/multimodal-healthcare-ai-identifying-and-designing-clinically-relevant-vision-language-applications-for-radiology/","published":"2024-05-11","authors":["Nur Yildirim","Hannah Richardson (nee Murfet)","Maria T Wetscherek","Junaid Bajwa","Joseph Jacob","Mark A Pinnock","Stephen Harris","Daniel Coelho de Castro","Shruthi Bannur","Stephanie Hyland","Pratik Ghosh","Mercy Ranjit"],"abstract":"Recent advances in AI combine large language models (LLMs) with vision encoders that bring forward unprecedented technical capabilities to leverage for a wide range of healthcare applications. Focusing on the domain of radiology, vision-language models (VLMs) achieve good performance results for tasks such as generating radiology findings based on a patient’s medical image, or answering visual questions (e.g., “Where are the nodules in this chest X-ray?”). However, the clinical utility of potential applications of these capabilities is currently underexplored. We engaged in an iterative, multidisciplinary design process to envision clinically relevant VLM interactions, and co-designed four VLM use concepts: Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights. 
We studied these concepts with 13 radiologists and clinicians who...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3613904.3642013","openalex_id":"https://openalex.org/W4392120599","cited_by_count":66,"quality_score":106,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Healthcare","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft","Cambridge University Hospitals NHS Foundation Trust","Carnegie Mellon University","Microsoft (India)","Microsoft (United Kingdom)","Microsoft (United States)","Nuance Communications (United States)","University College London","University College London Hospitals NHS Foundation Trust"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/codeaid-evaluating-a-classroom-deployment-of-an-llm-based-programming-assistant-that-balances-student-and-educator-needs","title":"CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs","url":"https://www.microsoft.com/en-us/research/publication/codeaid-evaluating-a-classroom-deployment-of-an-llm-based-programming-assistant-that-balances-student-and-educator-needs/","published":"2024-05-11","authors":["Majeed Kazemitabaar","Runlong Ye","Xiaoning Wang","Austin Z. Henley","Paul Denny","Michelle Craig","Tovi Grossman"],"abstract":"Timely, personalized feedback is essential for students learning programming. 
LLM-powered tools like ChatGPT offer instant support, but reveal direct answers with code, which may hinder deep conceptual engagement. We developed CodeAid, an LLM-powered programming assistant delivering helpful, technically correct responses, without revealing code solutions. CodeAid answers conceptual questions, generates pseudo-code with line-by-line explanations, and annotates student's incorrect code with fix suggestions. We deployed CodeAid in a programming class of 700 students for a 12-week semester. A thematic analysis of 8,000 usages of CodeAid was performed, further enriched by weekly surveys, and 22 student interviews. We then interviewed eight programming educators to gain further insights. Our findings reveal four design considerations for future educational AI assistants: D1) exploiting AI's un...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48550/arxiv.2401.11314","openalex_id":"https://openalex.org/W4391157645","cited_by_count":198,"quality_score":102,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","1970-01-01","LLM","personalized"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Auckland","University of Toronto"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/how-do-analysts-understand-and-verify-ai-assisted-data-analyses","title":"How Do Analysts Understand and Verify AI-Assisted Data 
Analyses?","url":"https://www.microsoft.com/en-us/research/publication/how-do-analysts-understand-and-verify-ai-assisted-data-analyses/","published":"2024-05-11","authors":["Ken Gu","Ruoxi Shang","Tim Althoff","Chenglong Wang","Steven Drucker"],"abstract":"Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to incorrect conclusions. Therefore, validating AI assistance is crucial and challenging. Here, we explore how analysts understand and verify the correctness of AI-generated analyses. To observe analysts in diverse verification approaches, we develop a design probe equipped with natural language explanations, code, visualizations, and interactive data tables with common data operations. 
Through a qualitative user study (n=22) using this probe, we uncover common behaviors within verification workflows....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3613904.3642497","openalex_id":"https://openalex.org/W4396832076","cited_by_count":29,"quality_score":93,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Washington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/maidr-making-statistical-visualizations-accessible-with-multimodal-data-representation","title":"MAIDR: Making Statistical Visualizations Accessible with Multimodal Data Representation","url":"https://www.microsoft.com/en-us/research/publication/maidr-making-statistical-visualizations-accessible-with-multimodal-data-representation/","published":"2024-05-11","authors":["JooYoung Seo","Yilin Xia","Bongshin Lee","Sean McCurry","Yu Jun Yam"],"abstract":"This paper investigates new data exploration experiences that enable blind users to interact with statistical data visualizations−bar plots, heat maps, box plots, and scatter plots−leveraging multimodal data representations. In addition to sonification and textual descriptions that are commonly employed by existing accessible visualizations, our MAIDR (multimodal access and interactive data representation) system incorporates two additional modalities (braille and review) that offer complementary benefits. 
It also provides blind users with the autonomy and control to interactively access and understand data visualizations. In a user study involving 11 blind participants, we found the MAIDR system facilitated the accurate interpretation of statistical visualizations. Participants exhibited a range of strategies in combining multiple modalities, influenced by their past interactions and ex...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3613904.3642730","openalex_id":"https://openalex.org/W4392428556","cited_by_count":25,"quality_score":89,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Illinois Urbana-Champaign"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2404.00487","title":"Contextual AI Journaling: Integrating LLM and Time Series Behavioral Sensing Technology to Promote Self-Reflection and Well-being using the MindScape App","url":"http://arxiv.org/abs/2404.00487","published":"2024-05-11","authors":["Subigya Nepal","Arvind Pillai","W. Joseph Campbell","Talie Massachi","Eunsol Soul Choi","Xuhai Xu","Joanna Kuc","Jeremy F. Huckins","Jason Holden","Colin A. Depp","Nicholas C. Jacobson","Mary Czerwinski"],"abstract":"MindScape aims to study the benefits of integrating time series behavioral patterns (e.g., conversational engagement, sleep, location) with Large Language Models (LLMs) to create a new form of contextual AI journaling, promoting self-reflection and well-being. We argue that integrating behavioral sensing in LLMs will likely lead to a new frontier in AI. 
In this Late-Breaking Work paper, we discuss the MindScape contextual journal App design that uses LLMs and behavioral sensing to generate contextual and personalized journaling prompts crafted to encourage self-reflection and emotional development. We also discuss the MindScape study of college students based on a preliminary user study and our upcoming study to assess the effectiveness of contextual AI journaling in promoting better well-being on college campuses. MindScape represents a new application class that embeds behavioral intel...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3613905.3650767","openalex_id":"https://openalex.org/W4393904418","cited_by_count":50,"quality_score":75,"matched_keywords":["LLM","personalized"],"author_affiliations":["Colby College","Cornell University","Dartmouth College","John Brown University","Massachusetts Institute of Technology","Microsoft (United States)","University College London","University of California San Diego"],"concepts":[{"id":"https://openalex.org/C2225880","display_name":"Journaling file system","score":0.9905785322189331},{"id":"https://openalex.org/C65682993","display_name":"Reflection (computer programming)","score":0.6185095310211182},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5286843180656433},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4272683560848236},{"id":"https://openalex.org/C71611378","display_name":"Contextual design","score":0.42358165979385376},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.22543278336524963},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.11328914761543274},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":50}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/photoscout-synthesis-powered-multi-modal-image-search","title":"PhotoScout: Synthesis-Powered Multi-Modal Image Search","url":"https://www.microsoft.com/en-us/research/publication/photoscout-synthesis-powered-multi-modal-image-search/","published":"2024-05-11","authors":["Celeste Barnaby","Qiaochu Chen","Chenglong Wang","Isil Dillig"],"abstract":"Due to the availability of increasingly large amounts of visual data, there is a growing need for tools that can help users find relevant images. While existing tools can perform image retrieval based on similarity or metadata, they fall short in scenarios that necessitate semantic reasoning about the content of the image. This paper explores a new multi-modal image search approach that allows users to conveniently specify and perform semantic image search tasks. With our tool, PhotoScout, the user interactively provides natural language descriptions, positive and negative examples, and object tags to specify their search tasks. Under the hood, PhotoScout is powered by a program synthesis engine that generates visual queries in a domain-specific language and executes the synthesized program to retrieve the desired images. 
In a study with 25 participants, we observed that PhotoScout allow...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3613904.3642319","openalex_id":"https://openalex.org/W4396833137","cited_by_count":5,"quality_score":73,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","1970-01-01","retrieval"],"author_affiliations":["Microsoft","Microsoft (United States)","The University of Texas at Austin"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-the-role-of-large-language-models-in-personalizing-and-scaffolding-strategies-to-combat-academic-procrastination","title":"Understanding the Role of Large Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination","url":"https://www.microsoft.com/en-us/research/publication/understanding-the-role-of-large-language-models-in-personalizing-and-scaffolding-strategies-to-combat-academic-procrastination/","published":"2024-05-11","authors":["Ananya Bhattacharjee","Yuchen Zeng","Sarah Yi Xu","Dana Kulzhabayeva","Minyi Ma","Rachel Kornfield","Syed Ishtiaque Ahmed","Alex Mariakakis","Mary Czerwinski","Anastasia Kuzminykh","Michael Liut","Joseph Jay Williams"],"abstract":"Traditional interventions for academic procrastination often fail to capture the nuanced, individual-specific factors that underlie them. Large language models (LLMs) hold immense potential for addressing this gap by permitting open-ended inputs, including the ability to customize interventions to individuals' unique needs. 
However, user expectations and potential limitations of LLMs in this context remain underexplored. To address this, we conducted interviews and focus group discussions with 15 university students and 6 experts, during which a technology probe for generating personalized advice for managing procrastination was presented. Our results highlight the necessity for LLMs to provide structured, deadline-oriented steps and enhanced user support mechanisms. Additionally, our results surface the need for an adaptive approach to questioning based on factors like busyness. These f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","1970-01-01","LLM","personalized"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396827151","title":"The HaLLMark Effect: Supporting Provenance and Transparent Use of Large Language Models in Writing with Interactive Visualization","url":"https://doi.org/10.1145/3613904.3641895","published":"2024-05-11","authors":["Md Naimul Hoque","Tasfia Mashiat","Bhavya Ghai","Cecilia D. Shelton","Fanny Chevalier","Kari Kraus","Niklas Elmqvist"],"abstract":"The use of Large Language Models (LLMs) for writing has sparked controversy both among readers and writers. On one hand, writers are concerned that LLMs will deprive them of agency and ownership, and readers are concerned about spending their time on text generated by soulless machines. 
On the other hand, AI-assistance can improve writing as long as writers can conform to publisher policies, and as long as readers can be assured that a text has been verified by a human. We argue that a system that captures the provenance of interaction with an LLM can help writers retain their agency, conform to policies, and communicate their use of AI to publishers and readers transparently. Thus we propose HaLLMark, a tool for visualizing the writer’s interaction with the LLM. We evaluated HaLLMark with 13 creative writers, and found that it helped them retain a sense of control and ownership of the t...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3613904.3641895","openalex_id":"https://openalex.org/W4396827151","cited_by_count":42,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Aarhus University","Amazon (United States)","George Mason University","University of Maryland, College Park","University of Toronto"],"concepts":[{"id":"https://openalex.org/C108170787","display_name":"Agency (philosophy)","score":0.8047119975090027},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6657105088233948},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.5173817873001099},{"id":"https://openalex.org/C2775924081","display_name":"Control (management)","score":0.48757094144821167},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2923312783241272},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.2384338080883026},{"id":"https://openalex.org/C36289849","display_name":"Social science","score":0.06721439957618713}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":42}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-echo-chamber-effects-of-llm-powered-search-systems-on-diverse-information-seeking","title":"Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking","url":"https://www.microsoft.com/en-us/research/publication/generative-echo-chamber-effects-of-llm-powered-search-systems-on-diverse-information-seeking/","published":"2024-05-11","authors":["Nikhil Sharma","Q. Vera Liao","Ziang Xiao"],"abstract":"Large language models (LLMs) powered conversational search systems have already been used by hundreds of millions of people, and are believed to bring many benefits over conventional search. However, while decades of research and public discourse interrogated the risk of search systems in increasing selective exposure and creating echo chambers -- limiting exposure to diverse opinions and leading to opinion polarization, little is known about such a risk of LLM-powered conversational search. We conduct two experiments to investigate: 1) whether and how LLM-powered conversational search increases selective exposure compared to conventional search; 2) whether and how LLMs with opinion biases that either reinforce or challenge the user's view change the effect. 
Overall, we found that participants engaged in more biased information querying with LLM-powered conversational search, and an opin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/understanding-nonlinear-collaboration-between-human-and-ai-agents-a-co-design-framework-for-creative-design","title":"Understanding Nonlinear Collaboration between Human and AI Agents: A Co-design Framework for Creative Design","url":"https://www.microsoft.com/en-us/research/publication/understanding-nonlinear-collaboration-between-human-and-ai-agents-a-co-design-framework-for-creative-design/","published":"2024-05-11","authors":["Jiayi Zhou","Renzhong Li","Junxiu Tang","Tan Tang","Haotian Li","Weiwei Cui","Yingcai Wu"],"abstract":"Creative design is a nonlinear process where designers generate diverse ideas in the pursuit of an open-ended goal and converge towards consensus through iterative remixing. In contrast, AI-powered design tools often employ a linear sequence of incremental and precise instructions to approximate design objectives. Such operations violate customary creative design practices and thus hinder AI agents' ability to complete creative design tasks. To explore better human-AI co-design tools, we first summarize human designers' practices through a formative study with 12 design experts. 
Taking graphic design as a representative scenario, we formulate a nonlinear human-AI co-design framework and develop a proof-of-concept prototype, OptiMuse. We evaluate OptiMuse and validate the nonlinear framework through a comparative study. We notice a subconscious change in people's attitudes towards AI agen...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Human-computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396833706","title":"Human-Centered Evaluation and Auditing of Language Models","url":"https://doi.org/10.1145/3613905.3636302","published":"2024-05-11","authors":["Ziang Xiao","Wesley Hanwen Deng","Michelle S. Lam","Motahhare Eslami","Juho Kim","Mina Lee","Q. Vera Liao"],"abstract":"The recent advancements in Large Language Models (LLMs) have significantly impacted numerous, and will impact more, real-world applications. However, these models also pose significant risks to individuals and society. To mitigate these issues and guide future model development, responsible evaluation and auditing of LLMs are essential. This workshop aims to address the current “evaluation crisis” in LLM research and practice by bringing together HCI and AI researchers and practitioners to rethink LLM evaluation and auditing from a human-centered perspective. 
The workshop will explore topics around understanding stakeholders’ needs and goals with evaluation and auditing LLMs, establishing human-centered evaluation and auditing methods, developing tools and resources to support these methods, building community and fostering collaboration. By soliciting papers, organizing invited keynote....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3613905.3636302","openalex_id":"https://openalex.org/W4396833706","cited_by_count":16,"quality_score":61,"matched_keywords":["LLM","media"],"author_affiliations":["Carnegie Mellon University","Human Media","Johns Hopkins University","Korea Advanced Institute of Science and Technology","Microsoft (Canada)","Microsoft (United States)","Research Canada","Stanford University"],"concepts":[{"id":"https://openalex.org/C199521495","display_name":"Audit","score":0.8110471963882446},{"id":"https://openalex.org/C55587333","display_name":"Engineering ethics","score":0.5179997086524963},{"id":"https://openalex.org/C12713177","display_name":"Perspective (graphical)","score":0.4213069677352905},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.4198695421218872},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.35500434041023254},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.3465840816497803},{"id":"https://openalex.org/C539667460","display_name":"Management science","score":0.34521305561065674},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.26711195707321167}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4396832042","title":"HCI History and the Trajectory to Generative 
AI","url":"https://doi.org/10.1145/3613905.3636273","published":"2024-05-11","authors":["Jonathan Grudin","Donald Brinkman"],"abstract":"This course examines HCI history broadly, then conversational AI history from ELIZA to generative AI. A study of an LLM predecessor illuminates possibilities. With rapid change comes rising uncertainty. Not all history is relevant, but unchanging human nature abides. Some digital dreams become digital nightmares. Social media can deliver disinformation, malware, negative self-image, and polarization that undermines communities. Generative AI provides value but raises employment and career questions, education challenges, and empowers bad actors. We benefit from understanding the forces, the trajectories that brought us here, and how unanticipated consequences arose. Past events that shaped the present have become evident.","companies":["Meta/FAIR","Microsoft"],"matched_orgs":["Meta/FAIR","Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3613905.3636273","openalex_id":"https://openalex.org/W4396832042","cited_by_count":0,"quality_score":57,"matched_keywords":["LLM","media"],"author_affiliations":["Meta (United States)","Microsoft (United States)","University of Washington"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.8366552591323853},{"id":"https://openalex.org/C2776552730","display_name":"Disinformation","score":0.8266292810440063},{"id":"https://openalex.org/C2776291640","display_name":"Value (mathematics)","score":0.486335813999176},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45982033014297485},{"id":"https://openalex.org/C108827166","display_name":"Internet 
privacy","score":0.4383094608783722},{"id":"https://openalex.org/C144024400","display_name":"Sociology","score":0.4015241861343384},{"id":"https://openalex.org/C518677369","display_name":"Social media","score":0.39455652236938477},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.36341941356658936}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:435fdfebf33dc0cb","title":"Notes on Qwen-Max-0428","url":"https://qwenlm.github.io/blog/qwen-max-0428/","published":"2024-05-11","authors":["Alibaba/Qwen"],"abstract":"API DEMO DISCORDPreviously, we opensourced a series of Qwen1.5 model ranging from 0.5 to 110 billion parameters. Now, we release a larger model, Qwen-Max-0428. Qwen-Max-0428 is an instruction-tuned model for chat service. Very recently, it is available via Chatbot Arena and it has now become the top-10 in the leaderboard. Furthermore, our evaluation of MT-Bench also demonstrates that the new model outperforms our previous largest model Qwen1.5-110B-Chat.Models MT-Bench Arena Qwen1.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4396817294","title":"MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services","url":"https://doi.org/10.1109/tsc.2024.3399654","published":"2024-05-10","authors":["Dianhai Yu","Liang Shen","Hongxiang Hao","Weibao Gong","Huachao Wu","Jiang Bian","Li-Rong Dai","Haoyi 
Xiong"],"abstract":"While modern internet services, such as chatbots, search engines, and online advertising, demand the use of large-scale deep neural networks (DNNs), distributed training and inference over heterogeneous computing systems are desired to facilitate these DNN models. Mixture-of-Experts (MoE) is one the most common strategies to lower the cost of training subject to the overall size of models/data through gating and parallelism in a divide-and-conquer fashion. While DeepSpeed [1] has made efforts in carrying out large-scale MoE training over heterogeneous infrastructures, the efficiency of training and inference could be further improved from several system aspects, including load balancing, communication/computation efficiency, and memory footprint limits. In this work, we present a novel MoESys that boosts efficiency in both large-scale training and inference. Specifically, in the training...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tsc.2024.3399654","openalex_id":"https://openalex.org/W4396817294","cited_by_count":16,"quality_score":61,"matched_keywords":["memory","efficient"],"author_affiliations":["Baidu (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8323315382003784},{"id":"https://openalex.org/C110875604","display_name":"The Internet","score":0.7188297510147095},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6581323742866516},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6029309034347534},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3604217767715454},{"id":"https://openalex.org/C120314980","display_name":"Distributed 
computing","score":0.3543447256088257},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.3532460331916809},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.32234930992126465}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/end-to-end-automatic-speech-recognition","title":"End-to-End Automatic Speech Recognition","url":"https://www.microsoft.com/en-us/research/publication/end-to-end-automatic-speech-recognition/","published":"2024-05-10","authors":["Jinyu Li"],"abstract":"The field of automatic speech recognition (ASR) is now dominated by the end-to-end (E2E) models that directly map speech to text. In this talk, we will give an overview of the E2E ASR models and introduce the recent progress from an industry perspective. To design an E2E model that has high accuracy and low latency, a masking strategy was applied to Transformer Transducer. We will discuss technologies that can use text-only data for general model training through pretraining and adaptation to a new domain through augmentation and factorization. We will also discuss how to build multilingual ASR models to serve all the users. Then, we will extend E2E modeling for streaming multi-speaker ASR. 
Finally, we will end the talk with some new research opportunities we can explore.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Tech Report","Human language technologies"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396820567","title":"Enhancing Chinese abbreviation prediction with LLM generation and contrastive evaluation","url":"https://doi.org/10.1016/j.ipm.2024.103768","published":"2024-05-10","authors":["Jingping Liu","Xianyang Tian","Hanwen Tong","Chenhao Xie","Tong Ruan","Cong Lin","Baohua Wu","Haofen Wang"],"abstract":"","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.ipm.2024.103768","openalex_id":"https://openalex.org/W4396820567","cited_by_count":10,"quality_score":51,"matched_keywords":["LLM"],"author_affiliations":["Alibaba Group (China)","East China University of Science and Technology","Fudan University","Tongji University"],"concepts":[{"id":"https://openalex.org/C2777629044","display_name":"Contrastive analysis","score":0.5171232223510742},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5097987055778503},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4830649495124817},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.36409834027290344},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.33604204654693604},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.32315313816070557},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.06540393829345703}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4396773582","title":"Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities","url":"https://doi.org/10.1145/3664606","published":"2024-05-09","authors":["Wei Ma","Shangqing Liu","Mengjie Zhao","Xiaofei Xie","Wenhan Wang","Qiang Hu","Junyin Zhang","Yang Liu"],"abstract":"Code models have made significant advancements in code intelligence by encoding knowledge about programming languages. While previous studies have explored the capabilities of these models in learning code syntax, there has been limited investigation on their ability to understand code semantics. Additionally, existing analyses assume that the number of edges between nodes at the abstract syntax tree (AST) is related to syntax distance, and also often require transforming the high-dimensional space of deep learning models to a low-dimensional one, which may introduce inaccuracies. To study how code models represent code syntax and semantics, we conduct a comprehensive analysis of seven code models, including four representative code pre-trained models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) and three large language models (LLMs) (StarCoder, CodeLlama and CodeT5+). 
We design four...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3664606","openalex_id":"https://openalex.org/W4396773582","cited_by_count":24,"quality_score":61,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Ludwig-Maximilians-Universität München","Nanyang Technological University","Singapore Management University","The University of Tokyo","University of Alberta","University of Luxembourg"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.833996057510376},{"id":"https://openalex.org/C60048249","display_name":"Syntax","score":0.7913010120391846},{"id":"https://openalex.org/C58646249","display_name":"Abstract syntax tree","score":0.7796858549118042},{"id":"https://openalex.org/C114408938","display_name":"Abstract syntax","score":0.7188870310783386},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6354018449783325},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5726616382598877},{"id":"https://openalex.org/C11742125","display_name":"Syntax error","score":0.5504549145698547},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.47488850355148315}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":24}},{"id":"openalex:W4396782977","title":"Unraveling Complexity: An Exploration Into Large-Scale Multimodal Signal Processing","url":"https://doi.org/10.1109/mis.2024.3398592","published":"2024-05-09","authors":["Zhenyu Wen","Yuheng Ye","Jie Su","Taotao Li","Jinhao Wan","Shilian Zheng","Zhen Hong","Shibo He","Haoran Duan","Yuexiang Li","Yawen Huang","Yefeng Zheng"],"abstract":"Advanced communication 
systems and military reconnaissance are increasingly prevalent in high-tech environments, greatly supported by the flourishing in signal processing technologies. The recent exponential proliferation of sensors led to an unprecedented expansion in the scale and diversity of signals across various modalities. Such influx poses significant challenges in effectively integrating multi-modal signal data to deliver comprehensive and interpretive solutions across a diverse range of applications. In this paper, we provide an overview of the core issues, challenges, and future research directions in different stages of developing large-scale multi-modal signal processing models. Additionally, we introduce a prior investigation into signal representation learning, where we propose a contrastive learning-based framework to extract fine-grained signal features under few-shot co...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/mis.2024.3398592","openalex_id":"https://openalex.org/W4396782977","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["China Information Technology Security Evaluation Center","Durham University","Tencent (China)","Zhejiang University","Zhejiang University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7981632947921753},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6371979117393494},{"id":"https://openalex.org/C104267543","display_name":"Signal processing","score":0.6218836307525635},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.5387258529663086},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming 
language)","score":0.41748344898223877},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3795216679573059},{"id":"https://openalex.org/C84462506","display_name":"Digital signal processing","score":0.20454522967338562},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4396758737","title":"ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation","url":"https://doi.org/10.1145/3589334.3645467","published":"2024-05-08","authors":["Jianghao Lin","Rong Shan","Chenxu Zhu","Kounianhua Du","Bo Chen","Shigang Quan","Ruiming Tang","Yong Yu","Weinan Zhang"],"abstract":"With large language models (LLMs) achieving remarkable breakthroughs in NLP domains, LLM-enhanced recommender systems have received much attention and have been actively explored currently. In this paper, we focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks. First and foremost, we identify and formulate the lifelong sequential behavior incomprehension problem for LLMs in recommendation domains, i.e., LLMs fail to extract useful information from a textual context of long user behavior sequence, even if the length of context is far from reaching the context limitation of LLMs. To address such an issue and improve the recommendation performance of LLMs, we propose a novel framework, namely Retrieval enhanced Large Language models (ReLLa) for recommendation tasks in both zero-shot and few-shot settings. 
For zero-shot recommendation,....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645467","openalex_id":"https://openalex.org/W4396758737","cited_by_count":45,"quality_score":79,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7784466743469238},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.745985209941864},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6178087592124939},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.5782093405723572},{"id":"https://openalex.org/C511192102","display_name":"Comprehension","score":0.5741254687309265},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4768418073654175},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4531625807285309},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.42835134267807007}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":45}},{"id":"openalex:W4396723563","title":"PMG : Personalized Multimodal Generation with Large Language Models","url":"https://doi.org/10.1145/3589334.3645633","published":"2024-05-08","authors":["Xiaoteng Shen","Rui Zhang","Xiaoyan Zhao","Jieming Zhu","Xi Xiao"],"abstract":"The emergence of large language models (LLMs) has revolutionized the capabilities of text comprehension and generation. 
Multi-modal generation attracts great attention from both the industry and academia, but there is little work on personalized generation, which has important applications such as recommender systems. This paper proposes the first method for personalized multimodal generation using LLMs, showcases its applications and validates its performance via an extensive experimental study on two datasets. The proposed method, Personalized Multimodal Generation (PMG for short) first converts user behaviors (e.g., clicks in recommender systems or conversations with a virtual assistant) into natural language to facilitate LLM understanding and extract user preference descriptions. Such user preferences are then fed into a generator, such as a multimodal LLM or diffusion model, to pro...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645633","openalex_id":"https://openalex.org/W4396723563","cited_by_count":26,"quality_score":79,"matched_keywords":["LLM","personalized","personalization","preference"],"author_affiliations":["Chinese University of Hong Kong","Huawei Technologies (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.8770211338996887},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8050938844680786},{"id":"https://openalex.org/C2776187449","display_name":"Natural language generation","score":0.7416877150535583},{"id":"https://openalex.org/C2781249084","display_name":"Preference","score":0.6698271036148071},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.6552003026008606},{"id":"https://openalex.org/C557471498","display_name":"Recommender 
system","score":0.4910554885864258},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4358759820461273},{"id":"https://openalex.org/C2985684807","display_name":"Text generation","score":0.42585518956184387}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":26}},{"id":"openalex:W4396758712","title":"Representation Learning with Large Language Models for Recommendation","url":"https://doi.org/10.1145/3589334.3645458","published":"2024-05-08","authors":["Xubin Ren","Wei Wei","Lianghao Xia","Lixin Su","Suqi Cheng","Junfeng Wang","Dawei Yin","Chao Huang"],"abstract":"Recommender systems have seen significant advancements with the influence of deep learning and graph neural networks, particularly in capturing complex user-item relationships. However, these graph-based recommenders heavily depend on ID-based data, potentially disregarding valuable textual information associated with users and items, resulting in less informative learned representations. Moreover, the utilization of implicit feedback data introduces potential noise and bias, posing challenges for the effectiveness of user preference learning. While the integration of large language models (LLMs) into traditional ID-based recommenders has gained attention, challenges such as scalability issues, limitations in text-only reliance, and prompt input constraints need to be addressed for effective implementation in practical recommender systems. 
To address these challenges, we propose a model-...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645458","openalex_id":"https://openalex.org/W4396758712","cited_by_count":155,"quality_score":75,"matched_keywords":["LLM","preference"],"author_affiliations":["Baidu (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8354987502098083},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.698779284954071},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5598111152648926},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5364693999290466},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4963734745979309},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4544844627380371},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.45034947991371155},{"id":"https://openalex.org/C187191949","display_name":"Profiling (computer programming)","score":0.4493572413921356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":155}},{"id":"openalex:W4396722529","title":"GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks","url":"https://doi.org/10.1145/3589334.3645682","published":"2024-05-08","authors":["Mengmei Zhang","Mingwei Sun","Puying Wang","Shen Fan","Yanhu Mo","Xiaoxiao Xu","Hong Liu","Cheng Yang","Chuan Shi"],"abstract":"Large language models (LLMs) like ChatGPT, exhibit powerful zero-shot and instruction-following capabilities, have catalyzed a revolutionary transformation across diverse fields, 
especially for open-ended tasks. While the idea is less explored in the graph domain, despite the availability of numerous powerful graph models (GMs), they are restricted to tasks in a pre-defined form. Although several methods applying LLMs to graphs have been proposed, they fail to simultaneously handle the pre-defined and open-ended tasks, with LLM as a node feature enhancer or as a standalone predictor. To break this dilemma, we propose to bridge the pretrained GM and LLM by a Translator, named GraphTranslator, aiming to leverage GM to handle the pre-defined tasks effectively and utilize the extended interface of LLMs to offer various open-ended tasks for GM. To train such Translator, we propose a Producer....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645682","openalex_id":"https://openalex.org/W4396722529","cited_by_count":39,"quality_score":75,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","China Telecom","China Telecom (China)","Peng Cheng Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8461974263191223},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5292180180549622},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5002555847167969},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.48280608654022217},{"id":"https://openalex.org/C2778496695","display_name":"Dilemma","score":0.42662250995635986},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34091275930404663},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer 
science","score":0.3352086544036865},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":39}},{"id":"openalex:W4396735838","title":"Can GNN be Good Adapter for LLMs?","url":"https://doi.org/10.1145/3589334.3645627","published":"2024-05-08","authors":["Xuanwen Huang","Kaiqiao Han","Yang Yang","Dezheng Bao","Q. T. Tao","Ziwei Chai","Qi Zhu"],"abstract":"Recently, large language models (LLMs) have demonstrated superior capabilities in understanding and zero-shot learning on textual data, promising significant advances for many text-related domains. In the graph domain, various real-world scenarios also involve textual data, where tasks and node features can be described by text. These text-attributed graphs (TAGs) have broad applications in social media, recommendation systems, etc. Thus, this paper explores how to utilize LLMs to model TAGs. Previous methods for TAG modeling are based on million-scale LMs. When scaled up to billion-scale LLMs, they face huge challenges in computational costs. Additionally, they also ignore the zero-shot inference capabilities of LLMs. Therefore, we propose GraphAdapter, which uses a graph neural network (GNN) as an efficient adapter in collaboration with LLMs to tackle TAGs. 
In terms of efficiency, the....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645627","openalex_id":"https://openalex.org/W4396735838","cited_by_count":34,"quality_score":75,"matched_keywords":["media","efficient"],"author_affiliations":["Amazon (United States)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.8118559122085571},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7202191352844238},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5055022239685059},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.47130143642425537},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38193070888519287},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3424515128135681},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.22338631749153137},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":34}},{"id":"openalex:W4396736493","title":"Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion","url":"https://doi.org/10.1145/3589334.3645404","published":"2024-05-08","authors":["Jinheon Baek","Nirupama Chandrasekaran","Silviu Cucerzan","Allen Herring","Sujay Kumar Jauhar"],"abstract":"Large Language Models (LLMs) excel at tackling various natural language tasks. However, due to the significant costs involved in re-training or fine-tuning them, they remain largely static and difficult to personalize. 
Nevertheless, a variety of applications could benefit from generations that are tailored to users' preferences, goals, and knowledge. Among them is web search, where knowing what a user is trying to accomplish, what they care about, and what they know can lead to improved search experiences. In this work, we propose a novel and general approach that augments an LLM with relevant context from users' interaction histories with a search engine in order to personalize its outputs. Specifically, we construct an entity-centric knowledge store for each user based on their search and browsing activities on the web, which is then leveraged to provide contextually relevant LLM promp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645404","openalex_id":"https://openalex.org/W4396736493","cited_by_count":21,"quality_score":70,"matched_keywords":["LLM","personalized","personalization"],"author_affiliations":["Korea Advanced Institute of Science and Technology","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8511824011802673},{"id":"https://openalex.org/C183003079","display_name":"Personalization","score":0.7692313194274902},{"id":"https://openalex.org/C2776945383","display_name":"Personalized search","score":0.6334540843963623},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5426552295684814},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5287344455718994},{"id":"https://openalex.org/C164120249","display_name":"Web search query","score":0.5152878761291504},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.48792359232902527},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.4788224995136261}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":21}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llms-can-find-mathematical-reasoning-mistakes-by-pedagogical-chain-of-thought","title":"LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought","url":"https://www.microsoft.com/en-us/research/publication/llms-can-find-mathematical-reasoning-mistakes-by-pedagogical-chain-of-thought/","published":"2024-05-08","authors":["Zhuoxuan Jiang","Haoyuan Peng","Shanshan Feng","Fan Li","Dongsheng Li"],"abstract":"Self-correction is emerging as a promising approach to mitigate the issue of hallucination in Large Language Models (LLMs). To facilitate effective self-correction, recent research has proposed mistake detection as its initial step. However, current literature suggests that LLMs often struggle with reliably identifying reasoning mistakes when using simplistic prompting strategies. To address this challenge, we introduce a unique prompting strategy, termed the Pedagogical Chain-of-Thought (PedCoT), which is specifically designed to guide the identification of reasoning mistakes, particularly mathematical reasoning mistakes. PedCoT consists of pedagogical principles for prompts (PPP) design, two-stage interaction process (TIP) and grounded PedCoT prompts, all inspired by the educational theory of the Bloom Cognitive Model (BCM). 
We evaluate our approach on two public datasets featuring mat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396758711","title":"GraphPro: Graph Pre-training and Prompt Learning for Recommendation","url":"https://doi.org/10.1145/3589334.3645546","published":"2024-05-08","authors":["Yuhao Yang","Lianghao Xia","Da Luo","Kangyi Lin","Chao Huang"],"abstract":"GNN-based recommendation systems have been successful in capturing complex user-item interactions using multi-hop message passing. However, these methods often struggle to handle the dynamic nature of user-item interactions, making it challenging to adapt to changes in user preferences and new data distributions. This limits their scalability and performance in real-world dynamic scenarios. In our study, we propose a framework called GraphPro that combines dynamic graph pre-training with prompt learning in an efficient way. This unique approach allows GNNs to effectively capture both long-term user preferences and short-term behavior changes, resulting in accurate and up-to-date recommendations. To address the issue of changing user preferences, we integrate a temporal prompt mechanism and a graph-structural prompt learning mechanism into the pre-trained GNN architecture. 
The temporal pr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645546","openalex_id":"https://openalex.org/W4396758711","cited_by_count":22,"quality_score":67,"matched_keywords":["long-term","efficient"],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8841248154640198},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.8056233525276184},{"id":"https://openalex.org/C105339364","display_name":"Software deployment","score":0.5651405453681946},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5636826753616333},{"id":"https://openalex.org/C2778712577","display_name":"Retraining","score":0.5551648736000061},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5329799056053162},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.518626868724823},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.48132145404815674}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":22}},{"id":"openalex:W4396758674","title":"AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems","url":"https://doi.org/10.1145/3589334.3645537","published":"2024-05-08","authors":["Junjie Zhang","Yupeng Hou","Ruobing Xie","Wenqi Sun","Julian McAuley","Wayne Xin Zhao","Leyu Lin","Ji-Rong 
Wen"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645537","openalex_id":"https://openalex.org/W4396758674","cited_by_count":57,"quality_score":67,"matched_keywords":[],"author_affiliations":["Renmin University of China","Tencent (China)","UC San Diego Health System"],"concepts":[{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.8474219441413879},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8165057301521301},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.398124635219574},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.3744931221008301},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3240252137184143},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.31442520022392273}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":57}},{"id":"openalex:W4396757491","title":"ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction","url":"https://doi.org/10.1145/3589334.3645396","published":"2024-05-08","authors":["Jianghao Lin","Bo Chen","Hangyu Wang","Yunjia Xi","Yanru Qu","Xinyi Dai","Kangning Zhang","Ruiming Tang","Yong Yu","Weinan Zhang"],"abstract":"Click-through rate (CTR) prediction has become increasingly indispensable for various Internet applications. Traditional CTR models convert the multi-field categorical data into ID features via one-hot encoding, and extract the collaborative signals among features. Such a paradigm suffers from the problem of semantic information loss. 
Another line of research explores the potential of pretrained language models (PLMs) for CTR prediction by converting input data into textual sentences through hard prompt templates. Although semantic signals are preserved, they generally fail to capture the collaborative information (e.g., feature interactions, pure ID features), not to mention the unacceptable inference overhead brought by the huge model size. In this paper, we aim to model both the semantic knowledge and collaborative knowledge for accurate CTR estimation, and meanwhile address the infer...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645396","openalex_id":"https://openalex.org/W4396757491","cited_by_count":27,"quality_score":64,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8373094797134399},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6440191268920898},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6157107949256897},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5613305568695068},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.502321720123291},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48864513635635376},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.47334030270576477},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4131094813346863}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":27}},{"id":"openalex:W4396757612","title":"FreqMAE: Frequency-Aware Masked Autoencoder for Multi-Modal IoT Sensing","url":"https://doi.org/10.1145/3589334.3645346","published":"2024-05-08","authors":["Denizhan Kara","Tomoyoshi Kimura","Shengzhong Liu","Jinyang Li","Dongxin Liu","Tianshi Wang","Ruijie Wang","Yizhuo Chen","Yigong Hu","Tarek Abdelzaher"],"abstract":"This paper presents FreqMAE, a novel self-supervised learning framework that synergizes masked autoencoding (MAE) with physics-informed insights to capture feature patterns in multi-modal IoT sensor data. FreqMAE enhances latent space representation of sensor data, reducing reliance on data labeling and improving accuracy for AI tasks. Differing from data augmentation-based methods like contrastive learning, FreqMAE's approach eliminates the need for handcrafted transformations. Adapting MAE for IoT sensing signals, we present three contributions from frequency domain insights: First, a Temporal-Shifting Transformer (TS-T) encoder that enables temporal interactions while distinguishing different frequency bands; Second, a factorized multi-modal fusion mechanism for leveraging cross-modal correlations and preserving unique modality features; Third, a hierarchically weighted loss function....","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3589334.3645346","openalex_id":"https://openalex.org/W4396757612","cited_by_count":16,"quality_score":53,"matched_keywords":[],"author_affiliations":["Meta (United States)","Shanghai Jiao Tong University","University of Illinois Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7700673341751099},{"id":"https://openalex.org/C59404180","display_name":"Feature 
learning","score":0.6028872728347778},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.594998300075531},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5838705897331238},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5482653975486755},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5313461422920227},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4768846333026886},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.43815451860427856}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"apple:ivdi2dy2zc18td9q5mt0uxva","title":"Generative Modeling with Phase Stochastic Bridges","url":"https://machinelearning.apple.com/research/generative-modeling","published":"2024-05-08","authors":["Tianrong Chen","Jiatao Gu","Laurent Dinh","Evangelos A. Theodorou","Joshua Susskind","Shuangfei Zhai"],"abstract":"This paper introduces a novel generative modeling framework grounded in phase space dynamics, taking inspiration from the principles underlying Critically Damped Langevin Dynamics (CLD). Leveraging insights from stochastic optimal control, we construct a favorable path measure in the phase space that proves highly advantageous for generative sampling. 
A distinctive feature of our approach is the early-stage data prediction capability within the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/skeleton-of-thought-large-language-models-can-do-parallel-decoding","title":"Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation","url":"https://www.microsoft.com/en-us/research/publication/skeleton-of-thought-large-language-models-can-do-parallel-decoding/","published":"2024-05-07","authors":["Xuefei Ning","Zinan Lin","Zixuan Zhou","Zifu Wang","Huazhong Yang","Yu Wang"],"abstract":"This work aims at decreasing the end-to-end generation latency of large language models (LLMs). One of the major causes of the high generation latency is the sequential decoding approach adopted by almost all state-of-the-art LLMs. In this work, motivated by the thinking and writing process of humans, we propose Skeleton-of-Thought (SoT), which first guides LLMs to generate the skeleton of the answer, and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point in parallel. Not only does SoT provide considerable speed-ups across 12 LLMs, but it can also potentially improve the answer quality on several question categories. SoT is an initial attempt at data-centric optimization for inference efficiency, and further underscores the potential of pushing LLMs to think more like a human for answer quality. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","Efficient algorithm","Generative model","Language model","Machine learning","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/zero-extremely-efficient-collective-communication-for-giant-model-training","title":"ZeRO++: Extremely Efficient Collective Communication for Giant Model Training","url":"https://www.microsoft.com/en-us/research/publication/zero-extremely-efficient-collective-communication-for-giant-model-training/","published":"2024-05-07","authors":["Guanhua Wang","Heyang Qin","Sam Ade Jacobs","Connor Holmes","Samyam Rajbhandari","Olatunji Ruwase","Feng Yang","Lei Yang","Yuxiong He"],"abstract":"Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPUs clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale which forces batch size per GPU to be small, ZeRO's effective throughput is limited because of high communication volume from gathering weights in forward pass, backward pass, and averaging gradients. This paper introduces three communication volume reduction techniques, which we collectively refer to as ZeRO++, targeting each of the communication collectives in ZeRO. First is block-quantization based all-gather. 
Second is data remapping that trades-off communication for more memory. Third is a novel all-to-all based quantized gradient averaging paradigm as replacement of reduce-scatter collective, which preserves accuracy despite communicating low prec...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Distributed, Parallel, and Cluster Computing","Machine learning","Performance","1970-01-01","memory","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/differentially-private-synthetic-data-via-foundation-model-apis-1-images","title":"Differentially Private Synthetic Data via Foundation Model APIs 1: Images","url":"https://www.microsoft.com/en-us/research/publication/differentially-private-synthetic-data-via-foundation-model-apis-1-images/","published":"2024-05-07","authors":["Zinan Lin","Sivakanth Gopi","Janardhan (Jana) Kulkarni","Harsha Nori","Sergey Yekhanin"],"abstract":"Generating differentially private (DP) synthetic data that closely resembles the original private data without leaking sensitive user information is a scalable way to mitigate privacy concerns in the current data-driven world. In contrast to current practices that train customized models for this task, we aim to generate DP Synthetic Data via APIs (DPSDA), where we treat foundation models as blackboxes and only utilize their inference APIs. 
Such API-based, training-free approaches are easier to deploy as exemplified by the recent surge in the number of API-based apps. These approaches can also leverage the power of large foundation models which are accessible via their inference APIs while the model weights are unreleased. However, this comes with greater challenges due to strictly more restrictive model access and the additional need to protect privacy from the API provider. In this pape...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Mathematics","Security, privacy, and cryptography","data privacy","Differential privacy","Synthetic data","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/privacy-preserving-in-context-learning-with-differentially-private-few-shot-generation","title":"Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation","url":"https://www.microsoft.com/en-us/research/publication/privacy-preserving-in-context-learning-with-differentially-private-few-shot-generation/","published":"2024-05-07","authors":["Xinyu Tang","Richard Shin","Huseyin Inan","Andre Manoel","Fatemehsadat Mireshghallah","Zinan Lin","Sivakanth Gopi","Janardhan (Jana) Kulkarni","Robert Sim"],"abstract":"We study the problem of in-context learning (ICL) with large language models (LLMs) on private datasets.
This scenario poses privacy risks, as LLMs may leak or regurgitate the private examples demonstrated in the prompt. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy (DP) guarantees, and show empirically that it can achieve effective ICL. We conduct extensive experiments on standard benchmarks and compare our algorithm with non-private ICL and zero-shot solutions. Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels. These results open up new possibilities for ICL with privacy protection for a broad range of applications. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Security, privacy, and cryptography","Differential privacy","large language model","Machine learning","Synthetic data","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/you-only-cache-once-decoder-decoder-architectures-for-language-models","title":"You Only Cache Once: Decoder-Decoder Architectures for Language Models","url":"https://www.microsoft.com/en-us/research/publication/you-only-cache-once-decoder-decoder-architectures-for-language-models/","published":"2024-05-07","authors":["Yutao Sun","Li Dong","Yi Zhu","Shaohan Huang","Wenhui Wang","Shuming Ma","Quanlu Zhang","Jianyong Wang","Furu Wei"],"abstract":"We introduce a decoder-decoder architecture, YOCO, 
for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO only caches once. The design substantially reduces GPU memory demands, yet retains global attention capability. Additionally, the computation flow enables prefilling to early exit without changing the final output, thereby significantly speeding up the prefill stage. Experimental results demonstrate that YOCO achieves favorable performance compared to Transformer in various settings of scaling up model size and number of training tokens. We also extend YOCO to 1M context length with near-perfect ne...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","memory","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/privately-aligning-language-models-with-reinforcement-learning","title":"Privately Aligning Language Models with Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/privately-aligning-language-models-with-reinforcement-learning/","published":"2024-05-07","authors":["Fan Wu","Huseyin Inan","Arturs Backurs","Varun Chandrasekaran","Janardhan (Jana) 
Kulkarni","Robert Sim"],"abstract":"Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction following-models such as ChatGPT. In this work, we initiate the study of privacy-preserving alignment of LLMs through Differential Privacy (DP) in conjunction with RL. Following the influential work of Ziegler et al. (2020), we study two dominant paradigms: (i) alignment via RL without human in the loop (e.g., positive review generation) and (ii) alignment via RL from human feedback (RLHF) (e.g., summarization in a human-preferred way). We give a new DP framework to achieve alignment via RL, and prove its correctness. Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2405.04434","title":"DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model","url":"https://huggingface.co/papers/2405.04434","published":"2024-05-07","authors":["DeepSeek"],"abstract":"We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. 
It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and fu...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","deepseek-ai","language model","efficient"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W4396693147","title":"Day-to-Night Street View Image Generation for 24-Hour Urban Scene Auditing Using Generative AI","url":"https://doi.org/10.3390/jimaging10050112","published":"2024-05-07","authors":["Zhiyi Liu","Tingting Li","Tianyi Ren","Da Chen","Wenjing Li","Waishan Qiu"],"abstract":"A smarter city should be a safer city. Nighttime safety in metropolitan areas has long been a global concern, particularly for large cities with diverse demographics and intricate urban forms, whose citizens are often threatened by higher street-level crime rates. 
However, due to the lack of night-time urban appearance data, prior studies based on street view imagery (SVI) rarely addressed the perceived night-time safety issue, which can generate important implications for crime prevention. This study hypothesizes that night-time SVI can be effectively generated from widely existing daytime SVIs using generative AI (GenAI). To test the hypothesis, this study first collects pairwise day-and-night SVIs across four cities diverged in urban landscapes to construct a comprehensive day-and-night SVI dataset. It then trains and validates a day-to-night (D2N) model with fine-tuned brightness adj...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3390/jimaging10050112","openalex_id":"https://openalex.org/W4396693147","cited_by_count":29,"quality_score":66,"matched_keywords":[],"author_affiliations":["Beijing University of Civil Engineering and Architecture","Hong Kong Design Centre","Huawei Technologies (China)","The University of Tokyo","University of Bath","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C199521495","display_name":"Audit","score":0.6693698763847351},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5590109825134277},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5157735347747803},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5010354518890381},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5008225440979004},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.49766114354133606},{"id":"https://openalex.org/C144133560","display_name":"Business","score":0.0898091197013855},{"id":"https://openalex.org/C121955636","display_name":"Accounting","score":0.06546437740325928}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":29}},{"id":"openalex:W4396747541","title":"Semi-supervised multi-modal medical image segmentation with unified translation","url":"https://doi.org/10.1016/j.compbiomed.2024.108570","published":"2024-05-07","authors":["Huajun Sun","Jia Wei","Wenguang Yuan","Rui Li"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.compbiomed.2024.108570","openalex_id":"https://openalex.org/W4396747541","cited_by_count":11,"quality_score":48,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Rochester Institute of Technology","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7942332029342651},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6670057773590088},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6575141549110413},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5687568783760071},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5407935976982117},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.512267529964447},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.49586567282676697},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer 
interaction)","score":0.47881609201431274}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/corporate-communication-companion-ccc-an-llm-empowered-writing-assistant-for-workplace-social-media","title":"Corporate Communication Companion (CCC): An LLM-empowered Writing Assistant for Workplace Social Media","url":"https://www.microsoft.com/en-us/research/publication/corporate-communication-companion-ccc-an-llm-empowered-writing-assistant-for-workplace-social-media/","published":"2024-05-06","authors":["Zhuoran Lu","Sheshera Mysore","Tara Safavi","Jennifer Neville","Longqi Yang","Mengting Wan"],"abstract":"Workplace social media platforms enable employees to cultivate their professional image and connect with colleagues in a semi-formal environment. While semi-formal corporate communication poses a unique set of challenges, large language models (LLMs) have shown great promise in helping users draft and edit their social media posts. However, LLMs may fail to capture individualized tones and voices in such workplace use cases, as they often generate text using a \"one-size-fits-all\" approach that can be perceived as generic and bland. In this paper, we present Corporate Communication Companion (CCC), an LLM-empowered interactive system that helps people compose customized and individualized workplace social media posts.
Using need-finding interviews to motivate our system design, CCC decomposes the writing process into two core functions, outline and edit: First, it suggests post outlines bas...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","LLM","media"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:zkq98mcixyzai6r4c5fyc7vn","title":"Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models","url":"https://machinelearning.apple.com/research/knowledge-transfer","published":"2024-05-06","authors":["Raviteja Vemulapalli","Hadi Pouransari","Fartash Faghri","Sachin Mehta","Mehrdad Farajtabar","Mohammad Rastegari","Oncel Tuzel"],"abstract":"Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. 
Motivated by this, we ask the following important question, \"How can we leverage the knowledge from a large VFM to train a small task-specific model for a new target...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W7124339405","title":"Reinforcement Learning for Question Answering in Programming Domain using Public Community Scoring as a Human Feedback","url":"https://doi.org/10.65109/lell9659","published":"2024-05-06","authors":["Alexey Gorbatovski","Sergey V. Kovalchuk"],"abstract":"This study explores improving GPT Neo 125M in programming-focused Community Question Answering (CQA) using Reinforcement Learning from Human Feedback (RLHF) and Stack Overflow scores. We utilized two reward model training strategies with Proximal Policy Optimization (PPO), achieving enhancements comparable to GPT Neo's 2.7B model. The research introduces an auxiliary scoring mechanism, revealing the limitations of traditional linguistic metrics for programming responses. 
It highlights the need for domain-specific evaluation methods and the challenges in applying RLHF to programming CQA, contributing to the advancement of Large Language Models (LLMs) with human feedback.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.65109/lell9659","openalex_id":"https://openalex.org/W7124339405","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","ITMO University"],"concepts":[{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.7699000239372253},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6646999716758728},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5861999988555908},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.5782999992370605},{"id":"https://openalex.org/C2993776861","display_name":"Open domain","score":0.5426999926567078},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.517799973487854},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.44040000438690186},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.41359999775886536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/negativeprompt-leveraging-psychology-for-large-language-models-enhancement-via-negative-emotional-stimuli","title":"NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional 
Stimuli","url":"https://www.microsoft.com/en-us/research/publication/negativeprompt-leveraging-psychology-for-large-language-models-enhancement-via-negative-emotional-stimuli/","published":"2024-05-04","authors":["Xu Wang","Cheng Li","Yi Chang","Jindong Wang","Yuan Wu"],"abstract":"Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further developed through positive emotional stimuli. This discovery raises an intriguing question: can negative emotions similarly influence LLMs, potentially enhancing their performance? In response to this question, we introduce NegativePrompt, a novel approach underpinned by psychological principles, involving ten specifically designed negative emotional stimuli. 
We embark on rigorous experimental evaluations of five LLMs including Flan-T5-Large, Vicuna, Llama 2, ChatGPT, and GPT-4, across a set of...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:cwv9w2rkm054w0yy06ss1rhf","title":"Conformal Prediction via Regression-as-Classification","url":"https://machinelearning.apple.com/research/conformal-prediction","published":"2024-05-03","authors":["Etash Guha","Shlok Natarajan","Thomas Mollenhoff","Emtiyaz Khan","Eugene Ndiaye"],"abstract":"Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals. 
Here, we circumvent the challenges by converting regression to a classification problem and then use CP for classification to...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4396620533","title":"Infusing internalized knowledge of language models into hybrid prompts for knowledgeable dialogue generation","url":"https://doi.org/10.1016/j.knosys.2024.111874","published":"2024-05-03","authors":["Jiaqi Bai","Zhao Yan","Shun Zhang","Jian Yang","Hongcheng Guo","Zhoujun Li"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.knosys.2024.111874","openalex_id":"https://openalex.org/W4396620533","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Beihang University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5356594324111938},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4103376269340515},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.38455909490585327},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.37214481830596924},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.35856032371520996},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.33490824699401855},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3242807388305664},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0671493411064148}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mixture-of-linear-experts-for-long-term-time-series-forecasting","title":"Mixture-of-Linear-Experts for Long-term Time Series Forecasting","url":"https://www.microsoft.com/en-us/research/publication/mixture-of-linear-experts-for-long-term-time-series-forecasting/","published":"2024-05-02","authors":["Ronghao Ni","Zinan Lin","Shuaiqi Wang","Giulia Fanti"],"abstract":"Long-term time series forecasting (LTSF) aims to predict future values of a time series given the past values. The current state-of-the-art (SOTA) on this problem is attained in some cases by linear-centric models, which primarily feature a linear mapping layer. However, due to their inherent simplicity, they are not able to adapt their prediction rules to periodic changes in time series patterns. To address this challenge, we propose a Mixture-of-Experts-style augmentation for linear-centric models and propose Mixture-of-Linear-Experts (MoLE). Instead of training a single model, MoLE trains multiple linear-centric models (i.e., experts) and a router model that weighs and mixes their outputs. While the entire framework is trained end-to-end, each expert learns to specialize in a specific temporal pattern, and the router model learns to compose the experts adaptively. 
Experiments show tha...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Machine learning","Mixture of experts","Time series","1970-01-01","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/aura-amplifying-understanding-resilience-and-awareness-for-responsible-ai-content-work","title":"AURA: Amplifying Understanding, Resilience, and Awareness for Responsible AI Content Work","url":"https://www.microsoft.com/en-us/research/publication/aura-amplifying-understanding-resilience-and-awareness-for-responsible-ai-content-work/","published":"2024-05-02","authors":["Alice Qian Zhang","Judith Amores","Hong Shen","Mary Czerwinski","Mary L. Gray","Jina Suh"],"abstract":"Behind the scenes of maintaining the safety of technology products from harmful and illegal digital content lies unrecognized human labor. The recent rise in the use of generative AI technologies and the accelerating demands to meet responsible AI (RAI) aims necessitates an increased focus on the labor behind such efforts in the age of AI. This study investigates the nature and challenges of content work that supports RAI efforts, or \"RAI content work,\" that spans content moderation, data labeling, and red teaming -- through the lived experiences of content workers. 
We conduct a formative survey and semi-structured interview studies to develop a conceptualization of RAI content work and a subsequent framework of recommendations for providing holistic support for content workers. We validate our recommendations through a series of workshops with content workers and derive considerations f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3710931","openalex_id":"https://openalex.org/W4410537238","cited_by_count":5,"quality_score":73,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","Computer science"],"author_affiliations":["Microsoft","Carnegie Mellon University","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:72","title":"StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation","url":"https://seed.bytedance.com/en/research/storydiffusion-consistent-self-attention-for-long-range-image-and-video-generation","published":"2024-05-02","authors":["Yupeng Zhou","Daquan Zhou","Ming-Ming Cheng","Jiashi Feng","Qibin Hou"],"abstract":"For recent diffusion-based generative models, maintaining consistent content across a series of generated images, especially those containing subjects and complex details, presents a significant challenge. In this paper, we propose a new way of self-attention calculation, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent pretrained diffusion-based text-to-image models in a zero-shot manner.
To extend our method to long-range video generation, we further introduce a novel semantic space temporal motion prediction module, named Semantic Motion Predictor. It is trained to estimate the motion conditions between two provided images in the semantic spaces. This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than the modules based...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","NeurIPS 2024"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:vz3557d72rxv79n0x38nku88","title":"ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models","url":"https://machinelearning.apple.com/research/relu-strikes-back","published":"2024-05-02","authors":["Iman Mirzadeh","Keivan Alizadeh Vahid","Sachin Mehta","Carlo C Del Mundo","Oncel Tuzel","Golnoosh Samei","Mohammad Rastegari","Mehrdad Farajtabar"],"abstract":"Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications. However, their demanding computation during inference has raised significant challenges for deployment on resource-constrained devices. Despite recent trends favoring alternative activation functions such as GELU or SiLU, known for increased computation, this study strongly advocates for reinstating ReLU activation in LLMs.
We demonstrate that...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:iuhouruqtq6su8bx3usggitu","title":"Guiding Instruction-based Image Editing via Multimodal Large Language Models","url":"https://machinelearning.apple.com/research/mgie","published":"2024-05-02","authors":["Tsu-Jui Fu","Wenze Hu","Xianzhi Du","William Wang","Yinfei Yang","Zhe Gan"],"abstract":"Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation via LMs. 
We investigate how MLLMs facilitate edit...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-ai-and-the-politics-of-visibility","title":"Generative AI and the Politics of Visibility","url":"https://www.microsoft.com/en-us/research/publication/generative-ai-and-the-politics-of-visibility/","published":"2024-05-01","authors":["Tarleton Gillespie"],"abstract":"Proponents of generative AI tools claim they will supplement, even replace, the work of cultural production. This raises questions about the politics of visibility: what kinds of stories do these tools tend to generate, and what do they generally not? Do these tools match the kind of diversity of representation that marginalized populations and non-normative communities have fought to secure in publishing and broadcast media? I tested three widely available generative AI tools with prompts designed to reveal these normative assumptions; I prompted the tools multiple times with each, to track the diversity of the outputs to the same query. I demonstrate that, as currently designed and trained, generative AI tools tend to reproduce normative identities and narratives, rarely representing less common arrangements and perspectives. 
When they do generate variety, it is often narrow, maintaini...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1177/20539517241252131","openalex_id":"https://openalex.org/W4396863502","cited_by_count":78,"quality_score":106,"matched_keywords":["Article (Journal)","Artificial intelligence","Social sciences","1970-01-01","politics","media"],"author_affiliations":["Microsoft","Cornell University","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/selective-pre-training-for-private-fine-tuning","title":"Selective Pre-training for Private Fine-tuning","url":"https://www.microsoft.com/en-us/research/publication/selective-pre-training-for-private-fine-tuning/","published":"2024-05-01","authors":["Da Yu","Sivakanth Gopi","Janardhan (Jana) Kulkarni","Zinan Lin","Saurabh Naik","Tomasz Lukasz Religa","Jian Yin","Huishuai Zhang"],"abstract":"Suppose we want to train text prediction models in email clients or word processors. The models must preserve the privacy of user data and adhere to a specific fixed size to meet memory and inference time requirements. We introduce a generic framework to solve this problem. Specifically, we are given a public dataset D pub and a private dataset D priv corresponding to a downstream task T . How should we pre-train a fixed-size model M on D pub and fine-tune it on D priv such that performance of M with respect to T is maximized and M satisfies differential privacy with respect to D priv ? 
We show that pre-training on a subset of dataset D pub that brings the public distribution closer to the private distribution is a crucial ingredient to maximize the transfer learning abilities of M after pre-training, especially in the regimes where model sizes are relatively small. Besides performance i...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Article (Journal)","Algorithms","Security, privacy, and cryptography","Artificial intelligence","data privacy","Differential privacy","1970-01-01","memory","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-models-cannot-explain-themselves","title":"Large Language Models Cannot Explain Themselves","url":"https://www.microsoft.com/en-us/research/publication/large-language-models-cannot-explain-themselves/","published":"2024-05-01","authors":["Advait Sarkar"],"abstract":"Large language models can be prompted to produce text. They can also be prompted to produce \"explanations\" of their output. But these are not really explanations, because they do not accurately reflect the mechanical process underlying the prediction. The illusion that they reflect the reasoning process can result in significant harms. These \"explanations\" can be valuable, but for promoting critical thinking rather than for understanding the model. I propose a recontextualisation of these \"explanations\", using the term \"exoplanations\" to draw attention to their exogenous nature. 
I discuss some implications for design and technology, such as the inclusion of appropriate guardrails and responses when models are prompted to generate explanations. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Human-computer interaction","Social sciences","Generative adversarial network","Human–computer interaction","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rescaling-intermediate-features-makes-trained-consistency-models-perform-better","title":"Rescaling Intermediate Features Makes Trained Consistency Models Perform Better","url":"https://www.microsoft.com/en-us/research/publication/rescaling-intermediate-features-makes-trained-consistency-models-perform-better/","published":"2024-05-01","authors":["Junyi Zhu","Zinan Lin","Enshu Liu","Xuefei Ning","Matthew B. Blaschko"],"abstract":"In the domain of deep generative models, diffusion models are renowned for their high-quality image generation but are constrained by intensive computational demands. To mitigate this, consistency models have been proposed as a computationally efficient alternative. Our research reveals that post-training rescaling of internal features can enhance the one-step sample quality of these models without incurring detectable computational overhead. 
This optimization is evidenced by an obvious improvement in Fréchet Inception Distance (FID). For example, with our rescaled consistency distillation (CD) model, FID on the ImageNet dataset reduces from 6.2 to 5.2, on the LSUN-cat dataset from 10.9 to 9.5. Closer inspection of the generated images reveals that this enhancement may originate from improved visual details and clarity. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Diffusion models","Generative model","1970-01-01","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/evoke-evoking-critical-thinking-abilities-in-llms-via-reviewer-author-prompt-editing","title":"Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing","url":"https://www.microsoft.com/en-us/research/publication/evoke-evoking-critical-thinking-abilities-in-llms-via-reviewer-author-prompt-editing/","published":"2024-05-01","authors":["Xinyu Hu","Pengfei Tang","Simiao Zuo","Zihan Wang","Qiang Lou","Jian Jiao","Denis Charles"],"abstract":"Large language models (LLMs) have made impressive progress in natural language processing. These models rely on proper human instructions (or prompts) to generate suitable responses. 
However, the potential of LLMs are not fully harnessed by commonly-used prompting methods: many human-in-the-loop algorithms employ ad-hoc procedures for prompt selection; while auto prompt generation approaches are essentially searching all possible prompts randomly and inefficiently. We propose Evoke, an automatic prompt refinement framework. In Evoke, there are two instances of a same LLM: one as a reviewer (LLM-Reviewer), it scores the current prompt; the other as an author (LLM-Author), it edits the prompt by considering the edit history and the reviewer's feedback. Such an author-reviewer feedback loop ensures that the prompt is refined in each iteration. We further aggregate a data selection approach....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","automatic prompt engineering","in-context learning","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/model-tells-you-what-to-discard-adaptive-kv-cache-compression-for-llms","title":"Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs","url":"https://www.microsoft.com/en-us/research/publication/model-tells-you-what-to-discard-adaptive-kv-cache-compression-for-llms/","published":"2024-05-01","authors":["Suyu Ge","Yunan Zhang","Liyuan Liu","Minjia Zhang","Jiawei Han","Jianfeng Gao"],"abstract":"In this study, we introduce adaptive KV cache 
compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs). Different from the conventional KV cache that retains key and value vectors for all context tokens, we conduct targeted profiling to discern the intrinsic structure of attention modules. Based on the recognized structure, we then construct the KV cache in an adaptive manner: evicting long-range contexts on attention heads emphasizing local contexts, discarding non-special tokens on attention heads centered on special tokens, and only employing the standard KV cache for attention heads that broadly attend to all tokens. Moreover, with the lightweight attention profiling used to guide the construction of the adaptive KV cache, FastGen can be deployed without resource-intensive fine-tuning or re-training. In our experim...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","memory","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-the-training-of-rectified-flows","title":"Improving the Training of Rectified Flows","url":"https://www.microsoft.com/en-us/research/publication/improving-the-training-of-rectified-flows/","published":"2024-05-01","authors":["Sangyun Lee","Zinan Lin","Giulia Fanti"],"abstract":"Diffusion models have shown great promise for image and video generation, but sampling from 
state-of-the-art models requires expensive numerical integration of a generative ODE. One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting. Our main insight is that under realistic settings, a single iteration of the Reflow algorithm for training rectified flows is sufficient to learn nearly straight trajectories; hence, the current practice of using multiple Reflow iterations is unnecessary. We thus propose techniques to improve one-round training....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Generative model","Image processing","1970-01-01","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-metacognitive-demands-and-opportunities-of-generative-ai","title":"The Metacognitive Demands and Opportunities of Generative AI","url":"https://www.microsoft.com/en-us/research/publication/the-metacognitive-demands-and-opportunities-of-generative-ai/","published":"2024-05-01","authors":["Lev Tankelevitch","Viktor Kewenig","Auste Simkute","Ava Elizabeth Scott","Advait Sarkar","Abigail Sellen","Sean 
Rintel"],"abstract":"Generative AI (GenAI) systems offer unprecedented opportunities for transforming professional and personal work, yet present challenges around prompting, evaluating and relying on outputs, and optimizing workflows. We argue that metacognition - the psychological ability to monitor and control one's thoughts and behavior - offers a valuable lens to understand and design for these usability challenges. Drawing on research in psychology and cognitive science, and recent GenAI user studies, we illustrate how GenAI systems impose metacognitive demands on users, requiring a high degree of metacognitive monitoring and control. We propose these demands could be addressed by integrating metacognitive support strategies into GenAI systems, and by designing GenAI systems to reduce their metacognitive demand by targeting explainability and customizability. Metacognition offers a coherent framework f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Social sciences","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/provably-robust-dpo-aligning-language-models-with-noisy-feedback-2","title":"Provably Robust DPO: Aligning Language Models with Noisy 
Feedback","url":"https://www.microsoft.com/en-us/research/publication/provably-robust-dpo-aligning-language-models-with-noisy-feedback-2/","published":"2024-05-01","authors":["Sayak Ray Chowdhury","Anush Kini","Nagarajan Natarajan"],"abstract":"Learning from preference-based feedback has recently gained traction as a promising approach to align language models with human interests. While these aligned generative models have demonstrated impressive capabilities across various tasks, their dependence on high-quality human preference data poses a bottleneck in practical applications. Specifically, noisy (incorrect and ambiguous) preference pairs in the dataset might restrict the language models from capturing human intent accurately. While practitioners have recently proposed heuristics to mitigate the effect of noisy preferences, a complete theoretical understanding of their workings remain elusive. In this work, we aim to bridge this gap by introducing a general framework for policy optimization in the presence of random preference flips.
We focus on the direct preference optimization (DPO) algorithm in particular since it as...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Language model","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/kosmos-g-generating-images-in-context-with-multimodal-large-language-models","title":"Kosmos-G: Generating Images in Context with Multimodal Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/kosmos-g-generating-images-in-context-with-multimodal-large-language-models/","published":"2024-05-01","authors":["Xichen Pan","Li Dong","Shaohan Huang","Zhiliang Peng","Wenhu Chen","Furu Wei"],"abstract":"Recent advancements in subject-driven image generation have made significant strides. However, current methods still fall short in diverse application scenarios, as they require test-time tuning and cannot accept interleaved multi-image and text input. These limitations keep them far from the ultimate goal of \"image as a foreign language in image generation.\" This paper presents Kosmos-G, a model that leverages the advanced multimodal perception capabilities of Multimodal Large Language Models (MLLMs) to tackle the aforementioned challenge. Our approach aligns the output space of MLLM with CLIP using the textual modality as an anchor and performs compositional instruction tuning on curated data. 
Kosmos-G demonstrates an impressive capability of zero-shot subject-driven generation with interleaved multi-image and text input. Notably, the score distillation instruction tuning requires no m...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","personalized","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/coexplorer-generative-ai-powered-2d-and-3d-adaptive-interfaces-to-support-intentionality-in-video-meetings","title":"CoExplorer: Generative AI Powered 2D and 3D Adaptive Interfaces to Support Intentionality in Video Meetings","url":"https://www.microsoft.com/en-us/research/publication/coexplorer-generative-ai-powered-2d-and-3d-adaptive-interfaces-to-support-intentionality-in-video-meetings/","published":"2024-05-01","authors":["Gun Woo (Warren) Park","Payod Panda","Lev Tankelevitch","Sean Rintel"],"abstract":"Current online meeting technologies lack holistic support for reducing the effort of planning and running meetings.
We present CoExplorer2D and CoExplorerVR, generative AI (GenAI)-driven technology probes for exploring the significant transformative potential of GenAI to augment these aspects of meetings. In each system, before the meeting, these systems generate tools that allow synthesis and ranking of attendees’ key issues for discussion, and likely phases that a meeting would require to cover these issues....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Social sciences","Human–computer interaction","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/beyond-the-waiting-room-patients-perspectives-on-the-conversational-nuances-of-pre-consultation-chatbots","title":"Beyond the Waiting Room: Patient's Perspectives on the Conversational Nuances of Pre-Consultation Chatbots","url":"https://www.microsoft.com/en-us/research/publication/beyond-the-waiting-room-patients-perspectives-on-the-conversational-nuances-of-pre-consultation-chatbots/","published":"2024-05-01","authors":["Brenna Li","Ofek Gross","Noah Crampton","Mamta Kapoor","Saba Tauseef Tetyana Skoropad","Mohit Jain","Khai Truong","Alex Mariakakis"],"abstract":"Pre-consultation serves as a critical information exchange between healthcare providers and patients, streamlining visits and supporting patient-centered care. 
Human-led pre-consultations offer many benefits, yet they require significant time and energy from clinical staff. In this work, we identify design goals for pre-consultation chatbots given their potential to carry out human-like conversations and autonomously adapt their line of questioning. We conducted a study with 33 walk-in clinic patients to elicit design considerations for pre-consultation chatbots. Participants were exposed to one of two study conditions: an LLM-powered AI agent and a Wizard-of-Oz agent simulated by medical professionals. Our study found that both conditions were equally well-received and demonstrated comparable conversational capabilities. However, the extent of the follow-up questions and the amount of e...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Medical, health and genomics","HCI","1970-01-01","LLM","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vidur-a-large-scale-simulation-framework-for-llm-inference","title":"VIDUR: A Large-Scale Simulation Framework for LLM Inference","url":"https://www.microsoft.com/en-us/research/publication/vidur-a-large-scale-simulation-framework-for-llm-inference/","published":"2024-05-01","authors":["Amey Agrawal","Nitin Kedia","Jayashree Mohan","Ashish Panwar","Nipun Kwatra","Bhargav S. 
Gulavani","Ramachandran Ramjee","Alexey Tumanov"],"abstract":"Optimizing the deployment of Large Language Models (LLMs) is expensive today since it requires experimentally running an application workload against an LLM implementation while exploring large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies. To address this challenge, we present Vidur – a large-scale, high-fidelity, easily-extensible simulation framework for LLM inference performance. Vidur models the performance of LLM operators using a combination of experimental profiling and predictive modeling, and evaluates the end-to-end inference performance for different workloads by estimating several metrics of interest such as latency and throughput. We validate the fidelity of Vidur on several LLMs and show that it estimates inference latency with less than 9% error across the range. Further, we present Vidur-Search...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pariksha-a-scalable-democratic-transparent-evaluation-platform-for-assessing-indic-large-language-models","title":"PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/pariksha-a-scalable-democratic-transparent-evaluation-platform-for-assessing-indic-large-language-models/","published":"2024-05-01","authors":["Ishaan Watts","Varun Gumma","Aditya Yadavalli","Vivek Seshadri","Swami Manohar","Sunayana Sitaram"],"abstract":"Evaluation of multilingual Large Language Models (LLMs) is challenging due to a variety of factors - the lack of benchmarks with sufficient linguistic diversity, contamination of popular benchmarks into LLM pre-training data and the lack of local, cultural nuances in translated benchmarks. Hence, it is difficult to do extensive evaluation of LLMs in the multilingual setting, leading to lack of fair comparisons between models and difficulties in replicating the evaluation setup used by some models. Recently, several Indic (Indian language) LLMs have been created as an answer to a call to build more locally and culturally relevant LLMs. Our evaluation framework, named Pariksha, is the first comprehensive evaluation of Indic LLMs that uses a combination of Human and LLM-based evaluation. 
We conduct a total of 90k human evaluations and 50k LLM-based evaluations of 29 models to present leader...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Human language technologies","Natural language processing","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/disentangled-prompt-representation-for-domain-generalization","title":"Disentangled Prompt Representation for Domain Generalization","url":"https://www.microsoft.com/en-us/research/publication/disentangled-prompt-representation-for-domain-generalization/","published":"2024-05-01","authors":["De Cheng","Zhipeng Xu","Xinyang Jiang","Nannan Wang","Dongsheng Li","Xinbo Gao"],"abstract":"Domain Generalization (DG) aims to develop a versatile model capable of performing well on unseen target domains. Recent advancements in pre-trained Visual Foundation Models (VFMs), such as CLIP, show significant potential in enhancing the generalization abilities of deep models. Although there is a growing focus on VFM-based domain prompt tuning for DG, effectively learning prompts that disentangle invariant features across all domains remains a major challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Observing that the text modality of VFMs is inherently easier to disentangle, we introduce a novel text feature guided visual prompt tuning framework. 
This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns domain-invariant visual representation gui...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Computer vision","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/differentially-private-reward-estimation-with-preference-feedback","title":"Differentially Private Reward Estimation with Preference Feedback","url":"https://www.microsoft.com/en-us/research/publication/differentially-private-reward-estimation-with-preference-feedback/","published":"2024-05-01","authors":["Sayak Ray Chowdhury","Xingyu Zhou","Nagarajan Natarajan"],"abstract":"Learning from preference-based feedback has recently gained considerable traction as a promising approach to align generative models with human interests. Instead of relying on numerical rewards, the generative models are trained using reinforcement learning with human feedback (RLHF). These approaches first solicit feedback from human labelers typically in the form of pairwise comparisons between two possible actions, then estimate a reward model using these comparisons, and finally employ a policy based on the estimated reward model. An adversarial attack in any step of the above pipeline might reveal private and sensitive information of human labelers. 
In this work, we adopt the notion of label differential privacy (DP) and focus on the problem of reward estimation from preference-based feedback while protecting privacy of each individual labelers. Specifically, we consider the parame...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Machine learning","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/white-paper-on-ai-and-the-future-of-work-in-africa","title":"White Paper on AI and the Future of Work in Africa","url":"https://www.microsoft.com/en-us/research/publication/white-paper-on-ai-and-the-future-of-work-in-africa/","published":"2024-05-01","authors":["Jacki O'Neill","Vukosi Marivate","Barbara Glover","Winnie Karanu","Girmaw Abebe Tadesse","et.al."],"abstract":"This white paper is the result of a multidisciplinary workshop that took place in Nairobi on 3rd November 2023, where diverse thought-leaders from various sectors and backgrounds discussed the implications of generative AI for the future of work in Africa. 
The workshop was organized by a core committee including Jacki O’Neill (MSR Africa, Nairobi), Winnie Karanu (Microsoft Philanthropies), Vukosi Marivate (University of Pretoria and Lelapa AI), Wesley Rosslyn Smith (University of Pretoria), Barbara Glover (NEPAD), Charity Wayua, Matt Grollnek, Anne Makena (Oxford University). The workshop explored four topics: 1) Macroeconomics, 2) Jobs, skills and labour markets, 3) Workers’ perspectives on AI, and 4) Africa-centric AI platforms. This white paper presents the insights and recommendations from these four topics, each written by a different group of contributors. The white paper aims to pro...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Future of Work","Generative AI"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/onesparse-a-unified-system-for-multi-index-vector-search","title":"OneSparse: A Unified System for Multi-index Vector Search","url":"https://www.microsoft.com/en-us/research/publication/onesparse-a-unified-system-for-multi-index-vector-search/","published":"2024-05-01","authors":["Yaoqi Chen","Ruicheng Zheng","Qi Chen","Shuotao Xu","Qianxi Zhang","Xue Wu","Weihao Han","Hua Yuan","Mingqin Li","Yujing Wang","Jason Li","Fan Yang"],"abstract":"Multi-index vector search has become the cornerstone for many applications, such as recommendation systems.
Efficient search in such a multi-modal hybrid vector space is challenging since no single index design performs well for all kinds of vector data. Existing approaches to processing multi-index hybrid queries either suffer from algorithmic limitations or processing inefficiency. In this paper, we propose OneSparse, a unified multi-vector index query system that incorporates multiple posting-based vector indices, which enables highly efficient retrieval of multi-modal data-sets. OneSparse introduces a novel multi-index query engine design of inter-index intersection push-down. It also optimizes the vector posting format to expedite multi-index queries. Our experiments show OneSparse achieves more than 6× search performance improvement while maintaining comparable accuracy. OneSparse....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","retrieval","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gaia-zero-shot-talking-avatar-generation","title":"GAIA: Zero-shot Talking Avatar Generation","url":"https://www.microsoft.com/en-us/research/publication/gaia-zero-shot-talking-avatar-generation/","published":"2024-05-01","authors":["Tianyu He","Junliang Guo","Runyi Yu","Yuchi Wang","Jialiang Zhu","Kaikai An","Leyi Li","Xu Tan","Chunyu Wang","HsiangTao Wu","Sheng Zhao","Jiang Bian"],"abstract":"Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech and a 
single portrait image. Previous methods have relied on domain-specific heuristics such as warping-based motion representation and 3D Morphable Models, which limit the naturalness and diversity of the generated avatars. In this work, we introduce GAIA (Generative AI for Avatar), which eliminates the domain priors in talking avatar generation. In light of the observation that the speech only drives the motion of the avatar while the appearance of the avatar and the background typically remain the same throughout the entire video, we divide our approach into two stages: 1) disentangling each frame into motion and appearance representations; 2) generating motion sequences conditioned on the speech and reference portrait image. We collect a large-scale high-quality talking avatar dataset and trai...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Zero shot learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-fixed-point-approach-for-causal-generative-modeling","title":"A Fixed-Point Approach for Causal Generative Modeling","url":"https://www.microsoft.com/en-us/research/publication/a-fixed-point-approach-for-causal-generative-modeling/","published":"2024-05-01","authors":["Meyer Scetbon","Joel Jennings","Agrin Hilmkil","Cheng Zhang","Chao Ma"],"abstract":"We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally 
ordered variables, eliminating the need for Directed Acyclic Graphs (DAGs), and establish the weakest known conditions for their unique recovery given the topological ordering (TO). Based on this, we design a two-stage causal generative model that first infers in a zero-shot manner a valid TO from observations, and then learns the generative SCM on the ordered variables. To infer TOs, we propose to amortize the learning of TOs on synthetically generated datasets by sequentially predicting the leaves of graphs seen during training. To learn SCMs, we design a transformer-based architecture that exploits a new attention mechanism enabling the modeling of causal structures, and show that this parameterization is consistent with our formalism. Finally, we conduct an extensive evalua...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396568157","title":"Foundation models meet visualizations: Challenges and opportunities","url":"https://doi.org/10.1007/s41095-023-0393-x","published":"2024-05-01","authors":["Weikai Yang","Mengchen Liu","Zheng Wang","Shixia Liu"],"abstract":"Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. 
Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models. VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visuali...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s41095-023-0393-x","openalex_id":"https://openalex.org/W4396568157","cited_by_count":48,"quality_score":67,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7549211978912354},{"id":"https://openalex.org/C36464697","display_name":"Visualization","score":0.7539727687835693},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.6749662756919861},{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.5472620725631714},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.5220348834991455},{"id":"https://openalex.org/C177606310","display_name":"Adaptability","score":0.46340417861938477},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.4105584919452667},{"id":"https://openalex.org/C107457646","display_name":"Human–computer 
interaction","score":0.3707371950149536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":48}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/vision-language-models-for-spreadsheet-understanding-challenges-and-opportunities","title":"Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities","url":"https://www.microsoft.com/en-us/research/publication/vision-language-models-for-spreadsheet-understanding-challenges-and-opportunities/","published":"2024-05-01","authors":["Shiyu Xia","Junyu Xiong","Haoyu Dong","Jianbo Zhao","Yuzhang Tian","Mengyu Zhou","Yeye He","Shi Han","Dongmei Zhang"],"abstract":"This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatial perception, and visual format recognition. Additionally, we utilize the spreadsheet table detection task to assess the overall performance of VLMs by integrating these challenges. To probe VLMs more finely, we propose three spreadsheet-to-image settings: column width adjustment, style change, and address augmentation. We propose variants of prompts to address the above tasks in different settings. Notably, to leverage the strengths of VLMs in understanding text rather than two-dimensional positioning, we propose to decode cell values on the four boundaries of the table in spreadsheet boundary detection.
Our findings reveal that VLMs demonstr...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Data platforms and analytics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/training-diffusion-models-towards-diverse-image-generation-with-reinforcement-learning","title":"Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning","url":"https://www.microsoft.com/en-us/research/publication/training-diffusion-models-towards-diverse-image-generation-with-reinforcement-learning/","published":"2024-05-01","authors":["Zichen Miao","Jiang Wang","Zhengyuan Yang","Lijuan Wang","Qiang Qiu","Zicheng Liu"],"abstract":"Diffusion models have demonstrated unprecedented capabilities in image generation. Yet, they incorporate and amplify the data bias (e.g., gender, age) from the original training set, limiting the diversity of generated images. In this paper, we propose a diversity-oriented fine-tuning method using reinforcement learning (RL) for diffusion models under the guidance of an image-set-based reward function. Specifically, the proposed reward function, denoted as Diversity Reward, utilizes a set of generated images to evaluate the coverage of the current generative distribution w.r.t. the reference distribution, represented by a set of unbiased images. 
Built on top of the probabilistic method of distribution discrepancy estimation, Diversity Reward can measure the relative distribution gap with a small set of images efficiently. We further formulate the diffusion process as a multi-step decisio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Computer vision","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/designing-skill-compatible-ai-methodologies-and-frameworks-in-chess","title":"Designing Skill-Compatible AI: Methodologies and Frameworks in Chess","url":"https://www.microsoft.com/en-us/research/publication/designing-skill-compatible-ai-methodologies-and-frameworks-in-chess/","published":"2024-05-01","authors":["Karim Hamade","Reid McIlroy-Young","Siddhartha Sen","Jon Kleinberg","Ashton Anderson"],"abstract":"Powerful artificial intelligence systems are often used in settings where they must interact with agents that are computationally much weaker, for example when they work alongside humans or operate in complex environments where some tasks are handled by algorithms, heuristics, or other entities of varying computational power. For AI agents to successfully interact in these settings, however, achieving superhuman performance alone is not sufficient; they also need to account for suboptimal actions or idiosyncratic style from their less-skilled counterparts. 
We propose a formal evaluation framework for assessing the compatibility of near-optimal AI with interaction partners who may have much lower levels of skill; we use popular collaborative chess variants as model systems to study and develop AI agents that can successfully interact with lower-skill entities. Traditional chess engines de...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:ymcqcs210aea5jk1lx08l15g","title":"Compressing LLMs: The Truth is Rarely Pure and Never Simple","url":"https://machinelearning.apple.com/research/compressing-llms","published":"2024-05-01","authors":["Ajay Jaiswal","Zhe Gan","Xianzhi Du","Bowen Zhang","Zhangyang Wang","Yinfei Yang"],"abstract":"Despite their remarkable achievements, modern Large Language Models (LLMs) encounter exorbitant computational and memory footprints. Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs achieving 50-60% sparsity and reducing the bit-width down to 3 or 4 bits per weight, with negligible perplexity degradation over the uncompressed baseline. 
As recent research efforts...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["memory","compression","quantization"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nl2fix-generating-functionally-correct-code-edits-from-bug-descriptions","title":"NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions","url":"https://www.microsoft.com/en-us/research/publication/nl2fix-generating-functionally-correct-code-edits-from-bug-descriptions/","published":"2024-05-01","authors":["Sarah Fakhoury","Saikat Chakraborty","Madan Musuvathi","Shuvendu Lahiri"],"abstract":"Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix , a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. 
Results show varied ability across models on this...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:mn81qq6jx07xwf9ojtgmdxlm","title":"Large Language Models as Generalizable Policies for Embodied Tasks","url":"https://machinelearning.apple.com/research/llms-as-generalizable-policies","published":"2024-05-01","authors":["Andrew Szot","Max Schwarzer","Harsh Agrawal","Bogdan Mazoure","Walter Talbott","Rin Metcalf Susa","Natalie Mackraz","Devon Hjelm","Alexander Toshev"],"abstract":"We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. 
We...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:63ab4ff201e24c7a","title":"Large Language Models are Efficient Learners of Noise-Robust Speech Recognition","url":"https://research.nvidia.com/publication/2024-05_large-language-models-are-efficient-learners-noise-robust-speech-recognition","published":"2024-05","authors":["YuChen Hu","Chen Chen","Huck Yang","Ruizhe Li","Chao Zhang","Pin-Yu Chen","EnSiong Chng"],"abstract":"Official NVIDIA Research publication. ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["ICLR","efficient"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=3"}},{"id":"official:be5c55848020fb92","title":"It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition","url":"https://research.nvidia.com/publication/2024-05_it-s-never-too-late-fusing-acoustic-information-large-language-models-automatic","published":"2024-05","authors":["Chen Chen","Ruizhe Li","Yuchen Hu","Sabato Marco Siniscalchi","Pin-Yu Chen","Ensiong Chng","Huck Yang"],"abstract":"Official NVIDIA Research publication. 
ICLR","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["ICLR"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=3"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-emerging-ai-divide-in-the-united-states","title":"The Emerging AI Divide in the United States","url":"https://www.microsoft.com/en-us/research/publication/the-emerging-ai-divide-in-the-united-states/","published":"2024-04-30","authors":["Madeleine I. G. Daepp","Scott Counts"],"abstract":"The digital divide describes disparities in access to and usage of digital tooling between social and economic groups. Emerging generative artificial intelligence tools, which strongly affect productivity, could magnify the impact of these divides. However, the affordability, multi-modality, and multilingual capabilities of these tools could also make them more accessible to diverse users in comparison with previous forms of digital tooling. In this study, we characterize spatial differences in U.S. residents' knowledge of a new generative AI tool, ChatGPT, through an analysis of state- and county-level search query data. In the first six months after the tool's release, we observe the highest rates of users searching for ChatGPT in West Coast states and persistently low rates of search in Appalachian and Gulf states. 
Counties with the highest rates of search are relatively more urbanize...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Artificial intelligence","Social sciences","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396498719","title":"Large language models leverage external knowledge to extend clinical insight beyond language boundaries","url":"https://doi.org/10.1093/jamia/ocae079","published":"2024-04-29","authors":["Jiageng Wu","Xian Wu","Zhaopeng Qiu","Minghui Li","Shixu Lin","Yingying Zhang","Yefeng Zheng","Changzheng Yuan","Jie Yang"],"abstract":"OBJECTIVES: Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. However, these English-centric models encounter challenges in non-English clinical settings, primarily due to limited clinical knowledge in respective languages, a consequence of imbalanced training corpora. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance. MATERIALS AND METHODS: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381 149 medical questions to construct the medical knowledge base and question bank. 
The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework leverages the in-context learning ability of LLMs to integrate diverse external clinical knowledge s...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/jamia/ocae079","openalex_id":"https://openalex.org/W4396498719","cited_by_count":28,"quality_score":69,"matched_keywords":["LLM"],"author_affiliations":["Brigham and Women's Hospital","Harvard University","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7423604726791382},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5970057249069214},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3727562427520752},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3505714237689972},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.2884504795074463}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":28}},{"id":"bytedance-seed:73","title":"PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning","url":"https://seed.bytedance.com/en/research/pllava-parameter-free-llava-extension-from-images-to-videos-for-video-dense-captioning","published":"2024-04-29","authors":["Lin Xu","Yilin Zhao","Daquan Zhou","Zhijie Lin","See Kiong Ng","Jiashi Feng"],"abstract":"Vision-language pre-training has significantly elevated performance across a wide range of image-language applications.
Yet, the pre-training process for video-related tasks demands exceptionally large computational and data resources, which hinders the progress of video-language models. This paper investigates a straightforward, highly efficient, and resource-light approach to adapting an existing image-language pre-trained model for dense video understanding. Our preliminary experiments reveal that directly fine-tuning pre-trained image-language models with multiple frames as inputs on video datasets leads to performance saturation or even a drop. Our further investigation reveals that it is largely attributed to the bias of learned high-norm visual features. Motivated by this finding, we propose a simple but effective pooling strategy to smooth the feature distribution along the temporal dimens...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","arXiv","efficient"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"hf-org-paper:huawei-noah:2404.18911","title":"Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting","url":"https://huggingface.co/papers/2404.18911","published":"2024-04-29","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org
papers","huawei-noah"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"apple:uc2hbakdhwwd3o52u1o8lva1","title":"JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling","url":"https://machinelearning.apple.com/research/jointnet","published":"2024-04-29","authors":["Jingyang Zhang","Shiwei Li","Yuanxun Lu","Tian Fang","David McKinnon","Yanghai Tsin","Long Quan","Yao Yao"],"abstract":"We introduce JointNet, a novel neural network architecture for modeling the joint distribution of images and an additional dense modality (e.g., depth maps). JointNet is extended from a pre-trained text-to-image diffusion model, where a copy of the original network is created for the new dense modality branch and is densely connected with the RGB branch. The RGB branch is locked during network fine-tuning, which enables efficient learning of the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:dr73ql2p4k8k2ljep0g26gtj","title":"Direct2.5: Diverse 3D Content Creation via Multi-view 2.5D Diffusion","url":"https://machinelearning.apple.com/research/direct-2-5","published":"2024-04-29","authors":["Yuanxun Lu","Jingyang Zhang","Shiwei Li","Tian Fang","David McKinnon","Yanghai Tsin","Long Quan","Xun Cao","Yao Yao"],"abstract":"Recent advances in generative AI have unveiled significant potential for the creation of 3D content. 
However, current methods either apply a pre-trained 2D diffusion model with the time-consuming score distillation sampling (SDS), or a direct 3D diffusion model trained on limited 3D data losing generation diversity. In this work, we approach the problem by employing a multi-view 2.5D diffusion fine-tuned from a pre-trained 2D diffusion model. The...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dpo-meets-ppo-reinforced-token-optimization-for-rlhf","title":"DPO Meets PPO: Reinforced Token Optimization for RLHF","url":"https://www.microsoft.com/en-us/research/publication/dpo-meets-ppo-reinforced-token-optimization-for-rlhf/","published":"2024-04-28","authors":["Han Zhong","Guhao Feng","Wei Xiong","Li Zhao","Di He","Jiang Bian","Liwei Wang"],"abstract":"In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards -- a challenging scenario in traditional deep reinforcement learning. Despite the great successes of PPO in the alignment of large language models, its open-source implementation is still largely sub-optimal. To address these issues, we introduce a framework that models RLHF problems as a Markov decision process (MDP), enabling the capture of fine-grained token-wise information. 
Under this framework, we introduce an algorithm Reinforced Token Optimization (\\texttt{RTO}), which learns the token-wise reward function from preference data and performs policy optimization based on this learned token-wise reward signal. Theoretically, \\texttt{RTO} is proven to have the capability of finding the near-optimal policy sample-effic...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","mathematics","Reinforcement learning","1970-01-01","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/kimbap-a-node-property-map-system-for-distributed-graph-analytics","title":"Kimbap: A Node-Property Map System for Distributed Graph Analytics","url":"https://www.microsoft.com/en-us/research/publication/kimbap-a-node-property-map-system-for-distributed-graph-analytics/","published":"2024-04-27","authors":["Hochan Lee","Roshan Dathathri","Keshav Pingali"],"abstract":"Most distributed graph analytics systems such as Gemini, Gluon, and SympleGraph support a computational model in which node properties are updated iteratively using properties of adjacent neighbors of those nodes. However, there are many algorithms that cannot be expressed in this model, such as the Louvain algorithm for community detection and the Shiloach-Vishkin algorithm for connected components. 
These algorithms may be more efficient or may produce better quality output than simpler algorithms that can be expressed using updates only from adjacent vertices. This paper describes Kimbap, a distributed graph analytics programming framework, and its high-performance implementation that addresses this problem. Kimbap supports general vertex-centric algorithms by permitting the computation at a node to read and write properties of any node in the graph, not just its adjacent neighbors. The...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Programming languages and software engineering","Systems and networking","Computer science","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/characterizing-power-management-opportunities-for-llms-in-the-cloud","title":"Characterizing Power Management Opportunities for LLMs in the Cloud","url":"https://www.microsoft.com/en-us/research/publication/characterizing-power-management-opportunities-for-llms-in-the-cloud/","published":"2024-04-27","authors":["Pratyush Patel","Esha Choukse","Chaojie Zhang","Íñigo Goiri","Brijesh Warrier","Nithish Mahalingam","Ricardo Bianchini"],"abstract":"Cloud providers and datacenter operators are grappling with massive demand for graphics processing units (GPUs) due to surging use of large language models (LLMs).
To try to keep up, enterprises are building new GPU clusters to run LLM workloads, which in turn are running into an energy wall worldwide. Power oversubscription and adding more servers to existing and upcoming datacenters could help alleviate this challenge. However, GPU-heavy workloads like LLMs could create power surges, exceeding fixed power contracts with utility companies. Proper power usage analysis and management would help providers oversubscribe power to add more GPU servers to existing datacenters safely and more efficiently. In a recent paper: Characterizing Power Management Opportunities for LLMs in the Cloud, researchers from Microsoft analyze power patterns for several popular, open-source LLMs across commonl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4399427593","title":"SRAG: Speech Retrieval Augmented Generation for Spoken Language Understanding","url":"https://doi.org/10.1109/iccect60629.2024.10546001","published":"2024-04-26","authors":["Hao Yang","Min Zhang","Daimeng Wei","Jiaxin Guo"],"abstract":"Retrieval augmented generation (RAG) has shown promise for enhancing natural language understanding (NLU) capabilities of large language models (LLMs) by retrieving relevant knowledge as prompts. Extending RAG to spoken language understanding (SLU) represents an important research direction.
This paper proposes a RAG approach for improving SLU. First, the encoder of a pretrained automatic speech recognition model is utilized for speech retrieval over the training set. The corresponding texts and intent labels are then formulated as prompts to guide the SLU decoder. Furthermore, a prompt attention mechanism is introduced to strengthen the attention between generation and prompts. Experiments demonstrate that the proposed speech RAG approach substantially outperforms conventional end-to-end and cascaded SLU models in intent prediction from speech. This highlights the efficacy of leveraging...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/iccect60629.2024.10546001","openalex_id":"https://openalex.org/W4399427593","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8780524730682373},{"id":"https://openalex.org/C2776230583","display_name":"Spoken language","score":0.767690896987915},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.6537879705429077},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5804914236068726},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5596917271614075},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.552481472492218},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4997429847717285},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4942786693572998}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/automatic-root-cause-analysis-via-large-language-models-for-cloud-incidents","title":"Automatic Root Cause Analysis via Large Language Models for Cloud Incidents","url":"https://www.microsoft.com/en-us/research/publication/automatic-root-cause-analysis-via-large-language-models-for-cloud-incidents/","published":"2024-04-25","authors":["Yinfang Chen","Huaibing Xie","Minghua Ma","Yu Kang","Xin Gao","Liu Shi","Yunjie Cao","Xue‐Chao Gao","Hao Fan","Ming Wen","Jun Zeng","Supriyo GHOSH"],"abstract":"Ensuring the reliability and availability of cloud services necessitates efficient root cause analysis (RCA) for cloud incidents. Traditional RCA methods, which rely on manual investigations of data sources such as logs and traces, are often laborious, error-prone, and challenging for on-call engineers. In this paper, we introduce RCACopilot, an innovative on-call system empowered by the large language model for automating RCA of cloud incidents. RCACopilot matches incoming incidents to corresponding incident handlers based on their alert types, aggregates the critical runtime diagnostic information, predicts the incident's root cause category, and provides an explanatory narrative. We evaluate RCACopilot using a real-world dataset consisting of a year's worth of incidents from Microsoft. Our evaluation demonstrates that RCACopilot achieves RCA accuracy up to 0.766. 
Furthermore, the diag...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Data platforms and analytics","Programming languages and software engineering","Systems and networking","Computer science","1970-01-01","language model","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:af2b53d0f9a4decc","title":"Qwen1.5-110B: The First 100B+ Model of the Qwen1.5 Series","url":"https://qwenlm.github.io/blog/qwen1.5-110b/","published":"2024-04-25","authors":["Alibaba/Qwen"],"abstract":"Recently we have witnessed a burst of large-scale models with over 100 billion parameters in the open-source community. These models have demonstrated remarkable performance in both benchmark evaluation and chatbot arena.
Today, we release the first 100B+ model of the Qwen1.5 series, Qwen1.5-110B, which achieves comparable performance with Meta-Llama3-70B in the base model evaluation, and outstanding performance in the chat evaluation, including MT-Bench and AlpacaEval 2.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"apple:gej0xg6lqjx7n3i58s2fjjcm","title":"CatLIP: CLIP-level Visual Recognition Accuracy with 2.7× Faster Pre-training on Web-scale Image-Text Data","url":"https://machinelearning.apple.com/research/visual-recognition-accuracy","published":"2024-04-25","authors":["Sachin Mehta","Max Horton","Fartash Faghri","Mohammad Sekhavat","Mahyar Najibikohnehshahri","Mehrdad Farajtabar","Oncel Tuzel","Mohammad Rastegari"],"abstract":"Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. 
The proposed method reframes pre-training on image-text data as...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4395464584","title":"UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet","url":"https://doi.org/10.1145/3660638","published":"2024-04-25","authors":["Jiabo Ye","Junfeng Tian","Ming Yan","Haiyang Xu","Qinghao Ye","Yaya Shi","Xiaoshan Yang","Xuwu Wang","Ji Zhang","Liang He","Xin Lin"],"abstract":"Referring expression comprehension aims to align natural language queries with visual scenes, which requires establishing fine-grained correspondence between vision and language. This has important applications in multi-modal reasoning systems. Existing methods typically use text-agnostic visual backbones to extract features independently without considering the specific text input. However, we argue that the extracted visual features can be inconsistent with the referring expression, which hurts multi-modal understanding. To address this, we first propose Query-modulated Refinement Network (QRNet) that leverages language guidance to guide visual feature extraction. However, it only focuses on the grounding task that can only provide coarse-grained annotations in the form of bounding box coordinates. 
The guidance for the visual backbone is indirect, and the inconsistent issue still exist...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3660638","openalex_id":"https://openalex.org/W4395464584","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","East China Normal University","Fudan University","Institute of Automation","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9045368432998657},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.667150616645813},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5979369282722473},{"id":"https://openalex.org/C147037132","display_name":"Minimum bounding box","score":0.5810654163360596},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5600402355194092},{"id":"https://openalex.org/C90559484","display_name":"Expression (computer science)","score":0.5238038301467896},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5110891461372375},{"id":"https://openalex.org/C63584917","display_name":"Bounding overwatch","score":0.4987599849700928}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization","title":"From Local to Global: A Graph RAG Approach to Query-Focused 
Summarization","url":"https://www.microsoft.com/en-us/research/publication/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization/","published":"2024-04-24","authors":["Darren Edge","Ha Trinh","Newman Cheng","Joshua Bradley","Alex Chao","Apurva Mody","Steven Truitt","Dasha Metropolitansky","Robert Osazuwa Ness","Jonathan Larson"],"abstract":"The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as \"What are the main themes in the dataset?\", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, do not scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose GraphRAG, a graph-based approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text. 
Our approach uses an LLM to build a graph index in two stages: first, to derive an entity knowledge...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Unpublished","Artificial intelligence","Human language technologies","Generative AI","Information retrieval","Question answering","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/make-your-llm-fully-utilize-the-context","title":"Make Your LLM Fully Utilize the Context","url":"https://www.microsoft.com/en-us/research/publication/make-your-llm-fully-utilize-the-context/","published":"2024-04-24","authors":["Shengnan An","Zexiong Ma","Zeqi Lin","Nanning Zheng","Jian-Guang Lou"],"abstract":"While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. 
Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. Through applying this information-intensiv...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:ocl9fvvkvasxl9ygblhpcpq8","title":"OpenELM: An Efficient Language Model Family with Open Training and Inference Framework","url":"https://machinelearning.apple.com/research/openelm","published":"2024-04-24","authors":["Sachin Mehta","Mohammad Sekhavat","Qingqing Cao","Max Horton","Yanzi Jin","Frank Sun","Iman Mirzadeh","Mahyar Najibikohnehshahri","Dmitry Belenko","Peter Zatloukal","Mohammad Rastegari"],"abstract":"This paper has been accepted at the Efficient Systems for Foundation Models workshop at ICML 2024.","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["language 
model","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:n8epi0lxnqo70zw8hay1o5ri","title":"Think While You Write Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation","url":"https://machinelearning.apple.com/research/write-hypothesis","published":"2024-04-24","authors":["Yifu Qiu","Varun Embar","Shay B. Cohen","Benjamin Han"],"abstract":"Neural knowledge-to-text generation models often struggle to faithfully generate descriptions for the input facts: they may produce hallucinations that contradict the given facts, or describe facts not present in the input. To reduce hallucinations, we propose a novel decoding method, TWEAK (Think While Effectively Articulating Knowledge). TWEAK treats the generated sequences at each decoding step and its future sequences as hypotheses, and ranks...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4395048825","title":"Computational scoring and experimental evaluation of enzymes generated by neural networks","url":"https://doi.org/10.1038/s41587-024-02214-2","published":"2024-04-23","authors":["Sean R. Johnson","Xiaozhi Fu","Sandra Viknander","Clara Goldin","Sarah Monaco","Aleksej Zelezniak","Kevin Yang"],"abstract":"In recent years, generative protein sequence models have been developed to sample novel sequences. 
However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for g...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41587-024-02214-2","openalex_id":"https://openalex.org/W4395048825","cited_by_count":68,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Chalmers University of Technology","Invitae (United States)","King's College London","Microsoft (United States)","Microsoft Research (United Kingdom)","New England Biolabs (United States)","Vilnius University"],"concepts":[{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.5726553201675415},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.5293382406234741},{"id":"https://openalex.org/C181199279","display_name":"Enzyme","score":0.5014445781707764},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4724683463573456},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.4615386426448822},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.3453713059425354},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.2581789493560791}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":68}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/aligning-llm-agents-by-learning-latent-preference-from-user-edits","title":"Aligning LLM Agents by Learning Latent Preference from User Edits","url":"https://www.microsoft.com/en-us/research/publication/aligning-llm-agents-by-learning-latent-preference-from-user-edits/","published":"2024-04-22","authors":["Ge Gao","Alexey Taymanov","Eduardo Salinas","Paul Mineiro","Dipendra Misra"],"abstract":"We study interactive learning of LLM-based language agents based on user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent response to personalize it based on their latent preference, in addition to improving the correctness. The edit feedback is naturally generated, making it a suitable candidate for improving the agent's alignment with the user's preference, and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE that infers a description of the user's latent preference based on historic edit data. The inferred user preference descriptions are used to define prompts for generating responses in the future. 
This avoids fine-tuning the agent, which is costly, challenging to scale with the number of users, an...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Language Agents","large language models","1970-01-01","LLM","preference","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multi-head-mixture-of-experts","title":"Multi-Head Mixture-of-Experts","url":"https://www.microsoft.com/en-us/research/publication/multi-head-mixture-of-experts/","published":"2024-04-22","authors":["Xun Wu","Shaohan Huang","Wenhui Wang","Furu Wei"],"abstract":"Sparse Mixtures of Experts (SMoE) scales model capacity without significant increases in training and inference costs, but exhibits the following two issues: (1) Low expert activation, where only a small subset of experts are activated for optimization. (2) Lacking fine-grained analytical capabilities for multiple semantic concepts within individual tokens. We propose Multi-Head Mixture-of-Experts (MH-MoE), which employs a multi-head mechanism to split each token into multiple sub-tokens. These sub-tokens are then assigned to and processed by a diverse set of experts in parallel, and seamlessly reintegrated into the original token form. 
The multi-head mechanism enables the model to collectively attend to information from various representation spaces within different experts, while significantly enhances expert activation, thus deepens context understanding and alleviate overfitting. Mor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","large language models","Machine learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4394994587","title":"Recommender Systems in the Era of Large Language Models (LLMs)","url":"https://doi.org/10.1109/tkde.2024.3392335","published":"2024-04-22","authors":["Zihuai Zhao","Wenqi Fan","Jiatong Li","Yunqing Liu","Xiaowei Mei","Yiqi Wang","Zhen Wen","Fei Wang","Xiangyu Zhao","Jiliang Tang","Qing Li"],"abstract":"With the prosperity of e-commerce and web applications, Recommender Systems (RecSys) have become an indispensable and important component in our daily lives, providing personalized suggestions that cater to user preferences. 
While Deep Neural Networks (DNNs) have achieved significant advancements in enhancing recommender systems by modeling user-item interactions and incorporating their textual side information, these DNN-based methods still exhibit some limitations, such as difficulties in effectively understanding users' interests and capturing textual side information, inabilities in generalizing to various seen/unseen recommendation scenarios and reasoning on their predictions, etc. Meanwhile, the development of Large Language Models (LLMs), such as ChatGPT and GPT-4, has revolutionized the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI), due to their rem...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2024.3392335","openalex_id":"https://openalex.org/W4394994587","cited_by_count":305,"quality_score":75,"matched_keywords":["LLM","personalized"],"author_affiliations":["Amazon (United States)","City University of Hong Kong","Hong Kong Polytechnic University","Michigan State University","National University of Defense Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7727717161178589},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.6500712633132935},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3924332559108734},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.366159588098526},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3588621914386749}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":305}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/just-in-time-checkpointing-low-cost-error-recovery-from-deep-learning-training-failures","title":"Just-In-Time Checkpointing: Low Cost Error Recovery from Deep Learning Training Failures","url":"https://www.microsoft.com/en-us/research/publication/just-in-time-checkpointing-low-cost-error-recovery-from-deep-learning-training-failures/","published":"2024-04-22","authors":["Tanmaey Gupta","Sanjeev Krishnan","Rituraj Kumar","Abhishek Vijeev","Bhargav Gulavani","Nipun Kwatra","Ramachandran Ramjee","Muthian Sivathanu"],"abstract":"Deep Learning training jobs process large amounts of training data using many GPU devices, often running for weeks or months. When hardware or software failures happen, these jobs need to restart, losing the memory state for the Deep Neural Network (DNN) model trained so far, unless checkpointing mechanisms are used to save training state periodically. However, for large models, periodic checkpointing incurs significant steady state overhead, and during recovery, a large number of GPUs need to redo work since the last checkpoint. This is especially problematic when failures are frequent for large DNN (such as Large Language Model) training jobs using many GPUs. In this paper, we present a novel approach of just-in-time checkpointing when failures happen, which enables recovery from failures with just a single minibatch iteration of work replayed by all GPUs. 
This reduces the cost of erro...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","language model","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:f003aed5893cdfa3","title":"Text Quality-Based Pruning for Efficient Training of Language Models","url":"https://ai.meta.com/research/publications/text-quality-based-pruning-for-efficient-training-of-language-models/","published":"2024-04-22","authors":["Vasu Sharma","Karthik Padthe","Newsha Ardalani","Kushal Tirumala","Russ Howes","Hu Xu","Bernie Huang","Daniel Li (FAIR)","Armen Aghajanyan","Gargi Ghosh","Luke Zettlemoyer"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["NLP","efficient"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=14"}},{"id":"arxiv:2404.14619","title":"OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework","url":"https://huggingface.co/papers/2404.14619","published":"2024-04-22","authors":["Sachin Mehta","Mohammad Hossein Sekhavat","Qingqing Cao","Maxwell 
Horton","Yanzi Jin","Chenfan Sun","Iman Mirzadeh","Mahyar Najibi","Dmitry Belenko","Peter Zatloukal","Mohammad Rastegari"],"abstract":"The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2times fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["language model","efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/protecting-your-llms-with-information-bottleneck","title":"Protecting Your LLMs with Information Bottleneck","url":"https://www.microsoft.com/en-us/research/publication/protecting-your-llms-with-information-bottleneck/","published":"2024-04-21","authors":["Zichuan Liu","Zefan Wang","Linjie Xu","Jinyu Wang","Lei Song","Tianchun Wang","Chunlin Chen","Wei Cheng","Jiang Bian"],"abstract":"The advent of large language models (LLMs) 
has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector), a defense mechanism grounded in the information bottleneck principle, and we modify the objective to avoid trivial solutions. The IBProtector selectively compresses and perturbs prompts, facilitated by a lightweight and trainable extractor, preserving only essential information for the target LLMs to respond with the expected answer. Moreover, we further consider a situation where the gradient is not visible to be compatible with any LLM. Our empirical evaluations show that IBProtector outper...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","Natural language processing","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/segmentation-using-large-language-models-a-new-typology-of-american-neighborhoods","title":"Segmentation using large language models: A new typology of American neighborhoods","url":"https://www.microsoft.com/en-us/research/publication/segmentation-using-large-language-models-a-new-typology-of-american-neighborhoods/","published":"2024-04-21","authors":["Alex D. 
Singleton","Seth Spielman"],"abstract":"In the United States, recent changes to the National Statistical System have amplified the geographic-demographic resolution trade-off. That is, when working with demographic and economic data from the American Community Survey, as one zooms in geographically one loses resolution demographically due to very large margins of error. In this paper, we present a solution to this problem in the form of an AI based open and reproducible geodemographic classification system for the United States using small area estimates from the American Community Survey (ACS). We employ a partitioning clustering algorithm to a range of socio-economic, demographic, and built environment variables. Our approach utilizes an open source software pipeline that ensures adaptability to future data updates. A key innovation is the integration of GPT4, a state-of-the-art large language model, to generate intuitive cl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1140/epjds/s13688-024-00466-1","openalex_id":"https://openalex.org/W4395009938","cited_by_count":6,"quality_score":78,"matched_keywords":["Article (Journal)","Artificial intelligence","Social sciences","Computer science","language model"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Liverpool"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/semantically-aligned-question-and-code-generation","title":"Semantically Aligned Question and Code Generation for Automated Insight 
Generation","url":"https://www.microsoft.com/en-us/research/publication/semantically-aligned-question-and-code-generation/","published":"2024-04-20","authors":["Ananya Singha","Bhavya Chopra","Anirudh Khatry","Sumit Gulwani","Austin Henley","Vu Le","Chris Parnin","Mukul Singh","Gust Verbruggen"],"abstract":"Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, automated insights produced by large-language models can generate code that does not correctly correspond (or \\emph{align}) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data and the corresponding code to answer those questions. Then through an empirical study on data from Open-WikiTable, we show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code. Additionally, we found that generating questions and code together yields more diverse questions. 
Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Natural language processing","Programming language","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4394967004","title":"Exploiting Duality in Aspect Sentiment Triplet Extraction With Sequential Prompting","url":"https://doi.org/10.1109/tkde.2024.3391381","published":"2024-04-19","authors":["Jingping Liu","Tao Chen","Hao Guo","Chao Wang","Haiyun Jiang","Yanghua Xiao","Xiang Xu","Baohua Wu"],"abstract":"Aspect sentiment triplet extraction is an important task in natural language processing. Previous work tends to focus on the interaction between the aspect and opinion, while ignoring the positive impact of sentiment on interaction within the triplet. In this paper, we propose a novel aspect sentiment triplet extraction model based on dual learning with sequential prompting. This model is designed as a bidirectional extraction framework that fully takes sentiment polarity into account in the interaction process of aspect and opinion. Besides, we introduce a dual loss as a regularization term for the extraction model to promote better learning in both directions. 
We further design a sequential prompting strategy to determine aspect, opinion, and sentiment polarity more accurately, which utilizes the results extracted in the previous step as prior knowledge to guide the prediction of the n...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2024.3391381","openalex_id":"https://openalex.org/W4394967004","cited_by_count":13,"quality_score":50,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","East China University of Science and Technology","Fudan University","Shanghai University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.79115891456604},{"id":"https://openalex.org/C2778023678","display_name":"Duality (order theory)","score":0.5699634552001953},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.5012071132659912},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44873976707458496},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4205652177333832},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3958943486213684},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.33124858140945435},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1346552073955536}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"official:fbacfce486c1be14","title":"RealWorldQA Benchmark Dataset Card","url":"https://huggingface.co/datasets/xai-org/RealworldQA","published":"2024-04-18","authors":["xAI"],"abstract":"Official xAI dataset card for RealWorldQA, a benchmark 
for real-world understanding with anonymized vehicle and real-world images plus verifiable question-answer pairs.","companies":["xAI"],"matched_orgs":["xAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_report"],"source":"official_report","work_type":"benchmark_dataset_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["xAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"openalex:W4394923298","title":"DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines","url":"https://doi.org/10.1145/3627703.3629585","published":"2024-04-18","authors":["Chenyu Jiang","Zhen Jia","Shuai Zheng","Yida Wang","Chuan Wu"],"abstract":"Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, which is nonetheless not space or computation efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipelineparallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. 
We optimize micro-batch construction using a dynami...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3627703.3629585","openalex_id":"https://openalex.org/W4394923298","cited_by_count":7,"quality_score":52,"matched_keywords":["language model","efficient"],"author_affiliations":["Amazon (United States)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8627451658248901},{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.7402368783950806},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6279897689819336},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.5892365574836731},{"id":"https://openalex.org/C206729178","display_name":"Scheduling (production processes)","score":0.5477926731109619},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.536666989326477},{"id":"https://openalex.org/C175309249","display_name":"Pipeline transport","score":0.5255069732666016},{"id":"https://openalex.org/C37404715","display_name":"Dynamic programming","score":0.5021278858184814}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"official:9b05409e6571bd37","title":"CYBERSECEVAL 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models","url":"https://ai.meta.com/research/publications/cyberseceval-2-a-wide-ranging-cybersecurity-evaluation-suite-for-large-language-models/","published":"2024-04-18","authors":["GenAI Cybersec Team","Manish Bhatt","Sahana Chennabasappa","Yue Li","Cyrus Nikolaidis","Daniel Song","Shengye Wan","Faizan Ahmad","Cornelius Aschermann","Yaohui 
Chen","Dhaval Kapil","David Molnar"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=14"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/token-level-direct-preference-optimization","title":"Token-level Direct Preference Optimization","url":"https://www.microsoft.com/en-us/research/publication/token-level-direct-preference-optimization/","published":"2024-04-17","authors":["Yongcheng Zeng","Guoqing Liu","Weiyu Ma","Ning Yang","Haifeng Zhang","Jun Wang"],"abstract":"Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions. This process often utilizes methods like pairwise comparisons and KL divergence against a reference LLM, focusing on the evaluation of full answers generated by the models. However, the generation of these responses occurs in a token level, following a sequential, auto-regressive fashion. In this paper, we introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing policy at the token level. Unlike previous methods, which face challenges in divergence efficiency, TDPO incorporates forward KL divergence constraints for each token, improving alignment and diversity. 
Utilizing the Bradley-Terry model for a token-based reward system, TDPO enhances the regulation of KL divergence, while preserving simplicity with...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4394869910","title":"Generative large language models are all-purpose text analytics engines: text-to-text learning is all your need","url":"https://doi.org/10.1093/jamia/ocae078","published":"2024-04-17","authors":["Peng Cheng","Xi Yang","Aokun Chen","Zehao Yu","Kaleb E Smith","Anthony Costa","Mona G. Flores","Jiang Bian","Yonghui Wu"],"abstract":"OBJECTIVE: To solve major clinical natural language processing (NLP) tasks using a unified text-to-text learning architecture based on a generative large language model (LLM) via prompt tuning. METHODS: We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM, GatorTronGPT, developed using GPT-3 architecture and trained with up to 20 billion parameters. We adopted soft prompts (ie, trainable vectors) with frozen LLM, where the LLM parameters were not updated (ie, frozen) and only the vectors of soft prompts were updated, known as prompt tuning. We added additional soft prompts as a prefix to the input layer, which were optimized during the prompt tuning. 
We evaluated the proposed method using 7 clinical NLP tasks and compared them with previous task-specific solutions based on Transformer models. RESULTS AND CONCLUSION: The...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1093/jamia/ocae078","openalex_id":"https://openalex.org/W4394869910","cited_by_count":26,"quality_score":71,"matched_keywords":["LLM","language model"],"author_affiliations":["Nvidia (United States)","UF Health Cancer Center","University of Florida","University of Florida Health"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.821814775466919},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7047548294067383},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6958757638931274},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6554045081138611},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6035975813865662},{"id":"https://openalex.org/C153604712","display_name":"Relationship extraction","score":0.5796388387680054},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5041993856430054},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4927256405353546}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":26}},{"id":"official:ce1428e4f2d4effe","title":"Code with CodeQwen1.5","url":"https://qwenlm.github.io/blog/codeqwen1.5/","published":"2024-04-16","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDIntroduction The advent of advanced programming tools, which harnesses the power of large language models (LLMs), 
has significantly enhanced programmer productivity and accuracy. Notwithstanding these advancements, dominant coding assistants like Github Copilot, built upon proprietary LLMs, pose notable challenges in terms of cost, privacy, security, and potential copyright infringement. Recognizing the imperative for a more transparent and accessible alternative, the open-source community has embarked on a concerted endeavor to develop open codeLLMs.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4399631359","title":"Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies","url":"https://doi.org/10.1145/3643916.3644408","published":"2024-04-15","authors":["Yilun Liu","Shimin Tao","Weibin Meng","Jingyu Wang","Wenbing Ma","Yuhang Chen","Yanqing Zhao","Hao Yang","Yanfei Jiang"],"abstract":"Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and log anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of analysis results hinders analysts' comprehension of program status and their ability to take appropriate actions. 
Moreover, these methods require substantial in-domain training data, and their performance declines sharply (by up to 62.5%) in online scenarios involving unseen logs from new domains, a common occurrence due to rapid software updates. In this paper, we propose LogPrompt, a novel interpretable log analysis approach for online scenarios. LogPrompt employs large language models (LLMs) to...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3643916.3644408","openalex_id":"https://openalex.org/W4399631359","cited_by_count":27,"quality_score":64,"matched_keywords":[],"author_affiliations":["Beijing University of Posts and Telecommunications","Huawei Technologies (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6993101835250854},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4137713611125946},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.38394129276275635}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":27}},{"id":"apple:b966xeejxsfr5hpmpvpm3hdn","title":"Hierarchical and Dynamic Prompt Compression for Efficient Zero-shot API Usage","url":"https://machinelearning.apple.com/research/hierarchical-dynamic-prompt","published":"2024-04-15","authors":["Yichen Jiang","Marco Del Vecchio","Anders Johannsen","Mohit Bansal"],"abstract":"Long prompts present a significant challenge for practical LLM-based systems that need to operate with low latency and limited resources. 
We investigate prompt compression for zero-shot dialogue systems that learn to use unseen APIs directly in-context from their documentation, which may take up hundreds of prompt tokens per API. We start from a recently introduced approach (Mu et al., 2023) that learns to compress the prompt into a few “gist...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["LLM","efficient","compression"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"bytedance-seed:215","title":"HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing","url":"https://seed.bytedance.com/en/research/hq-edit-a-high-quality-dataset-for-instruction-based-image-editing","published":"2024-04-15","authors":["Mude Hui","Siwei Yang","Bingchen Zhao","Yichun Shi","Heng Wang","Peng Wang","Yuyin Zhou","Cihang Xie"],"abstract":"This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches relying on attribute guidance or human feedback on building datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure its high quality, diverse examples are first collected online, expanded, and then used to create high-quality diptychs featuring input and output images with detailed text prompts, followed by precise alignment ensured through post-processing. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. 
HQ-Edits high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For example...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","ICLR 2025"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:g267o5nkx3bujghji90jo238","title":"Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization","url":"https://machinelearning.apple.com/research/overcoming-pitfalls-vision-language","published":"2024-04-15","authors":["Yuhang Zang","Hanlin Goh","Josh Susskind","Chen Huang"],"abstract":"Existing vision-language models exhibit strong generalization on a variety of visual domains and tasks. However, such models mainly perform zero-shot recognition in a closed-set manner, and thus struggle to handle open-domain visual concepts by design. 
There are recent finetuning methods, such as prompt learning, that not only study the discrimination between in-distribution (ID) and out-of-distribution (OOD) samples, but also show some...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:krko4e7402pgice4w756a15l","title":"Vanishing Gradients in Reinforcement Finetuning of Language Models","url":"https://machinelearning.apple.com/research/vanishing-gradients-reinforcement","published":"2024-04-15","authors":["Noam Razin","Hattie Zhou","Preetum Nakkilan","Josh Susskind","Omid Saremi","Arwen Bradley","Vimal Thilak","Etai Littwin"],"abstract":"Pretrained language models are commonly adapted to comply with human intent and downstream tasks via finetuning. The finetuning process involves supervised finetuning (SFT), using labeled samples, and/or reinforcement learning based fine-tuning (RFT) via policy gradient methods, using a (possibly learned) reward function. 
This work highlights an overlooked optimization hurdle in RFT: we prove that the expected gradient for an input sample (i.e....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:qsvxhdbsdh1z9d5k5xbjbtmb","title":"Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals","url":"https://machinelearning.apple.com/research/frequency-aware-masked-autoencoders","published":"2024-04-15","authors":["Ran Liu","Ellen Zippi","Hadi Pour Ansari","Chris Sandino","Jingping Nie","Hanlin Goh","Erdrin Azemi","Ali Moin"],"abstract":"Inspired by the advancements in foundation models for language-vision modeling, we explore the utilization of transformers and large-scale pretraining on biosignals. 
In this study, our aim is to design a general-purpose architecture for biosignals that can be easily trained on multiple modalities and can be adapted to new modalities or tasks with ease.The proposed model is designed with three key features: (i) A frequency-aware architecture that...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4394804915","title":"HICL: Hashtag-Driven In-Context Learning for Social Media Natural Language Understanding","url":"https://doi.org/10.1109/tnnls.2024.3384987","published":"2024-04-15","authors":["Hanzhuo Tan","Chunpu Xu","Jing Li","Yuqun Zhang","Zeyang Fang","Zeyu Chen","Baohua Lai"],"abstract":"Natural language understanding (NLU) is integral to various social media applications. However, the existing NLU models rely heavily on context for semantic learning, resulting in compromised performance when faced with short and noisy social media content. To address this issue, we leverage in-context learning (ICL), wherein language models learn to make inferences by conditioning on a handful of demonstrations to enrich the context and propose a novel hashtag-driven ICL (HICL) framework. Concretely, we pretrain a model #Encoder, which employs #hashtags (user-annotated topic labels) to drive BERT-based pretraining through contrastive learning. Our objective here is to enable #Encoder to gain the ability to incorporate topic-related semantic information, which allows it to retrieve topic-related posts to enrich contexts and enhance social media NLU with noisy contexts. 
To further integra...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tnnls.2024.3384987","openalex_id":"https://openalex.org/W4394804915","cited_by_count":2,"quality_score":43,"matched_keywords":["media"],"author_affiliations":["Baidu (China)","Hong Kong Polytechnic University","Southern University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8383856415748596},{"id":"https://openalex.org/C2779439875","display_name":"Natural language understanding","score":0.7336613535881042},{"id":"https://openalex.org/C518677369","display_name":"Social media","score":0.7195580005645752},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6847671270370483},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5540265440940857},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.530570924282074},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.5122279524803162},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4473952054977417}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-based-test-driven-interactive-code-generation-user-study-and-empirical-evaluation","title":"LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation","url":"https://www.microsoft.com/en-us/research/publication/llm-based-test-driven-interactive-code-generation-user-study-and-empirical-evaluation/","published":"2024-04-14","authors":["Sarah Fakhoury","Aaditya Naik","Georgios 
Sakkas","Saikat Chakraborty","Shuvendu Lahiri"],"abstract":"Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow TiCoder for guided intent clarification (i.e., partial formalization) through tests to support the generation of more accurate code suggestions. Through a mixed methods user study with 15 programmers, we present an empirical evaluation of the effectiveness of the workflow to improve code generation accuracy. We find that participants using the proposed workflow are significantly more likely to correctly evaluate AI generated code, and report significantly less task-induced cognitive load. Furthermore, we test the potential of the workflow at...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Programming languages and software engineering","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4398239202","title":"Exploring the Effectiveness of LLM based Test-driven Interactive Code Generation: User Study and Empirical Evaluation","url":"https://doi.org/10.1145/3639478.3643525","published":"2024-04-14","authors":["Sarah Fakhoury","Aaditya Naik","Georgios K. Sakkas","Saikat Chakraborty","Madan Musuvathi","Shuvendu K. 
Lahiri"],"abstract":"We introduce a novel workflow, TiCoder, designed to enhance the trust and accuracy of LLM-based code generation through interactive and guided intent formalization. TiCoder partially formalizes ambiguous intent in natural language prompts by generating a set of tests to distinguish common divergent behaviours in generated code suggestions. We evaluate the code generation accuracy improvements provided by TiCoder at scale across four competitive LLMs, and evaluate the cost-benefit trade off of evaluating tests surfaced by TiCoder through a user study with 15 participants.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3639478.3643525","openalex_id":"https://openalex.org/W4398239202","cited_by_count":8,"quality_score":49,"matched_keywords":["LLM"],"author_affiliations":["Microsoft (United States)","UC San Diego Health System","University of Pennsylvania"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8152695298194885},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6835774779319763},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6142362952232361},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5624353885650635},{"id":"https://openalex.org/C120936955","display_name":"Empirical research","score":0.5462120175361633},{"id":"https://openalex.org/C2777267654","display_name":"Test (biology)","score":0.5235395431518555},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.486870676279068},{"id":"https://openalex.org/C2776187449","display_name":"Natural language generation","score":0.41820722818374634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4398239326","title":"Automated Code Editing with Search-Generate-Modify","url":"https://doi.org/10.1145/3639478.3643124","published":"2024-04-14","authors":["Changshu Liu","Pelin Cetin","Yogesh Patodia","Baishakhi Ray","Saikat Chakraborty","Yangruibo Ding"],"abstract":"Code editing is essential in evolving software development. In literature, several automated code editing tools are proposed, which leverage Information Retrieval-based techniques and Machine Learning-based code generation and code editing models.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3639478.3643124","openalex_id":"https://openalex.org/W4398239326","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Columbia University","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8515818119049072},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7255007028579712},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.6316415667533875},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6234637498855591},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6064490675926208},{"id":"https://openalex.org/C121957198","display_name":"KPI-driven code analysis","score":0.48980164527893066},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.4774263799190521},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.4583932161331177}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":3}},{"id":"openalex:W4394766673","title":"Visual Tuning","url":"https://doi.org/10.1145/3657632","published":"2024-04-12","authors":["Bruce X. B. Yu","Jianlong Chang","Haixin Wang","Lingbo Liu","Shijie Wang","Zhiyu Wang","Junfan Lin","Lingxi Xie","Haojie Li","Zhouchen Lin","Qi Tian","Chang Wen Chen"],"abstract":"Fine-tuning visual models has been widely shown promising performance on many downstream visual tasks. With the surprising development of pre-trained visual foundation models, visual tuning jumped out of the standard modus operandi that fine-tunes the whole pre-trained model or just the fully connected layer. Instead, recent advances can achieve superior performance than full-tuning the whole pre-trained parameters by updating far fewer parameters, enabling edge devices and downstream applications to reuse the increasingly large foundation models deployed on the cloud. With the aim of helping researchers get the full picture and future directions of visual tuning, this survey characterizes a large and thoughtful selection of recent works, providing a systematic and comprehensive overview of existing work and models. 
Specifically, it provides a detailed background of visual tuning and cat...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1145/3657632","openalex_id":"https://openalex.org/W4394766673","cited_by_count":27,"quality_score":64,"matched_keywords":[],"author_affiliations":["Hong Kong Polytechnic University","Huawei Technologies (China)","Peking University","Peng Cheng Laboratory","Shandong University of Science and Technology","University of Illinois Urbana-Champaign","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8825826644897461},{"id":"https://openalex.org/C157524613","display_name":"Fine-tuning","score":0.6395761966705322},{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.6086671948432922},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48747968673706055},{"id":"https://openalex.org/C206588197","display_name":"Reuse","score":0.45351874828338623},{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.42998582124710083},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.4299491345882416},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3630790114402771}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":27}},{"id":"openalex:W4394745219","title":"Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks","url":"https://doi.org/10.1145/3597503.3639142","published":"2024-04-12","authors":["Zhongxin Liu","Zhijie Tang","Junwei Zhang","Xin Xia","Xiaohu Yang"],"abstract":"Vulnerability analysis is crucial for software 
security. Inspired by the success of pre-trained models on software engineering tasks, this work focuses on using pre-training techniques to enhance the understanding of vulnerable code and boost vulnerability analysis. The code understanding ability of a pre-trained model is highly related to its pre-training objectives. The semantic structure, e.g., control and data dependencies, of code is important for vulnerability analysis. However, existing pre-training objectives either ignore such structure or focus on learning to use it. The feasibility and benefits of learning the knowledge of analyzing semantic structure have not been investigated. To this end, this work proposes two novel pre-training objectives, namely Control Dependency Prediction (CDP) and Data Dependency Prediction (DDP), which aim to predict the statement-level control depe...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3597503.3639142","openalex_id":"https://openalex.org/W4394745219","cited_by_count":14,"quality_score":51,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7697323560714722},{"id":"https://openalex.org/C95713431","display_name":"Vulnerability (computing)","score":0.5661618709564209},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5423020720481873},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.5337128639221191},{"id":"https://openalex.org/C167063184","display_name":"Vulnerability assessment","score":0.5153855085372925},{"id":"https://openalex.org/C19768560","display_name":"Dependency (UML)","score":0.5105074644088745},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4833654761314392},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.4497588872909546}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":14}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/researchagent-iterative-research-idea-generation-over-scientific-literature-with-large-language-models","title":"ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/researchagent-iterative-research-idea-generation-over-scientific-literature-with-large-language-models/","published":"2024-04-10","authors":["Jinheon Baek","Sujay Kumar Jauhar","Silviu Cucerzan","Sung Ju Hwang"],"abstract":"Scientific Research, vital for improving human life, is hindered by its inherent complexity, slow pace, and the need for specialized experts. To enhance its productivity, we propose a ResearchAgent, a large language model-powered research idea writing agent, which automatically generates problems, methods, and experiment designs while iteratively refining them based on scientific literature. Specifically, starting with a core paper as the primary focus to generate ideas, our ResearchAgent is augmented not only with relevant publications through connecting information over an academic graph but also entities retrieved from an entity-centric knowledge store based on their underlying concepts, mined and shared across numerous papers. 
In addition, mirroring the human approach to iteratively improving ideas with peer discussions, we leverage multiple ReviewingAgents that provide reviews and f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","language model","preference","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4394699135","title":"Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects","url":"https://doi.org/10.1109/tpami.2024.3387317","published":"2024-04-10","authors":["Kexin Zhang","Qingsong Wen","Chaoli Zhang","Rongyao Cai","Ming Jin","Yong Liu","James Y. Zhang","Yuxuan Liang","Guansong Pang","Dongjin Song","Shirui Pan"],"abstract":"Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. Based on the pre-training and fine-tuning strategy, even a small amount of labeled data can achieve high performance. Compared with many published self-supervised surveys on computer vision and natural language processing, a comprehensive survey for time series SSL is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article. 
To this end, we first comprehensively review existing surveys related to SSL and time series, and then provide a new taxonomy of existing time series SSL methods by summarizing them from three perspectives: generative-based, contrastive-based, and adversarial-based. These methods are further divided into ten subcategor...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2024.3387317","openalex_id":"https://openalex.org/W4394699135","cited_by_count":195,"quality_score":67,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Griffith University","Hong Kong University of Science and Technology","Huzhou University","Monash University","Singapore Management University","University of Connecticut","University of Hong Kong","Zhejiang Normal University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8360134363174438},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5875207185745239},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.5586305856704712},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5567726492881775},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.4968166649341583},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.4761733412742615},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.4550510346889496},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.41673752665519714}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":195}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/fip-a-fixed-point-approach-for-causal-generative-modeling","title":"FiP: A Fixed-Point Approach for Causal Generative Modeling","url":"https://www.microsoft.com/en-us/research/publication/fip-a-fixed-point-approach-for-causal-generative-modeling/","published":"2024-04-09","authors":["M. Scetbon","Joel Jennings","Agrin Hilmkil","Cheng Zhang","Chao Ma"],"abstract":"Modeling true world data-generating processes lies at the heart of empirical science. Structural Causal Models (SCMs) and their associated Directed Acyclic Graphs (DAGs) provide an increasingly popular answer to such problems by defining the causal generative process that transforms random noise into observations. However, learning them from observational data poses an ill-posed and NP-hard inverse problem in general. In this work, we propose a new and equivalent formalism that does not require DAGs to describe them, viewed as fixed-point problems on the causally ordered variables, and we show three important cases where they can be uniquely recovered given the topological ordering (TO). To the best of our knowledge, we obtain the weakest conditions for their recovery when TO is known. 
Based on this, we design a two-stage causal generative model that first infers the causal order from ob...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:74","title":"Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion","url":"https://seed.bytedance.com/en/research/magic-boost-boost-3d-generation-with-mutli-view-conditioned-diffusion","published":"2024-04-09","authors":["Fan Yang","Jianfeng Zhang","Yichun Shi","Bowen Chen","Chenxu Zhang","Huichao Zhang","Xiaofeng Yang","Jiashi Feng","Guosheng Lin"],"abstract":"Benefiting from the rapid development of 2D diffusion models, 3D content creation has made significant progress recently. One promising solution involves the fine-tuning of pre-trained 2D diffusion models to harness their capacity for producing multi-view images, which are then lifted into accurate 3D models via methods like fast-NeRFs or large reconstruction models. However, as inconsistency still exists and limited generated resolution, the generation results of such methods still lack intricate textures and complex geometries. To solve this problem, we propose Magic-Boost, a multi-view conditioned diffusion model that significantly refines coarse generative results through a brief period of SDS optimization (∼15min). 
Compared to the previous text or single image based diffusion models, Magic-Boost exhibits a robust capability to generate images with high consistency from pseudo synthe...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","arXiv"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"apple:ci9o6u17wolyucv7udwyh78l","title":"Data Filtering Networks","url":"https://machinelearning.apple.com/research/data-filtering-networks","published":"2024-04-08","authors":["Alex Fang","Albin Madappally Jose","Amit Jain","Ludwig Schmidt","Alexander Toshev","Vaishaal Shankar"],"abstract":"Large training sets have become a cornerstone of machine learning and are the foundation for recent advances in language modeling and multimodal learning. While data curation for pre-training is often still ad-hoc, one common paradigm is to first collect a massive pool of data from the Web and then filter this candidate pool down to an actual training set via various heuristics. 
In this work, we study the problem of learning a data filtering...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:acff834d7c4d7295","title":"MART: Improving LLM Safety with Multi-round Automatic Red-Teaming","url":"https://ai.meta.com/research/publications/mart-improving-llm-safety-with-multi-round-automatic-red-teaming/","published":"2024-04-05","authors":["Suyu Ge","Chunting Zhou","Rui Hou","Madian Khabsa","Yi-Chia Wang","Qifan Wang","Jiawei Han","Yuning Mao"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","NLP","LLM"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=15"}},{"id":"openalex:W4393934663","title":"A multimodal data fusion model for accurate and interpretable urban land use mapping with uncertainty analysis","url":"https://doi.org/10.1016/j.jag.2024.103805","published":"2024-04-04","authors":["Xiaoqin Yan","Zhangwei Jiang","Peng Luo","Hao Wu","Anning Dong","Fengling Mao","Ziyin Wang","Hong Liu","Yao Yao"],"abstract":"Urban land use patterns can be more accurately mapped by fusing multimodal data. 
However, many studies only consider socioeconomic and physical attributes within land parcels, neglecting spatial interaction and uncertainty caused by multimodal data. To address these issues, we constructed a multimodal data fusion model (MDFNet) to extract natural physical, socioeconomic, and spatial connectivity ancillary information from multimodal data. We also established an uncertainty analysis framework based on a generalized additive model and learnable weight module to explain data-driven uncertainty. Shenzhen was chosen as the demonstration area. The results demonstrated the effectiveness of the proposed method, with a test accuracy of 0.882 and a Kappa of 0.858. Uncertainty analysis indicated the contributions in overall task of 0.361, 0.308, and 0.232 for remote sensing, social sensing, and tax...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.jag.2024.103805","openalex_id":"https://openalex.org/W4393934663","cited_by_count":21,"quality_score":58,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","China University of Geosciences","Peking University","Technical University of Munich","The University of Tokyo","Tokyo University of Information Sciences"],"concepts":[{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.5249409675598145},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.5245859026908875},{"id":"https://openalex.org/C58640448","display_name":"Cartography","score":0.5240723490715027},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.44575101137161255},{"id":"https://openalex.org/C4792198","display_name":"Land use","score":0.4393942356109619},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.4390827715396881},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4004303812980652},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1572379171848297}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":21}},{"id":"official:2bf4ae1cff56f5a4","title":"DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning","url":"https://ai.meta.com/research/publications/dp-rdm-adapting-diffusion-models-to-private-domains-without-fine-tuning/","published":"2024-04-04","authors":["Jonathan Lebensold","Maziar Sanjabi","Pietro Astolfi","Adriana Romero Soriano","Kamalika Chaudhuri","Mike Rabbat","Chuan Guo"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Core Machine Learning"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=15"}},{"id":"apple:ab62r5vi6eetr36bt4g7vyd1","title":"MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training","url":"https://machinelearning.apple.com/research/mobileclip","published":"2024-04-04","authors":["Pavan Kumar Anasosalu Vasu","Hadi Pour Ansari","Fartash Faghri","Raviteja Vemulapalli","Oncel Tuzel"],"abstract":"Equal 
Contributors","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/know-your-neighbors-improving-single-view-reconstruction-via-spatial-vision-language-reasoning","title":"Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning","url":"https://www.microsoft.com/en-us/research/publication/know-your-neighbors-improving-single-view-reconstruction-via-spatial-vision-language-reasoning/","published":"2024-04-03","authors":["Rui Li","Tobias Fischer","Mattia Segu","Marc Pollefeys","L. V. Gool","Federico Tombari"],"abstract":"Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual observation requires (i) semantic knowledge of the surroundings, and (ii) reasoning about spatial context. We propose KYN, a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density. We introduce a vision-language modulation module to enrich point features with fine-grained semantic information. 
We aggregate point representations across the scene through a language-guided spatial attention mechanism to yield per-point densi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:9948d121393e6c45","title":"Sieve: Multimodal Dataset Pruning Using Image Captioning Models","url":"https://ai.meta.com/research/publications/sieve-multimodal-dataset-pruning-using-image-captioning-models/","published":"2024-04-03","authors":["Anas Mahmoud","Mostafa Elhoushi","Amro Abbas","Yu Yang","Newsha Ardalani","Hugh Leather","Ari Morcos"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=15"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/retrieve-what-you-need-a-mutual-learning-framework-for-open-domain-question-answering","title":"LLM Agent - Retrieve What You Need: A Mutual Learning Framework for Open-domain Question 
Answering","url":"https://www.microsoft.com/en-us/research/publication/retrieve-what-you-need-a-mutual-learning-framework-for-open-domain-question-answering/","published":"2024-04-02","authors":["Dingmin Wang","Qiuyuan Huang","Matthew Jackson","Jianfeng Gao"],"abstract":"An open-domain question answering (QA) system usually follows a retrieve-then-read paradigm, in which a retriever is used to retrieve relevant documents from a large corpus, and then a reader generates answers based on the retrieved documents and the original question. In this paper, we propose a simple and novel mutual learning framework to improve the performance of retrieve-then-read-style models via an intermediate module named the knowledge selector, which we train with reinforcement learning. The key benefits of our proposed intermediate module are: 1) no requirement for additional annotated question-passage pairs; 2) improvements in both retrieval and QA performance, as well as computational efficiency, compared to prior competitive retrieve-then-read models; 3) with no fine tuning, improvement in the zero-shot performance of large-scale pre-trained language models, e.g., ChatGPT,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Article (Journal)","Artificial intelligence","Human language technologies","Human-computer interaction","Agnet","Knowledge transfer","NLP","1970-01-01","LLM","retrieval","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/are-large-language-model-based-evaluators-the-solution-to-scaling-up-multilingual-evaluation","title":"METAL: Towards Multilingual Meta-Evaluation","url":"https://www.microsoft.com/en-us/research/publication/are-large-language-model-based-evaluators-the-solution-to-scaling-up-multilingual-evaluation/","published":"2024-04-02","authors":["Rishav Hada","Varun Gumma","Mohamed Ahmed","Kalika Bali","Sunayana Sitaram"],"abstract":"With the rising human-like precision of Large Language Models (LLMs) in numerous tasks, their utilization in a variety of real-world applications is becoming more prevalent. Several studies have shown that LLMs excel on many standard NLP benchmarks. However, it is challenging to evaluate LLMs due to test dataset contamination and the limitations of traditional metrics. Since human evaluations are difficult to collect, there is a growing interest in the community to use LLMs themselves as reference-free evaluators for subjective metrics. However, past work has shown that LLM-based evaluators can exhibit bias and have poor alignment with human judgments. In this study, we propose a framework for an end-to-end assessment of LLMs as evaluators in multilingual scenarios. 
We create a carefully curated dataset, covering 10 languages containing native speaker judgments for the task of summarizat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:b92e05f388a1ebab","title":"Qwen1.5-32B: Fitting the Capstone of the Qwen1.5 Language Model Series","url":"https://qwenlm.github.io/blog/qwen1.5-32b/","published":"2024-04-02","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDIntroduction The open-source community has long sought a model that strikes an ideal balance between performance, efficiency, and memory footprint. 
Despite the emergence of cutting-edge models like Qwen1.5-72B and DBRX, the models have faced persistent challenges such as large memory consumption, slow inference speed, and substantial finetuning costs.A growing consensus within the field now points to a model with approximately 30 billion parameters as the optimal “sweet spot” for achieving both strong performance and manageable resource requirements.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["language model","memory"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"openalex:W4393563223","title":"Continued pretraining for enhanced multi-organ segmentation from CT images","url":"http://dx.doi.org/10.1117/12.3006630","published":"2024-04-02","authors":["Yaqi Yang","Chen Shen","Yucheng Tang","Holger R. Roth","Masahiro Oda","Yuichiro Hayashi","Kazunari Misawa","Kensaku Mori"],"abstract":"Self-supervised pretraining has shown great performance in improving the accuracy of downstream tasks. Although pretraining on a large dataset improves performances, it becomes challenging to further optimize the model by solely enlarging the dataset. In contrast, additional adaptation of pretrained models to the target domain has shown promise in NLP. Inspired by the success of continual pretraining, we investigated the efficacy of adapting the target domain dataset to a pretrained model in medical imaging, particularly in the context of segmentation. We present a study based on a self-supervised pretraining framework using the SwinUNETR backbone. 
In this study, we improved the generalizability of the self-supervised pretraining by adapting a foundational model pretrained on 5k CT volumes to data of the downstream segmentation task. In detail, we employed 385 abdominal CT volumes for th...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1117/12.3006630","openalex_id":"https://openalex.org/W4393563223","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Aichi Cancer Center","Nagoya University","Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.8947457075119019},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.8094676733016968},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7879584431648254},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7170457243919373},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7134600877761841},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6253073811531067},{"id":"https://openalex.org/C2776434776","display_name":"Domain adaptation","score":0.5972952842712402},{"id":"https://openalex.org/C139807058","display_name":"Adaptation (eye)","score":0.4730616807937622}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/prompts-as-programs-a-structure-aware-approach-to-efficient-compile-time-prompt-optimization","title":"Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt 
Optimization","url":"https://www.microsoft.com/en-us/research/publication/prompts-as-programs-a-structure-aware-approach-to-efficient-compile-time-prompt-optimization/","published":"2024-04-01","authors":["Tobias Schnabel","Jennifer Neville"],"abstract":"Large language models (LLMs) can now handle longer and more complex inputs, which facilitate the use of more elaborate prompts. However, prompts often require some tuning to improve performance for deployment. Recent work has proposed automatic prompt optimization methods, but as prompt complexity and LLM strength increase, many prompt optimization techniques are no longer sufficient and a new approach is needed to optimize {\\em meta prompt programs}. To address this, we introduce SAMMO, a framework for {\\em compile-time} optimizations of metaprompt programs, which represent prompts as structured objects that allows for a rich set of transformations that can be searched over during optimization. We show that SAMMO generalizes previous methods and improves the performance of complex prompts on (1) instruction tuning, (2) RAG pipeline tuning, and (3) prompt compression, across several diff...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Unpublished","Artificial intelligence","Computation and Language","Machine learning","LLM","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-greener-llms-bringing-energy-efficiency-to-the-forefront-of-llm-inference","title":"Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference","url":"https://www.microsoft.com/en-us/research/publication/towards-greener-llms-bringing-energy-efficiency-to-the-forefront-of-llm-inference/","published":"2024-04-01","authors":["Jovan Stojkovic","Esha Choukse","Chaojie Zhang","Íñigo Goiri","Josep Torrellas"],"abstract":"With the ubiquitous use of modern large language models (LLMs) across industries, the inference serving for these models is ever expanding. Given the high compute and memory requirements of modern LLMs, more and more top-of-the-line GPUs are being deployed to serve these models. Energy availability has come to the forefront as the biggest challenge for data center expansion to serve these models. In this paper, we present the trade-offs brought up by making energy efficiency the primary goal of LLM serving under performance SLOs. We show that depending on the inputs, the model, and the service-level agreements, there are several knobs available to the LLM inference provider to use for being energy efficient. We characterize the impact of these knobs on the latency, throughput, as well as the energy. 
By exploring these trade-offs, we offer valuable insights into optimizing energy usage wi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Systems and networking","1970-01-01","LLM","memory","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/natural-language-supervision-for-general-purpose-audio-representations","title":"Natural Language Supervision For General-Purpose Audio Representations","url":"https://www.microsoft.com/en-us/research/publication/natural-language-supervision-for-general-purpose-audio-representations/","published":"2024-04-01","authors":["Benjamin Elizalde","Soham Deshmukh","Huaming Wang"],"abstract":"Audio-Language models jointly learn multimodal text and audio representations that enable Zero-Shot inference. Models rely on the encoders to create powerful representations of the input and generalize to multiple tasks ranging from sounds, music, and speech. Although models have achieved remarkable performance, there is still a gap with task-specific models. In this paper, we propose a Contrastive Language-Audio Pretraining model that is pretrained with a diverse collection of 4.6M audio-text pairs employing two innovative encoders for Zero-Shot inference. To learn audio representations, we trained an audio encoder on 22 audio tasks, instead of the standard training of sound event classification. 
To learn language representations, we trained an autoregressive decoder-only model instead of the standard encoder-only models. Then, the audio and language representations are brought into a j...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Audio and Acoustics","Audio and Speech Processing","Audio signal processing"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/injecting-new-knowledge-into-large-language-models-via-supervised-fine-tuning","title":"Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning","url":"https://www.microsoft.com/en-us/research/publication/injecting-new-knowledge-into-large-language-models-via-supervised-fine-tuning/","published":"2024-04-01","authors":["Nick Mecklenburg","Yiyou Lin","Xiaoxiao Li","Daniel Holstein","Leonardo Nunes","Sara Malvar","Bruno Silva","Ranveer Chandra","Vijay Aski","Pavan Kumar Reddy Yannam","Tolga Aktas"],"abstract":"In recent years, Large Language Models (LLMs) have shown remarkable performance in generating human-like text, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model's knowledge cutoff date. 
This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on the domain of recent sporting events. We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information. Our experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand, of...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Data platforms and analytics","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/t-sot-fnt-streaming-multi-talker-asr-with-text-only-domain-adaptation-capability","title":"t-SOT FNT: streaming multi-talker ASR with text-only domain adaptation capability","url":"https://www.microsoft.com/en-us/research/publication/t-sot-fnt-streaming-multi-talker-asr-with-text-only-domain-adaptation-capability/","published":"2024-04-01","authors":["Jian Wu","Naoyuki Kanda","Takuya Yoshioka","Rui Zhao","Zhuo Chen","Jinyu Li"],"abstract":"Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR). 
T-SOT effectively handles overlapped speech by representing multi-talker transcriptions as a single token stream with $\\langle \\text{cc}\\rangle$ symbols interspersed. However, the use of a naive neural transducer architecture significantly constrained its applicability for text-only adaptation. To overcome this limitation, we propose a novel t-SOT model structure that incorporates the idea of factorized neural transducers (FNT). The proposed method separates a language model (LM) from the transducer's predictor and handles the unnatural token order resulting from the use of $\\langle \\text{cc}\\rangle$ symbols in t-SOT. We achieve this by maintaining multiple hidden states and introducing special handling of the $\\langle \\text{cc}\\rangl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp48485.2024.10447904","openalex_id":"https://openalex.org/W4392903212","cited_by_count":2,"quality_score":70,"matched_keywords":["Inproceedings (Conference)","Human language technologies","1970-01-01","language model"],"author_affiliations":["Microsoft","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/training-free-multi-objective-diffusion-model-for-3d-molecule-generation","title":"Training-free Multi-objective Diffusion Model for 3D Molecule Generation","url":"https://www.microsoft.com/en-us/research/publication/training-free-multi-objective-diffusion-model-for-3d-molecule-generation/","published":"2024-04-01","authors":["Xu Han","Caihua Shan","Yifei Shen","Can Xu","Han Yang","Xiang Li","Dongsheng 
Li"],"abstract":"Searching for novel and diverse molecular candidates is a critical undertaking in drug and material discovery. Existing approaches have successfully adapted the diffusion model, the most effective generative model in image generation, to create 1D SMILES strings, 2D chemical graphs, or 3D molecular conformers. However, these methods are not efficient and flexible enough to generate 3D molecules with multiple desired properties, as they require additional training for the models for each new property or even a new combination of existing properties. Moreover, some properties may potentially conflict, making it impossible to find a molecule that satisfies all of them simultaneously. To address these challenges, we present a training-free conditional 3D molecular generation algorithm based on off-the-shelf unconditional diffusion models and property prediction models. The key techniques inc...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sat2scene-3d-urban-scene-generation-from-satellite-images-with-diffusion","title":"Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion","url":"https://www.microsoft.com/en-us/research/publication/sat2scene-3d-urban-scene-generation-from-satellite-images-with-diffusion/","published":"2024-04-01","authors":["Zuoyue Li","Zhenqiang Li","Zhaopeng 
Cui","Marc Pollefeys","Martin R. Oswald"],"abstract":"Directly generating scenes from satellite imagery offers exciting possibilities for integration into applications like games and map services. However, challenges arise from significant view changes and scene scale. Previous efforts mainly focused on image or video generation, lacking exploration into the adaptability of scene generation for arbitrary views. Existing 3D generation works either operate at the object level or are difficult to utilize the geometry obtained from satellite imagery. To overcome these limitations, we propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques. Specifically, our approach generates texture colors at the point level for a given geometry using a 3D diffusion model first, which is then transformed into a scene representation in a feed-f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/enabling-memory-safety-of-c-programs-using-llms","title":"Enabling Memory Safety of C Programs using LLMs","url":"https://www.microsoft.com/en-us/research/publication/enabling-memory-safety-of-c-programs-using-llms/","published":"2024-04-01","authors":["Nausheen Mohammed","Akash Lal","Aseem Rastogi","Subhajit Roy","Rahul 
Sharma"],"abstract":"Memory safety violations in low-level code, written in languages like C, continues to remain one of the major sources of software vulnerabilities. One method of removing such violations by construction is to port C code to a safe C dialect. Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead. This porting, however, is a manual process that imposes significant burden on the programmer and, hence, there has been limited adoption of this technique.The task of porting not only requires inferring annotations, but may also need refactoring/rewriting of the code to make it amenable to such annotations. In this paper, we use Large Language Models (LLMs) towards addressing both these concerns. We show how to harness LLM capabilities to do complex code reasoning as well as rewriting of large codebases. We also present a novel framework for whole-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Tech Report","Programming languages and software engineering","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:1705ad0f0fa1ee74","title":"LCM-Lookahead for Encoder-based Text-to-Image Personalization","url":"https://research.nvidia.com/publication/2024-04_lcm-lookahead-encoder-based-text-image-personalization","published":"2024-04","authors":["Rinon Gal","Or Lichter","Elad Richardson","Or Patashnik","Amit H Bermano","Gal Chechik","Daniel Cohen-Or"],"abstract":"Official NVIDIA Research publication. 
ECCV","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-72630-9_19","openalex_id":"https://openalex.org/W4405003184","cited_by_count":17,"quality_score":77,"matched_keywords":["ECCV","personalization"],"author_affiliations":["NVIDIA","Nvidia (United States)","Tel Aviv University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=3"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-models-are-capable-of-offering-cognitive-reappraisal-if-guided","title":"Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided","url":"https://www.microsoft.com/en-us/research/publication/large-language-models-are-capable-of-offering-cognitive-reappraisal-if-guided/","published":"2024-03-31","authors":["Hongli Zhan","Allen Zheng","Yoon Kyung Lee","Jina Suh","Junyi Jessy Li","Desmond C. Ong"],"abstract":"Large language models (LLMs) have offered new opportunities for emotional support, and recent work has shown that they can produce empathic responses to people in distress. However, long-term mental well-being requires emotional self-regulation, where a one-time empathic response falls short. This work takes a first step by engaging with cognitive reappraisals, a strategy from psychology practitioners that uses language to targetedly change negative appraisals that an individual makes of the situation; such appraisals is known to sit at the root of human emotional experience. 
We hypothesize that psychologically grounded principles could enable such advanced psychology capabilities in LLMs, and design RESORT which consists of a series of reappraisal constitutions across multiple dimensions that can be used as LLM instructions. We conduct a first-of-its-kind expert evaluation (by clinical....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Computation and Language","Computer science","LLM","long-term","media"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-as-a-mastermind-a-survey-of-strategic-reasoning-with-large-language-models","title":"LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/llm-as-a-mastermind-a-survey-of-strategic-reasoning-with-large-language-models/","published":"2024-03-31","authors":["Yadong Zhang","Shaoguang Mao","Tao Ge","Xun Wang","Adrian de Wynter","Yan Xia","Wenshan Wu","Ting Song","Man Lan","Furu Wei"],"abstract":"This paper presents a comprehensive survey of the current status and opportunities for Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning that necessitates understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly. 
Strategic reasoning is distinguished by its focus on the dynamic and uncertain nature of interactions among multi-agents, where comprehending the environment and anticipating the behavior of others is crucial. We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with LLMs, highlighting the burgeoning development in this area and the interdisciplinary approaches enhancing their decision-making performance. It aims to systematize and clarify the scattered literature on this subject, providing a systematic review that underscores the importa...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/a-framework-for-exploring-the-consequences-of-ai-mediated-enterprise-knowledge-access-and-identifying-risks-to-workers","title":"A Framework for Exploring the Consequences of AI-Mediated Enterprise Knowledge Access and Identifying Risks to Workers","url":"https://www.microsoft.com/en-us/research/publication/a-framework-for-exploring-the-consequences-of-ai-mediated-enterprise-knowledge-access-and-identifying-risks-to-workers/","published":"2024-03-30","authors":["Anna Gausen","Bhaskar Mitra","Sin Lindley"],"abstract":"Organisations generate vast amounts of information, which has resulted in a long-term research effort into knowledge access systems 
for enterprise settings. Recent developments in artificial intelligence, in relation to large language models, are poised to have significant impact on knowledge access. This has the potential to shape the workplace and knowledge in new and unanticipated ways. Many risks can arise from the deployment of these types of AI systems, due to interactions between the technical system and organisational power dynamics.This paper presents the Consequence-Mechanism-Risk framework to identify risks to workers from AI-mediated enterprise knowledge access systems. We have drawn on wide-ranging literature detailing risks to workers, and categorised risks as being to worker value, power, and wellbeing. The contribution of our framework is to additionally consider (i) the....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":108,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Search and information retrieval","Social sciences","algorithmic fairness","Computer-supported cooperative work","Human–computer interaction","Information extraction","Knowledge extraction","Knowledge management","Sociotechnical system","1970-01-01","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/wavllm-towards-robust-and-adaptive-speech-large-language-model","title":"WavLLM: Towards Robust and Adaptive Speech Large Language 
Model","url":"https://www.microsoft.com/en-us/research/publication/wavllm-towards-robust-and-adaptive-speech-large-language-model/","published":"2024-03-30","authors":["Shujie Hu","Long Zhou","Shujie Liu","Sanyuan Chen","Hongkun Hao","Jing Pan","Xunying Liu","Jinyu Li","Sunit Sivasankaran","Linquan Liu","Furu Wei"],"abstract":"The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach. Leveraging dual encoders, we decouple different types of speech information, utilizing a Whisper encoder to process the semantic content of speech, and a WavLM encoder to capture the unique characteristics of the speaker's identity. 
Within the curriculum learning framework, WavLLM first builds its founda...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Miscellaneous","Artificial intelligence","Audio and Acoustics","Audio and Speech Processing","Computation and Language","Computer science","Engineering","sound","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/quarot-outlier-free-4-bit-inference-in-rotated-llms","title":"QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs","url":"https://www.microsoft.com/en-us/research/publication/quarot-outlier-free-4-bit-inference-in-rotated-llms/","published":"2024-03-30","authors":["Saleh Ashkboos","Amirkeivan Mohtashami","Maximilian L. Croci","Bo Li","Martin Jaggi","Dan Alistarh","Torsten Hoefler","James Hensman","Pashmina Cameron"],"abstract":"We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This computational invariance is applied to the hidden state (residual) of the LLM, as well as to the activations of the feed-forward components, aspects of the attention mechanism and to the KV cache. The result is a quantized model where all matrix multiplications are performed in 4-bits, without any channels identified for retention in higher precision. 
Our quantized LLaMa2-70B model has losses of at most 0.29 WikiText-2 perplexity and retains 99% of the zero-shot performance. Code is available at: https://github.com/spcl/QuaRot .","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","LLM","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/codi-2-in-context-interleaved-and-interactive-any-to-any-generation","title":"CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation","url":"https://www.microsoft.com/en-us/research/publication/codi-2-in-context-interleaved-and-interactive-any-to-any-generation/","published":"2024-03-29","authors":["Zineng Tang","Ziyi Yang","Mahmoud Khademi","Yang Liu","Chenguang Zhu","Mohit Bansal"],"abstract":"We present CoDi-2, a Multimodal Large Language Model (MLLM) for learning in-context interleaved multi-modal representations. By aligning modalities with language for both encoding and generation, CoDi-2 empowers Large Language Models (LLMs) to understand modality-interleaved instructions and in-context examples and autoregressively generate grounded and coherent multimodal outputs in an any-to-any input-output modality paradigm. To train CoDi-2, we build a large-scale generation dataset encompassing in-context multimodal instructions across text, vision, and audio. 
CoDi-2 demonstrates a wide range of zero-shot and few-shot capabilities for tasks like editing, exemplar learning, composition, reasoning, etc. CoDi-2 surpasses previous domain-specific models on tasks such as subject-driven image generation, vision transformation, and audio editing and showcases a significant advancement for....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Generative AI","Multimodal Large Language Models","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:a1hr46h1q13049f8rxh23n7p","title":"Towards a World-English Language Model","url":"https://machinelearning.apple.com/research/world-english-language-model","published":"2024-03-29","authors":["Rricha Jalota","Lyan Verwimp","Markus Nussbaum-Thom","Amr Mousa","Arturo Argueta","Youssef Oualil"],"abstract":"Neural Network Language Models (NNLMs) of Virtual Assistants (VAs) are generally language-, region-, and in some cases, device-dependent, which increases the effort to scale and maintain them. Combining NNLMs for one or more of the categories could be one way to improve scalability. In this work, we combine regional variants of English by building a \"World English\" NNLM. 
We examine three data sampling techniques and we experiment with adding...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mlcopilot-unleashing-the-power-of-large-language-models-in-solving-machine-learning-tasks","title":"MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks","url":"https://www.microsoft.com/en-us/research/publication/mlcopilot-unleashing-the-power-of-large-language-models-in-solving-machine-learning-tasks/","published":"2024-03-28","authors":["Lei Zhang","Yuge Zhang","Kan Ren","Dongsheng Li","Yuqing Yang"],"abstract":"The field of machine learning (ML) has gained widespread adoption, leading to significant demand for adapting ML to specific scenarios, which is yet expensive and non-trivial. The predominant approaches towards the automation of solving ML tasks (e.g., AutoML) are often time-consuming and hard to understand for human developers. In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches. In this paper, we aim to bridge the gap between machine intelligence and human knowledge by introducing a novel framework MLCopilot, which leverages the state-of-the-art large language models to develop ML solutions for novel tasks. 
We showcase the possibility of extending the capability of LLMs to comprehend structured inputs and perform thorough reason...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:4b6e38973d3b45f6","title":"Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters","url":"https://qwenlm.github.io/blog/qwen-moe/","published":"2024-03-28","authors":["Alibaba/Qwen"],"abstract":"GITHUB HUGGING FACE MODELSCOPE DEMO DISCORDIntroduction Since the surge in interest sparked by Mixtral, research on mixture-of-expert (MoE) models has gained significant momentum. Both researchers and practitioners are keenly interested in understanding how to effectively train such models and assessing their efficiency and effectiveness. 
Today, we introduce Qwen1.5-MoE-A2.7B, a small MoE model with only 2.7 billion activated parameters yet matching the performance of state-of-the-art 7B models like Mistral 7B and Qwen1.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"official:d22821aab778c9a7","title":"Grok-1 Open-Weights Model Card","url":"https://huggingface.co/xai-org/grok-1","published":"2024-03-28","authors":["xAI"],"abstract":"Official xAI model card for the Grok-1 open-weights release. The card describes the released weights, code repository, download instructions, runtime requirements, and Apache-2.0 license.","companies":["xAI"],"matched_orgs":["xAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_report"],"source":"official_report","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["xAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official company report source"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/gaussiancube-a-structured-and-explicit-radiance-representation-for-3d-generative-modeling","title":"GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling","url":"https://www.microsoft.com/en-us/research/publication/gaussiancube-a-structured-and-explicit-radiance-representation-for-3d-generative-modeling/","published":"2024-03-27","authors":["Bowen Zhang","Yiji Cheng","Jiaolong Yang","Chunyu Wang","Feng Zhao","Yansong 
Tang","Dong Chen","Baining Guo"],"abstract":"We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling. Existing radiance representations either require an implicit feature decoder, which significantly degrades the modeling power of the representation, or are spatially unstructured, making them difficult to integrate with mainstream 3D diffusion methods. We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting using a fixed number of free Gaussians, and then rearranging these Gaussians into a predefined voxel grid via Optimal Transport. Since GaussianCube is a structured grid representation, it allows us to use standard 3D U-Net as our backbone in diffusion modeling without elaborate designs. More importantly, the high-accuracy fitting of the Gaussians allows us to achieve a high-...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Generative modeling","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4393277515","title":"Exploring the Potential of Large Language Models (LLMs)in Learning on Graphs","url":"https://doi.org/10.1145/3655103.3655110","published":"2024-03-26","authors":["Zhikai Chen","Haitao Mao","Hang Li","Wei Jin","Hongzhi Wen","Xiaochi Wei","Shuaiqiang Wang","Dawei Yin","Wenqi Fan","Hui Liu","Jiliang Tang"],"abstract":"Learning on Graphs has attracted 
immense attention due to its wide real-world applications. The most popular pipeline for learning on graphs with textual node attributes primarily relies on Graph Neural Networks (GNNs), and utilizes shallow text embedding as initial node representations, which has limitations in general knowledge and profound semantic understanding. In recent years, Large Language Models (LLMs) have been proven to possess extensive common knowledge and powerful semantic comprehension abilities that have revolutionized existing workflows to handle text data. In this paper, we aim to explore the potential of LLMs in graph machine learning, especially the node classification task, and investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. The former leverages LLMs to enhance nodes' text attributes with their massive knowledge and then generate predict...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3655103.3655110","openalex_id":"https://openalex.org/W4393277515","cited_by_count":153,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","Emory University","Hong Kong Polytechnic University","Michigan State University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8120250701904297},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41867348551750183},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.36311984062194824},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.35936176776885986}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":153}},{"id":"openalex:W4393187333","title":"Model tuning or prompt 
Tuning? a study of large language models for clinical concept and relation extraction","url":"https://doi.org/10.1016/j.jbi.2024.104630","published":"2024-03-26","authors":["Peng Cheng","Xi Yang","Kaleb E Smith","Zehao Yu","Aokun Chen","Jiang Bian","Yonghui Wu"],"abstract":"","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.jbi.2024.104630","openalex_id":"https://openalex.org/W4393187333","cited_by_count":61,"quality_score":67,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","UF Health Cancer Center","University of Florida","University of Florida Health"],"concepts":[{"id":"https://openalex.org/C27158222","display_name":"Generalizability theory","score":0.6817184686660767},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4660326838493347},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40621015429496765},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3213503956794739},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2228149175643921},{"id":"https://openalex.org/C138496976","display_name":"Developmental psychology","score":0.0797567069530487}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":61}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/large-language-models-produce-responses-perceived-to-be-empathic","title":"Large Language Models Produce Responses Perceived to be Empathic","url":"https://www.microsoft.com/en-us/research/publication/large-language-models-produce-responses-perceived-to-be-empathic/","published":"2024-03-25","authors":["Yoon Kyung Lee","Jina Suh","Hongli Zhan","Junyi Jessy Li","Desmond C. 
Ong"],"abstract":"Large Language Models (LLMs) have demonstrated surprising performance on many tasks, including writing supportive messages that display empathy. Here, we had these models generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. Across two studies (N=192, 202), we showed human raters a variety of responses written by several models (GPT4 Turbo, Llama2, and Mistral), and had people rate these responses on how empathic they seemed to be. We found that LLM-generated responses were consistently rated as more empathic than human-written responses. Linguistic analyses also show that these models write in distinct, predictable \"styles\", in terms of their use of punctuation, emojis, and certain words. These results highlight the potential of using LLMs to enhance h...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/acii63134.2024.00012","openalex_id":"https://openalex.org/W4409763035","cited_by_count":24,"quality_score":100,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Computation and Language","Computer science","LLM"],"author_affiliations":["Microsoft","Microsoft (United States)","The University of Texas at Austin"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-use-of-generative-search-engines-for-knowledge-work-and-complex-tasks","title":"The Use of Generative Search Engines for Knowledge Work and Complex 
Tasks","url":"https://www.microsoft.com/en-us/research/publication/the-use-of-generative-search-engines-for-knowledge-work-and-complex-tasks/","published":"2024-03-25","authors":["Siddharth Suri","Scott Counts","Leijie Wang","Chacha Chen","Mengting Wan","Tara Safavi","Jennifer Neville","Chirag Shah","Ryen W. White","Reid Andersen","Georg Buscher","Sathish Manivannan"],"abstract":"Until recently, search engines were the predominant method for people to access online information. The recent emergence of large language models (LLMs) has given machines new capabilities such as the ability to generate new digital artifacts like text, images, code etc., resulting in a new tool, a generative search engine, which combines the capabilities of LLMs with a traditional search engine. Through the empirical analysis of Bing Copilot (Bing Chat), one of the first publicly available generative search engines, we analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search. 
Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Unpublished","Artificial intelligence","Search and information retrieval","Social sciences","Computational Social Sciences"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/omnivid-a-generative-framework-for-universal-video-understanding","title":"OmniVid: A Generative Framework for Universal Video Understanding","url":"https://www.microsoft.com/en-us/research/publication/omnivid-a-generative-framework-for-universal-video-understanding/","published":"2024-03-25","authors":["Junke Wang","Dongdong Chen","Chong Luo","Bo He","Lu Yuan","Zuxuan Wu","Yu-Gang Jiang"],"abstract":"The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution. Despite sharing a common goal, different tasks often rely on distinct model architectures and annotation formats. In contrast, natural language processing benefits from a unified output space, i.e., text sequences, which simplifies the training of powerful foundational language models, such as GPT-3, with extensive training corpora. 
Inspired by this, we seek to unify the output space of video understanding tasks by using languages as labels and additionally introducing time and box tokens. In this way, a variety of video tasks could be formulated as video-grounded token generation. This enables us to address various types of video tasks, including classification (such as action recognition), captioning (coveri...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4393148669","title":"A Multimodal, Multi-Task Adapting Framework for Video Action Recognition","url":"https://doi.org/10.1609/aaai.v38i6.28361","published":"2024-03-24","authors":["Mengmeng Wang","Jiazheng Xing","Boyuan Jiang","Jun Chen","Jianbiao Mei","Xingxing Zuo","Guang Dai","Jingdong Wang","Yong Liu"],"abstract":"Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient FineTuning (PEFT), has captured substantial attraction in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of compromising the models' generalization capabilities during transfer. In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability. 
Firstly, to enhance the individual modality architectures, we introduce multimodal adapters to both the visual and text branches. Specifically, we design a novel visual TED-Adapter, that performs global Temporal Enhancement and local temporal Difference modeling to improve the temporal representation capabilities....","companies":["Tencent/Hunyuan","Baidu"],"matched_orgs":["Tencent/Hunyuan","Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i6.28361","openalex_id":"https://openalex.org/W4393148669","cited_by_count":26,"quality_score":79,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","State Grid Corporation of China (China)","Technical University of Munich","Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2987834672","display_name":"Action recognition","score":0.6943210363388062},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6617742776870728},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6570984721183777},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.5903373956680298},{"id":"https://openalex.org/C4441509","display_name":"Multimodal therapy","score":0.45744937658309937},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.43854260444641113},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3464648723602295},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.15683329105377197}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":26}},{"id":"openalex:W4393148276","title":"MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task 
Learning","url":"https://doi.org/10.1609/aaai.v38i14.29540","published":"2024-03-24","authors":["Yi Xin","Junlong Du","Qiang Wang","Ke Yan","Shouhong Ding"],"abstract":"Multi-Task Learning (MTL) is designed to train multiple correlated tasks simultaneously, thereby enhancing the performance of individual tasks. Typically, a multi-task network structure consists of a shared backbone and task-specific decoders. However, the complexity of the decoders increases with the number of tasks. To tackle this challenge, we integrate the decoder-free vision-language model CLIP, which exhibits robust zero-shot generalization capability. Recently, parameter-efficient transfer learning methods have been extensively explored with CLIP for adapting to downstream tasks, where prompt tuning showcases strong potential. Nevertheless, these methods solely fine-tune a single modality (text or visual), disrupting the modality structure of CLIP. In this paper, we first propose Multi-modal Alignment Prompt (MmAP) for CLIP, which aligns text and visual modalities during fine-tuni...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i14.29540","openalex_id":"https://openalex.org/W4393148276","cited_by_count":69,"quality_score":75,"matched_keywords":["language model","efficient"],"author_affiliations":["Nanjing University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7264331579208374},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6680877208709717},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6520044803619385},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical 
analysis)","score":0.5853434801101685},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3850717842578888},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.1635105311870575},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.09463846683502197},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.08966192603111267}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":69}},{"id":"openalex:W4393146352","title":"CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare","url":"https://doi.org/10.1609/aaai.v38i20.30206","published":"2024-03-24","authors":["Akash Ghosh","A. Seetharama Acharya","Raghav Jain","Sriparna Saha","Aman Chadha","Setu Sinha"],"abstract":"In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of medical conditions, we introduce the Multimodal Medical Question Summarization (MMQS) Dataset. This dataset, a major contribution of our work, pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. 
We also propose a framework, utilizing the power of Contrastive Language Image Pretraining(CLIP) and Large Language Models(LLMs), consisting of four modules that identify medical disorders, generate relevant context, filter medical concepts, and...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i20.30206","openalex_id":"https://openalex.org/W4393146352","cited_by_count":34,"quality_score":75,"matched_keywords":["LLM","personalized"],"author_affiliations":["Amazon (United States)","Indian Institute of Technology Patna","Indira Gandhi Institute of Medical Sciences"],"concepts":[{"id":"https://openalex.org/C170858558","display_name":"Automatic summarization","score":0.8436422348022461},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.5703338980674744},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.40786027908325195},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.22038114070892334},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.17812702059745789},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":34}},{"id":"openalex:W4393147046","title":"SECap: Speech Emotion Captioning with Large Language Model","url":"https://doi.org/10.1609/aaai.v38i17.29902","published":"2024-03-24","authors":["Yaoxun Xu","Hangting Chen","Jianwei Yu","Qiaochu Huang","Zhiyong Wu","Shi-Xiong Zhang","Guangzhi Li","Yi Luo","Rongzhi Gu"],"abstract":"Speech emotions are crucial in human communication and are extensively used in fields like speech synthesis and natural language understanding. 
Most prior studies, such as speech emotion recognition, have categorized speech emotions into a fixed set of classes. Yet, emotions expressed in human speech are often complex, and categorizing them into predefined groups can be insufficient to adequately represent speech emotions. On the contrary, describing speech emotions directly by means of natural language may be a more effective approach. Regrettably, there are not many studies available that have focused on this direction. Therefore, this paper proposes a speech emotion captioning framework named SECap, aiming at effectively describing speech emotions using natural language. Owing to the impressive capabilities of large language models in language comprehension and text generation, SECap....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i17.29902","openalex_id":"https://openalex.org/W4393147046","cited_by_count":31,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Tencent (China)","Tsinghua University","University Town of Shenzhen"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.8669109344482422},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5885266065597534},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.45554155111312866},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.4340091347694397},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4120092988014221},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.2683897912502289},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.040106773376464844},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":31}},{"id":"openalex:W4393148714","title":"T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models","url":"https://doi.org/10.1609/aaai.v38i5.28226","published":"2024-03-24","authors":["Chong Mou","Xintao Wang","Liangbin Xie","Yanze Wu","Jian Zhang","Zhongang Qi","Ying Shan"],"abstract":"The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated strong power of learning complex structures and meaningful semantics. However, relying solely on text prompts cannot fully take advantage of the knowledge learned by the model, especially when flexible and accurate controlling (e.g., structure and color) is needed. In this paper, we aim to ``dig out\" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly. Specifically, we propose to learn low-cost T2I-Adapters to align internal knowledge in T2I models with external control signals, while freezing the original large T2I models. In this way, we can train various adapters according to different conditions, achieving rich control and editing effects in the color and structure of the generation results. 
Further, the proposed T2I...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i5.28226","openalex_id":"https://openalex.org/W4393148714","cited_by_count":685,"quality_score":67,"matched_keywords":[],"author_affiliations":["Peking University","Peking University Shenzhen Hospital","Shenzhen Institutes of Advanced Technology","Tencent (China)","University of Macau"],"concepts":[{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.9544022083282471},{"id":"https://openalex.org/C16654397","display_name":"Dig","score":0.7340787649154663},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6358358860015869},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4879995882511139},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36593562364578247},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.36062008142471313},{"id":"https://openalex.org/C9390403","display_name":"Computer hardware","score":0.29096198081970215},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.17589542269706726}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":685}},{"id":"openalex:W4393158805","title":"LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection","url":"https://doi.org/10.1609/aaai.v38i1.27764","published":"2024-03-24","authors":["Hongcheng Guo","Jian Yang","Jiaheng Liu","Jiaqi Bai","Boyang Wang","Zhoujun Li","Tieqiao Zheng","Bo Zhang","Junran Peng","Qi Tian"],"abstract":"Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps). 
Considering log data of variant domains, retraining the whole network for unknown domains is inefficient in real industrial scenarios. However, previous deep models merely focused on extracting the semantics of log sequences in the same domain, leading to poor generalization on multi-domain logs. To alleviate this issue, we propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains, where we establish a two-stage process including the pre-training and adapter-based tuning stage. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters. Besides, the Log-Attention m...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i1.27764","openalex_id":"https://openalex.org/W4393158805","cited_by_count":53,"quality_score":67,"matched_keywords":[],"author_affiliations":["Beihang University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.7392557263374329},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6742358803749084},{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.6152648329734802},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4156167209148407},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.3456904888153076},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.30290335416793823},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.15101996064186096},{"id":"https://openalex.org/C111919701","display_name":"Operating 
system","score":0.09833869338035583}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":53}},{"id":"openalex:W4393156206","title":"Latent Diffusion Transformer for Probabilistic Time Series Forecasting","url":"https://doi.org/10.1609/aaai.v38i11.29085","published":"2024-03-24","authors":["Shibo Feng","Chunyan Miao","Zhong Zhang","Peilin Zhao"],"abstract":"The probability prediction of multivariate time series is a notoriously challenging but practical task. This research proposes to condense high-dimensional multivariate time series forecasting into a problem of latent space time series generation, to improve the expressiveness of each timestamp and make forecasting more manageable. To solve the problem that the existing work is hard to extend to high-dimensional multivariate time series, we present a latent multivariate time series diffusion framework called Latent Diffusion Transformer (LDT), which consists of a symmetric statistics-aware autoencoder and a diffusion-based conditional generator, to implement this idea. Through careful design, the time series autoencoder can compress multivariate timestamp patterns into a concise latent representation by considering dynamic statistics. 
Then, the diffusion-based conditional generator is ab...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i11.29085","openalex_id":"https://openalex.org/W4393156206","cited_by_count":33,"quality_score":67,"matched_keywords":[],"author_affiliations":["BC Research (Canada)","Nanyang Technological University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.6699213981628418},{"id":"https://openalex.org/C143724316","display_name":"Series (stratigraphy)","score":0.529301643371582},{"id":"https://openalex.org/C151406439","display_name":"Time series","score":0.4675983786582947},{"id":"https://openalex.org/C149782125","display_name":"Econometrics","score":0.4331636130809784},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4230876564979553},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.38923248648643494},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.27741122245788574},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.21423128247261047}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":33}},{"id":"openalex:W4393154693","title":"IT3D: Improved Text-to-3D Generation with Explicit View Synthesis","url":"https://doi.org/10.1609/aaai.v38i2.27886","published":"2024-03-24","authors":["Yiwen Chen","Chi Zhang","Xiaofeng Yang","Zhongang Cai","Gang Yu","Lei Yang","Guosheng Lin"],"abstract":"Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs). 
Nonetheless, existing Text-to-3D approaches often grapple with challenges such as over-saturation, inadequate detailing, and unrealistic outputs. This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images based on the renderings of coarse 3D models. Although the generated images mostly alleviate the aforementioned issues, challenges such as view inconsistency and significant content variance persist due to the inherent generative nature of large diffusion models, posing extensive difficulties in leveraging these images effectively. To overcome this hurdle, we advocate...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i2.27886","openalex_id":"https://openalex.org/W4393154693","cited_by_count":47,"quality_score":67,"matched_keywords":[],"author_affiliations":["Nanyang Technological University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4605286717414856},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.32076025009155273}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":47}},{"id":"openalex:W4393148505","title":"Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos","url":"https://doi.org/10.1609/aaai.v38i5.28206","published":"2024-03-24","authors":["Yue Ma","Yingqing He","Xiaodong Cun","Xintao Wang","Siran Chen","Xiu Li","Qifeng Chen"],"abstract":"Generating text-editable and pose-controllable character videos have an imperious demand in creating various 
digital human. Nevertheless, this task has been restricted by the absence of a comprehensive dataset featuring paired video-pose captions and the generative prior models for videos. In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image pose pair and pose-free video) and the pre-trained text-to-image (T2I) model to obtain the pose-controllable character videos. Specifically, in the first stage, only the keypoint image pairs are used only for a controllable text-to-image generation. We learn a zero-initialized convolutional encoder to encode the pose information. In the second stage, we finetune the motion of the above network via a pose-free video dataset by adding the learnable temporal self-attention and reformed cross-fr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i5.28206","openalex_id":"https://openalex.org/W4393148505","cited_by_count":89,"quality_score":67,"matched_keywords":[],"author_affiliations":["Hong Kong University of Science and Technology","Shenzhen Institutes of Advanced Technology","Tencent (China)","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C52102323","display_name":"Pose","score":0.6794735193252563},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6715942621231079},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6638826131820679},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6201443076133728}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":89}},{"id":"openalex:W4393159787","title":"AnomalyDiffusion: Few-Shot Anomaly 
Image Generation with Diffusion Model","url":"https://doi.org/10.1609/aaai.v38i8.28696","published":"2024-03-24","authors":["Teng Hu","Jiangning Zhang","Ran Yi","Yuzhen Du","Xu Chen","Liang Liu","Yabiao Wang","Chengjie Wang"],"abstract":"Anomaly inspection plays an important role in industrial manufacture. Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data. Although anomaly generation methods have been proposed to augment the anomaly data, they either suffer from poor generation authenticity or inaccurate alignment between the generated anomalies and masks. To address the above problems, we propose AnomalyDiffusion, a novel diffusion-based few-shot anomaly generation model, which utilizes the strong prior information of latent diffusion model learned from large-scale dataset to enhance the generation authenticity under few-shot training data. Firstly, we propose Spatial Anomaly Embedding, which consists of a learnable anomaly embedding and a spatial embedding encoded from an anomaly mask, disentangling the anomaly information into anomaly appearance and location informat...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i8.28696","openalex_id":"https://openalex.org/W4393159787","cited_by_count":94,"quality_score":67,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.6511410474777222},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.648060142993927},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.518471896648407},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5098830461502075},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3460184335708618},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.30411797761917114},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.2566379904747009},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.16842535138130188}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":94}},{"id":"openalex:W4393148014","title":"Rethinking Reverse Distillation for Multi-Modal Anomaly Detection","url":"https://doi.org/10.1609/aaai.v38i8.28687","published":"2024-03-24","authors":["Zhihao Gu","Jiangning Zhang","Liang Liu","Xu Chen","Jinlong Peng","Zhenye Gan","Guannan Jiang","Annan Shu","Yabiao Wang","Lizhuang Ma"],"abstract":"In recent years, there has been significant progress in employing color images for anomaly detection in industrial scenarios, but it is insufficient for identifying anomalies that are invisible in RGB images alone. As a supplement, introducing extra modalities such as depth and surface normal maps can be helpful to detect these anomalies. To this end, we present a novel Multi-Modal Reverse Distillation (MMRD) paradigm that consists of a frozen multi-modal teacher encoder to generate distillation targets and a learnable student decoder targeting to restore multi-modal representations from the teacher. Specifically, the teacher extracts complementary visual features from different modalities via a siamese architecture and then parameter-freely fuses these information from multiple levels as the targets of distillation. 
For the student, it learns modality-related priors from the teacher rep...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i8.28687","openalex_id":"https://openalex.org/W4393148014","cited_by_count":23,"quality_score":64,"matched_keywords":["distillation"],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.7260067462921143},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6540180444717407},{"id":"https://openalex.org/C12997251","display_name":"Anomaly (physics)","score":0.6040900945663452},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4711764454841614},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.4605397880077362},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3504211902618408},{"id":"https://openalex.org/C39432304","display_name":"Environmental science","score":0.3209362328052521},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.12496906518936157}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":23}},{"id":"huawei-noah:189","title":"PreRoutGNN for Timing Prediction with Order Preserving Partition: Global Circuit Pre-training, Local Delay Learning and Attentional Cell Modeling","url":"https://www.noahlab.com.hk/en/scientific_research/preroutgnn-for-timing-prediction-with-order-preserving-partition-global-circuit-pre-training-local-delay-learning-and-attentional-cell-modeling","published":"2024-03-24","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. 
Venue: AAAI 2024. External paper link: https://ojs.aaai.org/index.php/AAAI/article/view/29653/31111","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Industry Intelligence","AAAI 2024","2024"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"openalex:W4393154081","title":"Multi-Domain Incremental Learning for Face Presentation Attack Detection","url":"https://doi.org/10.1609/aaai.v38i6.28359","published":"2024-03-24","authors":["Keyao Wang","Guosheng Zhang","Haixiao Yue","Ajian Liu","Gang Zhang","Haocheng Feng","Junyu Han","Errui Ding","Jingdong Wang"],"abstract":"Previous face Presentation Attack Detection (PAD) methods aim to improve the effectiveness of cross-domain tasks. However, in real-world scenarios, the original training data of the pre-trained model is not available due to data privacy or other reasons. Under these constraints, general methods for fine-tuning single-target domain data may lose previously learned knowledge, leading to a catastrophic forgetting problem. To address these issues, we propose a multi-domain incremental learning (MDIL) method for PAD, which not only learns knowledge well from the new domain but also maintains the performance of previous domains stably. Specifically, we propose an adaptive domain-specific experts (ADE) framework based on the vision transformer to preserve the discriminability of previous domains. 
Furthermore, an asymmetric classifier is designed to keep the output distribution of different clas...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i6.28359","openalex_id":"https://openalex.org/W4393154081","cited_by_count":27,"quality_score":64,"matched_keywords":[],"author_affiliations":["Baidu (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6335528492927551},{"id":"https://openalex.org/C2777601897","display_name":"Presentation (obstetrics)","score":0.626148521900177},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.5114454030990601},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4948214888572693},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45454686880111694},{"id":"https://openalex.org/C2780735816","display_name":"Incremental learning","score":0.43020814657211304},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3333631157875061},{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.32712334394454956}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":27}},{"id":"openalex:W4393160590","title":"Generative Multi-Modal Knowledge Retrieval with Large Language Models","url":"https://doi.org/10.1609/aaai.v38i17.29837","published":"2024-03-24","authors":["Xinwei Long","Jiali Zeng","Fandong Meng","Zhiyuan Ma","Kaiyan Zhang","Bowen Zhou","Jie Zhou"],"abstract":"Knowledge retrieval with multi-modal queries plays a crucial role in supporting knowledge-intensive multi-modal applications. 
However, existing methods face challenges in terms of their effectiveness and training efficiency, especially when it comes to training and integrating multiple retrievers to handle multi-modal queries. In this paper, we propose an innovative end-to-end generative framework for multi-modal knowledge retrieval. Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases, even when trained with limited data. We retrieve knowledge via a two-step process: 1) generating knowledge clues related to the queries, and 2) obtaining the relevant document by searching databases using the knowledge clue. In particular, we first introduce an object-aware prefix-tuning technique to guide multi-grained visual learnin...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i17.29837","openalex_id":"https://openalex.org/W4393160590","cited_by_count":19,"quality_score":64,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7282950282096863},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6537749767303467},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6031192541122437},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5623741149902344},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4937926232814789},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.06866109371185303},{"id":"https://openalex.org/C188027245","display_name":"Polymer 
chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"openalex:W4393149060","title":"Less Is More: Label Recommendation for Weakly Supervised Point Cloud Semantic Segmentation","url":"https://doi.org/10.1609/aaai.v38i5.28237","published":"2024-03-24","authors":["Zhiyi Pan","Nan Zhang","Wei Gao","Shan Liu","Ge Li"],"abstract":"Weak supervision has proven to be an effective strategy for reducing the burden of annotating semantic segmentation tasks in 3D space. However, unconstrained or heuristic weakly supervised annotation forms may lead to suboptimal label efficiency. To address this issue, we propose a novel label recommendation framework for weakly supervised point cloud semantic segmentation. Distinct from pre-training and active learning, the label recommendation framework consists of three stages: inductive bias learning, recommendations for points to be labeled, and point cloud semantic segmentation learning. In practice, we first introduce the point cloud upsampling task to induct inductive bias from structural information. During the recommendation stage, we present a cross-scene clustering strategy to generate centers of clustering as recommended points. 
Then we introduce a recommended point position...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i5.28237","openalex_id":"https://openalex.org/W4393149060","cited_by_count":26,"quality_score":63,"matched_keywords":[],"author_affiliations":["Peking University","Peng Cheng Laboratory","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C131979681","display_name":"Point cloud","score":0.7268438935279846},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.724584698677063},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5826331973075867},{"id":"https://openalex.org/C79974875","display_name":"Cloud computing","score":0.532889723777771},{"id":"https://openalex.org/C28719098","display_name":"Point (geometry)","score":0.511774480342865},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4903004467487335},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43252891302108765},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.14210012555122375}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":26}},{"id":"openalex:W4393147193","title":"Editing Language Model-Based Knowledge Graph Embeddings","url":"https://doi.org/10.1609/aaai.v38i16.29737","published":"2024-03-24","authors":["Siyuan Cheng","Ningyu Zhang","Bozhong Tian","Xi Chen","Qingbin Liu","Huajun Chen"],"abstract":"Recently decades have witnessed the empirical success of framing Knowledge Graph (KG) embeddings via language models. 
However, language model-based KG embeddings are usually deployed as static artifacts, making them difficult to modify post-deployment without re-training after deployment. To address this issue, we propose a new task of editing language model-based KG embeddings in this paper. This task is designed to facilitate rapid, data-efficient updates to KG embeddings without compromising the performance of other aspects. We build four new datasets: E-FB15k237, A-FB15k237, E-WN18RR, and A-WN18RR, and evaluate several knowledge editing baselines demonstrating the limited ability of previous models to handle the proposed challenging task. We further propose a simple yet strong baseline dubbed KGEditor, which utilizes additional parametric layers of the hypernetwork to edit/add facts....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i16.29737","openalex_id":"https://openalex.org/W4393147193","cited_by_count":18,"quality_score":63,"matched_keywords":["language model","efficient"],"author_affiliations":["Tencent (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6370530724525452},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4708632528781891},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.4479026794433594},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39486801624298096},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3297259211540222},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3263748288154602}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship 
institution metadata","cited_by_count":18}},{"id":"openalex:W4393148771","title":"VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding","url":"https://doi.org/10.1609/aaai.v38i4.28121","published":"2024-03-24","authors":["Guibiao Liao","Jiankun Li","Xiaoqing Ye"],"abstract":"Vision and language foundation models (VLMs) have showcased impressive capabilities in 2D scene understanding. However, their latent potential in elevating the understanding of 3D autonomous driving scenes remains untapped. In this paper, we propose VLM2Scene, which exploits the potential of VLMs to enhance 3D self-supervised representation learning through our proposed image-text-LiDAR contrastive learning strategy. Specifically, in the realm of autonomous driving scenes, the inherent sparsity of LiDAR point clouds poses a notable challenge for point-level contrastive learning methods. This method often grapples with limitations tied to a restricted receptive field and the presence of noisy points. To tackle this challenge, our approach emphasizes region-level learning, leveraging regional masks without semantics derived from the vision foundation model. 
This approach capitalizes on val...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i4.28121","openalex_id":"https://openalex.org/W4393148771","cited_by_count":22,"quality_score":59,"matched_keywords":[],"author_affiliations":["Baidu (China)","Peng Cheng Laboratory"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.8423081636428833},{"id":"https://openalex.org/C51399673","display_name":"Lidar","score":0.6719222664833069},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5856758952140808},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5230906009674072},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5085676908493042},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4945962131023407},{"id":"https://openalex.org/C62649853","display_name":"Remote sensing","score":0.2850169539451599},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.2092936635017395}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":22}},{"id":"openalex:W4393156223","title":"Image Captioning with Multi-Context Synthetic Data","url":"https://doi.org/10.1609/aaai.v38i5.28203","published":"2024-03-24","authors":["Feipeng Ma","Yizhou Zhou","Fengyun Rao","Yueyi Zhang","Xiaoyan Sun"],"abstract":"Image captioning requires numerous annotated image-text pairs, resulting in substantial annotation costs. Recently, large models (e.g. diffusion models and large language models) have excelled in producing high-quality images and text. 
This potential can be harnessed to create synthetic image-text pairs for training captioning models. Synthetic data can improve cost and time efficiency in data collection, allow for customization to specific domains, bootstrap generalization capability for zero-shot performance, and circumvent privacy concerns associated with real-world data. However, existing methods struggle to attain satisfactory performance solely through synthetic data. We identify the issue as generated images from simple descriptions mostly capture a solitary perspective with limited context, failing to align with the intricate scenes prevalent in real-world imagery. To tackle this...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i5.28203","openalex_id":"https://openalex.org/W4393156223","cited_by_count":17,"quality_score":58,"matched_keywords":["language model"],"author_affiliations":["Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9397733211517334},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6171645522117615},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.5824446678161621},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5239900946617126},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4977464973926544},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4066905379295349},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3905346393585205},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.3610485792160034}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":17}},{"id":"openalex:W4393148049","title":"Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views","url":"https://doi.org/10.1609/aaai.v38i7.28626","published":"2024-03-24","authors":["Zi–Xin Zou","Weihao Cheng","Yan‐Pei Cao","Shi-Sheng Huang","Ying Shan","Song–Hai Zhang"],"abstract":"Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models for generating plausible images at novel viewpoints or for distilling pre-trained diffusion priors into 3D representations using score distillation sampling (SDS), these methods often struggle to simultaneously achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry. In this work, we present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. 
Specifically, we employ a controller that harnesses epipolar features from input views, guiding a pre-trained diffusion model, such as Stable Diffusion, to produce novel-view images that maintain 3D consistency with....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i7.28626","openalex_id":"https://openalex.org/W4393148049","cited_by_count":16,"quality_score":57,"matched_keywords":["distillation"],"author_affiliations":["Beijing Normal University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5668663382530212},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.53729248046875},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.52274090051651},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49659067392349243},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.48272940516471863},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.34242016077041626},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.33728262782096863},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.14813899993896484}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4393161178","title":"Tree-of-Reasoning Question Decomposition for Complex Question Answering with Large Language Models","url":"https://doi.org/10.1609/aaai.v38i17.29928","published":"2024-03-24","authors":["Kun Zhang","Jiali Zeng","Fandong Meng","Yuanzhuo Wang","Shiqi Sun","Long Bai","Huawei Shen","Jie 
Zhou"],"abstract":"Large language models (LLMs) have recently demonstrated remarkable performance across various Natual Language Processing tasks. In the field of multi-hop reasoning, the Chain-of-thought (CoT) prompt method has emerged as a paradigm, using curated stepwise reasoning demonstrations to enhance LLM's ability to reason and produce coherent rational pathways. To ensure the accuracy, reliability, and traceability of the generated answers, many studies have incorporated information retrieval (IR) to provide LLMs with external knowledge. However, existing CoT with IR methods decomposes questions into sub-questions based on a single compositionality type, which limits their effectiveness for questions involving multiple compositionality types. Additionally, these methods suffer from inefficient retrieval, as complex questions often contain abundant information, leading to the retrieval of irreleva...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i17.29928","openalex_id":"https://openalex.org/W4393161178","cited_by_count":9,"quality_score":54,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Beijing Institute of Big Data Research","Chinese Academy of Sciences","Institute of Computing Technology","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.8675791025161743},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5636521577835083},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5470004677772522},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.5283967852592468},{"id":"https://openalex.org/C124681953","display_name":"Decomposition","score":0.5053737759590149},{"id":"https://openalex.org/C113174947","display_name":"Tree (set theory)","score":0.5017881393432617},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4144305884838104},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.34137600660324097}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4393148942","title":"Weakly Supervised Open-Vocabulary Object Detection","url":"https://doi.org/10.1609/aaai.v38i4.28127","published":"2024-03-24","authors":["Jianghang Lin","Yunhang Shen","Bingquan Wang","Shaohui Lin","Ke Li","Liujuan Cao"],"abstract":"Despite weakly supervised object detection (WSOD) being a promising step toward evading strong instance-level annotations, its capability is confined to closed-set categories within a single training dataset. In this paper, we propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD to detect novel concepts and utilize diverse datasets with only image-level annotations. To achieve this, we explore three vital strategies, including dataset-level feature adaptation, image-level salient object localization, and region-level vision-language alignment. First, we perform data-aware feature extraction to produce an input-conditional coefficient, which is leveraged into dataset attribute prototypes to identify dataset bias and help achieve cross-dataset generalization. 
Second, a customized location-oriented weakly supervised region p...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i4.28127","openalex_id":"https://openalex.org/W4393148942","cited_by_count":16,"quality_score":53,"matched_keywords":[],"author_affiliations":["East China Normal University","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6017935276031494},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.5929076075553894},{"id":"https://openalex.org/C2781238097","display_name":"Object (grammar)","score":0.5420161485671997},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5270552635192871},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.49560320377349854},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.40621474385261536},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38915109634399414},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.2894860506057739}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4393152836","title":"Teaching Large Language Models to Translate with Comparison","url":"https://doi.org/10.1609/aaai.v38i17.29920","published":"2024-03-24","authors":["Jiali Zeng","Fandong Meng","Yongjing Yin","Jie Zhou"],"abstract":"Open-sourced large language models (LLMs) have demonstrated remarkable efficacy in various tasks with instruction tuning. 
However, these models can sometimes struggle with tasks that require more specialized knowledge such as translation. One possible reason for such deficiency is that instruction tuning aims to generate fluent and coherent text that continues from a given instruction without being constrained by any task-specific requirements. Moreover, it can be more challenging to tune smaller LLMs with lower-quality training data. To address this issue, we propose a novel framework using examples in comparison to teach LLMs to learn translation. Our approach involves output comparison and preference comparison, presenting the model with carefully designed examples of correct and incorrect translations and an additional preference loss for better regularization. Empirical evaluation o...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i17.29920","openalex_id":"https://openalex.org/W4393152836","cited_by_count":11,"quality_score":52,"matched_keywords":["preference"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4918553829193115},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3940278887748718},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3824402987957001},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.38134488463401794},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.2889344096183777},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.12554556131362915}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":11}},{"id":"openalex:W4393150138","title":"Federated Modality-Specific Encoders and Multimodal Anchors for Personalized Brain Tumor Segmentation","url":"https://doi.org/10.1609/aaai.v38i2.27909","published":"2024-03-24","authors":["Qian Dai","Dong Wei","Hong Liu","Jinghan Sun","Liansheng Wang","Yefeng Zheng"],"abstract":"Most existing federated learning (FL) methods for medical image analysis only considered intramodal heterogeneity, limiting their applicability to multimodal imaging applications. In practice, it is not uncommon that some FL participants only possess a subset of the complete imaging modalities, posing inter-modal heterogeneity as a challenge to effectively training a global model on all participants’ data. In addition, each participant would expect to obtain a personalized model tailored for its local data characteristics from the FL in such a scenario. In this work, we propose a new FL framework with federated modality-specific encoders and multimodal anchors (FedMEMA) to simultaneously address the two concurrent issues. Above all, FedMEMA employs an exclusive encoder for each modality to account for the inter-modal heterogeneity in the first place. 
In the meantime, while the encoders a...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i2.27909","openalex_id":"https://openalex.org/W4393150138","cited_by_count":11,"quality_score":52,"matched_keywords":["personalized"],"author_affiliations":["Tencent (China)","Tencent Healthcare (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.7672216892242432},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6349521279335022},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.6233291625976562},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6205578446388245},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4847359359264374},{"id":"https://openalex.org/C4441509","display_name":"Multimodal therapy","score":0.42038464546203613},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3835461735725403},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.37989991903305054}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4393147080","title":"VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation","url":"https://doi.org/10.1609/aaai.v38i17.29813","published":"2024-03-24","authors":["Jialu Li","Aishwarya Padmakumar","Gaurav S. Sukhatme","Mohit Bansal"],"abstract":"Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions. 
The performance of existing VLN methods is limited by insufficient diversity in navigation environments and limited training data. To address these issues, we propose VLN-Video, which utilizes the diverse outdoor environments present in driving videos in multiple cities in the U.S. augmented with automatically generated navigation instructions and actions to improve outdoor VLN performance. VLN-Video combines the best of intuitive classical approaches and modern deep learning techniques, using template infilling to generate grounded non-repetitive navigation instructions, combined with an image rotation similarity based navigation action predictor to obtain VLN style data from driving videos for pretraining deep learning VLN models.....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i17.29813","openalex_id":"https://openalex.org/W4393147080","cited_by_count":9,"quality_score":50,"matched_keywords":["agent"],"author_affiliations":["Amazon (United States)","University of North Carolina at Chapel Hill"],"concepts":[{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.6360138654708862},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.625289261341095},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4527266025543213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4393154723","title":"SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation","url":"https://doi.org/10.1609/aaai.v38i3.28024","published":"2024-03-24","authors":["Chengyou Jia","Minnan Luo","Zhuohang Dang","Guang Dai","Xiaojun Chang","Mengmeng Wang","Jingdong 
Wang"],"abstract":"Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls. In contrast, Layout-to-Image (L2I) generation, aiming to generate realistic and complex scene images from user-specified layouts, has risen to prominence. However, existing methods transform layout information into tokens or RGB images for conditional control in the generative process, leading to insufficient spatial and semantic controllability of individual instances. To address these limitations, we propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance. Owing to rich spatial and semantic information encapsulated in well-designed feature maps, SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to prev...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i3.28024","openalex_id":"https://openalex.org/W4393154723","cited_by_count":13,"quality_score":50,"matched_keywords":[],"author_affiliations":["Baidu (China)","Mohamed bin Zayed University of Artificial Intelligence","State Grid Corporation of China (China)","University of Technology Sydney","Xi'an Jiaotong University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6012915372848511},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5297397375106812},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45707353949546814},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.42024174332618713},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.3618224859237671},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3224911689758301},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.09392449259757996},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4393153464","title":"ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment","url":"https://doi.org/10.1609/aaai.v38i7.28594","published":"2024-03-24","authors":["Yicheng Zhong","Huawei Wei","Peiji Yang","Zhisheng Wang"],"abstract":"The objective of stylized speech-driven facial animation is to create animations that encapsulate specific emotional expressions. Existing methods often depend on pre-established emotional labels or facial expression templates, which may limit the necessary flexibility for accurately conveying user intent. In this research, we introduce a technique that enables the control of arbitrary styles by leveraging natural language as emotion prompts. This technique presents benefits in terms of both flexibility and user-friendliness. To realize this objective, we initially construct a Text-Expression Alignment Dataset (TEAD), wherein each facial expression is paired with several prompt-like descriptions. We propose an innovative automatic annotation method, supported by CahtGPT, to expedite the dataset construction, thereby eliminating the substantial expense of manual annotation. 
Following this...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i7.28594","openalex_id":"https://openalex.org/W4393153464","cited_by_count":13,"quality_score":50,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.7026721239089966},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5612211227416992},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5401384830474854},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4509253203868866},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.32788515090942383},{"id":"https://openalex.org/C31258907","display_name":"Computer network","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4393146990","title":"Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models","url":"https://doi.org/10.1609/aaai.v38i17.29817","published":"2024-03-24","authors":["Shuang Li","Jiangjie Chen","Siyu Yuan","Xinyi Wu","Hao Yang","Shimin Tao","Yanghua Xiao"],"abstract":"To translate well, machine translation (MT) systems and general-purposed language models (LMs) need a deep understanding of both source and target languages and cultures. Therefore, idioms, with their non-compositional nature, pose particular challenges for Transformer-based systems, as literal translations often miss the intended meaning. 
Traditional methods, which replace idioms using existing knowledge bases (KBs), often lack scale and context-awareness. Addressing these challenges, our approach prioritizes context-awareness and scalability, allowing for offline storage of idioms in a manageable KB size. This ensures efficient serving with smaller models and provides a more comprehensive understanding of idiomatic expressions. We introduce a multilingual idiom KB (IdiomKB) developed using large LMs to address this. This KB facilitates better translation by smaller models, such as BLOO...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i17.29817","openalex_id":"https://openalex.org/W4393146990","cited_by_count":8,"quality_score":49,"matched_keywords":["efficient"],"author_affiliations":["Fudan University","Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.7171645164489746},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6082493662834167},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5482306480407715},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.46560031175613403},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3770858347415924},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.332571804523468},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.17471492290496826},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.04881119728088379}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":8}},{"id":"openalex:W4393152829","title":"Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation","url":"https://doi.org/10.1609/aaai.v38i17.29951","published":"2024-03-24","authors":["Qiushi Zhu","Jie Zhang","裕二 池谷","Yuchen Hu","Lirong Dai"],"abstract":"Self-supervised speech pre-training methods have developed rapidly in recent years, which show to be very effective for many near-field single-channel speech tasks. However, far-field multichannel speech processing is suffering from the scarcity of labeled multichannel data and complex ambient noises. The efficacy of self-supervised learning for far-field multichannel and multi-modal speech processing has not been well explored. Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose the multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs. 
First, we propose a multi-path structure to process multi-channel audio streams and a visual stream in parallel, with intra-, and inter-channel contrastive as training targets to fully exploit the r...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i17.29951","openalex_id":"https://openalex.org/W4393152829","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Nanyang Technological University","Tencent (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6720021963119507},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5716509819030762},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5459562540054321},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.5247823596000671},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3569793403148651},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.05891576409339905},{"id":"https://openalex.org/C94625758","display_name":"Politics","score":0.0},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4393147985","title":"Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery","url":"https://doi.org/10.1609/aaai.v38i8.28774","published":"2024-03-24","authors":["Pengwei Yan","Kaisong Song","Zhuoren Jiang","Yangyang Kang","Tianqianjin Lin","Changlong Sun","Xiaozhong Liu"],"abstract":"While self-supervised graph 
pretraining techniques have shown promising results in various domains, their application still experiences challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions. To address these issues, we propose a novel solution, Dual-level Graph self-supervised Pretraining with Motif discovery (DGPM), which introduces a unique dual-level pretraining structure that orchestrates node-level and subgraph-level pretext tasks. Unlike prior approaches, DGPM autonomously uncovers significant graph motifs through an edge pooling module, aligning learned motif similarities with graph kernel-based similarities. A cross-matching task enables sophisticated node-motif interactions and novel representation learning. Extensive experiments on 15 datasets validate DGPM's effectiveness and generalizability, outperforming state-of-the-a...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i8.28774","openalex_id":"https://openalex.org/W4393147985","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Northeastern University","Worcester Polytechnic Institute","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C32276052","display_name":"Motif (music)","score":0.7600429654121399},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.45913058519363403},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4542061686515808},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35356247425079346},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.33581674098968506},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer 
science","score":0.2250712811946869},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.1078084409236908},{"id":"https://openalex.org/C107038049","display_name":"Aesthetics","score":0.06157442927360535}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4393147850","title":"Explainable Origin-Destination Crowd Flow Interpolation via Variational Multi-Modal Recurrent Graph Auto-Encoder","url":"https://doi.org/10.1609/aaai.v38i8.28796","published":"2024-03-24","authors":["Qiang Zhou","Xinjiang Lu","Jingjing Gu","Zhe Zheng","Jin Bo","Jingbo Zhou"],"abstract":"Origin-destination (OD) crowd flow, if more accurately inferred at a fine-grained level, has the potential to enhance the efficacy of various urban applications. While in practice for mining OD crowd flow with effect, the problem of spatially interpolating OD crowd flow occurs since the ineluctable missing values. This problem is further complicated by the inherent scarcity and noise nature of OD crowd flow data. In this paper, we propose an uncertainty-aware interpolative and explainable framework, namely UApex, for realizing reliable and trustworthy OD crowd flow interpolation. Specifically, we first design a Variational Multi-modal Recurrent Graph Auto-Encoder (VMR-GAE) for uncertainty-aware OD crowd flow interpolation. A key idea here is to formulate the problem as semi-supervised learning on directed graphs. 
Next, to mitigate the data scarcity, we incorporate a distribution alignmen...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i8.28796","openalex_id":"https://openalex.org/W4393147850","cited_by_count":4,"quality_score":45,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Dalian University of Technology","Nanjing University of Aeronautics and Astronautics"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7111931443214417},{"id":"https://openalex.org/C137800194","display_name":"Interpolation (computer graphics)","score":0.5406420230865479},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5230482816696167},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4998645782470703},{"id":"https://openalex.org/C38349280","display_name":"Flow (mathematics)","score":0.4935115575790405},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.4735502302646637},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.44230806827545166},{"id":"https://openalex.org/C101738243","display_name":"Autoencoder","score":0.4199215769767761}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4393159345","title":"Unsupervised Domain Adaptative Temporal Sentence Localization with Mutual Information Maximization","url":"https://doi.org/10.1609/aaai.v38i4.28145","published":"2024-03-24","authors":["Daizong Liu","Xiang Fang","Xiaoye Qu","Jianfeng Dong","He Yan","Yang Yang","Pan Zhou","Yu Cheng"],"abstract":"Temporal sentence localization (TSL) aims to localize a target segment in a video according to a given sentence query. 
Though respectable works have made decent achievements in this task, they severely rely on abundant yet expensive manual annotations for training. Moreover, these trained data-dependent models usually can not generalize well to unseen scenarios because of the inherent domain shift. To facilitate this issue, in this paper, we target another more practical but challenging setting: unsupervised domain adaptative temporal sentence localization (UDA-TSL), which explores whether the localization knowledge can be transferred from a fully-annotated data domain (source domain) to a new unannotated data domain (target domain). Particularly, we propose an effective and novel baseline for UDA-TSL to bridge the multi-modal gap across different domains and learn the potential correspo...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i4.28145","openalex_id":"https://openalex.org/W4393159345","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["BC Platforms (Finland)","Chinese University of Hong Kong","Huazhong University of Science and Technology","Meta (United States)","Nanyang Technological University","Peking University","Zhejiang Gongshang University"],"concepts":[{"id":"https://openalex.org/C152139883","display_name":"Mutual information","score":0.7087217569351196},{"id":"https://openalex.org/C2777530160","display_name":"Sentence","score":0.6001055240631104},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5897936224937439},{"id":"https://openalex.org/C2776330181","display_name":"Maximization","score":0.570167064666748},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5614919066429138},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical 
analysis)","score":0.4999873638153076},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36636456847190857},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.35792142152786255}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4393147882","title":"Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration","url":"https://doi.org/10.1609/aaai.v38i7.28476","published":"2024-03-24","authors":["Xiaofeng Yang","Fayao Liu","Yi Xu","Hanjing Su","Qingyao Wu","Guosheng Lin"],"abstract":"In recent years, following the success of text guided image generation, text guided 3D generation has gained increasing attention among researchers. Dreamfusion is a notable approach that enhances generation quality by utilizing 2D text guided diffusion models and introducing SDS loss, a technique for distilling 2D diffusion model information to train 3D models. However, the SDS loss has two major limitations that hinder its effectiveness. Firstly, when given a text prompt, the SDS loss struggles to produce diverse content. Secondly, during training, SDS loss may cause the generated content to overfit and collapse, limiting the model's ability to learn intricate texture details. To overcome these challenges, we propose a novel approach called Noise Recalibration algorithm. 
By incorporating this technique, we can generate 3D content with significantly greater diversity and stunning detail...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i7.28476","openalex_id":"https://openalex.org/W4393147882","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Agency for Science, Technology and Research","Institute for Infocomm Research","Nanyang Technological University","South China University of Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.6298726797103882},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.5613585114479065},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.4262952506542206},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3682537078857422},{"id":"https://openalex.org/C121864883","display_name":"Statistical physics","score":0.34296169877052307},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.23612657189369202},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.21167340874671936},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4393148071","title":"PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation","url":"https://doi.org/10.1609/aaai.v38i7.28481","published":"2024-03-24","authors":["Yiying Yang","Fukun Yin","Wen Liu","Jiayuan Fan","Xin Chen","Gang Yu","Tao Chen"],"abstract":"Recent advancements in implicit neural representations have contributed 
to high-fidelity surface reconstruction and photorealistic novel view synthesis. However, with the expansion of the scene scale, such as block or city level, existing methods will encounter challenges because traditional sampling cannot cope with the cubically growing sampling space. To alleviate the dependence on filling the sampling space, we explore using multi-modal priors to assist individual points to obtain more global semantic information and propose a priorrich multi-modal implicit neural representation network, Pm-INR, for the outdoor unbounded large-scale scene. The core of our method is multi-modal prior extraction and crossmodal prior fusion modules. The former encodes codebooks from different modality inputs and extracts valuable priors, while the latter fuses priors to maintain view consistency and pre...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i7.28481","openalex_id":"https://openalex.org/W4393148071","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6942880749702454},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.626621425151825},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6170151233673096},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4507019519805908},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.36240535974502563},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.18734878301620483},{"id":"https://openalex.org/C17744445","display_name":"Political 
science","score":0.10392692685127258},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.10271847248077393}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4393146683","title":"Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning","url":"http://dx.doi.org/10.1609/aaai.v38i19.30087","published":"2024-03-24","authors":["Zhongzhi Chen","Xingwu Sun","Xianfeng Jiao","Fengzong Lian","Zhanhui Kang","Di Wang","Chengzhong Xu"],"abstract":"Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique considering an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. By employing this approach, we improved the truthfulness of Llama-2-7B from 40.8% to 74.5% on TruthfulQA. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. 
Our visualization results show that orthogonal probes cap...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i19.30087","openalex_id":"https://openalex.org/W4393146683","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Macau"],"concepts":[{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6624365448951721},{"id":"https://openalex.org/C2780665704","display_name":"Intervention (counseling)","score":0.5147924423217773},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4369204640388489},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3707095980644226},{"id":"https://openalex.org/C39432304","display_name":"Environmental science","score":0.35816866159439087},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3373766541481018},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3348882794380188},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.1766670048236847}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4393152847","title":"Reliable Data Generation and Selection for Low-Resource Relation Extraction","url":"http://dx.doi.org/10.1609/aaai.v38i17.29915","published":"2024-03-24","authors":["Junjie Yu","Xing Wang","Wenliang Chen"],"abstract":"Automated construction of annotated data holds significant importance in Relation Extraction (RE) tasks due to the hardness and cost of human annotation. 
In this work, we propose Self-RDGS, a method for Self-supervised Reliable Data Generation and Selection in low-resource RE tasks. At first, we fully utilize the knowledge of triplets as prompts to generate sentences by employing the Large Language Models (LLMs). Since the auto-generated data contains noise, we then propose a ranking-based data selection method to select reliable sentences. Finally, we integrate the data selection and RE model training within a self-supervised iterative framework. Through experimentation on three datasets with low-resource settings, we demonstrate the effectiveness of our proposed approach in constructing annotated data and achieving noteworthy improvements in comparison to multiple baselines. Code, data...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i17.29915","openalex_id":"https://openalex.org/W4393152847","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C81917197","display_name":"Selection (genetic algorithm)","score":0.6250072717666626},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5980389714241028},{"id":"https://openalex.org/C25343380","display_name":"Relation (database)","score":0.5320404767990112},{"id":"https://openalex.org/C206345919","display_name":"Resource (disambiguation)","score":0.46760550141334534},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.44767817854881287},{"id":"https://openalex.org/C2777466982","display_name":"Data extraction","score":0.4292728304862976},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.34961754083633423},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.2370358407497406}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4393160278","title":"A Label Disambiguation-Based Multimodal Massive Multiple Instance Learning Approach for Immune Repertoire Classification","url":"http://dx.doi.org/10.1609/aaai.v38i14.29547","published":"2024-03-24","authors":["Fan Xu","Yu Zhao","Bingzhe Wu","Yueshan Huang","Qin Ren","Yang Xiao","Bing He","Jie Zheng","Jianhua Yao"],"abstract":"One individual human’s immune repertoire consists of a huge set of adaptive immune receptors at a certain time point, representing the individual's adaptive immune state. Immune repertoire classification and associated receptor identification have the potential to make a transformative contribution to the development of novel vaccines and therapies. The vast number of instances and exceedingly low witness rate pose a great challenge to the immune repertoire classification, which can be formulated as a Massive Multiple Instance Learning (MMIL) problem. Traditional MIL methods, at both bag-level and instance-level, confront the issues of substantial computational burden or supervision ambiguity when handling massive instances. 
To address these issues, we propose a novel label disambiguation-based multimodal massive multiple instance learning approach (LaDM³IL) for immune repertoire classif...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1609/aaai.v38i14.29547","openalex_id":"https://openalex.org/W4393160278","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Shanghai Jiao Tong University","ShanghaiTech University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C2778473898","display_name":"Repertoire","score":0.8894015550613403},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5593640804290771},{"id":"https://openalex.org/C3018252375","display_name":"Immune recognition","score":0.45157331228256226},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44677236676216125},{"id":"https://openalex.org/C8891405","display_name":"Immune system","score":0.42877498269081116},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3586353659629822},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.23872968554496765},{"id":"https://openalex.org/C203014093","display_name":"Immunology","score":0.08605140447616577}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"apple:vmxhpbhu0crrrwxl4hswv6sg","title":"A Multi-signal Large Language Model for Device-directed Speech Detection","url":"https://machinelearning.apple.com/research/llm-device-directed-speech-detection","published":"2024-03-22","authors":["Dominik Wagner","Alex Churchill","Siddharth Sigtia","Panos Georgiou","Matt Mirsamadi","Aarshee Mishra","Erik 
Marchi"],"abstract":"We present an architecture for device-directed speech detection that treats the task as a text-generation problem. We use a multi-modal fusion approach that combines acoustic information from the recorded audio waveform with text and confidence information obtained from an automatic speech recognition system. The audio waveform is represented as a sequence of continuous embeddings by an audio encoder and presented as a prefix token to a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/can-large-language-models-explore-in-context","title":"Can large language models explore in-context?","url":"https://www.microsoft.com/en-us/research/publication/can-large-language-models-explore-in-context/","published":"2024-03-21","authors":["Akshay Krishnamurthy","Keegan Harris","Dylan Foster","Cyril Zhang","Aleksandrs Slivkins"],"abstract":"We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. 
We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) All other configurations did not result in robust explora...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/appropriate-reliance-on-generative-ai-research-synthesis","title":"Appropriate reliance on Generative AI: Research synthesis","url":"https://www.microsoft.com/en-us/research/publication/appropriate-reliance-on-generative-ai-research-synthesis/","published":"2024-03-21","authors":["Samir Passi","Shipi Dhanorkar","Mihaela Vorvoreanu"],"abstract":"Appropriate reliance on AI happens when users accept correct AI outputs and reject incorrect ones. New complexities arise for fostering appropriate reliance on generative AI (GenAI) systems. GenAI systems pose several risks, despite often rivaling, and sometimes surpassing, human performance on many tasks. 
Inappropriate reliance – either under-reliance or overreliance – on GenAI can have negative consequences such as poor human+GenAI team performance and even product abandonment. Based on a review of ~50 papers from multiple research areas, this report provides an overview of the factors that affect overreliance on GenAI, the effectiveness of different mitigation strategies for overreliance on GenAI, and potential design strategies to facilitate appropriate reliance on GenAI. See also our 2022 research synthesis on Overreliance on AI Cite as: Samir Passi, Shipi Dhanorkar, & Mihaela Vorvo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Tech Report","Artificial intelligence"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:pyrtivu3donqe3kfdh2tnyzx","title":"TiC-CLIP: Continual Training of CLIP Models","url":"https://machinelearning.apple.com/research/tic-clip-v2","published":"2024-03-21","authors":["Saurabh Garg","Hadi Pour Ansari","Mehrdad Farajtabar","Sachin Mehta","Raviteja Vemulapalli","Oncel Tuzel","Vaishaal Shankar","Fartash Faghri"],"abstract":"Keeping large foundation models up to date on latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. 
We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llmlingua-2-data-distillation-for-efficient-and-faithful-task-agnostic-prompt-compression","title":"LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression","url":"https://www.microsoft.com/en-us/research/publication/llmlingua-2-data-distillation-for-efficient-and-faithful-task-agnostic-prompt-compression/","published":"2024-03-20","authors":["Zhuoshi Pan","Qianhui Wu","Huiqiang Jiang","Menglin Xia","Xufang Luo","Jue Zhang","Qingwei Lin 林庆维","Victor Ruehle","Yuqing Yang","Chin-Yew Lin","H. Vicky Zhao","Lili Qiu"],"abstract":"This paper focuses on task-agnostic prompt compression for better generalizability and efficiency. Considering the redundancy in natural language, existing approaches compress prompts by removing tokens or lexical units according to their information entropy obtained from a causal language model such as LLaMa-7B. The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective. 
To address these issues, we propose a data distillation procedure to derive knowledge from an LLM to compress prompts without losing crucial information, and meantime, introduce an extractive text compression dataset. We formulate prompt compression as a token classification problem to guarantee the faithfulness o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":96,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Efficient algorithm","LLMs Inference","1970-01-01","LLM","language model","efficient","compression","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/speechlm-enhanced-speech-pre-training-with-unpaired-textual-data","title":"SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data","url":"https://www.microsoft.com/en-us/research/publication/speechlm-enhanced-speech-pre-training-with-unpaired-textual-data/","published":"2024-03-20","authors":["Ziqiang Zhang","Sanyuan Chen","Long Zhou","Yu Wu","Shuo Ren","Shujie Liu","Zhuoyuan Yao","Xun Gong","Lirong Dai","Jinyu Li","Furu Wei"],"abstract":"How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation. 
Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities, including phoneme-unit and hidden-unit tokenizers, which can be trained using unpaired speech or a small amount of paired speech-text data. Based on the trained tokenizers, we convert the unlabeled speech and text data into tokens of phoneme units or hidden units. The pre-training objective is designed to unify the speech and the text into the same discrete semantic space with a unified Transformer network. We evaluate SpeechLM on variou...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/taslp.2024.3379877","openalex_id":"https://openalex.org/W4392979802","cited_by_count":30,"quality_score":94,"matched_keywords":["Article (Journal)","Human language technologies","1970-01-01"],"author_affiliations":["Microsoft","Harbin Institute of Technology","Microsoft (United States)","Microsoft Research Asia (China)","University of Science and Technology of China"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:75","title":"Magic-Me: Identity-Specific Video Customized Diffusion","url":"https://seed.bytedance.com/en/research/magic-me-identity-specific-video-customized-diffusion","published":"2024-03-20","authors":["Ze Ma","Daquan Zhou","Chun-Hsiao Yeh","Xue-She Wang","Xiuyu Li","Huanrui Yang","Zhen Dong","Kurt Keutzer","Jiashi Feng"],"abstract":"Creating content with specified identities (ID) has attracted significant interest in the field of generative models. 
In the field of text-to-image generation (T2I), subject-driven creation has achieved great progress with the identity controlled via reference images. However, its extension to video generation is not well explored. In this work, we propose a simple yet effective subject identity controllable video generation framework, termed Video Custom Diffusion (VCD). With a specified identity defined by a few images, VCD reinforces the identity characteristics and injects frame-wise correlation at the initialization stage for stable video outputs. To achieve this, we propose three novel components that are essential for high-quality identity preservation and stable video generation: 1) a noise initialization method with 3D Gaussian Noise Prior for better inter-frame stability; 2) an...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","arXiv","personalized"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4392980188","title":"Turning a CLIP Model Into a Scene Text Spotter","url":"https://doi.org/10.1109/tpami.2024.3379828","published":"2024-03-20","authors":["Wenwen Yu","Yuliang Liu","Xingkui Zhu","Haoyu Cao","Xing Sun","Xiang Bai"],"abstract":"We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes visual prompt learning and cross-attention in CLIP to extract image and text-based prior knowledge. 
Using predefined and learnable prompts, FastTCM-CR50 introduces an instance-language matching process to enhance the synergy between image and text embeddings, thereby refining text regions. Our Bimodal Similarity Matching (BSM) module facilitates dynamic language prompt generation, enabling offline computations and improving performance. FastTCM-CR50 offers several advantages: 1) It can enhance existing text detectors and spotters, improving performance by an average of 1.6% and 1.5%, respectively. 2) It outperforms the previous TCM-CR50 backbone, yielding an average....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2024.3379828","openalex_id":"https://openalex.org/W4392980188","cited_by_count":19,"quality_score":56,"matched_keywords":[],"author_affiliations":["Huazhong University of Science and Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7101097106933594},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6666057109832764},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.5342705845832825},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4234147369861603},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3983081579208374},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.37753215432167053},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3487395644187927}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":19}},{"id":"apple:tp1irl5a0q6w02pt7otsr9bp","title":"MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training","url":"https://machinelearning.apple.com/research/mm1-methods-analysis-insights","published":"2024-03-20","authors":["Brandon McKinzie","Zhe Gan","Jean-Philippe Fauconnier Biard","Sam Dodge","Philipp Dufter","Bowen Zhang","Dhruti Shah","Xianzhi Du","Futang Peng","Haotian Zhang","Floris Weers","Anton Belyi"],"abstract":"In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training using a careful...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4393002806","title":"Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model","url":"http://dx.doi.org/10.21203/rs.3.rs-3845824/v1","published":"2024-03-20","authors":["Xiangxiang Zeng","Peng Zhou","Jianmin Wang","Chunyan Li","Zixu Wang","Yiping Liu","Siqi Sun","Jianxin Lin","Longyue 
Wang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.21203/rs.3.rs-3845824/v1","openalex_id":"https://openalex.org/W4393002806","cited_by_count":1,"quality_score":42,"matched_keywords":["language model"],"author_affiliations":["Fudan University","Hunan University","Tencent (China)","University of Tsukuba","Yonsei University","Yunnan Normal University"],"concepts":[{"id":"https://openalex.org/C2776036281","display_name":"Constraint (computer-aided design)","score":0.700564980506897},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5396561622619629},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.5068989396095276},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4549944996833801},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3284638226032257},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.2972121238708496},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.1173962950706482},{"id":"https://openalex.org/C2524010","display_name":"Geometry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/defending-against-indirect-prompt-injection-attacks-with-spotlighting","title":"Defending Against Indirect Prompt Injection Attacks With Spotlighting","url":"https://www.microsoft.com/en-us/research/publication/defending-against-indirect-prompt-injection-attacks-with-spotlighting/","published":"2024-03-19","authors":["Keegan Hines","Gary Lopez","Matthew Hall","Federico Zarfati","Yonatan 
Zunger","Emre Kiciman"],"abstract":"Large Language Models (LLMs), while powerful, are built and trained to process a single text input. In common applications, multiple inputs can be processed by concatenating them together into a single stream of text. However, the LLM is unable to distinguish which sections of prompt belong to various input sources. Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands. Often, the LLM will mistake the adversarial instructions as user commands to be followed, creating a security vulnerability in the larger system. We introduce spotlighting, a family of prompt engineering techniques that can be used to improve LLMs' ability to distinguish among multiple sources of input. The key insight is to utilize transformations of an input to provide a reliable and continuous signal of it...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Security, privacy, and cryptography","Computer science","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:59411c8662ab0e45","title":"SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model","url":"https://ai.meta.com/research/publications/scenescript-reconstructing-scenes-with-an-autoregressive-structured-language-model/","published":"2024-03-19","authors":["Armen Avetisyan","Chris Xie","Henry Howard-Jenkins","Tsun-Yi Yang","Samir Aroudj","Suvam 
Patra","Fuyang Zhang","Duncan Frost","Luke Holland","Campbell Orme","Jakob Julian Engel","Edward Miller"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer Vision","language model"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=15"}},{"id":"arxiv:2402.10294","title":"LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing","url":"http://arxiv.org/abs/2402.10294","published":"2024-03-18","authors":["Bryan Wang","Yuliang Li","Zhaoyang Lv","Haijun Xia","Yan Xu","Raj S. Sodhi"],"abstract":"Video creation has become increasingly popular, yet the expertise and effort required for editing often pose barriers to beginners. In this paper, we explore the integration of large language models (LLMs) into the video editing workflow to reduce these barriers. Our design vision is embodied in LAVE, a novel system that provides LLM-powered agent assistance and language-augmented editing features. LAVE automatically generates language descriptions for the user’s footage, serving as the foundation for enabling the LLM to process videos and assist in editing tasks. When the user provides editing objectives, the agent plans and executes relevant actions to fulfill them. Moreover, LAVE allows users to edit videos through either the agent or direct UI manipulation, providing flexibility and enabling manual refinement of agent actions. 
Our user study, which included eight participants ranging...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3640543.3645143","openalex_id":"https://openalex.org/W4391940619","cited_by_count":56,"quality_score":75,"matched_keywords":["LLM","agent"],"author_affiliations":["META Health","Meta (United States)","University of California San Diego","University of Toronto"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8392974138259888},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7655279636383057},{"id":"https://openalex.org/C100609095","display_name":"Embodied cognition","score":0.6163866519927979},{"id":"https://openalex.org/C2780598303","display_name":"Flexibility (engineering)","score":0.6078859567642212},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5760354995727539},{"id":"https://openalex.org/C2780310081","display_name":"Video editing","score":0.5627329349517822},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.48069697618484497},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4548705816268921}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":56}},{"id":"openalex:W4392904157","title":"Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition","url":"https://doi.org/10.1109/icassp48485.2024.10445906","published":"2024-03-18","authors":["Ziyang Ma","Wen Wu","Zhisheng Zheng","Yiwei Guo","Qian Chen","Shiliang Zhang","Xie Chen"],"abstract":"In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-the-art speech pre-trained model (PTM), data2vec, text 
generation technique, GPT-4, and speech synthesis technique, Azure TTS. First, we investigated the representation ability of different speech self-supervised pre-trained models, and we found that data2vec has a good representation ability on the SER task. Second, we employed a powerful large language model (LLM), GPT-4, and emotional text-to-speech (TTS) model, Azure TTS, to generate emotionally congruent text and speech. We carefully designed the text prompt and dataset construction, to obtain the synthetic emotional speech data with high quality. Third, we studied different ways of data augmentation to promote the SER task with synthetic speech, including random mixing, adversarial training, transfer learning, and curriculum learning. Experim...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10445906","openalex_id":"https://openalex.org/W4392904157","cited_by_count":29,"quality_score":74,"matched_keywords":["LLM","language model"],"author_affiliations":["Alibaba Group (China)","Shanghai Jiao Tong University","University of Cambridge"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8000236749649048},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.658231258392334},{"id":"https://openalex.org/C14999030","display_name":"Speech synthesis","score":0.6006832122802734},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5728049278259277},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.521238386631012},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.49392610788345337},{"id":"https://openalex.org/C2777438025","display_name":"Emotion 
recognition","score":0.4726352095603943},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4124554991722107}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":29}},{"id":"openalex:W4392903872","title":"SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation","url":"https://doi.org/10.1109/icassp48485.2024.10447553","published":"2024-03-18","authors":["Zhehuai Chen","He Huang","Andrei Andrusenko","Oleksii Hrinchuk","Krishna C. Puvvada","Jason Li","Subhankar Ghosh","Jagadeesh Balam","Boris Ginsburg"],"abstract":"We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. SALM comprises a frozen text LLM, a audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and associated task instructions. The unified SALM not only achieves performance on par with task-specific Conformer baselines for Automatic Speech Recognition (ASR) and Speech Translation (AST), but also exhibits zero-shot in-context learning capabilities, demonstrated through keyword-boosting task for ASR and AST. Moreover, speech supervised in-context training is proposed to bridge the gap between LLM training and downstream speech tasks, which further boosts the in-context learning ability of speech-to-text models. 
Proposed model is open-sourced via NeMo toolkit...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447553","openalex_id":"https://openalex.org/W4392903872","cited_by_count":25,"quality_score":70,"matched_keywords":["LLM","language model"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7923698425292969},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6855289340019226},{"id":"https://openalex.org/C177284502","display_name":"Adapter (computing)","score":0.6668733358383179},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6016342043876648},{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.5601551532745361},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.513323962688446},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5116696357727051},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4884997606277466}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":25}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/securing-large-language-models-threats-vulnerabilities-and-responsible-practices","title":"Securing Large Language Models: Threats, Vulnerabilities and Responsible
Practices","url":"https://www.microsoft.com/en-us/research/publication/securing-large-language-models-threats-vulnerabilities-and-responsible-practices/","published":"2024-03-18","authors":["Sara Abdali","Richard Anarfi","C. Barberan","Jia He","Erfan Shayegani"],"abstract":"Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP). Their impact extends across a diverse spectrum of tasks, revolutionizing how we approach language understanding and generations. Nevertheless, alongside their remarkable utility, LLMs introduce critical security and risk considerations. These challenges warrant careful examination to ensure responsible deployment and safeguard against potential vulnerabilities. This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives: security and privacy concerns, vulnerabilities against adversarial attacks, potential harms caused by misuses of LLMs, mitigation strategies to address these challenges while identifying limitations of current strategies. 
Lastly, the paper recommends promising avenues for future research to enhance t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Security, privacy, and cryptography","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4392909390","title":"Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning","url":"https://doi.org/10.1109/icassp48485.2024.10447027","published":"2024-03-18","authors":["Shansong Liu","Atin Sakkeer Hussain","Chenshuo Sun","Ying Shan"],"abstract":"Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. Our model utilizes audio representations from a pretrained MERT model to extract music features. However, obtaining a suitable dataset for training the MU-LLaMA model remains challenging, as existing publicly accessible audio question answering datasets lack the necessary depth for open-ended music question answering. To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions. 
The experiments demonstrate that the proposed MU-LLa...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447027","openalex_id":"https://openalex.org/W4392909390","cited_by_count":31,"quality_score":67,"matched_keywords":[],"author_affiliations":["National University of Singapore","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.8394982218742371},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6857435703277588},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.529364824295044},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47377753257751465},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.41308891773223877},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3479394316673279},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":31}},{"id":"openalex:W4392902855","title":"Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach","url":"https://doi.org/10.1109/icassp48485.2024.10446204","published":"2024-03-18","authors":["Tae Jin Park","Kunal Dhawan","Nithin Rao Koluguri","Jagadeesh Balam"],"abstract":"Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual cues in human dialogues. 
Our method builds upon an acoustic-based speaker diarization system by adding lexical information from an LLM in the inference stage. We model the multi-modal decoding process probabilistically and perform joint acoustic and lexical beam searches to incorporate cues from both modalities: audio and text. Our experiments demonstrate that infusing lexical knowledge from the LLM into an acoustics-only diarization system improves the overall speaker-attributed word error rate (SA-WER). The experimental results show that LLMs can provide complementary information to acoustic models for the speaker diarization task via the proposed be...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10446204","openalex_id":"https://openalex.org/W4392902855","cited_by_count":9,"quality_score":50,"matched_keywords":["LLM"],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C149838564","display_name":"Speaker diarisation","score":0.875031590461731},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7774475812911987},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6097027659416199},{"id":"https://openalex.org/C165696696","display_name":"Exploit","score":0.5474106669425964},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5431011319160461},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.5320974588394165},{"id":"https://openalex.org/C40969351","display_name":"Word error rate","score":0.5179868936538696},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.4682990610599518}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":9}},{"id":"openalex:W4392904577","title":"Image Retrieval with Composed Query by Multi-Scale Multi-Modal Fusion","url":"https://doi.org/10.1109/icassp48485.2024.10446291","published":"2024-03-18","authors":["Zelong Sun","G. Yang","Zhiwu Lu","Hao Jiang","Guojie Zhu","Zhao Cao"],"abstract":"Image retrieval with composed query (IR-CQ) is a challenging task since it aims to retrieve the target image according to a hybrid-modality query which consists of a reference image and a text modifier. Previous approaches mainly focus on designing various multi-modal fusion modules to fuse the hybrid-modality query, but these fusion modules are often suboptimal without considering sufficient fusion between the two modalities. In this paper, we propose a general fusion block by taking three fusion strategies: weighted summing, concatenating, and bilinear pooling. Importantly, this general fusion block can be deployed to fuse not only the hybrid-modality query but also the multi-scale features of the reference image. 
Specifically, we first fuse the multi-scale features of the reference image with the Multi-Scale Fusion (MSF) block and then fuse the features of the reference image and text...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10446291","openalex_id":"https://openalex.org/W4392904577","cited_by_count":7,"quality_score":48,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C141353440","display_name":"Fuse (electrical)","score":0.864750862121582},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7207809090614319},{"id":"https://openalex.org/C69744172","display_name":"Image fusion","score":0.6944507956504822},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.6146492958068848},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.599800169467926},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5490350127220154},{"id":"https://openalex.org/C2777210771","display_name":"Block (permutation group theory)","score":0.5477095246315002},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5086015462875366}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4392909790","title":"SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus","url":"https://doi.org/10.1109/icassp48485.2024.10448079","published":"2024-03-18","authors":["Haoxu Wang","Fan Yu","Xian Shi","Yuezhang Wang","Shiliang Zhang","Ming Li"],"abstract":"Multi-Modal automatic speech recognition (ASR) techniques aim to leverage 
additional modalities to improve the performance of speech recognition systems. While existing approaches primarily focus on video or contextual information, the utilization of extra supplementary textual information has been overlooked. Recognizing the abundance of online conference videos with slides, which provide rich domain-specific information in the form of text and images, we release SlideSpeech, a large-scale audio-visual corpus enriched with slides. The corpus contains 1,705 videos, 1,000+ hours, with 473 hours of high-quality transcribed speech. Moreover, the corpus contains a significant amount of real-time synchronized slides. In this work, we present the pipeline for constructing the corpus and propose baseline methods for utilizing text information in the visual slide context. Through the application...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10448079","openalex_id":"https://openalex.org/W4392909790","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Duke Kunshan University","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8626223206520081},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6573168039321899},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.6229197382926941},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6031224131584167},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5061487555503845},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45090451836586},{"id":"https://openalex.org/C192209626","display_name":"Focus 
(optics)","score":0.4475138187408447},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.44432389736175537}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W4392904283","title":"Dynamic Data Sampler for Cross-Language Transfer Learning in Large Language Models","url":"https://doi.org/10.1109/icassp48485.2024.10446640","published":"2024-03-18","authors":["Yudong Li","Yuhao Feng","Zhou Wen","Zhe Zhao","Linlin Shen","Cheng Hou","Xianxu Hou"],"abstract":"Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty in acquiring large-scale corpus and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpus to continuously train the LLaMA2 model, aiming to align cross-language representations and facilitate the knowledge transfer specifically to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. 
Experimental results demonstrate....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10446640","openalex_id":"https://openalex.org/W4392904283","cited_by_count":2,"quality_score":47,"matched_keywords":["LLM","language model"],"author_affiliations":["Shenzhen University","Software (Spain)","State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing","Tencent (China)","Wuhan University","Xi’an Jiaotong-Liverpool University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8580116033554077},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.7035836577415466},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.66310054063797},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6257749199867249},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.5592141151428223},{"id":"https://openalex.org/C2777303404","display_name":"Convergence (economics)","score":0.5003831386566162},{"id":"https://openalex.org/C9652623","display_name":"Field (mathematics)","score":0.4738231897354126},{"id":"https://openalex.org/C2776175482","display_name":"Transfer (computing)","score":0.45357030630111694}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4392903247","title":"AutoPrep: An Automatic Preprocessing Framework for In-The-Wild Speech Data","url":"https://doi.org/10.1109/icassp48485.2024.10447759","published":"2024-03-18","authors":["Jianwei Yu","Hangting Chen","Yanyao Bian","Xiang Li","Yi Luo","Jinchuan Tian","Mengyang Liu","Jiayi Jiang","Shuai 
Wang"],"abstract":"Recently, the utilization of extensive open-sourced text data has significantly advanced the performance of text-based large language models (LLMs). However, the use of in-the-wild large-scale speech data in the speech technology community remains constrained. One reason for this limitation is that a considerable amount of the publicly available speech data is compromised by background noise, speech overlapping, lack of speech segmentation information, missing speaker labels, and incomplete transcriptions, which can largely hinder their usefulness. On the other hand, human annotation of speech data is both time-consuming and costly. To address this issue, we introduce an automatic in-the-wild speech data preprocessing framework (AutoPrep) in this paper, which is designed to enhance speech quality, generate speaker labels, and produce transcriptions automatically. The proposed AutoPrep fr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447759","openalex_id":"https://openalex.org/W4392903247","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Shenzhen Research Institute of Big Data","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8219085931777954},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.7171862125396729},{"id":"https://openalex.org/C34736171","display_name":"Preprocessor","score":0.6661955118179321},{"id":"https://openalex.org/C73555534","display_name":"Cluster analysis","score":0.6233090162277222},{"id":"https://openalex.org/C204201278","display_name":"Voice activity 
detection","score":0.5863033533096313},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5343382358551025},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.5003945827484131},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.49516716599464417}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4392909978","title":"A Unified Framework for Multi-Intent Spoken Language Understanding with Prompting","url":"https://doi.org/10.1109/icassp48485.2024.10447804","published":"2024-03-18","authors":["Feifan Song","Lianzhe Huang","Houfeng Wang"],"abstract":"ChatGPT has demonstrated impressive capabilities in building conversations. However, for Spoken Language Understanding (SLU) with multiple intents, traditional approaches where Intent Detection and Slot Filling are jointly modeled with distinct formulations hinder networks from effectively extracting shared features. In this work, we describe a Prompt-based SLU (PromptSLU) framework, to intuitively unify two sub-tasks into the same form for a common pre-trained model. Specifically, variable intents are predicted first, then naturally embedded into prompts to guide slot-value inference from a semantic perspective. Furthermore, we are inspired by multi-task learning to introduce an auxiliary sub-task and a concise general objective, which helps to learn relationships among provided labels. 
Experiment results show that our framework outperforms several competitive baselines on two datasets....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447804","openalex_id":"https://openalex.org/W4392909978","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Peking University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.776190996170044},{"id":"https://openalex.org/C2776230583","display_name":"Spoken language","score":0.5370925664901733},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.42732134461402893},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.3797810673713684},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4392904709","title":"Unified Pretraining Target Based Video-Music Retrieval with Music Rhythm and Video Optical Flow Information","url":"https://doi.org/10.1109/icassp48485.2024.10446029","published":"2024-03-18","authors":["Tianjun Mao","Shansong Liu","Yunxuan Zhang","Dian Li","Ying Shan"],"abstract":"Background music (BGM) can enhance the video’s emotion. However, selecting an appropriate BGM often requires domain knowledge. This has led to the development of video-music retrieval techniques. Most existing approaches utilize pretrained video/music feature extractors trained with different target sets to obtain average video/music-level embeddings. The drawbacks are two-fold. 
One is that different target sets for video/music pretraining may cause the generated embeddings difficult to match. The second is that the underlying temporal correlation between video and music is ignored. In this paper, our proposed approach leverages a unified target set to perform video/music pretraining and produces clip-level embeddings to preserve temporal information. The downstream crossmodal matching is based on the clip-level features with embedded music rhythm and optical flow information. Experiment...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10446029","openalex_id":"https://openalex.org/W4392904709","cited_by_count":2,"quality_score":43,"matched_keywords":["retrieval"],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.799206018447876},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5438022017478943},{"id":"https://openalex.org/C135343436","display_name":"Rhythm","score":0.49247798323631287},{"id":"https://openalex.org/C155542232","display_name":"Optical flow","score":0.490444153547287},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4767250120639801},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46300676465034485},{"id":"https://openalex.org/C2777946086","display_name":"Music information retrieval","score":0.45230865478515625},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4520712196826935}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"openalex:W4392903063","title":"Improving Biomedical Entity Linking with Retrieval-Enhanced Learning","url":"http://dx.doi.org/10.1109/icassp48485.2024.10448513","published":"2024-03-18","authors":["Zhenxi Lin","Ziheng Zhang","Xian Wu","Yefeng Zheng"],"abstract":"Biomedical entity linking (BioEL) has achieved remarkable progress with the help of pre-trained language models. However, existing BioEL methods usually struggle to handle rare and difficult entities due to long-tailed distribution. To address this limitation, we introduce a new scheme kNN-BioEL, which provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction, thus improving the generalization capabilities. Moreover, we design a contrastive learning objective with dynamic hard negative sampling (DHNS) that improves the quality of the retrieved neighbors during inference. Extensive experimental results show that kNN-BioEL outperforms state-of-the-art baselines on several datasets.<sup xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">1</sup>","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10448513","openalex_id":"https://openalex.org/W4392903063","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8130846619606018},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.7201396226882935},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6562526226043701},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.6404345035552979},{"id":"https://openalex.org/C140779682","display_name":"Sampling (signal processing)","score":0.4765085279941559},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4640777111053467},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.46340203285217285},{"id":"https://openalex.org/C77618280","display_name":"Scheme (mathematics)","score":0.46136632561683655}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4392902675","title":"Image Aesthetics Assessment Via Learnable Queries","url":"https://doi.org/10.1109/icassp48485.2024.10446282","published":"2024-03-18","authors":["Zhiwei Xiong","Yunfan Zhang","Zhiqi Shen","Peiran Ren","Han Yu"],"abstract":"Image aesthetics assessment (IAA) aims to estimate the aesthetics of images. Depending on the content of an image, diverse criteria need to be selected to assess its aesthetics. Existing works utilize pre-trained vision backbones based on content knowledge to learn image aesthetics. However, training those backbones is time-consuming and suffers from attention dispersion. Inspired by learnable queries in vision-language alignment, we propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach. It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder. 
Extensive experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10446282","openalex_id":"https://openalex.org/W4392902675","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Nanyang Technological University"],"concepts":[{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.6569821834564209},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6238505244255066},{"id":"https://openalex.org/C107038049","display_name":"Aesthetics","score":0.5538482069969177},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5199090838432312},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5182485580444336},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.48316091299057007},{"id":"https://openalex.org/C142362112","display_name":"Art","score":0.1392943561077118},{"id":"https://openalex.org/C111919701","display_name":"Operating system","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4392909496","title":"Hint-Enhanced In-Context Learning Wakes Large Language Models Up For Knowledge-Intensive Tasks","url":"http://dx.doi.org/10.1109/icassp48485.2024.10447527","published":"2024-03-18","authors":["Yifan Wang","Qingyan Guo","Xinzhe Ni","Chufan Shi","Lemao Liu","Haiyun Jiang","Yujiu Yang"],"abstract":"In-context learning (ICL) ability has emerged with the increasing scale of large language 
models (LLMs), enabling them to learn input-label mappings from demonstrations and perform well on downstream tasks. However, under the standard ICL setting, LLMs may sometimes neglect query-related information in demonstrations, leading to incorrect predictions. To address this limitation, we propose a new paradigm called Hint-enhanced In-Context Learning (HICL) to explore the power of ICL in open-domain question answering, an important form in knowledge-intensive tasks. HICL leverages LLMs’ reasoning ability to extract query-related knowledge from demonstrations, then concatenates the knowledge to prompt LLMs in a more explicit way. Furthermore, we track the source of this knowledge to identify specific examples, and introduce a Hint-related Example Retriever (HER) to select informative examples f...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447527","openalex_id":"https://openalex.org/W4392909496","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Tencent (China)","Tsinghua University","Tsinghua–Berkeley Shenzhen Institute"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7604331970214844},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6389729976654053},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3454774022102356},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.07484322786331177},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4392931056","title":"Leveraging 
in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition","url":"https://doi.org/10.1109/icassp48485.2024.10447489","published":"2024-03-18","authors":["Shuai Wang","Qibing Bai","Qi Liu","Jianwei Yu","Zhengyang Chen","Bing Han","Yanmin Qian","Haizhou Li"],"abstract":"Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets. To boost the system performance, researchers leverage large pretrained models such as WavLM to transfer learned high-level features to the downstream speaker recognition task. However, this approach introduces extra parameters as the pretrained model remains in the inference stage. Another group of researchers directly apply self-supervised methods such as DINO to speaker embedding learning, yet they have not explored its potential on large-scale in-the-wild datasets. In this paper, we present the effectiveness of DINO training on the large-scale WenetSpeech dataset and its transferability in enhancing the supervised system performance on the CNCeleb dataset. 
Additionally, we introduce a confidence-based data filtering algorithm to remove unreliable data from the pre...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447489","openalex_id":"https://openalex.org/W4392931056","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong, Shenzhen","Shanghai Jiao Tong University","Shenzhen Research Institute of Big Data","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8305774927139282},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.696366548538208},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6061298847198486},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.597142219543457},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.577850878238678},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.46343642473220825},{"id":"https://openalex.org/C61423126","display_name":"Scripting language","score":0.4392763078212738},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4373339116573334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4392904020","title":"A Study of Mispronunciation Detection and Diagnosis Based on Meta-Learning","url":"https://doi.org/10.1109/icassp48485.2024.10447007","published":"2024-03-18","authors":["Yukai Wan","Yuqi Shi","Binghuai Lin","Yanlu Xie"],"abstract":"The majority of the current mispronunciation detection and diagnosis (MD&D) methods rely on 
manually annotated data for model training. However, annotating mispronunciations produced by second language (L2) learners is costly. Consequently, data scarcity emerges as a significant challenge in MD&D tasks. In this paper, we employ model-agnostic meta-learning (MAML) to train a phoneme recognition model for MD&D. We conduct experiments using varied meta-learning task partitioning and training strategies to endow the model’s ability to rapidly adapt to unfamiliar speakers. Our best-performing method achieves an F-measure of 61.45%, surpassing both the method using fine-tuned pre-trained model wav2vec2.0 and the approach of incorporating reference text during training. These related works also aim to address the challenge of data scarcity in MD&D. Notably, with few-shot fine-tuning, our model....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447007","openalex_id":"https://openalex.org/W4392904020","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Beijing Language and Culture University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7939361333847046},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.698714017868042},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6391012668609619},{"id":"https://openalex.org/C109747225","display_name":"Scarcity","score":0.6115788817405701},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.5440040230751038},{"id":"https://openalex.org/C2780009758","display_name":"Measure (data warehouse)","score":0.515009880065918},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.5051742196083069},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4770059883594513}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4392909573","title":"Semantic Enrichment for Video Question Answering with Gated Graph Neural Networks","url":"https://doi.org/10.1109/icassp48485.2024.10447275","published":"2024-03-18","authors":["Chenyang Lyu","Wenxi Li","Tianbo Ji","Yi Yu","Longyue Wang"],"abstract":"Video Question Answering (VideoQA) is a complex task that requires a deep understanding of a video to accurately answer questions. Existing methods often struggle to effectively integrate the visual and language-based semantic information, subsequently leading to an incomplete understanding of video content and sub-optimal performance. To address the challenge, we introduce a novel approach in this paper to enrich the semantics of video frames, questions, and answer candidates. Specifically, we parse video frames and questions into semantic graphs - visual semantic graph and question semantic graph, which captures information about objects, their attributes, and relationships. These graphs are then encoded using a Gated Graph Neural Network (GGNN). 
For answer candidates, we propose to verbalize them using Large Language Models (LLMs) to further inject more semantic information from visua...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447275","openalex_id":"https://openalex.org/W4392909573","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Dublin City University","Mohamed bin Zayed University of Artificial Intelligence","Nantong University","National Institute of Informatics","Shanghai Jiao Tong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7724196910858154},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.6849075555801392},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.48317399621009827},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4450339078903198},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.44189032912254333},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.41052526235580444},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.15598082542419434}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4392903436","title":"GBSD: Generative Bokeh with Stage Diffusion","url":"https://doi.org/10.1109/icassp48485.2024.10447874","published":"2024-03-18","authors":["Jieren Deng","Xin Zhou","Hao Tian","Zhihong Pan","Derek Aguiar"],"abstract":"The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph and has gained 
interest due to recent developments in text-to-image synthesis and the ubiquity of smartphone cameras and photo sharing apps. Prior work on rendering bokeh effects have focused on manipulating photographs using classical computer graphics or neural rendering techniques, but have either depth discontinuity artifacts or are restricted to reproducing bokeh effects that are present in the training data. In this paper, we present generative bokeh with stage diffusion (GBSD), the first generative text-to-image model that synthesizes photorealistic images with a bokeh style. Motivated by how image synthesis occurs progressively in diffusion models, our approach combines latent diffusion models with a 2-stage conditioning algorithm to render bokeh effects on semantically defined objects. Since....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447874","openalex_id":"https://openalex.org/W4392903436","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Baidu (China)","University of Connecticut"],"concepts":[{"id":"https://openalex.org/C205711294","display_name":"Rendering (computer graphics)","score":0.8554463386535645},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7882481813430786},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7360479831695557},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5840730667114258},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5487202405929565},{"id":"https://openalex.org/C2989087649","display_name":"Image synthesis","score":0.5447521805763245},{"id":"https://openalex.org/C77660652","display_name":"Computer 
graphics","score":0.5078323483467102},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.49479907751083374}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4392911118","title":"Search for Gravitational Wave Probes - A Self-Supervised Learning for Pulsars Based on Signal Contexts","url":"http://dx.doi.org/10.1109/icassp48485.2024.10446944","published":"2024-03-18","authors":["S. Wang","Xiaofeng Cheng","Ming Xie","Yuhang Ling","Chao Liu","Mingmin Chi","Pei Wang","Zhongyi Sun","Yabiao Wang"],"abstract":"The recent successful detection of gravitational waves (GWs) at nanohertz based on pulsar timing arrays has underscored the growing significance of searching for new pulsars, which serve as valuable probes for GWs. However, one of the challenges in this endeavor is the lack of labeled data, which can lead to overfitting and poor generalization in supervised deep neural networks. In this paper, we propose a self-supervised pretext task based on signal con-texts to obtain discriminative radio signal representation. Specially, signal attentions are designed to enhance pulse signals within time-phase or frequency-phase images whenever a pulsar is detected. To validate our proposed model, we conducted experiments using the FAST public dataset with a significant improvement in recall and AUC compared to existing single and multimodal deep models with different attentions. 
As a result, we searc...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10446944","openalex_id":"https://openalex.org/W4392911118","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Chinese Academy of Sciences","Fudan University","National Astronomical Observatories","Tencent (China)","Zhejiang Lab"],"concepts":[{"id":"https://openalex.org/C110363677","display_name":"Pulsar","score":0.687982439994812},{"id":"https://openalex.org/C190330329","display_name":"Gravitational wave","score":0.6448014974594116},{"id":"https://openalex.org/C2779843651","display_name":"SIGNAL (programming language)","score":0.5944358110427856},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5247278809547424},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.4204336702823639},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3869745135307312},{"id":"https://openalex.org/C1276947","display_name":"Astronomy","score":0.3537887632846832},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4392932013","title":"General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level","url":"http://dx.doi.org/10.1109/icassp48485.2024.10447098","published":"2024-03-18","authors":["Bingkang Shi","Xiaodan Zhang","Dehan Kong","Yulei Wu","Zongzhen Liu","Honglei Lyu","Longtao Huang"],"abstract":"The social biases and unwelcome stereotypes revealed by pretrained language models are becoming obstacles to their application. 
Compared to numerous debiasing methods targeting word level, there has been relatively less attention on biases present at phrase level, limiting the performance of debiasing in discipline domains. In this paper, we propose an automatic multi-token debiasing pipeline called General Phrase Debiaser, which is capable of mitigating phrase-level biases in masked language models. Specifically, our method consists of a phrase filter stage that generates stereotypical phrases from Wikipedia pages as well as a model debias stage that can debias models at the multi-token level to tackle bias challenges on phrases. The latter searches for prompts that trigger model’s bias, and then uses them for debiasing. State-of-the-art results on standard datasets and metrics show tha...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447098","openalex_id":"https://openalex.org/W4392932013","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","Institute of Information Engineering","University of Bristol"],"concepts":[{"id":"https://openalex.org/C2779458634","display_name":"Debiasing","score":0.9931967854499817},{"id":"https://openalex.org/C2776224158","display_name":"Phrase","score":0.7985289692878723},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7876993417739868},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.757926344871521},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5250449180603027},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5197486877441406},{"id":"https://openalex.org/C43521106","display_name":"Pipeline 
(software)","score":0.5077601671218872},{"id":"https://openalex.org/C153962237","display_name":"Noun phrase","score":0.43004217743873596}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4392904498","title":"Domain-Adaptive Semantic Segmentation Emerges From Vision-Language Supervised Domain-Debiased Self-Training","url":"http://dx.doi.org/10.1109/icassp48485.2024.10447308","published":"2024-03-18","authors":["Huayu Wang","Zekun Jiang","Lingxi Xie","Dongsheng Jiang","Wei Shen","Qi Tian"],"abstract":"Unsupervised domain adaptive semantic segmentation leverages synthetic data to train a segmentation model and transfers it to unlabeled real images. Due to the style difference, the transferred model suffers from the domain gap. Even worse, some classes exhibit the extreme domain gap, where the feature distributions undergo a complete shift between the two domains. To alleviate it, we propose a domain-debiased self-training strategy with CLIP to distill its domain-agnostic knowledge. Specifically, we enforce the consistency between the feature maps from our segmentation model and the image encoder of CLIP. Meanwhile, the text embeddings from the text encoder for each class serve as a domain-agnostic classifier to support a domain-debiased feature learning condition. 
Experimental results under standard UDA settings demonstrate that our proposed strategy consistently improves the UDA segme...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/icassp48485.2024.10447308","openalex_id":"https://openalex.org/W4392904498","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Shanghai Jiao Tong University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.805266797542572},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.779676079750061},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7334910035133362},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.6535582542419434},{"id":"https://openalex.org/C95623464","display_name":"Classifier (UML)","score":0.5756707191467285},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.5732975006103516},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5717350840568542},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.4875201880931854}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tnt-llm-text-mining-at-scale-with-large-language-models","title":"TnT-LLM: Text Mining at Scale with Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/tnt-llm-text-mining-at-scale-with-large-language-models/","published":"2024-03-17","authors":["Mengting Wan","Tara Safavi","Sujay Kumar Jauhar","Yujin 
Kim","Scott Counts","Jennifer Neville","Siddharth Suri","Chirag Shah","Ryen W. White","Longqi Yang","Reid Andersen","Georg Buscher"],"abstract":"Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. 
In the first phase, we introd...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3637528.3671647","openalex_id":"https://openalex.org/W4401863388","cited_by_count":51,"quality_score":110,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft","Microsoft (United States)","University of Washington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/unified-generative-modeling-of-3d-molecules-via-bayesian-flow-networks","title":"Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks","url":"https://www.microsoft.com/en-us/research/publication/unified-generative-modeling-of-3d-molecules-via-bayesian-flow-networks/","published":"2024-03-16","authors":["Yuxuan Song","Jingjing Gong","Yanru Qu","Hao Zhou","Mingyue Zheng","Jingjing Liu","Wei-Ying Ma"],"abstract":"Advanced generative model (e.g., diffusion model) derived from simplified continuity assumptions of data distribution, though showing promising progress, has been difficult to apply directly to geometry generation applications due to the multi-modality and noise-sensitive nature of molecule geometry. This work introduces Geometric Bayesian Flow Networks (GeoBFN), which naturally fits molecule geometry by modeling diverse modalities in the differentiable parameter space of distributions. 
GeoBFN maintains the SE-(3) invariant density modeling property by incorporating equivariant inter-dependency modeling on parameters of distributions and unifying the probabilistic modeling of different modalities. Through optimized training and sampling techniques, we demonstrate that GeoBFN achieves state-of-the-art performance on multiple 3D molecule generation benchmarks in terms of generation quality...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Biology","Computer science","Physics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/sigma-an-open-source-interactive-system-for-mixed-reality-task-assistance-research-extended-abstract","title":"SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research – Extended Abstract","url":"https://www.microsoft.com/en-us/research/publication/sigma-an-open-source-interactive-system-for-mixed-reality-task-assistance-research-extended-abstract/","published":"2024-03-15","authors":["Dan Bohus","Sean Andrist","Nick Saw","Ann Paradiso","Ishani Chakraborty","Mahdi Rad"],"abstract":"We introduce an open-source system called Sigma (short for “Situated Interactive Guidance, Monitoring, and Assistance”) as a platform for conducting research on task-assistive agents in mixed-reality scenarios. 
The system leverages the sensing and rendering affordances of a head-mounted mixed reality device in conjunction with large language and vision models to guide users step by step through procedural tasks. By open-sourcing the system, we aim to lower the barrier to entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and multimodal models in the context of real-world interactive applications.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer vision","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:huawei-noah:2403.09419","title":"RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes","url":"https://huggingface.co/papers/2403.09419","published":"2024-03-14","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","huawei-noah"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"apple:h520qablmiqctmvqg4d2d1fi","title":"CEASE: Conversation Embeddings for Implicit 
Summarisation in the Continuous Space","url":"https://machinelearning.apple.com/research/cease","published":"2024-03-14","authors":["Seanie Lee","Alex Coca","Jianpeng Cheng","Anders Johannsen","Joris Driesen"],"abstract":"Few-shot dialogue state tracking (DST) with Large Language Models (LLM) relies on an effective and efficient conversation retriever to find similar in-context examples for prompt learning. Previous works use raw dialogue context as search keys and queries, and a retriever is fine-tuned with annotated dialogues to achieve superior performance. However, the approach is less suited for scaling to new domains or new annotation languages, where...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["LLM","efficient"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:ixg22mdy9uf3875qao2v31mv","title":"Corpus Synthesis for Zero-shot ASR Domain Adaptation using Large Language Models","url":"https://machinelearning.apple.com/research/corpus-synthesis","published":"2024-03-14","authors":["Hsuan Su","Ting-Yao Hu","Hema Swetha Koppula","Raviteja Vemulapalli","Jen-Hao Rick Chang","Karren Yang","Gautam Varma Mantena","Oncel Tuzel"],"abstract":"While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data is usually not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. 
To accomplish this, we propose a...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4392824193","title":"Stein Variational Belief Propagation for Multi-Robot Coordination","url":"https://doi.org/10.1109/lra.2024.3375708","published":"2024-03-14","authors":["Jana Pavlasek","J. Mah","Ruihan Xu","Odest Chadwicke Jenkins","Fábio Ramos"],"abstract":"Decentralized coordination for multi-robot systems involves planning in challenging, high-dimensional spaces. The planning problem is particularly challenging in the presence of obstacles and different sources of uncertainty such as inaccurate dynamic models and sensor noise. In this letter, we introduce Stein Variational Belief Propagation (SVBP), a novel algorithm for performing inference over nonparametric marginal distributions of nodes in a graph. We apply SVBP to multi-robot coordination by modelling a robot swarm as a graphical model and performing inference for each robot. We demonstrate our algorithm on a simulated multi-robot perception task, and on a multi-robot planning task within a Model-Predictive Control (MPC) framework, on both simulated and real-world mobile robots. 
Our experiments show that SVBP represents multi-modal distributions better than sampling-based or Gaussia...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lra.2024.3375708","openalex_id":"https://openalex.org/W4392824193","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","The University of Sydney","University of Michigan"],"concepts":[{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.5448616743087769},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4642990231513977},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4442146420478821}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/benchmarking-large-language-models-across-languages-modalities-models-and-tasks","title":"Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks","url":"https://www.microsoft.com/en-us/research/publication/benchmarking-large-language-models-across-languages-modalities-models-and-tasks/","published":"2024-03-13","authors":["Sanchit Ahuja","Divyanshu Aggarwal","Varun Gumma","Ishaan Watts","Ashutosh Sathe","Millicent Ochieng","Rishav Hada","Prachi Jain","Mohamed Ahmed","Kalika Bali","Sunayana Sitaram"],"abstract":"Recently, there has been a surge in LLM evaluation research to comprehend LLM capabilities and limitations. However, much of this research has been confined to English, leaving LLM building and evaluation for non-English languages relatively unexplored. Several new LLMs have been introduced recently, necessitating their evaluation on non-English languages. 
This study aims to perform a thorough evaluation of the non-English capabilities of state-of-the-art LLMs (GPT-3.5-Turbo, GPT-4, PaLM2, Mistral, and Llama2) by comparing them on the same set of multilingual datasets. Our benchmark comprises datasets covering languages, including low-resource African languages. We also include two multimodal datasets in the benchmark and compare the performance of LLaVA-v1.5 and GPT-4-Vision. Our experiments show that GPT-4 and PaLM2 outperform the Llama and Mistral models on various tasks, notably on l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","large language model","Natural language processing","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:1807540dd572be8a","title":"GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection","url":"https://ai.meta.com/research/publications/galore-memory-efficient-llm-training-by-gradient-low-rank-projection/","published":"2024-03-13","authors":["Jiawei Zhao","Zhenyu Zhang","Beidi Chen","Zhangyang Wang","Anima Anandkumar","Yuandong Tian"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Core Machine 
Learning","LLM","memory","efficient"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=15"}},{"id":"openalex:W4392761692","title":"Audio-Visual Contrastive Pre-train for Face Forgery Detection","url":"https://doi.org/10.1145/3651311","published":"2024-03-13","authors":["Hanqing Zhao","Wenbo Zhou","Dongdong Chen","Weiming Zhang","Ying Guo","Zhen Cheng","Pengfei Yan","Nenghai Yu"],"abstract":"The highly realistic avatar in the metaverse may lead to deepfakes of facial identity. Malicious users can more easily obtain the three-dimensional structure of faces, thus using deepfake technology to create counterfeit videos with higher realism. To automatically discern facial videos forged with the advancing generation techniques, deepfake detectors need to achieve stronger generalization abilities. Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks would provide fundamental features for deepfake detection. We propose a video-level deepfake detection method based on a temporal transformer with a self-supervised audio–visual contrastive learning approach for pre-training the deepfake detector. 
The proposed method learns motion representations in the mouth region by encouraging the paired video and audio representations to be close while....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3651311","openalex_id":"https://openalex.org/W4392761692","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Meizu (China)","Microsoft (United States)","Seattle University","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8673662543296814},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6534522175788879},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6141279935836792},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.5211891531944275},{"id":"https://openalex.org/C2779304628","display_name":"Face (sociological concept)","score":0.4764140844345093},{"id":"https://openalex.org/C94915269","display_name":"Detector","score":0.4138699173927307},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.40411537885665894},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3748213052749634}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4392693688","title":"Distributed Semantic Communications for Multimodal Audio-Visual Parsing Tasks","url":"https://doi.org/10.1109/tgcn.2024.3374700","published":"2024-03-12","authors":["Penghong Wang","Jiahui Li","Chen Liu","Xiaopeng Fan","Mengyao Ma","Yaowei Wang"],"abstract":"Semantic communication has significantly improved in single-modal 
single-task scenarios, but its progress is limited in multimodal and multi-task transmission contexts. To address this issue, this paper investigates a distributed semantic communication system for audio-visual parsing (AVP) task. The system acquires audio-visual information from distributed terminals and conducts multi-task analysis on the far-end server, which involves event categorization and boundary recording. We propose a distributed deep joint source-channel coding scheme with auxiliary information feedback to implement this system, aiming to enhance parsing performance and reduce bandwidth consumption during communication. Specifically, the server initially receives the audio feature from the audio terminal and then sends the semantic information extracted from the audio feature back to the visual terminal. The rec...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tgcn.2024.3374700","openalex_id":"https://openalex.org/W4392693688","cited_by_count":11,"quality_score":48,"matched_keywords":[],"author_affiliations":["Harbin Institute of Technology","Huawei Technologies (China)","Peng Cheng Laboratory"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7941120862960815},{"id":"https://openalex.org/C186644900","display_name":"Parsing","score":0.7197954058647156},{"id":"https://openalex.org/C3017588708","display_name":"Audio visual","score":0.6592826247215271},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.49707677960395813},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.48490267992019653},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.44123703241348267},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.34309279918670654},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.2150670886039734}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4392715767","title":"Multimodal Dialogue Systems via Capturing Context-aware Dependencies and Ordinal Information of Semantic Elements","url":"https://doi.org/10.1145/3645099","published":"2024-03-12","authors":["Weidong He","Zhi Li","Hao Wang","Tong Xu","Zhefeng Wang","Baoxing Huai","Nicholas Jing Yuan","Enhong Chen"],"abstract":"The topic of multimodal conversation systems has recently garnered significant attention across various industries, including travel and retail, among others. While pioneering works in this field have shown promising performance, they often focus solely on context information at the utterance level, overlooking the context-aware dependencies of multimodal semantic elements like words and images. Furthermore, the ordinal information of images, which indicates the relevance between visual context and users’ demands, remains underutilized during the integration of visual content. Additionally, the exploration of how to effectively utilize corresponding attributes provided by users when searching for desired products is still largely unexplored. To address these challenges, we propose PMATE, a P osition-aware M ultimodal di A logue system with seman T ic E lements. 
Specifically, to obtain se...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3645099","openalex_id":"https://openalex.org/W4392715767","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Tsinghua University","University Town of Shenzhen","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8923264741897583},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6497921943664551},{"id":"https://openalex.org/C192209626","display_name":"Focus (optics)","score":0.626121461391449},{"id":"https://openalex.org/C2775852435","display_name":"Utterance","score":0.6057928800582886},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5336951017379761},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4976496994495392},{"id":"https://openalex.org/C2777200299","display_name":"Conversation","score":0.4896426796913147},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.4881764054298401}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rethinking-generative-large-language-model-evaluation-for-semantic-comprehension","title":"Rethinking Generative Large Language Model Evaluation for Semantic Comprehension","url":"https://www.microsoft.com/en-us/research/publication/rethinking-generative-large-language-model-evaluation-for-semantic-comprehension/","published":"2024-03-11","authors":["Fangyun Wei","Xi Chen","Linzi 
Luo"],"abstract":"Despite their sophisticated capabilities, large language models (LLMs) encounter a major hurdle in effective assessment. This paper first revisits the prevalent evaluation method-multiple choice question answering (MCQA), which allows for straightforward accuracy measurement. Through a comprehensive evaluation of 24 models across 11 benchmarks, we highlight several potential drawbacks of MCQA, for instance, the inconsistency between the MCQA evaluation and the generation of open-ended responses in practical scenarios. In response, we introduce an RWQ-Elo rating system, engaging 24 LLMs such as GPT-4, GPT-3.5, Google-Gemini-Pro and LLaMA-1/-2, in a two-player competitive format, with GPT-4 serving as the judge. Each LLM receives an Elo rating thereafter. This system is designed to mirror real-world usage, and for this purpose, we have compiled a new benchmark called Real-world questions''...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-a-clinically-accessible-radiology-foundation-model-open-access-and-lightweight-with-automated-evaluation","title":"Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated 
evaluation","url":"https://www.microsoft.com/en-us/research/publication/towards-a-clinically-accessible-radiology-foundation-model-open-access-and-lightweight-with-automated-evaluation/","published":"2024-03-11","authors":["Juan Manuel Zambrano Chaves","Shih-Cheng Huang","Yanbo Xu","Hanwen Xu","Naoto Usuyama (naotous)","Sheng Zhang","Fei Wang","Yujia Xie","Mahmoud Khademi","Ziyi Yang","Hany Hassan Awadalla","Julia Gong"],"abstract":"The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. 
To maximize data efficiency, we adopt a modular approach by incorporating state-of-the-ar...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer vision","Medical, health and genomics","Computer science","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:s96smng5ru64hk48f1pw4st6","title":"Merge Vision Foundation Models via Multi-Task Distillation","url":"https://machinelearning.apple.com/research/merge-vision","published":"2024-03-11","authors":["Haoxiang Wang","Pavan Kumar Anasosalu Vasu","Fartash Faghri","Raviteja Vemulapalli","Mehrdad Farajtabar","Sachin Mehta","Mohammad Rastegari","Oncel Tuzel","Hadi Pour Ansari"],"abstract":"As the repository of publicly available pre-trained vision foundation models (VFMs) — such as CLIP, DINOv2, and SAM — grows, users face challenges in storage, memory, and computational efficiency when deploying multiple models concurrently. To address these concerns, we introduce a unique approach that merges the capabilities of multiple VFMs into a single efficient multi-task model. 
Our method, termed \"joint distillation,\" seamlessly integrates...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["memory","efficient","distillation"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"bytedance-seed:220","title":"Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning","url":"https://seed.bytedance.com/en/research/painting-with-words-elevating-detailed-image-captioning-with-benchmark-and-alignment-learning","published":"2024-03-10","authors":["Qinghao Ye","Xianhan Zeng","Fu Li","Chunyuan Li","Haoqi Fan"],"abstract":"Image captioning has long been a pivotal task in visual understanding, with recent advancements in vision-language models (VLMs) significantly enhancing the ability to generate detailed image captions. However, the evaluation of detailed image captioning remains underexplored due to outdated evaluation metrics and coarse annotations. In this paper, we introduce DeCapBench along with a novel metric, DCScore, specifically designed for detailed captioning tasks. DCScore evaluates hallucinations and fine-grained comprehensiveness by deconstructing responses into the smallest self-sufficient units, termed primitive information units, and assessing them individually. Our evaluation shows that DCScore aligns more closely with human judgment than other rule-based or model-based metrics. 
Concurrently, DeCapBench exhibits a high correlation with VLM arena results on descriptive tasks, surpassing e...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Multimodal","ICLR 2025","preference"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/zero-extremely-efficient-collective-communication-for-large-model-training","title":"ZeRO++: Extremely Efficient Collective Communication for Large Model Training","url":"https://www.microsoft.com/en-us/research/publication/zero-extremely-efficient-collective-communication-for-large-model-training/","published":"2024-03-09","authors":["Guanhua Wang","Heyang Qin","Sam Ade Jacobs","Xiaoxia Wu","Connor Holmes","Zhewei Yao","Samyam Rajbhandari","Olatunji Ruwase","Feng Yang","Lei Yang","Yuxiong He"],"abstract":"While the Zero Redundancy Optimizer (ZeRO) excels in training large-scale models, it struggles to achieve good throughput in environments with limited band-width or small batches where communication becomes a major bottleneck. Inspired by the principles of fine-grained quantization in machine learning algorithms, we designed ZeRO++, an optimizer robust to quantization effects that allows for significant communication volume reduction using low-precision quantization techniques. 
ZeRO++ composes of three communication volume reduction techniques (low-precision all-gather, data remapping, and low-precision gradient averaging) to significantly reduce the communication volume up to 4x that enables up to 2.16x better throughput at 384 GPU scale. Our results also show ZeRO++ can speedup the RLHF by 3.3x compared to vanilla ZeRO. To verify the convergence of ZeRO++, we test up to 13B model for p...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","1970-01-01","efficient","quantization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2403.05525","title":"DeepSeek-VL: Towards Real-World Vision-Language Understanding","url":"https://huggingface.co/papers/2403.05525","published":"2024-03-08","authors":["DeepSeek"],"abstract":"We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications. 
Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024), while maintaining a relatively low computational overhe...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["HuggingFace org papers","deepseek-ai","LLM","language model"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/erbench-an-entity-relationship-based-automatically-verifiable-hallucination-benchmark-for-large-language-models","title":"ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/erbench-an-entity-relationship-based-automatically-verifiable-hallucination-benchmark-for-large-language-models/","published":"2024-03-07","authors":["Jio Oh","Soyeon Kim","Junseok Seo","Jindong Wang","Ruochen Xu","Xing Xie","Steven Euijong Whang"],"abstract":"Large language models (LLMs) have achieved unprecedented performance in various applications, yet their evaluation remains a critical issue. Existing hallucination benchmarks are either static or lack adjustable complexity for thorough analysis. We contend that utilizing existing relational databases is a promising approach for constructing benchmarks due to their accurate knowledge description via functional dependencies. We propose ERBench to automatically convert any relational database into a benchmark based on the entity-relationship (ER) model. 
Our key idea is to construct questions using the database schema, records, and functional dependencies such that they can be automatically verified. In addition, we use foreign key constraints to join relations and construct multihop questions, which can be arbitrarily complex and used to debug the intermediate answers of LLMs. Finally, ERBe...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llms-in-the-imaginarium-tool-learning-through-simulated-trial-and-error","title":"LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error","url":"https://www.microsoft.com/en-us/research/publication/llms-in-the-imaginarium-tool-learning-through-simulated-trial-and-error/","published":"2024-03-07","authors":["Boshi Wang","Hao Fang","Jason Eisner","Ben Van Durme","Yu Su"],"abstract":"Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has been trained. 
We find that existing LLMs, including GPT-4 and open-source LLMs specifically fine-tuned for tool use, only reach a correctness rate in the range of 30% to 60%, far from reliable use in practice. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE), that orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory. Specifically, STE leverages an LLM's 'imagination'....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","LLM","memory","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4392542796","title":"Ensuring useful adoption of generative artificial intelligence in healthcare","url":"https://doi.org/10.1093/jamia/ocae043","published":"2024-03-07","authors":["Jenelle Jindal","Matthew P. Lungren","Nigam H. Shah"],"abstract":"OBJECTIVES: This article aims to examine how generative artificial intelligence (AI) can be adopted with the most value in health systems, in response to the Executive Order on AI. MATERIALS AND METHODS: We reviewed how technology has historically been deployed in healthcare, and evaluated recent examples of deployments of both traditional AI and generative AI (GenAI) with a lens on value. RESULTS: Traditional AI and GenAI are different technologies in terms of their capability and modes of current deployment, which have implications on value in health systems. 
DISCUSSION: Traditional AI when applied with a framework top-down can realize value in healthcare. GenAI in the short term when applied top-down has unclear value, but encouraging more bottom-up adoption has the potential to provide more benefit to health systems and patients. CONCLUSION: GenAI in healthcare can provide the most v...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"review","doi":"https://doi.org/10.1093/jamia/ocae043","openalex_id":"https://openalex.org/W4392542796","cited_by_count":49,"quality_score":67,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Stanford Health Care","Stanford Medicine","Stanford University","University of California System","University of California, San Francisco"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6576260328292847},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6153517365455627},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5909998416900635},{"id":"https://openalex.org/C160735492","display_name":"Health care","score":0.5807888507843018},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.33132803440093994},{"id":"https://openalex.org/C17744445","display_name":"Political science","score":0.05336034297943115},{"id":"https://openalex.org/C199539241","display_name":"Law","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":49}},{"id":"official:d99aeb35d5215e3b","title":"Generative Pre-training for Speech with Flow Matching","url":"https://ai.meta.com/research/publications/generative-pre-training-for-speech-with-flow-matching/","published":"2024-03-05","authors":["Alex Liu","Matt Le","Apoorv 
Vyas","Bowen Shi","Andros Tjandra","Wei-Ning Hsu"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Speech & Audio"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=16"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/table-meets-llm-can-large-language-models-understand-structured-table-data-a-benchmark-and-empirical-study","title":"Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study","url":"https://www.microsoft.com/en-us/research/publication/table-meets-llm-can-large-language-models-understand-structured-table-data-a-benchmark-and-empirical-study/","published":"2024-03-04","authors":["Yuan Sui","Mengyu Zhou","Mingjie Zhou","Shi Han","Dongmei Zhang"],"abstract":"Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. However, there is still much to learn about how well LLMs understand structured data, such as tables. Although tables can be used as input to LLMs with serialization, there is a lack of comprehensive studies that examine whether LLMs can truly comprehend such data. In this paper, we try to understand this by designing a benchmark to evaluate the structural understanding capabilities (SUC) of LLMs. The benchmark we create includes seven tasks, each with its own unique challenges, e.g. , cell lookup, row retrieval, and size detection. We perform a series of evaluations on GPT-3.5 and GPT-4. 
We find that performance varied depending on several input choices, including table input format, content order, role prompting, and partition marks. Drawing from the insights gained....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Human language technologies","1970-01-01","LLM","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mathscale-scaling-instruction-tuning-for-mathematical-reasoning","title":"MathScale: Scaling Instruction Tuning for Mathematical Reasoning","url":"https://www.microsoft.com/en-us/research/publication/mathscale-scaling-instruction-tuning-for-mathematical-reasoning/","published":"2024-03-04","authors":["Zhengyang Tang","Xingxing Zhang","Benyou Wang","Furu Wei"],"abstract":"Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data using frontier LLMs (e.g., {\\tt GPT-3.5}). Inspired by the cognitive mechanism in human mathematical learning, it first extracts topics and knowledge points from seed math questions and then build a concept graph, which is subsequently used to generate new math questions. MathScale exhibits effective scalability along the size axis of the math dataset that we generate. 
As a result, we create a mathematical reasoning dataset (MathScaleQA) containing two million math question-answer pairs. To evaluate mathematical reasoning abilities of LLMs comprehensively, we construct {\\sc MwpBench}, a benchmark of Math Word Problems,....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Mathematics","Computer science","large language models","mathematics","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/found-in-the-middle-how-language-models-use-long-contexts-better-via-plug-and-play-positional-encoding","title":"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding","url":"https://www.microsoft.com/en-us/research/publication/found-in-the-middle-how-language-models-use-long-contexts-better-via-plug-and-play-positional-encoding/","published":"2024-03-04","authors":["Zhenyu (Allen) Zhang","Runjin Chen","Shiwei Liu","Zhewei Yao","Olatunji Ruwase","Beidi Chen","Xiaoxia Wu","Zhangyang Wang"],"abstract":"This paper aims to overcome the\"lost-in-the-middle\"challenge of large language models (LLMs). While recent advancements have successfully enabled LLMs to perform stable language modeling with up to 4 million tokens, the persistent difficulty faced by most LLMs in identifying relevant information situated in the middle of the context has not been adequately tackled. 
To address this problem, this paper introduces Multi-scale Positional Encoding (Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle the relevant information located in the middle of the context, without fine-tuning or introducing any additional overhead. Ms-PoE leverages the position indice rescaling to relieve the long-term decay effect introduced by RoPE, while meticulously assigning distinct scaling ratios to different attention heads to preserve essential knowledge learn...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","long-term"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4392367398","title":"LLMRec: Large Language Models with Graph Augmentation for Recommendation","url":"https://doi.org/10.1145/3616855.3635853","published":"2024-03-04","authors":["Wei Wei","Xubin Ren","Jiabin Tang","Qinyong Wang","Lixin Su","Suqi Cheng","Junfeng Wang","Dawei Yin","Chao Huang"],"abstract":"The problem of data sparsity has long been a challenge in recommendation systems, and previous studies have attempted to address this issue by incorporating side information. However, this approach often introduces side effects such as noise, availability issues, and low data quality, which in turn hinder the accurate modeling of user preferences and adversely impact recommendation performance. 
In light of the recent advancements in large language models (LLMs), which possess extensive knowledge bases and strong reasoning capabilities, we propose a novel framework called LLMRec that enhances recommender systems by employing three simple yet effective LLM-based graph augmentation strategies. Our approach leverages the rich content available within online platforms (e.g., Netflix, MovieLens) to augment the interaction graph in three ways: (i) reinforcing user-item interaction egde, (ii) en...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3616855.3635853","openalex_id":"https://openalex.org/W4392367398","cited_by_count":184,"quality_score":71,"matched_keywords":["LLM"],"author_affiliations":["Baidu (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8628434538841248},{"id":"https://openalex.org/C2776156558","display_name":"MovieLens","score":0.7988255620002747},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.6675175428390503},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.548042893409729},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4964748024940491},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.48965007066726685},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.4690901041030884},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43037062883377075}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":184}},{"id":"openalex:W4392384263","title":"Text-Video Retrieval via Multi-Modal Hypergraph 
Networks","url":"https://doi.org/10.1145/3616855.3635757","published":"2024-03-04","authors":["Qian Li","Lixin Su","Jiashu Zhao","隆義 山下","Hengyi Cai","Suqi Cheng","Hengzhu Tang","Junfeng Wang","D. Z. Yin"],"abstract":"Text-video retrieval is a challenging task that aims to identify relevant videos given textual queries. Compared to conventional textual retrieval, the main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content. Previous works primarily focus on aligning the query and the video by finely aggregating word-frame matching signals. Inspired by the human cognitive process of modularly judging the relevance between text and video, the judgment needs high-order matching signal due to the consecutive and complex nature of video contents. In this paper, we propose chunk-level text-video matching, where the query chunks are extracted to describe a specific retrieval unit, and the video chunks are segmented into distinct clips from videos. 
We formulate the chunk-level matching as n-ary correlations modeling between words...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3616855.3635757","openalex_id":"https://openalex.org/W4392384263","cited_by_count":13,"quality_score":54,"matched_keywords":["retrieval"],"author_affiliations":["Baidu (China)","Institute of Computing Technology","Wilfrid Laurier University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8312609195709229},{"id":"https://openalex.org/C2781221856","display_name":"Hypergraph","score":0.5026397705078125},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.5018551349639893},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4987649917602539},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4336722791194916},{"id":"https://openalex.org/C189391414","display_name":"Visual Word","score":0.4234578013420105},{"id":"https://openalex.org/C43521106","display_name":"Pipeline (software)","score":0.4189501702785492},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.41093724966049194}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"apple:fyo2f63epu76d9370fk4noy6","title":"SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking","url":"https://machinelearning.apple.com/research/synthdst","published":"2024-03-04","authors":["Atharva Kulkarni","Andy Tseng","Joel Moniz","Dhivya Piraviperumal","Hong Yu","Shruti Bhargava"],"abstract":"In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State 
Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, and, at times, infeasible. While zero-shot learning...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"arxiv:2312.14345","title":"Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs","url":"http://arxiv.org/abs/2312.14345","published":"2024-03-04","authors":["Behnam Rahdari","Hao Ding","Ziwei Fan","Yifei Ma","Zhuotong Chen","Anoop Deoras","Branislav Kveton"],"abstract":"The unique capabilities of Large Language Models (LLMs), such as the natural language text generation ability, position them as strong candidates for providing explanation for recommendations. However, despite the size of the LLM, most existing models struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding, that combines the ideas of aspect-based explanation and chain-of-thought prompting to generate explanations through intermediate reasoning steps. 
In this paper, we share our experience in building the framework and present an interactive demonstration for exploring our results.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3616855.3635689","openalex_id":"https://openalex.org/W4390214376","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","personalized"],"author_affiliations":["Amazon (United States)","University of California, Santa Barbara","University of Pittsburgh"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6641305088996887},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.4998185634613037},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.3824054002761841},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3588571548461914},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.1996101438999176}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4392384347","title":"Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters","url":"https://doi.org/10.1145/3616855.3635690","published":"2024-03-04","authors":["Yukang Xie","Chengyu Wang","Junbing Yan","J Zhou","Feiqi Deng","Jun Huang"],"abstract":"Recently, Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks, especially for text generative tasks. Yet, the large size of LLMs often leads to the high computational cost of model training and online deployment. 
In our work, we present ALTER, a system that effectively builds the multi-tAsk Learners with mixTure-of-task-adaptERs upon small language models (with <1B parameters) to address multiple NLP tasks simultaneously, capturing the commonalities and differences between tasks, in order to support domain-specific applications. Specifically, in ALTER, we propose the Mixture-of-Task-Adapters (MTA) module as an extension to the transformer architecture for the underlying model to capture the intra-task and inter-task knowledge. A two-stage training method is further proposed to optimize the collaborati...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3616855.3635690","openalex_id":"https://openalex.org/W4392384347","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","East China Normal University","South China University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8390954732894897},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.654078483581543},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6445714831352234},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.6037906408309937},{"id":"https://openalex.org/C175154964","display_name":"Task analysis","score":0.5378633141517639},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5231431722640991},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.470651775598526},{"id":"https://openalex.org/C39890363","display_name":"Generative 
grammar","score":0.4547693729400635}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4392384306","title":"Some Useful Things to Know When Combining IR and NLP: The Easy, the Hard and the Ugly","url":"https://doi.org/10.1145/3616855.3636452","published":"2024-03-04","authors":["Omar Alonso","Kenneth Church"],"abstract":"Deep nets such as GPT are at the core of the current advances in many systems and applications. Things are moving fast; techniques become obsolete quickly (within weeks). How can we take advantage of new discoveries and incorporate them into our existing work? Are new developments radical improvements, or incremental repetitions of established concepts, or combinations of both?","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3616855.3636452","openalex_id":"https://openalex.org/W4392384306","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Northeastern University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7528657913208008},{"id":"https://openalex.org/C2164484","display_name":"Core (optical fiber)","score":0.5353413820266724},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5247880816459656},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4202677607536316},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.37522679567337036},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.3306843638420105},{"id":"https://openalex.org/C76155785","display_name":"Telecommunications","score":0.0926651656627655}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/taming-throughput-latency-tradeoff-in-llm-inference-with-sarathi-serve","title":"Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve","url":"https://www.microsoft.com/en-us/research/publication/taming-throughput-latency-tradeoff-in-llm-inference-with-sarathi-serve/","published":"2024-03-03","authors":["Amey Agrawal","Nitin Kedia","Ashish Panwar","Jayashree Mohan","Nipun Kwatra","Bhargav Gulavani","Alexey Tumanov","Ramachandran Ramjee"],"abstract":"Each LLM serving request goes through two phases. The first is prefill which processes the entire input prompt and produces the first output token and the second is decode which generates the rest of output tokens, one-at-a-time. Prefill iterations have high latency but saturate GPU compute due to parallel processing of the input prompt. In contrast, decode iterations have low latency but also low compute utilization because a decode iteration processes only a single token per request. This makes batching highly effective for decodes and consequently for overall throughput. However, batching multiple requests leads to an interleaving of prefill and decode iterations which makes it challenging to achieve both high throughput and low latency. We introduce an efficient LLM inference scheduler, Sarathi-Serve, to address this throughput-latency tradeoff. 
Sarathi-Serve introduces chunked-prefi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Systems and networking","Computer science","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/key-point-driven-data-synthesis-with-its-enhancement-on-mathematical-reasoning","title":"Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning","url":"https://www.microsoft.com/en-us/research/publication/key-point-driven-data-synthesis-with-its-enhancement-on-mathematical-reasoning/","published":"2024-03-03","authors":["Yiming Huang","Xiao Liu","Yeyun Gong","Zhibin Gou","Yelong Shen","Nan Duan","Weizhu Chen"],"abstract":"Large language models (LLMs) have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality, reasoning-focused training datasets. Addressing this challenge, we propose Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that synthesizes question-answer pairs by leveraging key points and exemplar pairs from authentic data sources. KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability. As a result, we present KPMath, the most extensive synthetic dataset tailored for mathematical reasoning to date, comprising over one million question-answer pairs. 
Utilizing KPMath and augmenting it with additional reasoning-intensive corpora, we create the comprehensive KPMath-Plus dataset. Fine-tuning the Mistral-7B model on KPMath-Plus yields a zero-shot PASS@1 accurac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","large language models"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:26","title":"SDXL-Lightning: Progressive Adversarial Diffusion Distillation","url":"https://seed.bytedance.com/en/research/sdxl-lightning-progressive-adversarial-diffusion-distillation","published":"2024-03-02","authors":["Shanchuan Lin","Anran Wang","Xiao Yang"],"abstract":"We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to achieve a balance between quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models both as LoRA and full UNet weights. 
External paper link: https://arxiv.org/abs/2402.13929","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Computer Vision","Vision","arXiv","distillation"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"openalex:W4396876085","title":"NLP4ReF: Requirements Classification and Forecasting: From Model-Based Design to Large Language Models","url":"https://doi.org/10.1109/aero58975.2024.10521022","published":"2024-03-02","authors":["Jordan Peer","Yaniv Mordecai","Yoram Reich"],"abstract":"We introduce Natural Language Processing for Requirement Forecasting (NLP4ReF), a model-based machine learning and natural language processing solution for enhancing the Requirements Engineering (RE) process. RE continues to face significant challenges and demands innovative approaches for process efficiency. Traditional RE methods relying on natural language struggle with incomplete, hidden, forgotten, and evolving requirements during and after the critical design review, risking project failures and setbacks. NLP4ReF tackles several key challenges: a) distinguishing between functional and non-functional requirements, b) classification of requirements by their respective system classes, and c) generation of unanticipated requirements to enhance project success. NLP4ReF employs a common natural language toolkit (NLTK) package and the recently-trending Chat-GPT. 
We tested NLP4ReF on PROMI...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/aero58975.2024.10521022","openalex_id":"https://openalex.org/W4396876085","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Bellevue Hospital Center","Tel Aviv University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7239465117454529},{"id":"https://openalex.org/C6604083","display_name":"Requirements engineering","score":0.5982890129089355},{"id":"https://openalex.org/C59488412","display_name":"Requirements analysis","score":0.5586251616477966},{"id":"https://openalex.org/C98045186","display_name":"Process (computing)","score":0.5126139521598816},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.500481367111206},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.4923444986343384},{"id":"https://openalex.org/C199747065","display_name":"Non-functional requirement","score":0.46132296323776245},{"id":"https://openalex.org/C173577280","display_name":"Requirements management","score":0.45079755783081055}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4392350060","title":"AI, ML, and Large Language Models in Cybersecurity","url":"http://doi.org/10.56726/irjmets49546","published":"2024-03-02","authors":["Pranav Kumar","Chaudhary","E Raff","J Barker","J Sylvester","C Nicholas","J Saxe","K Berlin","S Krasser","D Carrel","R Beyah","R Shams"],"abstract":"As technology continues to advance, an increasing number of entities are connecting to the internet, presenting significant security challenges in daily operations. Addressing these 
challenges has become an urgent priority. Cybersecurity threats are also evolving alongside technological progress. While rule-based and signature-based techniques have been effective in mitigating risks, the integration of Artificial Intelligence (AI), Machine Learning (ML), and Large Language Models (LLMs) is now essential for bolstering cybersecurity defenses against these evolving threats. This study investigates the applications, obstacles, and prospects of AI, ML, and LLMs in cybersecurity. Through an examination of various use cases, analysis of associated risks, and proposition of strategic solutions, this research aims to advance cybersecurity practices for safeguarding the digital landscape of tomorrow.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.56726/irjmets49546","openalex_id":"https://openalex.org/W4392350060","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C38652104","display_name":"Computer security","score":0.552766740322113},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5470441579818726},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39320388436317444}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/automating-human-tutor-style-programming-feedback-leveraging-gpt-4-tutor-model-for-hint-generation-and-gpt-3-5-student-model-for-hint-validation","title":"Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint 
Validation","url":"https://www.microsoft.com/en-us/research/publication/automating-human-tutor-style-programming-feedback-leveraging-gpt-4-tutor-model-for-hint-generation-and-gpt-3-5-student-model-for-hint-validation/","published":"2024-03-01","authors":["Tung Phung","Victor-Alexandru Pădurean","Anjali Singh","Christopher Brooks","José Cambronero","Sumit Gulwani","Adish Singla","Gustavo Soares"],"abstract":"Generative AI and large language models hold great promise in enhancing programming education by automatically generating individualized feedback for students. We investigate the role of generative AI models in providing human tutor-style programming hints to help students resolve errors in their buggy programs. Recent works have benchmarked state-of-the-art models for various feedback generation scenarios; however, their overall quality is still inferior to human tutors and not yet ready for real-world deployment. In this paper, we seek to push the limits of generative AI models toward providing high-quality programming hints and develop a novel technique, GPT4Hints-GPT3.5Val. As a first step, our technique leverages GPT-4 as a “tutor” model to generate hints – it boosts the generative quality by using symbolic information of failing test cases and fixes in prompts. 
As a next step, our technique l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3636555.3636846","openalex_id":"https://openalex.org/W4392484243","cited_by_count":49,"quality_score":98,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","1970-01-01"],"author_affiliations":["Microsoft","Max Planck Institute for Software Systems","Microsoft (United States)","University of Michigan"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/enhancing-human-annotation-leveraging-large-language-models-and-efficient-batch-processing","title":"Enhancing human annotation: Leveraging large language models and efficient batch processing","url":"https://www.microsoft.com/en-us/research/publication/enhancing-human-annotation-leveraging-large-language-models-and-efficient-batch-processing/","published":"2024-03-01","authors":["Oleg Zendel","J Shane Culpepper","Falk Scholer","Paul Thomas"],"abstract":"Large language models (LLMs) are capable of assessing document and query characteristics, including relevance, and are now being used for a variety of different classification labeling tasks as well. This study explores how to use LLMs to classify an information need , often represented as a user query. In particular, our goal is to classify the cognitive complexity of the search task for a given \"backstory\". Using 180 TREC topics and backstories, we show that GPT-based LLMs agree with human experts as much as other human experts. 
We also show that batching and ordering can significantly impact the accuracy of GPT-3.5, but rarely alters the quality of GPT-4 predictions. This study provides insights into the efficacy of large language models for annotation tasks normally completed by humans, and offers recommendations for other similar applications. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Search and information retrieval","Information retrieval","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:6a69ea99374525aa","title":"Correction Focused Language Model Training for Speech Recognition","url":"https://ai.meta.com/research/publications/correction-focused-language-model-training-for-speech-recognition/","published":"2024-03-01","authors":["Yingyi Ma","Zhe Liu","Ozlem Kalinli"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Speech & Audio","language model"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=16"}},{"id":"official:d7f7cbada2fb4ab9","title":"Claude 3 System 
Card","url":"https://www-cdn.anthropic.com/c6a80a657af445f40e31afac050f3bf76d3b1404.pdf","published":"2024-03","authors":["Anthropic"],"abstract":"Official Anthropic system card for Claude 3.","companies":["Anthropic"],"matched_orgs":["Anthropic"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"model_card","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["system card","Claude","Claude 3"],"author_affiliations":["Anthropic"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Anthropic system cards page https://www.anthropic.com/system-cards"}},{"id":"official:12cfac7841d2efce","title":"A Chat about Boring Problems: Studying GPT-Based Text Normalization","url":"https://research.nvidia.com/publication/2024-03_chat-about-boring-problems-studying-gpt-based-text-normalization","published":"2024-03","authors":["Yang Zhang","Travis M. 
Bartley","Mariana Graterol-Fuenmayor","Vitaly Lavrukhin","Evelina Bakhturina","Boris Ginsburg"],"abstract":"Official NVIDIA Research publication.","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp48485.2024.10447169","openalex_id":"https://openalex.org/W4392902937","cited_by_count":3,"quality_score":55,"matched_keywords":[],"author_affiliations":["NVIDIA","City University of New York","Nvidia (United States)","The Graduate Center, CUNY"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=3"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/truce-private-benchmarking-to-prevent-contamination-and-improve-comparative-evaluation-of-llms","title":"TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs","url":"https://www.microsoft.com/en-us/research/publication/truce-private-benchmarking-to-prevent-contamination-and-improve-comparative-evaluation-of-llms/","published":"2024-02-29","authors":["Nishanth Chandran","Sunayana Sitaram","Divya Gupta","Rahul Sharma","Kashish Mittal","Swami Manohar"],"abstract":"Benchmarking is the de-facto standard for evaluating LLMs, due to its speed, replicability and low cost. However, recent work has pointed out that the majority of the open source benchmarks available today have been contaminated or leaked into LLMs, meaning that LLMs have access to test data during pretraining and/or fine-tuning. This raises serious concerns about the validity of benchmarking studies conducted so far and the future of evaluation using benchmarks. 
To solve this problem, we propose Private Benchmarking, a solution where test datasets are kept private and models are evaluated without revealing the test data to the model. We describe various scenarios (depending on the trust placed on model owners or dataset owners), and present solutions to avoid data contamination using private benchmarking. For scenarios where the model weights need to be kept private, we describe solutio...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Security, privacy, and cryptography","Computation and Language","Computer science","Cryptography and Security"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ai-transparency-in-the-age-of-llms-a-human-centered-research-roadmap","title":"AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap","url":"https://www.microsoft.com/en-us/research/publication/ai-transparency-in-the-age-of-llms-a-human-centered-research-roadmap/","published":"2024-02-29","authors":["Q. Vera Liao","Jennifer Wortman Vaughan"],"abstract":"The rise of powerful large language models (LLMs) brings about tremendous opportunities for innovation but also looming risks for individuals and society at large. We have reached a pivotal moment for ensuring that LLMs and LLM-infused applications are developed and deployed responsibly. However, a central pillar of responsible AI—transparency—is largely missing from the current discourse around LLMs. 
It is paramount to pursue new approaches to provide transparency for LLMs, and years of research at the intersection of AI and human-computer interaction (HCI) highlight that we must do so with a human-centered perspective: Transparency is fundamentally about supporting appropriate human understanding, and this understanding is sought by different stakeholders with different goals in different contexts. In this new era of LLMs, we must develop and design approaches to transparency by consid...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Article (Journal)","Artificial intelligence","Human-computer interaction","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4392302569","title":"Convolutions are competitive with transformers for protein sequence pretraining","url":"https://doi.org/10.1016/j.cels.2024.01.008","published":"2024-02-29","authors":["Kevin Yang","Nicolò Fusi","Alex X. 
Lu"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.cels.2024.01.008","openalex_id":"https://openalex.org/W4392302569","cited_by_count":98,"quality_score":67,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)","Microsoft Research New England (United States)"],"concepts":[{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.7989605665206909},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7103687524795532},{"id":"https://openalex.org/C123657996","display_name":"Architecture","score":0.5434125065803528},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.5228676795959473},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5157130360603333},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3979557752609253},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.14607509970664978},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.10501083731651306}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":98}},{"id":"openalex:W4392309090","title":"ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation","url":"https://doi.org/10.1109/tpami.2024.3371376","published":"2024-02-29","authors":["Bang Yang","Fenglin Liu","Yuexian Zou","Xian Wu","Yaowei Wang","David A. Clifton"],"abstract":"Natural Language Generation (NLG) accepts input data in the form of images, videos, or text and generates corresponding natural language text as output. 
Existing NLG methods mainly adopt a supervised approach and rely heavily on coupled data-to-text pairs. However, for many targeted scenarios and for non-English languages, sufficient quantities of labeled data are often not available. As a result, it is necessary to collect and label data-text pairs for training, which is both costly and time-consuming. To relax the dependency on labeled data of downstream tasks, we propose an intuitive and effective zero-shot learning framework, ZeroNLG, which can deal with multiple NLG tasks, including image-to-text (image captioning), video-to-text (video captioning), and text-to-text (neural machine translation), across English, Chinese, German, and French within a unified framework. ZeroNLG does not...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2024.3371376","openalex_id":"https://openalex.org/W4392309090","cited_by_count":15,"quality_score":52,"matched_keywords":[],"author_affiliations":["Peking University","Peng Cheng Laboratory","Suzhou Research Institute","Tencent (China)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.7090169191360474},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.700027585029602},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6292145252227783},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.566712498664856},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.54443359375},{"id":"https://openalex.org/C2776187449","display_name":"Natural language generation","score":0.4457484185695648},{"id":"https://openalex.org/C2776608160","display_name":"Natural 
(archaeology)","score":0.4355565905570984},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.4202161729335785}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":15}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/overview-of-the-trec-2023-deep-learning-track","title":"Overview of the TREC 2023 Deep Learning Track","url":"https://www.microsoft.com/en-us/research/publication/overview-of-the-trec-2023-deep-learning-track/","published":"2024-02-28","authors":["Nick Craswell","Bhaskar Mitra","Emine Yilmaz","Hossein A. Rahmani","Daniel Campos","Jimmy Lin","Ellen M. Voorhees","Ian Soboroff"],"abstract":"This is the fifth year of the TREC Deep Learning track. As in previous years, we leverage the MS MARCO datasets that made hundreds of thousands of human-annotated training labels available for both passage and document ranking tasks. We mostly repeated last year’s design, to get another matching test set, based on the larger, cleaner, less-biased v2 passage and document set, with passage ranking as primary and document ranking as a secondary task (using labels inferred from passage). As we did last year, we sample from MS MARCO queries that were completely held out, unused in corpus construction, unlike the test queries in the first three years. This approach yields a more difficult test with more headroom for improvement. 
Alongside the usual MS MARCO (human) queries from MS MARCO, this year we generated synthetic queries using a fine-tuned T5 model and using a GPT-4 prompt. The new headl...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":112,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Benchmarking","Deep learning","Deep neural networks","Document retrieval","Information retrieval","Ranking (information retrieval)","Search algorithm","Search engine","Text retrieval","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4392265998","title":"SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly","url":"https://doi.org/10.1109/cgo57630.2024.10444788","published":"2024-02-28","authors":["Jordi Armengol-Estapé","Jackson Woodruff","Chris Cummins","Michael O’Boyle"],"abstract":"Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and ISAs. Recent interest in neural approaches has produced portable tools that generate readable code. Nevertheless, to-date such techniques are usually restricted to synthetic programs without optimization, and no models have evaluated their portability. Furthermore, while the code generated may be more readable, it is usually incorrect. 
This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence Transformer trained over real-world code and augmented with a type inference engine. We utilize a novel tokenizer, dropout-free regularization, and type inference to ge...","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/cgo57630.2024.10444788","openalex_id":"https://openalex.org/W4392265998","cited_by_count":20,"quality_score":61,"matched_keywords":["language model"],"author_affiliations":["Menlo School","Meta (United States)","University of Edinburgh"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8355870842933655},{"id":"https://openalex.org/C63000827","display_name":"Software portability","score":0.7184809446334839},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.6404964327812195},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.525654673576355},{"id":"https://openalex.org/C198370458","display_name":"Type inference","score":0.5091058611869812},{"id":"https://openalex.org/C2777062904","display_name":"Toolchain","score":0.5013790130615234},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.45795950293540955},{"id":"https://openalex.org/C50831359","display_name":"Assembly language","score":0.4420490860939026}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":20}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/strong-baselines-for-parameter-efficient-few-shot-fine-tuning","title":"Strong Baselines for Parameter Efficient Few-Shot 
Fine-tuning","url":"https://www.microsoft.com/en-us/research/publication/strong-baselines-for-parameter-efficient-few-shot-fine-tuning/","published":"2024-02-27","authors":["S. Basu","Daniela Massiceti","S. Hu","S. Feizi"],"abstract":"Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase on a set of base classes. Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC. Fine-tuning ViTs, however, is expensive in time, compute and storage. This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters. While these methods have shown promise, inconsistencies in experimental conditions make it difficult to disentangle their advantage from other experimental factors including the feature extractor architecture, pre-trained initialization and fine-tuning algorithm, amongst others. 
In our paper, we conduct a large-scale, experimentally consistent, empirical analysis to study PEFT...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Computer science","Few Shot Learning","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/reslora-identity-residual-mapping-in-low-rank-adaption","title":"ResLoRA: Identity Residual Mapping in Low-Rank Adaption","url":"https://www.microsoft.com/en-us/research/publication/reslora-identity-residual-mapping-in-low-rank-adaption/","published":"2024-02-27","authors":["Shuhua Shi","Shaohan Huang","Minghui Song","Zhoujun Li","Zihan Zhang","Haizhen Huang","Furu Wei","Weiwei Deng","Feng Sun","Qi Zhang"],"abstract":"As one of the most popular parameter-efficient fine-tuning (PEFT) methods, low-rank adaptation (LoRA) is commonly applied to fine-tune large language models (LLMs). However, updating the weights of LoRA blocks effectively and expeditiously is challenging due to the long calculation path in the original model. To address this, we propose ResLoRA, an improved framework of LoRA. By adding residual paths during training and using merging approaches to eliminate these extra paths during inference, our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA. 
The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method. To the best of our knowledge, ResLoRA is the first work that combines the residual path with LoRA. The code of our method is available at https://github.com/microsoft/...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","large language models","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4408749234","title":"A Multimodal Approach to Software Quality Assurance: Integrating Static Analysis, Dynamic Testing, and AI-based Anomaly Detection","url":"https://doi.org/10.15680/ijircce.2024.1202003","published":"2024-02-27","authors":["Gopinath Kathiresan"],"abstract":": The combination of software architecture evolutions and cloud computing and cyber-physical systems creates advanced complexity when ensuring software reliability and security and efficiency. The once typical software quality assurance (SQA) practices using manual reviews and isolated testing methods fail to provide acceptable modern results anymore. This study develops a multimodal software quality assurance enhancement approach which combines static analysis together with dynamic testing and AI anomaly detection techniques. Software quality examines both potential defects alongside security vulnerabilities through code-level static analysis before running the program while dynamic testing evaluates real-time functionalities and security features. 
AI-based anomaly detection systems develop through machine learning models which help software testing teams by predicting failures as well....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.15680/ijircce.2024.1202003","openalex_id":"https://openalex.org/W4408749234","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Apple (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8566818237304688},{"id":"https://openalex.org/C739882","display_name":"Anomaly detection","score":0.7729616165161133},{"id":"https://openalex.org/C106436119","display_name":"Quality assurance","score":0.72691810131073},{"id":"https://openalex.org/C2776969324","display_name":"Software quality assurance","score":0.5719919204711914},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.5133873224258423},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.48465877771377563},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.4699794352054596},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.44373437762260437}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-era-of-1-bit-llms-all-large-language-models-are-in-1-58-bits","title":"The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits","url":"https://www.microsoft.com/en-us/research/publication/the-era-of-1-bit-llms-all-large-language-models-are-in-1-58-bits/","published":"2024-02-26","authors":["Shuming Ma","Hongyu Wang","Lingxiao Ma","Lei Wang","Wenhui Wang","Shaohan Huang","Lifeng 
Dong","Ruiping Wang","Jilong Xue","Furu Wei"],"abstract":"Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","Machine learning","LLM","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/towards-optimal-learning-of-language-models","title":"Towards Optimal Learning of Language Models","url":"https://www.microsoft.com/en-us/research/publication/towards-optimal-learning-of-language-models/","published":"2024-02-26","authors":["Yuxian Gu","Li Dong","Yaru Hao","Qingxiu Dong","Minlie Huang","Furu Wei"],"abstract":"This work studies the general principles of 
improving the learning of language models (LMs), which aims at reducing the necessary training steps for achieving superior performance. Specifically, we present a theory for the optimal learning of LMs. We first propose an objective that optimizes LM learning by maximizing the data compression ratio in an \"LM-training-as-lossless-compression\" view. Then, we derive a theorem, named Learning Law, to reveal the properties of the dynamics in the optimal learning process under our objective. The theorem is then validated by experiments on a linear classification and a real-world language modeling task. Finally, we empirically verify that the optimal learning of LMs essentially stems from the improvement of the coefficients in the scaling law of LMs, indicating great promise and significance for designing practical learning acceleration methods. Our...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4392168151","title":"scGPT: toward building a foundation model for single-cell multi-omics using generative AI","url":"https://doi.org/10.1038/s41592-024-02201-0","published":"2024-02-26","authors":["Haotian Cui","Chloe Wang","Hassaan Maan","Kuan Pang","Fengning Luo","Nan Duan","Bo 
Wang"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1038/s41592-024-02201-0","openalex_id":"https://openalex.org/W4392168151","cited_by_count":937,"quality_score":67,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University Health Network","University of Toronto","Vector Institute"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6620688438415527},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6358082890510559},{"id":"https://openalex.org/C191908910","display_name":"Synthetic biology","score":0.610347330570221},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6000725626945496},{"id":"https://openalex.org/C152662350","display_name":"Systems biology","score":0.5015749931335449},{"id":"https://openalex.org/C70721500","display_name":"Computational biology","score":0.498563289642334},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4764726161956787},{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.47562792897224426}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":937}},{"id":"hf-org-paper:huawei-noah:2403.00818","title":"DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language 
Models","url":"https://huggingface.co/papers/2403.00818","published":"2024-02-26","authors":["Huawei/Noah"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","huawei-noah","efficient"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/huawei-noah/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/language-specific-neurons-the-key-to-multilingual-capabilities-in-large-language-models","title":"Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/language-specific-neurons-the-key-to-multilingual-capabilities-in-large-language-models/","published":"2024-02-25","authors":["Tianyi Tang","Wenyang Luo","Haoyang Huang","Dongdong Zhang","Xiaolei Wang","Xin Zhao","Furu Wei","Ji-Rong Wen"],"abstract":"Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specially, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on several representative LLMs, such as LLaMA-2, BLOOM, and Mistral. 
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility to\"steer\"the....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"bytedance-seed:21","title":"MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs","url":"https://seed.bytedance.com/en/research/megascale-scaling-large-language-model-training-to-more-than-10-000-gpus","published":"2024-02-23","authors":["Ziheng Jiang","Haibin Lin","Yinmin Zhong","Qi Huang","Yangrui Chen","Zhi Zhang","Yanghua Peng","Xiang Li","Cong Xie","Shibiao Nong","Yulu Jia","Sun He"],"abstract":"We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented challenges to training efficiency and stability. We take a full-stack approach that co-designs the algorithmic and system components across model block and optimizer design, computation and communication overlapping, operator optimization, data pipeline, and network performance tuning. 
Maintaining high efficiency throughout the training process (i.e., stability) is an important consideration in production given the long extent of LLM training jobs. Many hard stability issues only emerge at large scale, and in-depth observability is the key to address them. We develop a set of diagnosis tools to monitor system components and events deep in t...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["System Research","Infrastructures","Nsdi 2024","LLM","language model"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dosa-a-dataset-of-social-artifacts-from-different-indian-geographical-subcultures","title":"DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures","url":"https://www.microsoft.com/en-us/research/publication/dosa-a-dataset-of-social-artifacts-from-different-indian-geographical-subcultures/","published":"2024-02-22","authors":["Agrima Seth","Sanchit Ahuja","Kalika Bali","Sunayana Sitaram"],"abstract":"Generative models are increasingly being used in various applications, such as text generation, commonsense reasoning, and question-answering. To be effective globally, these models must be aware of and account for local socio-cultural contexts, making it necessary to have benchmarks to evaluate the models for their cultural familiarity. 
Since the training data for LLMs is web-based and the Web is limited in its representation of information, it does not capture knowledge present within communities that are not on the Web. Thus, these models exacerbate the inequities, semantic misalignment, and stereotypes from the Web. There has been a growing call for community-centered participatory research methods in NLP. In this work, we respond to this call by using participatory research methods to introduce DOSA, the first community-generated Dataset of 615 Social Artifacts, by engaging with 260...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human language technologies","Computer science","Text generation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"arxiv:2402.14905","title":"MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases","url":"https://huggingface.co/papers/2402.14905","published":"2024-02-22","authors":["Zechun Liu","Changsheng Zhao","Forrest Iandola","Chen Lai","Yuandong Tian","Igor Fedorov","Yunyang Xiong","Ernie Chang","Yangyang Shi","Raghuraman Krishnamoorthi","Liangzhen Lai","Vikas Chandra"],"abstract":"This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. 
Contrary to prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight sharing approach with no increase in model size and only marginal latency overh...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":31,"matched_keywords":["efficient"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"official:340037217ce7ead7","title":"Watermarking Makes Language Models Radioactive","url":"https://ai.meta.com/research/publications/watermarking-makes-language-models-radioactive/","published":"2024-02-21","authors":["Tom Sander","Pierre Fernandez","Alain Durmus","Matthijs Douze","Teddy Furon"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Integrity","NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search 
https://ai.meta.com/results/?content_types%5B0%5D=publication&page=16"}},{"id":"official:b51007c9ea44b180","title":"Toolformer: Language Models Can Teach Themselves to Use Tools","url":"https://ai.meta.com/research/publications/toolformer-language-models-can-teach-themselves-to-use-tools/","published":"2024-02-21","authors":["Timo Schick","Jane Yu","Roberto Dessì","Roberta Raileanu","Maria Lomeli","Eric Hambro","Luke Zettlemoyer","Nicola Cancedda","Thomas Scialom"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["NLP"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=16"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/longrope-extending-llm-context-window-beyond-2-million-tokens","title":"LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens","url":"https://www.microsoft.com/en-us/research/publication/longrope-extending-llm-context-window-beyond-2-million-tokens/","published":"2024-02-20","authors":["Yiran Ding","Li Lyna Zhang","Chengruidong Zhang","Yuanyuan Xu","Ning Shang","Jiahang Xu","Fan Yang","Mao Yang"],"abstract":"Large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. 
This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps at within 256k training lengths, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify and exploit two forms of non-uniformities in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a progressive extension strategy that first fine-tunes a 256k length LLM a...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/text2analysis-a-benchmark-of-table-question-answering-with-advanced-data-analysis-and-unclear-queries","title":"Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries","url":"https://www.microsoft.com/en-us/research/publication/text2analysis-a-benchmark-of-table-question-answering-with-advanced-data-analysis-and-unclear-queries/","published":"2024-02-20","authors":["Xinyi He","Mengyu Zhou","Xinrun Xu","Xiaojun Ma","Rui Ding","Lun Du","Yan Gao","Ran Jia","Xu Chen","Shi Han","Zejian Yuan","Dongmei Zhang"],"abstract":"Tabular data analysis is crucial in various fields, and 
large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we developed the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond the SQL-compatible operations and require more in-depth analysis. We also develop five innovative and effective annotation methods, harnessing the capabilities of large language models to enhance data quality and quantity. Additionally, we include unclear queries that resemble real-world user questions to test how well models can understand and tackle such challenges. Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five state-of-the-art models using three different metrics and the results sho...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Data platforms and analytics","Human language technologies","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dyval-2-dynamic-evaluation-of-large-language-models-by-meta-probing-agents","title":"DyVal 2: Dynamic Evaluation of Large Language Models by Meta Probing Agents","url":"https://www.microsoft.com/en-us/research/publication/dyval-2-dynamic-evaluation-of-large-language-models-by-meta-probing-agents/","published":"2024-02-20","authors":["Kaijie Zhu","Jindong Wang","Qinlin Zhao","Ruochen Xu","Xing 
Xie"],"abstract":"Evaluation of large language models (LLMs) has raised great concerns in the community due to the issue of data contamination. Existing work designed evaluation protocols using well-defined algorithms for specific tasks, which cannot be easily extended to diverse scenarios. Moreover, current evaluation benchmarks can only provide the overall benchmark results and cannot support a fine-grained and multifaceted analysis of LLMs' abilities. In this paper, we propose meta probing agents (MPA), a general dynamic evaluation protocol inspired by psychometrics to evaluate LLMs. MPA is the key component of DyVal 2, which naturally extends the previous DyVal~\\citep{zhu2023dyval}. MPA designs the probing and judging agents to automatically transform an original evaluation problem into a new one following psychometric theory on three basic cognitive abilities: language understanding, problem solving,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/slot-vlm-slowfast-slots-for-video-language-modeling","title":"Slot-VLM: SlowFast Slots for Video-Language Modeling","url":"https://www.microsoft.com/en-us/research/publication/slot-vlm-slowfast-slots-for-video-language-modeling/","published":"2024-02-19","authors":["Jiaqi Xu","Cuiling Lan","Wenxuan Xie","Xuejin Chen","Yan 
Lu"],"abstract":"Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs. In this work, we introduce Slot-VLM, a novel framework designed to generate semantically decomposed video tokens, in terms of object-wise and event-wise visual representations, to facilitate LLM inference. Particularly, we design a SlowFast Slots module, i.e., SF-Slots, that adaptively aggregates the dense video tokens from the CLIP vision encoder to a set of representative slots. In order to take into account both the spatial object details and the varied temporal dynamics, SF-Slots is built with a dual-branch structure. The Slow-Slots branch focuses on extracting object-centric slots from features at h...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Video-Language Models","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/synthetic-data-almost-from-scratch-generalized-instruction-tuning-for-language-models","title":"Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/synthetic-data-almost-from-scratch-generalized-instruction-tuning-for-language-models/","published":"2024-02-19","authors":["Haoran Li","Qingxiu Dong","Zhengyang Tang","Chaojun Wang","Xingxing Zhang","Haoyang Huang","Shaohan Huang","Xiaolong Huang","Zeqiang Huang","Dongdong Zhang","Yuxian Gu","Xin Cheng"],"abstract":"We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs). Unlike prior work that relies on seed examples or existing datasets to construct instruction tuning data, GLAN exclusively utilizes a pre-curated taxonomy of human knowledge and capabilities as input and generates large-scale synthetic instruction data across all disciplines. Specifically, inspired by the systematic structure in human education system, we build the taxonomy by decomposing human knowledge and capabilities to various fields, sub-fields and ultimately, distinct disciplines semi-automatically, facilitated by LLMs. Subsequently, we generate a comprehensive list of subjects for every discipline and proceed to design a syllabus tailored to each subject, again utilizing LLMs. 
With the fine-grained key concepts detailed in every class se...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/hybrid-llm-cost-efficient-and-quality-aware-query-routing","title":"Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing","url":"https://www.microsoft.com/en-us/research/publication/hybrid-llm-cost-efficient-and-quality-aware-query-routing/","published":"2024-02-16","authors":["Dujian Ding","Ankur Mallick","Chi Wang","Robert Sim","Subhabrata Mukherjee","Victor Ruehle","Laks V. S. Lakshmanan","Ahmed Awadallah"],"abstract":"Large language models (LLMs) excel in most NLP tasks but also require expensive cloud servers for deployment due to their size, while smaller models that can be deployed on lower cost (e.g., edge) devices, tend to lag behind in terms of response quality. Therefore in this work we propose a hybrid inference approach which combines their respective strengths to save cost and maintain quality. Our approach uses a router that assigns queries to the small or large model based on the predicted query difficulty and the desired quality level. The desired quality level can be tuned dynamically at test time to seamlessly trade quality for cost as per the scenario requirements. 
In experiments our approach allows us to make up to 40% fewer calls to the large model, with no drop in response quality. Venue: 1970-01-01","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","large language models","Machine learning","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391884092","title":"Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation","url":"https://doi.org/10.1145/3648368","published":"2024-02-16","authors":["Yucheng Suo","Zhedong Zheng","Xiaohan Wang","Bang Zhang","Yi Yang"],"abstract":"Sign language provides a way for differently-abled individuals to express their feelings and emotions. However, learning sign language can be challenging and time consuming. An alternative approach is to animate user photos using sign language videos of specific words, which can be achieved using existing image animation methods. However, the finger motions in the generated videos are often not ideal. To address this issue, we propose the Structure-aware Temporal Consistency Network (STCNet), which jointly optimizes the prior structure of humans with temporal consistency to produce sign language videos. We use a fine-grained skeleton detector to acquire knowledge of body structure and introduce both short- and long-term cycle loss to ensure the continuity of the generated video. The two losses and keypoint detector network are optimized in an end-to-end manner. 
Quantitative and qualitati...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3648368","openalex_id":"https://openalex.org/W4391884092","cited_by_count":20,"quality_score":61,"matched_keywords":["long-term"],"author_affiliations":["Alibaba Group (China)","University of Macau","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8714147210121155},{"id":"https://openalex.org/C522192633","display_name":"Sign language","score":0.6178749799728394},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.610470712184906},{"id":"https://openalex.org/C139676723","display_name":"Sign (mathematics)","score":0.5642220377922058},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5303326845169067},{"id":"https://openalex.org/C502989409","display_name":"Animation","score":0.5043965578079224},{"id":"https://openalex.org/C2776737515","display_name":"American Sign Language","score":0.4197271764278412},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3883187472820282}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":20}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/generative-representational-instruction-tuning","title":"Generative Representational Instruction Tuning","url":"https://www.microsoft.com/en-us/research/publication/generative-representational-instruction-tuning/","published":"2024-02-15","authors":["Niklas Muennighoff","Hongjin Su","Liang Wang","Nan Yang","Furu Wei","Tao Yu","Amanpreet Singh","Douwe Kiela"],"abstract":"All text-based language problems can be reduced to either 
generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augment...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Search and information retrieval","Computer science","1970-01-01","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/bitdistiller-unleashing-the-potential-of-sub-4-bit-llms-via-self-distillation","title":"BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation","url":"https://www.microsoft.com/en-us/research/publication/bitdistiller-unleashing-the-potential-of-sub-4-bit-llms-via-self-distillation/","published":"2024-02-15","authors":["Dayou Du","Yijia 
Zhang","Shijie Cao","Jiaqi Guo","Ting Cao","Xiaowen Chu","Ningyi Xu"],"abstract":"The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the performance of LLMs at ultra-low precisions (sub-4-bit). Specifically, BitDistiller first incorporates a tailored asymmetric quantization and clipping technique to maximally preserve the fidelity of quantized weights, and then proposes a novel Confidence-Aware Kullback-Leibler Divergence (CAKLD) objective, which is employed in a self-distillation manner to enable faster convergence and superior model performance. Empirical evaluations demonstrate that BitDistiller significantly surp...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Miscellaneous","Artificial intelligence","Computation and Language","Computer science","memory","quantization","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/using-left-and-right-brains-together-towards-vision-and-language-planning","title":"Using Left and Right Brains Together: Towards Vision and Language 
Planning","url":"https://www.microsoft.com/en-us/research/publication/using-left-and-right-brains-together-towards-vision-and-language-planning/","published":"2024-02-15","authors":["Jun Cen","Chenfei Wu","Xiao Liu","Sheng-Siang Yin","Yixuan Pei","Jinglong Yang","Qifeng Chen","Nan Duan","Jianguo Zhang"],"abstract":"Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision masking capabilities on a variety of tasks. However, they inherently operate planning within the language space, lacking the vision and spatial imagination ability. In contrast, humans utilize both left and right hemispheres of the brain for language and visual planning during the thinking process. Therefore, we introduce a novel vision-language planning framework in this work to perform concurrent visual and language planning for tasks with inputs of any form. Our framework incorporates visual planning to capture intricate environmental details, while language planning enhances the logical coherence of the overall system. We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks. 
The results demonstrate the superior perfo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","Large Multi-modality Models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/orca-math-unlocking-the-potential-of-slms-in-grade-school-math","title":"Orca-Math: Unlocking the potential of SLMs in Grade School Math","url":"https://www.microsoft.com/en-us/research/publication/orca-math-unlocking-the-potential-of-slms-in-grade-school-math/","published":"2024-02-15","authors":["Arindam Mitra","Hamed Khanpour","Corby Rosset","Ahmed Awadallah"],"abstract":"Mathematical word problem-solving has long been recognized as a complex task for small language models (SLMs). A recent study hypothesized that the smallest model size, needed to achieve over 80% accuracy on the GSM8K benchmark, is 34 billion parameters. To reach this level of performance with smaller models, researcher often train SLMs to generate Python code or use tools to help avoid calculation errors. Additionally, they employ ensembling, where outputs of up to 100 model runs are combined to arrive at a more accurate result. Result selection is done using consensus, majority vote or a separate a verifier model used in conjunction with the SLM. 
Ensembling provides a substantial boost in accuracy but at a significant cost increase with multiple calls to the model (e.g., Phi-GSM uses top-48 to boost the performance from 68.2 to 81.5).In this work, we present Orca-Math, a 7-billion-para...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","preference","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:c8a9bc4b99e83351","title":"Video generation models as world simulators","url":"https://openai.com/index/video-generation-models-as-world-simulators","published":"2024-02-15","authors":["OpenAI"],"abstract":"We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. 
Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Research"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/chemreasoner-heuristic-search-over-a-large-language-models-knowledge-space-using-quantum-chemical-feedback","title":"ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback","url":"https://www.microsoft.com/en-us/research/publication/chemreasoner-heuristic-search-over-a-large-language-models-knowledge-space-using-quantum-chemical-feedback/","published":"2024-02-14","authors":["Henry W Sprueill","Carl N. Edwards","Khushbu Agarwal","Mariefel V. Olarte","Udishnu Sanyal","Conrad Johnston","Hongbin Liu","Heng Ji","Sutanay Choudhury"],"abstract":"The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. 
Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","computational chemistry","Computer science","Physics","1970-01-01","LLM","language model","efficient","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391807605","title":"Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance","url":"https://doi.org/10.1109/tvcg.2024.3365804","published":"2024-02-14","authors":["Jinbo Xing","Menghan Xia","Yuxin Liu","Yuechen Zhang","Yong Zhang","Yingqing He","Hanyuan Liu","Haoxin Chen","Xiaodong Cun","Xintao Wang","Ying Shan","Tien‐Tsin Wong"],"abstract":"Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient in conveying the overall scene context, it may be insufficient to control precisely. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g., frame-wise depth) as concrete guidance. 
Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules. This two-stage learning scheme not only reduces the computing resources required, but also improves the performance by transferring the rich concepts available in image datas...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2024.3365804","openalex_id":"https://openalex.org/W4391807605","cited_by_count":47,"quality_score":67,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Hong Kong University of Science and Technology","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8878116607666016},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6418624520301819},{"id":"https://openalex.org/C2779662365","display_name":"Event (particle physics)","score":0.48222076892852783},{"id":"https://openalex.org/C2776459999","display_name":"Fidelity","score":0.4759749472141266},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.47323712706565857},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.44280874729156494},{"id":"https://openalex.org/C2781181686","display_name":"Coherence (philosophical gambling strategy)","score":0.44216153025627136},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.43042299151420593}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":47}},{"id":"openalex:W4391807641","title":"Chinese Title Generation for Short 
Videos: Dataset, Metric and Algorithm","url":"http://dx.doi.org/10.1109/tpami.2024.3365739","published":"2024-02-14","authors":["Ziqi Zhang","Zongyang Ma","Chunfeng Yuan","Yuxin Chen","Peijin Wang","Zhongang Qi","Chenglei Hao","Bing Li","Ying Shan","Weiming Hu","Stephen J. Maybank"],"abstract":"Previous work for video captioning aims to objectively describe the video content but the captions lack human interest and attractiveness, limiting its practical application scenarios. The intention of video title generation (video titling) is to produce attractive titles, but there is a lack of benchmarks. This work offers CREATE, the first large-scale Chinese shoRt vidEo retrievAl and Title gEneration dataset, to assist research and applications in video titling, video captioning, and video retrieval in Chinese. CREATE comprises a high-quality labeled 210 K dataset and two web-scale 3 M and 10 M pre-training datasets, covering 51 categories, 50K+ tags, 537K+ manually annotated titles and captions, and 10M+ short videos with original video information. 
This work presents ACTEr, a unique Attractiveness-Consensus-based Title Evaluation, to objectively evaluate the quality of video title g...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tpami.2024.3365739","openalex_id":"https://openalex.org/W4391807641","cited_by_count":3,"quality_score":44,"matched_keywords":["retrieval"],"author_affiliations":["Aerospace Information Research Institute","Birkbeck, University of London","Chinese Academy of Sciences","Institute of Automation","ShanghaiTech University","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.9544274806976318},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8366209268569946},{"id":"https://openalex.org/C176217482","display_name":"Metric (unit)","score":0.6465070247650146},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.6424679756164551},{"id":"https://openalex.org/C103910844","display_name":"Video quality","score":0.5732909440994263},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.5187644958496094},{"id":"https://openalex.org/C2779530757","display_name":"Quality (philosophy)","score":0.464630663394928},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4249994158744812}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4391825130","title":"Enhancing multi-modal fusion in visual dialog via sample debiasing and feature interaction","url":"https://doi.org/10.1016/j.inffus.2024.102302","published":"2024-02-14","authors":["Chenyu Lu","Jun Yin","Hao 
Yang","Shiliang Sun"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.inffus.2024.102302","openalex_id":"https://openalex.org/W4391825130","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["East China Normal University","Huawei Technologies (China)","Shanghai Jiao Tong University","Shanghai Maritime University"],"concepts":[{"id":"https://openalex.org/C2779458634","display_name":"Debiasing","score":0.9861176013946533},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7774381637573242},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.666132926940918},{"id":"https://openalex.org/C173853756","display_name":"Dialog box","score":0.6660192012786865},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6352447271347046},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.614048957824707},{"id":"https://openalex.org/C198531522","display_name":"Sample (material)","score":0.5672557353973389},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5624346733093262}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/feature-reuse-and-scaling-understanding-transfer-learning-with-protein-language-models","title":"Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models","url":"https://www.microsoft.com/en-us/research/publication/feature-reuse-and-scaling-understanding-transfer-learning-with-protein-language-models/","published":"2024-02-13","authors":["Francesca-Zhoufan Li","Ava P. 
Amini","Yisong Yue","Kevin Yang","Alex Lu"],"abstract":"Large pretrained protein language models (PLMs) have improved protein property and structure prediction from sequences via transfer learning, in which weights and representations from PLMs are repurposed for downstream tasks. Although PLMs have shown great promise, currently there is little understanding of how the features learned by pretraining relate to and are useful for downstream tasks. We perform a systematic analysis of transfer learning using PLMs, conducting 370 experiments across a comprehensive suite of factors including different downstream tasks, architectures, model sizes, model depths, and pretraining time. We observe that while almost all down-stream tasks do benefit from pretrained models compared to naive sequence representations, for the majority of tasks performance does not scale with pretraining, and instead relies on low-level features learned early in pretraining...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1101/2024.02.05.578959","openalex_id":"https://openalex.org/W4391652655","cited_by_count":53,"quality_score":102,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Biology","protein language models","1970-01-01"],"author_affiliations":["Microsoft","California Institute of Technology","Microsoft (Norway)","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/dora-weight-decomposed-low-rank-adaptation","title":"DoRA: Weight-Decomposed Low-Rank 
Adaptation","url":"https://www.microsoft.com/en-us/research/publication/dora-weight-decomposed-low-rank-adaptation/","published":"2024-02-13","authors":["Shih-yang Liu","Chien-Yi Wang","Hongxu Yin","Pavlo Molchanov","Yu-Chiang Frank Wang","Kwang-Ting Cheng","Min-Hung Chen"],"abstract":"Among the widely used parameter-efficient finetuning (PEFT) methods, LoRA and its variants have gained considerable popularity because of avoiding additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed LowRank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. 
DoRA consistently ou...","companies":["Microsoft","NVIDIA"],"matched_orgs":["Microsoft","NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":92,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Fine-tuning","1970-01-01","ICML","efficient"],"author_affiliations":["Microsoft","NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"official:c426d1fa228bfcdd","title":"IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation","url":"https://ai.meta.com/research/publications/im-3d-iterative-multiview-diffusion-and-reconstruction-for-high-quality-3d-generation/","published":"2024-02-13","authors":["Luke Melas-Kyriazi","Iro Laina","Christian Rupprecht","Natalia Neverova","Andrea Vedaldi","Oran Gafni","Filippos Kokkinos"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Graphics","Computer Vision"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=16"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/embodied-agent-ai","title":"Agent AI Towards a Holistic 
Intelligence","url":"https://www.microsoft.com/en-us/research/publication/embodied-agent-ai/","published":"2024-02-12","authors":["Qiuyuan Huang","Naoki Wake","Bidipta Sarkar","Zane Durante","Ran Gong","Rohan Taori","Yusuke Noda","Demetri Terzopoulos","Noboru Kuno","Ade Famoti","Ashley J. Llorens","John Langford"],"abstract":"Recent advancements in large foundational models have remarkably enhanced our understanding of sensory information in open-world environments. At this pivotal moment, it is crucial to the AI research trend toward excessive reductionism and returning to the AI principles inspired by the holistic philosophy of Aristotle. Specifically, we emphasize developing “Agent AI”, an embodied system that integrates large foundation models into agent actions. The emerging field of Agent AI spans a wide range of existing embodied and agent-based multimodal interactions, including robotics, gaming, and diagnostic systems. We emphasize the importance of integrating recent large foundational models to enhance intelligence and interaction capabilities. 
Furthermore, we discuss how agents exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and c...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Human-computer interaction","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:wogzdbi56ii21h6ayl2dhpwc","title":"The Entity-Deduction Arena: A Playground for Probing the Conversational Reasoning and Planning Capabilities of LLMs","url":"https://machinelearning.apple.com/research/parlor-game-arena","published":"2024-02-12","authors":["Yizhe Zhang","Jiarui Lu","Navdeep Jaitly"],"abstract":"LLMs are currently effective at answering questions that are clearly asked. However, they may encounter difficulties when faced with ambiguous queries. This emphasizes the need for the development of intelligent agents capable of asking clarification questions, which require complex understanding, state tracking, and planning in multi-turn conversations. 
In this paper, we study a surrogate problem by employing entity-deducing games as evaluation...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"openalex:W4392187503","title":"Enhancing Predictive Maintenance in an Oil & Gas Refinery Using IoT, AI & ML: An Generative AI Solution","url":"https://doi.org/10.2523/iptc-23466-ms","published":"2024-02-12","authors":["Shweta Saboo","Dushyant Shekhawat"],"abstract":"Abstract Oil and gas refinery operations are under constant pressure to enhance efficiency and ensure uninterrupted processing. The adoption of predictive maintenance strategies has emerged as a pivotal solution, enabling real-time anomaly detection, predicting pressure fluctuations, and monitoring asset health. An illuminating example hails from a downstream operator in Western Australia that strategically harnesses the power of IoT and AI/ML. For them, revenue hinges on the streamlined delivery of gas processing services to customers, amplifying the significance of process efficiency gains. Leveraging on-site equipment data analysis, this approach significantly minimizes on-site maintenance requirements and automates back-office tasks, reducing manual data analysis and response generation in maintenance permit systems. 
The technical infrastructure involves wireless sensor-enabled data....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.2523/iptc-23466-ms","openalex_id":"https://openalex.org/W4392187503","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C105168734","display_name":"Refinery","score":0.7645573616027832},{"id":"https://openalex.org/C81860439","display_name":"Internet of Things","score":0.5813320875167847},{"id":"https://openalex.org/C20309002","display_name":"Oil refinery","score":0.5534124374389648},{"id":"https://openalex.org/C70452415","display_name":"Predictive maintenance","score":0.4948405623435974},{"id":"https://openalex.org/C2987168347","display_name":"Crude oil","score":0.493389755487442},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.43044331669807434},{"id":"https://openalex.org/C78762247","display_name":"Petroleum engineering","score":0.404674232006073},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.3204731345176697}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/policy-improvement-using-language-feedback-models","title":"Policy Improvement using Language Feedback Models","url":"https://www.microsoft.com/en-us/research/publication/policy-improvement-using-language-feedback-models/","published":"2024-02-11","authors":["Victor Zhong","Dipendra Misra","Xingdi Yuan","Marc-Alexandre Côté"],"abstract":"We introduce Language Feedback Models (LFMs) that identify desirable behaviour - actions that help achieve tasks specified in the instruction - for 
imitation learning in instruction following. To train LFMs, we obtain feedback from Large Language Models (LLMs) on visual trajectories verbalized to language descriptions. First, by using LFMs to identify desirable behaviour to imitate, we improve in task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens. Third, LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation. Finally, LFM can be modified to provide human-interpretable feedback without performance loss...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Language Feedback Models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/differentially-private-training-of-mixture-of-experts-models","title":"Differentially Private Training of Mixture of Experts Models","url":"https://www.microsoft.com/en-us/research/publication/differentially-private-training-of-mixture-of-experts-models/","published":"2024-02-10","authors":["Pierre Tholoniat","Huseyin A. 
Inan","Janardhan (Jana) Kulkarni","Robert Sim"],"abstract":"This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing. As Large Language Models (LLMs) scale to billions of parameters, leveraging expansive datasets, they exhibit enhanced linguistic capabilities and emergent abilities. However, this growth raises significant computational and privacy concerns. Our study addresses these issues by exploring the potential of MoE models, known for their computational efficiency, and the application of DP, a standard for privacy preservation. We present the first known attempt to train MoE models under the constraints of DP, addressing the unique challenges posed by their architecture and the complexities of DP integration. Our initial experimental studies demonstrate that MoE models can be effectively trained with DP, achieving perform...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Algorithms","Computer science","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/exploring-interaction-patterns-for-debugging-enhancing-conversational-capabilities-of-ai-assistants","title":"Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of 
AI-assistants","url":"https://www.microsoft.com/en-us/research/publication/exploring-interaction-patterns-for-debugging-enhancing-conversational-capabilities-of-ai-assistants/","published":"2024-02-09","authors":["Bhavya Chopra","Yasharth Bajpai","Param Biyani","Gustavo Soares","Arjun Radhakrishna","Chris Parnin","Sumit Gulwani"],"abstract":"The widespread availability of Large Language Models (LLMs) within Integrated Development Environments (IDEs) has led to their speedy adoption. Conversational interactions with LLMs enable programmers to obtain natural language explanations for various software development tasks. However, LLMs often leap to action without sufficient context, giving rise to implicit assumptions and inaccurate responses. Conversations between developers and LLMs are primarily structured as question-answer pairs, where the developer is responsible for asking the right questions and sustaining conversations across multiple turns. In this paper, we draw inspiration from interaction patterns and conversation analysis -- to design Robin, an enhanced conversational AI-assistant for debugging. 
Through a within-subjects user study with 12 industry professionals, we find that equipping the LLM to -- (1) leverag...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human-computer interaction","Programming languages and software engineering","Computer science","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391681328","title":"CodeKGC: Code Language Model for Generative Knowledge Graph Construction","url":"https://doi.org/10.1145/3641850","published":"2024-02-09","authors":["Zhen Bi","Jing Chen","Yinuo Jiang","Feiyu Xiong","Wei Guo","Huajun Chen","Ningyu Zhang"],"abstract":"Current generative knowledge graph construction approaches usually fail to capture structural knowledge by simply flattening natural language into serialized texts or a specification language. However, large generative language model trained on structured data such as code has demonstrated impressive capability in understanding natural language for structural prediction and reasoning tasks. Intuitively, we address the task of generative knowledge graph construction with code language model: given a code-format natural language input, the target is to generate triples which can be represented as code completion tasks. Specifically, we develop schema-aware prompts that effectively utilize the semantic structure within the knowledge graph. 
As code inherently possesses structure, such as class and function definitions, it serves as a useful model for prior semantic structural knowledge. Furt...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3641850","openalex_id":"https://openalex.org/W4391681328","cited_by_count":55,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8609890937805176},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.7258298993110657},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5448430180549622},{"id":"https://openalex.org/C195324797","display_name":"Natural language","score":0.5369093418121338},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5048323273658752},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4396895468235016},{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.4356077313423157},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.4239393472671509}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":55}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/structured-entity-extraction-using-large-language-models","title":"Learning to Extract Structured Entities Using Language Models","url":"https://www.microsoft.com/en-us/research/publication/structured-entity-extraction-using-large-language-models/","published":"2024-02-08","authors":["Haolun Wu","Ye Yuan","Liana Mikaelyan","Alexander 
Meulemans","Xue Liu","James Hensman","Bhaskar Mitra"],"abstract":"Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information extraction as triplet-centric and use classical metrics such as precision and recall for evaluation. We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights from various perspectives. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance. Later, we introduce a new model that harnesses the power of LMs for enhanced effectiveness and efficiency by decomposing the extraction task into multiple stages. Quantitative and human side-by-side evaluations conf...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Search and information retrieval","Deep learning","Information extraction","Knowledge base","Knowledge extraction","Machine learning"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/premier-taco-is-a-few-shot-policy-learner-pretraining-multitask-representation-via-temporal-action-driven-contrastive-loss","title":"Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation 
via Temporal Action-Driven Contrastive Loss","url":"https://www.microsoft.com/en-us/research/publication/premier-taco-is-a-few-shot-policy-learner-pretraining-multitask-representation-via-temporal-action-driven-contrastive-loss/","published":"2024-02-08","authors":["Ruijie Zheng","Yongyuan Liang","Xiyao Wang","Shuang Ma","Hal Daumé III","Huazhe Xu","John Langford","Praveen Palanisamy","Kalyan Shankar Basu","Furong Huang"],"abstract":"We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy is crucial in significantly boosting TACO's computational efficiency, making large-scale multitask offline pretraining feasible. 
Our extensive empirical evaluation in a diverse set of continuous control benchmarks including Deepmind Control Suite, MetaWorld, and LIBERO demonstrate Premier-TACO's effectivenes...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/culturellm-incorporating-cultural-differences-into-large-language-models","title":"CultureLLM: Incorporating Cultural Differences into Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/culturellm-incorporating-cultural-differences-into-large-language-models/","published":"2024-02-08","authors":["Cheng Li","Mengzhou Chen","Jindong Wang","Sunayana Sitaram","Xing Xie"],"abstract":"Large language models (LLMs) are reported to be partial to certain cultures owing to the training data dominance from the English corpora. Since multilingual cultural data are often expensive to collect, existing efforts handle this by prompt engineering or culture-specific pre-training. However, they might overlook the knowledge deficiency of low-resource culture and require extensive computing resources. In this paper, we propose CultureLLM, a cost-effective solution to incorporate cultural differences into LLMs. 
CultureLLM adopts World Value Survey (WVS) as seed data and generates semantically equivalent training data via the proposed semantic data augmentation. Using only 50 seed samples from WVS with augmented data, we fine-tune culture-specific LLMs and one unified model (CultureLLM-One) for 9 cultures covering rich and low-resource languages. Extensive experiments on 60 culture-re...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/fewer-is-more-boosting-llm-reasoning-with-reinforced-context-pruning","title":"Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning","url":"https://www.microsoft.com/en-us/research/publication/fewer-is-more-boosting-llm-reasoning-with-reinforced-context-pruning/","published":"2024-02-08","authors":["Xijie Huang","Li Lyna Zhang","Kwang-Ting Cheng","Fan Yang","Mao Yang"],"abstract":"Large Language Models (LLMs) have shown impressive capabilities, yet they still struggle with math reasoning. In this work, we propose CoT-Influx, a novel approach that pushes the boundary of few-shot Chain-of-Thoughts (CoT) learning to improve LLM mathematical reasoning. 
Motivated by the observation that adding more concise CoT examples in the prompt can improve LLM reasoning performance, CoT-Influx employs a coarse-to-fine pruner to maximize the input of effective and concise CoT examples. The pruner first selects as many crucial CoT examples as possible and then prunes unimportant tokens to fit the context window. A math reasoning dataset with diverse difficulty levels and reasoning steps is used to train the pruner, along with a math-specialized reinforcement learning approach. As a result, by enabling more CoT examples with double the context window size in tokens, CoT-Influx signif...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391651996","title":"Genre: generative multi-turn question answering with contrastive learning for entity–relation extraction","url":"https://doi.org/10.1007/s40747-023-01321-y","published":"2024-02-08","authors":["Lulu Wang","Kai Yu","Aishan Wumaier","Peng Zhang","Tuergen Yibulayin","Xi Wu","Jibing Gong","Maihemuti Maimaiti"],"abstract":"Abstract Extractive approaches have been the mainstream paradigm for identifying overlapping entity–relation extraction. However, limited by their inherently methodological flaws, which hardly deal with three issues: hierarchical dependent entity–relations, implicit entity–relations, and entity normalization. 
Recent advances have proposed an effective solution based on generative language models, which cast entity–relation extraction as a sequence-to-sequence text generation task. Inspired by the observation that humans learn by getting to the bottom of things, we propose a novel framework, namely GenRE, Generative multi-turn question answering with contrastive learning for entity–relation extraction. Specifically, a template-based question prompt generation first is designed to answer in different turns. We then formulate entity–relation extraction as a generative question answering tas...","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1007/s40747-023-01321-y","openalex_id":"https://openalex.org/W4391651996","cited_by_count":8,"quality_score":49,"matched_keywords":["language model"],"author_affiliations":["Xinjiang University","Yanshan University","Zhipu AI (China)","Zhong Ke San Huan (China)"],"concepts":[{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.8033431768417358},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7951103448867798},{"id":"https://openalex.org/C153604712","display_name":"Relationship extraction","score":0.7476063966751099},{"id":"https://openalex.org/C136886441","display_name":"Normalization (sociology)","score":0.6638034582138062},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6003352403640747},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.5983302593231201},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5971707105636597},{"id":"https://openalex.org/C167966045","display_name":"Generative 
model","score":0.5299457311630249}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4391651317","title":"Video Frame-wise Explanation Driven Contrastive Learning for Procedural Text Generation","url":"https://doi.org/10.1016/j.cviu.2024.103954","published":"2024-02-08","authors":["Zhihao Wang","Lin Li","Zhongwei Xie","Chuanbo Liu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.cviu.2024.103954","openalex_id":"https://openalex.org/W4391651317","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Wuhan University of Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8291727304458618},{"id":"https://openalex.org/C157657479","display_name":"Closed captioning","score":0.7167499661445618},{"id":"https://openalex.org/C126042441","display_name":"Frame (networking)","score":0.6640958189964294},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6456360816955566},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5783307552337646},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5015854835510254},{"id":"https://openalex.org/C40506919","display_name":"Sequence learning","score":0.45798978209495544},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.44421839714050293}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/multilingual-e5-text-embeddings-a-technical-report","title":"Multilingual E5 Text Embeddings: A Technical Report","url":"https://www.microsoft.com/en-us/research/publication/multilingual-e5-text-embeddings-a-technical-report/","published":"2024-02-07","authors":["Liang Wang","Nan Yang","Xiaolong Huang","Linjun Yang","Rangan Majumder","Furu Wei"],"abstract":"This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between the inference efficiency and embedding quality. The training procedure adheres to the English E5 model recipe, involving contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes. 
Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5 .","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Tech Report","Artificial intelligence","Computation and Language","Computer science","Information retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391621032","title":"Normal Transformer: Extracting Surface Geometry From LiDAR Points Enhanced by Visual Semantics","url":"https://doi.org/10.1109/tiv.2024.3363174","published":"2024-02-07","authors":["Ancheng Lin","Jun Li","Yusheng Xiang","Wei Bian","Mukesh Prasad"],"abstract":"High-quality surface normal can help improve geometry estimation in problems faced by autonomous vehicles, such as collision avoidance and occlusion inference. While a considerable volume of literature focuses on densely scanned indoor scenarios, normal estimation during autonomous driving remains an intricate problem due to the sparse, non-uniform, and noisy nature of real-world LiDAR scans. In this paper, we introduce a multi-modal technique that leverages 3D point clouds and 2D colour images obtained from LiDAR and camera sensors for surface normal estimation. We present the Hybrid Geometric Transformer (HGT), a novel transformer-based neural network architecture that proficiently fuses visual semantic and 3D geometric information. Furthermore, we developed an effective learning strategy for the multi-modal data. 
Experimental results demonstrate the superior effectiveness of our infor...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tiv.2024.3363174","openalex_id":"https://openalex.org/W4391621032","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Sunway University","University of Technology Sydney"],"concepts":[{"id":"https://openalex.org/C51399673","display_name":"Lidar","score":0.5683149695396423},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5467371940612793},{"id":"https://openalex.org/C2524010","display_name":"Geometry","score":0.49982500076293945},{"id":"https://openalex.org/C2776799497","display_name":"Surface (topology)","score":0.4924703538417816},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.45775851607322693},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3918856084346771},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.3834550380706787},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.3726547956466675}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/cataractbot-an-llm-powered-expert-in-the-loop-chatbot-for-cataract-patients","title":"CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients","url":"https://www.microsoft.com/en-us/research/publication/cataractbot-an-llm-powered-expert-in-the-loop-chatbot-for-cataract-patients/","published":"2024-02-06","authors":["Pragnya Ramjee","Bhuvan Sachdeva","Satvik Golechha","Shreyas Kulkarni","Geeta 
Fulari","Dr. Kaushik Murali","Mohit Jain"],"abstract":"The healthcare landscape is evolving, with patients seeking reliable information about their health conditions and available treatment options. Despite the abundance of information sources, the digital age overwhelms individuals with excess, often inaccurate information. Patients primarily trust medical professionals, highlighting the need for expert-endorsed health information. However, increased patient loads on experts has led to reduced communication time, impacting information sharing. To address this gap, we develop CataractBot, an experts-in-the-loop chatbot powered by LLMs, in collaboration with an eye hospital in India. CataractBot answers cataract surgery related questions instantly by querying a curated knowledge base, and provides expert-verified responses asynchronously. It has multimodal and multilingual capabilities. In an in-the-wild deployment study with 55 participants,...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Medical, health and genomics","Computer science","Human–computer interaction","Machine learning","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/open-vocabulary-calibration-for-vision-language-models","title":"Open-Vocabulary Calibration for Vision-Language 
Models","url":"https://www.microsoft.com/en-us/research/publication/open-vocabulary-calibration-for-vision-language-models/","published":"2024-02-06","authors":["Shuoyuan Wang","Jindong Wang","Guoqing Wang","Bob Zhang","Kaiyang Zhou","Hongxin Wei"],"abstract":"Vision-language models (VLMs) have emerged as formidable tools, showing their strong capability in handling various open-vocabulary tasks in image recognition, text-driven visual content generation, and visual chatbots, to name a few. In recent years, considerable efforts and resources have been devoted to adaptation methods for improving downstream performance of VLMs, particularly on parameter-efficient fine-tuning methods like prompt learning. However, a crucial aspect that has been largely overlooked is the confidence calibration problem in fine-tuned VLMs, which could greatly reduce reliability when deploying such models in the real world. This paper bridges the gap by systematically investigating the confidence calibration problem in the context of prompt learning and reveals that existing calibration methods are insufficient to address the problem, especially in the open-vocabular...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Vision-language models","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/llm-for-hybrid-workplace-decision-support","title":"Leveraging Large Language Models 
for Hybrid Workplace Decision Support","url":"https://www.microsoft.com/en-us/research/publication/llm-for-hybrid-workplace-decision-support/","published":"2024-02-06","authors":["Yujin Kim","Chin-Chia Hsu"],"abstract":"Large Language Models (LLMs) hold the potential to perform a variety of text processing tasks and provide textual explanations for proposed actions or decisions. In the era of hybrid work, LLMs can provide intelligent decision support for workers who are designing their hybrid work plans. In particular, they can offer suggestions and explanations to workers balancing numerous decision factors, thereby enhancing their work experience. In this paper, we present a decision support model for workspaces in hybrid work environments, leveraging the reasoning skill of LLMs. We first examine LLM's capability of making suitable workspace suggestions. We find that its reasoning extends beyond the guidelines in the prompt and the LLM can manage the trade-off among the available resources in the workspaces. 
We conduct an extensive user study to understand workers' decision process for workspace choic...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Human Computer Interaction","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391558462","title":"CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models","url":"https://doi.org/10.1145/3597503.3623316","published":"2024-02-06","authors":["Hao Yu","Shen Bo","Dezhi Ran","J. Y. Zhang","Qi Rong Zhang","Yuchi Ma","Guangtai Liang","Ying Li","Qianxiang Wang","Tao Xie"],"abstract":"Code generation models based on the pre-training and fine-tuning paradigm have been increasingly attempted by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the effectiveness of these models, multiple existing benchmarks (e.g., HumanEval and AiXBench) are proposed, including only cases of generating a standalone function, i.e., a function that may invoke or access only built-in functions and standard libraries. 
However, non-standalone functions, which typically are not included in the existing benchmarks, constitute more than 70% of the functions in popular open-source projects, and evaluating models' effectiveness on standalone functions cannot reflect these models' effectiveness on pragmatic code generation scenarios (i.e., code generation for real settings of open source or proprietary code).","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3597503.3623316","openalex_id":"https://openalex.org/W4391558462","cited_by_count":104,"quality_score":67,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Peking University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.788891613483429},{"id":"https://openalex.org/C185798385","display_name":"Benchmark (surveying)","score":0.7547824382781982},{"id":"https://openalex.org/C133162039","display_name":"Code generation","score":0.7349024415016174},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6595880389213562},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5964939594268799},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.5441076159477234},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.5353392362594604},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.5132491588592529}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":104}},{"id":"arxiv:2309.09867","title":"EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal 
Learning","url":"http://arxiv.org/abs/2309.09867","published":"2024-02-06","authors":["Liuqing Chen","Yunnong Chen","Shuhong Xiao","Yaxuan Song","Lingyun Sun","Yankun Zhen","Tingting Zhou","Yanfang Chang"],"abstract":"When translating UI design prototypes to code in industry, automatically generating code from design prototypes can expedite the development of applications and GUI iterations. However, in design prototypes without strict design specifications, UI components may be composed of fragmented elements. Grouping these fragmented elements can greatly improve the readability and maintainability of the generated code. Current methods employ a two-stage strategy that introduces hand-crafted rules to group fragmented elements. Unfortunately, the performance of these methods is not satisfying due to visually overlapped and tiny UI elements. In this study, we propose EGFE, a novel method for automatically End-to-end Grouping Fragmented Elements via UI sequence prediction. To facilitate the UI understanding, we innovatively construct a Transformer encoder to model the relationship between the UI eleme...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1145/3597503.3623313","openalex_id":"https://openalex.org/W4386876193","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8173873424530029},{"id":"https://openalex.org/C160713754","display_name":"Maintainability","score":0.7061383724212646},{"id":"https://openalex.org/C53016008","display_name":"Front and back 
ends","score":0.49896788597106934},{"id":"https://openalex.org/C2777904410","display_name":"Software","score":0.4887706935405731},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.44056352972984314},{"id":"https://openalex.org/C207850805","display_name":"Reverse engineering","score":0.41800978779792786},{"id":"https://openalex.org/C2780801425","display_name":"Construct (python library)","score":0.41198989748954773},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.4108327031135559}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/distillm-towards-streamlined-distillation-for-large-language-models","title":"DistiLLM: Towards Streamlined Distillation for Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/distillm-towards-streamlined-distillation-for-large-language-models/","published":"2024-02-05","authors":["Jongwoo Ko","Sungnyun Kim","Tianyi Chen","Se-Young Yun"],"abstract":"Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models (e.g., large language models) suffer from missing a standardized objective function. Moreover, the recent use of student-generated outputs to address training-inference mismatches has significantly escalated computational costs. To tackle these issues, we introduce DistiLLM, a more effective and efficient KD framework for auto-regressive language models. 
DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency in utilizing student-generated outputs. Extensive experiments, including...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Knowledge Distillation","1970-01-01","memory","efficient","distillation"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/harmbench-a-standardized-evaluation-framework-for-automated-red-teaming-and-robust-refusal","title":"HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal","url":"https://www.microsoft.com/en-us/research/publication/harmbench-a-standardized-evaluation-framework-for-automated-red-teaming-and-robust-refusal/","published":"2024-02-05","authors":["Mantas Mazeika","Long Phan","Xuwang Yin","Andy Zou","Zifan Wang","Norman Mu","Elham Sakhaee","Nathaniel Li","Steven Basart","Bo Li","David Forsyth","Dan Hendrycks"],"abstract":"Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. 
We identify several desirable properties previously unaccounted for in red teaming evaluations and systematically design HarmBench to meet these criteria. Using HarmBench, we conduct a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses, yielding novel insights. We also introduce a highly efficient adversarial training method that greatly enhances LLM robustness across a wide range of attacks, demonstrating how HarmBench enables codevelopment of attacks and defenses. We open source HarmBench at https://github.c...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/anytool-self-reflective-hierarchical-agents-for-large-scale-api-calls","title":"AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls","url":"https://www.microsoft.com/en-us/research/publication/anytool-self-reflective-hierarchical-agents-for-large-scale-api-calls/","published":"2024-02-05","authors":["Yu Du","Fangyun Wei","Hongyang Zhang"],"abstract":"We introduce AnyTool, a large language model agent designed to revolutionize the utilization of a vast array of tools in addressing user queries. We utilize over 16,000 APIs from Rapid API, operating under the assumption that a subset of these APIs could potentially resolve the queries. 
AnyTool primarily incorporates three elements: an API retriever with a hierarchical structure, a solver aimed at resolving user queries using a selected set of API candidates, and a self-reflection mechanism, which re-activates AnyTool if the initial solution proves impracticable. AnyTool is powered by the function calling feature of GPT-4, eliminating the need for training external modules. We also revisit the evaluation protocol introduced by previous works and identify a limitation in this protocol that leads to an artificially high pass rate. By revising the evaluation protocol to better reflect pract...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","Machine learning","1970-01-01","language model","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/tag-llm-repurposing-general-purpose-llms-for-specialized-domains","title":"Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains","url":"https://www.microsoft.com/en-us/research/publication/tag-llm-repurposing-general-purpose-llms-for-specialized-domains/","published":"2024-02-05","authors":["Junhong Shen","Neil Tenenholtz","James Hall","David Alvarez-Melis","Nicolo Fusi"],"abstract":"Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. 
However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn the...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391547698","title":"Text2NeRF: Text-Driven 3D Scene Generation With Neural Radiance Fields","url":"https://doi.org/10.1109/tvcg.2024.3361502","published":"2024-02-05","authors":["Jingbo Zhang","Xiaoyu Li","Ziyu Wan","Can Wang","Jing Liao"],"abstract":"Text-driven 3D scene generation is widely applicable to video gaming, film industry, and metaverse applications that have a large demand for 3D scenes. 
However, existing text-to-3D generation methods are limited to producing 3D objects with simple geometries and dreamlike styles that lack realism. In this work, we present Text2NeRF, which is able to generate a wide range of 3D scenes with complicated geometric structures and high-fidelity textures purely from a text prompt. To this end, we adopt NeRF as the 3D representation and leverage a pre-trained text-to-image diffusion model to constrain the 3D reconstruction of the NeRF to reflect the scene description. Specifically, we employ the diffusion model to infer the text-related image as the content prior and use a monocular depth estimation method to offer the geometric prior. Both content and geometric priors are utilized to update the...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tvcg.2024.3361502","openalex_id":"https://openalex.org/W4391547698","cited_by_count":70,"quality_score":67,"matched_keywords":[],"author_affiliations":["City University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.841161847114563},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6616995930671692},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.624915361404419},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.6055931448936462},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.4546353220939636},{"id":"https://openalex.org/C177769412","display_name":"Prior probability","score":0.45036551356315613},{"id":"https://openalex.org/C11727466","display_name":"Inpainting","score":0.43637481331825256},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.34301692247390747}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":70}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/the-essential-role-of-causality-in-foundation-world-models-for-embodied-ai","title":"The Essential Role of Causality in Foundation World Models for Embodied AI","url":"https://www.microsoft.com/en-us/research/publication/the-essential-role-of-causality-in-foundation-world-models-for-embodied-ai/","published":"2024-02-05","authors":["Tarun Gupta","Wenbo Gong","Chao Ma","Nick Pawlowski","Agrin Hilmkil","M. Scetbon","Ade Famoti","A. Llorens","Jianfeng Gao","Stefan Bauer","Danica Kragic","Bernhard Schölkopf"],"abstract":"Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitating meaningful physical interactions with the world. 
Fin...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2402.03300","title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","url":"https://huggingface.co/papers/2402.03300","published":"2024-02-05","authors":["DeepSeek"],"abstract":"Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. 
Second, we introduce Group Relative Policy Optimization (GRPO...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","deepseek-ai","memory"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W4395030094","title":"Label-Efficient Sleep Staging Using Transformers Pretrained with Position Prediction","url":"http://dx.doi.org/10.1109/aimhc59811.2024.00023","published":"2024-02-05","authors":["Sayeri Lala","Hanlin Goh","Christopher M. Sandino"],"abstract":"Sleep staging is a clinically important task for diagnosing various sleep disorders, but remains challenging to deploy at scale because it is both labor-intensive and time-consuming. Supervised deep learning-based approaches can automate sleep staging but at the expense of large labeled datasets, which can be unfeasible to procure for various settings, e.g., uncommon sleep disorders. While self-supervised learning (SSL) can mitigate this need, recent studies on SSL for sleep staging have shown performance gains saturate after training with labeled data from only tens of subjects, hence are unable to match peak performance attained with larger datasets. 
We hypothesize that the rapid saturation stems from applying a sub-optimal pretraining scheme that pretrains only a portion of the architecture, i.e., the feature encoder, but not the temporal encoder; therefore, we propose adopt...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/aimhc59811.2024.00023","openalex_id":"https://openalex.org/W4395030094","cited_by_count":0,"quality_score":41,"matched_keywords":["efficient"],"author_affiliations":["Apple (United States)","Princeton University"],"concepts":[{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.7139014601707458},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.667367696762085},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49025920033454895},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.45265334844589233},{"id":"https://openalex.org/C2775841894","display_name":"Sleep (system call)","score":0.44273698329925537},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3699248433113098},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.33501869440078735},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.17464497685432434}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:3e7f770ad85bcdad","title":"Introducing Qwen1.5","url":"https://qwenlm.github.io/blog/qwen1.5/","published":"2024-02-04","authors":["Alibaba/Qwen"],"abstract":"In recent months, our focus has been on developing a “good” model while optimizing the developer 
experience. As we progress towards Qwen1.5, the next iteration in our Qwen series, this update arrives just before the Chinese New Year. With Qwen1.5, we are open-sourcing base and chat models across eight sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B, and also an MoE model (see blog for more information).","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/nofuneval-funny-how-code-lms-falter-on-requirements-beyond-functional-correctness","title":"NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness","url":"https://www.microsoft.com/en-us/research/publication/nofuneval-funny-how-code-lms-falter-on-requirements-beyond-functional-correctness/","published":"2024-02-02","authors":["Manav Singhal","Tushar Aggarwal","Abhijeet Awasthi","Nagarajan Natarajan","Aditya Kanade"],"abstract":"Existing evaluation benchmarks of language models of code (code LMs) focus almost exclusively on whether the LMs can generate functionally-correct code. In real-world software engineering, developers think beyond functional correctness. They have requirements on \"how\" a functionality should be implemented to meet overall system design objectives like efficiency, security, and maintainability. They would also trust the code LMs more if the LMs demonstrate robust understanding of requirements and code semantics. 
We propose a new benchmark NoFunEval to evaluate code LMs on non-functional requirements and simple classification instances for both functional and non-functional requirements. We propose a prompting method, Coding Concepts (CoCo), as a way for a developer to communicate the domain knowledge to the LMs. We conduct an extensive evaluation of twenty-two code LMs. Our finding is that t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Programming languages and software engineering","Computation and Language","Computer science","software engineering"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/unilog-automatic-logging-via-llm-and-in-context-learning","title":"UniLog: Automatic Logging via LLM and In-Context Learning","url":"https://www.microsoft.com/en-us/research/publication/unilog-automatic-logging-via-llm-and-in-context-learning/","published":"2024-02-01","authors":["Junjielong Xu","Ziang Cui","Yuan Zhao","Xu Zhang","Shilin He","Pinjia He","Liqun Li","Yu Kang","Qingwei Lin 林庆维","Yingnong Dang","Saravan Rajmohan","Dongmei Zhang"],"abstract":"Logging, which aims to determine the position of logging statements, the verbosity levels, and the log messages, is a crucial process for software reliability enhancement. In recent years, numerous automatic logging tools have been designed to assist developers in one of the logging tasks (e.g., providing suggestions on whether to log in try-catch blocks). 
These tools are useful in certain situations yet cannot provide a comprehensive logging solution in general. Moreover, although recent research has started to explore end-to-end logging, it is still largely constrained by the high cost of fine-tuning, hindering its practical usefulness in software development. To address these problems, this paper proposes UniLog, an automatic logging framework based on the in-context learning (ICL) paradigm of large language models (LLMs). Specifically, UniLog can generate an appropriate logging sta...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3597503.3623326","openalex_id":"https://openalex.org/W4391558438","cited_by_count":55,"quality_score":122,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Data platforms and analytics","Programming languages and software engineering","Systems and networking","AIOps","software engineering","1970-01-01","LLM"],"author_affiliations":["Microsoft","Chinese University of Hong Kong, Shenzhen","Microsoft (United States)","Microsoft Research Asia (China)","Peking University","Southeast University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/interactive-agent-foundation-model","title":"An Interactive Agent Foundation Model","url":"https://www.microsoft.com/en-us/research/publication/interactive-agent-foundation-model/","published":"2024-02-01","authors":["Zane Durante","Bidipta Sarkar","Ran Gong","Rohan Taori","Yusuke Noda","Paul Tang","Ehsan Adeli","Shrinidhi Kowshika Lakshmikanth","Kevin Schulman","Arnold 
Milstein","Demetri Terzopoulos","Ade Famoti"],"abstract":"The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains: Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvprw67362.2025.00350","openalex_id":"https://openalex.org/W4414199207","cited_by_count":5,"quality_score":105,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Human-computer interaction","Medical, health and genomics","Agent AI","Deep learning","Embodied AI","Gaming","Health care","Robotics","agent"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research (United Kingdom)","Stanford University","UCLA Health"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API 
https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/xpert-empowering-incident-management-with-query-recommendations-via-large-language-models","title":"Xpert: Empowering Incident Management with Query Recommendations via Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/xpert-empowering-incident-management-with-query-recommendations-via-large-language-models/","published":"2024-02-01","authors":["Yuxuan Jiang","Chaoyun Zhang","Shilin He","Zhihao Yang","Minghua Ma","Si Qin","Yu Kang","Yingnong Dang","Saravan Rajmohan","Qingwei Lin 林庆维","Dongmei Zhang"],"abstract":"Large-scale cloud systems play a pivotal role in modern IT infrastructure. However, incidents occurring within these systems can lead to service disruptions and adversely affect user experience. To swiftly resolve such incidents, on-call engineers depend on crafting domain-specific language (DSL) queries to analyze telemetry data. However, writing these queries can be challenging and time-consuming. This paper presents a thorough empirical study on the utilization of queries of XQL, a DSL employed for incident management in a large-scale cloud management system at CompanyX. The findings obtained underscore the importance and viability of XQL queries recommendation to enhance incident management. Building upon these valuable insights, we introduce Xpert, an end-to-end machine learning framework that automates XQL recommendation process. 
By leveraging historical incident data and large lang...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Algorithms","Artificial intelligence","Data platforms and analytics","Programming languages and software engineering","Systems and networking","AIOps","software engineering","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391429431","title":"The impact of chatbots based on large language models on second language vocabulary acquisition","url":"https://doi.org/10.1016/j.heliyon.2024.e25370","published":"2024-02-01","authors":["Zhihui Zhang","Xiaomeng Huang"],"abstract":"In recent years, the integration of artificial intelligence (AI) and machine learning (ML) into education, particularly for Personalized Language Learning (PLL), has garnered significant attention. This approach tailors interventions to address the unique challenges faced by individual learners. Large Language Models (LLMs), including Chatbots, have demonstrated a substantial potential in automating and enhancing educational tasks, effectively capturing the complexity and diversity of human language. In this study, 52 foreign language students were randomly divided into two groups: one with the assistance of a Chatbot based on LLMs and one without. Both groups learned the same series of target words over eight weeks. 
Post-treatment assessments, including systematic observation and quantitative tests assessing both receptive and productive vocabulary knowledge, were conducted immediately....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1016/j.heliyon.2024.e25370","openalex_id":"https://openalex.org/W4391429431","cited_by_count":74,"quality_score":79,"matched_keywords":["LLM","personalized","long-term"],"author_affiliations":["Alibaba Group (China)","University of Southern California"],"concepts":[{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.7826476097106934},{"id":"https://openalex.org/C2779041454","display_name":"Chatbot","score":0.7331823110580444},{"id":"https://openalex.org/C74672266","display_name":"Language acquisition","score":0.5342264175415039},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5319775938987732},{"id":"https://openalex.org/C2781316041","display_name":"Diversity (politics)","score":0.42734211683273315},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42186301946640015},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3827829360961914},{"id":"https://openalex.org/C145420912","display_name":"Mathematics education","score":0.22920900583267212}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":74}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ores-open-vocabulary-responsible-visual-synthesis","title":"ORES: Open-vocabulary Responsible Visual Synthesis","url":"https://www.microsoft.com/en-us/research/publication/ores-open-vocabulary-responsible-visual-synthesis/","published":"2024-02-01","authors":["Minheng Ni","Chenfei 
Wu","Xiaodong Wang","Shengming Yin","Lijuan Wang","Zicheng Liu","Nan Duan"],"abstract":"Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concept that needs to be avoided for responsible visual synthesis tends to be diverse, depending on the region, context, and usage scenarios. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By introducing 1) rewriting with learnable instruction through a large-scale language model (LLM) and 2) synthesizing with prompt intervention on a diffusion synthesis model, it can effectively synthesize images avoiding any concepts but following the user's query as much as possible. To evaluate on ORES, we provide a publicly available dat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1609/aaai.v38i19.30144","openalex_id":"https://openalex.org/W4393146681","cited_by_count":5,"quality_score":77,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01","LLM","language model"],"author_affiliations":["Microsoft","Microsoft (United States)","Microsoft Research Asia (China)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/flame-a-small-language-model-for-spreadsheet-formulas","title":"FLAME: A Small Language Model for Spreadsheet 
Formulas","url":"https://www.microsoft.com/en-us/research/publication/flame-a-small-language-model-for-spreadsheet-formulas/","published":"2024-02-01","authors":["Harshit Joshi","Abishai Ebenezer","José Cambronero","Sumit Gulwani","Aditya Kanade","Vu Le","Ivan Radicek","Gust Verbruggen"],"abstract":"Spreadsheets are a vital tool for end-user data management. Using large language models for formula authoring assistance in these environments can be difficult, as these models are expensive to train and challenging to deploy due to their size (up to billions of parameters). We present FLAME, a transformer-based model trained exclusively on Excel formulas that leverages domain insights to achieve competitive performance while being substantially smaller (60M parameters) and training on two orders of magnitude less data. We curate a training dataset using sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction and noisy auto-encoding as pre-training objectives. We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval. 
FLAME can outperform much larger models, such as the Davinci (175B) and....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","spreadsheets","1970-01-01","language model","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/toward-human-ai-alignment-in-large-scale-multi-player-games","title":"Toward Human-AI Alignment in Large-Scale Multi-Player Games","url":"https://www.microsoft.com/en-us/research/publication/toward-human-ai-alignment-in-large-scale-multi-player-games/","published":"2024-02-01","authors":["Sugandha Sharma","Guy Davidson","Khimya Khetarpal","Anssi Kanervisto","Udit Arora","Katja Hofmann","Ida Momennejad"],"abstract":"Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. 
Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-l...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Tech Report","Artificial intelligence","Computer science","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/ironies-of-generative-ai-understanding-and-mitigating-productivity-loss-in-human-ai-interactions","title":"Ironies of Generative AI: Understanding and mitigating productivity loss in human-AI interactions","url":"https://www.microsoft.com/en-us/research/publication/ironies-of-generative-ai-understanding-and-mitigating-productivity-loss-in-human-ai-interactions/","published":"2024-02-01","authors":["Auste Simkute","Lev Tankelevitch","Viktor Kewenig","Ava Elizabeth Scott","Abigail Sellen","Sean Rintel"],"abstract":"Generative AI (GenAI) systems offer opportunities to increase user productivity in many tasks, such as programming and writing. However, while they boost productivity in some studies, many others show that users are working ineffectively with GenAI systems and losing productivity. Despite the apparent novelty of these usability challenges, these 'ironies of automation' have been observed for over three decades in Human Factors research on the introduction of automation in domains such as aviation, automated driving, and intelligence. 
We draw on this extensive research alongside recent GenAI user studies to outline four key reasons for productivity loss with GenAI systems: a shift in users' roles from production to evaluation, unhelpful restructuring of workflows, interruptions, and a tendency for automation to make easy tasks easier and hard tasks harder. We then suggest how Human Factor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Human-computer interaction","Human–computer interaction","personalization"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391449530","title":"AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding","url":"https://doi.org/10.1186/s13059-024-03166-1","published":"2024-02-01","authors":["Lingyan Zheng","Shuiyang Shi","Mingkun Lu","Fang Pan","Ziqi Pan","Hongning Zhang","Zhimeng Zhou","Hanyu Zhang","Minjie Mou","Shijie Huang","Lin Tao","Weiqi Xia"],"abstract":"Protein function annotation has been one of the longstanding issues in biological sciences, and various computational methods have been developed. However, the existing methods suffer from a serious long-tail problem, with a large number of GO families containing few annotated proteins. 
Herein, an innovative strategy named AnnoPRO was therefore constructed by enabling sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory-based decoding. A variety of case studies based on different benchmarks were conducted, which confirmed the superior performance of AnnoPRO among available methods. Source code and models have been made freely available at: https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1186/s13059-024-03166-1","openalex_id":"https://openalex.org/W4391449530","cited_by_count":64,"quality_score":71,"matched_keywords":["memory"],"author_affiliations":["Alibaba Group (China)","Cloud Computing Center","East China University of Science and Technology","Hangzhou Normal University","Second Affiliated Hospital of Zhejiang University","Tsinghua University","Zhejiang Lab","Zhejiang Provincial People's Hospital","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C2776321320","display_name":"Annotation","score":0.797468900680542},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6085020303726196},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.5952669382095337},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5761868357658386},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.551943302154541},{"id":"https://openalex.org/C14036430","display_name":"Function (biology)","score":0.5482176542282104},{"id":"https://openalex.org/C2777735758","display_name":"Path (computing)","score":0.5441492795944214},{"id":"https://openalex.org/C207060522","display_name":"Protein function 
prediction","score":0.49449893832206726}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":64}},{"id":"openalex:W4391735184","title":"Self-supervised learning based on Transformer for flow reconstruction and prediction","url":"https://doi.org/10.1063/5.0188998","published":"2024-02-01","authors":["Bonan Xu","Yuanye Zhou","Xin Bian"],"abstract":"Machine learning has great potential for efficient reconstruction and prediction of flow fields. However, existing datasets may have highly diversified labels for different flow scenarios, which are not applicable for training a model. To this end, we make a first attempt to apply the self-supervised learning (SSL) technique to fluid dynamics, which disregards data labels for pre-training the model. The SSL technique embraces a large amount of data (8000 snapshots) at Reynolds numbers of Re = 200, 300, 400, and 500 without discriminating between them, which improves the generalization of the model. The Transformer model is pre-trained via a specially designed pretext task, where it reconstructs the complete flow fields after randomly masking 20% data points in each snapshot. 
For the downstream task of flow reconstruction, the pre-trained model is fine-tuned separately with 256 snapshots....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1063/5.0188998","openalex_id":"https://openalex.org/W4391735184","cited_by_count":25,"quality_score":66,"matched_keywords":["efficient"],"author_affiliations":["Baidu (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C55282118","display_name":"Snapshot (computer storage)","score":0.6352566480636597},{"id":"https://openalex.org/C182748727","display_name":"Reynolds number","score":0.6107428073883057},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5572068095207214},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5413851737976074},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5409293174743652},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.4985041618347168},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.4168534576892853},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.32116425037384033}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":25}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/machine-created-universal-language-for-cross-lingual-transfer","title":"Machine-Created Universal Language for Cross-lingual Transfer","url":"https://www.microsoft.com/en-us/research/publication/machine-created-universal-language-for-cross-lingual-transfer/","published":"2024-02-01","authors":["Yaobo Liang","Quanzhi Zhu","Junhe Zhao","Nan Duan"],"abstract":"There are two primary approaches 
to addressing cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of various languages, and translate-test, which explicitly translates different languages into an intermediate language, such as English. Translate-test offers better interpretability compared to multilingual pre-training. However, it has lower performance than multilingual pre-training (Conneau and Lample, 2019; Conneau et al., 2020) and struggles with word-level tasks due to translation altering word order. As a result, we propose a new Machine-created Universal Language (MUL) as an alternative intermediate language. MUL comprises a set of discrete symbols forming a universal vocabulary and a natural language to MUL translator for converting multiple natural languages to MUL. MUL unifies shared concepts from various languages into a single u...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:mry34sc281id0dhm56bv5640","title":"Scalable Pre-training of Large Autoregressive Image Models","url":"https://machinelearning.apple.com/research/autoregressive-image-models","published":"2024-02-01","authors":["Alaaeldin El-Nouby","Michal Klein","Shuangfei Zhai","Miguel Angel Bautista","Alexander Toshev","Vaishaal Shankar","Joshua M Susskind","Armand Joulin"],"abstract":"This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. 
These models are inspired by their textual counterparts, i.e., Large Language Models (LLMs), and exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the performance of the visual features scale with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"official:1f640ba83798f23c","title":"ConsiStory: Training-Free Consistent Text-to-Image Generation","url":"https://research.nvidia.com/publication/2024-02_consistory-training-free-consistent-text-image-generation","published":"2024-02","authors":["Yoad Tewel","Omri Kaduri","Rinon Gal","Yoni Kasten","Lior Wolf","Gal Chechik","Yuval Atzmon"],"abstract":"Official NVIDIA Research publication. 
SIGGRAPH","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["SIGGRAPH"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=3"}},{"id":"official:10f325041b8638b7","title":"Building an early warning system for LLM-aided biological threat creation","url":"https://openai.com/index/building-an-early-warning-system-for-llm-aided-biological-threat-creation","published":"2024-01-31","authors":["OpenAI"],"abstract":"We’re developing a blueprint for evaluating the risk that a large language model (LLM) could aid someone in creating a biological threat. In an evaluation involving both biology experts and students, we found that GPT-4 provides at most a mild uplift in biological threat creation accuracy. 
While this uplift is not large enough to be conclusive, our finding is a starting point for continued research and community deliberation.","companies":["OpenAI"],"matched_orgs":["OpenAI"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Research","LLM","language model"],"author_affiliations":["OpenAI"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://openai.com/news/rss.xml"}},{"id":"arxiv:2403.13291","title":"An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models","url":"http://arxiv.org/abs/2403.13291","published":"2024-01-31","authors":["Qi Liu","Gang Guo","Jiaxin Mao","Zhicheng Dou","Ji-Rong Wen","Hao Jiang","Xinyu Zhang","Zhao Cao"],"abstract":"With the development of pre-trained language models, the dense retrieval models have become promising alternatives to the traditional retrieval models that rely on exact match and sparse bag-of-words representations. Different from most dense retrieval models using a bi-encoder to encode each query or document into a dense vector, the recently proposed late-interaction multi-vector models (i.e., ColBERT and COIL) achieve state-of-the-art retrieval effectiveness by using all token embeddings to represent documents and queries and modeling their relevance with a sum-of-max operation. However, these fine-grained representations may cause unacceptable storage overhead for practical search systems. 
In this study, we systematically analyze the matching mechanism of these late-interaction models and show that the sum-of-max operation heavily relies on the co-occurrence signals and some importan...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3639818","openalex_id":"https://openalex.org/W4391402460","cited_by_count":4,"quality_score":45,"matched_keywords":["retrieval"],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9002099633216858},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.7238892912864685},{"id":"https://openalex.org/C108010975","display_name":"Pruning","score":0.7008170485496521},{"id":"https://openalex.org/C48145219","display_name":"Security token","score":0.6789207458496094},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5361430644989014},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5050435662269592},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.503027617931366},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.49175336956977844}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"arxiv:2401.17723","title":"LoRec: Large Language Model for Robust Sequential Recommendation against Poisoning Attacks","url":"https://huggingface.co/papers/2401.17723","published":"2024-01-31","authors":["Kaike Zhang","Qi Cao","Yunfan Wu","Fei Sun","Huawei Shen","Xueqi Cheng"],"abstract":"Sequential recommender systems stand out for their ability to capture users' dynamic 
interests and the patterns of item-to-item transitions. However, the inherent openness of sequential recommender systems renders them vulnerable to poisoning attacks, where fraudulent users are injected into the training data to manipulate learned patterns. Traditional defense strategies predominantly depend on predefined assumptions or rules extracted from specific known attacks, limiting their generalizability to unknown attack types. To solve the above problems, considering the rich open-world knowledge encapsulated in Large Language Models (LLMs), our research initially focuses on the capabilities of LLMs in the detection of unknown fraudulent activities within recommender systems, a strategy we denote as LLM4Dec. Empirical evaluations demonstrate the substantial capability of LLMs in identifying unk...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":35,"matched_keywords":["LLM","language model"],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"openalex:W4391349906","title":"H3-OPT: Accurate prediction of CDR-H3 loop structures of antibodies with deep learning","url":"https://doi.org/10.7554/elife.91512.2","published":"2024-01-30","authors":["Hedi Chen","Xiaoyu Fan","Shuqian Zhu","Yuchan Pei","Xiaochun Zhang","Xiaonan Zhang","Lihang Liu","Feng Qian","Boxue Tian"],"abstract":"Abstract Accurate prediction of the structurally diverse complementarity determining region heavy chain 3 (CDR-H3) loop structure remains a primary and long-standing challenge for antibody modeling. Here, we present the H3-OPT toolkit for predicting the 3D structures of monoclonal antibodies and nanobodies. 
H3-OPT combines the strengths of AlphaFold2 with a pre-trained protein language model, and provides a 2.24 Å average RMSDCα between predicted and experimentally determined CDR-H3 loops, thus outperforming other current computational methods in our non-redundant high-quality dataset. The model was validated by experimentally solving three structures of anti-VEGF nanobodies predicted by H3-OPT. We examined the potential applications of H3-OPT through analyzing antibody surface properties and antibody-antigen interactions. This structural prediction tool can be used to optimize antibody-...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.7554/elife.91512.2","openalex_id":"https://openalex.org/W4391349906","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","Molecular Oncology (United States)","State Key Laboratory of Molecular Oncology","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C184670325","display_name":"Loop (graph theory)","score":0.596560537815094},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.47578418254852295},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.43609195947647095},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.2466331124305725},{"id":"https://openalex.org/C114614502","display_name":"Combinatorics","score":0.06353098154067993}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/strokenuwa-tokenizing-strokes-for-vector-graphic-synthesis","title":"StrokeNUWA: Tokenizing Strokes for Vector Graphic 
Synthesis","url":"https://www.microsoft.com/en-us/research/publication/strokenuwa-tokenizing-strokes-for-vector-graphic-synthesis/","published":"2024-01-29","authors":["Zecheng Tang","Chenfei Wu","Zekai Zhang","Mingheng Ni","Sheng-Siang Yin","Yu Liu","Zhengyuan Yang","Lijuan Wang","Zicheng Liu","Juntao Li","Nan Duan"],"abstract":"To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natural and semantically coherent segmentation of the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation ''stroke tokens'' on vector graphics, which is inherently visual semantics rich, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. 
Besides, StrokeNUWA achiev...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","LLM","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"apple:l8awpq54c0conr1pxxa4sn65","title":"Acoustic Model Fusion for End-to-end Speech Recognition","url":"https://machinelearning.apple.com/research/acoustic-model-fusion","published":"2024-01-29","authors":["Zhihong Lei","Mingbin Xu","Shiyi Han","Leo Liu","Zhen Huang","Tim Ng","Yuanyuan Zhang","Ernest Pusateri","Mirko Hannemann","Yaqiao Deng","Man-Hung Siu"],"abstract":"Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted its accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. 
Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"apple:jtjzppzu7gmb4028emcqoqkb","title":"Large-scale Training of Foundation Models for Wearable Biosignals","url":"https://machinelearning.apple.com/research/large-scale-training","published":"2024-01-29","authors":["Salar Abbaspourazad","Oussama Elachqar","Andrew C. Miller","Saba Emrani","Udhyakumar Nallasamy","Ian Shapiro"],"abstract":"Tracking biosignals is crucial for monitoring wellness and preempting the development of severe medical conditions. Today, wearable devices can conveniently record various biosignals, creating the opportunity to monitor health status without disruption to one's daily routine. 
Despite the widespread use of wearable devices and existing digital biomarkers, the absence of curated data with annotated medical labels hinders the development of new...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":52,"matched_keywords":[],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/slicegpt-compress-large-language-models-by-deleting-rows-and-columns","title":"SliceGPT: Compress Large Language Models by Deleting Rows and Columns","url":"https://www.microsoft.com/en-us/research/publication/slicegpt-compress-large-language-models-by-deleting-rows-and-columns/","published":"2024-01-25","authors":["Saleh Ashkboos","Maximilian L. Croci","Marcelo Gennari do Nascimento","Torsten Hoefler","James Hensman"],"abstract":"Large language models have become the cornerstone of natural language processing, but their use comes with substantial costs in terms of compute and memory resources. Sparsification provides a solution to alleviate these resource constraints, and recent works have shown that trained models can be sparsified post-hoc. Existing sparsification techniques face challenges as they need additional data structures and offer constrained speedup with current hardware. In this paper we present SliceGPT, a new post-training sparsification scheme which replaces each weight matrix with a smaller (dense) matrix, reducing the embedding dimension of the network. 
Through extensive experimentation, we show that SliceGPT can remove up to 25% of the model parameters (including embeddings) for LLAMA2-70B, OPT 66B and Phi-2 models while maintaining 99%, 99% and 90% zero-shot task performance of the dense model...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","Computer science","large language models","Natural language processing","1970-01-01","memory"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/eagle-speculative-sampling-requires-rethinking-feature-uncertainty","title":"EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty","url":"https://www.microsoft.com/en-us/research/publication/eagle-speculative-sampling-requires-rethinking-feature-uncertainty/","published":"2024-01-25","authors":["Yuhui Li","Fangyun Wei","Chao Zhang","Hongyang Zhang"],"abstract":"Autoregressive decoding makes the inference of Large Language Models (LLMs) time-consuming. In this paper, we reconsider speculative sampling and derive two key observations. Firstly, autoregression at the feature (second-to-top-layer) level is more straightforward than at the token level. Secondly, the inherent uncertainty in feature (second-to-top-layer) level autoregression constrains its performance. 
Based on these insights, we introduce EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a simple yet highly efficient speculative sampling framework. By incorporating a token sequence advanced by one time step, EAGLE effectively resolves the uncertainty, enabling precise second-to-top-layer feature prediction with minimal overhead. We conducted comprehensive evaluations of EAGLE, including all models from the Vicuna and LLaMA2-Chat series, the MoE model Mixtral 8x7B....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01","efficient"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2401.14196","title":"DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence","url":"https://huggingface.co/papers/2401.14196","published":"2024-01-25","authors":["DeepSeek"],"abstract":"The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce the DeepSeek-Coder series, a range of open-source code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion tokens. These models are pre-trained on a high-quality project-level code corpus and employ a fill-in-the-blank task with a 16K window to enhance code generation and infilling. 
Our extensive evaluations demonstrate that DeepSeek-Coder not only achieves state-of-the-art performance among open-source code models across multiple benchmarks but also surpasses existing closed-source models like Codex and GPT-3.5. Furthermore, DeepSeek-Coder models are under a permissive license that allows for both research and unrestricted commercia...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["HuggingFace org papers","deepseek-ai","language model"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"official:d981b6869add5eb4","title":"Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training?","url":"https://ai.meta.com/research/publications/minimax-estimation-for-personalized-federated-learning-an-alternative-between-fedavg-and-local-training/","published":"2024-01-25","authors":["Shuxiao Chen","Qinqing Zheng","Qi Long","Weijie Su"],"abstract":"Official Meta AI publication page.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Core Machine Learning","personalized"],"author_affiliations":["Meta/FAIR"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Meta AI publication search https://ai.meta.com/results/?content_types%5B0%5D=publication&page=16"}},{"id":"official:bd73e0a4850a7e19","title":"Introducing 
Qwen-VL","url":"https://qwenlm.github.io/blog/qwen-vl/","published":"2024-01-25","authors":["Alibaba/Qwen"],"abstract":"Along with the rapid development of our large language model Qwen, we leveraged Qwen’s capabilities and unified multimodal pretraining to address the limitations of multimodal models in generalization, and we opensourced multimodal model Qwen-VL in Sep. 2023. Recently, the Qwen-VL series has undergone a significant upgrade with the launch of two enhanced versions, Qwen-VL-Plus and Qwen-VL-Max. The key technical advancements in these versions include:Substantially boost in image-related reasoning capabilities; Considerable enhancement in recognizing, extracting, and analyzing details within images and texts contained therein; Support for high-definition images with resolutions above one million pixels and images of various aspect ratios.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["language model"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/leveraging-large-language-models-for-collective-decision-making","title":"Leveraging Large Language Models for Collective Decision-Making","url":"https://www.microsoft.com/en-us/research/publication/leveraging-large-language-models-for-collective-decision-making/","published":"2024-01-24","authors":["Marios Papachristou","Longqi Yang","Chin-Chia Hsu"],"abstract":"In various work contexts, such as meeting scheduling, collaborating, and project planning, collective decision-making is essential but often challenging due to diverse 
individual preferences, varying work focuses, and power dynamics among members. To address this, we propose a system leveraging Large Language Models (LLMs) to facilitate group decision-making by managing conversations and balancing preferences among individuals. Our system aims to extract individual preferences from conversations and suggest options that satisfy the preferences of the members. We specifically apply this system to corporate meeting scheduling. We create synthetic employee profiles and simulate conversations at scale, leveraging LLMs to evaluate the system performance as a novel approach to conducting a user study. Our results indicate efficient coordination with reduced interactions between the members and...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3757418","openalex_id":"https://openalex.org/W4415250585","cited_by_count":0,"quality_score":84,"matched_keywords":["Unpublished","Artificial intelligence","Human-computer interaction","Computation and Language","Human–computer interaction","large language models","LLM","efficient"],"author_affiliations":["Microsoft","Arizona State University","Cornell University","Microsoft (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4391165314","title":"Learning Hierarchical Fingerprints via Multi-Level Fusion for Video Integrity and Source Analysis","url":"https://doi.org/10.1109/tce.2024.3357977","published":"2024-01-24","authors":["Yuanman Li","Jiaxiong Ye","Limin Zeng","Rongqin Liang","Xianwei Zheng","Weiwei Sun","Na Wang"],"abstract":"As a prevalent form of multimodal data, video data plays a crucial role in 
numerous applications, offering various benefits. Meanwhile, video integrity and source issues also pose security risks. Video data is multimodal, containing a container describing video coding and packaging, along with a video data stream featuring visual and audio information. Many works on video integrity and source analysis focus on video containers, and they overlook the fact that a malicious user can readily manipulate these traces within the containers by reconstructing them without transcoding. In our research, we propose a hierarchical fingerprint learning framework through multi-level fusion for video integrity and source analysis. Our approach integrates video encoding attributes, extracting multi-level features from both decoded video key frames and reference frames. We model the dependencies between t...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tce.2024.3357977","openalex_id":"https://openalex.org/W4391165314","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Foshan University","Shenzhen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8416259288787842},{"id":"https://openalex.org/C153876917","display_name":"Traceability","score":0.5847100615501404},{"id":"https://openalex.org/C202474056","display_name":"Video tracking","score":0.5798553824424744},{"id":"https://openalex.org/C65483669","display_name":"Video processing","score":0.5585997700691223},{"id":"https://openalex.org/C43126263","display_name":"Source code","score":0.4653205871582031},{"id":"https://openalex.org/C179518139","display_name":"Coding (social sciences)","score":0.46519505977630615},{"id":"https://openalex.org/C2776401178","display_name":"Feature 
(linguistics)","score":0.4517485499382019},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.4342452883720398}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4391128584","title":"Personalized Prompt for Sequential Recommendation","url":"https://doi.org/10.1109/tkde.2024.3357498","published":"2024-01-23","authors":["Yiqing Wu","Ruobing Xie","Yongchun Zhu","Fuzhen Zhuang","Xu Zhang","Leyu Lin","Qing He"],"abstract":"Pre-training models have shown their power in sequential recommendation. Recently, prompt has been widely explored and verified for tuning after pre-training in NLP, which helps to more effectively and parameter-efficiently extract useful knowledge from pre-training models for downstream tasks, especially in cold-start scenarios. However, it is challenging to bring prompt-tuning from NLP to recommendation, since the tokens of recommendation (i.e., items) are million-level and do not have concrete explainable semantics, and the sequence modeling in recommendation should be personalized. In this work, we first introduce prompt to recommendation models and propose a novel Personalized prompt-based recommendation (PPR) framework for cold-start recommendation. 
Specifically, we build personalized soft prompt via a prompt generator based on user profiles, and enable a sufficient training on pro...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tkde.2024.3357498","openalex_id":"https://openalex.org/W4391128584","cited_by_count":45,"quality_score":75,"matched_keywords":["personalized","efficient"],"author_affiliations":["Beihang University","Institute of Computing Technology","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8907545804977417},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.6298061609268188},{"id":"https://openalex.org/C2780992000","display_name":"Generator (circuit theory)","score":0.4834038317203522},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.46946752071380615},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46499741077423096},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.43240463733673096},{"id":"https://openalex.org/C163258240","display_name":"Power (physics)","score":0.10474923253059387},{"id":"https://openalex.org/C62520636","display_name":"Quantum mechanics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":45}},{"id":"official:3146db0787c1bcd8","title":"Introducing Qwen","url":"https://qwenlm.github.io/blog/qwen/","published":"2024-01-23","authors":["Alibaba/Qwen"],"abstract":"4 months after our first release of Qwen-7B, which is the starting point of our opensource journey of large language models (LLM), we now provide an 
introduction to the Qwen series to give you a whole picture of our work as well as our objectives. Below are important links to our opensource projects and community.PAPER GITHUB HUGGING FACE MODELSCOPE DISCORDAdditionally, we have WeChat groups for chatting and we invite you to join the groups through the provided link in our GitHub readme.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["LLM"],"author_affiliations":["Alibaba/Qwen"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official RSS feed https://qwenlm.github.io/blog/index.xml"}},{"id":"arxiv:2401.12920","title":"Truck Parking Usage Prediction with Decomposed Graph Neural Networks","url":"https://huggingface.co/papers/2401.12920","published":"2024-01-23","authors":["Rei Tamaru","Yang Cheng","Steven Parker","Ernie Perry","Bin Ran","Soyoung Ahn"],"abstract":"Truck parking on freight corridors faces the major challenge of insufficient parking spaces. This is exacerbated by the Hour-of-Service (HOS) regulations, which often result in unauthorized parking practices, causing safety concerns. It has been shown that providing accurate parking usage prediction can be a cost-effective solution to reduce unsafe parking practices. In light of this, existing studies have developed various methods to predict the usage of a truck parking site and have demonstrated satisfactory accuracy. However, these studies focused on a single parking site, and few approaches have been proposed to predict the usage of multiple truck parking sites considering spatio-temporal dependencies, due to the lack of data. 
This paper aims to fill this gap and presents the Regional Temporal Graph Convolutional Network (RegT-GCN) to predict parking usage across the entire state to....","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["huggingface_search"],"source":"huggingface_search","work_type":"","doi":"","openalex_id":"","cited_by_count":0,"quality_score":27,"matched_keywords":[],"author_affiliations":[],"concepts":[],"official_report":false,"quality_signals":{"company_match_source":"HuggingFace Papers organization/author metadata"}},{"id":"bytedance-seed:28","title":"Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data","url":"https://seed.bytedance.com/en/research/depth-anything-unleashing-the-power-of-large-scale-unlabeled-data","published":"2024-01-19","authors":["Lihe Yang","Bingyi Kang","Zilong Huang","Xiaogang Xu","Jiashi Feng","Hengshuang Zhao"],"abstract":"This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules, we aim to build a simple yet powerful foundation model dealing with any images under any circumstances. To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error. We investigate two simple yet effective strategies that make data scaling-up promising. First, a more challenging optimization target is created by leveraging data augmentation tools. It compels the model to actively seek extra visual knowledge and acquire robust representations. Second, an auxiliary supervision is developed to enforce the model to inherit rich semantic priors from pre-trained encoders. 
We evalu...","companies":["ByteDance/Seed"],"matched_orgs":["ByteDance/Seed"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer Vision","Vision","CVPR 2024"],"author_affiliations":["ByteDance/Seed"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official ByteDance Seed publication API https://seed.bytedance.com/api/get_article_list_v2"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/contrastive-preference-optimization-pushing-the-boundaries-of-llm-performance-in-machine-translation","title":"Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation","url":"https://www.microsoft.com/en-us/research/publication/contrastive-preference-optimization-pushing-the-boundaries-of-llm-performance-in-machine-translation/","published":"2024-01-16","authors":["Haoran Xu","Amr Sharaf","Yunmo Chen","Weiting Tan","Lingfeng Shen","Benjamin Van Durme","Kenton Murray","Young Jin Kim"],"abstract":"Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, does not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to SFT which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. 
Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Unpublished","Artificial intelligence","Inproceedings (Conference)","Computer science","large language models","1970-01-01","LLM","preference"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/biomedclip-a-multimodal-biomedical-foundation-model-pretrained-from-fifteen-million-scientific-image-text-pairs","title":"BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs","url":"https://www.microsoft.com/en-us/research/publication/biomedclip-a-multimodal-biomedical-foundation-model-pretrained-from-fifteen-million-scientific-image-text-pairs/","published":"2024-01-16","authors":["Sheng Zhang","Yanbo Xu","Naoto Usuyama","Hanwen Xu","Jaspreet Bagga","Rob Tinn","Sam Preston","Rajesh Rao","Mu Wei","Naveen Valluri","Cliff Wong","Andrea Tupini"],"abstract":"Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. 
Here, we present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets such as MIMIC-CXR, and spans a diverse range of biomedical image types. PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles. Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing. We conducted extensive experiments and ablation studies on standard b...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Miscellaneous","Computer vision","Computer Vision and Pattern Recognition","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"huawei-noah:137","title":"Chain-of-Experts: When LLMs Meet Complex Operations Research Problems","url":"https://www.noahlab.com.hk/en/scientific_research/chain-of-experts-when-llms-meet-complex-operations-research-problems","published":"2024-01-16","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: ICLR 2024. 
External paper link: https://openreview.net/forum?id=HobyL1B9CZ","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Optimize intelligence","ICLR 2024","2024"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"apple:jefnjwhus21qbrye3nw96lta","title":"Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization","url":"https://machinelearning.apple.com/research/ctc-based","published":"2024-01-16","authors":["Zhihong Lei","Ernest Pusateri","Shiyi Han","Leo Liu","Mingbin Xu","Tim Ng","Ruchir Travadi","Youyuan Zhang","Mirko Hannemann","Man-Hung Siu","Zhen Huang"],"abstract":"Recent advances in deep learning and automatic speech recognition have boosted the accuracy of end-to-end speech recognition to a new level. However, recognition of personal content, such as contact names, remains a challenge. In this work, we present a personalization solution for an end-to-end system based on connectionist temporal classification. 
Our solution uses a class-based language model, in which a general language model provides...","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["language model","personalization"],"author_affiliations":["Apple"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Apple Machine Learning page https://machinelearning.apple.com/research"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/rag-vs-fine-tuning-pipelines-tradeoffs-and-a-case-study-on-agriculture","title":"RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture","url":"https://www.microsoft.com/en-us/research/publication/rag-vs-fine-tuning-pipelines-tradeoffs-and-a-case-study-on-agriculture/","published":"2024-01-15","authors":["Maria Angels de Luis Balaguer","Vinamra Benara","Renato L. de F. Cunha","Roberto Estevão","Todd Hendry","Daniel Holstein","Jennifer Marsman","Nick Mecklenburg","Sara Malvar","Leonardo Nunes","Rafael Padilha","Morris Sharp"],"abstract":"There are two common ways in which developers are incorporating proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and Fine-Tuning. RAG augments the prompt with the external data, while fine-Tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. 
Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-Tuning pipeline. We conduct an in-depth....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Miscellaneous","Artificial intelligence","Data platforms and analytics","Computer science","large language models","retrieval"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4390872793","title":"An Ultralow-Power Triaxial MEMS Accelerometer With High-Voltage Biasing and Electrostatic Mismatch Compensation","url":"https://doi.org/10.1109/jssc.2024.3349861","published":"2024-01-15","authors":["Yimai Peng","Seokhyeon Jeong","Kyojin Choo","Yejoong Kim","Li-Yu Chen","Rohit Rothe","Li Xu","Ilya Gurin","O. Oliaei","Matthew Thompson","Stephen F. Bart","P.G. Hartwell"],"abstract":"This article presents a triaxial microelectromechanical system (MEMS) capacitive accelerometer using a high-voltage biasing technique to achieve high resolution with ultralow power. The accelerometer system generates a differential pair of high voltages to bias the MEMS structure, raising the MEMS signal substantially above the noise floor of the analog front-end (AFE) circuits. 
With the consequent increased signal-to-noise ratio (SNR), the proposed accelerometer system eliminates the need for a power-hungry low-noise amplifier (LNA) and signal chopping which significantly improves the power-noise tradeoff found in conventionally biased MEMS accelerometers. Moreover, by fine-tuning the bias voltages, the proposed method cancels the electrostatic mismatch in the MEMS due to process variation and ensures robust operation. The proposed accelerometer is composed of one integrated MEMS-CMOS c...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/jssc.2024.3349861","openalex_id":"https://openalex.org/W4390872793","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Nvidia (United States)","University of Michigan","École Polytechnique Fédérale de Lausanne"],"concepts":[{"id":"https://openalex.org/C89805583","display_name":"Accelerometer","score":0.750726580619812},{"id":"https://openalex.org/C37977207","display_name":"Microelectromechanical systems","score":0.6457556486129761},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.571806788444519},{"id":"https://openalex.org/C20254490","display_name":"Biasing","score":0.5508942604064941},{"id":"https://openalex.org/C119599485","display_name":"Electrical engineering","score":0.5116744041442871},{"id":"https://openalex.org/C165005293","display_name":"Chip","score":0.49750950932502747},{"id":"https://openalex.org/C206755178","display_name":"Capacitive sensing","score":0.47921377420425415},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.45363953709602356}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":2}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/welding-natural-language-queries-to-analytics-irs-with-llms","title":"Welding Natural Language Queries to Analytics IRs with LLMs","url":"https://www.microsoft.com/en-us/research/publication/welding-natural-language-queries-to-analytics-irs-with-llms/","published":"2024-01-14","authors":["Kaushik Rajan","Aseem Rastogi","Akash Lal","Sampath Rajendra","Krithika Subramanian","Krut Patel"],"abstract":"From the recent momentum behind translating natural language to SQL (nl2sql), to commercial product offerings such as Co-Pilot for Microsoft Fabric, Large Language Models (LLMs) are poised to have a big impact on data analytics. In this paper, we show that LLMs can be used to convert natural language analytics queries directly to custom intermediate query representations (IRs) of modern data analytics systems. This has the direct benefit of making IRs more accessible to end-users, but interestingly, it can also result in improved translation accuracy and better end-to-end performance, especially when the query semantics is better captured in the IR rather than in SQL. We build an LLM-based pipeline (nl2weld) for one instance of this flow, to translate natural language queries to the Weld IR using gpt-4. 
nl2weld is carefully designed to harness self-reflection and instruction-following ca...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Inproceedings (Conference)","Data platforms and analytics","1970-01-01","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/knowledge-centric-templatic-views-of-documents","title":"Knowledge-Centric Templatic Views of Documents","url":"https://www.microsoft.com/en-us/research/publication/knowledge-centric-templatic-views-of-documents/","published":"2024-01-12","authors":["Isabel Cachola","Silviu Cucerzan","Allen Herring","Vuksan Mijovic","Erik Oveson","Sujay Kumar Jauhar"],"abstract":"Authors seeking to communicate with broader audiences often compose their ideas about the same underlying knowledge in different documents and formats -- for example, as slide decks, newsletters, reports, brochures, etc. Prior work in document generation has generally considered the creation of each separate format to be different a task, developing independent methods for generation and evaluation. This approach is suboptimal for the advancement of AI-supported content authoring from both research and application perspectives because it leads to fragmented learning processes, redundancy in models and methods, and disjointed evaluation. 
Thus, in our work, we consider each of these documents to be templatic views of the same underlying knowledge, and we aim to unify the generation and evaluation of these templatic views of documents. We begin by introducing an LLM-powered method to extrac...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Miscellaneous","Artificial intelligence","Computer science","large language models","LLM"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/domain-adaptation-for-sustainable-soil-management-using-causal-and-contrastive-constraint-minimization","title":"Domain Adaptation for Sustainable Soil Management using Causal and Contrastive Constraint Minimization","url":"https://www.microsoft.com/en-us/research/publication/domain-adaptation-for-sustainable-soil-management-using-causal-and-contrastive-constraint-minimization/","published":"2024-01-12","authors":["Somya Sharma","Swati Sharma","Rafael Padilha","Emre Kiciman","Ranveer Chandra"],"abstract":"Monitoring organic matter is pivotal for maintaining soil health and can help inform sustainable soil management practices. While sensor-based soil information offers higher-fidelity and reliable insights into organic matter changes, sampling and measuring sensor data is cost-prohibitive. We propose a multi-modal, scalable framework that can estimate organic matter from remote sensing data, a more readily available data source while leveraging sparse soil information for improving generalization. 
Using the sensor data, we preserve underlying causal relations among sensor attributes and organic matter. Simultaneously we leverage inherent structure in the data and train the model to discriminate among domains using contrastive learning. This causal and contrastive constraint minimization ensures improved generalization and adaptation to other domains. We also shed light on the interpretabi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Article (Journal)","Artificial intelligence","Computer science"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2401.06066","title":"DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models","url":"https://huggingface.co/papers/2401.06066","published":"2024-01-11","authors":["DeepSeek"],"abstract":"","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["HuggingFace org papers","deepseek-ai"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"openalex:W4390765666","title":"Generative modeling of biological shapes and images using a probabilistic <i>α</i> -shape 
sampler","url":"http://dx.doi.org/10.1101/2024.01.09.574919","published":"2024-01-11","authors":["Emily T. Winn-Nuñez","Hadley Witt","Dhananjay Bhaskar","Ryan Yuki Huang","Jonathan S. Reichner","Ian Y. Wong","Lorin Crawford"],"abstract":"-shape sampler R package is open-source and can be downloaded at https://github.com/lcrawlab/ashapesampler.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"preprint","doi":"https://doi.org/10.1101/2024.01.09.574919","openalex_id":"https://openalex.org/W4390765666","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Brown University","Microsoft (United States)","Providence College","Rhode Island Hospital","Yale University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7188376784324646},{"id":"https://openalex.org/C49937458","display_name":"Probabilistic logic","score":0.6377169489860535},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.553360104560852},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.537169873714447},{"id":"https://openalex.org/C167966045","display_name":"Generative model","score":0.5347902178764343},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.49519410729408264},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4877351224422455},{"id":"https://openalex.org/C2778334786","display_name":"Variation (astronomy)","score":0.46246403455734253}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/mindagent-emergent-gaming-interaction","title":"MindAgent: Emergent Gaming 
Interaction","url":"https://www.microsoft.com/en-us/research/publication/mindagent-emergent-gaming-interaction/","published":"2024-01-09","authors":["Steven Gong","Qiuyuan Huang","Xiaojian Ma","Hoi Vo","Zane Durante","Yusuke Noda","Zilong Zheng","Song-chun Zhu","Demetri Terzopoulos","Feifei Li","Jianfeng Gao"],"abstract":"Large Language Models (LLMs) have the capacity of performing complex scheduling in a multi-agent system and can coordinate these agents into completing sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community has insufficient benchmarks rather than building general multi-agents collaboration infrastructure that encompass both LLM and human-NPCs communications. In this work, we propose a novel infrastructure - MindAgent - to evaluate planning and coordination emergent capabilities for gaming interaction. In particular, our infrastructure leverages existing gaming framework to require understanding of the coordinator for a considerable multi-agents, collaborate with human players via un-finetuned proper instructions, and establish an in-context learning with feedback on few-shot prompt way. 
Furthermore, we....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Human language technologies","1970-01-01","LLM","agent","multi-agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/trustllm-trustworthiness-in-large-language-models","title":"TrustLLM: Trustworthiness in Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/trustllm-trustworthiness-in-large-language-models/","published":"2024-01-09","authors":["Lichao Sun","Yue Huang","Haoran Wang","Siyuan Wu","Qihui Zhang","Chujie Gao","Yixin Huang","Wenhan Lyu","Yixuan Zhang","Xiner Li","Zheng Liu","Yixin Liu"],"abstract":"Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. 
Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer science","large language models","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4390769997","title":"Quantifying <scp>3D</scp> MR fingerprinting (<scp>3D‐MRF</scp>) reproducibility across subjects, sessions, and scanners automatically using <scp>MNI</scp> atlases","url":"https://doi.org/10.1002/mrm.29983","published":"2024-01-09","authors":["Andrew Dupuis","Yong Chen","Michael S. Hansen","Kelvin Chow","Jessie Sun","Chaitra Badve","Dan Ma","Mark A. Griswold","Rasim Boyacıoğlu"],"abstract":"PURPOSE: Quantitative MRI techniques such as MR fingerprinting (MRF) promise more objective and comparable measurements of tissue properties at the point-of-care than weighted imaging. However, few direct cross-modal comparisons of MRF's repeatability and reproducibility versus weighted acquisitions have been performed. This work proposes a novel fully automated pipeline for quantitatively comparing cross-modal imaging performance in vivo via atlas-based sampling. METHODS: We acquire whole-brain 3D-MRF, turbo spin echo, and MPRAGE sequences three times each on two scanners across 10 subjects, for a total of 60 multimodal datasets. 
The proposed automated registration and analysis pipeline uses linear and nonlinear registration to align all qualitative and quantitative DICOM stacks to Montreal Neurological Institute (MNI) 152 space, then samples each dataset's native space through transfor...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1002/mrm.29983","openalex_id":"https://openalex.org/W4390769997","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Case Western Reserve University","Medical Solutions","Microsoft (United States)","Siemens Healthcare (United States)","University Hospitals of Cleveland"],"concepts":[{"id":"https://openalex.org/C9893847","display_name":"Reproducibility","score":0.7272984385490417},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6601573824882507},{"id":"https://openalex.org/C154020017","display_name":"Repeatability","score":0.5799599289894104},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5396549701690674},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5387552380561829},{"id":"https://openalex.org/C54170458","display_name":"Voxel","score":0.4757004678249359},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.18993908166885376},{"id":"https://openalex.org/C105795698","display_name":"Statistics","score":0.09827637672424316}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/agent-ai-surveying-the-horizons-of-multimodal-interaction","title":"Agent AI: Surveying the Horizons of Multimodal 
Interaction","url":"https://www.microsoft.com/en-us/research/publication/agent-ai-surveying-the-horizons-of-multimodal-interaction/","published":"2024-01-07","authors":["Zane Durante","Qiuyuan Huang","Naoki Wake","Ran Gong","Jae Sung Park","Bidipta Sarkar","Rohan Taori","Yusuke Noda","Demetri Terzopoulos","Yejin Choi","Katsushi Ikeuchi","Hoi Vo"],"abstract":"Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. 
To accelerate research on agent-based multimodal intelligence, we define \"Agent AI\" as a class of interactive sy...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":88,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computer vision","Human language technologies","Human-computer interaction","Medical, health and genomics","Gaming","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/long-context-compression-with-activation-beacon","title":"Long Context Compression with Activation Beacon","url":"https://www.microsoft.com/en-us/research/publication/long-context-compression-with-activation-beacon/","published":"2024-01-06","authors":["Peitian Zhang","Zheng Liu","Shitao Xiao","Ninglu Shao","Qiwei Ye","Zhicheng Dou"],"abstract":"Long context compression is a critical research problem due to its significance in reducing the high computational and memory costs associated with LLMs. In this paper, we propose Activation Beacon, a plug-in module for transformer-based LLMs that targets effective, efficient, and flexible compression of long contexts. To achieve this, our method introduces the following technical designs. 1) We directly compress the activations (i.e. keys and values at every layer), rather than leveraging soft prompts to relay information (which constitute a major bottleneck to encapsulate the complex information within long contexts). 
2) We tailor the compression workflow, where each fine-grained input unit is progressively compressed, enabling high-quality compression and efficient computation during both training and inference. 3) We train the model through compression-based auto-regression, making f...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":84,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Computation and Language","Computer science","1970-01-01","memory","efficient","compression"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"hf-org-paper:deepseek-ai:2401.02954","title":"DeepSeek LLM: Scaling Open-Source Language Models with Longtermism","url":"https://huggingface.co/papers/2401.02954","published":"2024-01-05","authors":["DeepSeek"],"abstract":"The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. 
We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate th...","companies":["DeepSeek"],"matched_orgs":["DeepSeek"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"technical-report","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["HuggingFace org papers","deepseek-ai","LLM","preference","long-term"],"author_affiliations":["DeepSeek"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official HuggingFace org papers page https://huggingface.co/deepseek-ai/papers"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/odin-a-single-model-for-2d-and-3d-segmentation","title":"ODIN: A Single Model for 2D and 3D Segmentation","url":"https://www.microsoft.com/en-us/research/publication/odin-a-single-model-for-2d-and-3d-segmentation/","published":"2024-01-03","authors":["Ayush Jain","Pushkal Katara","N. Gkanatsios","Adam W. Harley","Gabriel H. Sarch","Kriti Aggarwal","Vishrav Chaudhary","Katerina Fragkiadaki"],"abstract":"State-of-the-art models on contemporary 3D segmentation benchmarks like ScanNet consume and label dataset-provided 3D point clouds, obtained through post processing of sensed multiview RGB-D images. They are typically trained in-domain, forego large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead. The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures. 
In this paper, we challenge this view and propose ODIN (Omni-Dimensional INstance segmentation), a model that can segment and label both 2D RGB images and 3D point clouds, using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. Our model differentiates 2D and 3D feature operations through the positi...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Inproceedings (Conference)","Computer vision","Computer science","1970-01-01","agent"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4394625624","title":"HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation","url":"https://doi.org/10.1109/wacv57701.2024.00317","published":"2024-01-03","authors":["Jinbo Wu","Xiaobo Gao","Xing Liu","Zhengyang Shen","Chen Zhao","Haocheng Feng","Jingtuo Liu","Errui Ding"],"abstract":"In this paper, we study Text-to-3D content generation leveraging 2D diffusion priors to enhance the quality and detail of the generated 3D models. Recent progress [11] in text-to-3D has shown that employing high-resolution (e.g., 512 × 512) renderings can lead to the production of high-quality 3D models using latent diffusion priors. To enable rendering at even higher resolutions, which has the potential to further augment the quality and detail of the models, we propose a novel approach that combines multiple noise estimation processes with a pretrained 2D diffusion prior. 
Distinct from the Bar-Tal et al.s’ study which binds multiple denoised results [1] to generate images from texts, our approach integrates the computation of scoring distillation losses such as SDS loss and VSD loss which are essential techniques for the 3D content generation with 2D diffusion priors. We experimentally...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv57701.2024.00317","openalex_id":"https://openalex.org/W4394625624","cited_by_count":18,"quality_score":59,"matched_keywords":["distillation"],"author_affiliations":["Baidu (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6903947591781616},{"id":"https://openalex.org/C99498987","display_name":"Noise (video)","score":0.6106460094451904},{"id":"https://openalex.org/C96250715","display_name":"Estimation","score":0.5556151270866394},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5129208564758301},{"id":"https://openalex.org/C33954974","display_name":"Sensor fusion","score":0.4822852909564972},{"id":"https://openalex.org/C29265498","display_name":"Noise measurement","score":0.44919419288635254},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4058297276496887},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3809453845024109}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":18}},{"id":"openalex:W4394597991","title":"Text-to-image Editing by Image Information Removal","url":"https://doi.org/10.1109/wacv57701.2024.00515","published":"2024-01-03","authors":["Zhongping Zhang","Jian Zheng","Jacob Zhiyuan Fang","Bryan A. 
Plummer"],"abstract":"Diffusion models have demonstrated impressive performance in text-guided image generation. Current methods that leverage the knowledge of these models for image editing either fine-tune them using the input image (e.g., Imagic) or incorporate structure information as additional constraints (e.g., ControlNet). However, fine-tuning large-scale diffusion models on a single image can lead to severe overfitting issues and lengthy inference time. Information leakage from pretrained models also make it challenging to preserve image content not related to the text input. Additionally, methods that incorporate structural guidance (e.g., edge maps, semantic maps, keypoints) find retaining attributes like colors and textures difficult. Using the input image as a control could mitigate these issues, but since these models are trained via reconstruction, a model can simply hide information about the....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv57701.2024.00515","openalex_id":"https://openalex.org/W4394597991","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.670269250869751},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.6406055688858032},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.5121569633483887},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4805329442024231},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.41777247190475464},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.4033921957015991}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4394593104","title":"Disentangled Pre-training for Image Matting","url":"https://doi.org/10.1109/wacv57701.2024.00024","published":"2024-01-03","authors":["Yanda Li","Zilong Huang","Gang Yu","Ling Chen","Yunchao Wei","Jianbo Jiao"],"abstract":"Image matting requires high-quality pixel-level human annotations to support the training of a deep model in recent literature. Whereas such annotation is costly and hard to scale, significantly holding back the development of the research. In this work, we make the first attempt towards addressing this problem, by proposing a self-supervised pretraining approach that can leverage infinite numbers of data to boost the matting performance. The pre-training task is designed in a similar manner as image matting, where random trimap and alpha matte are generated to achieve an image disentanglement objective. The pre-trained model is then used as an initialisation of the downstream matting task for fine-tuning. 
Extensive experimental evaluations show that the proposed approach outperforms both the state-of-the-art matting methods and other alternative self-supervised initialisation approaches...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/wacv57701.2024.00024","openalex_id":"https://openalex.org/W4394593104","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Beijing Jiaotong University","Tencent (China)","University of Birmingham","University of Technology Sydney"],"concepts":[{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6855170130729675},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.568254828453064},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.48179763555526733},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4740279018878937},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.46460431814193726},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.08172094821929932},{"id":"https://openalex.org/C153294291","display_name":"Meteorology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4390490761","title":"Explainability for Large Language Models: A Survey","url":"https://doi.org/10.1145/3639372","published":"2024-01-02","authors":["Haiyan Zhao","Hanjie Chen","Fan Yang","Ninghao Liu","Huiqi Deng","Hengyi Cai","Shuaiqiang Wang","Dawei Yin","Mengnan Du"],"abstract":"Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. 
However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this article, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explana...","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1145/3639372","openalex_id":"https://openalex.org/W4390490761","cited_by_count":500,"quality_score":67,"matched_keywords":[],"author_affiliations":["Baidu (China)","Institute of Computing Technology","Johns Hopkins University","New Jersey Institute of Technology","Shanghai Jiao Tong University","University of Georgia","Wake Forest University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8967035412788391},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4575468897819519},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.3477069139480591},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3338547945022583},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.32770025730133057}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":500}},{"id":"openalex:W3098826124","title":"Overview of the Ninth Dialog System Technology Challenge: DSTC9","url":"http://dx.doi.org/10.1109/taslp.2024.3426331","published":"2024-01-01","authors":["Chulaka Gunasekara","Seokhwan Kim","Luis Fernando D’Haro","Abhinav Rastogi","Yun-Nung Chen","Mihail Eric","Behnam Hedayatnia","Karthik Gopalakrishnan","Yang Liu","Chao-Wei Huang","Dilek Hakkani‐Tür","Jinchao Li"],"abstract":"This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct tasks in dialog systems, namely, 1. Task-oriented dialog Modeling with Unstructured Knowledge Access, 2. Multi-domain task-oriented dialog, 3. Interactive evaluation of dialog and 4. Situated interactive multimodal dialog. This paper describes the task definition, provided datasets, baselines, and evaluation setup for each track. We also summarize the results of the submitted systems to highlight the general trends of the state-of-the-art technologies for the tasks.","companies":["Meta/FAIR","Microsoft","Apple","Amazon"],"matched_orgs":["Meta/FAIR","Microsoft","Apple","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslp.2024.3426331","openalex_id":"https://openalex.org/W3098826124","cited_by_count":31,"quality_score":103,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Apple (United States)","Bellevue Hospital Center","Carnegie Mellon University","Contextual Change (United States)","Google (United States)","IBM (United States)","META Health","Meta (United States)","Microsoft (United States)","National Taiwan University","Snap (United States)","Tsinghua University","Universidad Politécnica de Madrid","University of British Columbia","University of Illinois 
Urbana-Champaign"],"concepts":[{"id":"https://openalex.org/C173853756","display_name":"Dialog box","score":0.9875589609146118},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7916437387466431},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.7853617668151855},{"id":"https://openalex.org/C190954187","display_name":"Dialog system","score":0.778814435005188},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5740557312965393},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.5157516002655029},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.47508373856544495},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.40430760383605957}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":31}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/pydex-repairing-bugs-in-introductory-python-assignments-using-llms","title":"PyDex: Repairing Bugs in Introductory Python Assignments using LLMs","url":"https://www.microsoft.com/en-us/research/publication/pydex-repairing-bugs-in-introductory-python-assignments-using-llms/","published":"2024-01-01","authors":["Jialu Zhang","José Cambronero","Sumit Gulwani","Vu Le","Ruzica Piskac","Gustavo Soares","Gust Verbruggen"],"abstract":"Students often make mistakes in their introductory programming assignments as part of their learning process. Unfortunately, providing custom repairs for these mistakes can require a substantial amount of time and effort from class instructors. Automated program repair (APR) techniques can be used to synthesize such fixes. Prior work has explored the use of symbolic and neural techniques for APR in the education domain. 
Both types of approaches require either substantial engineering efforts or large amounts of data and training. We propose to use a large language model trained on code, such as Codex (a version of GPT), to build an APR system – PyDex – for introductory Python programming assignments. Our system can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shots, and program chunking. We evaluate PyDex o...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":80,"matched_keywords":["Inproceedings (Conference)","Artificial intelligence","Programming languages and software engineering","Programming language","Software","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/improving-text-embeddings-with-large-language-models","title":"Improving Text Embeddings with Large Language Models","url":"https://www.microsoft.com/en-us/research/publication/improving-text-embeddings-with-large-language-models/","published":"2024-01-01","authors":["Liang Wang","Nan Yang","Xiaolong Huang","Linjun Yang","Rangan Majumder","Furu Wei"],"abstract":"In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. 
Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelines or relying on manually collected datasets that are often constrained by task diversity and language coverage. We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across nearly 100 languages. We then fine-tune open-source decoder-only LLMs on the synthetic data using standard contrastive loss. Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":76,"matched_keywords":["Inproceedings (Conference)","Human language technologies","Search and information retrieval","human language technologies","Information retrieval","1970-01-01"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396721893","title":"nach0: multimodal natural and chemical languages foundation model","url":"https://doi.org/10.1039/d4sc00966e","published":"2024-01-01","authors":["Micha Livne","Zulfat Miftahutdinov","Elena Tutubalina","Maksim Kuznetsov","Daniil Polykovskiy","Annika Brundyn","Aastha Jhunjhunwala","Anthony Costa","Alex Aliper","Alán Aspuru‐Guzik","Alex Zhavoronkov"],"abstract":"Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their 
ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and l...","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1039/d4sc00966e","openalex_id":"https://openalex.org/W4396721893","cited_by_count":34,"quality_score":75,"matched_keywords":["LLM","efficient"],"author_affiliations":["Nvidia (United States)","University of Toronto"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.886376142501831},{"id":"https://openalex.org/C2776608160","display_name":"Natural (archaeology)","score":0.6593459844589233},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4856511354446411},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33312833309173584},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.3319287896156311},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.11930599808692932},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.07261425256729126}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":34}},{"id":"openalex:W4392616472","title":"Exploring Human-Like Translation Strategy with Large Language Models","url":"https://doi.org/10.1162/tacl_a_00642","published":"2024-01-01","authors":["Zhiwei He","Tian Liang","Wenxiang Jiao","Zhuosheng Zhang","Yujiu Yang","Rui Wang","Zhaopeng Tu","Shuming Shi","Xing Wang"],"abstract":"Abstract Large language models (LLMs) have demonstrated impressive capabilities in general scenarios, exhibiting a level of aptitude that approaches, in some aspects even surpasses, human-level intelligence. Among their numerous skills, the translation abilities of LLMs have received considerable attention. Compared to typical machine translation that focuses solely on source-to-target mapping, LLM-based translation can potentially mimic the human translation process, which might take preparatory steps to ensure high-quality translation. This work explores this possibility by proposing the MAPS framework, which stands for Multi-Aspect Prompting and Selection. Specifically, we enable LLMs first to analyze the given source sentence and induce three aspects of translation-related knowledge (keywords, topics, and relevant demonstrations) to guide the final translation process. 
Moreover, we e...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00642","openalex_id":"https://openalex.org/W4392616472","cited_by_count":70,"quality_score":75,"matched_keywords":["LLM","preference"],"author_affiliations":["Shanghai Jiao Tong University","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8547654747962952},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6510929465293884},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5135488510131836},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.474331796169281},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.41488173604011536},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.32414138317108154},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0},{"id":"https://openalex.org/C105580179","display_name":"Messenger RNA","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":70}},{"id":"openalex:W4400111385","title":"SpeechX: Neural Codec Language Model as a Versatile Speech Transformer","url":"https://doi.org/10.1109/taslp.2024.3419418","published":"2024-01-01","authors":["Xiaofei Wang","Manthan Thakker","Zhuo Chen","Naoyuki Kanda","Şefik Emre Eskimez","Sanyuan Chen","Min Tang","Shujie Liu","Jinyu Li","Takuya Yoshioka"],"abstract":"Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. 
However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslp.2024.3419418","openalex_id":"https://openalex.org/W4400111385","cited_by_count":34,"quality_score":71,"matched_keywords":["language model"],"author_affiliations":["Microsoft (United States)","Microsoft Research Asia (China)"],"concepts":[{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6752513647079468},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6713560819625854},{"id":"https://openalex.org/C177067256","display_name":"Adaptive Multi-Rate audio codec","score":0.4974720776081085},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4870748817920685},{"id":"https://openalex.org/C161765866","display_name":"Codec","score":0.4386309087276459},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.34931832551956177},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3229343295097351},{"id":"https://openalex.org/C61328038","display_name":"Speech 
processing","score":0.2193414270877838}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":34}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/target-aware-molecule-generation-for-drug-design-using-a-chemical-language-model","title":"Target-aware Molecule Generation for Drug Design Using a Chemical Language Model","url":"https://www.microsoft.com/en-us/research/publication/target-aware-molecule-generation-for-drug-design-using-a-chemical-language-model/","published":"2024-01-01","authors":["Yingce Xia","Kehan Wu","Pan Deng","Renhe Liu","Yuan Zhang","Han Guo","Yumeng Cui","Qizhi Pei","Lijun Wu","Shufang Xie","Si Chen","Xi Lu"],"abstract":"Generative drug design facilitates the creation of compounds effective against specific pathogenic target proteins. This opens up the potential to discover novel compounds within the vast chemical space and fosters the development of innovative therapeutic strategies. However, the practicality of generated molecules is often limited, as many designs focus on a narrow set of drug-related properties, failing to improve the success rate of the subsequent drug discovery process. To overcome these challenges, we develop TamGen, a method that employs a GPT-like chemical language model and enables target-aware molecule generation and compound refinement. We demonstrate that the compounds generated by TamGen have improved molecular quality and viability. 
Furthermore, we have integrated TamGen into a drug discovery pipeline and identified 7 compounds showing compelling inhibitory activity against...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Unpublished","Medical, health and genomics","Biochemistry","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"microsoft:url:https://www.microsoft.com/en-us/research/publication/advanced-long-content-speech-recognition-with-factorized-neural-transducer","title":"Advanced Long-Content Speech Recognition with Factorized Neural Transducer","url":"https://www.microsoft.com/en-us/research/publication/advanced-long-content-speech-recognition-with-factorized-neural-transducer/","published":"2024-01-01","authors":["Xun Gong","Yu Wu","Jinyu Li","Shujie Liu","Rui Zhao","Xie Chen","Yanmin Qian"],"abstract":"Long-content automatic speech recognition (ASR) has obtained increasing interest in recent years, as it captures the relationship among consecutive historical utterances while decoding the current utterance. In this paper, we propose two novel approaches, which integrate long-content information into the factorized neural transducer (FNT) based architecture in both non-streaming (referred to as LongFNT) and streaming (referred to as SLongFNT) scenarios. We first investigate whether long-content transcriptions can improve the vanilla conformer transducer (C-T) models. Our experiments indicate that the vanilla C-T models do not exhibit improved performance when utilizing long-content transcriptions, possibly due 
to the predictor network of C-T models not functioning as a pure language model. Instead, FNT shows its potential in utilizing long-content information, where we propose the Lo...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Article (Journal)","Human language technologies","1970-01-01","language model"],"author_affiliations":["Microsoft"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Microsoft Research publications API https://www.microsoft.com/en-us/research/wp-json/microsoft-research/v1/faceted-search"}},{"id":"openalex:W4396952261","title":"Multimodal Deep Learning","url":"https://doi.org/10.1007/978-3-031-53092-0_10","published":"2024-01-01","authors":["Amirreza Shaban","Safoora Yousefi"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-53092-0_10","openalex_id":"https://openalex.org/W4396952261","cited_by_count":190,"quality_score":67,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4537206292152405},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.41751939058303833},{"id":"https://openalex.org/C15744967","display_name":"Psychology","score":0.3640570640563965}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":190}},{"id":"openalex:W4392846385","title":"Large Language Models are Zero-Shot Rankers for Recommender 
Systems","url":"https://doi.org/10.1007/978-3-031-56060-6_24","published":"2024-01-01","authors":["Yupeng Hou","Junjie Zhang","Zihan Lin","Hongyu Lu","Ruobing Xie","Julian McAuley","Wayne Xin Zhao"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-56060-6_24","openalex_id":"https://openalex.org/W4392846385","cited_by_count":220,"quality_score":67,"matched_keywords":[],"author_affiliations":["Renmin University of China","Tencent (China)","UC San Diego Health System"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8612848520278931},{"id":"https://openalex.org/C557471498","display_name":"Recommender system","score":0.777891993522644},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.5519919395446777},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.4299982190132141},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3775554895401001},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3313939869403839},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.31580665707588196},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.05463644862174988}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":220}},{"id":"openalex:W4391827269","title":"Disentangled Cross-Modal Transformer for RGB-D Salient Object Detection and Beyond","url":"https://doi.org/10.1109/tip.2024.3364022","published":"2024-01-01","authors":["Hao Chen","Feihong Shen","Ding Ding","Yongjian Deng","Chao Li"],"abstract":"Previous multi-modal 
transformers for RGB-D salient object detection (SOD) generally directly connect all patches from two modalities to model cross-modal correlation and perform multi-modal combination without differentiation, which can lead to confusing and inefficient fusion. Instead, we disentangle the cross-modal complementarity from two views to reduce cross-modal fusion ambiguity: 1) Context disentanglement. We argue that modeling long-range dependencies across modalities as done before is uninformative due to the severe modality gap. Differently, we propose to disentangle the cross-modal complementary contexts to intra-modal self-attention to explore global complementary understanding, and spatial-aligned inter-modal attention to capture local cross-modal correlations, respectively. 2) Representation disentanglement. Unlike previous undifferentiated combination of cross-modal rep...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2024.3364022","openalex_id":"https://openalex.org/W4391827269","cited_by_count":39,"quality_score":67,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Beijing University of Technology","Southeast University"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7231338620185852},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6506085991859436},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5019431114196777},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.4528804421424866},{"id":"https://openalex.org/C82990744","display_name":"RGB color model","score":0.41669604182243347},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.34029656648635864},{"id":"https://openalex.org/C165801399","display_name":"Voltage","score":0.11588582396507263},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.09894946217536926}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":39}},{"id":"openalex:W4403069407","title":"Cross Prompting Consistency with Segment Anything Model for Semi-supervised Medical Image Segmentation","url":"https://doi.org/10.1007/978-3-031-72120-5_16","published":"2024-01-01","authors":["Juzheng Miao","Cheng Chen","Keli Zhang","Jie Chuai","Quanzheng Li","Pheng‐Ann Heng"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72120-5_16","openalex_id":"https://openalex.org/W4403069407","cited_by_count":30,"quality_score":67,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Harvard University","Huawei Technologies (China)","Massachusetts General Hospital","Office of the General Counsel"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.82901930809021},{"id":"https://openalex.org/C2776436953","display_name":"Consistency (knowledge bases)","score":0.7522525787353516},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6300560235977173},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.6250148415565491},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6034531593322754},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.559912383556366},{"id":"https://openalex.org/C115961682","display_name":"Image 
(mathematics)","score":0.49066197872161865},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.32510024309158325}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":30}},{"id":"openalex:W4394828156","title":"ChatGPT for Robotics: Design Principles and Model Abilities","url":"https://doi.org/10.1109/access.2024.3387941","published":"2024-01-01","authors":["Sai Vemprala","Rogerio Bonatti","Arthur Bucker","Ashish Kapoor"],"abstract":"This paper presents an experimental study regarding the use of OpenAI’s ChatGPT [1] for robotics applications. We outline a strategy that combines design principles for prompt engineering and the creation of a high-level function library which allows ChatGPT to adapt to different robotics tasks, simulators, and form factors. We focus our evaluations on the effectiveness of different prompt engineering techniques and dialog strategies towards the execution of various types of robotics tasks. We explore ChatGPT’s ability to use free-form dialog, parse XML tags, and to synthesize code, in addition to the use of task-specific prompting functions and closed-loop reasoning through dialogues. 
Our study encompasses a range of tasks within the robotics domain, from basic logical, geometrical, and mathematical reasoning all the way to complex domains such as aerial navigation, manipulation, and em...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/access.2024.3387941","openalex_id":"https://openalex.org/W4394828156","cited_by_count":415,"quality_score":67,"matched_keywords":[],"author_affiliations":["Carnegie Mellon University","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C34413123","display_name":"Robotics","score":0.8520113229751587},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.7783750295639038},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7534322738647461},{"id":"https://openalex.org/C173853756","display_name":"Dialog box","score":0.5887138843536377},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.5488829612731934},{"id":"https://openalex.org/C90509273","display_name":"Robot","score":0.523618757724762},{"id":"https://openalex.org/C60940618","display_name":"Robotic paradigms","score":0.5221477150917053},{"id":"https://openalex.org/C71901391","display_name":"Upload","score":0.4332472085952759}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":415}},{"id":"openalex:W4394862910","title":"A Large-Scale Evaluation of Speech Foundation Models","url":"https://doi.org/10.1109/taslp.2024.3389631","published":"2024-01-01","authors":["Shu-Wen Yang","Heng-Jui Chang","Zili Huang","Andy T. 
Liu","Cheng-I Lai","Haibin Wu","Jiatong Shi","Xuankai Chang","Hsiang-Sheng Tsai","Wen-Chin Huang","Tzu-hsun Feng","Po-Han Chi"],"abstract":"The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific data collection and modeling. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. To bridge this gap, we establish the Speech processing Universal PERformance Benchmark (SUPERB). SUPERB represents an ecosystem designed to evaluate foundation models across a wide range of speech processing tasks, facilitating the sharing of results on an online leaderboard and fostering collaboration through a community-driven benchmark database that aids in new development cycles. We present a unified learning framework for solving the speech processing tasks in SUPERB with the frozen foundation model fol...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslp.2024.3389631","openalex_id":"https://openalex.org/W4394862910","cited_by_count":41,"quality_score":67,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Johns Hopkins University","Lawrie Technology (United States)","Menlo School","Nagoya University","National Taiwan University"],"concepts":[{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7324991226196289},{"id":"https://openalex.org/C2778755073","display_name":"Scale (ratio)","score":0.6267046332359314},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.49109530448913574},{"id":"https://openalex.org/C152588345","display_name":"Scale 
model","score":0.4167368412017822},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3882451355457306},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.25778642296791077},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.1667809784412384},{"id":"https://openalex.org/C205649164","display_name":"Geography","score":0.13650202751159668}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":41}},{"id":"huawei-noah:179","title":"LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation","url":"https://www.noahlab.com.hk/en/scientific_research/llm4eda-emerging-progress-in-large-language-models-for-electronic-design-automation","published":"2024-01-01","authors":["Huawei/Noah"],"abstract":"Official Huawei Noah's Ark Lab publication entry. Venue: arXiv 24. External paper link: https://arxiv.org/pdf/2401.12224","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Industry Intelligence","arXiv 24","2024"],"author_affiliations":["Huawei/Noah"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Huawei Noah Wagtail publication API https://www.noahlab.com.hk/wt_app/api/v2/pages/"}},{"id":"openalex:W4403051598","title":"Retrieval-style In-context Learning for Few-shot Hierarchical Text Classification","url":"https://doi.org/10.1162/tacl_a_00697","published":"2024-01-01","authors":["Huiyao Chen","Yu Zhao","Zulong Chen","Mengjia Wang","Liangyue Li","Meishan Zhang","Min Zhang"],"abstract":"Abstract Hierarchical text classification (HTC) is an important task with broad 
applications, and few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely ambiguous labels. In this work, we introduce the first ICL-based framework with LLM for few-shot HTC. We exploit a retrieval database to identify relevant demonstrations, and an iterative policy to manage multi-layer hierarchical labels. Particularly, we equip the retrieval database with HTC label-aware representations for the input texts, which is achieved by continual training on a pretrained language model with masked language modeling (MLM), layer-wise classification (CLS, specifically for HTC), and a novel divergent contra...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00697","openalex_id":"https://openalex.org/W4403051598","cited_by_count":13,"quality_score":62,"matched_keywords":["LLM","language model","retrieval"],"author_affiliations":["Alibaba Group (China)","Harbin Institute of Technology","Tianjin University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8353400826454163},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.7187879681587219},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.6842403411865234},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6434537768363953},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5590766072273254},{"id":"https://openalex.org/C23123220","display_name":"Information 
retrieval","score":0.5306348204612732},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5210235118865967},{"id":"https://openalex.org/C2992734406","display_name":"One shot","score":0.48926565051078796}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4401208924","title":"Decision-Oriented Dialogue for Human-AI Collaboration","url":"https://doi.org/10.1162/tacl_a_00679","published":"2024-01-01","authors":["Jessy Lin","Nicholas Tomlin","Jacob Andreas","Jason Eisner"],"abstract":"Abstract We describe a class of tasks called decision-oriented dialogues, in which AI assistants such as large language models (LMs) must collaborate with one or more humans via natural language to help them make complex decisions. We formalize three domains in which users face everyday decisions: (1) choosing an assignment of reviewers to conference papers, (2) planning a multi-step itinerary in a city, and (3) negotiating travel plans for a group of friends. In each of these settings, AI assistants and users have disparate abilities that they must combine to arrive at the best decision: Assistants can access and process large amounts of information, while users have preferences and constraints external to the system. For each task, we build a dialogue environment where agents receive a reward based on the quality of the final decision they reach. 
We evaluate LMs in self-play and in col...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00679","openalex_id":"https://openalex.org/W4401208924","cited_by_count":25,"quality_score":62,"matched_keywords":[],"author_affiliations":["Berkeley College","Microsoft (United States)","University of California, Berkeley"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8147038221359253},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.42821449041366577},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4100201725959778},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.36062031984329224},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3310130834579468}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":25}},{"id":"openalex:W4401070302","title":"VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation","url":"https://doi.org/10.1109/taslp.2024.3434425","published":"2024-01-01","authors":["Tianrui Wang","Long Zhou","Ziqiang Zhang","Yu Wu","Shujie Liu","Yashesh Gaur","Zhuo Chen","Jinyu Li","Furu Wei"],"abstract":"Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities. 
In this paper, we propose <sc xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><b>VioLA</b></small>, a single auto-regressive Transformer decoder-only network that unifies various cross-modal tasks involving speech and text, such as speech-to-text, text-to-text, text-to-speech, and speech-to-speech tasks, as a conditional language model task via multi-task learning framework. To accomplish this, we first convert the speech utterances to discrete tokens (similar to the textual data) using an offline neural codec encoder. In such a way, all these tasks are converted to token-based sequence prediction problems, which can be naturally handled with one conditional language model. We further integrate t...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslp.2024.3434425","openalex_id":"https://openalex.org/W4401070302","cited_by_count":19,"quality_score":60,"matched_keywords":["language model"],"author_affiliations":["Beijing Jiaotong University","Microsoft (United States)","Microsoft Research Asia (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6282713413238525},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6113629341125488},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5739414691925049},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5300628542900085},{"id":"https://openalex.org/C68615497","display_name":"Viola","score":0.4955238103866577},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4761655032634735},{"id":"https://openalex.org/C137293760","display_name":"Language 
model","score":0.43861281871795654},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.12515252828598022}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"openalex:W4403365573","title":"MoME: Mixture of Multimodal Experts for Cancer Survival Prediction","url":"https://doi.org/10.1007/978-3-031-72083-3_30","published":"2024-01-01","authors":["Conghao Xiong","Hao Chen","Hao Zheng","Wei Dong","Yefeng Zheng","Joseph J. Y. Sung","Irwin King"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72083-3_30","openalex_id":"https://openalex.org/W4403365573","cited_by_count":20,"quality_score":57,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Hong Kong University of Science and Technology","Nanyang Technological University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7676932215690613},{"id":"https://openalex.org/C121608353","display_name":"Cancer","score":0.487091064453125},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42119163274765015},{"id":"https://openalex.org/C126322002","display_name":"Internal medicine","score":0.06836414337158203},{"id":"https://openalex.org/C71924100","display_name":"Medicine","score":0.06393924355506897}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":20}},{"id":"openalex:W4404689873","title":"Findings of the WMT24 General Machine Translation Shared Task: The LLM Era Is Here but MT Is Not Solved 
Yet","url":"https://doi.org/10.18653/v1/2024.wmt-1.1","published":"2024-01-01","authors":["Tom Kocmi","Eleftherios Avramidis","Rachel Bawden","Ondřej Bojar","Anton Dvorkovich","Christian Federmann","Mark Fishel","Markus Freitag","Thamme Gowda","Roman Grundkiewicz","Barry Haddow","Marzena Karpinska"],"abstract":"Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Benjamin Marie, Christof Monz, Kenton Murray, Masaaki Nagata, Martin Popel, Maja Popović, Mariya Shmatova, Steinthór Steingrímsson, Vilém Zouhar. Proceedings of the Ninth Conference on Machine Translation. 2024.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.wmt-1.1","openalex_id":"https://openalex.org/W4404689873","cited_by_count":16,"quality_score":57,"matched_keywords":["LLM"],"author_affiliations":["Board of the Swiss Federal Institutes of Technology","Charles University","Dublin City University","ETH Zurich","Edinburgh College","German Research Centre for Artificial Intelligence","Google (United States)","Johns Hopkins University","Microsoft (Germany)","Microsoft (United Kingdom)","Microsoft (United States)","University of Amsterdam","University of Edinburgh","University of Massachusetts Amherst","University of Tartu","Árni Magnússon Institute for Icelandic Studies"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6753889918327332},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6377460360527039},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.6135438680648804},{"id":"https://openalex.org/C149364088","display_name":"Translation 
(biology)","score":0.5740921497344971},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37533074617385864},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36030638217926025},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.14015257358551025},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.0843903124332428}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4390887452","title":"CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities","url":"https://doi.org/10.1109/tnsre.2024.3354388","published":"2024-01-01","authors":["Konstantinos Kontras","Christos Chatzichristos","Huy Phan","Johan A. K. Suykens","Maarten De Vos"],"abstract":"Sleep abnormalities can have severe health consequences. Automated sleep staging, i.e. labelling the sequence of sleep stages from the patient's physiological recordings, could simplify the diagnostic process. Previous work on automated sleep staging has achieved great results, mainly relying on the EEG signal. However, often multiple sources of information are available beyond EEG. This can be particularly beneficial when the EEG recordings are noisy or even missing completely. In this paper, we propose CoRe-Sleep, a Coordinated Representation multimodal fusion network that is particularly focused on improving the robustness of signal analysis on imperfect data. We demonstrate how appropriately handling multimodal information can be the key to achieving such robustness. CoRe-Sleep tolerates noisy or missing modalities segments, allowing training on incomplete data. 
Additionally, it show...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tnsre.2024.3354388","openalex_id":"https://openalex.org/W4390887452","cited_by_count":19,"quality_score":56,"matched_keywords":[],"author_affiliations":["Amazon (United States)","KU Leuven","Queen Mary University of London"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7681261301040649},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7260067462921143},{"id":"https://openalex.org/C2779903281","display_name":"Modalities","score":0.7012937068939209},{"id":"https://openalex.org/C2775841894","display_name":"Sleep (system call)","score":0.5520213842391968},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5356857776641846},{"id":"https://openalex.org/C9357733","display_name":"Missing data","score":0.49805760383605957},{"id":"https://openalex.org/C2780310539","display_name":"Imperfect","score":0.4938672184944153},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.45356154441833496}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"openalex:W4404317174","title":"CLAPSep: Leveraging Contrastive Pre-Trained Model for Multi-Modal Query-Conditioned Target Sound Extraction","url":"https://doi.org/10.1109/taslp.2024.3497586","published":"2024-01-01","authors":["Hao Ma","Zhiyuan Peng","Li Xu","Mingjie Shao","Xixin Wu","Ju Liu"],"abstract":"Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings. 
This can be achieved by language-queried target sound extraction (TSE), which typically consists of two components: a query network that converts user queries into conditional embeddings, and a separation network that extracts the target sound accordingly. Existing methods commonly train models from scratch. As a consequence, substantial data and computational resources are required to make the randomly initialized model comprehend sound events and perform separation accordingly. In this paper, we propose to integrate pre-trained models into TSE models to address the above issue. To be specific, we tailor and adapt the powerful contrastive language-audio pre-trained model (CLAP) for USS, denoted as CLAPSep. CLAPSep also accepts flexible user inputs, taking both positive and negative use...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslp.2024.3497586","openalex_id":"https://openalex.org/W4404317174","cited_by_count":19,"quality_score":56,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","North Carolina State University","Shandong University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6706752181053162},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6706255078315735},{"id":"https://openalex.org/C4725764","display_name":"Extraction (chemistry)","score":0.6369328498840332},{"id":"https://openalex.org/C203718221","display_name":"Sound (geography)","score":0.4475197196006775},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3837069272994995},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.364810049533844},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.34657734632492065},{"id":"https://openalex.org/C24890656","display_name":"Acoustics","score":0.21735471487045288}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":19}},{"id":"openalex:W4390604007","title":"FastTextDodger: Decision-Based Adversarial Attack Against Black-Box NLP Models With Extremely High Efficiency","url":"https://doi.org/10.1109/tifs.2024.3350376","published":"2024-01-01","authors":["Xiaoxue Hu","Geling Liu","Baolin Zheng","Lingchen Zhao","Qian Wang","Yufei Zhang","Minxin Du"],"abstract":"Recently, achieving query-efficient adversarial example attacks targeting black-box natural language models has attracted widespread attention from researchers. This task is considered difficult due to the discrete nature of texts, limited knowledge of the target model, and strict query access limitations in real-world systems. However, existing attacks often require a large number of queries or result in low attack success rates, having not met practical requirements. To address this, we propose FastTextDodger, a simple and compact decision-based black-box textual adversarial attack that generates grammatically correct adversarial texts with high attack success rates and few queries. Experimental results show that FastTextDodger achieves an impressive 97.4% attack success rate on benchmark datasets and models, and only needs about 200 queries. 
Compared to state-of-the-art attacks, FastT...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tifs.2024.3350376","openalex_id":"https://openalex.org/W4390604007","cited_by_count":13,"quality_score":54,"matched_keywords":["efficient"],"author_affiliations":["Alibaba Group (China)","Chinese University of Hong Kong","Wuhan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8371211290359497},{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.7445233464241028},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.668573260307312},{"id":"https://openalex.org/C94966114","display_name":"Black box","score":0.6672796010971069},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.48160621523857117},{"id":"https://openalex.org/C84525736","display_name":"Decision tree","score":0.44364938139915466},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3996448516845703},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.3329398036003113}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4391941146","title":"Dual-View Curricular Optimal Transport for Cross-Lingual Cross-Modal Retrieval","url":"https://doi.org/10.1109/tip.2024.3365248","published":"2024-01-01","authors":["Yabing Wang","Shuhui Wang","Hao Luo","Jianfeng Dong","Fan Wang","Meng Han","Xun Wang","Meng Wang"],"abstract":"Current research on cross-modal retrieval is mostly English-oriented, as the availability of a large number of English-oriented human-labeled vision-language 
corpora. In order to break the limit of non-English labeled data, cross-lingual cross-modal retrieval (CCR) has attracted increasing attention. Most CCR methods construct pseudo-parallel vision-language corpora via Machine Translation (MT) to achieve cross-lingual transfer. However, the translated sentences from MT are generally imperfect in describing the corresponding visual contents. Improperly assuming the pseudo-parallel data are correctly correlated will make the networks overfit to the noisy correspondence. Therefore, we propose Dual-view Curricular Optimal Transport (DCOT) to learn with noisy correspondence in CCR. In particular, we quantify the confidence of the sample pair correlation with optimal transport theory from bot...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2024.3365248","openalex_id":"https://openalex.org/W4391941146","cited_by_count":13,"quality_score":54,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","Hefei University of Technology","Institute of Computing Technology","Zhejiang Gongshang University","Zhejiang University of Science and Technology"],"concepts":[{"id":"https://openalex.org/C22019652","display_name":"Overfitting","score":0.8385317325592041},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7795687317848206},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5897265076637268},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5803571343421936},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.4801146984100342},{"id":"https://openalex.org/C2780980858","display_name":"Dual (grammatical 
number)","score":0.4781549870967865},{"id":"https://openalex.org/C27181475","display_name":"Cross-validation","score":0.44987502694129944},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.43571770191192627}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4396712842","title":"Deep Boosting Learning: A Brand-New Cooperative Approach for Image-Text Matching","url":"https://doi.org/10.1109/tip.2024.3396063","published":"2024-01-01","authors":["Haiwen Diao","Ying Zhang","Shang Gao","Xiang Ruan","Huchuan Lu"],"abstract":"Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous approaches focusing on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, in this paper we aim to leverage the knowledge transfer between peer branches in a boosting manner to seek a more powerful matching model. Specifically, we propose a brand-new Deep Boosting Learning (DBL) algorithm, where an anchor branch is first trained to provide insights into the data properties, with a target branch gaining more advanced knowledge to develop optimal features and distance metrics. 
Concretely, an anchor branch initially learns the absolute or relative distance between positive and negative pairs, providing a foundational understanding of the particular netwo...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2024.3396063","openalex_id":"https://openalex.org/W4396712842","cited_by_count":9,"quality_score":54,"matched_keywords":["retrieval","distillation"],"author_affiliations":["Dalian University of Technology","Sekisui Chemical (Japan)","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C46686674","display_name":"Boosting (machine learning)","score":0.8932304382324219},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7570047378540039},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6504937410354614},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5691827535629272},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.5300754308700562},{"id":"https://openalex.org/C774472","display_name":"Margin (machine learning)","score":0.4585989713668823},{"id":"https://openalex.org/C165064840","display_name":"Matching (statistics)","score":0.4538884460926056},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.43187853693962097}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4403840670","title":"CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion","url":"https://doi.org/10.1007/978-3-031-72980-5_1","published":"2024-01-01","authors":["Wendi Zheng","Jiayan Teng","Zhuoyi Yang","Weihan Wang","Jidong Chen","Xiaotao Gu","Yuxiao Dong","Ming Ding","Jie 
Tang"],"abstract":"","companies":["Z.ai/Zhipu"],"matched_orgs":["Z.ai/Zhipu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72980-5_1","openalex_id":"https://openalex.org/W4403840670","cited_by_count":17,"quality_score":54,"matched_keywords":[],"author_affiliations":["Tsinghua University","Zhipu AI (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8471140265464783},{"id":"https://openalex.org/C2778156585","display_name":"Relay","score":0.6668769717216492},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5568397045135498},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.474201500415802},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.44778358936309814},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4251965880393982},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3844035863876343},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0517657995223999}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":17}},{"id":"openalex:W4392826867","title":"SKIM: Skeleton-Based Isolated Sign Language Recognition With Part Mixing","url":"https://doi.org/10.1109/tmm.2023.3321502","published":"2024-01-01","authors":["Kezhou Lin","Xiaohan Wang","Linchao Zhu","Bang Zhang","Yi Yang"],"abstract":"In this article, we present skeleton-based isolated sign language recognition (IsoSLR) with part mixing - SKIM. An IsoSLR model that solely takes the skeleton representation of the human body as input. 
Previous skeleton-based works either perform worse when compared to RGB-based counterparts or require fusion with other modalities to obtain competitive results. With SKIM, a single skeleton-based model without complex pre-training can obtain similar or even higher accuracy than current state-of-the-art methods. This margin can be further increased by simple late fusion within the same modality. To achieve this, we first develop a novel data augmentation technique called part mixing. It swaps the corresponding keypoints within one region (e.g. hand) between two randomly selected samples and combines their labels linearly as the new label. As regions like hand and face are key articulators....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2023.3321502","openalex_id":"https://openalex.org/W4392826867","cited_by_count":16,"quality_score":53,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8421275615692139},{"id":"https://openalex.org/C183115368","display_name":"Weighting","score":0.6620016098022461},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6255303621292114},{"id":"https://openalex.org/C18969341","display_name":"Skeleton (computer programming)","score":0.6030135154724121},{"id":"https://openalex.org/C522192633","display_name":"Sign language","score":0.5760317444801331},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5271472930908203},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.4837339222431183},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer 
interaction)","score":0.4791240692138672}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":16}},{"id":"openalex:W4404783567","title":"RaFe: Ranking Feedback Improves Query Rewriting for RAG","url":"https://doi.org/10.18653/v1/2024.findings-emnlp.49","published":"2024-01-01","authors":["Shengyu Mao","Yong Jiang","Boli Chen","Xiao Li","Peng Wang","Xinyu Wang","Pengjun Xie","Fei Huang","Huajun Chen","Ningyu Zhang"],"abstract":"As Large Language Models (LLMs) and Retrieval Augmentation Generation (RAG) techniques have evolved, query rewriting has been widely incorporated into the RAG system for downstream tasks like open-domain QA.Many works have attempted to utilize small models with reinforcement learning rather than costly LLMs to improve query rewriting.However, current methods require annotations (e.g., labeled relevant documents or downstream answers) or predesigned rewards for feedback, which lack generalization, and fail to utilize signals tailored for query rewriting.In this paper, we propose RaFe, a framework for training query rewriting models free of annotations.By leveraging a publicly available reranker, RaFe provides feedback aligned well with the rewriting objectives.Experimental results demonstrate that RaFe can obtain better performance than baselines.","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-emnlp.49","openalex_id":"https://openalex.org/W4404783567","cited_by_count":12,"quality_score":53,"matched_keywords":["retrieval"],"author_affiliations":["Alibaba Group (China)","Alibaba Group (United States)"],"concepts":[{"id":"https://openalex.org/C189430467","display_name":"Ranking (information 
retrieval)","score":0.8261572122573853},{"id":"https://openalex.org/C154690210","display_name":"Rewriting","score":0.7747675180435181},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7211663126945496},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.4399099349975586},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.18356090784072876}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4404939297","title":"Filtered Corpus Training (FiCT) Shows that Language Models Can Generalize from Indirect Evidence","url":"https://doi.org/10.1162/tacl_a_00720","published":"2024-01-01","authors":["Abhinav Patil","Jaap Jumelet","Yu Ying Chiu","Andy Lapastora","Peter Shen","Lexie Wang","Clevis Willrich","Shane Steinert‐Threlkeld"],"abstract":"Abstract This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpora that target a wide range of linguistic phenomena. 
Our results show that while transformers are better qua LMs (as measured by perplexity), both models perform equally and surprisingly well on linguistic generalization measures, suggesting that they are capable of generalizing from indirect evidence.","companies":["Microsoft","Amazon"],"matched_orgs":["Microsoft","Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00720","openalex_id":"https://openalex.org/W4404939297","cited_by_count":4,"quality_score":53,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Microsoft (United States)","University of Amsterdam","University of Washington"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8055751323699951},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6885068416595459},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.583094596862793},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5262914299964905},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.5209451913833618},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.484508216381073},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.32931798696517944},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32832205295562744}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4400903651","title":"MMGER: Multi-Modal and Multi-Granularity Generative Error Correction With LLM for Joint Accent and Speech Recognition","url":"https://doi.org/10.1109/lsp.2024.3432275","published":"2024-01-01","authors":["Bingshen Mu","Xucheng 
Wan","Naijun Zheng","Huan Zhou","Lei Xie"],"abstract":"Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade when faced with adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLM), delivering impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. However, GER encounters challenges such as fixed N-best hypotheses, insufficient utilization of acoustic information, and limited specificity to multi-accent scenarios. In this paper, we explore the application of GER in multi-accent scenarios. Accents represent deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed the multi-accent scenarios, making it a prominent solution. In this work, we pr...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/lsp.2024.3432275","openalex_id":"https://openalex.org/W4400903651","cited_by_count":11,"quality_score":52,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)","Northwestern Polytechnical University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7304393649101257},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.7231066823005676},{"id":"https://openalex.org/C18555067","display_name":"Joint (building)","score":0.6536316871643066},{"id":"https://openalex.org/C177774035","display_name":"Granularity","score":0.6048270463943481},{"id":"https://openalex.org/C2776756274","display_name":"Stress 
(linguistics)","score":0.5972727537155151},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5628218650817871},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49827146530151367},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4751693606376648}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":11}},{"id":"openalex:W4392608007","title":"Unifying Structured Data as Graph for Data-to-Text Pre-Training","url":"https://doi.org/10.1162/tacl_a_00641","published":"2024-01-01","authors":["Shujie Li","Liang Li","Ruiying Geng","Min Yang","Binhua Li","Guanghu Yuan","Wanwei He","Shao Yuan","Can Ma","Fei Huang","Yongbin Li"],"abstract":"Abstract Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved to be powerful in enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures or designed training objectives tailored for a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different D2T generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer. 
Concretely, we devise a position matrix for the Transformer, enc...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00641","openalex_id":"https://openalex.org/W4392608007","cited_by_count":13,"quality_score":50,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Chinese Academy of Sciences","Institute of Information Engineering","Shenzhen Institutes of Advanced Technology"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.849544882774353},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5343709588050842},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4916840195655823},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.4478623569011688},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.40314021706581116},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.36945638060569763},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3652479648590088},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.2219589650630951}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":13}},{"id":"openalex:W4397026222","title":"MAC: Masked Contrastive Pre-Training for Efficient Video-Text Retrieval","url":"https://doi.org/10.1109/tmm.2024.3402613","published":"2024-01-01","authors":["Fangxun Shu","Biaolong Chen","Yue Liao","Jinqiao Wang","Si Liu"],"abstract":"We present a simple yet effective end-to-end Video-language Pre-training (VidLP) framework, Masked Contrastive Video-language Pre-training 
(MAC), for video-text retrieval tasks. Our MAC aims to reduce video representation's spatial and temporal redundancy in the VidLP model by a mask sampling mechanism to improve pre-training efficiency. Comparing conventional temporal sparse sampling, we propose to randomly mask a high ratio of spatial regions and only take visible regions into the encoder as sparse spatial sampling. Similarly, we adopt the mask sampling technique for text inputs for consistency. Instead of blindly applying the mask-then-prediction paradigm from MAE, we propose a masked-then-alignment paradigm for efficient video-text alignment. The motivation is that video-text retrieval tasks rely on high-level alignment rather than low-level reconstruction, and multimodal alignment w...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2024.3402613","openalex_id":"https://openalex.org/W4397026222","cited_by_count":5,"quality_score":50,"matched_keywords":["retrieval","efficient"],"author_affiliations":["Alibaba Group (China)","Beihang University","Beijing Academy of Artificial Intelligence","Chinese Academy of Sciences","Institute of Automation","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8737857937812805},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5174277424812317},{"id":"https://openalex.org/C2983174267","display_name":"Video retrieval","score":0.4909262955188751},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4570349454879761},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.443244069814682},{"id":"https://openalex.org/C2777211547","display_name":"Training 
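(meteorology)","score":0.41816315054893494},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.405538409948349},{"id":"https://openalex.org/C49774154","display_name":"Multimedia","score":0.3626976013183594}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4391660241","title":"Recent advances in artificial intelligence generated content","url":"https://doi.org/10.1631/fitee.2410000","published":"2024-01-01","authors":["Junping Zhang","Lingyun Sun","Cong Jin","Junbin Gao","Xiaobing Li","Jiebo Luo","Zhigeng Pan","Ying Tang","Jingdong Wang"],"abstract":"Artificial intelligence generated content (AIGC) has become a research hotspot in artificial intelligence (AI) in recent years. It promises to take over content-generation work from humans at lower cost and higher efficiency, such as music, painting, multimodal content generation, news articles, summary reports, stock commentary digests, and even content generation and digital humans in the metaverse. AIGC offers a new technical path for the future development and realization of AI. Against this background, the journal Frontiers of Information Technology & Electronic Engineering organized a special issue on recent advances in AIGC. The special issue focuses on AIGC theory, algorithms, applications, and related areas. By attracting high-quality papers, we hope to help researchers in academia and industry gain a deeper understanding of the fundamental theory behind AIGC and its potential applications, and to encourage more researchers to join and advance AIGC research. Accordingly, we solicited papers on topics including, but not limited to: (1) AI-generated music; (2) AI-generated painting; (3) AI dialogue models; (4) AI news summarization; (5) AI and the metaverse; (6) AI and digital humans; (7) AI image editing; (8) AI-generated short videos; (9) AI-generated multimedia content; (10) ChatGPT-related work. After rigorous review, 12 papers were selected: 1 commentary, 1 perspective, 3 reviews, 6 research articles, and 1 communication. We group them into 3 main parts: ChatGPT; diffusion models; and prompt learning and multimodality. Overall, the special issue covers a wide range of research topics related to the development and application of AIGC, including AI image/text generation, 3D content creation, user-centered graphic design, style-specific music generation, and work on causal representation learning and high-order diffusion models. In addition, probabilistic diffusion models, prompt learning, and ChatGPT are surveyed in detail. Finally, we thank all authors for their support of this special issue, and we especially thank all reviewers for their insightful comments and helpful suggestions on the submissions.","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1631/fitee.2410000","openalex_id":"https://openalex.org/W4391660241","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Baidu (China)","Central Conservatory of Music","Communication University of China","Fudan University","Nanjing University of Information Science and Technology","Rowan University","The University of Sydney","University of Rochester","Zhejiang International Studies 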
University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C157170001","display_name":"Applications of artificial intelligence","score":0.6473962068557739},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5787553191184998},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.40044891834259033}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4401043452","title":"On-the-Fly Fusion of Large Language Models and Machine Translation","url":"http://dx.doi.org/10.18653/v1/2024.findings-naacl.35","published":"2024-01-01","authors":["Hieu Hoang","Huda Khayrallah","Marcin Junczys-Dowmunt"],"abstract":"We propose on-the-fly ensembling of a neural machine translation (NMT) model with a large language model (LLM), prompted on the same task and input. Through experiments on 4 language directions with varying data amounts, we find that a slightly weaker-at-translation LLM can improve translations of an NMT model, and such an ensemble can produce better translations than ensembling two stronger NMT models. We demonstrate that our ensemble method can be combined with various techniques from LLM prompting, such as in-context learning and translation context.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-naacl.35","openalex_id":"https://openalex.org/W4401043452","cited_by_count":4,"quality_score":49,"matched_keywords":["LLM","language model"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.6815847754478455},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.6546830534934998},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6015546917915344},{"id":"https://openalex.org/C158525013","display_name":"Fusion","score":0.5109564661979675},{"id":"https://openalex.org/C2781020372","display_name":"On the fly","score":0.5015733242034912},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.47628679871559143},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4537433087825775},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.19764751195907593}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4390707123","title":"Generalizing Upper Limb Force Modeling With Transfer Learning: A Multimodal Approach Using EMG and IMU for New Users and Conditions","url":"https://doi.org/10.1109/tnsre.2024.3351829","published":"2024-01-01","authors":["Gelareh Hajian","Evan Campbell","Mehdi Ansari","Evelyn Morin","Ali Etemad","Kevin Englehart","Erik Scheme"],"abstract":"In the field of EMG-based force modeling, the ability to generalize models across individuals could play a significant role in its adoption across a range of applications, including assistive devices, robotic and rehabilitation devices. However, current studies have predominately focused on intra-subject modeling, largely neglecting the burden of end-user data acquisition. In this work, we propose the use of transfer learning (TL) to generalize force modeling to a new user by first establishing a baseline model trained using other users' data, and then adapting to the end-user using a small amount of new data (only 10% , 20% , and 40% of the new user data). 
Using a deep multimodal convolutional neural network, consisting of two CNN models, one with high-density (HD) EMG and one with motion data recorded by an Inertial Measurement Unit (IMU), our proposed TL technique significantly improv...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tnsre.2024.3351829","openalex_id":"https://openalex.org/W4390707123","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Queen's University","Toronto Rehabilitation Institute","University of New Brunswick","University of Toronto"],"concepts":[{"id":"https://openalex.org/C79061980","display_name":"Inertial measurement unit","score":0.7455605268478394},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6673685312271118},{"id":"https://openalex.org/C81363708","display_name":"Convolutional neural network","score":0.6171963214874268},{"id":"https://openalex.org/C150899416","display_name":"Transfer of learning","score":0.6002023220062256},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.535198986530304},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.470064252614975},{"id":"https://openalex.org/C67712803","display_name":"User modeling","score":0.4476633071899414},{"id":"https://openalex.org/C18762648","display_name":"Work (physics)","score":0.424641489982605}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4404783388","title":"Cross-Domain Audio Deepfake Detection: Dataset and Analysis","url":"https://doi.org/10.18653/v1/2024.emnlp-main.286","published":"2024-01-01","authors":["Yuang Li","Min Zhang","Mengxin Ren","Xiaosong 
Qiao","Miaomiao Ma","Daimeng Wei","Hao Yang"],"abstract":"Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-domain ADD dataset comprising over 300 hours of speech data that is generated by five advanced zero-shot TTS models. To simulate real-world scenarios, we employ diverse attack methods and audio prompts from different datasets. Experiments show that, through novel attack-augmented training, the Wav2Vec2-large and Whisper-medium models achieve equal error rates of 4.1% and 6.5%, respectively. Additionally, we demonstrate our models' outstanding few-shot ADD ability by fine-tuning with just one minute of ta...","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.emnlp-main.286","openalex_id":"https://openalex.org/W4404783388","cited_by_count":12,"quality_score":49,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7575504779815674},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5756928324699402},{"id":"https://openalex.org/C160372630","display_name":"Audio analyzer","score":0.538797914981842},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.32035911083221436},{"id":"https://openalex.org/C127220857","display_name":"Audio signal processing","score":0.22054290771484375},{"id":"https://openalex.org/C64922751","display_name":"Audio 
signal","score":0.15059944987297058},{"id":"https://openalex.org/C13895895","display_name":"Speech coding","score":0.11043137311935425},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.10826009511947632}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":12}},{"id":"openalex:W4394773691","title":"Retrieve What You Need: A Mutual Learning Framework for Open-domain Question Answering","url":"https://doi.org/10.1162/tacl_a_00646","published":"2024-01-01","authors":["Dingmin Wang","Qiuyuan Huang","Matthew O. Jackson","Jianfeng Gao"],"abstract":"Abstract An open-domain question answering (QA) system usually follows a retrieve-then-read paradigm, in which a retriever is used to retrieve relevant passages from a large corpus, and then a reader generates answers based on the retrieved passages and the original question. In this paper, we propose a simple and novel mutual learning framework to improve the performance of retrieve-then-read-style models via an intermediate module named the knowledge selector, which we train with reinforcement learning. 
The key benefits of our proposed intermediate module are: 1) no requirement for additional annotated question-passage pairs; 2) improvements in both retrieval and QA performance, as well as computational efficiency, compared to prior competitive retrieve-then-read models; 3) with no finetuning, improvement in the zero-shot performance of large-scale pre-trained language models, e.g., Ch...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1162/tacl_a_00646","openalex_id":"https://openalex.org/W4394773691","cited_by_count":7,"quality_score":48,"matched_keywords":["retrieval"],"author_affiliations":["Microsoft (United States)","University of Oxford"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.9195730686187744},{"id":"https://openalex.org/C44291984","display_name":"Question answering","score":0.7813405394554138},{"id":"https://openalex.org/C2993776861","display_name":"Open domain","score":0.7603020071983337},{"id":"https://openalex.org/C97541855","display_name":"Reinforcement learning","score":0.5776572823524475},{"id":"https://openalex.org/C2776036281","display_name":"Constraint (computer-aided design)","score":0.5740664601325989},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5658444762229919},{"id":"https://openalex.org/C36503486","display_name":"Domain (mathematical analysis)","score":0.5209992527961731},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.513326108455658}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4403650354","title":"A Refer-and-Ground Multimodal Large Language Model for 
Biomedicine","url":"https://doi.org/10.1007/978-3-031-72390-2_38","published":"2024-01-01","authors":["Xiaoshuang Huang","Haifeng Huang","Lingdong Shen","Yehui Yang","Fangxin Shang","Junwei Liu","Jia Liu"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72390-2_38","openalex_id":"https://openalex.org/W4403650354","cited_by_count":7,"quality_score":48,"matched_keywords":["language model"],"author_affiliations":["Baidu (China)","China Agricultural University","Institute of Automation"],"concepts":[{"id":"https://openalex.org/C66782513","display_name":"Biomedicine","score":0.8506525158882141},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.813991129398346},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4667893350124359},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4301721453666687},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.37307876348495483},{"id":"https://openalex.org/C54355233","display_name":"Genetics","score":0.0},{"id":"https://openalex.org/C86803240","display_name":"Biology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4393102369","title":"MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries","url":"https://doi.org/10.1007/978-3-031-56069-9_8","published":"2024-01-01","authors":["Akash Ghosh","A. 
Seetharama Acharya","Prince Jha","Sriparna Saha","Aniket Gaudgaul","Rajdeep Majumdar","Aman Chadha","Raghav Jain","Setu Sinha","Shivani Agarwal"],"abstract":"","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-56069-9_8","openalex_id":"https://openalex.org/W4393102369","cited_by_count":10,"quality_score":47,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Indian Institute of Technology Patna","Indira Gandhi Institute of Medical Sciences","Stanford University"],"concepts":[{"id":"https://openalex.org/C519982507","display_name":"Hindi","score":0.9090607762336731},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8743988275527954},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6490383148193359},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.5497786998748779},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5319619178771973},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.371641606092453},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.3702225089073181},{"id":"https://openalex.org/C177264268","display_name":"Set (abstract data type)","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":10}},{"id":"openalex:W7126447459","title":"Do Language Models Know When They’re Hallucinating References?","url":"https://doi.org/10.18653/v1/2024.findings-eacl.62","published":"2024-01-01","authors":["Ayush Agrawal","Mirac Suzgun","Lester Mackey","Adam Kalai"],"abstract":"State-of-the-art language models (LMs) are notoriously susceptible to 
generating hallucinated information. Such inaccurate outputs not only undermine the reliability of these models but also limit their use and raise serious concerns about misinformation and propaganda. In this work, we focus on hallucinated book and article references and present them as the \"model organism\" of language model hallucination research, due to their frequent and easy-to-discern nature. We posit that if a language model cites a particular reference in its output, then it should ideally possess sufficient information about its authors and content, among other relevant details. Using this basic insight, we illustrate that one can identify hallucinated references without ever consulting any external resources, by asking a set of direct or indirect queries to the language model about the references. These queries can...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-eacl.62","openalex_id":"https://openalex.org/W7126447459","cited_by_count":6,"quality_score":47,"matched_keywords":["language model"],"author_affiliations":["Microsoft (United States)","Microsoft Research (United Kingdom)"],"concepts":[{"id":"https://openalex.org/C2911011789","display_name":"Hallucinating","score":0.6693000197410583},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.4702000021934509},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4049000144004822},{"id":"https://openalex.org/C111472728","display_name":"Epistemology","score":0.38420000672340393},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.367900013923645},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3499999940395355},{"id":"https://openalex.org/C116834253","display_name":"Identification 
(biology)","score":0.33340001106262207},{"id":"https://openalex.org/C2780791683","display_name":"Action (physics)","score":0.326200008392334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"openalex:W4402671765","title":"SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training","url":"http://dx.doi.org/10.18653/v1/2024.acl-long.220","published":"2024-01-01","authors":["Nan He","Weichen Xiong","Hanwen Liu","Yi Liao","Lei Ding","Kai Zhang","Guohua Tang","Han Xiao","Wei Yang"],"abstract":"Nan He, Weichen Xiong, Hanwen Liu, Yi Liao, Lei Ding, Kai Zhang, Guohua Tang, Xiao Han, Yang Wei. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.acl-long.220","openalex_id":"https://openalex.org/W4402671765","cited_by_count":1,"quality_score":46,"matched_keywords":["language model","efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.850312352180481},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6506777405738831},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5814673900604248},{"id":"https://openalex.org/C67186912","display_name":"Data modeling","score":0.5805454850196838},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.5611673593521118},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5500667095184326},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.4050469398498535},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3899666368961334}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4402667069","title":"CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers","url":"https://doi.org/10.18653/v1/2024.acl-long.10","published":"2024-01-01","authors":["Yong Hu","Fandong Meng","Jie Zhou"],"abstract":"In this paper, we present CSCD-NS, the first Chinese spelling check (CSC) dataset designed for native speakers, containing 40,000 samples from a Chinese social platform. Compared with existing CSC datasets aimed at Chinese learners, CSCD-NS is ten times larger in scale and exhibits a distinct error distribution, with a significantly higher proportion of word-level errors. To further enhance the data resource, we propose a novel method that simulates the input process through an input method, generating large-scale and high-quality pseudo data that closely resembles the actual error distribution and outperforms existing methods. Moreover, we investigate the performance of various models in this scenario, including large language models (LLMs), such as ChatGPT. The result indicates that generative models underperform BERT-like classification models due to strict length and pronunciation constr...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.acl-long.10","openalex_id":"https://openalex.org/W4402667069","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Tencent 
(China)"],"concepts":[{"id":"https://openalex.org/C2777801307","display_name":"Spelling","score":0.8339393734931946},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7062745094299316},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5789099335670471},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.4543350338935852},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42026287317276},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.3415353298187256},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.32575470209121704},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4403674676","title":"Automatic Disfluency Detection From Untranscribed Speech","url":"https://doi.org/10.1109/taslp.2024.3485465","published":"2024-01-01","authors":["Amrit Romana","Kazuhito Koishida","Emily Mower Provost"],"abstract":"Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of speech. All speakers experience disfluencies at times, and the rate at which we produce disfluencies may be increased by certain speaker or environmental characteristics. Modeling disfluencies has been shown to be useful for a range of downstream tasks, and as a result, disfluency detection has many potential applications. In this work, we investigate language, acoustic, and multimodal methods for frame-level automatic disfluency detection and categorization. Each of these methods relies on audio as an input. 
First, we evaluate several automatic speech recognition (ASR) systems in terms of their ability to transcribe disfluencies, measured using disfluency error rates. We then use these ASR transcripts as input to a language-based disfluency detection model. We find that disfluency detection...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslp.2024.3485465","openalex_id":"https://openalex.org/W4403674676","cited_by_count":9,"quality_score":46,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of Michigan"],"concepts":[{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.6861125230789185},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.622400164604187},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44655105471611023},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3528065085411072}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":9}},{"id":"openalex:W4399206114","title":"PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders","url":"https://doi.org/10.1109/tmm.2024.3405649","published":"2024-01-01","authors":["Hezhen Hu","Xiaoyi Dong","Jianmin Bao","Dongdong Chen","Lu Yuan","Dong Chen","Houqiang Li"],"abstract":"Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID). We argue that a high-quality ReID representation should have three properties, namely, multi-level awareness, occlusion robustness, and cross-region invariance. 
To this end, we propose a simple yet effective pre-training framework, namely PersonMAE, which involves two core designs into masked autoencoders to better serve the task of Person Re-ID. 1) PersonMAE generates two regions from the given image with RegionA as the input and RegionB as the prediction target. ...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2024.3405649","openalex_id":"https://openalex.org/W4399206114","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","Microsoft Research Asia (China)","University of Science and Technology of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8469549417495728},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.6505383849143982},{"id":"https://openalex.org/C116834253","display_name":"Identification (biology)","score":0.6493315100669861},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6266747713088989},{"id":"https://openalex.org/C51632099","display_name":"Training set","score":0.4441414177417755},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.44366398453712463},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.42122095823287964},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.41835135221481323}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4399168671","title":"Hyperbolic Pre-Trained Language Model","url":"https://doi.org/10.1109/taslp.2024.3407575","published":"2024-01-01","authors":["Weize Chen","Xu Han","Yankai Lin","Kaichen He","Ruobing Xie","Jie Zhou","Zhiyuan Liu","Maosong Sun"],"abstract":"In recent years, we have witnessed significant improvements in pre-trained language models (PLM) brought about by the scaling of parameter sizes and data amounts. However, this also brings high computational and storage costs. In this paper, we present a new direction to improve PLMs without scaling parameters and data: adopting a geometric feature space that is more suitable for encoding the intrinsic structured features of text. Although text is generally considered unstructured data, it possesses rich intrinsic structured features that signify syntactic and semantic relationships. Leveraging these structured features is vital for text understanding. Given that structured features are better encoded in hyperbolic spaces than in the Euclidean spaces used by conventional PLMs, we propose that PLMs should operate entirely within hyperbolic spaces. 
Our experiments demonstrate the superiori...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslp.2024.3407575","openalex_id":"https://openalex.org/W4399168671","cited_by_count":4,"quality_score":45,"matched_keywords":["language model"],"author_affiliations":["Renmin University of China","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C99844830","display_name":"Scaling","score":0.6835655570030212},{"id":"https://openalex.org/C83677898","display_name":"Hyperbolic space","score":0.6672825813293457},{"id":"https://openalex.org/C129782007","display_name":"Euclidean geometry","score":0.6615710854530334},{"id":"https://openalex.org/C2776760102","display_name":"Code (set theory)","score":0.6505466103553772},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.649911105632782},{"id":"https://openalex.org/C136197465","display_name":"Variety (cybernetics)","score":0.636725127696991},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.6019130349159241},{"id":"https://openalex.org/C125411270","display_name":"Encoding (memory)","score":0.5470824837684631}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"arxiv:2402.01696","title":"HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification","url":"http://arxiv.org/abs/2402.01696","published":"2024-01-01","authors":["Vidit Jain","Mukund Rungta","Yuchen Zhuang","Yue Yu","Zeyu Wang","Mu Gao","Jeffrey Skolnick","Chao Zhang"],"abstract":"Hierarchical text classification (HTC) is a complex subtask under multi-label text classification, characterized by a hierarchical label taxonomy and data imbalance. 
The best-performing models aim to learn a static representation by combining document and hierarchical label information. However, the relevance of document sections can vary based on the hierarchy level, necessitating a dynamic document representation. To address this, we propose HiGen, a text-generation-based framework utilizing language models to encode dynamic text representations. We introduce a level-guided loss function to capture the relationship between text and label name semantics. Our approach incorporates a task-specific pretraining strategy, adapting the language model to in-domain knowledge and significantly enhancing performance for classes with limited examples. Furthermore, we present a new and valuable dat...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.eacl-long.82","openalex_id":"https://openalex.org/W4391591626","cited_by_count":4,"quality_score":45,"matched_keywords":["language model"],"author_affiliations":["Georgia Institute of Technology","Microsoft (United States)","Microsoft Research (United Kingdom)","Yale University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8402612805366516},{"id":"https://openalex.org/C31170391","display_name":"Hierarchy","score":0.6783169507980347},{"id":"https://openalex.org/C158154518","display_name":"Relevance (law)","score":0.5858325362205505},{"id":"https://openalex.org/C2781289151","display_name":"Class hierarchy","score":0.5630211234092712},{"id":"https://openalex.org/C66746571","display_name":"ENCODE","score":0.5462878942489624},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5459545850753784},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.509152352809906},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.4526878595352173}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4400770808","title":"Exploring Universal Intrinsic Task Subspace for Few-Shot Learning via Prompt Tuning","url":"https://doi.org/10.1109/taslp.2024.3430545","published":"2024-01-01","authors":["Yujia Qin","Xiaozhi Wang","Yusheng Su","Yankai Lin","Ning Ding","Jing Yi","Weize Chen","Zhiyuan Liu","Juanzi Li","Lei Hou","Peng Li","Maosong Sun"],"abstract":"Why can pre-trained language models (PLMs) learn universal representations and effectively adapt to broad NLP tasks differing a lot superficially? In this work, we empirically find evidence indicating that the adaptations of PLMs to various few-shot tasks can be reparameterized as optimizing only a few free parameters in a unified low-dimensional intrinsic task subspace, which may help us understand why PLMs could easily adapt to various NLP tasks with small-scale data. To find such a subspace and examine its universality, we propose an analysis pipeline called intrinsic prompt tuning (IPT). 
Specifically, we resort to the recent success of prompt tuning and decompose the soft prompts of...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/taslp.2024.3430545","openalex_id":"https://openalex.org/W4400770808","cited_by_count":8,"quality_score":45,"matched_keywords":[],"author_affiliations":["Renmin University of China","Tencent (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C32834561","display_name":"Subspace topology","score":0.8401098251342773},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7344953417778015},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6181451678276062},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6089746952056885},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.5281327366828918},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.49744442105293274},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.398373544216156},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.13507309556007385}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":8}},{"id":"openalex:W4403863315","title":"Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection","url":"https://doi.org/10.1109/tip.2024.3485518","published":"2024-01-01","authors":["Yifan Xu","Mengdan Zhang","Xiaoshan Yang","Changsheng Xu"],"abstract":"We explore multi-modal contextual knowledge learned through multi-modal masked language modeling to provide explicit localization guidance for novel classes in 
open-vocabulary object detection (OVD). Intuitively, a well-modeled and correctly predicted masked concept word should effectively capture the textual contexts, visual contexts, and the cross-modal correspondence between texts and regions, thereby automatically activating high attention on corresponding regions. In light of this, we propose a multi-modal contextual knowledge distillation framework, MMC-Det, to explicitly supervise a student detector with the context-aware attention of the masked concept words in a teacher fusion transformer. The teacher fusion transformer is trained with our newly proposed diverse multi-modal masked language modeling (D-MLM) strategy, which significantly enhances the fine-grained region-level visu...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2024.3485518","openalex_id":"https://openalex.org/W4403863315","cited_by_count":4,"quality_score":45,"matched_keywords":["distillation"],"author_affiliations":["Institute of Automation","Tencent (China)","University of Chinese Academy of Sciences"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7067299485206604},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6168254017829895},{"id":"https://openalex.org/C2777601683","display_name":"Vocabulary","score":0.6050221920013428},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.5251940488815308},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.5156742334365845},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4801428020000458},{"id":"https://openalex.org/C2781238097","display_name":"Object 
(grammar)","score":0.4524948298931122},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.44342294335365295}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4402683960","title":"Dodo: Dynamic Contextual Compression for Decoder-only LMs","url":"http://dx.doi.org/10.18653/v1/2024.acl-long.536","published":"2024-01-01","authors":["Guanghui Qin","Corby Rosset","Ethan C. Chau","Nikhil Rao","Benjamin Van Durme"],"abstract":"Transformer-based language models (LMs) are inefficient in long contexts. We propose DODO, a solution for context compression. Instead of one vector per token in a standard transformer model, DODO represents text with a dynamic number of hidden states at each layer, reducing the cost of self-attention to a fraction of typical time and space. Moreover, off-the-shelf models such as LLAMA can be adapted to DODO by efficient parameter tuning methods such as LoRA. In use, DODO can act as either an autoregressive LM or a context compressor for downstream tasks. We demonstrate through experiments in language modeling, question answering, and summarization that DODO retains capabilities in these tasks, while drastically reducing the overhead during decoding. For example, in the autoencoding task, DODO shrinks context at a 20x compression ratio with a BLEU score of 98% for reconstruction, achieving ne...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.acl-long.536","openalex_id":"https://openalex.org/W4402683960","cited_by_count":0,"quality_score":45,"matched_keywords":["efficient","compression"],"author_affiliations":["Johns Hopkins University","Microsoft (United 
States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7101438045501709},{"id":"https://openalex.org/C180016635","display_name":"Compression (physics)","score":0.5545684695243835},{"id":"https://openalex.org/C78548338","display_name":"Data compression","score":0.5392175912857056},{"id":"https://openalex.org/C150178126","display_name":"Dynamic range compression","score":0.4575253427028656},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.4197350740432739},{"id":"https://openalex.org/C185588885","display_name":"Soft-decision decoder","score":0.418995201587677},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.3636859655380249},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.23171013593673706}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4412217129","title":"Data Management Opportunities in Unifying Large Language Models + Knowledge Graphs","url":"https://vbn.aau.dk/da/publications/e4f32515-aace-4586-b663-e06940489798","published":"2024-01-01","authors":["Arijit Khan","Tianxing Wu","Xi Chen"],"abstract":"Large Language Models (LLMs), e.g., ChatGPT, PaLM, and LLaMA are transforming natural language processing (NLP) and artificial intelligence (AI). Recent LLMs browse Web knowledge and learn from external knowledge bases, unifying LLMs and knowledge graphs (KGs). The possibility of bridging KGs with LLMs has garnered attention in knowledge engineering. On the one hand, LLMs can be enhanced with KGs to provide answers with more contextualized facts. On the other hand, downstream tasks, e.g., KG curation, embedding, and search can also benefit by adopting LLMs. 
It remains an interesting direction to explore effective interactions between LLMs and KGs, where many recent advances arise from NLP, deep learning, information retrieval, and computer vision domains. The workshop, titled “LLM+KG: Data Management Opportunities in Unifying Large Language Models + Knowledge Graphs”, is targeted at data...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"","openalex_id":"https://openalex.org/W4412217129","cited_by_count":0,"quality_score":45,"matched_keywords":["LLM","retrieval"],"author_affiliations":["Aalborg University","Places For People","Southeast University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C2987255567","display_name":"Knowledge graph","score":0.5922012329101562},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5698909759521484},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.512187659740448},{"id":"https://openalex.org/C1668388","display_name":"Data management","score":0.41402214765548706},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.37786391377449036},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3224368095397949},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.23133787512779236}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4402013269","title":"PSP: Pre-training and Structure Prompt Tuning for Graph Neural Networks","url":"https://doi.org/10.1007/978-3-031-70362-1_25","published":"2024-01-01","authors":["Qingqing Ge","Zeyuan Zhao","Yiding Liu","Anfeng Cheng","Xiang Li","Shuaiqiang Wang","Dawei 
Yin"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-70362-1_25","openalex_id":"https://openalex.org/W4402013269","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Baidu (China)","East China Normal University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8530610799789429},{"id":"https://openalex.org/C50644808","display_name":"Artificial neural network","score":0.5522854924201965},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.4676515460014343},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.45216241478919983},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.2788921594619751}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4390577888","title":"Disentangled Representation Learning for Controllable Person Image Generation","url":"https://doi.org/10.1109/tmm.2023.3345180","published":"2024-01-01","authors":["Wenju Xu","Chengjiang Long","Yongwei Nie","Guanghui Wang"],"abstract":"In this paper, we propose a novel framework named DRL-CPG to learn disentangled latent representation for controllable person image generation, which can produce realistic person images with desired poses and human attributes (e.g. pose, head, upper clothes, and pants) provided by various source persons. 
Unlike the existing works leveraging the semantic masks to obtain the representation of each component, we propose to generate disentangled latent code via a novel attribute encoder with transformers trained in a manner of curriculum learning from a relatively easy step to a gradually hard one. A random component mask-agnostic strategy is introduced to randomly remove component masks from the person segmentation masks, which aims at increasing the difficulty of training and promoting the transformer encoder to recognize the underlying boundaries between each component. This enables the m...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2023.3345180","openalex_id":"https://openalex.org/W4390577888","cited_by_count":7,"quality_score":44,"matched_keywords":[],"author_affiliations":["Amazon (United States)","META Health","South China University of Technology","Toronto Metropolitan University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.826274037361145},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.6718138456344604},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.6683018803596497},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6398997902870178},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.6008870601654053},{"id":"https://openalex.org/C168167062","display_name":"Component (thermodynamics)","score":0.555796205997467},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.5291077494621277},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition 
(psychology)","score":0.46628519892692566}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":7}},{"id":"openalex:W4404782886","title":"Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models","url":"https://doi.org/10.18653/v1/2024.findings-emnlp.102","published":"2024-01-01","authors":["Canshi Wei"],"abstract":"Fine-grained image classification, especially in zero-/few-shot scenarios, poses a considerable challenge for vision-language models (VLMs) like CLIP, which often struggle to differentiate between semantically similar classes due to insufficient supervision for fine-grained tasks. On the other hand, Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities in tasks like Visual Question Answering (VQA) but remain underexplored in the context of fine-grained image classification. This paper presents CascadeVLM, a novel framework that harnesses the complementary strengths of both CLIP-like and LVLMs VLMs to tackle these challenges. Using granular knowledge effectively in LVLMs and integrating a cascading approach, CascadeVLM dynamically allocates samples using an entropy threshold, balancing computational efficiency with classification accuracy. Experiments on multiple fine...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-emnlp.102","openalex_id":"https://openalex.org/W4404782886","cited_by_count":2,"quality_score":43,"matched_keywords":["efficient"],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6902410984039307},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.5552228093147278},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5410950779914856},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.5095994472503662},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.38971370458602905}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4400410743","title":"AnimeDiff: Customized Image Generation of Anime Characters Using Diffusion Model","url":"https://doi.org/10.1109/tmm.2024.3415357","published":"2024-01-01","authors":["Yuqi Jiang","Qiankun Liu","Dongdong Chen","Lu Yuan","Ying Fu"],"abstract":"Due to the unprecedented power of text-to-image diffusion models, customizing these models to generate new concepts has gained increasing attention. Existing works have achieved some success on real-world concepts, but fail on the concepts of anime characters. We empirically find that such low quality comes from the newly introduced identifier text tokens, which are optimized to identify different characters. In this paper, we propose AnimeDiff, which focuses on customized image generation of anime characters. Our AnimeDiff directly binds anime characters with their names and keeps the embeddings of text tokens unchanged. Furthermore, when composing multiple characters in a single image, the model tends to confuse the properties of those characters. 
To address this issue, our AnimeDiff in...","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2024.3415357","openalex_id":"https://openalex.org/W4400410743","cited_by_count":6,"quality_score":43,"matched_keywords":[],"author_affiliations":["Beijing Institute of Technology","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7872510552406311},{"id":"https://openalex.org/C118130439","display_name":"Anime","score":0.723957896232605},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.556444525718689},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.4631459414958954},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4605099558830261},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4434123635292053},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4345194101333618},{"id":"https://openalex.org/C97355855","display_name":"Thermodynamics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":6}},{"id":"arxiv:2403.09401","title":"Unsupervised Modality-Transferable Video Highlight Detection With Representation Activation Sequence Learning","url":"http://arxiv.org/abs/2403.09401","published":"2024-01-01","authors":["Tingtian Li","Zixun Sun","Xinyu Xiao"],"abstract":"Identifying highlight moments of raw video materials is crucial for improving the efficiency of editing videos that are pervasive on internet platforms. 
However, the extensive work of manually labeling footage has created obstacles to applying supervised methods to videos of unseen categories. The absence of an audio modality that contains valuable cues for highlight detection in many videos also makes it difficult to use multimodal strategies. In this paper, we propose a novel model with cross-modal perception for unsupervised highlight detection. The proposed model learns representations with visual-audio level semantics from image-audio pair data via a self-reconstruction task. To achieve unsupervised highlight detection, we investigate the latent representations of the network and propose the representation activation sequence learning (RASL) module with k-point contrastive learning....","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tip.2024.3372469","openalex_id":"https://openalex.org/W4392543396","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7927085161209106},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6828573942184448},{"id":"https://openalex.org/C59404180","display_name":"Feature learning","score":0.6434159278869629},{"id":"https://openalex.org/C2780226545","display_name":"Modality (human–computer interaction)","score":0.5624884366989136},{"id":"https://openalex.org/C184337299","display_name":"Semantics (computer science)","score":0.5412946939468384},{"id":"https://openalex.org/C2776359362","display_name":"Representation (politics)","score":0.5252413749694824},{"id":"https://openalex.org/C2776401178","display_name":"Feature (linguistics)","score":0.4768042266368866},{"id":"https://openalex.org/C2780451532","display_name":"Task (project 
management)","score":0.4368554651737213}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4404780585","title":"On Instruction-Finetuning Neural Machine Translation Models","url":"http://dx.doi.org/10.18653/v1/2024.wmt-1.114","published":"2024-01-01","authors":["Vikas Raunak","Roman Grundkiewicz","Marcin Junczys-Dowmunt"],"abstract":"In this work, we introduce instruction finetuning for Neural Machine Translation (NMT) models, which distills instruction following capabilities from Large Language Models (LLMs) into orders-of-magnitude smaller NMT models. Our instruction-finetuning recipe for NMT models enables customization of translations for a limited but disparate set of translation-specific tasks. We show that NMT models are capable of following multiple instructions simultaneously and demonstrate capabilities of zero-shot composition of instructions. We also show that through instruction finetuning, traditionally disparate tasks such as formality-controlled machine translation, multi-domain adaptation as well as multi-modal translations can be tackled jointly by a single instruction finetuned NMT model, at a performance level comparable to LLMs such as GPT-3.5-Turbo. To the best of our knowledge, our work is among the....","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.wmt-1.114","openalex_id":"https://openalex.org/W4404780585","cited_by_count":1,"quality_score":42,"matched_keywords":["efficient"],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7821507453918457},{"id":"https://openalex.org/C203005215","display_name":"Machine 
translation","score":0.7316922545433044},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5568583011627197},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4829063415527344},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3416275382041931},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.07388538122177124},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0},{"id":"https://openalex.org/C105580179","display_name":"Messenger RNA","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4399407106","title":"Lightweight Model Pre-Training via Language Guided Knowledge Distillation","url":"https://doi.org/10.1109/tmm.2024.3410532","published":"2024-01-01","authors":["Mingsheng Li","Lin Zhang","Mingzhen Zhu","Zilong Huang","Gang Yu","Jiayuan Fan","Tao Chen"],"abstract":"This paper studies the problem of pre-training for small models, which is essential for many mobile devices. Current state-of-the-art methods on this problem transfer the representational knowledge of a large network (as a Teacher) into a smaller model (as a Student) using self-supervised distillation, improving the performance of the small model on downstream tasks. However, existing approaches are insufficient in extracting the crucial knowledge that is useful for discerning categories in downstream tasks during the distillation process. In this paper, for the first time, we introduce language guidance to the distillation process and propose a new method named Language-Guided Distillation (LGD) system, which uses category names of the target downstream task to help refine the knowledge transferred between the teacher and student. 
To this end, we utilize a pre-trained text encoder to ex...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2024.3410532","openalex_id":"https://openalex.org/W4399407106","cited_by_count":1,"quality_score":42,"matched_keywords":["distillation"],"author_affiliations":["Fudan University","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8745420575141907},{"id":"https://openalex.org/C204030448","display_name":"Distillation","score":0.5248851180076599},{"id":"https://openalex.org/C2777211547","display_name":"Training (meteorology)","score":0.5210004448890686},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.5051723122596741},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.502112865447998},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4293982982635498},{"id":"https://openalex.org/C119857082","display_name":"Machine learning","score":0.39696821570396423},{"id":"https://openalex.org/C121332964","display_name":"Physics","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4391782871","title":"Improve Geoscience Interpretation, Reporting, and Transparency with Generative AI","url":"http://dx.doi.org/10.3997/2214-4609.202439088","published":"2024-01-01","authors":["D. Tishechkin","A. Dubovik","A. Koriagin","Roman Khudorozhkov"],"abstract":"Summary The paper emphasizes the need for adaptability in Generative AI, incorporating methods like fine-tuning, prompt engineering, and retrieval-augmented generation (RAG). 
Centralized data storage, preferably in the cloud, is recommended for co-locating relevant data, with a flexible cost model. A significant portion of the paper is devoted to the interpretability of neural networks, particularly in the context of the energy industry. It addresses the challenge of explaining outputs from large language models (LLMs) and proposes the use of grad-cam like methods for building saliency maps. These maps help in understanding the importance of words or tokens in tasks like passage retrieval and question answering. Through examples and case studies, the paper demonstrates how interpretability algorithms and saliency maps can improve geoscience workflows. The integration of Generative AI and...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3997/2214-4609.202439088","openalex_id":"https://openalex.org/W4391782871","cited_by_count":1,"quality_score":42,"matched_keywords":["retrieval"],"author_affiliations":["Amazon (United States)","Technical Data Analysis (United States)"],"concepts":[{"id":"https://openalex.org/C2780233690","display_name":"Transparency (behavior)","score":0.7087520360946655},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5839530825614929},{"id":"https://openalex.org/C527412718","display_name":"Interpretation (philosophy)","score":0.5836912393569946},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.5431972146034241},{"id":"https://openalex.org/C2522767166","display_name":"Data science","score":0.47541624307632446},{"id":"https://openalex.org/C1965285","display_name":"Earth science","score":0.4037682116031647},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.2671246826648712},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.24135807156562805}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4401042049","title":"HW-TSC at SemEval-2024 Task 5: Self-Eval? A Confident LLM System for Auto Prediction and Evaluation for the Legal Argument Reasoning Task","url":"http://dx.doi.org/10.18653/v1/2024.semeval-1.255","published":"2024-01-01","authors":["Xiaofeng Zhao","Xiaosong Qiao","Kaiwen Ou","Min Zhang","Chang Su","Mengyao Piao","Yuang Li","Yinglu Li","Ming Zhu","Yilun Liu"],"abstract":"Xiaofeng Zhao, Xiaosong Qiao, Kaiwen Ou, Min Zhang, Su Chang, Mengyao Piao, Yuang Li, Yinglu Li, Ming Zhu, Yilun Liu. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024). 2024.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.semeval-1.255","openalex_id":"https://openalex.org/W4401042049","cited_by_count":1,"quality_score":42,"matched_keywords":["LLM"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.8218201398849487},{"id":"https://openalex.org/C44572571","display_name":"SemEval","score":0.735442042350769},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7167091369628906},{"id":"https://openalex.org/C98184364","display_name":"Argument (complex analysis)","score":0.6656633615493774},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.49510708451271057},{"id":"https://openalex.org/C204321447","display_name":"Natural language 
processing","score":0.42637360095977783},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.12891322374343872},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.07748657464981079}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403069356","title":"FM-OSD: Foundation Model-Enabled One-Shot Detection of Anatomical Landmarks","url":"https://doi.org/10.1007/978-3-031-72120-5_28","published":"2024-01-01","authors":["Juzheng Miao","Cheng Chen","Keli Zhang","Jie Chuai","Quanzheng Li","Pheng‐Ann Heng"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72120-5_28","openalex_id":"https://openalex.org/W4403069356","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Harvard University","Huawei Technologies (China)","Massachusetts General Hospital","Office of the General Counsel"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8211511373519897},{"id":"https://openalex.org/C2778344882","display_name":"Shot (pellet)","score":0.6444277167320251},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.5953737497329712},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5475871562957764},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.47255489230155945},{"id":"https://openalex.org/C2992734406","display_name":"One shot","score":0.4449300467967987},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics 
(images)","score":0.40199774503707886},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.05915442109107971}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":5}},{"id":"openalex:W4403843284","title":"Context Diffusion: In-Context Aware Image Generation","url":"https://doi.org/10.1007/978-3-031-72980-5_22","published":"2024-01-01","authors":["Ivona Najdenkoska","Animesh A. Sinha","Abhimanyu Dubey","Dhruv Mahajan","Vignesh Ramanathan","Filip Radenović"],"abstract":"","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72980-5_22","openalex_id":"https://openalex.org/W4403843284","cited_by_count":5,"quality_score":42,"matched_keywords":[],"author_affiliations":["Meta (United States)","University of Amsterdam"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8215476274490356},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.7058010697364807},{"id":"https://openalex.org/C69357855","display_name":"Diffusion","score":0.5576582551002502},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.4559517502784729},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4117039144039154},{"id":"https://openalex.org/C121684516","display_name":"Computer graphics (images)","score":0.3699517250061035},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35510173439979553},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.06982037425041199}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":5}},{"id":"openalex:W4392980010","title":"MorphNeRF: Text-Guided 3D-Aware Editing via Morphing Generative Neural Radiance Fields","url":"https://doi.org/10.1109/tmm.2024.3379888","published":"2024-01-01","authors":["Yingchen Yu","Rongliang Wu","Yifang Men","Shijian Lu","Miaomiao Cui","Xuansong Xie","Chunyan Miao"],"abstract":"Generative neural radiance fields (NeRF) bring image generation into the 3D era, which have delivered impressive generation quality and 3D consistency, especially in the face generation domain. Upon pre-trained generative NeRF, 3D-aware image editing has been explored and achieved promising performance via manipulating semantic maps or attributes. However, a more flexible editing interface, text, remains under-explored in the context of 3D-aware image editing. In this work, we leverage the Contrastive Language-Image Pre-training (CLIP) model to achieve 3D-aware image editing in pre-trained generative NeRF models given a target text prompt. To achieve accurate and controllable geometry editing, we propose MorphNeRF, a learnable morphing network that morphs the 3D geometry of images toward the target descriptions via generative NeRF. 
Different from prior studies that achieve image editing....","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tmm.2024.3379888","openalex_id":"https://openalex.org/W4392980010","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Agency for Science, Technology and Research","Alibaba Group (China)","Institute for Infocomm Research","Nanyang Technological University"],"concepts":[{"id":"https://openalex.org/C50637493","display_name":"Morphing","score":0.94090735912323},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8582999110221863},{"id":"https://openalex.org/C2776674983","display_name":"Image editing","score":0.6215393543243408},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.563788890838623},{"id":"https://openalex.org/C153083717","display_name":"Leverage (statistics)","score":0.5614287257194519},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.47332319617271423},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.4464098811149597},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.4296822249889374}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4411611344","title":"Form-Filling on Autopilot: How Generative AI and Amazon Connect are Transforming Agent Workflows","url":"https://doi.org/10.63282/3050-9246.ijetcsit-v5i2p111","published":"2024-01-01","authors":["P. Krishnamurthy","Ramprasad Srirama"],"abstract":"This research aims to combine generative AI systems with Amazon Connect so that form-filling duties in a contact center can be automated. 
Current contact centers encounter substantial problems in managing manual data entry, which results in higher AHT, variations in documentation, and a high frequency of errors. We propose using real-time speech transcription, AI-driven dynamic form filling, and automated summarization to lower manual work and improve customer service output. The design consists of Amazon Connect for customer communications, Contact Lens for monitoring conversations, AWS Lambda for instant event handling, Amazon DynamoDB for managing structured data, and Amazon Bedrock for integrating generative AI. Tests show that our system leads to a 20% drop in AHT, a 73% drop in data entry errors, and a 78% decrease in documentation time. This research contributes an important step....","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63282/3050-9246.ijetcsit-v5i2p111","openalex_id":"https://openalex.org/W4411611344","cited_by_count":0,"quality_score":41,"matched_keywords":["agent"],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C535291247","display_name":"Amazon rainforest","score":0.8192125558853149},{"id":"https://openalex.org/C18020424","display_name":"Autopilot","score":0.7946419715881348},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6745609045028687},{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.6459646224975586},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.5302155613899231},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.4232128858566284},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.19594335556030273},{"id":"https://openalex.org/C77088390","display_name":"Database","score":0.14265745878219604}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4403150348","title":"FM-ABS: Promptable Foundation Model Drives Active Barely Supervised Learning for 3D Medical Image Segmentation","url":"https://doi.org/10.1007/978-3-031-72111-3_28","published":"2024-01-01","authors":["Zhe Xu","Cheng Chen","Donghuan Lu","Jinghan Sun","Dong Wei","Yefeng Zheng","Quanzheng Li","Raymond Kai‐Yu Tong"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72111-3_28","openalex_id":"https://openalex.org/W4403150348","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Harvard University","Massachusetts General Hospital","Office of the General Counsel","Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7726930379867554},{"id":"https://openalex.org/C2780966255","display_name":"Foundation (evidence)","score":0.7004133462905884},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5887842774391174},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.579314649105072},{"id":"https://openalex.org/C124504099","display_name":"Image segmentation","score":0.4753779470920563},{"id":"https://openalex.org/C31972630","display_name":"Computer 
vision","score":0.43953025341033936},{"id":"https://openalex.org/C95457728","display_name":"History","score":0.0},{"id":"https://openalex.org/C166957645","display_name":"Archaeology","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4404780845","title":"Exploring the Traditional NMT Model and Large Language Model for Chat Translation","url":"http://dx.doi.org/10.18653/v1/2024.wmt-1.105","published":"2024-01-01","authors":["Jinlong Yang","Hengchao Shang","Daimeng Wei","Jiaxin Guo","Zongyao Li","Zhanglin Wu","Zhiqiang Rao","Shaojun Li","Yuhao Xie","Yuanchang Luo","Jiawei Zheng","Bin Wei"],"abstract":"Jinlong Yang, Hengchao Shang, Daimeng Wei, Jiaxin Guo, Zongyao Li, Zhanglin Wu, Zhiqiang Rao, Shaojun Li, Yuhao Xie, Yuanchang Luo, Zheng Jiawei, Bin Wei, Hao Yang. Proceedings of the Ninth Conference on Machine Translation. 2024.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.wmt-1.105","openalex_id":"https://openalex.org/W4404780845","cited_by_count":0,"quality_score":41,"matched_keywords":["language model"],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7262568473815918},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5866257548332214},{"id":"https://openalex.org/C137293760","display_name":"Language model","score":0.4212135076522827},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3959924578666687},{"id":"https://openalex.org/C105580179","display_name":"Messenger 
RNA","score":0.0},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0},{"id":"https://openalex.org/C55493867","display_name":"Biochemistry","score":0.0},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4401042708","title":"Exploring Compositional Generalization of Large Language Models","url":"https://doi.org/10.18653/v1/2024.naacl-srw.3","published":"2024-01-01","authors":["Haoran Yang","Hongyuan Lu","Wai Lam","Deng Cai"],"abstract":"Haoran Yang, Hongyuan Lu, Wai Lam, Deng Cai. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop). 2024.","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.naacl-srw.3","openalex_id":"https://openalex.org/W4401042708","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Chinese University of Hong Kong","Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6942022442817688},{"id":"https://openalex.org/C177148314","display_name":"Generalization","score":0.6695576906204224},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4201034903526306},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3232330083847046},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer 
science","score":0.3204120993614197},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.10467109084129333},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4390489076","title":"Context-Aware and Semantic-Consistent Spatial Interactions for One-Shot Object Detection Without Fine-Tuning","url":"https://doi.org/10.1109/tcsvt.2023.3349007","published":"2024-01-01","authors":["Hanqing Yang","Sijia Cai","Bing Deng","Jieping Ye","Guosheng Lin","Yu Zhang"],"abstract":"One-shot object detection (OSOD) without fine-tuning has recently garnered considerable attention and research focus. It aims to directly detect novel-class objects in the target image by providing merely one support image patch without undergoing the fine-tuning stage. However, most existing methods adopt image pair matching regardless of the scale inconsistency and spatial semantic mismatch of image pairs, which limits their ability to acquire high-quality target-support related features. This paper addresses these limitations by incorporating cross-scale contexts and semantic-consistent cues that are robust against the challenges of scarce and ambiguous matching. 
Specifically, we first introduce a simple yet effective Aggregation-Transformer-based Pyramid (ATP) module to explore the long-range cross-scale spatial interactions by employing the customized size-aware aggregation approach...","companies":["Alibaba/Qwen"],"matched_orgs":["Alibaba/Qwen"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.1109/tcsvt.2023.3349007","openalex_id":"https://openalex.org/W4390489076","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Alibaba Group (China)","Nanyang Technological University","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7861964106559753},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.6021856069564819},{"id":"https://openalex.org/C118505674","display_name":"Encoder","score":0.5757606029510498},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.5161346197128296},{"id":"https://openalex.org/C64754055","display_name":"Spatial contextual awareness","score":0.5111942887306213},{"id":"https://openalex.org/C2776151529","display_name":"Object detection","score":0.45942404866218567},{"id":"https://openalex.org/C52622490","display_name":"Feature extraction","score":0.43973326683044434},{"id":"https://openalex.org/C41608201","display_name":"Embedding","score":0.43542274832725525}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4404780840","title":"CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues","url":"https://doi.org/10.18653/v1/2024.findings-emnlp.713","published":"2024-01-01","authors":["Makesh Narsimhan Sreedhar","Traian Rebedea","Shaona Ghosh","Jiaqi Zeng","Christopher 
Parisien"],"abstract":"Recent advancements in instruction-tuning datasets have predominantly focused on specific tasks like mathematical or logical reasoning.There has been a notable gap in data designed for aligning language models to maintain topic relevance in conversations -a critical aspect for deploying chatbots to production.We introduce the CANTTALKABOUT-THIS dataset to help language models remain focused on the subject at hand during taskoriented interactions.It consists of synthetic dialogues on a wide range of conversation topics from different domains.These dialogues are interspersed with distractor turns that intentionally divert the chatbot from the predefined topic.Fine-tuning language models on this dataset helps make them resilient to deviating from the assigned role and improves their ability to maintain topical coherence compared to generalpurpose instruction-tuned LLMs like GPT-4-TURBO and....","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-emnlp.713","openalex_id":"https://openalex.org/W4404780840","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Nvidia (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.766965389251709},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4920017421245575},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.43896299600601196},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.37131428718566895},{"id":"https://openalex.org/C199360897","display_name":"Programming 
language","score":0.3217613101005554},{"id":"https://openalex.org/C138885662","display_name":"Philosophy","score":0.06284895539283752}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4402797960","title":"CSAFT: Continuous Semantic Augmentation Fine-Tuning for Legal Large Language Models","url":"https://doi.org/10.1007/978-3-031-72344-5_20","published":"2024-01-01","authors":["Bo Li","Shuang Fan","Jin Huang"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72344-5_20","openalex_id":"https://openalex.org/W4402797960","cited_by_count":4,"quality_score":41,"matched_keywords":[],"author_affiliations":["Baidu (China)","Tsinghua University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.850697934627533},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5224960446357727},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4259144067764282},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.40846800804138184}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":4}},{"id":"openalex:W4411659480","title":"Zero-Interpolation Models: Bridging Modes with Nonlinear Latent Spaces","url":"https://doi.org/10.63282/3050-9416.ijaibdcms-v5i1p107","published":"2024-01-01","authors":["Sai Prasad Veluru"],"abstract":"Zero-interpolation models provide a fresh development in generative modeling as they allow one to negotiate complex, multimodal latent spaces without running into the common problems with mode 
collapse & also implausible transitions. When switching between different data modes, conventional interpolation methods especially linear algorithms have trouble typically generating more synthetic results that fail to reflect any other actual distribution within the training information. This work addresses the challenge by building paths respecting the inherent geometry of the latent space using a nonlinear, manifold-aware interpolation technique. These zero-interpolation models are designed to cover high-probability regions, therefore avoiding implausible samples and more faithfully reflecting the range seen in multimodal distributions. Our contributions begin with a theoretical framework that....","companies":["Apple"],"matched_orgs":["Apple"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63282/3050-9416.ijaibdcms-v5i1p107","openalex_id":"https://openalex.org/W4411659480","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Apple (Israel)","Apple (United States)"],"concepts":[{"id":"https://openalex.org/C174348530","display_name":"Bridging (networking)","score":0.7198302745819092},{"id":"https://openalex.org/C137800194","display_name":"Interpolation (computer graphics)","score":0.6377909779548645},{"id":"https://openalex.org/C158622935","display_name":"Nonlinear system","score":0.5565671324729919},{"id":"https://openalex.org/C2780813799","display_name":"Zero (linguistics)","score":0.5520583987236023},{"id":"https://openalex.org/C33923547","display_name":"Mathematics","score":0.47625255584716797},{"id":"https://openalex.org/C28826006","display_name":"Applied mathematics","score":0.35839495062828064},{"id":"https://openalex.org/C134306372","display_name":"Mathematical analysis","score":0.3371877670288086},{"id":"https://openalex.org/C121864883","display_name":"Statistical 
physics","score":0.32828712463378906}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4402670068","title":"SpeechGuard: Exploring the Adversarial Robustness of Multi-modal Large Language Models","url":"https://doi.org/10.18653/v1/2024.findings-acl.596","published":"2024-01-01","authors":["Raghuveer Peri","Sai Muralidhar Jayanthi","Srikanth Ronanki","Anshu Bhatia","Karel Mundnich","Saket Dingliwal","Nilaksh Das","Zejiang Hou","Goeric Huybrechts","Srikanth Vishnubhotla","Daniel Garcia-Romero","Sundararajan Srinivasan"],"abstract":"Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu Han, Katrin Kirchhoff. Findings of the Association for Computational Linguistics: ACL 2024. 2024.","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-acl.596","openalex_id":"https://openalex.org/W4402670068","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C37736160","display_name":"Adversarial system","score":0.8051980137825012},{"id":"https://openalex.org/C63479239","display_name":"Robustness (evolution)","score":0.7886155843734741},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.747058629989624},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7027016282081604},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4549834132194519},{"id":"https://openalex.org/C28490314","display_name":"Speech 
recognition","score":0.3699670433998108},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3632195293903351},{"id":"https://openalex.org/C104317684","display_name":"Gene","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4402670020","title":"Instruction Position Matters in Sequence Generation with Large Language Models","url":"https://doi.org/10.18653/v1/2024.findings-acl.693","published":"2024-01-01","authors":["Yijin Liu","Xianfeng Zeng","Chenze Shao","Fandong Meng","Jie Zhou"],"abstract":"Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization, through instruction fine-tuning.The fine-tuning data is generally a sequential concatenation of a specific task instruction, an input sentence, and the corresponding response.Considering the locality of self-attention modeling in LLMs, these models face the risk of instruction forgetting when generating responses for long input sentences.To mitigate this issue, we propose to enhance the instruction-following capability of LLMs by relocating the position of task instructions after the input sentences.Theoretical analysis suggests that our straightforward method can alter the model's learning focus, thereby emphasizing the training of instructionfollowing capabilities.Concurrently, experimental results demonstrate that our approach consistently outperforms tra...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-acl.693","openalex_id":"https://openalex.org/W4402670020","cited_by_count":3,"quality_score":40,"matched_keywords":[],"author_affiliations":["Tencent 
(China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6965351700782776},{"id":"https://openalex.org/C2778112365","display_name":"Sequence (biology)","score":0.5963971018791199},{"id":"https://openalex.org/C78780964","display_name":"Position paper","score":0.5211189389228821},{"id":"https://openalex.org/C198082294","display_name":"Position (finance)","score":0.46733129024505615},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4025604724884033},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.34125056862831116},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.33356979489326477},{"id":"https://openalex.org/C136764020","display_name":"World Wide Web","score":0.16109174489974976}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":3}},{"id":"openalex:W4402684074","title":"Towards Multiple References Era – Addressing Data Leakage and Limited Reference Diversity in Machine Translation Evaluation","url":"https://doi.org/10.18653/v1/2024.findings-acl.710","published":"2024-01-01","authors":["Xianfeng Zeng","Yijin Liu","Fandong Meng","Jie Zhou"],"abstract":"Recent research has shown a weak correlation between n-gram-based metrics and human evaluations in machine translation task, particularly when evaluating large language models (LLMs).Additionally, the data leakage risk in LLMs may cause an overestimation problem when evaluating LLMs on downstream tasks.In this work, we identify the limited diversity of references as the primary cause for the inferior performance of n-gram-based metrics and the overestimation problem.To address this issue, we propose to utilize multiple references generated by LLMs, coupled with an effective selection strategy focused on accuracy and diversity, to improve the 
alignment between automatic metrics and human evaluations.We validate our approach on the WMT22 Metrics benchmark with 4 languages and observe a maximum accuracy gain of 9.5% in F200spBLEU, which makes it on par with computationally expensive neural-...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-acl.710","openalex_id":"https://openalex.org/W4402684074","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6975988745689392},{"id":"https://openalex.org/C2777042071","display_name":"Leakage (economics)","score":0.5646693706512451},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.5557641983032227},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5160755515098572},{"id":"https://openalex.org/C2781316041","display_name":"Diversity (politics)","score":0.47314512729644775},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3790173828601837},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3471946716308594},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.32032153010368347}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"arxiv:2406.02128","title":"Iteration Head: A Mechanistic Study of Chain-of-Thought","url":"http://arxiv.org/abs/2406.02128","published":"2024-01-01","authors":["Charles Arnal","Wassim Bouaziz","Vivien Cabannes","François Charton","Julia Kempe","Alice Yang"],"abstract":"Chain-of-Thought (CoT) reasoning 
is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we coined \"iteration heads\". We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks.","companies":["Meta/FAIR"],"matched_orgs":["Meta/FAIR"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.52202/079017-3463","openalex_id":"https://openalex.org/W4399417123","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Centre Inria de Saclay","Courant Institute of Mathematical Sciences","Laboratoire de Mathématiques d'Orsay","Meta (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7359382510185242},{"id":"https://openalex.org/C66322947","display_name":"Transformer","score":0.5339428782463074},{"id":"https://openalex.org/C61272859","display_name":"Transferability","score":0.4803113341331482},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4556972086429596},{"id":"https://openalex.org/C89611455","display_name":"Mechanism (biology)","score":0.4126889407634735},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.3548453748226166},{"id":"https://openalex.org/C188147891","display_name":"Cognitive science","score":0.32993167638778687},{"id":"https://openalex.org/C119857082","display_name":"Machine 
learning","score":0.24399533867835999}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4402670267","title":"Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding","url":"http://dx.doi.org/10.18653/v1/2024.findings-acl.786","published":"2024-01-01","authors":["Jiali Zeng","Fandong Meng","Yongjing Yin","Jie Zhou"],"abstract":"Contemporary translation engines based on the encoder-decoder framework have made significant strides in development.However, the emergence of Large Language Models (LLMs) has disrupted their position by presenting the potential for achieving superior translation quality.To uncover the circumstances in which LLMs excel and explore how their strengths can be harnessed to enhance translation quality, we first conduct a comprehensive analysis to assess the strengths and limitations of various commercial NMT systems and MT-oriented LLMs.Our findings indicate that neither NMT nor MToriented LLMs alone can effectively address all the translation issues, but MT-oriented LLMs show promise as a complementary solution to NMT systems.Building upon these insights, we propose Cooperative Decoding (CoDec), which treats NMT systems as a pretranslation model and MT-oriented LLMs as a supplemental soluti...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-acl.786","openalex_id":"https://openalex.org/W4402670267","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Tencent (China)"],"concepts":[{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.7948670983314514},{"id":"https://openalex.org/C41008148","display_name":"Computer 
science","score":0.7644708156585693},{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.6975113153457642},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5979247689247131},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4899434447288513},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4713834226131439},{"id":"https://openalex.org/C11413529","display_name":"Algorithm","score":0.12162989377975464},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.07817468047142029}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4404781192","title":"Context-aware and Style-related Incremental Decoding Framework for Discourse-Level Literary Translation","url":"https://doi.org/10.18653/v1/2024.wmt-1.97","published":"2024-01-01","authors":["Yuanchang Luo","Jiaxin Guo","Daimeng Wei","Hengchao Shang","Zongyao Li","Zhanglin Wu","Zhiqiang Rao","Shaojun Li","Jinlong Yang","Hao Yang"],"abstract":"This report outlines our approach for the WMT24 Discourse-Level Literary Translation Task, focusing on the Chinese-English language pair in the Constrained Track.Translating literary texts poses significant challenges due to the nuanced meanings, idiomatic expressions, and intricate narrative structures inherent in such works.To address these challenges, we leveraged the Chinese-Llama2 model, specifically enhanced for this task through a combination of Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT).Our methodology includes a novel Incremental Decoding framework, which ensures that each sentence is translated with consideration of its broader context, maintaining coherence and consistency throughout the text.This approach allows the model to capture long-range 
dependencies and stylistic elements, producing translations that faithfully preserve the original literary quality....","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.wmt-1.97","openalex_id":"https://openalex.org/W4404781192","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C57273362","display_name":"Decoding methods","score":0.7755075693130493},{"id":"https://openalex.org/C2776445246","display_name":"Style (visual arts)","score":0.7650529146194458},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7167021632194519},{"id":"https://openalex.org/C2779343474","display_name":"Context (archaeology)","score":0.662894070148468},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.6076902151107788},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.47478407621383667},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4477871060371399},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.3355347514152527}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4404782750","title":"Automatic Instruction Evolving for Large Language Models","url":"https://doi.org/10.18653/v1/2024.emnlp-main.397","published":"2024-01-01","authors":["Weihao Zeng","Can Xu","Yingxiu Zhao","Jian–Guang Lou","Weizhu Chen"],"abstract":"Fine-tuning large pre-trained language models with Evol-Instruct has achieved encouraging results across a wide range of tasks.However, designing effective evolving methods for instruction 
evolution requires substantial human expertise.This paper proposes Auto Evol-Instruct, an end-to-end framework that evolves instruction datasets using large language models without any human effort.The framework automatically analyzes and summarizes suitable evolutionary strategies for the given instruction data and iteratively improves the evolving method based on issues exposed during the instruction evolution process.Our extensive experiments demonstrate that the best method optimized by Auto Evol-Instruct outperforms human-designed methods on various benchmarks, including MT-Bench, AlpacaEval, GSM8K, and HumanEval.","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.emnlp-main.397","openalex_id":"https://openalex.org/W4404782750","cited_by_count":2,"quality_score":39,"matched_keywords":[],"author_affiliations":["Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8077690005302429},{"id":"https://openalex.org/C199360897","display_name":"Programming language","score":0.4550103545188904},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.4298895299434662},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35614854097366333}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":2}},{"id":"openalex:W4392963978","title":"VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning","url":"https://doi.org/10.1007/978-3-031-56027-9_4","published":"2024-01-01","authors":["Nanyi Fei","Hao Jiang","Haoyu Lu","Jinqiang Long","Yanqi Dai","Tuo Fan","Zhao Cao","Zhiwu 
Lu"],"abstract":"","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-56027-9_4","openalex_id":"https://openalex.org/W4392963978","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)","Renmin University of China"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.867171585559845},{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.6764572858810425},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.5537914037704468},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4449710249900818},{"id":"https://openalex.org/C107457646","display_name":"Human–computer interaction","score":0.33865100145339966},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.06609264016151428},{"id":"https://openalex.org/C159985019","display_name":"Composite material","score":0.051149338483810425},{"id":"https://openalex.org/C192562407","display_name":"Materials science","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4391775721","title":"Topic Segmentation of Semi-structured and Unstructured Conversational Datasets Using Language Models","url":"https://doi.org/10.1007/978-3-031-47718-8_7","published":"2024-01-01","authors":["Reshmi Ghosh","Harjeet Singh Kajal","Sharanya Kamath","Dhuri Shrivastava","Samyadeep Basu","Hansi Zeng","Soundararajan 
Srinivasan"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-47718-8_7","openalex_id":"https://openalex.org/W4391775721","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Microsoft (United States)","University of Maryland, College Park","University of Massachusetts Amherst"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6915780901908875},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.6345459818840027},{"id":"https://openalex.org/C89600930","display_name":"Segmentation","score":0.6051953434944153},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5103051662445068},{"id":"https://openalex.org/C2781252014","display_name":"Unstructured data","score":0.472649484872818},{"id":"https://openalex.org/C41895202","display_name":"Linguistics","score":0.33640772104263306},{"id":"https://openalex.org/C124101348","display_name":"Data mining","score":0.14106076955795288},{"id":"https://openalex.org/C75684735","display_name":"Big data","score":0.03218260407447815}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4403066905","title":"Learning to Segment Multiple Organs from Multimodal Partially Labeled Datasets","url":"https://doi.org/10.1007/978-3-031-72114-4_36","published":"2024-01-01","authors":["Hong Liu","Dong Wei","Donghuan Lu","Jinghan Sun","Hao Zheng","Yefeng Zheng","Liansheng 
Wang"],"abstract":"","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-72114-4_36","openalex_id":"https://openalex.org/W4403066905","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)","Xiamen University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8733271360397339},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5256786942481995},{"id":"https://openalex.org/C153180895","display_name":"Pattern recognition (psychology)","score":0.357485830783844}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4402683124","title":"Improving the Quality of IWLST 2024 Cascade Offline Speech Translation and Speech-to-Speech Translation via Translation Hypothesis Ensembling with NMT models and Large Language Models","url":"http://dx.doi.org/10.18653/v1/2024.iwslt-1.7","published":"2024-01-01","authors":["Zhanglin Wu","Jiaxin Guo","Daimeng Wei","Zhiqiang Rao","Zongyao Li","Hengchao Shang","Yuanchang Luo","Shaojun Li","Hao Yang"],"abstract":"Zhanglin Wu, Jiaxin Guo, Daimeng Wei, Zhiqiang Rao, Zongyao Li, Hengchao Shang, Yuanchang Luo, Shaojun Li, Hao Yang. Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024). 
2024.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.iwslt-1.7","openalex_id":"https://openalex.org/W4402683124","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2780366754","display_name":"Speech translation","score":0.8109380006790161},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.7929050922393799},{"id":"https://openalex.org/C149364088","display_name":"Translation (biology)","score":0.5952823162078857},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.5873203277587891},{"id":"https://openalex.org/C28490314","display_name":"Speech recognition","score":0.566110372543335},{"id":"https://openalex.org/C34146451","display_name":"Cascade","score":0.5442956686019897},{"id":"https://openalex.org/C203005215","display_name":"Machine translation","score":0.5267089009284973},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4852190315723419}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4402114700","title":"Hierarchical Structure-Aware Graph Prompting for Drug-Drug Interaction Prediction","url":"https://doi.org/10.1007/978-3-031-70371-3_3","published":"2024-01-01","authors":["Yuhan Ye","Jingbo Zhou","Shuangli Li","Congxi Xiao","Haochao Ying","Hui 
Xiong"],"abstract":"","companies":["Baidu"],"matched_orgs":["Baidu"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-70371-3_3","openalex_id":"https://openalex.org/W4402114700","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Baidu (China)","Hong Kong University of Science and Technology","University of Hong Kong","University of Science and Technology of China","Zhejiang University"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8366391658782959},{"id":"https://openalex.org/C132525143","display_name":"Graph","score":0.5436865091323853},{"id":"https://openalex.org/C80444323","display_name":"Theoretical computer science","score":0.41492295265197754},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.35967686772346497}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4402670012","title":"BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models","url":"http://dx.doi.org/10.18653/v1/2024.findings-acl.433","published":"2024-01-01","authors":["Xueliang Zhao","Xinting Huang","Tingchen Fu","Qintong Li","Shansan Gong","Lemao Liu","Wei Bi","Lingpeng Kong"],"abstract":"Multimodal reasoning stands as a pivotal capability for large vision-language models (LVLMs).The integration with Domain-Specific Languages (DSL), offering precise visual representations, equips these models with the opportunity to execute more accurate reasoning in complex and professional domains.However, the vanilla Chain-of-Thought (CoT) prompting method faces challenges in effectively leveraging the unique strengths of visual and DSL representations, primarily due to their differing reasoning 
mechanisms.Additionally, it often falls short in addressing critical steps in multi-step reasoning tasks.To mitigate these challenges, we introduce the Bi-Modal Behavioral Alignment (BBA) prompting method, designed to maximize the potential of DSL in augmenting complex multi-modal reasoning tasks.This method initiates by guiding LVLMs to create separate reasoning chains for visual and DSL repre...","companies":["Tencent/Hunyuan"],"matched_orgs":["Tencent/Hunyuan"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.findings-acl.433","openalex_id":"https://openalex.org/W4402670012","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Tencent (China)","University of Hong Kong"],"concepts":[{"id":"https://openalex.org/C71139939","display_name":"Modal","score":0.7005997896194458},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6988266706466675},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.4974260628223419},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.3821272850036621},{"id":"https://openalex.org/C185592680","display_name":"Chemistry","score":0.052502185106277466},{"id":"https://openalex.org/C188027245","display_name":"Polymer chemistry","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4391789209","title":"Applied Generative AI and Deep Learning Techniques to Optimize\\Automate Strategic Geophysical Workflows and Insights Generation","url":"http://dx.doi.org/10.3997/2214-4609.202439052","published":"2024-01-01","authors":["А. С. Дубовик","J. Aguas","H.S. 
Gill","Roman Khudorozhkov"],"abstract":"Summary The paper discusses a new AI approach designed to improve how O&G companies handle seismic data. Traditional ways of dealing with this data are often slow and don’t always give useful insights. This new technology solves these problems by using a mix of cloud computing, machine learning, and data analysis. This technology has four main parts. The first part quickly finds where seismic data comes from, making it easier to get and understand this information. The second part uses advanced learning methods to create detailed information from seismic data, helping in making better decisions for exploration and drilling. The third part lets users ask questions in natural language to get specific information about seismic projects, which helps in interpreting and reporting data. The last part creates realistic 3D seismic for testing and training ML models insight the company prem. This...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.3997/2214-4609.202439052","openalex_id":"https://openalex.org/W4391789209","cited_by_count":1,"quality_score":38,"matched_keywords":[],"author_affiliations":["Amazon (United States)","Technical Data Analysis (United States)"],"concepts":[{"id":"https://openalex.org/C177212765","display_name":"Workflow","score":0.7936743497848511},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6445553302764893},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.6333497762680054},{"id":"https://openalex.org/C108583219","display_name":"Deep learning","score":0.5421501398086548},{"id":"https://openalex.org/C154945302","display_name":"Artificial 
intelligence","score":0.5362080931663513},{"id":"https://openalex.org/C8058405","display_name":"Geophysics","score":0.366901159286499},{"id":"https://openalex.org/C115903868","display_name":"Software engineering","score":0.3304629921913147},{"id":"https://openalex.org/C127313418","display_name":"Geology","score":0.2794068455696106}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":1}},{"id":"openalex:W4402422298","title":"Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting","url":"https://doi.org/10.1007/978-3-031-70546-5_16","published":"2024-01-01","authors":["Omar Hamed","Souhail Bakkali","Matthew B. Blaschko","Sien Moens","Jordy Van Landeghem"],"abstract":"","companies":["Microsoft"],"matched_orgs":["Microsoft"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"book-chapter","doi":"https://doi.org/10.1007/978-3-031-70546-5_16","openalex_id":"https://openalex.org/W4402422298","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["KU Leuven","La Rochelle Université","Microsoft (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8714362382888794},{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.6341345310211182},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.5425049662590027},{"id":"https://openalex.org/C115961682","display_name":"Image (mathematics)","score":0.41227734088897705},{"id":"https://openalex.org/C31972630","display_name":"Computer vision","score":0.39496076107025146},{"id":"https://openalex.org/C23123220","display_name":"Information retrieval","score":0.3223918080329895}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex 
authorship institution metadata","cited_by_count":0}},{"id":"openalex:W4402683114","title":"HW-TSC at TextGraphs-17 Shared Task: Enhancing Inference Capabilities of LLMs with Knowledge Graphs","url":"http://dx.doi.org/10.18653/v1/2024.textgraphs-1.11","published":"2024-01-01","authors":["Wei Tang","Xiaosong Qiao","Xiaofeng Zhao","Min Zhang","Chang Su","Yuang Li","Yinglu Li","Yilun Liu","Feiyu Yao","Shimin Tao","Hao Yang","Xianghui He"],"abstract":"Wei Tang, Xiaosong Qiao, Xiaofeng Zhao, Min Zhang, Chang Su, Yuang Li, Yinglu Li, Yilun Liu, Feiyu Yao, Shimin Tao, Hao Yang, He Xianghui. Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing. 2024.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.textgraphs-1.11","openalex_id":"https://openalex.org/W4402683114","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C2776214188","display_name":"Inference","score":0.7190209627151489},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6894363164901733},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.6180809736251831},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.33637556433677673},{"id":"https://openalex.org/C56739046","display_name":"Knowledge management","score":0.3256487250328064},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.13240602612495422},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution 
metadata","cited_by_count":0}},{"id":"openalex:W4401043747","title":"HW-TSC at SemEval-2024 Task 9: Exploring Prompt Engineering Strategies for Brain Teaser Puzzles Through LLMs","url":"http://dx.doi.org/10.18653/v1/2024.semeval-1.234","published":"2024-01-01","authors":["Yinglu Li","Zhao Yan-qing","Min Zhang","Yadong Deng","Aiju Geng","Xiaoqin Liu","Mengxin Ren","Yuang Li","Chang Su","Xiaofeng Zhao"],"abstract":"Yinglu Li, Zhao Yanqing, Min Zhang, Yadong Deng, Aiju Geng, Xiaoqin Liu, Mengxin Ren, Yuang Li, Su Chang, Xiaofeng Zhao. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024). 2024.","companies":["Huawei/Noah"],"matched_orgs":["Huawei/Noah"],"company_groups":["company_china"],"company_regions":["China"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.18653/v1/2024.semeval-1.234","openalex_id":"https://openalex.org/W4401043747","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Huawei Technologies (China)"],"concepts":[{"id":"https://openalex.org/C44572571","display_name":"SemEval","score":0.7764828205108643},{"id":"https://openalex.org/C2780451532","display_name":"Task (project management)","score":0.6997377276420593},{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.611591100692749},{"id":"https://openalex.org/C154945302","display_name":"Artificial intelligence","score":0.42335045337677},{"id":"https://openalex.org/C204321447","display_name":"Natural language processing","score":0.39344215393066406},{"id":"https://openalex.org/C127413603","display_name":"Engineering","score":0.08503487706184387},{"id":"https://openalex.org/C201995342","display_name":"Systems engineering","score":0.0}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"openalex:W7139022577","title":"Auto-BI Frameworks Powered by 
Generative Artificial Intelligence for Scalable Self-Service Data Analytics in Large Organizations","url":"https://doi.org/10.63282/3050-9262.ijaidsml-v5i3p119","published":"2024-01-01","authors":["Ajith Suresh"],"abstract":"The exponential growth of enterprise data has created significant challenges for organizations seeking to derive timely and actionable insights from complex datasets. The conventional Business Intelligence (BI) systems are typically based on expert analysts, formal query languages, and ad hoc dashboard development, which may reduce accessibility and reduce the speed of decision-making. The current developments in Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) offer fresh possibilities to make the traditional BI systems automated, intelligent, and easy to use. The technologies make it possible to interact with data naturally, create insights automatically and dynamically visualizing data, which considerably enhance analytical efficiency. The proposed research paper suggests a Self-service Data analytics-driven Auto-BI framework, which is driven by Generative A...","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["openalex"],"source":"openalex","work_type":"article","doi":"https://doi.org/10.63282/3050-9262.ijaidsml-v5i3p119","openalex_id":"https://openalex.org/W7139022577","cited_by_count":0,"quality_score":37,"matched_keywords":[],"author_affiliations":["Amazon (United States)"],"concepts":[{"id":"https://openalex.org/C41008148","display_name":"Computer science","score":0.8258000016212463},{"id":"https://openalex.org/C79158427","display_name":"Analytics","score":0.6248999834060669},{"id":"https://openalex.org/C33499554","display_name":"Dashboard","score":0.6011000275611877},{"id":"https://openalex.org/C48044578","display_name":"Scalability","score":0.5228999853134155},{"id":"https://openalex.org/C2522767166","display_name":"Data 
science","score":0.507099986076355},{"id":"https://openalex.org/C2767350","display_name":"Business intelligence","score":0.4668000042438507},{"id":"https://openalex.org/C39890363","display_name":"Generative grammar","score":0.46320000290870667},{"id":"https://openalex.org/C175801342","display_name":"Data analysis","score":0.45419999957084656}],"official_report":false,"quality_signals":{"affiliation_source":"OpenAlex","company_match_source":"OpenAlex authorship institution metadata","cited_by_count":0}},{"id":"official:dc407673e3e0eea5","title":"Interactive Texture Painting with Generative AI","url":"https://research.nvidia.com/publication/2024-01_interactive-texture-painting-generative-ai","published":"2024-01","authors":["Anita Hu","Nishkrit Desai","Ashley Goldstein","Hassan Abu Alhaija","Seung Wook Kim","Daniela Hasenbring","Alex Zook","Rajeev Rao","Maria Shugrina"],"abstract":"Official NVIDIA Research publication. SIGGRAPH","companies":["NVIDIA"],"matched_orgs":["NVIDIA"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["SIGGRAPH"],"author_affiliations":["NVIDIA"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official NVIDIA Research publications page https://research.nvidia.com/publications?f%5B0%5D=publication_date%3A2024&page=3"}},{"id":"official:58d257117b65301d","title":"A deep dive into large language models for automated bug localization and repair","url":"https://www.amazon.science/publications/a-deep-dive-into-large-language-models-for-automated-bug-localization-and-repair","published":"2024","authors":["Soneya Binta Hossain","Nan Jiang","Qiang Zhou","Xiaopeng LI","Wen-Hao Chiang","Yingjun Lyu","Hoan Nguyen","Omer Tripp"],"abstract":"Large language models (LLMs) have shown impressive effectiveness in various software engineering 
tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug localization and repair utilizing LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3660773","openalex_id":"https://openalex.org/W4400582613","cited_by_count":58,"quality_score":86,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United States)","Purdue University West Lafayette","University of Virginia"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=65"}},{"id":"official:33fac15f629396d9","title":"AffordanceLLM: Grounding affordance from vision language models","url":"https://www.amazon.science/publications/affordancellm-grounding-affordance-from-vision-language-models","published":"2024","authors":["Shengyi Qian","Weifeng Chen","Min Bai","Xiong Zhou","Zhuowen Tu","Erran Li"],"abstract":"Affordance grounding refers to the task of finding the area of an object with which one can interact. 
It is a fundamental but challenging task, as a successful solution requires the comprehensive understanding of a scene in multiple aspects including detection, localization, and recognition of objects with their parts, of geospatial configuration/layout of the scene, of 3D shapes and physics, as well as Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvprw63382.2024.00754","openalex_id":"https://openalex.org/W4402916210","cited_by_count":27,"quality_score":83,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:e869d169a3be9639","title":"GROUNDHOG: Grounding large language models to holistic segmentation","url":"https://www.amazon.science/publications/groundhog-grounding-large-language-models-to-holistic-segmentation","published":"2024","authors":["Yichi Zhang","Martin Ma","Xiaofeng Gao","Suhaila Shakiah","Qiaozi (QZ) Gao","Joyce Chai"],"abstract":"Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations that are important for fine-grained visual understanding and diagnosis. 
In this work, we introduce GROUNDHOG, an MLLM developed by grounding Large Language Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvpr52733.2024.01349","openalex_id":"https://openalex.org/W4402753283","cited_by_count":24,"quality_score":80,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","University of Michigan–Ann Arbor"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:0810835b628d95d3","title":"Efficient continual pre-training for building domain specific large language models","url":"https://www.amazon.science/publications/efficient-continual-pre-training-for-building-domain-specific-large-language-models","published":"2024","authors":["Yong Xie","Karan Aggarwal","Aitzaz Ahmad"],"abstract":"Large language models (LLMs) have demonstrated remarkable open-domain capabilities. LLMs tailored for a domain are typically trained entirely on a domain corpus to excel at handling domain-specific tasks. In this work, we explore an alternative strategy of continual pre-training as a means to develop domain-specific LLMs over an existing open-domain LLM. 
We introduce FinPythia-6.9B, developed through domain-adaptive Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.findings-acl.606","openalex_id":"https://openalex.org/W4402683765","cited_by_count":14,"quality_score":78,"matched_keywords":["Conversational AI","LLM","efficient"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=59"}},{"id":"official:119c66cc0af6c40b","title":"Evaluating human-AI partnership for LLM-based code migration","url":"https://www.amazon.science/publications/evaluating-human-ai-partnership-for-llm-based-code-migration","published":"2024","authors":["Ishaani M","Behrooz Omidvar-Tehrani","Anmol Anubhai"],"abstract":"The potential of Generative AI, especially Large Language Models (LLMs), to transform software development is remarkable. In this paper, we focus on one area in software development called “code migration”. We define code migration as the process of transitioning the language version of a code repository by converting both the source code and its dependencies. 
Carefully designing an effective human-AI partnership Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3613905.3650896","openalex_id":"https://openalex.org/W4396833224","cited_by_count":15,"quality_score":75,"matched_keywords":["Information and knowledge management","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=68"}},{"id":"official:e0579e21ebfdddcc","title":"RecMind: Large language model powered agent for recommendation","url":"https://www.amazon.science/publications/recmind-large-language-model-powered-agent-for-recommendation","published":"2024","authors":["Yancheng Wang","Ziyan Jiang","Zheng Chen","Fan Yang","Yingxue Zhou","Eunah Cho","Xing Fan","Xiaojiang Huang","Yanbin Lu","Yingzhen Yang"],"abstract":"While the recommendation system (RS) has advanced significantly through deep learning, current RS approaches usually train and finetune models on task-specific datasets, limiting their generalizability to new recommendation tasks and their ability to leverage external knowledge due to model scale and data size constraints. 
Thus, we designed an LLM-powered autonomous recommender agent, RecMind, which is Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":72,"matched_keywords":["Search and information retrieval","LLM","language model","retrieval","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=70"}},{"id":"official:a551564cc7be525d","title":"ViLA: Efficient video-language alignment for video question answering","url":"https://www.amazon.science/publications/vila-efficient-video-language-alignment-for-video-question-answering","published":"2024","authors":["Xijun Wang","Junbang Liang","Chun-Kai Wang","Kenan Deng","Yu (Michael) Lou","Ming Lin","Shan Yang"],"abstract":"In this work, we propose an efficient Video-Language Alignment (ViLA) network. Our ViLA model addresses both efficient frame sampling and effective cross-modal alignment in a unified way. In our ViLA network, we design a new learnable text-guided Frame-Prompter together with a cross-modal distillation (QFormer-Distiller) module. 
Pretrained large image-language models have shown promising results on problems Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-73033-7_11","openalex_id":"https://openalex.org/W4403906532","cited_by_count":7,"quality_score":71,"matched_keywords":["Computer vision","efficient","distillation"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University","University of Maryland, College Park"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=54"}},{"id":"official:1140660f807a27e4","title":"Turn-taking and backchannel prediction with acoustic and large language model fusion","url":"https://www.amazon.science/publications/turn-taking-and-backchannel-prediction-with-acoustic-and-large-language-model-fusion","published":"2024","authors":["Jinhan Wang","Long Chen","Aparna Khare","Anirudh Raju","Pranav Dheram","Di He","Minhua Wu","Andreas Stolcke","Venkatesh Ravichandran"],"abstract":"We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. 
We also develop a novel multi-task instruction fine-tuning strategy Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp48485.2024.10447196","openalex_id":"https://openalex.org/W4392902922","cited_by_count":6,"quality_score":70,"matched_keywords":["Conversational AI","LLM","language model"],"author_affiliations":["Amazon","Amazon (United States)","University of California, Los Angeles"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=78"}},{"id":"official:b7fc96dd5de1afa9","title":"MAML-en-LLM: Model agnostic meta-training of LLMs for improved in-context learning","url":"https://www.amazon.science/publications/maml-en-llm-model-agnostic-meta-training-of-llms-for-improved-in-context-learning","published":"2024","authors":["Sanchit Sinha","Yuguang Yue","Victor Soto","Mayank Kulkarni","Jianhua Lu","Aidong Zhang"],"abstract":"Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. 
These meta-training approaches essentially Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3637528.3671905","openalex_id":"https://openalex.org/W4401863877","cited_by_count":10,"quality_score":70,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon","Amazon (United States)","University of Virginia"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=60"}},{"id":"official:a23bc270ccb35654","title":"Distributed training of large language models on AWS Trainium","url":"https://www.amazon.science/publications/distributed-training-of-large-language-models-on-aws-trainium","published":"2024","authors":["Xinwei Fu","Zhen Zhang","Haozheng Fan","Guangtai Huang","Randy Huang","Rahul Solanki","Fei Wu","Ron Diamant","Yida Wang"],"abstract":"Large language models (LLMs) are ubiquitously powerful but prohibitively expensive to train, often requiring thousands of compute devices, typically GPUs. To reduce the cost of training LLMs for customers, Amazon Web Services (AWS) launched the Amazon EC2 trn1 instances, powered by AWS Trainium, Amazon’s homegrown deep-learning accelerator, as an alternative to distributed LLM training. 
The trn1 instances Category: Cloud and systems","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3698038.3698535","openalex_id":"https://openalex.org/W4404386171","cited_by_count":10,"quality_score":70,"matched_keywords":["Cloud and systems","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=42"}},{"id":"official:970f29daaa3821e5","title":"Textual dataset distillation via language model embedding","url":"https://www.amazon.science/publications/textual-dataset-distillation-via-language-model-embedding","published":"2024","authors":["Yefan Tao","Chris (Luyang) Kong","Andrey Kan","Laurent Callot"],"abstract":"Dataset distillation is a process aimed at condensing datasets while preserving essential characteristics. In the text domain, prevailing methods typically generate distilled data as embedding vectors, which are not human-readable. This approach simplifies optimization but limits the transferability of distilled data across different model architectures. 
To address this limitation, we introduce a model-agnostic Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.findings-emnlp.733","openalex_id":"https://openalex.org/W4404780847","cited_by_count":5,"quality_score":69,"matched_keywords":["Information and knowledge management","language model","distillation"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=46"}},{"id":"official:528225507a5b0b3b","title":"Generating colloquial radiology reports with large language models","url":"https://www.amazon.science/publications/generating-colloquial-radiology-reports-with-large-language-models","published":"2024","authors":["Cynthia Crystal Tang","Supriya Nagesh","David A. Fussell","Justin Glavis-Bloom","Nina Mishra","Charles Li","Gillean Cortes","Robert Hill","Jasmine Zhao","Angellica Gordon","Joshua Wright","Hayden Troutt"],"abstract":"Objectives: Patients are increasingly being given direct access to their medical records. However, radiology reports are written for clinicians and typically contain medical jargon, which can be confusing. One solution is for radiologists to provide a “colloquial” version that is accessible to the layperson. 
Because manually generating these colloquial translations would represent a significant burden for Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1093/jamia/ocae223","openalex_id":"https://openalex.org/W4401856153","cited_by_count":13,"quality_score":69,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United States)","University of California, Irvine"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=42"}},{"id":"official:cbfa740b068fca06","title":"SYNTHESIZRR: Generating diverse datasets with retrieval augmentation","url":"https://www.amazon.science/publications/synthesizrr-generating-diverse-datasets-with-retrieval-augmentation","published":"2024","authors":["Abhishek Divekar","Greg Durrett"],"abstract":"It is often desirable to distill the capabilities of large language models (LLMs) into smaller student models due to compute and memory constraints. One way to do this for classification tasks is via dataset synthesis, which can be accomplished by generating examples of each label from the LLM. 
Prior approaches to synthesis use few-shot prompting, which relies on the LLM’s parametric knowledge to generate Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","memory","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:81408c165c623a3f","title":"RAG-QA arena: Evaluating domain robustness for long-form retrieval-augmented question answering","url":"https://www.amazon.science/publications/rag-qa-arena-evaluating-domain-robustness-for-long-form-retrieval-augmented-question-answering","published":"2024","authors":["Rujun Han","Yuhao Zhang","Peng Qi","Yumo Xu","Jenyuan Wang","Lan Liu","William Yang Wang","Bonan Min","Vittorio Castelli"],"abstract":"Question answering based on retrieval-augmented generation (RAG-QA) is an important research topic in NLP and has a wide range of real-world applications. However, most existing datasets for this task are either constructed using a single source corpus or consist of short extractive answers, which fall short of evaluating large language model (LLM) based RAG-QA systems on cross-domain generalization. 
To Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","language model","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:af3dce3c3340f2da","title":"Performance-guided LLM knowledge distillation for efficient text classification at scale","url":"https://www.amazon.science/publications/performance-guided-llm-knowledge-distillation-for-efficient-text-classification-at-scale","published":"2024","authors":["Flavio Di Palo","Prateek Singhi","Bilal Fadlallah"],"abstract":"Large Language Models (LLMs) face significant challenges at inference time due to their high computational demands. To address this, we present Performance-Guided Knowledge Distillation (PGKD), a cost-effective and high-throughput solution for production text classification applications. 
PGKD utilizes teacher-student Knowledge Distillation to distill the knowledge of LLMs into smaller, task-specific models Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","efficient","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:ce9ef28039db94aa","title":"Paralinguistics-enhanced large language modeling of spoken dialogue","url":"https://www.amazon.science/publications/paralinguistics-enhanced-large-language-modeling-of-spoken-dialogue","published":"2024","authors":["GUAN-TING LIN","Prashanth Gurunath Shivakumar","Ankur Gandhe","Huck Yang","Yi Gu","Shalini Ghosh","Andreas Stolcke","Hung-yi Lee","Ivan Bulyko"],"abstract":"Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. 
We therefore propose Paralinguistics-enhanced Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp48485.2024.10446933","openalex_id":"https://openalex.org/W4392931281","cited_by_count":12,"quality_score":68,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United States)","National Taiwan University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=80"}},{"id":"official:38b037ba548e4ba6","title":"PEARL: Preference extraction with exemplar augmentation and retrieval with LLM agents","url":"https://www.amazon.science/publications/pearl-preference-extraction-with-exemplar-augmentation-and-retrieval-with-llm-agents","published":"2024","authors":["Vijit Malik","Akshay Jagatap","Vinayak Puranik","Anirban Majumder"],"abstract":"Identifying preferences of customers in their shopping journey is a pivotal aspect in providing product recommendations. The task becomes increasingly challenging when there is a multi-turn conversation between the user and a shopping assistant chatbot. 
In this paper, we address a novel and complex problem of identifying customer preferences in the form of keyvalue filters on an e-commerce website in a Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","preference","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:da48d3da39ab7f71","title":"MEND: Meta demonstration distillation for efficient and effective in-context learning","url":"https://www.amazon.science/publications/mend-meta-demonstration-distillation-for-efficient-and-effective-in-context-learning","published":"2024","authors":["Yichuan Li","Xiyao Ma","Sixing Lu","Kyumin Lee","Xiaohu Liu","Chenlei (Edward) Guo"],"abstract":"Large Language models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities, where a LLM makes predictions for a given test input together with a few input-output pairs (demonstrations). Nevertheless, the inclusion of demonstrations leads to a quadratic increase in the computational overhead of the self-attention mechanism. 
Existing solutions attempt to distill lengthy demonstrations Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","efficient","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=75"}},{"id":"official:d276845a84056e41","title":"MARCO: Multi-agent real-time chat orchestration","url":"https://www.amazon.science/publications/marco-multi-agent-real-time-chat-orchestration","published":"2024","authors":["Anubhav Shrimal","Stanley Kanagaraj","Kriti Biswas","Swarnalatha Raghuraman","Anish Nediyanchath","Yi Zhang","Promod Yenigalla"],"abstract":"Large language model advancements have enabled the development of multi-agent frameworks to tackle complex, real-world problems such as to automate tasks that require interactions with diverse tools, reasoning, and human collaboration. We present MARCO, a Multi-Agent Real-time Chat Orchestration framework for automating tasks using LLMs. 
MARCO addresses key challenges in utilizing LLMs for complex, multi-step Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","language model","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:91c31d33d636cec5","title":"HyperDPO: Hypernetwork-based multi-objective fine-tuning framework","url":"https://www.amazon.science/publications/hyperdpo-hypernetwork-based-multi-objective-fine-tuning-framework","published":"2024","authors":["Yinuo Ren","Tesi Xiao","Michael Shavlovsky","Lexing Ying","Holakou Rahmanian"],"abstract":"In LLM alignment and many other ML applications, one often faces the MultiObjective Fine-Tuning (MOFT) problem, i.e. fine-tuning an existing model with datasets labeled w.r.t. different objectives simultaneously. 
To address the challenge, we propose the HyperDPO framework, a hypernetwork-based approach that extends the Direct Preference Optimization (DPO) technique, originally developed for efficient LLM Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","preference","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:ae7f05e85d1808c6","title":"Evolutionary contrastive distillation for language model alignment","url":"https://www.amazon.science/publications/evolutionary-contrastive-distillation-for-language-model-alignment","published":"2024","authors":["Julian Katz-Samuels","Zheng Li","Hyokun Yun","Priyanka Nigam","Yi Xu","Vaclav Petricek","Bing Yin","Trishul Chilimbi"],"abstract":"The ability of large language models (LLMs) to execute complex instructions is essential for their real-world applications. However, several recent studies indicate that LLMs struggle with challenging instructions (Zhou et al., 2023; Qin et al., 2024; Jiang et al., 2023b). 
In this paper, we propose Evolutionary Contrastive Distillation (ECD), a novel method for generating high-quality synthetic preference Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","language model","preference","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:0abec762ddffc47a","title":"Efficient pointwise-pairwise learning-to-rank for news recommendation","url":"https://www.amazon.science/publications/efficient-pointwise-pairwise-learning-to-rank-for-news-recommendation","published":"2024","authors":["Nithish Kannen Senthilkumar","Yao Ma","Gerrit van den Burg","Jean Baptiste Faddoul"],"abstract":"News recommendation is a challenging task that involves personalization based on the interaction history and preferences of each user. Recent works have leveraged the power of pretrained language models (PLMs) to directly rank news items by using inference approaches that predominately fall into three categories: pointwise, pairwise, and listwise learning-to-rank. 
While pointwise methods offer linear inference Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Machine learning","personalization","news","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=46"}},{"id":"official:b7028733cae66791","title":"DATA ADVISOR: Dynamic data curation for safety alignment of large language models","url":"https://www.amazon.science/publications/data-advisor-dynamic-data-curation-for-safety-alignment-of-large-language-models","published":"2024","authors":["Fei Wang","Ninareh Mehrabi","Palash Goyal","Rahul Gupta","Kai-Wei Chang","Aram Galstyan"],"abstract":"Data is a crucial element in large language model (LLM) alignment. Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from quality issues, with underrepresented or absent aspects and low-quality data-points. 
To address these problems, we propose DATA ADVISOR, an enhanced LLM-based method for generating data that takes into account the characteristics Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":68,"matched_keywords":["Conversational AI","LLM","language model","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:bca003f379370118","title":"Shopping trajectory representation learning with pre-training for e-commerce customer understanding and recommendation","url":"https://www.amazon.science/publications/shopping-trajectory-representation-learning-with-pre-training-for-e-commerce-customer-understanding-and-recommendation","published":"2024","authors":["Yankai Chen","Tuan Truong","Xin Shen","Jin Li","Irwin King"],"abstract":"Understanding customer behavior is crucial for improving service quality in large-scale E-commerce. This paper proposes C-STAR, a new framework that learns compact representations from customer shopping journeys, with good versatility to fuel multiple down-stream customer-centric tasks. 
We define the notion of shopping trajectory that encompasses customer interactions at the level of product categories, Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3637528.3671747","openalex_id":"https://openalex.org/W4401863666","cited_by_count":11,"quality_score":67,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United States)","Chinese University of Hong Kong"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=57"}},{"id":"official:31e9848bc06e6af9","title":"Towards ASR robust spoken language understanding through in-context learning with word confusion networks","url":"https://www.amazon.science/publications/towards-asr-robust-spoken-language-understanding-through-in-context-learning-with-word-confusion-networks","published":"2024","authors":["Kevin Everson","Yi Gu","Huck Yang","Prashanth Gurunath Shivakumar","GUAN-TING LIN","Jari Kolehmainen","Ivan Bulyko","Ankur Gandhe","Shalini Ghosh","Wael Hamza","Hung-yi Lee","Ariya Rastrow"],"abstract":"In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. 
In real-world scenarios, prior to input into an LLM, an automated speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors can degrade Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp48485.2024.10447938","openalex_id":"https://openalex.org/W4392909867","cited_by_count":6,"quality_score":66,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon","Amazon (United States)","National Taiwan University","University of Washington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=79"}},{"id":"official:b69b499925020073","title":"REAPER: Reasoning based retrieval planning for complex RAG systems","url":"https://www.amazon.science/publications/reaper-reasoning-based-retrieval-planning-for-complex-rag-systems","published":"2024","authors":["Ashutosh Joshi","Sheikh Sarwar","Samarth Varshney","Sreyashi Nag","Shrivats Agrawal","Juhi Naik"],"abstract":"Complex dialog systems often use retrieved evidence to facilitate factual responses. Such RAG (Retrieval Augmented Generation) systems retrieve from massive heterogeneous data stores that are usually architected as multiple indexes or APIs instead of a single monolithic source. For a given query, relevant evidence needs to be retrieved from one or a small subset of possible retrieval sources. 
Complex queries Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3627673.3680087","openalex_id":"https://openalex.org/W4403577834","cited_by_count":6,"quality_score":66,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=51"}},{"id":"official:69dadd0c1e5007cc","title":"On-device constrained self-supervised learning for keyword spotting via quantization aware pre-training and fine-tuning","url":"https://www.amazon.science/publications/on-device-constrained-self-supervised-learning-for-keyword-spotting-via-quantization-aware-pre-training-and-fine-tuning","published":"2024","authors":["Gene-Ping Yang","Yue Gu","Sashank Macha","Qingming Tang","Yuzong Liu"],"abstract":"Large self-supervised models have excelled in various speech processing tasks, but their deployment on resource-limited devices is often impractical due to their substantial memory footprint. Previous studies have demonstrated the effectiveness of self-supervised pre-training for keyword spotting, even with constrained model capacity. 
In our pursuit of maintaining high performance while minimizing the model Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp48485.2024.10447258","openalex_id":"https://openalex.org/W4392909943","cited_by_count":2,"quality_score":66,"matched_keywords":["Conversational AI","memory","quantization"],"author_affiliations":["Amazon","Amazon (United States)","SpeechTech (Czechia)","University of Edinburgh"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=80"}},{"id":"official:9edf934f79a505e7","title":"On early detection of hallucinations in factual question answering","url":"https://www.amazon.science/publications/on-early-detection-of-hallucinations-in-factual-question-answering","published":"2024","authors":["Ben Snyder","Marius Moisescu","Muhammad Bilal Zafar"],"abstract":"While large language models (LLMs) have taken great strides towards helping humans with a plethora of tasks, hallucinations remain a major impediment towards gaining user trust. The fluency and coherence of model generations even when hallucinating makes detection a difficult task. 
In this work, we explore if the artifacts associated with the model generations can provide hints that the generation will Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3637528.3671796","openalex_id":"https://openalex.org/W4401863263","cited_by_count":10,"quality_score":66,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (United States)","Ruhr University Bochum"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:0e82eb056f32c3c4","title":"SoftQE: Learned representations of queries expanded by LLMs","url":"https://www.amazon.science/publications/softqe-learned-representations-of-queries-expanded-by-llms","published":"2024","authors":["Varad Pimpalkhute","John Heyer","Xusen Yin","Sameer Gupta"],"abstract":"We investigate the integration of Large Language Models (LLMs) into query encoders to improve dense retrieval without increasing latency and cost, by circumventing the dependency on LLMs at inference time. SoftQE incorporates knowledge from LLMs by mapping embeddings of input queries to those of the LLM-expanded queries. 
While improvements over various strong baselines on in-domain MS-MARCO metrics are Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-56066-8_8","openalex_id":"https://openalex.org/W4392800405","cited_by_count":1,"quality_score":65,"matched_keywords":["Search and information retrieval","LLM","retrieval"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University","University of Massachusetts Amherst"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=76"}},{"id":"official:331ccc7165fa7da9","title":"PEFA: ParamEter-Free Adapters for large-scale embedding-based retrieval models","url":"https://www.amazon.science/publications/pefa-parameter-free-adapters-for-large-scale-embedding-based-retrieval-models","published":"2024","authors":["Wei-Cheng Chang","Jyun-Yu Jiang","Jiong Zhang","Mutasem Al-Darabsah","Choon Hui Teo","Cho-Jui Hsieh","Hsiang-Fu Yu","S. V. N. Vishwanathan"],"abstract":"Embedding-based Retrieval Models (ERMs) have emerged as a promising framework for large-scale text retrieval problems due to powerful large language models. Nevertheless, fine-tuning ERMs to reach state-of-the-art results can be expensive due to the extreme scale of data as well as the complexity of multi-stages pipelines (e.g., pre-training, fine-tuning, distillation). 
In this work, we propose the PEFA Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3616855.3635791","openalex_id":"https://openalex.org/W4389470748","cited_by_count":1,"quality_score":65,"matched_keywords":["Search and information retrieval","retrieval","distillation"],"author_affiliations":["Amazon","Amazon (United States)","UCLA Health"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=81"}},{"id":"official:f2307ade218fefba","title":"Variance-reduced zeroth-order methods for fine-tuning language models","url":"https://www.amazon.science/publications/variance-reduced-zeroth-order-methods-for-fine-tuning-language-models","published":"2024","authors":["Tanmay Gautam","Youngsuk Park","Hao Zhou","Parameswaran Raman","Wooseok Ha"],"abstract":"Fine-tuning language models (LMs) has demonstrated success in a wide array of downstream tasks. However, as LMs are scaled up, the memory requirements for backpropagation become prohibitively high. Zeroth-order (ZO) optimization methods can leverage memory-efficient forward passes to estimate gradients. 
Recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently outperform zero-shot and in-context Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","memory","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:93bbd939dbd35300","title":"Towards unified multi-modal personalization: Large vision-language models for generative recommendation and beyond","url":"https://www.amazon.science/publications/towards-unified-multi-modal-personalization-large-vision-language-models-for-generative-recommendation-and-beyond","published":"2024","authors":["Tianxin Wei","Bowen Jin","Ruirui Li","Hansi Zeng","Zhengyang Wang","Jianhui Sun","Qingyu Yin","Hanqing Lu","Suhang Wang","Jingrui He","Xianfeng Tang"],"abstract":"Developing a unified model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. 
The vision and language modalities not only offer intuitive guidance but also Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","personalized","personalization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:43dc35bc4ea09683","title":"Towards effective genAI multi-agent collaboration: Design and evaluation for enterprise applications","url":"https://www.amazon.science/publications/towards-effective-genai-multi-agent-collaboration-design-and-evaluation-for-enterprise-applications","published":"2024","authors":["Raphael Shu","Nilaksh Das","Michelle Yuan","Monica Sunkara","Yi Zhang"],"abstract":"AI agents powered by large language models (LLMs) have shown strong capabilities in problem solving. Through combining many intelligent agents, multi-agent collaboration has emerged as a promising approach to tackle complex, multi-faceted problems that exceed the capabilities of single AI agents. 
However, designing the collaboration protocols and evaluating the effectiveness of these systems remains a significant Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:0d9913156ce4845b","title":"RoseLoRA: Row and column-wise sparse low-rank adaptation of pre-trained language model for knowledge editing and fine-tuning","url":"https://www.amazon.science/publications/roselora-row-and-column-wise-sparse-low-rank-adaptation-of-pre-trained-language-model-for-knowledge-editing-and-fine-tuning","published":"2024","authors":["Haoyu Wang","Tianci Liu","Ruirui Li","Monica Cheng","Tuo Zhao","Jing Gao"],"abstract":"Pre-trained language models, trained on largescale corpora, demonstrate strong generalizability across various NLP tasks. Finetuning these models for specific tasks typically involves updating all parameters, which is resource-intensive. Parameter-efficient finetuning (PEFT) methods, such as the popular LoRA family, introduce low-rank matrices to learn only a few parameters efficiently. 
However, during Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:32ac1d40f329ecee","title":"Question aware vision transformer for multimodal reasoning","url":"https://www.amazon.science/publications/question-aware-vision-transformer-for-multimodal-reasoning","published":"2024","authors":["Roy Ganz","Yair Kittenplon","Aviad Aberdam","Elad Ben Avraham","Oren Nuriel","Shai Mazor","Ron Litman"],"abstract":"Vision-Language (VL) models have gained significant research focus, enabling remarkable advances in multimodal reasoning. These architectures typically comprise a vision encoder, a Large Language Model (LLM), and a projection module that aligns visual features with the LLM’s representation space. 
Despite their success, a critical limitation persists: the vision encoding process remains decoupled from user Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer vision","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=74"}},{"id":"official:f370007996f5ace4","title":"Private text generation by seeding large language model prompts","url":"https://www.amazon.science/publications/private-text-generation-by-seeding-large-language-model-prompts","published":"2024","authors":["Supriya Nagesh","Justin Chen","Nina Mishra","Tal Wagner"],"abstract":"We explore how private synthetic text can be generated by suitably prompting a large language model (LLM). This addresses a challenge for organizations like hospitals, which hold sensitive text data like patient medical records, and wish to share it in order to train machine learning models for medical tasks, while preserving patient privacy. 
Methods that rely on training or finetuning a model may be out Category: Security, privacy, and abuse prevention","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Security, privacy, and abuse prevention","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:d43b794e0c582111","title":"Precise model benchmarking with only a few observations","url":"https://www.amazon.science/publications/precise-model-benchmarking-with-only-a-few-observations","published":"2024","authors":["Riccardo Fogliato","Pratik Patil","Nil-Jana Akpinar","Mathew Monfort"],"abstract":"How can we precisely estimate a large language model’s (LLM) accuracy on questions belonging to a specific topic within a larger question-answering dataset? The standard direct estimator, which averages the model’s accuracy on the questions in each subgroup, may exhibit high variance for subgroups (topics) with small sample sizes. 
Synthetic regression modeling, which leverages the model’s accuracy on questions Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:85c3c216e71df51d","title":"OpenTab: Advancing large language models as open-domain table reasoners","url":"https://www.amazon.science/publications/opentab-advancing-large-language-models-as-open-domain-table-reasoners","published":"2024","authors":["Kezhi Kong","Jiani Zhang","Zhengyuan Shen","Balasubramaniam Srinivasan","Chuan Lei","Christos Faloutsos","Huzefa Rangwala","George Karypis"],"abstract":"Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM’s knowledge scope. 
However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to diversified Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=75"}},{"id":"official:0b4c07f4b4c26b95","title":"Multi-modal retrieval for large language model based speech recognition","url":"https://www.amazon.science/publications/multi-modal-retrieval-for-large-language-model-based-speech-recognition","published":"2024","authors":["Jari Kolehmainen","Aditya Gourav","Prashanth Gurunath Shivakumar","Yi Gu","Ankur Gandhe","Ariya Rastrow","Grant Strimel","Ivan Bulyko"],"abstract":"Retrieval is a widely adopted approach for improving language models leveraging external information. As the field moves towards multimodal large language models, it is important to extend the pure text-based methods to incorporate other modalities in retrieval as well for applications across the wide spectrum of machine learning tasks and data types. 
In this work, we propose multi-modal retrieval with Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=59"}},{"id":"official:1d12ef82c531c8b5","title":"MATTER: Memory-augmented transformer using heterogeneous knowledge sources","url":"https://www.amazon.science/publications/matter-memory-augmented-transformer-using-heterogeneous-knowledge-sources","published":"2024","authors":["Dongkyu Lee","Chandana Satya Prakash","Jack G. M. FitzGerald","Jens Lehmann"],"abstract":"Leveraging external knowledge is crucial for achieving high performance in knowledge-intensive tasks, such as question answering. The retrieve-and-read approach is widely adopted for integrating external knowledge into a language model. 
However, this approach suffers from increased computational cost and latency due to the long context length, which grows proportionally with the number of retrieved knowledge Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","memory"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=58"}},{"id":"official:8739d608372d5ca6","title":"LLM self-correction with DeCRIM: Decompose, critique, and refine for enhanced following of instructions with multiple constraints","url":"https://www.amazon.science/publications/llm-self-correction-with-decrim-decompose-critique-and-refine-for-enhanced-following-of-instructions-with-multiple-constraints","published":"2024","authors":["Thomas Palmeira Ferraz","Kartik Mehta","Yu-Hsiang Lin","Haw-Shiuan Chang","Shereen Oraby","Sijia Liu","Vivek Subramanian","Tagyoung Chung","Mohit Bansal","Nanyun Peng"],"abstract":"Instruction following is a key capability for LLMs. However, recent studies have shown that LLMs often struggle with instructions containing multiple constraints (e.g. a request to create a social media post “in a funny tone” with “no hashtag”). Despite this, most evaluations focus solely on synthetic data. 
To address this, we introduce RealInstruct, the first benchmark designed to evaluate LLMs’ ability Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","media"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:66224938279c665d","title":"HR-MultiWOZ: A task oriented dialogue (TOD) dataset for HR LLM agent","url":"https://www.amazon.science/publications/hr-multiwoz-a-task-oriented-dialogue-tod-dataset-for-hr-llm-agent","published":"2024","authors":["Weijie Xu","Zicheng Huang","Wenxiang Hu","Xi Fang","Rajesh Cherukuri","Naumaan Nayyar","Lorenzo Malandri","Srinivasan Sengamedu","\"SHS\""],"abstract":"Recent advancements in Large Language Models (LLMs) have been reshaping Natural Language Processing (NLP) task in several domains. Their use in the field of Human Resources (HR) has still room for expansions and could be beneficial for several time consuming tasks. 
Examples such as time-off submissions, medical claims filing, and access requests are noteworthy, but they are by no means the sole instances Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=73"}},{"id":"official:d83e05db9a64dc41","title":"GraphEval: A knowledge-graph based LLM hallucination evaluation framework","url":"https://www.amazon.science/publications/grapheval-a-knowledge-graph-based-llm-hallucination-evaluation-framework","published":"2024","authors":["Hannah Sansford","Nicholas Richardson","Hermina Petric Maretic","Juba Nait Saada"],"abstract":"Methods to evaluate Large Language Model (LLM) responses and detect inconsistencies, also known as hallucinations, with respect to the provided knowledge, are becoming increasingly important for LLM applications. 
Current metrics fall short in their ability to provide explainable decisions, systematically check all pieces of information in the response, and are often too computationally expensive to be used Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=56"}},{"id":"official:2e535a0b698479cd","title":"EIVEN: Efficient implicit attribute value extraction using multimodal LLM","url":"https://www.amazon.science/publications/eiven-efficient-implicit-attribute-value-extraction-using-multimodal-llm","published":"2024","authors":["Henry Peng Zou","Gavin Yu","Ziwei Fan","Dan Bu","Han Liu","Peng Dai","Dongmei Jia","Cornelia Caragea"],"abstract":"In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. 
To address Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=68"}},{"id":"official:f268797406f58efc","title":"Correcting language model outputs by editing salient layers","url":"https://www.amazon.science/publications/correcting-language-model-outputs-by-editing-salient-layers","published":"2024","authors":["Kshitij Mishra","Tamer Soliman","Anil Ramakrishna","Anoop Kumar","Aram Galstyan"],"abstract":"Large language models can accumulate incorrect or outdated knowledge as the real world evolves. Compared to typical solutions such as retraining, retrieval augmented generation, model editing offers an effective yet low cost solution to address this issue. 
However, existing model editing algorithms employ manual selection of edit layers, which requires prior domain knowledge or expensive architecture-specific Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.findings-eacl.86","openalex_id":"https://openalex.org/W7126410531","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","language model","retrieval"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=78"}},{"id":"official:4aaafc5f42b10d6f","title":"Collecting high-quality multi-modal conversational search data for e-commerce","url":"https://www.amazon.science/publications/collecting-high-quality-multi-modal-conversational-search-data-for-e-commerce","published":"2024","authors":["Marcus Collins","Eugene Agichtein","Oleg Rokhlenko","Shervin Malmasi"],"abstract":"Continued improvement of conversational assistants in knowledge-rich domains like E-Commerce requires large volumes of realistic high-quality conversation data to power increasingly sophisticated LLM chatbots, dialogue managers, response rankers, and recommenders. The problem is worse for multi-modal interactions in realistic conversational product search and recommendation. 
Here, an artificial sales agent Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.knowledgenlp-1.3","openalex_id":"https://openalex.org/W4402683103","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","agent"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:45f449b000e67b12","title":"CoMM: Collaborative multi-agent, multi-reasoning-path prompting for complex problem solving","url":"https://www.amazon.science/publications/comm-collaborative-multi-agent-multi-reasoning-path-prompting-for-complex-problem-solving","published":"2024","authors":["Pei (Patrick) Chen","Boran Han","Shuai Zhang"],"abstract":"Large Language Models (LLMs) have shown great ability in solving traditional natural-language tasks and elementary reasoning tasks with appropriate prompting techniques. However, their ability is still limited in solving complicated science problems. 
In this work, we aim to push the upper bound of the reasoning capability of LLMs by proposing a collaborative multi-agent, multi-reasoning-path (CoMM) prompting Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","agent","multi-agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:e1139459de83b0f4","title":"CoMERA: Computing- and memory-efficient training via rank-adaptive tensor optimization","url":"https://www.amazon.science/publications/comera-computing-and-memory-efficient-training-via-rank-adaptive-tensor-optimization","published":"2024","authors":["Zi Yang","Ziyue Liu","Samridhi Choudhary","Xinfeng Xie","Cao Gao","Siegfried Kunzmann","Zheng Zhang"],"abstract":"Training large AI models such as deep learning recommendation systems and large language models (LLMs) costs massive GPUs and computing time. The high training cost has become only affordable to big tech companies, meanwhile also causing increasing concerns about the environmental impact. This paper presents CoMERA, a Computing- and Memory-Efficient training method via Rank-Adaptive tensor optimization. 
Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","memory","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:60d97a2738d55277","title":"Chronos: Learning the language of time series","url":"https://www.amazon.science/publications/chronos-learning-the-language-of-time-series","published":"2024","authors":["Abdul Fatir Ansari","Lorenzo Stella","Caner Turkmen","Xiyuan Zhang","Pedro Mercado","Huibin Shen","Oleksandr Shchur","Syama Rangapuram","Sebastian Pineda Arango","Shubham Kapoor","Jasper Zschiegner","Danielle Maddix Robinson"],"abstract":"We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. 
We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","language model","quantization"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=39"}},{"id":"official:0865c24f142980f0","title":"CPR: Retrieval augmented generation for copyright protection","url":"https://www.amazon.science/publications/cpr-retrieval-augmented-generation-for-copyright-protection","published":"2024","authors":["Aditya Golatkar","Alessandro Achille","Luca Zancato","Yu-Xiang Wang","Ashwin Swaminathan","Stefano Soatto"],"abstract":"Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private users data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG techniques for image generation may lead to parts of the retrieved samples being copied in the model’s output. 
To reduce risks of leaking private information contained in Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Computer vision","retrieval","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:adaff7c98b5acf08","title":"COLLAGE: Light-weight low-precision strategy for LLM training","url":"https://www.amazon.science/publications/collage-light-weight-low-precision-strategy-for-llm-training","published":"2024","authors":["Tao Yu","Gaurav Gupta","Karthick Gopalswamy","Amith Mamidala","Hao Zhou","Jeffrey Huynh","Youngsuk Park","Ron Diamant","Anoop Deoras","Luke Huan"],"abstract":"Large models training is plagued by the intense compute cost and limited hardware memory. A practical solution is low-precision representation but is troubled by loss in numerical accuracy and unstable training rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process. 
We propose Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","LLM","memory"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:f6d5bc7ca40c1db4","title":"B’MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory","url":"https://www.amazon.science/publications/bmojo-hybrid-state-space-realizations-of-foundation-models-with-eidetic-and-fading-memory","published":"2024","authors":["Luca Zancato","Arjun Seshadri","Yonatan Dukler","Aditya Golatkar","Yantao Shen","Ben Bowman","Matthew Trager","Alessandro Achille","Stefano Soatto"],"abstract":"We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resources to represent data either eidetically over a finite span (“context” in Transformers), or fading over an infinite span (in State Space Models, or SSMs). 
Recent hybrid Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","memory","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=39"}},{"id":"official:49185a37b5941458","title":"Bifurcated attention for single-context large-batch sampling","url":"https://www.amazon.science/publications/bifurcated-attention-for-single-context-large-batch-sampling","published":"2024","authors":["Ben Athiwaratkun","Sujan Gonugondla","Sanjay Krishna Gouda","Hantian Ding","Qing Sun","Jun Wang","Jiacheng Guo","Liangfu Chen","Haifeng Qian","Parminder Bhatia","Ramesh Nallapati","Sudipta Sengupta"],"abstract":"In our study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling contexts. This approach aims to reduce redundant memory IO costs, a significant factor in latency for high batch sizes and long context lengths. 
Bifurcated attention achieves this by dividing the attention mechanism during incremental decoding into two distinct GEMM operations, Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","language model","memory"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=59"}},{"id":"official:3f4021db65114c35","title":"Bi-CAT: Improving robustness of LLM-based text rankers to conditional distribution shifts","url":"https://www.amazon.science/publications/bi-cat-improving-robustness-of-llm-based-text-rankers-to-conditional-distribution-shifts","published":"2024","authors":["Sriram Srinivasan","Stephen Sheng","Rishabh Deshmukh","Chen Luo","Yesh Dattatreya","Subhajit Sanyal","S. V. N. Vishwanathan"],"abstract":"Retrieval and ranking lie at the heart of several applications like search, question-answering, and recommendations. The use of Large language models (LLMs) such as BERT in these applications have shown promising results in recent times. 
Recent works on text-based retrievers and rankers show promising results by using bi-encoders (BE) architecture with BERT like LLMs for retrieval and a cross-attention Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Search and information retrieval","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=73"}},{"id":"official:a392657b820bae54","title":"Automated evaluation of retrieval-augmented language models with task-specific exam generation","url":"https://www.amazon.science/publications/automated-evaluation-of-retrieval-augmented-language-models-with-task-specific-exam-generation","published":"2024","authors":["Gauthier Guinet","Behrooz Omidvar-Tehrani","Anoop Deoras","Laurent Callot"],"abstract":"We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG). Evaluation is performed by scoring the RAG on an automatically-generated synthetic exam composed of multiple choice questions based on the corpus of documents associated with the task. 
Our method is an automated, cost-efficient, interpretable, and robust strategy to select the optimal components Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","retrieval","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=61"}},{"id":"official:794c62e5c56cd41e","title":"Approximations may be all you need: Towards pre-training LLMs with low-rank decomposition and optimizers","url":"https://www.amazon.science/publications/approximations-may-be-all-you-need-towards-pre-training-llms-with-low-rank-decomposition-and-optimizers","published":"2024","authors":["Namrata Shivagunde","Mayank Kulkarni","Giannis Karamanolakis","Jack G. M. FitzGerald","Yannick Versley","Saleh Soltan","Volkan Cevher","Jianhua Lu","Anna Rumshisky"],"abstract":"Large language models (LLMs) have achieved remarkable performance on various natural language processing tasks, but training LLMs at scale is extremely resource-intensive, requiring substantial computational power, memory, and energy consumption. This has motivated research into efficient training methods, particularly during the pre-training phase. 
There are two main approaches to approximate full-rank Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Conversational AI","memory","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:16332f441b394832","title":"An interpretable answer scoring framework","url":"https://www.amazon.science/publications/an-interpretable-answer-scoring-framework","published":"2024","authors":["Omar Alonso","Preetam Dammu","Diji Yang"],"abstract":"In this new LLM-world where users can ask any natural language question, the focus is on the generation of answers with reliable information while satisfying the original intent. LLMs are known to generate multiple versions of answers for the same question, some of which may be better than others. Identifying the most suitable response that adequately addresses the question is non-trivial. 
In order to tackle Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Search and information retrieval","LLM","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=56"}},{"id":"official:48cd19e3dd350efd","title":"AdaZeta: Adaptive zeroth-order tensor-train adaption for memory-efficient large language models fine-tuning","url":"https://www.amazon.science/publications/adazeta-adaptive-zeroth-order-tensor-train-adaption-for-memory-efficient-large-language-models-fine-tuning","published":"2024","authors":["Yifan Yang","Kai Zhen","Ershad Banijamali","Thanasis Mouchtaris","Zheng Zhang"],"abstract":"Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks, yet it demands more and more memory as model sizes keep growing. To address this issue, the recently proposed Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoiding the need for a backpropagation graph. 
However, significant Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Machine learning","memory","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:a1c847c493ec2d5b","title":"A product-aware query auto-completion framework for e-commerce search via retrieval-augmented generation method","url":"https://www.amazon.science/publications/a-product-aware-query-auto-completion-framework-for-e-commerce-search-via-retrieval-augmented-generation-method","published":"2024","authors":["Andy Sun","Tianqi Zheng","Aakash Kolekar","Rohit Patki","Hossein Khazaei","Xuan Guo","George Cai","David Liu","Ruirui Li","Yupin Huang","Dante Everaert","Hanqing Lu"],"abstract":"Query Auto-Completion (QAC) is a fundamental component of user search experience on e-commerce websites. It assists in finding user-intended products, by automatically presenting search queries as users typing in the search bar. Traditional QAC systems build upon query popularity to suggest a list of potential completions, but they fall short for unforeseen search prefixes. 
A generative Large Language Model Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":64,"matched_keywords":["Search and information retrieval","language model","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=58"}},{"id":"official:073ec8d19382d042","title":"DataLore: Can a large language model find all lost scrolls in a data repository?","url":"https://www.amazon.science/publications/datalore-can-a-large-language-model-find-all-lost-scrolls-in-a-data-repository","published":"2024","authors":["Yuze Lou","Chuan Lei","Xiao Qin","Zichen Wang","Christos Faloutsos","Rishita Anubhai","Huzefa Rangwala"],"abstract":"How can we effectively generate missing data transformations among tables in a data repository? Multiple versions of the same tables are generated from the iterative process when data scientists and machine learning engineers fine-tune their ML pipelines, making incremental improvements. 
This process often involves data transformation and augmentation that produces an augmented table based on its base version Category: Information and knowledge management","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icde60146.2024.00388","openalex_id":"https://openalex.org/W4400910547","cited_by_count":3,"quality_score":63,"matched_keywords":["Information and knowledge management","language model"],"author_affiliations":["Amazon","Amazon (United States)","University of Michigan–Ann Arbor"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=73"}},{"id":"official:570a9e6e85033ac5","title":"Self supervised LLM customizer (SSLC): Customizing LLMs on unlabeled data to enhance contextual question answering","url":"https://www.amazon.science/publications/self-supervised-llm-customizer-sslc-customizing-llms-on-unlabeled-data-to-enhance-contextual-question-answering","published":"2024","authors":["Raveendra Hegde","Saurabh Sharma"],"abstract":"While we can customize large language models (LLMs) on specific domains by finetuning using the domain specific labeled data, performance of the customized models is highly dependent on the quality of the labeled data. Obtaining high-quality labeled data for custom domains often requires considerable human effort and associated costs. 
However, in many cases, unlabeled data is readily available at little Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3703412.3703421","openalex_id":"https://openalex.org/W4411464969","cited_by_count":2,"quality_score":62,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=48"}},{"id":"official:959c58815cc38e44","title":"Perceptual evaluation of audio-visual synchrony grounded in viewers’ opinion scores","url":"https://www.amazon.science/publications/perceptual-evaluation-of-audio-visual-synchrony-grounded-in-viewers-opinion-scores","published":"2024","authors":["Lucas Goncalves","Prashant Mathur","Chandrashekhar Lavania","Metehan Cekic","Marcello Federico","Kyu Han"],"abstract":"Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks. However, the growth is not attributed solely to models and benchmarks. Universally accepted evaluation metrics also play an important role in advancing the field. 
While there are many metrics available to evaluate audio and visual content separately, there Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-72986-7_17","openalex_id":"https://openalex.org/W4403990379","cited_by_count":6,"quality_score":62,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University","The University of Texas at Dallas"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=53"}},{"id":"official:0e6822999be1be1f","title":"Multimodal attention merging for improved speech recognition and audio event classification","url":"https://www.amazon.science/publications/multimodal-attention-merging-for-improved-speech-recognition-and-audio-event-classification","published":"2024","authors":["Anirudh Sundar","Huck Yang","David Chan","Shalini Ghosh","Venkatesh Ravichandran","Phani Nidadavolu"],"abstract":"Training large foundation models using self-supervised objectives on unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a standard procedure. Unfortunately, the efficacy of this approach is often constrained by both limited fine-tuning compute and scarcity in labeled downstream data. 
We introduce Multimodal Attention Merging (MAM), an attempt that facilitates direct knowledge transfer Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icasspw62465.2024.10627466","openalex_id":"https://openalex.org/W4401597702","cited_by_count":6,"quality_score":62,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United States)","Georgia Institute of Technology"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=66"}},{"id":"official:06a72bc05c43272d","title":"Explainable and coherent complement recommendation based on large language models","url":"https://www.amazon.science/publications/explainable-and-coherent-complement-recommendation-based-on-large-language-models","published":"2024","authors":["Zelong Li","Yan Liang","Ming Wang","Sungro Yoon","Jiaying Shi","Xin Shen","Xiang He","Chenwei Zhang","Wenyi Wu","Hanbo Wang","Jin Li","Jim Chan"],"abstract":"A complementary item is an item that pairs well with another item when consumed together. In the context of e-commerce, providing recommendations for complementary items is essential for both customers and stores. Current models for suggesting complementary items often rely heavily on user behavior data, such as co-purchase relationships. 
However, just because two items are frequently bought together does Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3627673.3680028","openalex_id":"https://openalex.org/W4403582628","cited_by_count":2,"quality_score":62,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon","Amazon (United States)","Rutgers, The State University of New Jersey"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=53"}},{"id":"official:9685c1038e5f44aa","title":"Diffusion Soup: Model merging for text-to-image diffusion models","url":"https://www.amazon.science/publications/diffusion-soup-model-merging-for-text-to-image-diffusion-models","published":"2024","authors":["Ben Biggs","Arjun Seshadri","Yang Zou","Achin Jain","Aditya Golatkar","Yusheng Xie","Alessandro Achille","Ashwin Swaminathan","Stefano Soatto"],"abstract":"We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearning with no additional memory or inference costs, since models corresponding to data shards can be added or removed by re-averaging. 
We show that Diffusion Soup samples Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-73036-8_15","openalex_id":"https://openalex.org/W4404545451","cited_by_count":2,"quality_score":62,"matched_keywords":["Computer vision","memory"],"author_affiliations":["Amazon","Amazon (United States)","Uber AI (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=51"}},{"id":"official:85b14f571ab9d96f","title":"Bayesian prompt ensembles: Model uncertainty estimation for black-box large language models","url":"https://www.amazon.science/publications/bayesian-prompt-ensembles-model-uncertainty-estimation-for-black-box-large-language-models","published":"2024","authors":["Francesco Tonolini","Jordan Massiah","Nikolaos Aletras","Gabriella Kazai"],"abstract":"An important requirement for the reliable deployment of pre-trained large language models (LLMs) is the well-calibrated quantification of the uncertainty in their outputs. While the likelihood of predicting the next token is a practical surrogate of the data uncertainty learned during training, model uncertainty is challenging to estimate, i.e., due to lack of knowledge acquired during training. 
Prior efforts Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.findings-acl.728","openalex_id":"https://openalex.org/W4402670050","cited_by_count":6,"quality_score":62,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)","University of the Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=59"}},{"id":"official:4f42742e2f9ef1a6","title":"Pre-training methods for question reranking","url":"https://www.amazon.science/publications/pre-training-methods-for-question-reranking","published":"2024","authors":["Stefano Campese","Ivano Lauriola","Alessandro Moschitti"],"abstract":"One interesting approach to Question Answering (QA) is to search for semantically similar questions, which have been answered before. This task is different from answer retrieval as it focuses on questions rather than only on the answers, therefore it requires different model training on different data. 
In this work, we introduce a novel unsupervised pre-training method specialized for retrieving and ranking Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.eacl-short.41","openalex_id":"https://openalex.org/W4411638717","cited_by_count":1,"quality_score":61,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon","Amazon (United States)","University of Trento"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=77"}},{"id":"official:658ad16b51914af9","title":"Leveraging large language models for multimodal search","url":"https://www.amazon.science/publications/leveraging-large-language-models-for-multimodal-search","published":"2024","authors":["Oriol Barbany Mayor","Michael Huang","Xinliang Zhu","Arnab Dhua"],"abstract":"Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries. 
The problem becomes harder with the large Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvprw63382.2024.00127","openalex_id":"https://openalex.org/W4402917078","cited_by_count":5,"quality_score":61,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Search"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=68"}},{"id":"official:834f022d1ee55950","title":"Improving LLM group fairness on tabular data via in-context learning","url":"https://www.amazon.science/publications/improving-llm-group-fairness-on-tabular-data-via-in-context-learning","published":"2024","authors":["Valeriia Cherepanova","CJ Lee","Nil-Jana Akpinar","Riccardo Fogliato","Martin Bertran Lopez","Michael Kearns","James Zou"],"abstract":"Large language models (LLMs) have been shown to be effective on tabular prediction tasks in the low-data regime, leveraging their internal knowledge and ability to learn from instructions and examples. However, LLMs can fail to generate predictions that satisfy group fairness, that is, produce equitable outcomes across groups. 
Critically, conventional debiasing approaches for natural language tasks do not Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1609/aies.v8i1.36572","openalex_id":"https://openalex.org/W4415231319","cited_by_count":1,"quality_score":61,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=63"}},{"id":"official:5cc53524b8c59254","title":"HLAT: High-quality large language model pre-trained on AWS Trainium","url":"https://www.amazon.science/publications/hlat-high-quality-large-language-model-pre-trained-on-aws-trainium","published":"2024","authors":["Haozheng Fan","Hao Zhou","Guangtai Huang","Parameswaran Raman","Mason Fu","Gaurav Gupta","Dhananjay Ram","Yida Wang","Luke Huan"],"abstract":"Getting large language models (LLMs) to perform well on the downstream tasks requires pre-training over trillions of tokens. This typically demands a large number of powerful computational devices in addition to a stable distributed training framework to accelerate the training. 
The growing number of applications leveraging AI/ML led to a scarcity of the expensive conventional accelerators (such as GPUs Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/bigdata62323.2024.10825098","openalex_id":"https://openalex.org/W4406459231","cited_by_count":1,"quality_score":61,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon","Amazon (Germany)","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=39"}},{"id":"official:e74cf929064b829a","title":"Entity disambiguation with extreme multi-label ranking","url":"https://www.amazon.science/publications/entity-disambiguation-with-extreme-multi-label-ranking","published":"2024","authors":["Jyun-Yu Jiang","Wei-Cheng Chang","Jiong Zhang","Cho-Jui Hsieh","Hsiang-Fu Yu"],"abstract":"Entity disambiguation is one of the most important natural language tasks to identify entities behind ambiguous surface mentions within a knowledge base. Although many recent studies apply deep learning to achieve decent results, they need exhausting pretraining and mediocre recall in the retrieval stage. 
In this paper, we propose a novel framework, eXtreme Multi-label Ranking for Entity Disambiguation Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3589334.3645498","openalex_id":"https://openalex.org/W4396722822","cited_by_count":1,"quality_score":61,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon","Amazon (United States)","Google (United States)","Search","University of California, Los Angeles"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=75"}},{"id":"official:773a8665cfe3dfe7","title":"Combining multiple metrics for evaluating retrieval-augmented conversations","url":"https://www.amazon.science/publications/combining-multiple-metrics-for-evaluating-retrieval-augmented-conversations","published":"2024","authors":["Jason Choi","Marcus Collins","Eugene Agichtein","Oleg Rokhlenko","Shervin Malmasi"],"abstract":"Conversational AI is a subtype of Human-Computer Interaction that has gained wide adoption. These systems are typically powered by Large Language Models (LLMs) that use Retrieval Augmented Generation (RAG) to infuse external knowledge, which is effective against issues like hallucination. 
However, automatically evaluating retrieval augmented conversations with minimal human effort remains challenging, particularly Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.hcinlp-1.4","openalex_id":"https://openalex.org/W4401042826","cited_by_count":1,"quality_score":61,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=64"}},{"id":"official:4527a4fbd5aa4c6d","title":"An efficient domain-independent approach for supervised keyphrase extraction and ranking","url":"https://www.amazon.science/publications/an-efficient-domain-independent-approach-for-supervised-keyphrase-extraction-and-ranking","published":"2024","authors":["Sriraghavendra Ramaswamy"],"abstract":"We present a supervised learning approach for automatic extraction of keyphrases from single documents. Our solution uses simple-to-compute statistical and positional features of candidate phrases and does not rely on any external knowledge base or on pre-trained language models or word embeddings. The ranking component of our proposed solution is a fairly lightweight ensemble model. 
Evaluation on benchmark Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.5121/csit.2024.140421","openalex_id":"https://openalex.org/W4392346270","cited_by_count":1,"quality_score":61,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=79"}},{"id":"official:0e9e2cd07dc83be4","title":"VERA: Validation and evaluation of retrieval-augmented systems","url":"https://www.amazon.science/publications/vera-validation-and-evaluation-of-retrieval-augmented-systems","published":"2024","authors":["Terry Ding","Adi Banerjee","Mabel Li","Laurent Mombaerts","Tarik Borogovac","Juan Pablo De la Cruz Weinstein"],"abstract":"The increasing use of Retrieval-Augmented Generation (RAG) systems in various applications necessitates stringent protocols to ensure RAG systems’ accuracy, safety, and alignment with user intentions. 
In this paper, we introduce VERA (Validation and Evaluation of Retrieval-Augmented Systems), a framework designed to enhance the transparency and reliability of outputs from large language models (LLMs) that Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=50"}},{"id":"official:e6381b28de63ba65","title":"Unsupervised text representation learning via instruction-tuning for zero-shot dense retrieval","url":"https://www.amazon.science/publications/unsupervised-text-representation-learning-via-instruction-tuning-for-zero-shot-dense-retrieval","published":"2024","authors":["Qiuhai Zeng","Chris Qiu","Dae Yon Hwang","Cynthia He","Bill Campbell"],"abstract":"Dense retrieval systems are commonly used for information retrieval (IR). They rely on learning text representations through an encoder and usually require supervised modeling via labelled data which can be costly to obtain or simply unavailable. 
In this study, we introduce a novel unsupervised text representation learning technique via instruction-tuning the pre-trained encoder-decoder large language models Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=47"}},{"id":"official:507352a5181a7ca2","title":"Unified embeddings for multimodal retrieval via frozen LLMs","url":"https://www.amazon.science/publications/unified-embeddings-for-multimodal-retrieval-via-frozen-llms","published":"2024","authors":["Ziyang Wang","Heba Elfardy","Markus Dreyer","Kevin Small","Mohit Bansal"],"abstract":"In this work, We present Unified Embeddings for Multimodal Retrieval (UNIMUR), a simple but effective approach that embeds multimodal inputs and retrieves visual and textual outputs via frozen Large Language Models (LLMs). Specifically, UNIMUR jointly retrieves multimodal outputs via unified multimodal embedding and applies dual alignment training to account for both visual and textual semantics. 
Thus, Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=73"}},{"id":"official:19f68ca2dd47b2ec","title":"Tree-of-traversals: A zero-shot reasoning algorithm for augmenting black-box language models with knowledge graphs","url":"https://www.amazon.science/publications/tree-of-traversals-a-zero-shot-reasoning-algorithm-for-augmenting-black-box-language-models-with-knowledge-graphs","published":"2024","authors":["Elan Markowitz","Anil Ramakrishna","Jwala Dhamala","Ninareh Mehrabi","Charith Peris","Rahul Gupta","Kai-Wei Chang","Aram Galstyan"],"abstract":"Knowledge graphs (KGs) complement Large Language Models (LLMs) by providing reliable, structured, domain-specific, and up-to-date external knowledge. However, KGs and LLMs are often developed separately and must be integrated after training. We introduce Tree-of-Traversals, a novel zero-shot reasoning algorithm that enables augmentation of black-box LLMs with one or more KGs. 
The algorithm equips a LLM Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=56"}},{"id":"official:bbd8f888d6ec6fd3","title":"Towards improved multi-source attribution for long-form answer generation","url":"https://www.amazon.science/publications/towards-improved-multi-source-attribution-for-long-form-answer-generation","published":"2024","authors":["Nilay Patel","Shivashankar Subramanian","Siddhant Garg","Pratyay Banerjee","Amita Misra"],"abstract":"Teaching large language models (LLMs) to generate text with attribution to evidence sources can reduce hallucinations, improve verifiability in question answering systems (QA), and increase reliability of retrieval augmented LLMs. 
Despite gaining increasing popularity for usage in QA systems and search engines, current LLMs struggle with attribution for long-form responses which require reasoning over multiple Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:d72e6d6aa76fecea","title":"Toward informal language processing: Knowledge of slang in large language models","url":"https://www.amazon.science/publications/toward-informal-language-processing-knowledge-of-slang-in-large-language-models","published":"2024","authors":["Zhewei Sun","Qian Hu","Rahul Gupta","Richard Zemel","Yang Xu"],"abstract":"Recent advancement in large language models (LLMs) has offered a strong potential for natural language systems to process informal language. A representative form of informal language is slang, used commonly in daily conversations and online social media. To date, slang has not been comprehensively evaluated in LLMs due partly to the absence of a carefully designed and publicly accessible benchmark. 
Using Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","media"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:09fce77a17aba922","title":"Tokenization matters: Navigating data-scarce tokenization for gender inclusive language technologies","url":"https://www.amazon.science/publications/tokenization-matters-navigating-data-scarce-tokenization-for-gender-inclusive-language-technologies","published":"2024","authors":["Anaelia Ovalle","Ninareh Mehrabi","Palash Goyal","Jwala Dhamala","Kai-Wei Chang","Richard Zemel","Aram Galstyan","Yuval Pinter","Rahul Gupta"],"abstract":"Gender-inclusive NLP research has documented the harmful limitations of gender binary-centric large language models (LLM), such as the inability to correctly use gender-diverse English neopronouns (e.g., xe, zir, fae). While data scarcity is a known culprit, the precise mechanisms through which scarcity affects this behavior remain under-explored. 
We discover LLM misgendering is significantly influenced Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=60"}},{"id":"official:c4b86c6e9b973707","title":"TofuEval: Evaluating hallucinations of LLMs on topic-focused dialogue summarization","url":"https://www.amazon.science/publications/tofueval-evaluating-hallucinations-of-llms-on-topic-focused-dialogue-summarization","published":"2024","authors":["Liyan Tang","Igor Shalyminov","Amy Wong","Jon Burnsky","Jake Vincent","Yu’an Yang","Siffi Singh","Song Feng","Hwanjun Song","Hang Su","Justin Sun","Yi Zhang"],"abstract":"Single document news summarization has seen substantial progress on faithfulness in recent years, driven by research on the evaluation of factual consistency, or hallucinations. We ask whether these advances carry over to other text summarization domains. We propose a new evaluation benchmark on topic-focused dialogue summarization, generated by LLMs of varying sizes. 
We provide binary sentence-level human Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","news"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=72"}},{"id":"official:fd4bf44c5e6a157b","title":"The fine-tuning paradox: Boosting translation quality without sacrificing LLM abilities","url":"https://www.amazon.science/publications/the-fine-tuning-paradox-boosting-translation-quality-without-sacrificing-llm-abilities","published":"2024","authors":["David Stap","Eva Hasler","Bill Byrne","Christof Monz","Ke Tran"],"abstract":"Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality. However, it is unclear what is the impact of fine-tuning on desirable LLM behaviors that are not present in neural machine translation models, such as steerability, inherent document-level translation abilities, and the ability to produce less literal translations. 
We perform an extensive Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=61"}},{"id":"official:0fc8dd00f4ec1c98","title":"The empirical impact of data sanitization on language models","url":"https://www.amazon.science/publications/the-empirical-impact-of-data-sanitization-on-language-models","published":"2024","authors":["Anwesan Pal","Radhika Bhargava","Kyle Hinsz","Jacques Esterhuizen","Sudi Bhattacharya"],"abstract":"Data sanitization in the context of language modeling involves identifying sensitive content, such as personally identifiable information (PII), and redacting them from a dataset corpus. It is a common practice used in natural language processing (NLP) to maintain privacy. Nevertheless, the impact of data sanitization on the language understanding capability of a language model remains less studied. 
This Category: Security, privacy, and abuse prevention","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Security, privacy, and abuse prevention","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:1f79aa4c8402ef25","title":"The N-Grammys: Accelerating autoregressive inference with learning-free batched speculation","url":"https://www.amazon.science/publications/the-n-grammys-accelerating-autoregressive-inference-with-learning-free-batched-speculation","published":"2024","authors":["Lawrence Stewart","Matthew Trager","Sujan Gonugondla","Stefano Soatto"],"abstract":"Speculative decoding aims to speed up autoregressive generation of a language model by verifying in parallel the tokens generated by a smaller draft model. In this work, we explore the effectiveness of learning-free, negligible-cost draft strategies, namely N-grams obtained from the model weights and the context. 
While the predicted next token of the base model is rarely the top prediction of these simple Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48550/arxiv.2411.03786","openalex_id":"https://openalex.org/W4404161885","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon","Amazon (United States)","Sierra Engineering (United States)","University of California, Los Angeles","University of California, Santa Barbara"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:cdd1fe7f7e07f9c8","title":"TAIL: Task-specific adapters for imitation learning with large pretrained models","url":"https://www.amazon.science/publications/tail-task-specific-adapters-for-imitation-learning-with-large-pretrained-models","published":"2024","authors":["Zuxin Liu","Jesse Zhang","Kavosh Asadi","Yao Liu","Ding Zhao","Shoham Sabach","Rasool Fakoor"],"abstract":"The full potential of large pretrained models remains largely untapped in control domains like robotics. This is mainly due to data scarcity and computational challenges associated with training or fine-tuning large models for such applications. Prior work mainly emphasizes either effective pretraining of large models for decision-making or single-task adaptation. 
But real-world problems will require data-efficient Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=68"}},{"id":"official:57a015d45174227f","title":"Speechworthy instruction-tuned language models","url":"https://www.amazon.science/publications/speechworthy-instruction-tuned-language-models","published":"2024","authors":["Hyundong Cho","Nicolaas Jedema","Leonardo Ribeiro","Karishma Sharma","Pedro Szekely","Alessandro Moschitti","Ruben Janssen","Jonathan May"],"abstract":"Current instruction-tuned language models are exclusively trained with textual preference data and thus are often not aligned with the unique requirements of other modalities, such as speech. 
To better align language models with the speech domain, we explore (i) prompting strategies grounded in radio-industry best practices and (ii) preference learning using a novel speech-based preference data of 20K samples Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=46"}},{"id":"official:3981cfa984afe7d5","title":"Socratic human feedback (SoHF): Expert steering strategies for LLM code generation","url":"https://www.amazon.science/publications/socratic-human-feedback-sohf-expert-steering-strategies-for-llm-code-generation","published":"2024","authors":["Subramanian Chidambaram","Erran Li","Min Bai","Xiaopeng LI","Kaixiang Lin","Xiong Zhou","Alex C. Williams"],"abstract":"Large Language Models (LLMs) are increasingly used for generating code solutions, empowered by features like self-debugging and self-reflection. However, LLMs often struggle with complex programming problems without human guidance. This paper investigates the strategies employed by expert programmers to steer code-generating LLMs toward successful outcomes. 
Through a study involving experts using natural Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:5419ae3bad68b5ae","title":"Sequential LLM framework for fashion recommendation","url":"https://www.amazon.science/publications/sequential-llm-framework-for-fashion-recommendation","published":"2024","authors":["Han Liu","Xianfeng Tang","Tianlang Chen","Jiapeng Liu","Indu Indu","Henry Peng Zou","Peng Dai","Roberto Fernandez Galan","Mike Porter","Dongmei Jia","Ning Zhang","Lian Xiong"],"abstract":"The fashion industry is one of the leading domains in the global e-commerce sector, prompting major online retailers to employ recommendation systems for product suggestions and customer convenience. While recommendation systems have been widely studied, most are designed for general e-commerce problems and struggle with the unique challenges of the fashion domain. 
To address these issues, we propose a Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:b05b4f72960294f9","title":"Robustness preserving fine-tuning using neuron importance","url":"https://www.amazon.science/publications/robustness-preserving-fine-tuning-using-neuron-importance","published":"2024","authors":["Guangrui Li","Rahul Duggal","Aaditya Singh","Kaustav Kundu","Bing Shuai","Jon Wu"],"abstract":"Robust fine-tuning aims to adapt a vision-language model to downstream tasks while preserving its zero-shot capabilities on unseen data. Recent studies have introduced fine-tuning strategies to improve in-distribution (ID) performance on the downstream tasks while minimizing deterioration in out-of-distribution (OOD) performance on unseen data. 
This balance is achieved either by aligning the fine-tuned Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-73113-6_4","openalex_id":"https://openalex.org/W4404544986","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","language model"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:c86b77bd1049fed0","title":"Retrieval augmented spelling correction for e-commerce applications","url":"https://www.amazon.science/publications/retrieval-augmented-spelling-correction-for-e-commerce-applications","published":"2024","authors":["Xuan Guo","Rohit Patki","Dante Everaert","Christopher Potts"],"abstract":"The rapid introduction of new brand names into everyday language poses a unique challenge for e-commerce spelling correction services, which must distinguish genuine misspellings from novel brand names that use unconventional spelling. We seek to address this challenge via Retrieval Augmented Generation (RAG). 
On this approach, product names are retrieved from a catalog and incorporated into the context Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:8b633ace051d090c","title":"Reconciling methodological paradigms: Employing large language models as novice qualitative research assistants in talent management research","url":"https://www.amazon.science/publications/reconciling-methodological-paradigms-employing-large-language-models-as-novice-qualitative-research-assistants-in-talent-management-research","published":"2024","authors":["Sreyoshi Bhaduri","Satya Kapoor","Alex Gil","Anshul Mittal","Rutu Mulkar"],"abstract":"Qualitative data collection and analysis approaches, such as those employing interviews and focus groups, provide rich insights into customer attitudes, sentiment, and behavior. However, manually analyzing qualitative data requires extensive time and effort to identify relevant topics and thematic insights. 
This study proposes a novel approach to address this challenge by leveraging Retrieval Augmented Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=50"}},{"id":"official:c63775ae787a7b97","title":"ReScorer: An aggregation and alignment technique for building trust into LLM reasons","url":"https://www.amazon.science/publications/rescorer-an-aggregation-and-alignment-technique-for-building-trust-into-llm-reasons","published":"2024","authors":["Jay Mohta","Brian de Silva","Sugumar Murugesan","Dantong Liu","Yan Xu","Mingwei Shen"],"abstract":"Large language models (LLMs) offer substantial potential for automating labeling tasks, showcasing robust zero-shot performance across diverse classification tasks. The LLM-generated reasons that accompany these classifications contain signals about the quality of the classifications. Estimates of quality of these reasons can, in essence, be used to detect potentially incorrect predictions. 
Conventional Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=46"}},{"id":"official:b4ca0c401dae12ca","title":"RS-DPO: A hybrid rejection sampling and direct preference optimization method for alignment of large language models","url":"https://www.amazon.science/publications/rs-dpo-a-hybrid-rejection-sampling-and-direct-preference-optimization-method-for-alignment-of-large-language-models","published":"2024","authors":["Saeed Khaki","JinJin Li","Lan Ma","Liu Yang","Prathap Ramachandra"],"abstract":"Reinforcement learning from human feedback (RLHF) has been extensively employed to align large language models with user intent. However, proximal policy optimization (PPO) based RLHF is occasionally unstable requiring significant hyperparameter finetuning, and computationally expensive to maximize the estimated reward during alignment. 
Recently, direct preference optimization (DPO) is proposed to address Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:3c216d94dde68132","title":"REPOFORMER: Selective retrieval for repository-level code completion","url":"https://www.amazon.science/publications/repoformer-selective-retrieval-for-repository-level-code-completion","published":"2024","authors":["Di Wu","Wasi Ahmad","Dejiao Zhang","Murali Krishna Ramanathan","Xiaofei Ma"],"abstract":"Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a large proportion of the retrieved contexts proving unhelpful or harmful to code language models (code LMs). 
In this paper, we propose a selective RAG framework to avoid retrieval Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=60"}},{"id":"official:e63667af75921fd4","title":"RAGChecker: A fine-grained framework for diagnosing retrieval-augmented generation","url":"https://www.amazon.science/publications/ragchecker-a-fine-grained-framework-for-diagnosing-retrieval-augmented-generation","published":"2024","authors":["Dongyu Ru","Lin Qiu","Xiangkun Hu","Tianhang Zhang","Peng Shi","Shuaichen Chang","Cheng Jiayang","Cunxiang Wang","Shichao Sun","Huanyu Li","Zizhao Zhang","Binjie Wang"],"abstract":"Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. 
In this paper, we propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:af36a35b247a22d0","title":"Q-Tuning: Queue-based prompt tuning for lifelong few-shot language learning","url":"https://www.amazon.science/publications/q-tuning-queue-based-prompt-tuning-for-lifelong-few-shot-language-learning","published":"2024","authors":["Yanhui Guo","Shaoyuan Xu","Jinmiao Fu","Jia (Kevin) Liu","Chaosheng Dong","Bryan Wang"],"abstract":"This paper introduces Q-tuning, a novel approach for continual prompt tuning that enables the lifelong learning of a pre-trained language model. When learning a new task, Q-tuning trains a task-specific prompt by adding it to a prompt queue consisting of the prompts from older tasks. 
To better transfer the knowledge of old tasks, we design an adaptive knowledge aggregation technique that reweighs previous Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:5e667417b3594d0f","title":"Panda: Performance debugging for databases using LLM agents","url":"https://www.amazon.science/publications/panda-performance-debugging-for-databases-using-llm-agents","published":"2024","authors":["Vikramank Singh","Kapil Eknath Vaidya","Vinayshekhar Bannihatti Kumar","Sopan Khosla","Murali Narayanaswamy","Rashmi Gangadharaiah","Tim Kraska"],"abstract":"Debugging a performance issue in databases is notoriously hard. Wouldn’t it be convenient if there exists an oracle or a co-pilot for every database system which users can query in natural language (NL) — ‘what’s wrong?’, or even better— ‘How to fix it?’. 
Large Language Models (LLMs), like ChatGPT, seem to be a natural surrogate to this oracle given their ability to answer a wide range of questions by efficiently Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=78"}},{"id":"official:5cf6e77aaf74ff48","title":"Order of magnitude speedups for LLM membership inference","url":"https://www.amazon.science/publications/order-of-magnitude-speedups-for-llm-membership-inference","published":"2024","authors":["Rongting Zhang","Martin Bertran Lopez","Aaron Roth"],"abstract":"Large Language Models (LLMs) have the promise to revolutionize computing broadly, but their complexity and extensive training data also expose significant privacy vulnerabilities. One of the simplest privacy risks associated with LLMs is their susceptibility to membership inference attacks (MIAs), wherein an adversary aims to determine whether a specific data point was part of the model’s training set. 
Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=47"}},{"id":"official:ee47a59ad3abcbd1","title":"One token to seg them all: Language instructed reasoning segmentation in videos","url":"https://www.amazon.science/publications/one-token-to-seg-them-all-language-instructed-reasoning-segmentation-in-videos","published":"2024","authors":["Zechen Bai","Tong He","Haiyang Mei","Pichao Wang","Ziteng Gao","Joya Chen","Lei Liu","Pichao Wang","Zheng Zhang","Mike Zheng Shou"],"abstract":"We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language models, and augmented by the Segment Anything Model, VideoLISA generates temporally consistent segmentation masks in videos based on language instructions. 
Existing image-based Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=39"}},{"id":"official:5d5ddc13d42074b4","title":"Non-autoregressive sequence-to-sequence vision-language models","url":"https://www.amazon.science/publications/non-autoregressive-sequence-to-sequence-vision-language-models","published":"2024","authors":["Kunyu Shi","Qi Dong","Luis Goncalves","Zhuowen Tu","Stefano Soatto"],"abstract":"Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by their inference latency due to their autoregressive way of generating predictions. We propose a parallel decoding sequence-to-sequence vision-language model, trained with a Query-CTC loss, that marginalizes over multiple inference paths in the decoder. 
This allows us to model the joint distribution of Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=74"}},{"id":"official:de31f3ef690cc8c2","title":"No head left behind - Multi-head alignment distillation for transformers","url":"https://www.amazon.science/publications/no-head-left-behind-multi-head-alignment-distillation-for-transformers","published":"2024","authors":["Tianyang Zhao","Kunwar Yashraj Singh","Srikar Appalaraju","Peng Tang","Vijay Mahadevan","R. Manmatha","Ying Nian Wu"],"abstract":"Knowledge distillation aims at reducing model size without compromising much performance. Recent work has applied it to large vision-language (VL) Transformers, and has shown that attention maps in the multi-head attention modules of vision-language Transformers contain extensive intra-modal and cross-modal co-reference relations to be distilled. 
The standard approach is to apply a one-to-one attention Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=81"}},{"id":"official:a9b630c281004841","title":"Near-duplicate question detection","url":"https://www.amazon.science/publications/near-duplicate-question-detection","published":"2024","authors":["Preetam Dammu","Omar Alonso"],"abstract":"Suggesting relevant questions to users is an important task in various applications, such as community Q&A or e-commerce websites. To ensure that there is no redundancy in the selected set of candidate questions, it is essential to filter out any near-duplicate questions. 
Identifying near-duplicate questions has another use case in light of the adoption of Large Language Models (LLMs) – fetching pre-computed Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=74"}},{"id":"official:ea04c55441cb0eed","title":"Metapath of thoughts: Verbalized metapaths in heterogeneous graph as contextual augmentation to LLM","url":"https://www.amazon.science/publications/metapath-of-thoughts-verbalized-metapaths-in-heterogeneous-graph-as-contextual-augmentation-to-llm","published":"2024","authors":["Harshvardhan Solanki","Jyoti Singh","Yihui Chong","Ankur Teredesai"],"abstract":"Heterogeneous graph neural networks (HGNNs) excel in capturing graph topology and structural information. However, they are ineffective in processing the textual components present in nodes and edges and thus producing suboptimal performance in downstream tasks such as node-classification. Additionally, HGNNs lack in their explanatory power and are considered black-box. 
Although, Large Language models Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=47"}},{"id":"official:38a82f9b36436674","title":"Meta knowledge for retrieval augmented large language models","url":"https://www.amazon.science/publications/meta-knowledge-for-retrieval-augmented-large-language-models","published":"2024","authors":["Laurent Mombaerts","Terry Ding","Florian Felice","Jonathan Taws","Adi Banerjee","Tarik Borogovac"],"abstract":"Retrieval Augmented Generation (RAG) is a technique used to augment Large Language Models (LLMs) with contextually relevant, time-critical, or domain-specific information without altering the underlying model parameters. However, constructing RAG systems that can effectively synthesize information from large and diverse set of documents remains a significant challenge. 
We introduce a novel data-centric Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=50"}},{"id":"official:3516f9c4b958c647","title":"MemoryLLM: Towards self-updatable large language models","url":"https://www.amazon.science/publications/memoryllm-towards-self-updatable-large-language-models","published":"2024","authors":["Yu Wang","Yifan Gao","Xiusi Chen","Haoming Jiang","Shiyang Li","Jingfeng Yang","Qingyu Yin","Zheng Li","Xian Li","Bing Yin","Jingbo Shang","Julian McAuley"],"abstract":"Existing Large Language Models (LLMs) usually remain static after deployment, which might make it hard to inject new knowledge into the model. We aim to build models containing a considerable portion of self-updatable parameters, enabling the model to integrate new knowledge effectively and efficiently. 
To this end, we introduce MemoryLLM, a model that comprises a transformer and a fixed-size memory pool Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","memory"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=60"}},{"id":"official:542167b601761c7e","title":"Measuring question answering difficulty for retrieval-augmented generation","url":"https://www.amazon.science/publications/measuring-question-answering-difficulty-for-retrieval-augmented-generation","published":"2024","authors":["Matteo Gabburo","Nicolaas Jedema","Siddhant Garg","Leonardo Ribeiro","Alessandro Moschitti"],"abstract":"In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the difficulty of answering questions, and (ii) propose an unsupervised pipeline to measure RC given an arbitrary retrieval system. 
Our proposed pipeline measures RC more Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:cdba9ef629e1596a","title":"Mapache: Masked parallel transformer for advanced speech editing and synthesis","url":"https://www.amazon.science/publications/mapache-masked-parallel-transformer-for-advanced-speech-editing-and-synthesis","published":"2024","authors":["Guillermo Cambara Ruiz","Patrick Tobing","Mikolaj Babianski","Ravi chander Vipperla","Duo Wang","Ron Shmelkin","Giuseppe Coccia","Orazio Angelini","Arnaud Joly","Mateusz Lajszczak","Vincent Pollet"],"abstract":"Recent advancements in Generative AI, such as scaled Transformer large language models (LLM) and diffusion decoders, have revolutionized speech synthesis. With speech encompassing the complexities of natural language and audio dimensionality, many recent models have relied on autoregressive modeling of quantized speech tokens. 
Such an approach limits speech synthesis to left-to-right generation, making Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=79"}},{"id":"official:0bb5a837af9fdea0","title":"MERLIN: Multiple enhanced representations with LLM generated indices","url":"https://www.amazon.science/publications/merlin-multiple-enhanced-representations-with-llm-generated-indices","published":"2024","authors":["Anirudh Ravichandran","Yidong Zou","Jayapragash Baskar","Anurag Beniwal"],"abstract":"Large Language Models (LLMs) can be leveraged to improve performance in various stages of the search pipeline – the indexing stage, the query understanding stage, and the ranking or re-ranking stage. The latter two stages involve invoking a LLM during inference, adding latency in fetching the final ranked list of documents. 
Index enhancement, on the other hand, can be done in the indexing stage, in near Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:87e103df19e6cb38","title":"Learning when to retrieve, what to rewrite, and how to respond in conversational QA","url":"https://www.amazon.science/publications/learning-when-to-retrieve-what-to-rewrite-and-how-to-respond-in-conversational-qa","published":"2024","authors":["Nirmal Roy","Leonardo Ribeiro","Rexhina Blloshmi","Kevin Small"],"abstract":"Augmenting Large Language Models (LLMs) with information retrieval capabilities (i.e., Retrieval-Augmented Generation (RAG)) has proven beneficial for knowledge-intensive tasks. However, understanding users’ contextual search intent when generating responses is an understudied topic for conversational question answering (QA). 
This conversational extension leads to additional concerns when compared to single-turn Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=46"}},{"id":"official:d5e7b2f9bd2e2b89","title":"Large language model guided graph clustering","url":"https://www.amazon.science/publications/large-language-model-guided-graph-clustering","published":"2024","authors":["Puja Trivedi","Nurendra Choudhary","Eddie Huang","Vassilis N. Ioannidis","Karthik Subbian","Danai Koutra"],"abstract":"Graph clustering on text-attributed graphs (TAGS), i.e., graphs that include natural language text as additional node information, is typically performed using graph neural networks (GNNs), which forego the text in lieu of embeddings. 
While GNN methods ensure scalability and effectively leverage graph topology, text attributes contain rich information that can be leveraged using large language models (LLMs Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:48a4f43078a4de99","title":"Label with confidence: Effective confidence calibration and ensembles in LLM-powered classification","url":"https://www.amazon.science/publications/label-with-confidence-effective-confidence-calibration-and-ensembles-in-llm-powered-classification","published":"2024","authors":["Karen Hovsepian","Dantong Liu","Sugumar Murugesan"],"abstract":"Large Language Models (LLMs) have been employed as crowd-sourced annotators to alleviate the burden of human labeling. However, the broader adoption of LLM-based automated labeling systems encounters two main challenges: 1) LLMs are prone to producing unexpected and unreliable predictions, and 2) no single LLM excels at all labeling tasks. 
To address these challenges, we first develop fast and effective Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=47"}},{"id":"official:dab914edc66a51e7","title":"LLM-PIEval: A benchmark for indirect prompt injection attacks in large language models","url":"https://www.amazon.science/publications/llm-pieval-a-benchmark-for-indirect-prompt-injection-attacks-in-large-language-models","published":"2024","authors":["Anil Ramakrishna","Jimit Majmudar","Rahul Gupta","Devamanyu Hazarika"],"abstract":"Large Language Models (LLMs) have brought with them an unprecedented interest in AI in society. This has enabled their use in several day to day applications such as virtual assistants or smart home agents. 
This integration with external tools also brings several risk areas where malicious actors may try to inject harmful instructions in the user query (direct prompt injection) or in the retrieved information Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:d2d180c83c5c2f69","title":"Knowledge-centric hallucination detection","url":"https://www.amazon.science/publications/knowledge-centric-hallucination-detection","published":"2024","authors":["Xiangkun Hu","Dongyu Ru","Lin Qiu","Qipeng Guo","Tianhang Zhang","Yang Xu","Yun Luo","Pengfei Liu","Zheng Zhang","Yue Zhang"],"abstract":"Large Language Models (LLMs) have shown impressive capabilities but also a concerning tendency to hallucinate. This paper presents REFCHECKER, a framework that introduces claim-triplets to represent claims in LLM responses, aiming to detect fine-grained hallucinations. In REFCHECKER, an extractor generates claim-triplets from a response, which are then evaluated by a checker against a reference. 
We delineate Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=42"}},{"id":"official:9cb369cbcd87433e","title":"Intent detection in the age of LLMs","url":"https://www.amazon.science/publications/intent-detection-in-the-age-of-llms","published":"2024","authors":["Gaurav Arora","Shreya Jain","Srujana Merugu"],"abstract":"Intent detection is a critical component of task-oriented dialogue systems (TODS) which enables the identification of suitable actions to address user utterances at each dialog turn. Traditional approaches relied on computationally efficient supervised sentence transformer encoder models, which require substantial training data and struggle with out-of-scope (OOS) detection. The emergence of generative Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:73d83ed97faa4f8c","title":"Inductive or deductive? 
Rethinking the fundamental reasoning abilities of LLMs","url":"https://www.amazon.science/publications/inductive-or-deductive-rethinking-the-fundamental-reasoning-abilities-of-llms","published":"2024","authors":["Kewei Cheng","Jingfeng Yang","Haoming Jiang","Zhengyang Wang","Binxuan Huang","Ruirui Li","Shiyang Li","Zheng Li","Yifan Gao","Xian Li","Bing Yin","Yizhou Sun"],"abstract":"Reasoning encompasses two typical types: deductive reasoning and inductive reasoning. Despite extensive research into the reasoning capabilities of Large Language Models (LLMs), most studies have failed to rigorously differentiate between inductive and deductive reasoning, leading to a blending of the two. This raises an essential question: In LLM reasoning, which poses a greater challenge - deductive or Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:08e24a275c7a8aa7","title":"Improving tool retrieval by leveraging large language models for query generation","url":"https://www.amazon.science/publications/improving-tool-retrieval-by-leveraging-large-language-models-for-query-generation","published":"2024","authors":["Mohammad Kachuee","Sarthak Ahuja","Vaibhav Kumar","Puyang Xu","Derek Liu"],"abstract":"Using tools by Large Language Models (LLMs) is a promising avenue to extend their reach beyond language or conversational settings. 
The number of tools can scale to thousands as they enable accessing sensory information, fetching updated factual knowledge, or taking actions in the real world. In such settings, in-context learning by providing a short list of relevant tools in the prompt is a viable approach Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=39"}},{"id":"official:01ac801f017f0ee0","title":"ITERALIGN: Iterative constitutional alignment of large language models","url":"https://www.amazon.science/publications/iteralign-iterative-constitutional-alignment-of-large-language-models","published":"2024","authors":["Xiusi Chen","Hongzhi Wen","Sreyashi Nag","Chen Luo","Qingyu Yin","Ruirui Li","Zheng Li","Wei Wang"],"abstract":"With the rapid development of large language models (LLMs), aligning LLMs with human values and societal norms to ensure their reliability and safety has become crucial. Reinforcement learning with human feedback (RLHF) and Constitutional AI (CAI) have been proposed for LLM alignment. 
However, these methods require either heavy human annotations or explicitly pre-defined constitutions, which are labor-intensive Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:8823de26860c2ca8","title":"Has my system prompt been used? Large language model prompt membership inference","url":"https://www.amazon.science/publications/has-my-system-prompt-been-used-large-language-model-prompt-membership-inference","published":"2024","authors":["Roman Levin","Valeriia Cherepanova","Abhimanyu Hans","Avi Schwarzschild","Tom Goldstein"],"abstract":"Prompt engineering has emerged as a powerful technique for optimizing large language models (LLMs) for specific applications, enabling faster prototyping and improved performance, and giving rise to the interest of the community in protecting proprietary system prompts. In this work, we explore a novel perspective on prompt privacy through the lens of membership inference. 
We develop Prompt Detective, a Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=63"}},{"id":"official:f2988204c8cbfb10","title":"Hallucination detection in LLM-enriched product listings","url":"https://www.amazon.science/publications/hallucination-detection-in-llm-enriched-product-listings","published":"2024","authors":["Ling Jiang","Keer Jiang","Xiaoyu Chu","Saaransh Gulati","Pulkit Garg"],"abstract":"E-commerce faces persistent challenges with data quality issue of product listings. Recent advances in Large Language Models (LLMs) offer a promising avenue for automated product listing enrichment. However, LLMs are prone to hallucinations, which we define as the generation of content that is unfaithful to the source input. This poses significant risks in customer-facing applications. 
Hallucination detection Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=67"}},{"id":"official:910aa3d74eae8c84","title":"HalluMeasure: Fine-grained hallucination measurement using chain-of-thought reasoning","url":"https://www.amazon.science/publications/hallumeasure-fine-grained-hallucination-measurement-using-chain-of-thought-reasoning","published":"2024","authors":["Shayan Ali Akbar","Md Mosharaf Hossain","Tess Wood","Si-Chi Chin","Erica Salinas","Victor Alvarez","Erwin Cornejo"],"abstract":"Automating the measurement of hallucinations in LLM-generated responses is a challenging task as it requires careful investigation of each factual claim in a response. In this paper, we introduce HalluMeasure, a new LLM-based hallucination detection mechanism that decomposes an LLM response into atomic claims, and evaluates each atomic claim against the provided reference context. 
The model uses a step-by-step Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:b2f6e2bff1b66a4f","title":"Generative explore-exploit: Training-free optimization of generative recommender systems using LLM optimizers","url":"https://www.amazon.science/publications/generative-explore-exploit-training-free-optimization-of-generative-recommender-systems-using-llm-optimizers","published":"2024","authors":["Besnik Fetahu","Zhiyu Chen","Davis Yoshida","Giuseppe Castellucci","Nikhita Vedula","Jason Choi","Shervin Malmasi"],"abstract":"Recommender systems are widely used to suggest engaging content, and Large Language Models (LLMs) have given rise to generative recommenders. Such systems can directly generate items, including for open-set tasks like question suggestion. 
While the world knowledge of LLMs enable good recommendations, improving the generated content through user feedback is challenging as continuously fine-tuning LLMs is Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=57"}},{"id":"official:bfcc8c3857f5102e","title":"FlexEControl: Flexible and efficient multimodal control for text-to-image generation","url":"https://www.amazon.science/publications/flexecontrol-flexible-and-efficient-multimodal-control-for-text-to-image-generation","published":"2024","authors":["Xuehai He","Skyler Zheng","Jacob Zhiyuan Fang","Robinson Piramuthu","Mohit Bansal","Vicente Ordonez","Gunnar Sigurdsson","Violet Peng","Xin Eric Wang"],"abstract":"Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. 
In this paper, we propose a novel Flexible and Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=51"}},{"id":"official:a5d8dc9b63a33d51","title":"Fine-to-coarse entailment hierarchy construction for coarse-to-fine story generation","url":"https://www.amazon.science/publications/fine-to-coarse-entailment-hierarchy-construction-for-coarse-to-fine-story-generation","published":"2024","authors":["Haw-Shiuan Chang","Nanyun Peng","Mohit Bansal","Tagyoung Chung"],"abstract":"When users want to write a story with a language model (LM) assistant such as ChatGPT, it is often very difficult to provide a prompt that clearly specifies all their interests. For the providers of LM assistants, it is also difficult to ensure their output stories come from a dataset without copyright concerns. 
Motivated by these limitations, we propose a coarse-to-fine (C2F) tree-based story generation Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=64"}},{"id":"official:fde3b357a39cbbdb","title":"Fewer truncations improve language modeling","url":"https://www.amazon.science/publications/fewer-truncations-improve-language-modeling","published":"2024","authors":["Hantian Ding","Zijian Wang","Giovanni Paolini","Varun Kumar","Anoop Deoras","Dan Roth","Stefano Soatto"],"abstract":"In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. 
Despite its efficiency, the concatenation approach compromises data integrity — it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and factually Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=65"}},{"id":"official:82c9e584e3fbd7e3","title":"Fast training dataset attribution via in-context learning","url":"https://www.amazon.science/publications/fast-training-dataset-attribution-via-in-context-learning","published":"2024","authors":["Milad Fotouhi","Taha Bahadori","Oluwaseyi Feyisetan","Seyed Miran","David E. Heckerman"],"abstract":"We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data in the outputs of instruction-tuned large language models (LLMs). 
We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem of identifying Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=53"}},{"id":"official:0ac8e420de95cd0f","title":"FairRAG: Fair human generation via fair retrieval augmentation","url":"https://www.amazon.science/publications/fairrag-fair-human-generation-via-fair-retrieval-augmentation","published":"2024","authors":["Robik Shrestha","Yang Zou","James Chen","Zhiheng Li","Yusheng Xie","Tiffany Deng"],"abstract":"Existing text-to-image generative models reflect or even amplify societal biases ingrained in their training data. This is especially concerning for human image generation where models are biased against certain demographic groups. Existing attempts to rectify this issue are hindered by the inherent limitations of the pre-trained models and fail to substantially improve demographic diversity.
In this work Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=65"}},{"id":"official:f481450eb3f7ce91","title":"Factual confidence of LLMs: On reliability and robustness of current estimators","url":"https://www.amazon.science/publications/factual-confidence-of-llms-on-reliability-and-robustness-of-current-estimators","published":"2024","authors":["Matéo Mahaut","Laura Aina","Paula Czarnowska","Momchil Hardalov","Thomas Müller","Lluís Marquez"],"abstract":"Large Language Models (LLMs) tend to be unreliable in the factuality of their answers. To address this problem, NLP researchers have proposed a range of techniques to estimate LLM’s confidence over facts. However, due to the lack of a systematic comparison, it is not clear how the different methods compare to one another. 
To fill this gap, we present a survey and empirical comparison of estimators of factual Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=58"}},{"id":"official:ab4861c2901103ba","title":"FANTAstic SEquences and where to find them: Faithful and efficient API call generation through state-tracked constrained decoding and reranking","url":"https://www.amazon.science/publications/fantastic-sequences-and-where-to-find-them-faithful-and-efficient-api-call-generation-through-state-tracked-constrained-decoding-and-reranking","published":"2024","authors":["Zhuoer Wang","Leonardo Ribeiro","Alexandros Papangelis","Rohan Mukherjee","Tzu-Yen Wang","Xinyan Zhao","Arijit Biswas","James Caverlee","Angeliki Metallinou"],"abstract":"API call generation is the cornerstone of large language models’ tool-using ability that provides access to the larger world. However, existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user’s request. 
To address these limitations, we propose an output-side optimization Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=53"}},{"id":"official:0d1f3187824fa2ab","title":"Explaining and improving contrastive decoding by extrapolating the probabilities of a huge and hypothetical LM","url":"https://www.amazon.science/publications/explaining-and-improving-contrastive-decoding-by-extrapolating-the-probabilities-of-a-huge-and-hypothetical-lm","published":"2024","authors":["Haw-Shiuan Chang","Nanyun Peng","Mohit Bansal","Anil Ramakrishna","Tagyoung Chung"],"abstract":"Contrastive decoding (CD) (Li et al., 2023) improves the next-token distribution of a large expert language model (LM) using a small amateur LM. Although CD is applied to various LMs and domains to enhance open-ended text generation, it is still unclear why CD often works well, when it could fail, and how we can make it better. 
To deepen our understanding of CD, we first theoretically prove that CD could Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:91e7f5d96632a7f4","title":"Evaluation of topic continuity using nonlinearlized naive bayes with attention mechanism","url":"https://www.amazon.science/publications/evaluation-of-topic-continuity-using-nonlinearlized-naive-bayes-with-attention-mechanism","published":"2024","authors":["Shu-Ting Pi","Pradeep Bagavan","Yejia Li","Disha .","Qun Liu"],"abstract":"Utilizing Large Language Models (LLM) as chatbots in diverse business scenarios often presents the challenge of maintaining topic continuity. Abrupt shifts in topics can lead to poor user experiences and inefficient utilization of computational resources. In this paper, we present a topic continuity model aimed at assessing whether a response aligns with the initial conversation topic. 
Our model is built Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=48"}},{"id":"official:96dc294a1d98afb0","title":"Enhancing low-resource LLMs classification with PEFT and synthetic data","url":"https://www.amazon.science/publications/enhancing-low-resource-llms-classification-with-peft-and-synthetic-data","published":"2024","authors":["Parth Patwa","Simone Filice","Zhiyu Chen","Giuseppe Castellucci","Oleg Rokhlenko","Shervin Malmasi"],"abstract":"Large Language Models (LLMs) operating in 0-shot or few-shot settings achieve competitive results in Text Classification tasks. In-Context Learning (ICL) typically achieves better accuracy than the 0-shot setting, but it pays in terms of efficiency, due to the longer input prompt. 
In this paper, we propose a strategy to make LLMs as efficient as 0-shot text classifiers, while getting comparable or better Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=67"}},{"id":"official:f0c0ba46acd3e216","title":"Enhancing e-commerce product title translation with retrieval-augmented generation and large language models","url":"https://www.amazon.science/publications/enhancing-e-commerce-product-title-translation-with-retrieval-augmented-generation-and-large-language-models","published":"2024","authors":["Bryan Zhang","Taichi Nakatani","Stephan Walter"],"abstract":"E-commerce stores enable multilingual product discovery which require accurate product title translation. Multilingual large language models (LLMs) have shown promising capacity to perform machine translation tasks, and it can also enhance and translate product titles cross-lingually in one step. 
However, product title translation often requires more than just language conversion because titles are short Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=47"}},{"id":"official:314e0485ceb96801","title":"Empowering shoppers with event-focused search","url":"https://www.amazon.science/publications/empowering-shoppers-with-event-focused-search","published":"2024","authors":["Austin Ward","Omar Alonso"],"abstract":"We present Event-focused Search, an automated and scalable pipeline designed to facilitate event discovery and enhance event-based search. This is done by leveraging large language models (LLMs) to populate event datasets, perform temporal search based on selected dates, and aggregate search results based on appropriate events based on those searches. 
We illustrate this pipeline through proof-of-concept Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:fcc0056e5f1caf0c","title":"Eliciting better multilingual structured reasoning from LLMs through code","url":"https://www.amazon.science/publications/eliciting-better-multilingual-structured-reasoning-from-llms-through-code","published":"2024","authors":["Bryan Li","Tamer Alkhouli","Daniele Bonadiman","Nikolaos Pappas","Saab Mansour"],"abstract":"The development of large language models (LLM) has shown progress on reasoning, though studies have largely considered either English or simple reasoning tasks. To address this, we introduce a multilingual structured reasoning and explanation dataset, termed xSTREET, that covers four tasks across six languages. 
xSTREET exposes a gap in base LLM performance between English and non-English reasoning tasks Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=59"}},{"id":"official:441ac577e859bbb6","title":"ECCR: Explainable and coherent complement recommendation based on large language models","url":"https://www.amazon.science/publications/eccr-explainable-and-coherent-complement-recommendation-based-on-large-language-models","published":"2024","authors":["Zelong Li","Yan Liang","Ming Wang","Sungro Yoon","Jiaying Shi","Xin Shen","Xiang He","Chenwei Zhang","Wenyi Wu","Hanbo Wang","Jin Li","Jim Chan"],"abstract":"A complementary item is an item that pairs well with another item when consumed together. In the context of e-commerce, providing recommendations for complementary items is essential for both customers and stores. Current models for suggesting complementary items often rely heavily on user behavior data, such as co-purchase relationships. 
However, just because two items are frequently bought together does Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=56"}},{"id":"official:e7642ad7d404ad2b","title":"E-commerce product categorization with LLM-based dual-expert classification paradigm","url":"https://www.amazon.science/publications/e-commerce-product-categorization-with-llm-based-dual-expert-classification-paradigm","published":"2024","authors":["Zhu Cheng","Wen Zhang","Chih-Chi (Jimmy) Chou","You-Yi Jau","Archita Pathak","Penny Gao","Umit Batur"],"abstract":"Accurate product categorization in e-commerce is critical for delivering a satisfactory online shopping experience to customers. With the vast number of available products and the numerous potential categories, it becomes crucial to develop a classification system capable of assigning products to their correct categories with high accuracy. 
We present a dual-expert classification system that utilizes the Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:d76ba6b70585fa66","title":"Don’t shoot the breeze: Topic continuity model using nonlinear Naive Bayes with attention","url":"https://www.amazon.science/publications/dont-shoot-the-breeze-topic-continuity-model-using-nonlinear-naive-bayes-with-attention","published":"2024","authors":["Shu-Ting Pi","Pradeep Bagavan","Yejia Li","Disha Makhija","Qun Liu"],"abstract":"Utilizing Large Language Models (LLM) as chatbots in diverse business scenarios often presents the challenge of maintaining topic continuity. Abrupt shifts in topics can lead to poor user experiences and inefficient utilization of computational resources. In this paper, we present a topic continuity model aimed at assessing whether a response aligns with the initial conversation topic. 
Our model is built Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:a038478bcf0f98c1","title":"DocKD: Knowledge distillation from LLMs for open-world document understanding models","url":"https://www.amazon.science/publications/dockd-knowledge-distillation-from-llms-for-open-world-document-understanding-models","published":"2024","authors":["Sungnyun Kim","Haofu Liao","Srikar Appalaraju","Peng Tang","Zhuowen Tu","Ravi Kumar Satzoda","R. Manmatha","Vijay Mahadevan","Stefano Soatto"],"abstract":"Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance generalizability of small VDU models by distilling knowledge from LLMs. We identify that directly prompting LLMs often fails to generate informative and useful data. 
In response, we present a new framework Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:dfc749e9c557791b","title":"Discovering bias in latent space: An unsupervised debiasing approach","url":"https://www.amazon.science/publications/discovering-bias-in-latent-space-an-unsupervised-debiasing-approach","published":"2024","authors":["Dyah Adila","Shuai Zhang","Boran Han","Yuyang (Bernie) Wang"],"abstract":"The question-answering (QA) capabilities of foundation models are highly sensitive to prompt variations, rendering their performance susceptible to superficial, non-meaning-altering changes. This vulnerability often stems from the model’s preference or bias towards specific input characteristics, such as option position or superficial image features in multi-modal settings. 
We propose to rectify this bias Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=57"}},{"id":"official:bf5040513aab2c0c","title":"DiffusionPipe: Training large diffusion models with efficient pipelines","url":"https://www.amazon.science/publications/diffusionpipe-training-large-diffusion-models-with-efficient-pipelines","published":"2024","authors":["Ye Tian","Zhen Jia","Ziyue Luo","Yida Wang","Chuan Wu"],"abstract":"Diffusion models have emerged as dominant performers for image generation. To support training large diffusion models, this paper studies pipeline parallel training of diffusion models and proposes DiffusionPipe, a synchronous pipeline training system that advocates innovative pipeline bubble filling technique, catering to structural characteristics of diffusion models.
State-of-the-art diffusion models Category: Cloud and systems","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Cloud and systems","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=67"}},{"id":"official:f6162e9d9b1bc277","title":"Convolution meets LoRA: Parameter efficient finetuning for segment anything model","url":"https://www.amazon.science/publications/convolution-meets-lora-parameter-efficient-finetuning-for-segment-anything-model","published":"2024","authors":["Zihan Zhong","Zhiqiang Tang","Tong He","Haoyang Fang","Chun Yuan"],"abstract":"The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. 
By integrating Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Computer vision","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=74"}},{"id":"official:9686eeb3360b17a7","title":"Can small language models help large language models reason better?: LM-guided chain-of-thought","url":"https://www.amazon.science/publications/can-small-language-models-help-large-language-models-reason-better-lm-guided-chain-of-thought","published":"2024","authors":["Jooyoung Lee","Fan Yang","Thanh Tran","Qian Hu","Emre Barut","Kai-Wei Chang","Chengwei Su"],"abstract":"We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., 10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The Frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. 
Our approach is resource-efficient in the sense Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:2063001cf304dc27","title":"CERET: Cost-effective extrinsic refinement for text generation","url":"https://www.amazon.science/publications/ceret-cost-effective-extrinsic-refinement-for-text-generation","published":"2024","authors":["Jason Cai","Hang Su","Monica Sunkara","Igor Shalyminov","Saab Mansour"],"abstract":"Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good quality outputs in their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality typically involve LLM self-improvement / self-reflection that incorporate feedback from models themselves. 
Despite their effectiveness, these methods are hindered by their Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=60"}},{"id":"official:8c46566db01c4c1d","title":"CANDLE: Iterative conceptualization and instantiation distillation from large language models for commonsense reasoning","url":"https://www.amazon.science/publications/candle-iterative-conceptualization-and-instantiation-distillation-from-large-language-models-for-commonsense-reasoning","published":"2024","authors":["Weiqi Wang","Tianqing Fang","Chunyang Li","Haochen Shi","Wenxuan Ding","Baixuan Xu","Zhaowei Wang","Jiaxin Bai","Xin Liu","Jiayang Cheng","Chunkit Chan","Yangqiu Song"],"abstract":"The sequential process of conceptualization and instantiation is essential to generalizable commonsense reasoning as it allows the application of existing knowledge to unfamiliar scenarios. 
However, existing works tend to undervalue the step of instantiation and heavily rely on pre-built concept taxonomies and human annotations to collect both types of knowledge, resulting in a lack of instantiated knowledge Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","distillation"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=47"}},{"id":"official:3f323ff685ac4499","title":"Building natural language interface for product search","url":"https://www.amazon.science/publications/building-natural-language-interface-for-product-search","published":"2024","authors":["Vijit Malik","Vinayak Puranik","Anirban Majumder","Vivek Sembium"],"abstract":"Automatic extraction of attribute preferences from search queries is a critical problem in providing accurate product recommendations to customer. The task becomes even more challenging in cold-start settings where we do not have any supervised/labelled data available to train ML models. 
In this work, we implement a novel dataset generation pipeline (LLM-API) that leverages Large Language Models (LLMs), Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=51"}},{"id":"official:74fc24814b090352","title":"BioBridge: Bridging biomedical foundation models via knowledge graphs","url":"https://www.amazon.science/publications/biobridge-bridging-biomedical-foundation-models-via-knowledge-graphs","published":"2024","authors":["Zifeng Wang","Zichen Wang","Balasubramaniam Srinivasan","Vassilis N. Ioannidis","Huzefa Rangwala","Rishita Anubhai"],"abstract":"Foundation models (FMs) learn from large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small-molecule structures alone, or clinical data alone. 
To overcome this limitation, we present BioBRIDGE, a parameter-efficient Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Machine learning","efficient"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=76"}},{"id":"official:d80a96f2b312951a","title":"Auto-evolve: Enhancing large language model’s performance via self-reasoning framework","url":"https://www.amazon.science/publications/auto-evolve-enhancing-large-language-models-performance-via-self-reasoning-framework","published":"2024","authors":["Krishna Aswani","Alex Lu","Pranav Patankar","Priya Dhalwani","Iris Tan","Jayant Ganeshmohan","Simon Lacasse"],"abstract":"Recent advancements in prompt engineering strategies, such as Chain-of-Thought (CoT) and Self-Discover, have demonstrated significant potential in improving the reasoning abilities of Large Language Models (LLMs). 
However, these state-of-the-art (SOTA) prompting strategies rely on single or fixed set of static seed reasoning modules like \"think step by step\" or \"break down this problem\" intended to simulate Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:bd1d61978fb90d63","title":"Attribute controlled fine-tuning for large language models: A case study on detoxification","url":"https://www.amazon.science/publications/attribute-controlled-fine-tuning-for-large-language-models-a-case-study-on-detoxification","published":"2024","authors":["Tao Meng","Ninareh Mehrabi","Palash Goyal","Anil Ramakrishna","Aram Galstyan","Richard Zemel","Kai-Wei Chang","Rahul Gupta","Charith Peris"],"abstract":"We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control. Given a training corpus and control criteria formulated as a sequencelevel constraint on model outputs, our method fine-tunes the LLM on the training corpus while enhancing constraint satisfaction with minimal impact on its utility and generation quality. 
Specifically, our approach regularizes the Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:6bfcf94caa52a91a","title":"An interpretable ensemble of graph and language models for improving search relevance in e-commerce","url":"https://www.amazon.science/publications/an-interpretable-ensemble-of-graph-and-language-models-for-improving-search-relevance-in-e-commerce","published":"2024","authors":["Nurendra Choudhary","Eddie Huang","Karthik Subbian","Chandan Reddy"],"abstract":"The problem of search relevance in the E-commerce domain is a challenging one since it involves understanding the intent of a user’s short nuanced query and matching it with the appropriate products in the catalog. This problem has traditionally been addressed using language models (LMs) and graph neural networks (GNNs) to capture semantic and inter-product behavior signals, respectively. 
However, the rapid Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3589335.3648318","openalex_id":"https://openalex.org/W4392490155","cited_by_count":4,"quality_score":60,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United States)","Virginia Tech"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=75"}},{"id":"official:5fac0cb1a0f1c4bb","title":"Adaptive video understanding agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning","url":"https://www.amazon.science/publications/adaptive-video-understanding-agent-enhancing-efficiency-with-dynamic-frame-sampling-and-feedback-driven-reasoning","published":"2024","authors":["Sullam Jeoung","Goeric Huybrechts","Bhavana Ganesh","Aram Galstyan","Sravan Bodapati"],"abstract":"Understanding long-form video content presents significant challenges due to its temporal complexity and the substantial computational resources required. In this work, we propose an agent-based approach to enhance both the efficiency and effectiveness of long-form video understanding by utilizing large language models (LLMs) and their tool-harnessing ability. 
A key aspect of our method is queryadaptive Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","agent"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:cdc31bcab6ea7328","title":"Adapting LLM predictions in in-context learning with data priors","url":"https://www.amazon.science/publications/adapting-llm-predictions-in-in-context-learning-with-data-priors","published":"2024","authors":["Javier Chiyah-Garcia","Prasoon Goyal","Michael Johnston","Reza Ghanadan"],"abstract":"In-Context Learning (ICL) has enabled Large Language Models (LLMs) to excel as generalpurpose models in zero and few-shot task settings. However, since LLMs are often not trained on the downstream tasks, they lack crucial contextual knowledge from the data distributions, which limits their task adaptability. This paper explores using data priors to automatically customize prompts in ICL. 
We extract these Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","LLM"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:15c6ba4671d5acf1","title":"Accept the modality gap: An exploration in the hyperbolic space","url":"https://www.amazon.science/publications/accept-the-modality-gap-an-exploration-in-the-hyperbolic-space","published":"2024","authors":["Sameera Ramasinghe","Violetta Shevchenko","Gil Avraham","Ajanthan Thalaiyasingam"],"abstract":"Recent advancements in machine learning have spotlighted the potential of hyperbolic spaces as they effectively learn hierarchical feature representations. While there has been progress in leveraging hyperbolic spaces in single-modality contexts, its exploration in multimodal settings remains underexplored. 
A recent work has sought to transpose Euclidean multimodal learning techniques to hyperbolic spaces Category: Search and information retrieval","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Search and information retrieval","retrieval"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=68"}},{"id":"official:d52b7a6616d56b22","title":"AG-LSEC: Audio grounded lexical speaker error correction","url":"https://www.amazon.science/publications/ag-lsec-audio-grounded-lexical-speaker-error-correction","published":"2024","authors":["Rohit Paturi","Xiang Li","Sundararajan Srinivasan"],"abstract":"Speaker Diarization (SD) systems are typically audio-based and operate independently of the ASR system in traditional speech transcription pipelines and can have speaker errors due to SD and/or ASR reconciliation, especially around speaker turns and regions of speech overlap. 
To reduce these errors, a Lexical Speaker Error Correction (LSEC), in which an external language model provides lexical information Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","language model"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:aaa75bea2dc0643a","title":"A preference-driven paradigm for enhanced translation with large language models","url":"https://www.amazon.science/publications/a-preference-driven-paradigm-for-enhanced-translation-with-large-language-models","published":"2024","authors":["Dawei Zhu","Sony Trenous","Xiaoyu Shen","Dietrich Klakow","Bill Byrne","Eva Hasler"],"abstract":"Recent research has shown that large language models (LLMs) can achieve remarkable translation performance through supervised fine tuning (SFT) using only a small amount of parallel data. However, SFT simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references. 
Hence, the assistance from SFT often reaches a plateau once Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":60,"matched_keywords":["Conversational AI","preference"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=70"}},{"id":"official:09a0d358a513969e","title":"VidLA: Video-language alignment at scale","url":"https://www.amazon.science/publications/vidla-video-language-alignment-at-scale","published":"2024","authors":["Mamshad Nayeem Rizve","Fan Fei","Jayakrishnan Unnikrishnan","Son Tran","Benjamin Yao","Belinda Zeng","Mubarak Shah","Trishul Chilimbi"],"abstract":"In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. 
To effectively Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvpr52733.2024.01332","openalex_id":"https://openalex.org/W4402727883","cited_by_count":3,"quality_score":59,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=72"}},{"id":"official:a88e0114045c2256","title":"Extracting structured labor market information from job postings with generative ai","url":"https://www.amazon.science/publications/extracting-structured-labor-market-information-from-job-postings-with-generative-ai","published":"2024","authors":["Mark Howison","Will Ensor","Suraj Maharjan","Rahil Parikh","Srinivasan Sengamedu","\"SHS\"","Paul Daniels","Amber Gaither","Carrie Yeats","Chandan Reddy","Justine Hastings"],"abstract":"Labor market information is an important input to labor, workforce, education, and macroeconomic policy. However, granular and real-time data on labor market trends are lacking; publicly available data from survey samples are released with significant lags and miss critical information such as skills and benefits. 
We use generative Artificial Intelligence to automatically extract structured labor market Category: Economics","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3674847","openalex_id":"https://openalex.org/W4400555572","cited_by_count":3,"quality_score":59,"matched_keywords":["Economics"],"author_affiliations":["Amazon","Amazon (United States)","National Association of Area Agencies on Aging","Seattle University","University of Washington","Virginia Tech"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:292217e3c3eaa6a5","title":"Zero-shot controllable image-to-video animation via motion decomposition","url":"https://www.amazon.science/publications/zero-shot-controllable-image-to-video-animation-via-motion-decomposition","published":"2024","authors":["Shoubin Yu","Jacob Zhiyuan Fang","Skyler Zheng","Gunnar Sigurdsson","Vicente Ordonez","Robinson Piramuthu","Mohit Bansal"],"abstract":"In this paper, we introduce a new challenging task called Zero-Shot Controllable Image-to-Video Animation, where the goal is to animate an image based on motion trajectories defined by the user, without fine-tuning the base model. 
Primary challenges include maintaining consistency of background, consistency of object in motion, faithfulness to the user-defined trajectory, and quality of motion animation Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3664647.3681394","openalex_id":"https://openalex.org/W4403791617","cited_by_count":2,"quality_score":58,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Rice University","University of North Carolina at Chapel Hill"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=51"}},{"id":"official:46b0c868b782f1b7","title":"X-Former: Unifying contrastive and reconstruction learning for MLLMs","url":"https://www.amazon.science/publications/x-former-unifying-contrastive-and-reconstruction-learning-for-mllms","published":"2024","authors":["Swetha Sirnam","Jinyu Yang","Tal Neiman","Mamshad Nayeem Rizve","Son Tran","Benjamin Yao","Trishul Chilimbi","Mubarak Shah"],"abstract":"Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized the field of vision-language understanding by integrating visual perception capabilities into Large Language Models (LLMs). 
The prevailing trend in this field involves the utilization of a vision encoder derived from vision-language contrastive learning (CL), showing expertise in capturing overall representations while facing Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-72658-3_9","openalex_id":"https://openalex.org/W4403081406","cited_by_count":2,"quality_score":58,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University","University of Central Florida"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=54"}},{"id":"official:200f338f5e70874c","title":"Scaling object-centric robotic manipulation with multimodal object identification","url":"https://www.amazon.science/publications/scaling-object-centric-robotic-manipulation-with-multimodal-object-identification","published":"2024","authors":["Chaitanya Mitash","Mostafa Hussein","Jeroen Vanbaar","Vikedo Terhuja","Kapil Katyal"],"abstract":"Robotic manipulation is a key enabler for automation in the fulfillment logistics sector. Such robotic systems require perception and manipulation capabilities to handle a wide variety of objects. Existing systems either operate on a closed set of objects or perform object-agnostic manipulation which lacks the capability for deliberate and reliable manipulation at scale. 
Object identification (ID) unlocks Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icra57147.2024.10611181","openalex_id":"https://openalex.org/W4401416136","cited_by_count":2,"quality_score":58,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=73"}},{"id":"official:bba378c2224a73c4","title":"MIVC: Multiple instance visual component for visual-language models","url":"https://www.amazon.science/publications/mivc-multiple-instance-visual-component-for-visual-language-models","published":"2024","authors":["Wenyi Wu","Qi Li","Wenliang Zhong","Junzhou Huang"],"abstract":"Vision-language models have been widely explored across a wide range of tasks and achieve satisfactory performance. However, it’s under-explored how to consolidate entity understanding through a varying number of images and to align it with the pre-trained language models for generative tasks. 
In this paper, we propose MIVC, a general multiple instance visual component to bridge the gap between various Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/wacv57701.2024.00793","openalex_id":"https://openalex.org/W4394597968","cited_by_count":2,"quality_score":58,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","The University of Texas at Arlington"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=81"}},{"id":"official:af2c655324bdd81b","title":"De-noised vision-language fusion guided by visual cues for e-commerce product search","url":"https://www.amazon.science/publications/de-noised-vision-language-fusion-guided-by-visual-cues-for-e-commerce-product-search","published":"2024","authors":["Zhizhang Hu","Shasha Li","Ming Du","Arnab Dhua","Doug Gray"],"abstract":"In e-commerce applications, vision-language multimodal transformer models play a pivotal role in product search. The key to successfully training a multimodal model lies in the alignment quality of image-text pairs in the dataset. However, the data in practice is often automatically collected with minimal manual intervention. Hence the alignment of image-text pairs is far from ideal. 
In e-commerce, this Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/cvprw63382.2024.00204","openalex_id":"https://openalex.org/W4402916408","cited_by_count":2,"quality_score":58,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Search","University of California, Merced"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=67"}},{"id":"official:84ffec53f222980d","title":"Augment the pairs | Semantics-preserving image-caption pair augmentation for grounding-based vision and language models","url":"https://www.amazon.science/publications/augment-the-pairs-semantics-preserving-image-caption-pair-augmentation-for-grounding-based-vision-and-language-models","published":"2024","authors":["Jingru Yi","Burak Uzkent","Oana Ignat","Zili Li","Amanmeet Garg","Xiang Yu","Linda Liu"],"abstract":"Grounding-based vision and language models have been successfully applied to low-level vision tasks, aiming to precisely locate objects referred in captions. The effectiveness of grounding representation learning heavily relies on the scale of the training dataset. 
Despite being a useful data enrichment strategy, data augmentation has received minimal attention in existing vision and language tasks as augmentation Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/wacv57701.2024.00543","openalex_id":"https://openalex.org/W4394597381","cited_by_count":2,"quality_score":58,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=82"}},{"id":"official:48bc8fa22267aa4b","title":"Aligning vision language models with contrastive learning","url":"https://www.amazon.science/publications/aligning-vision-language-models-with-contrastive-learning","published":"2024","authors":["Kenan Emir Ak","Jay Mohta","Dimitris Dimitriadis","Saurav Manchanda","Yan Xu","Mingwei Shen"],"abstract":"In recent years, Vision Language Models (VLMs) have achieved significant advancements due to the success of large language models. The common strategy for aligning vision and language models involves a two-step process: an alignment (or pretraining) stage and an instruction tuning stage. 
During the alignment stage, a projection module is trained to map image embeddings into the language space using a paired Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-91672-4_3","openalex_id":"https://openalex.org/W4410540170","cited_by_count":2,"quality_score":58,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=50"}},{"id":"official:90783da2cce0186b","title":"Post-training embedding alignment for decoupling enrollment and runtime speaker recognition models","url":"https://www.amazon.science/publications/post-training-embedding-alignment-for-decoupling-enrollment-and-runtime-speaker-recognition-models","published":"2024","authors":["Chenyang Gao","Brecht Desplanques","Chelsea J.-T. Ju","Aman Chadha","Andreas Stolcke"],"abstract":"Automated speaker identification (SID) is a crucial step for the per-sonalization of a wide range of speech-enabled services. Typical SID systems use a symmetric enrollment-verification framework with a single model to derive embeddings both offline for voice profiles extracted from enrollment utterances, and online from runtime utter-ances. 
Due to the distinct circumstances of enrollment and runtime, such Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.48550/arxiv.2401.12440","openalex_id":"https://openalex.org/W4391212374","cited_by_count":1,"quality_score":57,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United States)","Rutgers, The State University of New Jersey"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=80"}},{"id":"official:49aad251385e1d10","title":"Open vocabulary multi-label video classification","url":"https://www.amazon.science/publications/open-vocabulary-multi-label-video-classification","published":"2024","authors":["Rohit Gupta","Mamshad Nayeem Rizve","Jayakrishnan Unnikrishnan","Ashish Tawari","Son Tran","Mubarak Shah"],"abstract":"Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. 
However, previous methods fall short in holistic video understanding which requires the ability to simultaneously Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-72933-1_16","openalex_id":"https://openalex.org/W4403068684","cited_by_count":1,"quality_score":57,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University","University of Central Florida"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=54"}},{"id":"official:ad2e6c107149cb0a","title":"Masking latent gender knowledge for debiasing image captioning","url":"https://www.amazon.science/publications/masking-latent-gender-knowledge-for-debiasing-image-captioning","published":"2024","authors":["Fan Yang","Shalini Ghosh","Kechen Qin","Prashan Wanigasekara","Emre Barut","Chengwei Su","Rahul Gupta","Weitong Ruan"],"abstract":"Large language models incorporate world knowledge and present breakthrough performances on zero-shot learning. However, these models capture societal bias (e.g., gender or racial bias) due to bias during the training process which raises ethical concerns or can even be potentially harmful. 
The issue is more pronounced in multi-modal settings, such as image captioning, as images can also add onto biases Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.18653/v1/2024.trustnlp-1.19","openalex_id":"https://openalex.org/W4401042107","cited_by_count":1,"quality_score":57,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:6536e03361eceefa","title":"HVCLIP: High-dimensional vector in CLIP for unsupervised domain adaptation","url":"https://www.amazon.science/publications/hvclip-high-dimensional-vector-in-clip-for-unsupervised-domain-adaptation","published":"2024","authors":["Sol Vesdapunt","Kah Kuen Fu","Yue (Rex) Wu","Xu Zhang","Pradeep Natarajan"],"abstract":"Recent advancement in the large-scale image-text pre-training model (such as CLIP) has significantly improved unsupervised domain adaptation (UDA) by leveraging the pre-trained knowledge to bridge the source and target domain gap. 
However, Catastrophic forgetting still remains to be the main challenge, since traditional fine-tuning method to adjust CLIP model weights on a target domain can quickly override Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1007/978-3-031-72848-8_3","openalex_id":"https://openalex.org/W4404792747","cited_by_count":1,"quality_score":57,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)","Seattle University"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=54"}},{"id":"official:18b9d27d12b75888","title":"A multimodal benchmark and improved architecture for zero shot learning","url":"https://www.amazon.science/publications/a-multimodal-benchmark-and-improved-architecture-for-zero-shot-learning","published":"2024","authors":["Keval Doshi","Amanmeet Garg","Burak Uzkent","Xiaolong Wang","Mohamed Omar"],"abstract":"In this work, we demonstrate that due to the inadequacies in the existing evaluation protocols and datasets, there is a need to revisit and comprehensively examine the multimodal Zero-Shot Learning (MZSL) problem formulation. 
Specifically, we address two major challenges faced by current MZSL approaches; (1) Established baselines are frequently incomparable and occasionally even flawed since existing evaluation Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/wacv57701.2024.00202","openalex_id":"https://openalex.org/W4394597744","cited_by_count":1,"quality_score":57,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=81"}},{"id":"official:8f0283d6071f13a7","title":"iEdit: Localised text-guided image editing with weak supervision","url":"https://www.amazon.science/publications/iedit-localised-text-guided-image-editing-with-weak-supervision","published":"2024","authors":["Rumeysa Bodur","Erhan Gundogdu","Binod Bhattarai","Tae-Kyun Kim","Michael Donoser","Loris Bazzani"],"abstract":"Diffusion models (DMs) can generate realistic images with text guidance using large-scale datasets. However, they demonstrate limited controllability on the generated images. We introduce iEdit, a novel method for text-guided image editing conditioned on a source image and textual prompt. 
As a fully-annotated dataset with target images does not exist, previous approaches perform subject-specific fine-tuning Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=64"}},{"id":"official:c2b65fa3d873029a","title":"Weakly-supervised multi-sensor anomaly detection with time-series foundation models","url":"https://www.amazon.science/publications/weakly-supervised-multi-sensor-anomaly-detection-with-time-series-foundation-models","published":"2024","authors":["Zelin He","Matthew Reimherr","Sarah Alnegheimish","Akash Chandrayan"],"abstract":"Anomaly detection in industrial sensor data is challenging as sensor readings are frequently affected by routine operations, leading to sudden changes that may not indicate actual issues. This makes it difficult to distinguish between normal and anomalous behavior. With a few expert-labeled anomalies, we aim to leverage these sparse labels to improve sensor anomaly detection. 
Besides the issue of limited Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:4773b8865bad5f10","title":"Vision-language understanding in hyperbolic space","url":"https://www.amazon.science/publications/vision-language-understanding-in-hyperbolic-space","published":"2024","authors":["Sarthak Srivastava","Kathy Wu"],"abstract":"State-of-the-art performance has been achieved in recent years on tasks such as search, recommendation and classification using Visuo-Lingual Multi-Modal models. 
While the pre-trained Vision-Language models like Contrastive Language-Image Pre-training (CLIP) have achieved promising zero-shot performance on several generalized tasks by learning vision-language concepts in a common space, the natural hierarchical Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=56"}},{"id":"official:c53cfa10b5a1d167","title":"VisFocus: Prompt-guided vision encoders for OCR-free dense document understanding","url":"https://www.amazon.science/publications/visfocus-prompt-guided-vision-encoders-for-ocr-free-dense-document-understanding","published":"2024","authors":["Ofir Abramovich","Niv Nayman","Sharon Fogel","Inbal Lavi","Ron Litman","Shahar Tsiper","Royee Tichauer","Srikar Appalaraju","Shai Mazor","R. Manmatha"],"abstract":"In recent years, notable advancements have been made in the domain of visual document understanding, with the prevailing architecture comprising a cascade of vision and language models. The text component can either be extracted explicitly with the use of external OCR models in OCR-based approaches, or alternatively, the vision model can be endowed with reading capabilities in OCR-free approaches. 
Typically Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=53"}},{"id":"official:b333345f1d34f02c","title":"ViewFusion: Towards multi-view consistency via interpolated denoising","url":"https://www.amazon.science/publications/viewfusion-towards-multi-view-consistency-via-interpolated-denoising","published":"2024","authors":["Xianghui Yang","Yan Zuo","Sameera Ramasinghe","Loris Bazzani","Gil Avraham","Anton van den Hengel"],"abstract":"Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images. Yet, the independent process of image generation in these prevailing methods leads to challenges in maintaining multiple view consistency. 
To address this, we introduce ViewFusion, a novel, training-free algorithm that can be seamlessly integrated into existing pre-trained Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:daf0b1e136af37b0","title":"ViGoR: Improving visual grounding of large vision language models with fine-grained reward modeling","url":"https://www.amazon.science/publications/vigor-improving-visual-grounding-of-large-vision-language-models-with-fine-grained-reward-modeling","published":"2024","authors":["Siming Yan","Min Bai","Weifeng Chen","Xiong Zhou","Qixing Huang","Erran Li"],"abstract":"By combining natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented visual reasoning capabilities. 
However, the generated text often suffers from inaccurate grounding in the visual input, resulting in errors such as hallucination of nonexistent scene elements, missing Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=53"}},{"id":"official:d1f5bce1297ea362","title":"Trustworthiness in medical product question answering by large language models","url":"https://www.amazon.science/publications/trustworthiness-in-medical-product-question-answering-by-large-language-models","published":"2024","authors":["Daniel Lopez-Martinez"],"abstract":"Large language models (LLMs) have achieved remarkable progress in recent years. These models have the capability to answer complex questions about medical disorders, their pathophysiology, etiology and corresponding interventions. 
However, when providing information about medical products and treatments, it is important to ensure that models respond reliably with factually correct information that adheres Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=63"}},{"id":"official:60ed2f90bb8b7487","title":"Transferring knowledge from large foundation models to small downstream models","url":"https://www.amazon.science/publications/transferring-knowledge-from-large-foundation-models-to-small-downstream-models","published":"2024","authors":["Shikai Qiu","Boran Han","Danielle Maddix Robinson","Shuai Zhang","Yuyang (Bernie) Wang","Andrew Wilson"],"abstract":"How do we transfer the relevant knowledge from ever larger foundation models into small, task-specific downstream models that can run at much lower costs? Standard transfer learning using pre-trained weights as the initialization transfers limited information and commits us to often massive pre-trained architectures. 
This procedure also precludes combining multiple pre-trained models that learn complementary Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=58"}},{"id":"official:061ed9a0a22b574c","title":"Training LLMs to better self-debug and explain code","url":"https://www.amazon.science/publications/training-llms-to-better-self-debug-and-explain-code","published":"2024","authors":["Nan Jiang","Xiaopeng LI","Shiqi Wang","Qiang Zhou","Baishakhi Ray","Varun Kumar","Xiaofei Ma","Anoop Deoras"],"abstract":"In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. 
Prior works on self-debugging mostly focus on prompting methods by providing LLMs with few-shot examples, which work poorly on small open-sourced LLMs Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:3c299586c7eb14b0","title":"Towards quantitative evaluation metrics for image editing approaches","url":"https://www.amazon.science/publications/towards-quantitative-evaluation-metrics-for-image-editing-approaches","published":"2024","authors":["Dana Cohen","Oron Anschel","Alon Shoshan","Igor Kviatkovsky","Manoj Aggarwal","Gérard Medioni"],"abstract":"In the rapidly evolving field of Generative AI, this work takes initial steps towards establishing a systematic approach for comparing image editing methods. Currently, there is a lack of quantitative metrics for evaluating image editing tasks, with new methods being evaluated mostly qualitatively. 
Our methodology involves three key components: 1) The creation of a large synthetic dataset using GAN-Control Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=62"}},{"id":"official:a9dd4e4b17fe6dc2","title":"The steerability of large language models toward data-driven personas","url":"https://www.amazon.science/publications/the-steerability-of-large-language-models-toward-data-driven-personas","published":"2024","authors":["Junyi Li","Charith Peris","Ninareh Mehrabi","Palash Goyal","Kai-Wei Chang","Aram Galstyan","Richard Zemel","Rahul Gupta"],"abstract":"Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented. Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs, that can be leveraged to produce multiple perspectives and to reflect the diverse opinions. 
Moving beyond the traditional reliance on demographics like age, gender Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:224a1ac3e39f410a","title":"The power of summary-source alignments","url":"https://www.amazon.science/publications/the-power-of-summary-source-alignments","published":"2024","authors":["Ori Ernst","Ori Shapira","Aviv Slobodkin","Sharon Adar","Mohit Bansal","Jacob Goldberger","Ran Levy","Ido Dagan"],"abstract":"Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection, followed by text generation. In this context, alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data for some of the component tasks. 
Yet, this enabling alignment step has usually been applied heuristically Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:08013670362f92e9","title":"The Amazon Nova family of models: Technical report and model card","url":"https://www.amazon.science/publications/the-amazon-nova-family-of-models-technical-report-and-model-card","published":"2024","authors":["Amazon Artificial General Intelligence"],"abstract":"We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. 
Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=39"}},{"id":"official:348d8fb089067c9b","title":"Talking nonsense: Probing large language models’ understanding of adversarial gibberish inputs","url":"https://www.amazon.science/publications/talking-nonsense-probing-large-language-models-understanding-of-adversarial-gibberish-inputs","published":"2024","authors":["Valeriia Cherepanova","James Zou"],"abstract":"Large language models (LLMs) exhibit excellent ability to understand human languages, but do they also understand their own language that appears gibberish to us? In this work we delve into this question, aiming to uncover the mechanisms underlying such behavior in LLMs. 
We employ the Greedy Coordinate Gradient optimizer to craft prompts that compel LLMs to generate coherent responses from seemingly nonsensical Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:229ebac979d7ca36","title":"THRONE: An object-based hallucination benchmark for the free-form generations of large vision-language models","url":"https://www.amazon.science/publications/throne-an-object-based-hallucination-benchmark-for-the-free-form-generations-of-large-vision-language-models","published":"2024","authors":["Prannay Kaul","Zhizhong Li","Hao Yang","Yonatan Dukler","Ashwin Swaminathan","C. J. Taylor","Stefano Soatto"],"abstract":"Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term “Type I hallucinations”. 
Instead, they focus on hallucinations responding to very specific question formats—typically a multiple-choice response regarding a particular object or attribute—which we term “Type II hallucinations Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=61"}},{"id":"official:a2cf195bed9a22f1","title":"Synthesize step-by-step: Tools, templates and LLMs as data generators for reasoning-based chart VQA","url":"https://www.amazon.science/publications/synthesize-step-by-step-tools-templates-and-llms-as-data-generators-for-reasoning-based-chart-vqa","published":"2024","authors":["Zhuowan Li","Bhavan Jasani","Peng Tang","Shabnam Ghadar"],"abstract":"Understanding data visualizations like charts and plots requires reasoning about both visual elements and numerics. Although strong in extractive questions, current chart visual question answering (chart VQA) models suffer on complex reasoning questions. In this work, we address the lack of reasoning ability by data augmentation. 
We lever-age Large Language Models (LLMs), which have shown to have strong Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:377ca1be8f8d88eb","title":"SpeechGuard: Exploring the adversarial robustness of multimodal large language models","url":"https://www.amazon.science/publications/speechguard-exploring-the-adversarial-robustness-of-multimodal-large-language-models","published":"2024","authors":["Raghuveer Peri","Sai Muralidhar Jayanthi","Srikanth Ronanki","Anshu Bhatia","Karel Mundnich","Saket Dingliwal","Nilaksh Das","Zejiang Hou","Goeric Huybrechts","Srikanth Vishnubhotla","Daniel Garcia-Romero","Sundararajan Srinivasan"],"abstract":"Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. 
Specifically, we design Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=59"}},{"id":"official:249a61f99c61d4ee","title":"Snakes and ladders: Accelerating state space model inference with speculative decoding","url":"https://www.amazon.science/publications/snakes-and-ladders-accelerating-state-space-model-inference-with-speculative-decoding","published":"2024","authors":["Yangchao Wu","Yonatan Dukler","Matthew Trager","Alessandro Achille","Wei Xia","Stefano Soatto"],"abstract":"Speculative decoding is a method for accelerating inference in large language models (LLMs) by predicting multiple tokens using a smaller ‘draft model’ and validating them against the larger ‘base model.’ If a draft token is inconsistent with what the base model would have generated, speculative decoding ‘backtracks’ to the last consistent token before resuming generation. 
This is straightforward in autoregressive Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:7ad1ffa1c469590a","title":"SetLexSem Challenge: Using set operations to evaluate the lexical and semantic robustness of language models","url":"https://www.amazon.science/publications/setlexsem-challenge-using-set-operations-to-evaluate-the-lexical-and-semantic-robustness-of-language-models","published":"2024","authors":["Bardiya Akhbari","Manish Gawali","Nicholas Dronen"],"abstract":"Set theory is foundational to mathematics and, when sets are finite, to reasoning about the world. An intelligent system should perform set operations consistently, regardless of superficial variations in the operands. Initially designed for semantically-oriented NLP tasks, large language models (LLMs) are now being evaluated on algorithmic tasks. Because sets are comprised of arbitrary symbols (e.g. 
numbers Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:0e10ead747282cce","title":"Sequential editing for lifelong training of speech recognition models","url":"https://www.amazon.science/publications/sequential-editing-for-lifelong-training-of-speech-recognition-models","published":"2024","authors":["Devang Kulshreshtha","Saket Dingliwal","Brady Houston","Nikolaos Pappas","Srikanth Ronanki"],"abstract":"Automatic Speech Recognition (ASR) traditionally assumes known domains, but adding data from a new domain raises concerns about computational inefficiencies linked to retraining models on both existing and new domains. Fine-tuning solely on new domains risks Catastrophic Forgetting (CF). To address this, Lifelong Learning (LLL) algorithms have been proposed for ASR. 
Prior research has explored techniques Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=57"}},{"id":"official:e80a8f4987ce3513","title":"Self-contradictory reasoning evaluation and detection","url":"https://www.amazon.science/publications/self-contradictory-reasoning-evaluation-and-detection","published":"2024","authors":["Ziyi Liu","Soumya Sanyal","Isabelle Lee","Yongkang Du","Rahul Gupta","Yang Liu","Jieyu Zhao"],"abstract":"In a plethora of recent work, large language models (LLMs) demonstrated impressive reasoning ability, but many proposed downstream reasoning tasks only focus on final answers. Two fundamental questions persist: 1) how consistent is the reasoning, and 2) can models detect unreliable reasoning? 
In this paper, we investigate self-contradictory (SELF-CONTRA) reasoning, where the model reasoning does not support Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=37"}},{"id":"official:29d5cbb0ad4bfcec","title":"Scaling use-case based shopping using LLMs","url":"https://www.amazon.science/publications/scaling-use-case-based-shopping-using-llms","published":"2024","authors":["Sachin Farfade","Sachin Vernekar","Vineet Chaoji","Rajdeep Mukherjee"],"abstract":"Products on e-commerce websites are usually organized based on seller-provided product attributes. Customers looking for a product typically have certain needs or product use-cases in mind, for e.g., a headphone for gym classes, or a printer for a small business. However, they often struggle to map these use-cases to product attributes and subsequently fail to find the product they need. 
In this talk, we Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=76"}},{"id":"official:8736cf23c05a3cc7","title":"Salient information prompting to steer content in prompt-based abstractive summarization","url":"https://www.amazon.science/publications/salient-information-prompting-to-steer-content-in-prompt-based-abstractive-summarization","published":"2024","authors":["Lei Xu","Asad Karim","Saket Dingliwal","Aparna Elangovan"],"abstract":"Large language models (LLMs) can generate fluent summaries across domains using prompting techniques, reducing the need to train models for summarization applications. However, crafting effective prompts that guide LLMs to generate summaries with the appropriate level of detail and writing style remains a challenge. 
In this paper, we explore the use of salient information extracted from the source document Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:f129913af62c0fbc","title":"STORYANALOGY: Deriving story-level analogies from large language models to unlock analogical understanding","url":"https://www.amazon.science/publications/storyanalogy-deriving-story-level-analogies-from-large-language-models-to-unlock-analogical-understanding","published":"2024","authors":["Cheng Jiayang","Lin Qiu","Tsz Ho Chan","Tianqing Fang","Weiqi Wang","Chunkit Chan","Dongyu Ru","Qipeng Guo","Hongming Zhang","Yangqiu Song","Yue Zhang","Zheng Zhang"],"abstract":"Analogy-making between narratives is crucial for human reasoning. In this paper, we evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus, STORYANALOGY, which contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory. 
We design a set of tests on STORYANALOGY Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=79"}},{"id":"official:106d2c3a5246451d","title":"Revisiting SMoE language models by evaluating inefficiencies with task specific expert pruning","url":"https://www.amazon.science/publications/revisiting-smoe-language-models-by-evaluating-inefficiencies-with-task-specific-expert-pruning","published":"2024","authors":["Soumajyoti Sarkar","Leonard Lausen","Volkan Cevher","Sheng Zha","Thomas Brox","George Karypis"],"abstract":"Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. These models use conditionally activated feedforward subnetworks in transformer blocks, allowing for a separation between total model parameters and per-example computation. 
However, large token-routed SMoE models face a significant challenge: during inference, the entire model must be used Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=44"}},{"id":"official:c745a9e2934d5e0b","title":"Redefining proactivity for information seeking dialogue","url":"https://www.amazon.science/publications/redefining-proactivity-for-information-seeking-dialogue","published":"2024","authors":["Jing Yang Lee","Seokhwan Kim","Kartik Mehta","Jiun-Yu Kao","Yu-Hsiang Lin","Arpit Gupta"],"abstract":"Information-Seeking Dialogue (ISD) agents aim to provide accurate responses to user queries. While proficient in directly addressing user queries, these agents, as well as LLMs in general, predominantly exhibit reactive behavior, lacking the ability to generate proactive responses that actively engage users in sustained conversations. 
However, existing definitions of proactive dialogue in this context do Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:20e0fa0dff78cf62","title":"Reasoning and planning with large language models in code development (survey for KDD 2024 tutorial)","url":"https://www.amazon.science/publications/reasoning-and-planning-with-large-language-models-in-code-development-survey-for-kdd-2024-tutorial","published":"2024","authors":["Gaurav Gupta","Wooseok Ha","Behrooz Omidvar-Tehrani","Shiqi Wang","Jun Huan"],"abstract":"Large Language Models (LLMs) are revolutionizing the field of code development by leveraging their deep understanding of code patterns, syntax, and semantics to assist developers in various tasks, from code generation and testing to code understanding and documentation. 
In this survey, accompanying our proposed lecture-style tutorial for KDD 2024, we explore the multifaceted impact of LLMs on code development Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=58"}},{"id":"official:ece8c23d0eea496a","title":"ReCLIP: Refine contrastive language image pre-training with source free domain adaptation","url":"https://www.amazon.science/publications/reclip-refine-contrastive-language-image-pre-training-with-source-free-domain-adaptation","published":"2024","authors":["Xuefeng Hu","Ke Zhang","Lu Xia","Albert Chen","Jiajia Luo","Yuyin Sun","Ken Wang","Nan Qiao","Xiao Zeng","Min Sun","Cheng-Hao Kuo","Ram Nevatia"],"abstract":"Large-scale pre-trained vision-language models (VLM) such as CLIP have demonstrated noteworthy zero-shot classification capability, achieving 76.3% top-1 accuracy on ImageNet without seeing any examples. However, while applying CLIP to a downstream target domain, the presence of visual and text domain gaps and cross-modality misalignment can greatly impact the model performance. 
To address such challenges Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=81"}},{"id":"official:eb680a1d9ec8fbd1","title":"RNR: Teaching large language models to follow roles and rules","url":"https://www.amazon.science/publications/rnr-teaching-large-language-models-to-follow-roles-and-rules","published":"2024","authors":["Kuan Wang","Alexander Bukharin","Haoming Jiang","Qingyu Yin","Zhengyang Wang","Tuo Zhao","Jingbo Shang","Chao Zhang","Bing Yin","Xian Li","Jianshu Chen","Shiyang Li"],"abstract":"Instruction fine-tuning (IFT) elicits instruction following capabilities and steers the behavior of large language models (LLMs) via supervised learning. However, existing models trained on open-source IFT datasets only have the ability to follow instructions from users, and often fail to follow complex role and rules specified by developers, a.k.a. system prompts. 
The ability to follow these roles and Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=54"}},{"id":"official:f7bc7fdc8f78a582","title":"REFINESUMM: Self-refining MLLM for generating a multimodal summarization dataset","url":"https://www.amazon.science/publications/refinesumm-self-refining-mllm-for-generating-a-multimodal-summarization-dataset","published":"2024","authors":["Vaidehi Patil","Leonardo Ribeiro","Mengwen Liu","Mohit Bansal","Markus Dreyer"],"abstract":"Multimodal Large Language Models (MLLMs) excel at synthesizing key information from diverse sources. However, generating accurate and faithful multimodal summaries is challenging, primarily due to the lack of appropriate multimodal datasets for fine-tuning that meaningfully integrate textual and visual modalities. 
To address this gap, we present a new dataset specifically designed for image-text multimodal Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=55"}},{"id":"official:8e993bee53fb5811","title":"REACT: Residual-adaptive contextual tuning for fast model adaptation in cybersecurity","url":"https://www.amazon.science/publications/react-residual-adaptive-contextual-tuning-for-fast-model-adaptation-in-cybersecurity","published":"2024","authors":["Jiayun Zhang","Junshen Xu","Yi Fan"],"abstract":"Cybersecurity applications are challenged by constant distribution shifts due to the evolvement of services, users, and threats, degrading pretrained model performance. Fast adaptation is crucial for maintaining reliable security measures. 
Existing works primarily focus on pretraining models that can quickly adapt to new distributions, yet their fine-tuning relies on a rudimentary strategy that treats each Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:f4173a9bbeac0574","title":"Quality matters: Evaluating synthetic data for tool-using LLMs","url":"https://www.amazon.science/publications/quality-matters-evaluating-synthetic-data-for-tool-using-llms","published":"2024","authors":["Shadi Iskander","Nachshon Cohen","Zohar Karnin","Ori Shapira","Sofia Tolmach"],"abstract":"Training large language models (LLMs) for external tool usage is a rapidly expanding field, with recent research focusing on generating synthetic data to address the shortage of available data. However, the absence of systematic data quality checks poses complications for properly training and testing models. 
To that end, we propose two approaches for assessing the reliability of data for training LLMs Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=46"}},{"id":"official:4354b117389009b7","title":"Pushing the limits of all-atom geometric graph neural networks: Pre-training, scaling and zeroshot transfer","url":"https://www.amazon.science/publications/pushing-the-limits-of-all-atom-geometric-graph-neural-networks-pre-training-scaling-and-zeroshot-transfer","published":"2024","authors":["Zihan Pengmei","Zhengyuan Shen","Zichen Wang","Marcus Collins","Huzefa Rangwala"],"abstract":"The ability to construct transferable descriptors for molecular and biological systems has broad applications in drug discovery, molecular dynamics, and protein analysis. Geometric graph neural networks (Geom-GNNs) utilizing all-atom information have revolutionized atomistic simulations by enabling the prediction of interatomic potentials and molecular properties. 
Despite these advances, the application Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=62"}},{"id":"official:3a2e67e38583ee69","title":"Prompting vision-language models for aspect-controlled generation of referring expressions","url":"https://www.amazon.science/publications/prompting-vision-language-models-for-aspect-controlled-generation-of-referring-expressions","published":"2024","authors":["Danfeng Guo","Sanchit Agarwal","Arpit Gupta","Jiun-Yu Kao","Emre Barut","Tagyoung Chung","Jing Huang","Mohit Bansal"],"abstract":"Referring Expression Generation (REG) is the task of generating a description that unambiguously identifies a given target in the scene. Different from Image Captioning (IC), REG requires learning fine-grained characteristics of not only the scene objects but also their surrounding context. 
Referring expressions are usually not singular; an object can often be uniquely referenced in numerous ways, for instance Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=65"}},{"id":"official:c1997f6af42e9244","title":"Prompting foundational models for omni-supervised instance segmentation","url":"https://www.amazon.science/publications/prompting-foundational-models-for-omni-supervised-instance-segmentation","published":"2024","authors":["Arnav Das","Ritwick Chaudhry","Kaustav Kundu","Davide Modolo"],"abstract":"Pixel-level mask annotation costs are a major bottleneck in training deep neural networks for instance segmentation. Recent promptable foundation models like the Segment Anything Model (SAM) and GroundedDINO (GDino) have shown impressive zero-shot performance in segmentation and object detection benchmarks. 
While these models are not capable of performing inference without prompts, they are ideal for omnisupervised Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=57"}},{"id":"official:482f7a0f440e8352","title":"Promptformer: Prompted conformer transducer for ASR","url":"https://www.amazon.science/publications/promptformer-prompted-conformer-transducer-for-asr","published":"2024","authors":["Sergio Duarte Torres","Arunasish Sen","Aman Rana","Lukas Drude","Alejandro Gomez Alanis","Andreas Schwarz","Leif Rādel","Volker Leutnant"],"abstract":"Context cues carry information which can improve multiturn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. 
Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rWERR) over Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=79"}},{"id":"official:ffaa7e44ba0cc2df","title":"Prompt-tuned muti-task taxonomic transformer (PTMTTaxoFormer)","url":"https://www.amazon.science/publications/prompt-tuned-muti-task-taxonomic-transformer-ptmttaxoformer","published":"2024","authors":["Rajashekar Vasantha","Nhan Nguyen","Yue Zhang"],"abstract":"Hierarchical Text Classification (HTC) is a sub-class of multi-label classification. It is challenging because the hierarchy typically has a large number of diverse topics. Existing methods for HTC fall within two categories, local methods (a classifier for each level, node, or parent) or global methods (a single classifier for everything). 
Local methods are computationally expensive, whereas global methods Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:0f79c81407fd9ca4","title":"Prompt perturbation consistency learning for robust language models","url":"https://www.amazon.science/publications/prompt-perturbation-consistency-learning-for-robust-language-models","published":"2024","authors":["Yao Qiang","Nandi Subhrangshu","Ninareh Mehrabi","Greg Ver Steeg","Anoop Kumar","Anna Rumshisky","Aram Galstyan"],"abstract":"Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks, such as question answering and text summarization. However, their performance on sequence labeling tasks, such as intent classification and slot filling (IC-SF), which is a central component in personal assistant systems, lags significantly behind discriminative models. 
Furthermore, there Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=78"}},{"id":"official:19fae70f566d25e6","title":"PrivLM-Bench: A multi-level privacy evaluation benchmark for language models","url":"https://www.amazon.science/publications/privlm-bench-a-multi-level-privacy-evaluation-benchmark-for-language-models","published":"2024","authors":["Haoran Li","Dadi Guo","Donghao Li","Wei Fan","Qi Hu","Xin Liu","Chunkit Chan","Duanyi Yao","Yuan Yao","Yangqiu Song"],"abstract":"The rapid development of language models (LMs) brings unprecedented accessibility and usage for both models and users. On the one hand, powerful LMs achieve state-of-the-art performance over numerous downstream NLP tasks. On the other hand, more and more attention is paid to unrestricted model accesses that may bring malicious privacy risks of data leakage. 
To address these issues, many recent works propose Category: Security, privacy, and abuse prevention","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Security, privacy, and abuse prevention"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=47"}},{"id":"official:77326e746f0d5bc2","title":"Pretraining and finetuning language models on geospatial networks for accurate address matching","url":"https://www.amazon.science/publications/pretraining-and-finetuning-language-models-on-geospatial-networks-for-accurate-address-matching","published":"2024","authors":["Saket Maheshwary","Arpan Paul","Saurabh Sohoney"],"abstract":"We propose a novel framework for pretraining and fine-tuning language models with the goal of determining whether two addresses represent the same physical building. Address matching and building authoritative address catalogues are important to many applications and businesses, such as delivery services, online retail, emergency services, logistics, etc. 
We propose to view a collection of addresses as Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=42"}},{"id":"official:858764b9c07c85e9","title":"Pre-training differentially private models with limited public data","url":"https://www.amazon.science/publications/pre-training-differentially-private-models-with-limited-public-data","published":"2024","authors":["Zhiqi Bu","Xinwei Zhang","Sheng Zha","Mingyi Hong"],"abstract":"The superior performance of large foundation models relies on the use of massive amounts of high-quality data, which often contain sensitive, private and copyrighted material that requires formal protection. 
While differential privacy (DP) is a prominent method to gauge the degree of security provided to the models, its application is commonly limited to the model fine-tuning stage, due to the performance Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:a4dad9f539add37a","title":"Policy optimization to align the fidelity and efficiency of reasoning agents in multi-turn dialogues","url":"https://www.amazon.science/publications/policy-optimization-to-align-the-fidelity-and-efficiency-of-reasoning-agents-in-multi-turn-dialogues","published":"2024","authors":["Jeremy Curuksu"],"abstract":"Reinforcement learning from human preferences can fine tune language models for helpfulness and safety, but does not directly address the fidelity and efficiency of reasoning agents in multi-turn dialogues. 
I propose a method to improve the validity, coherence and efficiency of reasoning agents by defining a reward model as a mapping between predefined queries and tools which can be applied to any custom Category: Automated reasoning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Automated reasoning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=62"}},{"id":"official:478757389af448cf","title":"Playlist search reinvented: LLMs behind the curtain","url":"https://www.amazon.science/publications/playlist-search-reinvented-llms-behind-the-curtain","published":"2024","authors":["Geetha Aluri","Siddharth Sharma","Tarun Sharma","Joaquin Delgado"],"abstract":"Improving search functionality poses challenges such as data scarcity for model training, metadata enrichment for comprehensive document indexing, and the labor-intensive manual annotation for evaluation. Traditionally, iterative methods relying on human annotators and customer feedback have been used. However, recent advancements in Large Language Models (LLMs) offer new solutions. 
This paper focuses on Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=47"}},{"id":"official:556c6d5dfb156cc4","title":"Planes, trains and automobiles: Leverage multimodal in-mission signals for shopping journeys","url":"https://www.amazon.science/publications/planes-trains-and-automobiles-leverage-multimodal-in-mission-signals-for-shopping-journeys","published":"2024","authors":["Viet Ha","Shasha Li","Arnau Ramisa","Xinliang Zhu"],"abstract":"Modern search systems offer multiple ways for expressing information needs, including image, voice, and text. Consequently, an increasing number of users seamlessly transition between these modalities to convey their intents. This emerging trend presents new opportunities for utilizing queries in different modalities to help users complete their search journeys efficiently. 
In this proposal, we introduce Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1145/3627673.3679067","openalex_id":"https://openalex.org/W4403577879","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:a54dc643932ca3dc","title":"PG-STORY: Taxonomy, dataset, and evaluation for ensuring child-safe content for story generation","url":"https://www.amazon.science/publications/pg-story-taxonomy-dataset-and-evaluation-for-ensuring-child-safe-content-for-story-generation","published":"2024","authors":["Alicia Y. Tsai","Shereen Oraby","Anjali Narayan-Chen","Alessandra Cervone","Spandana Gella","Apurv Verma","Tagyoung Chung","Jing Huang","Nanyun Peng"],"abstract":"Creating children’s stories through text generation is a creative task that requires stories to be both entertaining and suitable for young audiences. However, since current story generation systems often rely on pre-trained language models fine-tuned with limited story data, they may not always prioritize child-friendliness. 
This can lead to the unintended generation of stories containing problematic elements Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=46"}},{"id":"official:113a2945207810a2","title":"On the scalability of diffusion-based text-to-image generation","url":"https://www.amazon.science/publications/on-the-scalability-of-diffusion-based-text-to-image-generation","published":"2024","authors":["Hao Li","Yang Zou","Ying Wang","Orchid Majumder","Yusheng Xie","R. Manmatha","Ashwin Swaminathan","Zhuowen Tu","Stefano Ermon","Stefano Soatto"],"abstract":"Scaling up model and data size has been quite successful for the evolution of LLMs. However, the scaling law for the diffusion based text-to-image (T2I) models is not fully explored. It is also unclear how to efficiently scale the model for better performance at reduced cost. The different training settings and expensive training cost make a fair model comparison extremely difficult. 
In this work, we empirically Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=70"}},{"id":"official:3bf4645e189ec445","title":"Multimodal learning with online text cleaning for e-commerce product search","url":"https://www.amazon.science/publications/multimodal-learning-with-online-text-cleaning-for-e-commerce-product-search","published":"2024","authors":["Zhizhang Hu","Shasha Li","Ming Du","Arnab Dhua","Douglas Gray"],"abstract":"Vision-language transformer models play a pivotal role in e-commerce product search. When using product description (e.g. product title) and product image pairs to train such models, there are often non-visual-descriptive text attributes in the product description, which makes the visual textual alignment challenging. We introduce MultiModal Learning with online Token Pruning (MML-TP). 
MML-TP leverages Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=58"}},{"id":"official:627f0a98904b2d0c","title":"Multicalibration for confidence scoring in LLMs","url":"https://www.amazon.science/publications/multicalibration-for-confidence-scoring-in-llms","published":"2024","authors":["Gianluca Detommaso","Martin Bertran Lopez","Riccardo Fogliato","Aaron Roth"],"abstract":"This paper proposes the use of “multicalibration” to yield interpretable and reliable confidence scores for outputs generated by large language models (LLMs). Multicalibration asks for calibration not just marginally, but simultaneously across various intersecting groupings of the data. 
We show how to form groupings for prompt/completion pairs that are correlated with the probability of correctness via Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=60"}},{"id":"official:606a00687f4679e8","title":"Multi-stage multi-modal pre-training for automatic speech recognition","url":"https://www.amazon.science/publications/multi-stage-multi-modal-pre-training-for-automatic-speech-recognition","published":"2024","authors":["Yash Jain","David Chan","Pranav Dheram","Aparna Khare","Olabanji Shonibare","Venkatesh Ravichandran","Shalini Ghosh"],"abstract":"Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks. 
Existing multi-modal pre-training methods for the ASR task have primarily focused on single-stage pre-training where a single unsupervised task is used for pre-training followed Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=70"}},{"id":"official:bac0b4305b2bd553","title":"Multi-review fusion-in-context","url":"https://www.amazon.science/publications/multi-review-fusion-in-context","published":"2024","authors":["Aviv Slobodkin","Ori Shapira","Ran Levy","Ido Dagan"],"abstract":"Grounded text generation, encompassing tasks such as long-form question-answering and summarization, necessitates both content selection and content consolidation. Current end-to-end methods are difficult to control and interpret due to their opaqueness. Accordingly, recent works have proposed a modular approach, with separate components for each step. 
Specifically, we focus on the second subtask, of generating Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:9d1a1e8c7dc7309b","title":"Multi-modal hallucination control by visual information grounding","url":"https://www.amazon.science/publications/multi-modal-hallucination-control-by-visual-information-grounding","published":"2024","authors":["Alessandro Favero","Luca Zancato","Matthew Trager","Siddharth Choudhary","Pramuditha Perera","Alessandro Achille","Ashwin Swaminathan","Stefano Soatto"],"abstract":"Generative Vision-Language Models (VLMs) are prone to generate plausible-sounding textual answers that, however, are not always grounded in the input image. We investigate this phenomenon, usually referred to as “hallucination” and show that it stems from an excessive reliance on the language prior. 
In particular, we show that as more tokens are generated, the reliance on the visual prompt decreases, and Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=72"}},{"id":"official:d04a2a69a7cb213d","title":"MinPrompt: Graph-based minimal prompt data augmentation for few-shot question answering","url":"https://www.amazon.science/publications/minprompt-graph-based-minimal-prompt-data-augmentation-for-few-shot-question-answering","published":"2024","authors":["Xiusi Chen","Jyun-Yu Jiang","Wei-Cheng Chang","Cho-Jui Hsieh","Hsiang-Fu Yu","Wei Wang"],"abstract":"Recent advances in few-shot question answering (QA) mostly rely on the power of pre-trained large language models (LLMs) and fine-tuning in specific settings. Although the pre-training stage has already equipped LLMs with powerful reasoning capabilities, LLMs still need to be fine-tuned to adapt to specific domains to achieve the best results. 
In this paper, we propose to select the most informative data Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=60"}},{"id":"official:ef2d49864b48ddc3","title":"Max-margin transducer loss: Improving sequence-discriminative training using a large-margin learning strategy","url":"https://www.amazon.science/publications/max-margin-transducer-loss-improving-sequence-discriminative-training-using-a-large-margin-learning-strategy","published":"2024","authors":["Rupak Vignesh Swaminathan","Grant Strimel","Ariya Rastrow","Harish Mallidi","Kai Zhen","Hieu Duy Nguyen","Nathan Susanj","Thanasis Mouchtaris"],"abstract":"In this work, we propose a novel sequence-discriminative training criterion for automatic speech recognition (ASR) based on the Conformer Transducer. Inspired by the large-margin classifier framework, we separate the “good” and the “bad” hypotheses in an N-best list produced from a pre-trained transducer model by a margin (τ ), hence the term, Max-Margin Transducer (MMT) loss. 
It is observed that fine-tuning Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"https://doi.org/10.1109/icassp48485.2024.10446322","openalex_id":"https://openalex.org/W4392904758","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon","Amazon (United States)"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=80"}},{"id":"official:3bb3d2654b978f84","title":"Mastering robot manipulation with multimodal prompts through pretraining and multi-task fine-tuning","url":"https://www.amazon.science/publications/mastering-robot-manipulation-with-multimodal-prompts-through-pretraining-and-multi-task-fine-tuning","published":"2024","authors":["Jiachen Li","Qiaozi (QZ) Gao","Michael Johnston","Xiaofeng Gao","Xuehai He","Hangjie Shi","Suhaila Shakiah","Reza Ghanadan","William Yang Wang"],"abstract":"Prompt-based learning has been demonstrated as a compelling paradigm contributing to large language models’ tremendous success (LLMs). Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction following and task planning. 
In this work, we tackle the problem of training a robot to understand multimodal prompts, interleaving vision signals with text descriptions Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:31de3f314e34c0a8","title":"MICo: Preventative detoxification of large language models through inhibition control","url":"https://www.amazon.science/publications/mico-preventative-detoxification-of-large-language-models-through-inhibition-control","published":"2024","authors":["Roy Siegelmann","Ninareh Mehrabi","Palash Goyal","Prasoon Goyal","Lisa Bauer","Jwala Dhamala","Aram Galstyan","Rahul Gupta","Reza Ghanadan"],"abstract":"Large Language Models (LLMs) are powerful tools which have been both dominant and commonplace in the field of Artificial Intelligence. Yet, LLMs have a tendency to devolve into toxic degeneration, wherein otherwise safe and unproblematic models begin generating toxic content. 
For the sake of social responsibility and inspired by the biological mechanisms of inhibition control, we introduce the paradigm Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:caaf42f342cfa427","title":"MERLIN: Multimodal & multilingual embedding for recommendations at large-scale via item associations","url":"https://www.amazon.science/publications/merlin-multimodal-multilingual-embedding-for-recommendations-at-large-scale-via-item-associations","published":"2024","authors":["Sambeet Tiady","Arihant Jain","Dween Rabius Sanny","Khushi Gupta","Srinivas Virinchi","Swapnil Gupta","Anoop S V K K Saladi","Deepak Gupta"],"abstract":"Product recommendations incentivize customers to make multiunit purchases by surfacing relevant products, leading to lower cost per unit for e-commerce stores and lower prices for their customers. However, the humongous scale of products, implicit co-purchase asymmetry and variation in co-purchase behavior across different categories, are orthogonal problems to solve. 
To address these problems, we propose Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=51"}},{"id":"official:fce8134ec4b48949","title":"MAGID: An automated pipeline for generating synthetic multi-modal datasets","url":"https://www.amazon.science/publications/magid-an-automated-pipeline-for-generating-synthetic-multi-modal-datasets","published":"2024","authors":["Hossein Aboutalebi","Justin Sun","Hwanjun Song","Yusheng Xie","Arshit Gupta","Hang Su","Igor Shalyminov","Nikolaos Pappas","Siffi Singh","Saab Mansour"],"abstract":"Development of multimodal interactive systems is hindered by the lack of rich, multimodal (text, images) conversational data, which is needed in large quantities for LLMs. Previous approaches augment textual dialogues with retrieved images, posing privacy, diversity, and quality constraints. 
In this work, we introduce Multimodal Augmented Generative Images Dialogues (MAGID), a framework to augment text-only Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=70"}},{"id":"official:8a4d50c3f1b37d60","title":"M3T: A new benchmark dataset for multi-modal document-level machine translation","url":"https://www.amazon.science/publications/m3t-a-new-benchmark-dataset-for-multi-modal-document-level-machine-translation","published":"2024","authors":["Benjamin Hsu","Xiaoyu Liu","Huayang Li","Yoshinari Fujinuma","Maria Nădejde","Xing Niu","Yair Kittenplon","Ron Litman","Raghavendra Pappagari"],"abstract":"Document translation poses a challenge for Neural Machine Translation (NMT) systems. Most document-level NMT systems rely on meticulously curated sentence-level parallel data, assuming flawless extraction of text from documents along with their precise reading order. These systems also tend to disregard additional visual cues such as the document layout, deeming it irrelevant. 
However, real-world documents Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=67"}},{"id":"official:e4ed8514e85ab56a","title":"Low-cost generation and evaluation of dictionary example sentences","url":"https://www.amazon.science/publications/low-cost-generation-and-evaluation-of-dictionary-example-sentences","published":"2024","authors":["Bill Cai","Clarence Ng","Daniel Tan","Shelvia Hotama"],"abstract":"Dictionary example sentences play an important role in illustrating word definitions and usage, but manually creating quality sentences is challenging. Prior works have demonstrated that language models can be trained to generate example sentences. However, they relied on costly customized models and word sense datasets for generation and evaluation of their work. 
Rapid advancements in foundational models Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:53aec1377e8bda2d","title":"Leveraging customer feedback for multi-modal insight extraction","url":"https://www.amazon.science/publications/leveraging-customer-feedback-for-multi-modal-insight-extraction","published":"2024","authors":["Sandeep Sricharan Mukku","Abinesh Kanagarajan","Pushpendu Ghosh","Chetan Aggarwal"],"abstract":"Businesses can benefit from customer feedback in different modalities, such as text and images, to enhance their products and services. However, it is difficult to extract actionable and relevant pairs of text segments and images from customer feedback in a single pass. 
In this paper, we propose a novel multi-modal method that fuses image and text information in a latent space and decodes it to extract Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=68"}},{"id":"official:d37e249c12aa864b","title":"Leveraging LLMs for dialogue quality measurement","url":"https://www.amazon.science/publications/leveraging-llms-for-dialogue-quality-measurement","published":"2024","authors":["Jinghan Jia","Abi Komma","Timothy Leffel","Xujun Peng","Ajay Nagesh","Tamer Soliman","Aram Galstyan","Anoop Kumar"],"abstract":"In task-oriented conversational-AI evaluation, unsupervised methods poorly correlate with human judgments, and supervised approaches lack generalization. Recent advances in large language models (LLMs) show robust zero-shot and few-shot capabilities across NLP tasks. 
This paper explores using LLMs for automated dialogue quality evaluation, experimenting with various configurations on public and proprietary Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=64"}},{"id":"official:c30130cf9cfeff4c","title":"Learning to generate answers with citations via factual consistency models","url":"https://www.amazon.science/publications/learning-to-generate-answers-with-citations-via-factual-consistency-models","published":"2024","authors":["Rami Aly","Zhiqiang Tang","Samson Tan","George Karypis"],"abstract":"Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations. One approach to address this issue is to provide citations to relevant sources alongside generated content, enhancing the verifiability of generations. However, citing passages accurately in answers remains a substantial challenge. 
This paper proposes a weakly-supervised fine-tuning method leveraging Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=54"}},{"id":"official:01f17c2284496a71","title":"Learning metadata-agnostic representations for Text-to-SQL in-context example selection","url":"https://www.amazon.science/publications/learning-metadata-agnostic-representations-for-text-to-sql-in-context-example-selection","published":"2024","authors":["Chuhong Mai","Ro-ee Tal","Thahir Mohamed"],"abstract":"In-context learning (ICL) is a powerful paradigm where large language models (LLMs) benefit from task demonstrations added to the prompt. Yet, selecting optimal demonstrations is not trivial, especially for complex or multi-modal tasks where input and output distributions differ. We hypothesize that forming taskspecific representations of the input is key. 
In this paper, we propose a method to align representations Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:425ed908764d5a1f","title":"Learning from natural language explanations for generalizable entity matching","url":"https://www.amazon.science/publications/learning-from-natural-language-explanations-for-generalizable-entity-matching","published":"2024","authors":["Somin Wadhwa","Adit Krishnan","Runhui Wang","Byron C. Wallace","Chris (Luyang) Kong"],"abstract":"Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity linking as a standard supervised learning problem. However, supervised entity matching models often do not generalize well to new data, and collecting exhaustive labeled training data is often cost prohibitive. 
Further, recent efforts have adopted LLMs for Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:1841c378cfbf794f","title":"Large language models for preventing medication direction errors in online pharmacies","url":"https://www.amazon.science/publications/large-language-models-for-preventing-medication-direction-errors-in-online-pharmacies","published":"2024","authors":["Cristobal Pais","Jeff Liu","Elizabeth Wade","Bobby Voigt","Vin Gupta","Mohsen Bayati"],"abstract":"Errors in pharmacy medication directions, such as incorrect instructions for dosage or frequency, can increase patient safety risk substantially by raising the chances of adverse drug events. This study explores how integrating domain knowledge with large language models (LLMs)—capable of sophisticated text interpretation and generation—can reduce these errors. 
We introduce MEDIC (medication direction copilot Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=66"}},{"id":"official:da3aede8d6e28867","title":"Large language models as recommender systems: A study of popularity bias","url":"https://www.amazon.science/publications/large-language-models-as-recommender-systems-a-study-of-popularity-bias","published":"2024","authors":["Jan Malte Lichtenberg","Alexander Buchholz","Pola Schwöbel"],"abstract":"The issue of popularity bias—where popular items are disproportionately recommended, overshadowing less popular but potentially relevant items—remains a significant challenge in recommender systems. Recent advancements have seen the integration of general-purpose Large Language Models (LLMs) into the architecture of such systems. 
This integration raises concerns that it might exacerbate popularity bias, Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=61"}},{"id":"official:ebb614c4d00b6084","title":"Large language models (LLMs) on tabular data: Prediction, generation, and understanding — a survey","url":"https://www.amazon.science/publications/large-language-models-llms-on-tabular-data-prediction-generation-and-understanding-a-survey","published":"2024","authors":["Xi Fang","Weijie Xu","Fiona Anting Tan","Jiani Zhang","Ziqing Hu","Yanjun (Jane) Qi","Scott Nickleach","Diego Socolinsky","Srinivasan Sengamedu","\"SHS\"","Christos Faloutsos"],"abstract":"Recent breakthroughs in large language modeling have facilitated rigorous exploration of their application in diverse tasks related to tabular data modeling, such as prediction, tabular data synthesis, question answering, and table understanding. Each task presents unique challenges and opportunities. 
However, there is currently a lack of comprehensive review that summarizes and compares the key techniques Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=58"}},{"id":"official:888bf99ab67d1001","title":"Lancet: Accelerating mixture-of-experts training via whole graph computation-communication overlapping","url":"https://www.amazon.science/publications/lancet-accelerating-mixture-of-experts-training-via-whole-graph-computation-communication-overlapping","published":"2024","authors":["Chenyu Jiang","Ye Tian","Zhen Jia","Shuai Zheng","Chuan Wu","Yida Wang"],"abstract":"The Mixture-of-Expert (MoE) technique plays a crucial role in expanding the size of DNN model parameters. However, it faces the challenge of extended all-to-all communication latency during the training process. Existing methods attempt to mitigate this issue by overlapping all-to-all with expert computation. 
Yet, these methods frequently fall short of achieving sufficient overlap, consequently restricting Category: Cloud and systems","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Cloud and systems"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=67"}},{"id":"official:8ad34396a35f2052","title":"LaRS: Latent reasoning skills for chain-of-thought reasoning","url":"https://www.amazon.science/publications/lars-latent-reasoning-skills-for-chain-of-thought-reasoning","published":"2024","authors":["Zifan Xu","Haozhu Wang","Dmitriy Bespalov","Xian Wu","Yanjun (Jane) Qi"],"abstract":"Chain-of-thought (CoT) prompting is a popular in-context learning (ICL) approach for large language models (LLMs), especially when tackling complex reasoning tasks. Traditional ICL approaches construct prompts using examples that contain questions similar to the input question. 
However, CoT prompting, which includes crucial intermediate reasoning steps (rationales) within its examples, necessitates selecting Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=63"}},{"id":"official:c1040dbd4bd5f425","title":"Interleaved audio/audiovisual transfer learning for AV-ASR in low-resourced languages","url":"https://www.amazon.science/publications/interleaved-audio-audiovisual-transfer-learning-for-av-asr-in-low-resourced-languages","published":"2024","authors":["Zhengyang Li","Patrick Blumenberg","Jing Liu","Thomas Graave","Timo Lohrenz","Siegfried Kunzmann","Tim Fingscheidt"],"abstract":"Cross-language transfer learning from English to a target language has shown effectiveness in low-resourced audiovisual speech recognition (AV-ASR). 
We first investigate a 2-stage protocol, which performs fine-tuning of the English pre-trained AV encoder on a large audio corpus in the target language (1st stage), and then carries out cross-modality transfer learning from audio to AV in the target language Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=54"}},{"id":"official:ddc95f8ab4d635ee","title":"Improving multi-hop reasoning in LLMs by learning from rich human feedback","url":"https://www.amazon.science/publications/improving-multi-hop-reasoning-in-llms-by-learning-from-rich-human-feedback","published":"2024","authors":["Nitish Joshi","Koushik Kalyanaraman","Zhiting Hu","Kumar Chellapilla","He He","Erran Li"],"abstract":"Recent large language models (LLMs) have enabled tremendous progress in natural-language understanding. However, they are prone to generate confident but nonsensical reasoning chains, a significant obstacle to establishing trust with users. 
In this work, we aim to incorporate rich human feedback on such incorrect model generated reasoning chains for multi-hop reasoning to improve performance on these tasks Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=66"}},{"id":"official:289825dc63d27c81","title":"Improving minimax group fairness in sequential recommendation","url":"https://www.amazon.science/publications/improving-minimax-group-fairness-in-sequential-recommendation","published":"2024","authors":["Krishna Acharya","David Wardrope","Timos Korres","Aleksandr Petrov","Anders Uhrenholt"],"abstract":"Training sequential recommenders such as SASRec with uniform sample weights achieves good overall performance but can fall short on specific user groups. One such example is popularity bias, where mainstream users receive better recommendations than niche content viewers. 
To improve recommendation quality across diverse user groups, we explore three Distributionally Robust Optimization (DRO) methods: Group Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=62"}},{"id":"official:3dc1984a9d7a4666","title":"II-MMR: Identifying and improving multi-modal multi-hop reasoning in visual question answering","url":"https://www.amazon.science/publications/ii-mmr-identifying-and-improving-multi-modal-multi-hop-reasoning-in-visual-question-answering","published":"2024","authors":["Jihyung Kil","Farideh Tavazoee","Dongyeop Kang","Joo-Kyung Kim"],"abstract":"Visual Question Answering (VQA) often involves diverse reasoning scenarios across Vision and Language (V&L). Most prior VQA studies, however, have merely focused on assessing the model’s overall accuracy without evaluating it on different reasoning cases. 
Furthermore, some recent works observe that conventional Chain-of-Thought (CoT) prompting fails to generate effective reasoning for VQA, especially for Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=61"}},{"id":"official:ca76c473781bdf2f","title":"Hyperbolic learning with synthetic captions for open-world detection","url":"https://www.amazon.science/publications/hyperbolic-learning-with-synthetic-captions-for-open-world-detection","published":"2024","authors":["Fanjie Kong","Yanbei Chen","Jiarui Cai","Davide Modolo"],"abstract":"Open-world detection poses significant challenges, as it requires the detection of any object using either object class labels or free-form texts. Existing related works often use large-scale manual annotated caption datasets for training, which are extremely expensive to collect. 
Instead, we propose to transfer knowledge from vision-language models (VLMs) to enrich the open-vocabulary descriptions automatically Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:800054cd7015ce7f","title":"How robust are LLMs to in-context majority label bias?","url":"https://www.amazon.science/publications/how-robust-are-llms-to-in-context-majority-label-bias","published":"2024","authors":["Karan Gupta","Sumegh Roychowdhury","Siva Rajesh Kasa","Santhosh Kasa","Anish Bhanushali","Nikhil Pattisapu","Prasanna Srinivasa Murthy","Alok Chandra"],"abstract":"In the In-Context Learning (ICL) setup, various forms of label biases can manifest. One such manifestation is majority label bias, which arises when the distribution of labeled examples in the in-context samples is skewed towards one or more specific classes making Large Language Models (LLMs) more prone to predict those labels. 
Such discrepancies can arise from various factors, including logistical constraints Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=81"}},{"id":"official:71aa583ca8864c3a","title":"How far is too far? Studying the effects of domain discrepancy on masked language models","url":"https://www.amazon.science/publications/how-far-is-too-far-studying-the-effects-of-domain-discrepancy-on-masked-language-models","published":"2024","authors":["Deep Kayal","Alexander Rakhlin","Ali Dashti","Serguei Stepaniants"],"abstract":"Pre-trained masked language models, such as BERT, perform strongly on a wide variety of NLP tasks and have become ubiquitous in recent years. The typical way to use such models is to fine-tune them on downstream data. In this work, we aim to study how the difference in domains between the pre-trained model and the task affects its final performance. 
We first devise a simple mechanism to quantify the domain Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=72"}},{"id":"official:d2c186d86243fcba","title":"Hop, skip, jump to convergence: Dynamics of learning rate transitions for improved training of large language models","url":"https://www.amazon.science/publications/hop-skip-jump-to-convergence-dynamics-of-learning-rate-transitions-for-improved-training-of-large-language-models","published":"2024","authors":["Shreyas Subramanian","Vignesh Ganapathiraman","Corey Barrett"],"abstract":"Various types of learning rate (LR) schedulers are being used for training or fine tuning of Large Language Models today. In practice, several mid-flight changes are required in the LR schedule either manually, or with careful choices around warmup steps, peak LR, type of decay and restarts. 
To study this further, we consider the effect of switching the learning rate at a predetermined time during training Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=42"}},{"id":"official:b36fc30fcab70d44","title":"Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond","url":"https://www.amazon.science/publications/harnessing-the-power-of-llms-in-practice-a-survey-on-chatgpt-and-beyond","published":"2024","authors":["Jingfeng Yang","Haongye Jin","Ruixiang Tang","Xiaotian Han","Qizhang Feng","Haoming Jiang","Shaochen Zhong","Bing Yin","Xia Hu"],"abstract":"This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current language models. 
Then, we discuss Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=76"}},{"id":"official:8326f83ca21a20e7","title":"Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models","url":"https://www.amazon.science/publications/guardrails-for-avoiding-harmful-medical-product-recommendations-and-off-label-promotion-in-generative-ai-models","published":"2024","authors":["Daniel Lopez-Martinez"],"abstract":"Generative AI (GenAI) models have demonstrated remarkable capabilities in a wide variety of medical tasks. However, as these models are trained using generalist datasets with very limited human oversight, they can learn uses of medical products that have not been adequately evaluated for safety and efficacy, nor approved by regulatory agencies. 
Given the scale at which GenAI may reach users, unvetted recommendations Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=56"}},{"id":"official:693ed7f1f9cb9fca","title":"Graph neural prompting with large language models","url":"https://www.amazon.science/publications/graph-neural-prompting-with-large-language-models","published":"2024","authors":["Yijun Tian","Huan Song","Zichen Wang","Haozhu Wang","Ziqing Hu","Fang Wang","Nitesh V. Chawla","Panpan Xu"],"abstract":"Large language models (LLMs) have shown remarkable generalization capability with exceptional performance in various language modeling tasks. However, they still exhibit inherent limitations in precisely capturing and returning grounded knowledge. 
While existing work has explored utilizing knowledge graphs (KGs) to enhance language modeling via joint training and customized model architectures, applying Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=77"}},{"id":"official:96a6683787a1e8fa","title":"Graph chain-of-thought: Augmenting large language models by reasoning on graphs","url":"https://www.amazon.science/publications/graph-chain-of-thought-augmenting-large-language-models-by-reasoning-on-graphs","published":"2024","authors":["Bowen Jin","Chulin Xie","Jiawei Zhang","Kashob Kumar Roy","Yu Zhang","Zheng Li","Ruirui Li","Xianfeng Tang","Suhang Wang","Yu Meng","Jiawei Han"],"abstract":"Large language models (LLMs), while exhibiting exceptional performance, suffer from hallucinations, especially on knowledge-intensive tasks. Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora to alleviate the issue. 
However, in many domains, texts are interconnected (e.g., academic papers in a bibliographic graph are linked by citations and co-authorships Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:c7f0b1ac6a70c8a0","title":"Gradual fine-tuning with graph routing for multi-source unsupervised domain adaptation","url":"https://www.amazon.science/publications/gradual-fine-tuning-with-graph-routing-for-multi-source-unsupervised-domain-adaptation","published":"2024","authors":["Yao Ma","Samuel Louvan","Zhunxuan Wang"],"abstract":"Multi-source unsupervised domain adaptation aims to leverage labeled data from multiple source domains for training a machine learning model to generalize well on a target domain without labels. Source domain selection plays a crucial role in determining the model’s performance. It relies on the similarities amongst source and target domains. 
Nonetheless, existing work for source domain selection often Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=61"}},{"id":"official:a5c4f5e25e653e55","title":"Generative AI based virtual assistant for reconciliation research","url":"https://www.amazon.science/publications/generative-ai-based-virtual-assistant-for-reconciliation-research","published":"2024","authors":["Daksha Yadav","Sabrina Zhang","Tom Jin","Prakash Krishnan","Des Clarke"],"abstract":"Timely and accurate reconciliation of the company’s financial information is an important internal control over the company’s financial reporting to support quarterly and annual external financial compliance activities. However, due to the complexity of typical end-to-end business system integrations, the process to research and investigate reconciliation items can be manual and time consuming. 
This paper Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=77"}},{"id":"official:46d3a5f7d916cd8d","title":"Generating contextual images for long-form text","url":"https://www.amazon.science/publications/generating-contextual-images-for-long-form-text","published":"2024","authors":["Avijit Mitra","Nalin Gupta","Chetan Naik","Abhinav Sethy","Kinsey Bice","Zeynab Raeesy"],"abstract":"We investigate the problem of synthesizing relevant visual imagery from generic long-form text, leveraging Large Language Models (LLMs) and Text-to-Image Models (TIMs). Current Text-to-Image models require short prompts that describe the image content and style explicitly. 
Unlike image prompts, generation of images from general long-form text requires the image synthesis system to derive the visual content Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:66dfe23cede89fd4","title":"GRAM: Global reasoning for multi-page VQA","url":"https://www.amazon.science/publications/gram-global-reasoning-for-multi-page-vqa","published":"2024","authors":["Tsachi Blau","Sharon Fogel","Roi Ronen","Alona Golts","Roy Ganz","Elad Ben Avraham","Aviad Aberdam","Shahar Tsiper","Ron Litman"],"abstract":"The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds of pages. 
We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting, without requiring computationally-heavy Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=72"}},{"id":"official:f7caabeef05b376b","title":"Fine-tuning language models for joint rewriting and completion of code with potential bugs","url":"https://www.amazon.science/publications/fine-tuning-language-models-for-joint-rewriting-and-completion-of-code-with-potential-bugs","published":"2024","authors":["Dingmin Wang","Jinman Zhao","Hengzhi Pei","Samson Tan","Sheng Zha"],"abstract":"Handling drafty partial code remains a notable challenge in real-time code suggestion applications. Previous work has demonstrated shortcomings of large language models of code (CodeLLMs) in completing partial code with potential bugs. In this study, we view partial code as implementation hints and fine-tune CodeLLMs to jointly rewrite and complete partial code into functional full programs. 
We explore Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=53"}},{"id":"official:fce2c13110735953","title":"FLAP: Flow-adhering planning with constrained decoding in LLMs","url":"https://www.amazon.science/publications/flap-flow-adhering-planning-with-constrained-decoding-in-llms","published":"2024","authors":["Shamik Roy","Sailik Sengupta","Daniele Bonadiman","Saab Mansour","Arshit Gupta"],"abstract":"Planning is a crucial task for agents in task oriented dialogs (TODs). Human agents typically resolve user issues by following predefined workflows, decomposing workflow steps into actionable items, and performing actions by executing APIs in order; all of which require reasoning and planning. 
With the recent advances in LLMs, there have been increasing attempts to use them for task planning and API usage Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=71"}},{"id":"official:8b24dad39866db33","title":"Enhancing vision-language pre-training with rich supervisions","url":"https://www.amazon.science/publications/enhancing-vision-language-pre-training-with-rich-supervisions","published":"2024","authors":["Yuan Gao","Kunyu Shi","Pengkai Zhu","Edouard Belval","Oren Nuriel","Srikar Appalaraju","Shabnam Ghadar","Vijay Mahadevan","Zhuowen Tu","Stefano Soatto"],"abstract":"We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present in using image-text pairs. 
In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localization to Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=73"}},{"id":"official:ebef2b0b16cf6b58","title":"Enhancing security control production with generative AI","url":"https://www.amazon.science/publications/enhancing-security-control-production-with-generative-ai","published":"2024","authors":["Chen Ling","Mina Ghashami","Vianne Gao","Ali Torkamani","Ruslan Vaulin","Nivedita Mangam","Bhavya Jain","Farhan Diwan","Malini SS","Mingrui Cheng","Shreya Tarur Kumar","Felix Candelario"],"abstract":"Security controls are mechanisms or policies designed for cloud-based services to reduce risk, protect information, and ensure compliance with security regulations. The development of security controls is traditionally a labor-intensive and time-consuming process. This paper explores the use of Generative AI to accelerate the generation of security controls. 
We specifically focus on generating Gherkin codes Category: Security, privacy, and abuse prevention","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Security, privacy, and abuse prevention"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:bc6ee4eed56019b4","title":"Enhancing multimodal large language models with multi-instance visual prompt generator for visual representation enrichment","url":"https://www.amazon.science/publications/enhancing-multimodal-large-language-models-with-multi-instance-visual-prompt-generator-for-visual-representation-enrichment","published":"2024","authors":["Wenliang Zhong","Wenyi Wu","Qi Li","Rob Barton","Boxin Du","Shioulin Sam","Karim Bouyarmane","Ismail Tutar","Junzhou Huang"],"abstract":"Multimodal Large Language Models (MLLMs) have achieved SOTA performance in various visual language tasks by fusing the visual representations with LLMs leveraging some visual adapters. In this paper, we first establish that adapters using query-based Transformers such as Q-former are a simplified Multi-instance Learning method without considering instance heterogeneity/correlation. 
We then propose a general Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=61"}},{"id":"official:210506e080527f22","title":"Enhancing contextual understanding in large language models through contrastive decoding","url":"https://www.amazon.science/publications/enhancing-contextual-understanding-in-large-language-models-through-contrastive-decoding","published":"2024","authors":["Zheng Zhao","Emilio Monti","Jens Lehmann","Haytham Assem"],"abstract":"Large language models (LLMs) tend to inadequately integrate input context during text generation, relying excessively on encoded prior knowledge in model parameters, potentially resulting in generated text with factual inconsistencies or contextually unfaithful content. 
LLMs utilize two primary knowledge sources: 1) prior (parametric) knowledge from pretraining, and 2) contextual (non-parametric) knowledge Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:8454ffe253aa51ae","title":"ECON: On the detection and resolution of evidence conflicts","url":"https://www.amazon.science/publications/econ-on-the-detection-and-resolution-of-evidence-conflicts","published":"2024","authors":["Cheng Jiayang","Chunkit Chan","Qianqian Zhuang","Lin Qiu","Tianhang Zhang","Tengxiao Liu","Yangqiu Song","Yue Zhang","Pengfei Liu","Zheng Zhang"],"abstract":"The rise of large language models (LLMs) has significantly influenced the quality of information in decision-making systems, leading to the prevalence of AI-generated content and challenges in detecting misinformation and managing conflicting information, or \"inter-evidence conflicts.\" This study introduces a method for generating diverse, validated evidence conflicts to simulate real-world misinformation Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page 
https://www.amazon.science/publications?p=43"}},{"id":"official:e1610f66f4bc1485","title":"Domain aligned CLIP for few-shot classification","url":"https://www.amazon.science/publications/domain-aligned-clip-for-few-shot-classification","published":"2024","authors":["Waleed Gondal","Jochen Gast","Inigo Alonso Ruiz","Richard Droste","Tommaso Macri","Suren Kumar","Luitpold Staudigl"],"abstract":"Large vision-language representation learning models like CLIP have demonstrated impressive performance for zero-shot transfer to downstream tasks while largely benefiting from inter-modal (image-text) alignment via contrastive objectives. This downstream performance can further be enhanced by full-scale fine-tuning which is often compute intensive, requires large labelled data, and can reduce out-of-distribution Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=81"}},{"id":"official:50023d1e4ac0f3ac","title":"Distance-aware calibration for pre-trained language models","url":"https://www.amazon.science/publications/distance-aware-calibration-for-pre-trained-language-models","published":"2024","authors":["Alberto Gasparin","Gianluca Detommaso"],"abstract":"Language Models for text classification often produce overconfident predictions for both indistribution and out-of-distribution samples, i.e. the model’s output probabilities do not match their accuracy. 
Prior work showed that simple post-hoc approaches are effective for mitigating this issue, but are not robust in noisy settings, e.g., when the distribution shift is caused by spelling mistakes. In this Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:0e0bb47b73c15aef","title":"Diffusion models for multi-modal generative modeling","url":"https://www.amazon.science/publications/diffusion-models-for-multi-modal-generative-modeling","published":"2024","authors":["Changyou Chen","Han Ding","Bunyamin Sisman","Yi Xu","Ouye Xie","Benjamin Yao","Son Tran","Belinda Zeng"],"abstract":"Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of multi-modal generative training for more generalizable modeling? 
In this paper, we propose a principled way to define a diffusion model by constructing a unified multi-modal Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=69"}},{"id":"official:fc55f5571294db79","title":"Differentially private bias-term fine-tuning of foundation models","url":"https://www.amazon.science/publications/differentially-private-bias-term-fine-tuning-of-foundation-models","published":"2024","authors":["Zhiqi Bu","Yu-Xiang Wang","Sheng Zha","George Karypis"],"abstract":"We study the problem of differentially private (DP) fine-tuning of large pre-trained models — a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraint, yet requires significant computational overhead or modifications to the network architecture. 
We propose differentially private Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=56"}},{"id":"official:8c49b6ed381fa05a","title":"DiNADO: Norm-disentangled neurally-decomposed oracles for controlling language models","url":"https://www.amazon.science/publications/dinado-norm-disentangled-neurally-decomposed-oracles-for-controlling-language-models","published":"2024","authors":["Sidi Lu","Wenbo Zhao","Chenyang Tao","Arpit Gupta","Shanchan Wu","Tagyoung Chung","Violet Peng"],"abstract":"NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models. It is designed to avoid catastrophic forgetting while achieving guaranteed convergence to an entropy-maximized closed-form optimal solution with reasonable modeling capacity. Despite the success, several challenges arise when apply NADO to a wide range of scenarios. 
Vanilla NADO suffers from Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:815a579c9a612e0d","title":"DetoxBench: Benchmarking large language models for multitask fraud & abuse detection","url":"https://www.amazon.science/publications/detoxbench-benchmarking-large-language-models-for-multitask-fraud-abuse-detection","published":"2024","authors":["Joymallya Chakraborty","Wei Xia","Anirban Majumder","Dan Ma","Walid Chaabene","Naveed Janvekar"],"abstract":"Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks. However, their practical application in high-stake domains, such as fraud and abuse detection, remains an area that requires further exploration. The existing applications often narrowly focus on specific tasks like toxicity or hate speech detection. 
In this paper, we present a comprehensive benchmark Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=50"}},{"id":"official:ef499e37460044d9","title":"DeepMMATE: Deep learning based multimodal architecture for audit taxability classification with XAI","url":"https://www.amazon.science/publications/deepmmate-deep-learning-based-multimodal-architecture-for-audit-taxability-classification-with-xai","published":"2024","authors":["Harish Y V S"],"abstract":"Review of non-taxable products is an important internal audit which is carried out by majority of e-commerce stakeholders. This process usually cross checks the initial taxability assignments to avoid any unnecessary penalties incurred to the companies during the actual audits by the respective state compliance teams/tax departments. 
In order to handle millions of products sold online on e-commerce websites Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=76"}},{"id":"official:cf75cf74c13d8397","title":"Data-centric anomaly detection with diffusion models","url":"https://www.amazon.science/publications/data-centric-anomaly-detection-with-diffusion-models","published":"2024","authors":["Sheldon Liu","Gordon Wang","Lei Liu","Xuefeng Liu"],"abstract":"Anomaly detection, also referred to as one-class classification, plays a crucial role in identifying product images that deviate from the expected distribution. This study introduces Data-centric Anomaly Detection with Diffusion Models (DCADDM), presenting a systematic strategy for data collection and further diversifying the data with image generation via diffusion models. 
The algorithm addresses data Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=78"}},{"id":"official:4a956680ddbcfd88","title":"Dancing in chains: Reconciling instruction following and faithfulness in language models","url":"https://www.amazon.science/publications/dancing-in-chains-reconciling-instruction-following-and-faithfulness-in-language-models","published":"2024","authors":["Zhengxuan Wu","Yuhao Zhang","Peng Qi","Yumo Xu","Rujun Han","Yian Zhang","Jifan Chen","Bonan Min","Zhiheng Huang"],"abstract":"Modern language models (LMs) need to follow human instructions while being faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., follow open-ended instructions) and faithfulness (i.e., ground responses in given context) when training LMs with these objectives. 
For instance, fine-tuning LLaMA-7B on instruction following datasets Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:ff2353ae146a299c","title":"DISTMM: Accelerating distributed multimodal model training","url":"https://www.amazon.science/publications/distmm-accelerating-distributed-multimodal-model-training","published":"2024","authors":["Jun Huang","Zhen Zhang","Shuai Zheng","Feng Qin","Yida Wang"],"abstract":"Multimodal model training takes multiple types of inputs to process with differently structured submodules, and aggregates outcomes from the submodules to learn the relationship among various types of inputs, e.g., correlating text to image for text-to-image generation. The differences of submodule architectures as well as their inputs lead to heterogeneity in terms of computation efficiency. 
Failing to Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=75"}},{"id":"official:235ebf6b081f81af","title":"DEED: Dynamic early exit on decoder for accelerating encoder-decoder transformer models","url":"https://www.amazon.science/publications/deed-dynamic-early-exit-on-decoder-for-accelerating-encoder-decoder-transformer-models","published":"2024","authors":["Peng Tang","Pengkai Zhu","Tian Li","Srikar Appalaraju","Vijay Mahadevan","R. Manmatha"],"abstract":"Encoder-decoder transformer models have achieved great success on various vision-language (VL) and language tasks, but they suffer from high inference latency. Typically, the decoder takes up most of the latency because of the auto-regressive decoding. To accelerate the inference, we propose an approach of performing Dynamic Early Exit on Decoder (DEED). 
We build a multi-exit encoder-decoder transformer Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=70"}},{"id":"official:f8d35d74d5943b5c","title":"CoverICL: Selective annotation for in-context learning via active graph coverage","url":"https://www.amazon.science/publications/covericl-selective-annotation-for-in-context-learning-via-active-graph-coverage","published":"2024","authors":["Costas Mavromatis","Balasubramaniam Srinivasan","Zhengyuan Shen","Jiani Zhang","Huzefa Rangwala","Christos Faloutsos","George Karypis"],"abstract":"In-context learning (ICL) adapts Large Language Models (LLMs) to new tasks, without requiring any parameter updates, but few an-notated examples as input. In this work, we investigate selective annotation for ICL, where there is a limited budget for annotating examples, similar to low-budget active learning (AL). 
Although uncertainty-based selection is unreliable with few annotated data, we present COVERICL Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=38"}},{"id":"official:31757b56f374058b","title":"Cost-effective hallucination detection for LLMs","url":"https://www.amazon.science/publications/cost-effective-hallucination-detection-for-llms","published":"2024","authors":["Simon Valentin","Jinmiao Fu","Gianluca Detommaso","Shaoyuan Xu","Giovanni Zappella","Bryan Wang"],"abstract":"Large language models (LLMs) can be prone to hallucinations —generating unreliable outputs that are unfaithful to their inputs, external facts or internally inconsistent. In this work, we address several challenges for post-hoc hallucination detection in production settings. 
Our pipeline for hallucination detection entails: first, producing a confidence score representing the likelihood that a generated Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=49"}},{"id":"official:0a91357f23d8298d","title":"CorrSynth - A correlated sampling method for diverse dataset generation from LLMs","url":"https://www.amazon.science/publications/corrsynth-a-correlated-sampling-method-for-diverse-dataset-generation-from-llms","published":"2024","authors":["Suhas Kowshik","Abhishek Divekar","Vijit Malik"],"abstract":"Large language models (LLMs) have demonstrated remarkable performance in diverse tasks using zero-shot and few-shot prompting. Even though their capabilities of data synthesis have been studied well in recent years, the generated data suffers from a lack of diversity, less adherence to the prompt, and potential biases that creep into the data from the generator model. 
In this work, we tackle the challenge Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=45"}},{"id":"official:2cdcaf89d85be487","title":"ConSiDERS—the human-evaluation framework: Rethinking human evaluation for generative large language models","url":"https://www.amazon.science/publications/considers-the-human-evaluation-framework-rethinking-human-evaluation-for-generative-large-language-models","published":"2024","authors":["Aparna Elangovan","Ling Liu","Lei Xu","Sravan Bodapati","Dan Roth"],"abstract":"In this position paper, we argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking that draws upon insights from disciplines such as user experience research and human behavioral psychology to ensure that the experimental design and results are reliable. 
The conclusions from these evaluations, thus, must consider factors such as usability, aesthetics Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=57"}},{"id":"official:0cb047adf403b68f","title":"CodeFort: Robust training for code generation models","url":"https://www.amazon.science/publications/codefort-robust-training-for-code-generation-models","published":"2024","authors":["Yuhao Zhang","Shiqi Wang","Haifeng Qian","Zijian Wang","Mingyue Shang","Linbo Liu","Sanjay Krishna Gouda","Baishakhi Ray","Murali Krishna Ramanathan","Xiaofei Ma","Anoop Deoras"],"abstract":"Code generation models are not robust to small perturbations, which often lead to incorrect generations and significantly degrade the performance of these models. Although improving the robustness of code generation models is crucial to enhancing user experience in real-world applications, existing research efforts do not address this issue. 
To fill this gap, we propose CodeFort, a framework to improve Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=43"}},{"id":"official:c8f874f42bc26bc6","title":"Code representation learning at scale","url":"https://www.amazon.science/publications/code-representation-learning-at-scale","published":"2024","authors":["Dejiao Zhang","Wasi Ahmad","Ming Tan","Hantian Ding","Ramesh Nallapati","Dan Roth","Xiaofei Ma","Bing Xiang"],"abstract":"Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code representation learning train models at a hundred million parameter scale using very limited pre-training corpora. 
In this work, we fuel code representation learning with a vast amount of code data via a two-stage pre-training Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=66"}},{"id":"official:d273874e79e78c5d","title":"CoD: Coherent detection of entities from images with multiple modalities","url":"https://www.amazon.science/publications/cod-coherent-detection-of-entities-from-images-with-multiple-modalities","published":"2024","authors":["Vinay Kumar Verma","Dween Rabius Sanny","Abhishek Singh","Deepak Gupta"],"abstract":"Object detection is a fundamental problem in computer vision, whose research has primarily focused on unimodal models, solely operating on visual data. However, in many real-world applications, data from multiple modalities may be available, such as text accompanying the visual data. 
Leveraging traditional models on these multi-modal data sources may lead to difficulties in accurately delineating object Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=82"}},{"id":"official:fd1b0b8f6ea73b39","title":"CoCoMIC: Code completion by jointly modeling in-file and cross-file context","url":"https://www.amazon.science/publications/cocomic-code-completion-by-jointly-modeling-in-file-and-cross-file-context","published":"2024","authors":["Yangruibo Ding","Zijian Wang","Wasi Ahmad","Murali Krishna Ramanathan","Ramesh Nallapati","Parminder Bhatia","Dan Roth","Bing Xiang"],"abstract":"While pre-trained language models (LM) for code have achieved great success in code completion, they generate code conditioned only on the contents within the file, i.e., in-file context, but ignore the rich semantics in other files within the same project, i.e., project-level cross-file context, a critical source of information that is especially useful in modern modular software development. 
Such overlooking Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=68"}},{"id":"official:c3ccb227fc901f0a","title":"Clustering-based sampling for few-shot cross-domain keyphrase extraction","url":"https://www.amazon.science/publications/clustering-based-sampling-for-few-shot-cross-domain-keyphrase-extraction","published":"2024","authors":["Prakamya Mishra","Lincy Pattanaik","Arunima Sundar","Nishant Yadav","Mayank Kulkarni"],"abstract":"Keyphrase extraction is the task of identifying a set of keyphrases present in a document that captures its most salient topics. Scientific domain-specific pre-training has led to achieving state-of-the-art keyphrase extraction performance with a majority of benchmarks being within the domain. In this work, we explore how to effectively enable the cross-domain generalization capabilities of such models Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=77"}},{"id":"official:88ba85774c55f2c2","title":"Can your model tell a negation from an implicature? 
Unravelling challenges with intent encoders","url":"https://www.amazon.science/publications/can-your-model-tell-a-negation-from-an-implicature-unravelling-challenges-with-intent-encoders","published":"2024","authors":["Yuwei Zhang","Siffi Singh","Sailik Sengupta","Igor Shalyminov","Hwanjun Song","Hang Su","Saab Mansour"],"abstract":"Conversational systems often rely on embedding models for intent classification and intent clustering tasks. The advent of Large Language Models (LLMs), which enable instructional embeddings allowing one to adjust semantics over the embedding space using prompts, are being viewed as a panacea for these downstream conversational tasks. However, traditional evaluation benchmarks rely solely on task metrics Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=59"}},{"id":"official:7e0eb3fd4b39c0ba","title":"Can language models learn to skip steps?","url":"https://www.amazon.science/publications/can-language-models-learn-to-skip-steps","published":"2024","authors":["Tengxiao Liu","Qipeng Guo","Xiangkun Hu","Jiayang Cheng","Yue Zhang","Xipeng Qiu","Zheng Zhang"],"abstract":"Trained on vast corpora of human language, language models demonstrate emergent human-like reasoning abilities. Yet they are still far from true intelligence, which opens up intriguing opportunities to explore the parallels of humans and model behaviors. In this work, we study the ability to skip steps in reasoning—a hallmark of human expertise developed through practice. 
Unlike humans, who may skip steps Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=40"}},{"id":"official:521d4887f6d5339e","title":"CaMML: Context-aware multimodal learner for large models","url":"https://www.amazon.science/publications/camml-context-aware-multimodal-learner-for-large-models","published":"2024","authors":["Yixin Chen","Shuai Zhang","Boran Han","Tong He","Bo Li"],"abstract":"In this work, we introduce Context-Aware MultiModal Learner (CaMML), for tuning large multimodal models (LMMs). CaMML, a lightweight module, is crafted to seamlessly integrate multimodal contextual samples into large models, thereby empowering the model to derive knowledge from analogous, domain-specific, up-to-date information and make grounded inferences. 
Importantly, CaMML is highly scalable and can Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=56"}},{"id":"official:2abd2c87c83d6769","title":"CA-SSLR: Condition-aware self-supervised learning representation for generalized speech processing","url":"https://www.amazon.science/publications/ca-sslr-condition-aware-self-supervised-learning-representation-for-generalized-speech-processing","published":"2024","authors":["Yen-Ju Lu","Jing Liu","Thomas Thebaud","Laureano Moro-Velazquez","Ariya Rastrow","Najim Dehak","Jesus Villalba"],"abstract":"We introduce Condition-Aware Self-Supervised Learning Representation (CASSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Compared to standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. 
This approach reduces Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:0bd84801b18cfbd4","title":"Bridging remote sensors with multisensor geospatial foundation models","url":"https://www.amazon.science/publications/bridging-remote-sensors-with-multisensor-geospatial-foundation-models","published":"2024","authors":["Boran Han","Shuai Zhang","Xingjian Shi","Markus Reichstein"],"abstract":"In the realm of geospatial analysis, the diversity of remote sensors, encompassing both optical and microwave technologies, offers a wealth of distinct observational capabilities. Recognizing this, we present msGFM, a multisensor geospatial foundation model that effectively unifies data from four key sensor modalities. This integration spans an expansive dataset of two million multisensor images. 
msGFM Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=72"}},{"id":"official:6829e87b11e7d90f","title":"Boosting entity recognition by leveraging cross-task domain models for weak supervision","url":"https://www.amazon.science/publications/boosting-entity-recognition-by-leveraging-cross-task-domain-models-for-weak-supervision","published":"2024","authors":["Sanjay Agrawal","Srujana Merugu","Vivek Sembium"],"abstract":"Entity Recognition (ER) is a common natural language processing task encountered in a number of real-world applications. For common domains and named entities such as places and organisations, there exists sufficient high quality annotated data and foundational models such as T5 and GPT-3.5 also provide highly accurate predictions. 
However, for niche domains such as e-commerce and medicine with specialized Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=52"}},{"id":"official:b758e938962549a4","title":"Benchmarking zero-shot recognition with vision-language models: Challenges on granularity and specificity","url":"https://www.amazon.science/publications/benchmarking-zero-shot-recognition-with-vision-language-models-challenges-on-granularity-and-specificity","published":"2024","authors":["Zhenlin Xu","Yi Zhu","Tiffany Deng","Abhay Mittal","Yanbei Chen","Manchen Wang","Paolo Favaro","Joe Tighe","Davide Modolo"],"abstract":"This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity. Although VLMs excel in tasks like image captioning, they face challenges in open-world settings. Our benchmarks test VLMs’ consistency in understanding concepts across semantic granularity levels and their response to varying text specificity. 
Findings show Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=67"}},{"id":"official:7481bc0a80182ea7","title":"BELIEVE: Belief-enhanced instruction generation and augmentation for zero-shot bias mitigation","url":"https://www.amazon.science/publications/believe-belief-enhanced-instruction-generation-and-augmentation-for-zero-shot-bias-mitigation","published":"2024","authors":["Lisa Bauer","Ninareh Mehrabi","Palash Goyal","Kai-Wei Chang","Aram Galstyan","Rahul Gupta"],"abstract":"Language models, pre-trained on large amounts of unmoderated content, have been shown to contain societal biases. Mitigating such biases typically requires access to model parameters and training schemas. In this work, we address bias mitigation at inference time, such that it can be applied to any black-box model. 
To this end, we propose a belief generation and augmentation framework, BELIEVE, that demonstrates Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=65"}},{"id":"official:ce9a46a3c6bf9ee7","title":"BASS: Batched attention-optimized speculative sampling","url":"https://www.amazon.science/publications/bass-batched-attention-optimized-speculative-sampling","published":"2024","authors":["Haifeng Qian","Sujan Gonugondla","Sungsoo Ha","Mingyue Shang","Sanjay Krishna Gouda","Ramesh Nallapati","Sudipta Sengupta","Anoop Deoras"],"abstract":"Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative-AI applications often require multiple responses, and how to perform speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges. 
This Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=57"}},{"id":"official:8be5f5daa7cf0712","title":"Automating AWS security controls: Leveraging generative AI for Gherkin script generation","url":"https://www.amazon.science/publications/automating-aws-security-controls-leveraging-generative-ai-for-gherkin-script-generation","published":"2024","authors":["Chen Ling","Mina Ghashami","Kyuhong Park","Ali Torkamani","Nivedita Mangam","Malini SS","Felix Candelario","Farhan Diwan","Mingrui Cheng"],"abstract":"Security controls are mechanisms or policies designed for cloud based services to reduce risk, protect information, and ensure compliance with security regulations. The development of security controls is traditionally a labor-intensive and time-consuming process. This paper explores the use of Generative AI to accelerate the generation of security controls. 
We specifically focus on generating Gherkin codes Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=62"}},{"id":"official:c718d8426841e182","title":"AutoGluon-Multimodal (AutoMM): Supercharging multimodal AutoML with foundation models","url":"https://www.amazon.science/publications/autogluon-multimodal-automm-supercharging-multimodal-automl-with-foundation-models","published":"2024","authors":["Zhiqiang Tang","Haoyang Fang","Su Zhou","Taojiannan Yang","Zihan Zhong","Tony Hu","Katrin Kirchhoff","George Karypis"],"abstract":"AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. 
Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite of functionalities Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=65"}},{"id":"official:613bcb4c091df977","title":"An evaluation benchmark for generative AI in security domain","url":"https://www.amazon.science/publications/an-evaluation-benchmark-for-generative-ai-in-security-domain","published":"2024","authors":["Mina Ghashami","Mikhail Kuznetsov","Vianne Gao","Ganyu Teng","Phil Wallis","Joseph Xie","Ali Torkamani","Baris Coskun","Wei Ding"],"abstract":"As computing environments become increasingly complex and distributed, the volume and complexity of security data generated across various systems have grown exponentially. Extracting useful insights from this security data is crucial for effective security analytics, anomaly detection, and threat identification. 
However, there is a lack of comprehensive evaluation benchmarks for assessing the performance Category: Machine learning","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Machine learning"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=41"}},{"id":"official:165e66ccf8401ec6","title":"Adapting uni-modal language models for dense multi-modal co-reference resolution using parameter augmentation","url":"https://www.amazon.science/publications/adapting-uni-modal-language-models-for-dense-multi-modal-co-reference-resolution-using-parameter-augmentation","published":"2024","authors":["Sam Osebe","Prashan Wanigasekara","Thomas Gueudre","Thanh Tran"],"abstract":"The context of modern smart voice assistants are often multi-modal, where images, audio and video content are consumed by users simultaneously. In such a setup, co-reference resolution is especially challenging, and runs across modalities and dialogue turns. 
We explore the problem of multi-modal co-reference resolution in multi-turn dialogues and quantify the performance of multi-modal LLMs on a specially Category: Computer vision","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Computer vision"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=67"}},{"id":"official:ad0a7749bf26e478","title":"AbsInstruct: Eliciting abstraction ability from LLMs through explanation tuning with plausibility estimation","url":"https://www.amazon.science/publications/absinstruct-eliciting-abstraction-ability-from-llms-through-explanation-tuning-with-plausibility-estimation","published":"2024","authors":["Zhaowei Wang","Wei Fan","Qing Zong","Hongming Zhang","Sehyun Choi","Tianqing Fang","Xin Liu","Yangqiu Song","Ginny Y. Wong","Simon See"],"abstract":"Abstraction ability is crucial in human intelligence, which can also benefit various tasks in NLP study. Existing work shows that LLMs are deficient in abstract ability, and how to improve it remains unexplored. In this work, we design the framework AbsInstruct to enhance LLMs’ abstraction ability through instruction tuning. 
The framework builds instructions with in-depth explanations to assist LLMs in Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=48"}},{"id":"official:f63bfd7f2902821f","title":"AXCEL: Automated eXplainable consistency evaluation using LLMs","url":"https://www.amazon.science/publications/axcel-automated-explainable-consistency-evaluation-using-llms","published":"2024","authors":["P Aditya Sreekar","Sahil Verma","Suransh Chopra","Sarik Ghazarian","Abhishek Persad","Narayanan Sadagopan"],"abstract":"Large Language Models (LLMs) are widely used in both industry and academia for various tasks, yet evaluating the consistency of generated text responses continues to be a challenge. Traditional metrics like ROUGE and BLEU show a weak correlation with human judgment. 
More sophisticated metrics using Natural Language Inference (NLI) have shown improved correlations but are complex to implement, require domain-specific Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=46"}},{"id":"official:81ed43d732cb5f38","title":"A weak supervision approach for few-shot aspect based sentiment analysis","url":"https://www.amazon.science/publications/a-weak-supervision-approach-for-few-shot-aspect-based-sentiment-analysis","published":"2024","authors":["Robert Vacareanu","Siddharth Varia","Kishaloy Halder","Shuai Wang","Giovanni Paolini","Neha Anna John","Miguel Ballesteros","Smaranda Muresan"],"abstract":"We explore how weak supervision on abundant unlabeled data can be leveraged to improve few-shot performance in aspect-based sentiment analysis (ABSA) tasks. We propose a pipeline approach to construct a noisy ABSA dataset, and we use it to adapt a pre-trained sequence-to-sequence model to the ABSA tasks. We test the resulting model on three widely used ABSA datasets, before and after fine-tuning. 
Our proposed Category: Conversational AI","companies":["Amazon"],"matched_orgs":["Amazon"],"company_groups":["company_us"],"company_regions":["US"],"sources":["official_publication_page"],"source":"official_publication_page","work_type":"publication","doi":"","openalex_id":"","cited_by_count":0,"quality_score":56,"matched_keywords":["Conversational AI"],"author_affiliations":["Amazon"],"concepts":[],"official_report":true,"quality_signals":{"company_match_source":"official Amazon Science publications page https://www.amazon.science/publications?p=77"}}]}
